Sunday, November 2, 2014

Multiplexed IO

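Programs typically service several file descriptors concurrently (e.g. a terminal, a file, a pipe to communicate with other programs).  Without threads, a process that blocks in read() on one fd cannot respond to activity on the others, so it can effectively wait on only one fd at a time.  One workaround is non-blocking IO, but this is inefficient: the process has to keep checking every fd in a loop, burning CPU while nothing is ready.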

Multiplexed IO lets a process block on multiple fds at once.  The process sleeps and is woken when any of them can be read from or written to without blocking.

select()
Introduced in 4.2BSD, select() takes three fd sets: readfds, writefds and exceptfds.  The fds in readfds are watched to see whether data can be read without blocking; likewise, the fds in writefds are watched to see whether they can be written without blocking.  exceptfds is watched for exceptional conditions, such as out-of-band data being available (applicable to sockets only).  On return, each set is modified to contain only the fds that are ready for that type of IO.  Each fd set is really a bit mask, so select() also takes a parameter, nfds, which must be set to the highest-numbered fd in any of the sets plus one (so the kernel knows how many bits to examine).

select() also accepts a timeout (a struct timeval) that makes it return after that interval even if no fd is ready by then.  Setting the timeout to zero causes select() to return immediately, reporting any fds that are already ready; passing NULL instead makes it block until an fd becomes ready.

The fd sets are built with these macros:

fd_set readfds;
FD_ZERO(&readfds) - remove all fds from the set
FD_SET(fd, &readfds) - add an fd, e.g. FD_SET(STDIN_FILENO, &readfds)
FD_CLR(fd, &readfds) - remove an fd
FD_ISSET(fd, &readfds) - test whether an fd is in the set, e.g. if (FD_ISSET(STDIN_FILENO, &readfds)) {...}
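A minimal sketch of the macros and select() working together, assuming a hypothetical helper named wait_readable() that watches a single fd for readability with a timeout in whole seconds:

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable. Returns 1 if readable, 0 on timeout, -1 on error (errno set). */
int wait_readable(int fd, long timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int ret;

    assert(fd >= 0 && fd < FD_SETSIZE);  /* FD_SET is undefined beyond FD_SETSIZE */

    FD_ZERO(&readfds);        /* start with an empty set */
    FD_SET(fd, &readfds);     /* watch fd for readability */

    tv.tv_sec = timeout_sec;  /* 0 means "check and return immediately" */
    tv.tv_usec = 0;

    /* nfds is the highest fd in any set plus one */
    ret = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ret <= 0)
        return ret;           /* 0: timed out, -1: error */

    /* On return, readfds holds only the fds that are ready */
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

With a pipe, wait_readable(p[0], 0) returns 0 until something has been written to p[1], and 1 afterwards.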

POSIX defines its own equivalent of select(), called pselect().  Its call signature is slightly different, but it uses the same macros to build and test the fd sets.

Differences between pselect() and select()
(1) pselect() uses a different timeout structure, struct timespec, which holds seconds and nanoseconds, while the struct timeval used by select() holds seconds and microseconds.  In practice, however, neither call is reliably precise even at the microsecond level.

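(2) When select() returns, the contents of the timeval may have been modified (Linux, for example, updates it to the time remaining), so portable code must treat it as undefined and reinitialize it before the next select() call.  pselect() takes its timespec as const and does not modify it, so there is no need to reinitialize it between successive calls.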
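Differences (1) and (2) show up when the call is made in a loop.  The sketch below assumes two hypothetical helpers, count_ready_select() and count_ready_pselect(), that wait on one fd several times with a 10 ms timeout: the select() version rebuilds its timeval (and the fd set, which select() also modifies) on every iteration, while the pselect() version sets up its const timespec once.  Passing NULL as the sigmask makes pselect() leave the signal mask unchanged.

```c
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: call select() `rounds` times on fd and
 * return how many rounds found fd readable. */
int count_ready_select(int fd, int rounds)
{
    int ready = 0;
    for (int i = 0; i < rounds; i++) {
        fd_set readfds;
        /* Re-initialize every time: select() may have modified tv */
        struct timeval tv = { .tv_sec = 0, .tv_usec = 10000 };   /* 10 ms */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);   /* the set must be rebuilt as well */
        if (select(fd + 1, &readfds, NULL, NULL, &tv) > 0 &&
            FD_ISSET(fd, &readfds))
            ready++;
    }
    return ready;
}

/* Same loop with pselect(): the timespec is const, so it is set up once */
int count_ready_pselect(int fd, int rounds)
{
    const struct timespec ts = { .tv_sec = 0, .tv_nsec = 10000000 };  /* 10 ms */
    int ready = 0;
    for (int i = 0; i < rounds; i++) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        /* NULL sigmask: the signal mask is left unchanged */
        if (pselect(fd + 1, &readfds, NULL, NULL, &ts, NULL) > 0 &&
            FD_ISSET(fd, &readfds))
            ready++;
    }
    return ready;
}
```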

(3) pselect() has an additional parameter, sigmask, which addresses a race condition between waiting on the file descriptors and handling a signal.  For example, suppose a process checks a flag that is set by a signal handler and then calls select().  If the signal arrives (running the handler, which sets the flag) after the check but before the select() call, select() may block indefinitely and the process never responds to the flag.  sigmask resolves this: the process keeps the signal blocked while it checks the flag, and passes pselect() a mask in which that signal is unblocked.  pselect() installs that mask and starts waiting as a single atomic step, and restores the original mask on return.  A signal that arrives before the call therefore stays pending and is delivered only once pselect() is waiting, which interrupts pselect() (it returns -1 with errno set to EINTR).  In other words, sigmask closes the gap between checking the flag and starting to wait.

poll()
System V introduced poll(), which addresses some deficiencies of select().  Instead of three fd sets, poll() uses an array of pollfd structures:

struct pollfd {
    int fd; /* fd */
    short events; /* events to watch */
    short revents; /* return the events observed */
};

Events to watch include:
POLLIN - there is data to read
POLLOUT - writing will not block

POLLRDNORM - normal data to read (equivalent to POLLIN)
POLLRDBAND - priority data to read
POLLPRI - urgent data to read (e.g. out-of-band data on a TCP socket)

POLLWRNORM - writing normal data will not block (equivalent to POLLOUT)
POLLWRBAND - writing priority data will not block

POLLMSG - a SIGPOLL message is available (unused on Linux)

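Returned events (revents) can include the above, plus the following, which are reported even if they were not requested in events:
POLLERR - an error condition occurred on the fd
POLLHUP - a hangup occurred on the fd
POLLNVAL - the fd is invalid (not open)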
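The return-only events can be observed with a pipe.  This sketch assumes a hypothetical helper poll_revents() that polls one fd for POLLIN with a zero timeout.  On Linux, closing the write end makes the read end report POLLHUP, and polling an fd that has already been closed reports POLLNVAL, even though neither was requested in events:

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: poll one fd for POLLIN without blocking and
 * return the revents reported by the kernel, or -1 if poll() fails. */
int poll_revents(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    if (poll(&pfd, 1, 0) == -1)   /* timeout 0: return immediately */
        return -1;
    /* revents may contain POLLERR, POLLHUP or POLLNVAL
       even though only POLLIN was requested */
    return pfd.revents;
}
```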

For comparison, the select() events correspond to these poll() events:

POLLIN | POLLPRI = select() read event
POLLOUT | POLLWRBAND = select() write event


poll() takes its timeout as an int number of milliseconds.  A zero value makes poll() return immediately; a negative value makes poll() wait indefinitely until an event is observed.  An example of using poll():

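#include <poll.h>
#include <unistd.h>

int main(void)
{
    struct pollfd fds[2];
    fds[0].fd = STDIN_FILENO;      /* watch stdin for input */
    fds[0].events = POLLIN;
    fds[1].fd = STDOUT_FILENO;     /* watch stdout for writability */
    fds[1].events = POLLOUT;

    int ret = poll(fds, 2, 5 * 1000);  /* wait at most 5 seconds */
    if (ret == -1) return 1;           /* error: check errno */
    if (ret == 0) return 0;            /* timed out: no fd ready */
    if (fds[0].revents & POLLIN) { /* stdin can be read without blocking */ }
    return 0;
}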
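The same idea generalizes to any number of fds.  A self-contained sketch, assuming a hypothetical helper poll_readable() and an arbitrary limit of 16 fds to avoid dynamic allocation; the timeout is passed straight through, so 0 polls once and a negative value blocks until some fd is ready:

```c
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 16   /* arbitrary limit for this sketch */

/* Hypothetical helper: wait until at least one of n fds is readable.
 * Stores 1/0 per fd in ready[] and returns how many are readable
 * (0 on timeout, -1 on error or if n > MAX_FDS). */
int poll_readable(const int *fds, int *ready, nfds_t n, int timeout_ms)
{
    struct pollfd pfds[MAX_FDS];

    if (n > MAX_FDS)
        return -1;
    for (nfds_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;   /* only interested in readability */
        pfds[i].revents = 0;
    }

    if (poll(pfds, n, timeout_ms) == -1)
        return -1;

    int count = 0;
    for (nfds_t i = 0; i < n; i++) {
        ready[i] = (pfds[i].revents & POLLIN) != 0;
        count += ready[i];
    }
    return count;
}
```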

Linux also offers ppoll(), which is similar to pselect(): it takes a timespec timeout and a sigmask.  ppoll() is not part of POSIX; it is a Linux-specific call.
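A minimal ppoll() sketch, assuming a hypothetical helper ppoll_readable().  It needs _GNU_SOURCE defined before any header is included.  The mask passed to ppoll() is the current mask plus SIGINT (an arbitrary choice), so SIGINT stays blocked only while ppoll() waits.  The glibc wrapper does not modify the timespec, so the same timeout can be reused across calls:

```c
#define _GNU_SOURCE   /* ppoll() is a GNU/Linux extension */
#include <poll.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to *ts for fd to become readable, with
 * SIGINT additionally blocked during the wait.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int ppoll_readable(int fd, const struct timespec *ts)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    sigset_t mask;

    sigprocmask(SIG_SETMASK, NULL, &mask);  /* read the current mask */
    sigaddset(&mask, SIGINT);               /* mask used only during ppoll() */

    int ret = ppoll(&pfd, 1, ts, &mask);
    if (ret <= 0)
        return ret;                         /* 0: timed out, -1: error */
    return (pfd.revents & POLLIN) != 0;
}
```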
