Shell 2: Race conditions and process communication
Interrupts, polling & blocking
In the context of the kernel, we mostly discussed interrupts (and exceptional control transfer) as a mechanism for enforcing process isolation. This goal doesn’t apply within a single process. But interrupts have other advantages, including performance advantages. For instance, interrupts let a kernel handle hardware events with low latency and low CPU overhead.
Imagine an operating system that’s running two processes, a process that’s actively computing and a process that’s waiting for a disk read to complete. We want the waiting process to wake up as soon as the data is available. How can this be done without interrupts?
- The actively computing process could return control to the kernel every 1ms, so the kernel could check the disk status. But this is relatively high latency—the reading process will have to wait 0.5ms on average to discover the data.
- This latency can be reduced by polling more often. The actively computing process could return control to the kernel every 10µs! But this is high overhead: all the protected control transfers required to transfer control into and out of the kernel take time, and will reduce the work the computing process can complete.
Interrupts, in contrast, let us get both low latency and low overhead. The disk interrupts the CPU when the read completes—even if the computing process is actively computing. This interrupt is processed immediately, and the kernel can, if it likes, run the disk-reading process. This gives low latency and no polling.
Signals
The operating system kernel’s goal is to provide safe abstractions for all useful hardware features. This includes interrupts, and signals are the C abstract machine’s abstraction for hardware interrupts.
A signal is an interrupt that is delivered to a process. Processes can install signal handlers that “catch” the signal. A signal handler is just a function—think `exception` in WeensyOS—but the operating system might call it at any time, including while the process is actively computing, or while the process is blocked in a system call!
Signal handlers are installed with the `sigaction` system call. We provide a wrapper, `set_signal_handler`, to make dealing with `sigaction` easier. Using `sigaction` looks like this:
void signal_handler(int signal_number) {
    // ...
}

int main() {
    set_signal_handler(SIGCHLD, signal_handler);
    // Now the kernel will call `signal_handler` every time one of
    // this process’s children dies.
    // ...
}
But signal handlers run in a really, really weird environment. The kernel might call a handler at any moment, including right in the middle of a `malloc()` call that’s modifying global state! Most application code, and most C library functions, aren’t written to survive being called from signal handlers (the technical term is that they are not reentrant). There is a specific list of “async-signal-safe” library functions you may call from within a signal handler. Most of those functions are wrappers for system calls; for instance, it is safe to call `read` or `write`. (But it is not safe to call `printf`!) On Linux, you can see this list with `man 7 signal`; on Mac, `man 2 sigaction`.
For this reason, most signal handlers do very little work. They might set a flag (a global variable of type `volatile sig_atomic_t`), or modify a global variable, or write something to a file descriptor.
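For example, a handler that records a signal by setting a flag and writing a short message stays within the async-signal-safe rules. (This is a minimal sketch; the flag name and message are made up for illustration.)

#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

void signal_handler(int signal_number) {
    got_signal = 1;                      // a plain store to a volatile sig_atomic_t is safe
    const char msg[] = "signal received\n";
    (void) write(STDERR_FILENO, msg, sizeof(msg) - 1);   // write() is async-signal-safe
    // printf("signal received\n") here would NOT be safe: printf manipulates
    // shared library state and may allocate memory.
}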
The operating system uses signals to represent any asynchronous, unpredictable event a process might be interested in. Different signals are assigned different mnemonic names. Some of them are:
- SIGCHLD: a child died.
- SIGHUP: the terminal on which this process was running has closed.
- SIGINT: the user pressed Ctrl-C.
- SIGSEGV: a page fault happened that the kernel could not handle.
- SIGWINCH: the terminal window size changed.
- SIGFPE: divide by zero.
A signal has a default disposition, which is what happens to the program if the signal is received without a handler installed. Often the signal will be ignored, as with SIGWINCH and SIGCHLD. Sometimes the program will terminate, as with SIGSEGV. There are two signals that cannot be handled; the one that matters most is SIGKILL, invoked as `kill -9`, which always terminates a process.
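For example, a process can change SIGINT’s disposition from “terminate” to “ignore,” but the same request for SIGKILL is refused. (A minimal sketch using raw `sigaction`; the behavior described is Linux’s.)

#include <signal.h>
#include <stdio.h>

int main() {
    struct sigaction sa;
    sa.sa_handler = SIG_IGN;             // new disposition: ignore the signal
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    int r1 = sigaction(SIGINT, &sa, NULL);
    printf("SIGINT -> ignore: %d\n", r1);    // 0: now Ctrl-C does nothing to this process

    int r2 = sigaction(SIGKILL, &sa, NULL);
    printf("SIGKILL -> ignore: %d\n", r2);   // -1 on Linux: SIGKILL cannot be caught or ignored
}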
Signals and blocking system calls
Signals abort many long-running system calls. For instance, if a signal is received during a blocking `read`, the `read` will return -1 with error code EINTR, meaning “interrupted by signal.” The specific list of interruptible system calls is given on Linux in the `man 7 signal` page.
Signal interruption can be a pain, requiring loops like this to robustly read data:
#include <errno.h>      // errno, EINTR
#include <unistd.h>     // read()

ssize_t robust_read(int fd, char* buf, size_t amt) {
    size_t pos = 0;
    while (pos < amt) {
        ssize_t nr = read(fd, buf + pos, amt - pos);
        if (nr > 0) {
            pos += nr;
        } else if (nr == 0) {
            // end of file
            break;
        } else if (nr == -1 && errno == EINTR) {
            /* just try again; signals are not errors! */
        } else {
            // a real error: report it, unless some data was already read
            return pos ? pos : -1;
        }
    }
    return pos;
}
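A hypothetical caller might use it like this (the file name "data.bin" is made up for illustration):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    int fd = open("data.bin", O_RDONLY);     // hypothetical input file
    if (fd < 0) {
        perror("open");
        return 1;
    }
    char buf[100];
    ssize_t n = robust_read(fd, buf, sizeof(buf));
    if (n < 0) {
        perror("robust_read");               // a real error, not just EINTR
    } else {
        printf("got %zd bytes\n", n);        // fewer than 100 means EOF came first
    }
    close(fd);
}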
But sometimes signal interruption is critically important! It’s signal interruption that makes signals analogous to hardware interrupts, which in turn lets process code handle asynchronous events with low latency and low overhead.
It looks like we can use signals to solve our blocking problem! Here’s how `waitblock.c` works:
set_signal_handler(SIGCHLD, signal_handler);
pid_t p1 = fork();
if (p1 == 0) {
    // ...
    exit(0);
}
usleep(750000); // will be interrupted by SIGCHLD!
This works most of the time…but, sadly, not all the time.
Race conditions
A race condition is a bug that only manifests sometimes, depending on how events are scheduled. Race conditions are terrible to debug, because they can happen so rarely that they’re nearly impossible to track down.
And sadly, there’s a race condition in `waitblock.c`. We can see that with a loop like this:
for i in `seq 1 100000`; do ./waitblock; done
This should print nothing, but every now and then—on my VM, every couple thousand times—it will print “Child exited after 0.75+ sec”, even though the child exited immediately!
The race condition arises because signals are delivered asynchronously, and because the operating system makes no guarantees about the order in which processes are scheduled. And if the signal is delivered right here:
set_signal_handler(SIGCHLD, signal_handler);
pid_t p1 = fork();
if (p1 == 0) {
    // ...
    exit(0);
}
/* SIGNAL DELIVERED HERE, BEFORE `usleep` CALLED!! */
usleep(750000);
then the `usleep` system call will not be interrupted!
Can we solve this race condition by changing our signal handler? Well, maybe if the signal handler sets a flag when it is called…
static volatile sig_atomic_t flag = 0;

void signal_handler(int signo) {
    flag = 1;
}

int main(...) { ...
    set_signal_handler(SIGCHLD, signal_handler);
    pid_t p1 = fork();
    if (p1 == 0) {
        // ...
        exit(0);
    }
    if (!flag) {
        usleep(750000);
    }
    // ...
And this does, indeed, make the race condition less likely. But it does not fully solve the problem. In particular, the signal could be delivered here:
set_signal_handler(SIGCHLD, signal_handler);
pid_t p1 = fork();
if (p1 == 0) {
    // ...
    exit(0);
}
if (!flag) {
    /* SIGNAL DELIVERED HERE!!! */
    usleep(750000);
}
which recreates the problem we were trying to solve.
Suspenseful change of topic
Interprocess communication and pipes
Interprocess communication is what it sounds like: operating system mechanisms that allow processes to communicate. Interprocess communication punches holes in process isolation, but that’s OK: recall that process isolation means processes are isolated according to OS policy.
Processes can of course interact over the file system, but the operating system also offers special IPC mechanisms. The most important in Unix, and for your pset, is the pipe.
A pipe is a buffered stream abstraction for IPC. A pipe is a stream because it’s not seekable. It’s buffered because the operating system reserves some space in the buffer cache for piped data. Concretely, a pipe is a pair of file descriptors, one of which is open for writing and the other of which is open for reading, where data written to the write end of the pipe may be read from the read end of the pipe.
Pipes are created in Unix using the `pipe` system call, which works like this:
int pipefd[2];
int r = pipe(pipefd);
// Afterwards, if r == 0:
// pipefd[0] >= 0 is the file descriptor for the read end of the pipe
// pipefd[1] >= 0 is the file descriptor for the write end of the pipe
Initially, every pipe is a self-pipe: only the current process has access to the pipe. The `pipedemo.c` program illustrates a self-pipe. Pipes become useful for IPC when they are shared between processes, which happens when a process with a pipe calls `fork`.
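Here’s a minimal self-pipe sketch in the spirit of `pipedemo.c` (the actual program may differ): the process writes a message into its own pipe, then reads the same bytes back out.

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main() {
    int pipefd[2];
    int r = pipe(pipefd);
    assert(r == 0);

    // Write into the write end. A short message fits in the kernel's pipe
    // buffer, so this write won't block.
    const char msg[] = "hello, self!";
    ssize_t nw = write(pipefd[1], msg, strlen(msg));
    assert(nw == (ssize_t) strlen(msg));

    // Read it back from the read end of the same pipe.
    char buf[100];
    ssize_t nr = read(pipefd[0], buf, sizeof(buf));
    printf("read %zd bytes: %.*s\n", nr, (int) nr, buf);

    close(pipefd[0]);
    close(pipefd[1]);
}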
The read end of a pipe returns EOF only when all references to the write end of the pipe are closed. It’s easy to forget to close one end of a pipe, which can lead to some surprising behavior (see `pipedemo2.c`).
There’s also some interesting behavior when a process writes to a pipe whose read end is closed. By default, this delivers the SIGPIPE signal to the writing process, which then dies. (If the writer handles or ignores SIGPIPE, the write instead returns -1 with error code EPIPE.) This prevents writers from speaking endlessly into a void. Again, the behavior only occurs when all references to the read end are closed.
The process of closing all unneeded parts of a pipe is called pipe hygiene.
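Here’s a sketch of that hygiene in a parent-to-child pipe, in the spirit of `pipedemo2.c` (which may differ). The child reads until EOF; it sees EOF only once every reference to the write end, its own copy and the parent’s, has been closed.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int pipefd[2];
    pipe(pipefd);

    pid_t p = fork();
    if (p == 0) {
        // Child: reads from the pipe until EOF.
        close(pipefd[1]);    // hygiene! if the child kept this write end open,
                             // its own `read` below could never return 0 (EOF)
        char buf[100];
        ssize_t n;
        while ((n = read(pipefd[0], buf, sizeof(buf))) > 0) {
            fwrite(buf, 1, n, stdout);
        }
        close(pipefd[0]);
        exit(0);
    }

    // Parent: writes a message, then closes both of its pipe references.
    close(pipefd[0]);
    write(pipefd[1], "message\n", 8);
    close(pipefd[1]);        // only now does the child's read return EOF
    waitpid(p, NULL, 0);
}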
Pipe manipulation
To understand pipe manipulation in practice, we worked through an example of using pipes in a shell-like context. Here’s the goal.
- We start with a single “shell” process.
- The shell process’s goal is to create two child processes, `a` and `b`, with a pipeline connecting them (`a | b`). Specifically:
  - The `a` child’s stdout should be connected to the write end of the pipe.
  - The `b` child’s stdin should be connected to the read end of the pipe.
  - All other child file descriptors should be connected to whatever the shell had.
We start out with this situation: a single shell process with standard input open to the keyboard, and standard output and error open to the screen. (Really, all three file descriptors are open to the terminal.)
The first system call we must make is `pipe()`, called by the shell. The pipe must be created first, before forking children, because a child can have a file descriptor reference open to the pipe only by inheriting the file descriptor from the parent. That creates this picture:
And here is our goal: a picture like this.
Keeping the goal in mind will help us make the next steps. Here’s one order that works; there are many other orders that work.
- Shell calls `pipe()` (which, here, returns fds {3, 4})
- Shell calls `fork()`, creating proto-`a`
- Shell calls `fork()`, creating proto-`b`
- Shell calls `close(3)` (it doesn’t need the read end of the pipe)
- Shell calls `close(4)` (it doesn’t need the write end of the pipe)
- Proto-`a` calls `close(3)` (it doesn’t need the read end of the pipe)
- Proto-`a` calls `dup2(4, 1)` (this closes the old stdout, and makes stdout a reference to the write end of the pipe)
- Proto-`a` calls `close(4)` (now that stdout is set up, we need to close the old reference to the write end of the pipe)
- Proto-`a` calls `exec("a")`, becoming `a`
- Proto-`b` calls `close(4)`
- Proto-`b` calls `dup2(3, 0)`
- Proto-`b` calls `close(3)`
- Proto-`b` calls `exec("b")`, becoming `b`
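Put together as C code, the steps above might look like this sketch (error checking omitted; the program names "a" and "b" are placeholders, and `execvp` stands in for whatever exec variant the shell uses):

#include <unistd.h>

int main() {
    int pipefd[2];
    pipe(pipefd);                       // e.g., returns fds {3, 4}

    pid_t pa = fork();                  // create proto-`a`
    if (pa == 0) {
        close(pipefd[0]);               // proto-a doesn't need the read end
        dup2(pipefd[1], 1);             // stdout now refers to the write end
        close(pipefd[1]);               // drop the old reference to the write end
        char* args_a[] = {"a", NULL};   // placeholder program
        execvp("a", args_a);            // becomes `a`
        _exit(1);                       // only reached if exec fails
    }

    pid_t pb = fork();                  // create proto-`b`
    if (pb == 0) {
        close(pipefd[1]);               // proto-b doesn't need the write end
        dup2(pipefd[0], 0);             // stdin now refers to the read end
        close(pipefd[0]);               // drop the old reference to the read end
        char* args_b[] = {"b", NULL};   // placeholder program
        execvp("b", args_b);            // becomes `b`
        _exit(1);
    }

    // Shell: close its own references to both ends of the pipe.
    close(pipefd[0]);
    close(pipefd[1]);
    return 0;
}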
This also shows the benefits Unix derives from separate `fork` and `exec` system calls (rather than, say, a combined `start_new_process` system call that started a new process running a fresh copy of a named program). In between `fork` and `exec`, the child processes manipulate file descriptors to set up their desired environments. The `fork`-`exec` separation lets shell code perform these manipulations! Although proto-`a` and proto-`b` are independent, isolated processes—they are not the main shell—they are still running the same program (the same instructions) as the main shell. That means that the shell’s programmer can control their actions (by, for example, writing code in an `if (child_pid == 0)` block), and can use that control to set up exactly the right environment for the `a` and `b` programs to run.
This dance introduced a new system call, `dup2(oldfd, newfd)`, which makes `newfd` point to the same file structure as `oldfd`.
Solving the race condition
OK, so we know how pipes work. How can we use pipes to solve the race condition from before?
At root, the race condition issue is that process code cannot guarantee atomicity except through system calls. Signal handlers can’t provide atomicity on their own—we can’t delay a signal handler until the `usleep` call begins. But what we can do is add a system call to the signal handler. Then the kernel provides atomicity!
When the `waitblocksafe2.c` process begins, it opens a self-pipe. It then installs signal handlers for SIGCHLD and SIGALRM, an “alarm clock” signal that goes off after a given number of seconds. Each signal handler writes a byte to the self-pipe. And to detect either timeout or child death, it suffices to read from the pipe!
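Here’s a sketch of the idea; the real `waitblocksafe2.c` may differ in details, the one-second timeout and child body are placeholders, and the `set_signal_handler` wrapper from earlier in the notes is assumed to be declared in a course header.

#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int signalpipe[2];               // the self-pipe

void signal_handler(int signal_number) {
    // write() is async-signal-safe; one byte means "something happened"
    (void) write(signalpipe[1], "!", 1);
}

int main() {
    pipe(signalpipe);
    set_signal_handler(SIGCHLD, signal_handler);   // child exited
    set_signal_handler(SIGALRM, signal_handler);   // timeout expired

    pid_t p1 = fork();
    if (p1 == 0) {
        // ... child work (placeholder) ...
        exit(0);
    }

    alarm(1);                           // deliver SIGALRM after ~1 second

    // Block until either signal fires. Even if SIGCHLD arrived before this
    // read began, its byte is already sitting in the pipe, so the read
    // returns immediately: the kernel's pipe buffer provides the atomicity
    // that the flag variable could not. (If the read is instead interrupted
    // by a signal, that's fine too: either way, we know a signal arrived.)
    char c;
    (void) read(signalpipe[0], &c, 1);

    waitpid(p1, NULL, 0);
}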