Overview
In this lecture, we discuss race conditions in process control.
Aside: Robustness and process control
- A program that executes only in theory needs only to be correct
- Given any input, it produces the expected output
- But theoretical programs often have preconditions
- Requirements on the world
- Input follows expected format, arrives in finite time…
- The world doesn’t always act as we’d prefer
- A program that executes in reality should also be robust
- “The ability of a computer system to cope with errors during execution and cope with erroneous input”
- Example: Kernels
- In systems that coordinate multiple processes, the system as a whole should also be robust!
Testing robustness
- Provide program with as many inputs as possible
- Fail individual system components
- Chaos Monkey: “Exposing engineers to failures more frequently incentivizes them to build resilient services.”
Techniques for robustness
- Assertions
- Error checking
- Timeouts
Racer
- Parent process starts a child
- Child does work, then exits
- Child should finish by timeout
- If child finishes before timeout, parent prints ok
- If child times out, parent prints FAIL
- Questions
- Possible race conditions?
racer-poll
- Reliable, but uses 100% CPU to wait
- Not a valuable use of CPU
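The notes do not show the racer-poll source, so the following is only a sketch of what a polling parent could look like. The names `poll_for_child` and `sleep_seconds` are hypothetical, and the child's "work" is assumed to be a plain sleep. The parent spins on `waitpid` with `WNOHANG`, which is what burns CPU while it waits.

```cpp
#include <cassert>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

// Hypothetical helper: sleep for a fractional number of seconds.
static void sleep_seconds(double sec) {
    timespec ts;
    ts.tv_sec = (time_t) sec;
    ts.tv_nsec = (long) ((sec - (double) ts.tv_sec) * 1e9);
    nanosleep(&ts, nullptr);
}

// Hypothetical stand-in for racer-poll. Returns true if the child
// finished before the timeout (the "ok" case), false otherwise ("FAIL").
bool poll_for_child(double work_sec, double timeout_sec) {
    assert(work_sec >= 0 && timeout_sec >= 0);
    pid_t p = fork();
    assert(p >= 0);
    if (p == 0) {                       // child: do "work", then exit
        sleep_seconds(work_sec);
        _exit(0);
    }
    auto start = std::chrono::steady_clock::now();
    while (true) {                      // busy loop: never blocks
        int status;
        if (waitpid(p, &status, WNOHANG) == p) {
            return true;                // child exited in time
        }
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (elapsed.count() >= timeout_sec) {
            break;                      // timed out
        }
    }
    kill(p, SIGKILL);                   // clean up the late child
    waitpid(p, nullptr, 0);
    return false;
}
```

Because the parent never blocks, there is no window in which an event can be missed; the cost is a core spinning for the whole wait.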
Racer arguments
- -w TIME: child work time is TIME (default 0.5 sec)
- -t TIME: parent timeout is TIME (default 0.75 sec)
- -V: verbose output
Polling and blocking
- Blocking: Process waits for communication
- Polling: Process checks repeatedly for communication
- Advantage of polling: Fewer race conditions
- Advantage of blocking: Lower CPU usage
Signals
- A signal is the process control analogue for an operating system interrupt
- Represent events that might occur at unpredictable times and/or need to interrupt long-running computations
- Control-C ⟶ SIGINT
- kill -9 ⟶ SIGKILL
- Null pointer dereference ⟶ SIGSEGV
- A child process exited ⟶ SIGCHLD
Signal system calls
- sigaction establishes a signal handler
void handle_signal(int signal_number) {
// do something to handle the signal
}
...
struct sigaction sa;
sa.sa_handler = handle_signal; // or SIG_IGN or SIG_DFL
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGINT, &sa, nullptr);
- kill(pid_t pid, int sig) sends a signal
- Some signals are generated automatically
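As an illustration of `kill`, here is a minimal sketch in which a parent sends a signal to a child and then inspects how the child died. The function name `signal_that_killed_child` is hypothetical, and the sketch assumes the signal's default action terminates the process (true for `SIGTERM` and `SIGKILL`).

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: start a child that blocks forever, send it `sig`,
// and report which signal terminated it (0 if it exited some other way).
int signal_that_killed_child(int sig) {
    pid_t p = fork();
    assert(p >= 0);
    if (p == 0) {
        while (true) {
            pause();                    // child: wait for a signal
        }
    }
    int r = kill(p, sig);               // parent: send the signal
    assert(r == 0);
    (void) r;
    int status;
    pid_t w = waitpid(p, &status, 0);   // reap the child
    assert(w == p);
    (void) w;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling `signal_that_killed_child(SIGKILL)` should return `SIGKILL`, since that signal cannot be caught or ignored.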
When can a signal be delivered?
- In between any two instructions in the program
- Also interrupts certain system calls
- System call may return early (e.g., a short read)
- System call may return having done no work: errno == EINTR
- man 7 signal for more
- “Interruption of system calls…” for list of system calls that can return EINTR
- Also documented on system call manual pages
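The EINTR behavior can be observed directly. The sketch below is illustrative only, and both function names are hypothetical. The first function blocks in `read` on an empty pipe and lets `SIGALRM` interrupt it; the handler is installed without `SA_RESTART`, which is an assumption needed for `read` to fail rather than restart. The second is a common retry wrapper.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

static void on_alarm(int) {
    // Empty on purpose: exists only so SIGALRM interrupts read().
}

// Hypothetical demo: returns true if a blocking read() on an empty pipe
// came back with -1 and errno == EINTR after SIGALRM was delivered.
bool read_is_interrupted_by_signal() {
    int fds[2];
    if (pipe(fds) != 0) {
        return false;
    }
    struct sigaction sa, old_sa;
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    // no SA_RESTART
    sigaction(SIGALRM, &sa, &old_sa);

    // Repeating 50 ms timer: even if the first tick arrives before
    // read() starts, a later tick lands while read() is blocked.
    itimerval tick = {};
    tick.it_value.tv_usec = 50000;
    tick.it_interval.tv_usec = 50000;
    setitimer(ITIMER_REAL, &tick, nullptr);

    char c;
    ssize_t r = read(fds[0], &c, 1);    // pipe is empty, so this blocks
    int read_errno = errno;

    itimerval off = {};
    setitimer(ITIMER_REAL, &off, nullptr);  // cancel the timer
    sigaction(SIGALRM, &old_sa, nullptr);   // restore previous handler
    close(fds[0]);
    close(fds[1]);
    return r == -1 && read_errno == EINTR;
}

// Hypothetical wrapper: retry read() when it was interrupted before
// transferring any data.
ssize_t read_retry(int fd, void* buf, size_t n) {
    while (true) {
        ssize_t r = read(fd, buf, n);
        if (r != -1 || errno != EINTR) {
            return r;
        }
    }
}
```

A retry wrapper is appropriate when the signal is irrelevant to the read. When the signal is the event being waited for, the caller wants to see EINTR rather than hide it.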
racer-block
- Unreliable!
- If child exits immediately, signal is delivered before sleep, and therefore does not interrupt sleep
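The racer-block source is not reproduced in these notes. The sketch below is a guess at its general shape, with hypothetical names (`block_for_child`, `wait_result`, `sleep_seconds`): install an empty `SIGCHLD` handler, fork, then sleep for the timeout and rely on the signal to cut the sleep short. The comment marks where the race lives. How the real program judges an error is an assumption here; the sketch simply reports how long the parent slept.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

struct wait_result {
    bool child_exited;    // had the child exited when the parent checked?
    double waited_sec;    // how long the parent actually slept
};

static void on_sigchld(int) {
    // Empty: exists only so SIGCHLD interrupts the blocking sleep.
}

// nanosleep() returns early if a handled signal arrives while it sleeps.
static void sleep_seconds(double sec) {
    timespec ts;
    ts.tv_sec = (time_t) sec;
    ts.tv_nsec = (long) ((sec - (double) ts.tv_sec) * 1e9);
    nanosleep(&ts, nullptr);
}

wait_result block_for_child(double work_sec, double timeout_sec) {
    assert(work_sec >= 0 && timeout_sec >= 0);
    struct sigaction sa, old_sa;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, &old_sa);

    pid_t p = fork();
    assert(p >= 0);
    if (p == 0) {                       // child: do "work", then exit
        sleep_seconds(work_sec);
        _exit(0);
    }

    auto start = std::chrono::steady_clock::now();
    // RACE: if SIGCHLD is delivered here, before the sleep begins,
    // nothing interrupts the sleep and the parent waits the full timeout.
    sleep_seconds(timeout_sec);
    std::chrono::duration<double> waited =
        std::chrono::steady_clock::now() - start;

    int status;
    bool exited = waitpid(p, &status, WNOHANG) == p;
    if (!exited) {                      // clean up the late child
        kill(p, SIGKILL);
        while (waitpid(p, &status, 0) == -1 && errno == EINTR) {
        }
    }
    sigaction(SIGCHLD, &old_sa, nullptr);
    return {exited, waited.count()};
}
```

With a fast child, `waited_sec` is usually close to the child's work time; when the race strikes, it is close to the full timeout even though the child exited long before.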
racer-blockvar
- Still unreliable! (though less unreliable)
- Signal might be delivered between any two instructions!
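One plausible reading of racer-blockvar, offered as an assumption since its source is not shown, is that the handler sets a flag and the parent checks the flag before sleeping. The sketch below uses hypothetical names (`child_done`, `blockvar_wait_for_child`). It narrows the window but cannot close it, because the signal can still arrive after the check and before the sleep.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t child_done = 0;

static void on_sigchld(int) {
    child_done = 1;                     // record that SIGCHLD arrived
}

static void sleep_seconds(double sec) {
    timespec ts;
    ts.tv_sec = (time_t) sec;
    ts.tv_nsec = (long) ((sec - (double) ts.tv_sec) * 1e9);
    nanosleep(&ts, nullptr);            // returns early on a handled signal
}

// Returns true if the child had exited when the parent checked.
bool blockvar_wait_for_child(double work_sec, double timeout_sec) {
    assert(work_sec >= 0 && timeout_sec >= 0);
    child_done = 0;
    struct sigaction sa, old_sa;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, &old_sa);

    pid_t p = fork();
    assert(p >= 0);
    if (p == 0) {                       // child: do "work", then exit
        sleep_seconds(work_sec);
        _exit(0);
    }

    if (!child_done) {
        // RACE: SIGCHLD can still be delivered here, after the check
        // and before the sleep, and then the sleep runs to completion.
        sleep_seconds(timeout_sec);
    }

    int status;
    bool exited = waitpid(p, &status, WNOHANG) == p;
    if (!exited) {                      // clean up the late child
        kill(p, SIGKILL);
        while (waitpid(p, &status, 0) == -1 && errno == EINTR) {
        }
    }
    sigaction(SIGCHLD, &old_sa, nullptr);
    return exited;
}
```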
Race conditions!
- Can you use synchronous IPC to solve this race condition?
Statistics
- On an unloaded machine, ./racer-block ran more than 1,000,000 times without error
- On a busy machine, ./racer-block experienced an error about 1 in 300 times!
- ./racer-blockvar experienced an error about 1 in 750 times
- A version of ./racer-blockvar that tried to minimize the race condition experienced an error 1 in 25,000 times
- Maybe, in practice, 1 in 25,000 times on a heavily-loaded machine (1 in ≫1,000,000 times on an unloaded machine) is acceptable?
- But often any race condition problem is a serious error!
racer-selfpipe
- Process opens pipe to itself
- Signal handlers, for either SIGALRM (timeout) or SIGCHLD (child exit), write to pipe
- read will either succeed right away or be interrupted by a signal
- Reliable timeout!
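The self-pipe idea can be sketched as follows. This is not the racer-selfpipe source; the names `signal_pipe`, `on_signal`, and `selfpipe_wait_for_child` are hypothetical, and using `setitimer` to deliver `SIGALRM` is an assumption. The key property is that a byte written by a handler stays in the pipe, so a signal that arrives before `read` starts is not lost.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static int signal_pipe[2];              // [0] = read end, [1] = write end

static void on_signal(int) {
    int saved_errno = errno;            // write() may change errno
    char c = 1;
    ssize_t r = write(signal_pipe[1], &c, 1);   // async-signal-safe
    (void) r;
    errno = saved_errno;
}

static void sleep_seconds(double sec) {
    timespec ts;
    ts.tv_sec = (time_t) sec;
    ts.tv_nsec = (long) ((sec - (double) ts.tv_sec) * 1e9);
    nanosleep(&ts, nullptr);
}

// Returns true if the child exited before the timeout fired.
// Assumes timeout_sec is at least one microsecond.
bool selfpipe_wait_for_child(double work_sec, double timeout_sec) {
    assert(work_sec >= 0 && timeout_sec > 0);
    int r = pipe(signal_pipe);
    assert(r == 0);
    (void) r;

    struct sigaction sa, old_chld, old_alrm;
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, &old_chld);
    sigaction(SIGALRM, &sa, &old_alrm);

    pid_t p = fork();
    assert(p >= 0);
    if (p == 0) {                       // child: do "work", then exit
        sleep_seconds(work_sec);
        _exit(0);
    }

    itimerval timer = {};               // one-shot SIGALRM after timeout_sec
    timer.it_value.tv_sec = (time_t) timeout_sec;
    timer.it_value.tv_usec = (suseconds_t)
        ((timeout_sec - (double) timer.it_value.tv_sec) * 1e6);
    setitimer(ITIMER_REAL, &timer, nullptr);

    // Block until a handler has written a byte. If the signal arrived
    // earlier, the byte is already waiting and read() returns at once.
    char c;
    while (read(signal_pipe[0], &c, 1) == -1 && errno == EINTR) {
    }

    itimerval off = {};
    setitimer(ITIMER_REAL, &off, nullptr);      // cancel the timer

    int status;
    bool exited = waitpid(p, &status, WNOHANG) == p;
    if (!exited) {                      // clean up the late child
        kill(p, SIGKILL);
        while (waitpid(p, &status, 0) == -1 && errno == EINTR) {
        }
    }
    sigaction(SIGCHLD, &old_chld, nullptr);     // restore before closing
    sigaction(SIGALRM, &old_alrm, nullptr);
    close(signal_pipe[0]);
    close(signal_pipe[1]);
    return exited;
}
```

Even with `work_sec` set to 0, the case that defeats the sleep-based versions, the parent returns promptly: the handler's write happens before the read, and the read then succeeds right away.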
Reasoning about race conditions
- In racer-selfpipe, which actions act as barriers that enforce order?