Pipes, Forks, and Zombies
Pipe History
The idea of pipes
Doug McIlroy described the concept of pipes long before they were implemented.
Summary--what's most important.
To put my strongest concerns into a nutshell:
1. We should have some ways of coupling programs like
garden hose--screw in another segment when it becomes when
it becomes necessary to massage data in another way.
This is the way of IO also.
2. Our loader should be able to do link-loading and
controlled establishment.
3. Our library filing scheme should allow for rather
general indexing, responsibility, generations, data path
switching.
4. It should be possible to get private system components
(all routines are system components) for buggering around with.
M. D. McIlroy
October 11, 1964
Literate Programming
Don Knuth is one of the founding figures of computer science. He wrote The Art of Computer Programming and created the TeX typesetting system. Knuth enjoyed writing and programming so much that he developed literate programming, a style of programming that lets you write text about a program as you write its code. The idea is to write prose and programs simultaneously. However, this idea did not take off, because it carried a large amount of overhead even for simple tasks like text parsing. McIlroy responded to Knuth's work from a systems programming perspective, using pipes. Knuth had written a long literate program that prints the most frequently used words in a text; McIlroy completed the same task in 6 lines of shell code using pipes. While Knuth approached the problem from an algorithms perspective, McIlroy approached it from a systems perspective, chaining intermediate outputs together to arrive at the answer.
Pipes matter!
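To make the idea of chaining concrete, here is a minimal sketch, assuming a POSIX system, of how a shell might connect two commands with a pipe. The helper name `run_pipeline` is hypothetical: it forks one child per command and uses `dup2` so that the first command's standard output feeds the second command's standard input, as `left | right` does in the shell.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `left | right` and returns the exit status of
   `right`, or -1 on error. Both arguments are NULL-terminated argv arrays. */
int run_pipeline(char *const left[], char *const right[]) {
    int pipefd[2];
    if (pipe(pipefd) < 0) {
        perror("pipe");
        return -1;
    }

    pid_t lp = fork();
    if (lp < 0) {
        perror("fork");
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (lp == 0) {
        // Left child: its standard output becomes the pipe's write end.
        dup2(pipefd[1], STDOUT_FILENO);
        close(pipefd[0]);
        close(pipefd[1]);
        execvp(left[0], left);
        _exit(127);  // reached only if exec fails
    }

    pid_t rp = fork();
    if (rp < 0) {
        perror("fork");
    } else if (rp == 0) {
        // Right child: its standard input becomes the pipe's read end.
        dup2(pipefd[0], STDIN_FILENO);
        close(pipefd[0]);
        close(pipefd[1]);
        execvp(right[0], right);
        _exit(127);
    }

    // The parent closes both ends. If it kept the write end open, the
    // right command would never see end of file on its input.
    close(pipefd[0]);
    close(pipefd[1]);

    int status;
    waitpid(lp, NULL, 0);
    if (rp < 0 || waitpid(rp, &status, 0) != rp) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_pipeline((char *[]){"echo", "hello", NULL}, (char *[]){"grep", "-q", "hello", NULL})` behaves like `echo hello | grep -q hello` and returns 0. Closing the unused pipe ends in every process matters: a stray open write end would keep the reader from ever seeing end of file.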
The Less Program
The seq program prints consecutive numbers. Given two arguments, it starts at the first number and stops once it has printed the second. Given a single argument N, the standard seq prints 1 through N.
$seq 2 5
2
3
4
5
If we pipe the output of seq to less, only the beginning of the output appears, because less displays just enough output to fill the screen and then waits for us. The seq program appears to be paused. But is it still running?
$seq 2 100000000 | less
2
3
4
5
6
:█
If we pipe seq to less and look at the list of processes using ps aux, we can see that seq is not actively running: it is blocked writing to the pipe, waiting for less to read more of its output.
Using strace to examine seq further reveals a series of write system calls followed, once we quit less, by a SIGPIPE signal. A SIGPIPE occurs when a process writes to a pipe that has no readers.
The default action for SIGPIPE is to terminate the process. In other words, pipes automatically kill programs whose output is no longer needed. This explains why seq is killed when we quit less, instead of printing numbers that no one will read.
Using a Pipe to Implement waitpid
waitpid(p, &status, 0); // block until p exits or there is a signal
Suppose we don't care about status. How can we use a pipe to create a blocking call that unblocks when the child process dies? We want read to return 0 when the child exits: at that point every write end of the pipe is closed, so the reader sees end of file.
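#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    int pipefd[2];
    if (pipe(pipefd) < 0) {
        perror("pipe");
        return 1;
    }
    pid_t p = fork();
    if (p == 0) {
        close(pipefd[0]);  // child keeps only the write end; it stays open across exec
        char *args[] = {"sleep", "1", NULL};  // example command (hypothetical choice)
        execvp(args[0], args);
        _exit(1);          // reached only if exec fails
    }
    close(pipefd[1]); // all write ends must be closed
    char buf;
    read(pipefd[0], &buf, 1); // the read syscall will block until the child exits
                              // when the child exits, this will return 0
    close(pipefd[0]); // pipe hygiene
}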
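A slightly more careful version of the same idea is sketched below as a reusable function (the name `wait_via_pipe` is hypothetical). It retries `read` if a signal interrupts it, and it still calls `waitpid` at the end: the pipe tells the parent that the child has exited, but only `waitpid` reaps it, as the zombie discussion below explains. Note that end of file arrives only when every copy of the write end is closed, so if the child passes the descriptor on to its own children, the parent keeps waiting until they exit too.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv in a child and blocks until the child
   exits, using end of file on a pipe as the blocking call. Returns 0 once
   the child has exited and been reaped, or -1 on error. */
int wait_via_pipe(char *const argv[]) {
    int pipefd[2];
    if (pipe(pipefd) < 0) {
        perror("pipe");
        return -1;
    }
    pid_t p = fork();
    if (p < 0) {
        perror("fork");
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (p == 0) {
        close(pipefd[0]);
        execvp(argv[0], argv);  // the write end stays open across exec
        _exit(127);             // exec failed; exiting still closes the write end
    }
    close(pipefd[1]);           // otherwise read would never see end of file

    char buf;
    ssize_t n;
    // read returns 0 once every write end is closed, i.e. once the child
    // (and any process that inherited the descriptor from it) has exited.
    while ((n = read(pipefd[0], &buf, 1)) != 0) {
        if (n < 0 && errno != EINTR) {
            break;              // a real error, not an interrupted call
        }
    }
    close(pipefd[0]);

    // The pipe told us the child is gone, but it is not reaped yet:
    // without this waitpid it would linger as a zombie.
    if (waitpid(p, NULL, 0) != p) {
        return -1;
    }
    return n == 0 ? 0 : -1;
}
```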
Parents, Children, and Zombies
Process Hierarchy
Every process has a single parent. The root of the process hierarchy (or process tree) is a process called init, which has pid 1. This is the only process that cannot be killed. The waitpid system call retrieves a child process's exit status. A process's exit status is stored in its process structure until the parent process asks for it. waitpid collects the status and recycles the process structure, which means the structure can be reused for another process.
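A minimal sketch of that exchange, assuming POSIX `fork` and `waitpid` (the function name `collect_exit_status` is hypothetical): the child exits with a chosen code, the parent retrieves it with `waitpid`, and a second `waitpid` on the same pid fails with `ECHILD` because the child has already been collected.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child exits with `code` (0..255) and the parent
   collects it. Returns the collected exit code, or -1 on error. */
int collect_exit_status(int code) {
    pid_t p = fork();
    if (p < 0) {
        perror("fork");
        return -1;
    }
    if (p == 0) {
        _exit(code);  // the kernel keeps this status until the parent asks
    }

    int status;
    if (waitpid(p, &status, 0) != p) {  // collect the status, recycle the structure
        return -1;
    }
    // The child has been reaped, so waiting for it again fails with ECHILD.
    if (waitpid(p, NULL, 0) != -1 || errno != ECHILD) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```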
ManyFork
The manyfork program tries to call fork 10000 times. However, if we run ./manyfork, only ~3400 processes are created. Running sudo ./manyfork, which gives the program more privileges, creates ~6890 processes. The operating system limits how many processes can be created, protecting the user from runaway programs. If we look at the processes created by manyfork, we see that most of them are defunct.
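Which limit manyfork actually hit is not stated here. One likely candidate on Linux (an assumption) is `RLIMIT_NPROC`, the per-user limit on processes: when a user reaches it, `fork` fails with `EAGAIN`, and the limit is not enforced for root, which would be consistent with sudo allowing more processes. This sketch (the name `nproc_soft_limit` is hypothetical) just reads the current soft limit with `getrlimit`.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: returns the soft per-user process limit
   (RLIMIT_NPROC), or 0 on error. RLIM_INFINITY means no limit is set. */
rlim_t nproc_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) < 0) {
        perror("getrlimit");
        return 0;
    }
    return rl.rlim_cur;  // fork fails with EAGAIN once this many processes exist
}
```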
Zombies
The manyfork program does not wait for its children using waitpid. This creates what are called zombie processes. A zombie process is a process that has terminated but has not yet been waited on by its parent. The ps command lets us identify these zombie processes. Below is a sample output of ps after running manyfork.
user 78623 0.0 0.0 0 0 pts/0 Z+ 15:44 0:00 [manyfork]
user 78624 0.0 0.0 0 0 pts/0 Z+ 15:44 0:00 [manyfork]
user 78625 0.0 0.0 0 0 pts/0 Z+ 15:44 0:00 [manyfork]
The Z+ in the STAT column tells us that these processes are zombies: Z means zombie, and the + means the process is in the foreground process group.
These zombie processes still hold resources, namely process IDs. When a child outlives its parent, the child is reparented to the init process, which has pid 1. This is how init collects orphaned children: its job is to call waitpid on them to collect their resources.
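A simplified sketch of that reaping job, assuming POSIX `waitpid` (the names `spawn_exiting_children` and `reap_all_children` are hypothetical). The first function is a tiny manyfork that creates children without waiting for them; the second passes -1 as the pid to wait for any child and stops when `waitpid` fails with `ECHILD` because no children remain. A real init typically reaps in response to `SIGCHLD`, often with `WNOHANG`, rather than blocking like this; running the same loop in manyfork itself would prevent its zombies.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical mini-manyfork: forks `n` children that exit immediately.
   Returns how many were created. */
int spawn_exiting_children(int n) {
    int created = 0;
    for (int i = 0; i < n; ++i) {
        pid_t p = fork();
        if (p < 0) {
            break;        // e.g. EAGAIN when a process limit is reached
        }
        if (p == 0) {
            _exit(0);     // child: stays a zombie until someone reaps it
        }
        ++created;
    }
    return created;
}

/* Blocks until every child of the calling process has exited, reaping each
   one. Returns the number of children reaped. */
int reap_all_children(void) {
    int reaped = 0;
    for (;;) {
        pid_t p = waitpid(-1, NULL, 0);  // -1 means "any child"
        if (p > 0) {
            ++reaped;                    // one zombie collected
        } else if (errno == EINTR) {
            continue;                    // interrupted by a signal: try again
        } else {
            break;                       // ECHILD: no children are left
        }
    }
    return reaped;
}
```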