Shell 1: Process control

This unit is about the system calls for the creation, coordination, and control of processes.

“Screwing together programs like garden hose”

The UNIX philosophy: Simple programs achieve simple tasks, but a complex task can be accomplished by running many simple programs in coordination. The same program may even be reused multiple times in the same task. The operating system should enable this coordination scheme without much hassle.

There are costs and benefits of running multiple small programs instead of a big, complicated program for a certain task.

Costs: In some programming environments, there is a one-time, constant, but very high cost of running a program, regardless of the complexity of the program itself. The Java runtime is such an example, where memory consumption can be high for even very simple programs. In some interpreted languages like Python and JavaScript, running a program may involve starting up the interpreter, which can incur a high cost because these interpreters are very complex programs themselves.

Benefits: Different programs run in isolated environments, so an error in one program responsible for a certain task would be self-contained to that program only. This makes it easy to isolate failures in the system. The same program may also be reused for different stages of the task, reducing the code base required to perform the task. The main benefit, as described above, is what we call the modularity benefit.

Modularity: A programming system's property of dividing itself into self-contained modules.

A module ideally has a simple, well-defined interface, so that it is easy to use. The best modules pair such an interface with a deep implementation, so that they perform their tasks efficiently.

"Say 'word count' one more time"

We are going to visit an article, Programming pearls: A literate program, featuring Turing Award winner and Stanford professor Donald Ervin Knuth. One of the most respected voices in the computer science community, Prof. Knuth is the creator of the famous TeX typesetting system (from which LaTeX is derived). He is renowned for creating incredibly complex software that nevertheless appears to be completely correct.

The columnist who wrote the article, Prof. Jon Bentley, once posed the following problem to Prof. Knuth:

Given a text file and an integer K, print the K most common words in the file (and the number of their occurrences) in decreasing frequency.

You can read Prof. Knuth's solution in the article, which involves some rigorous type definitions and a fascinating data structure. In short, it's very complicated.

In the same article, Doug McIlroy reviewed Knuth's program and proposed a UNIX shell script that achieves the same goal (adapted here to modern shell syntax) with reasonably good performance:


# script:

tr -cs A-Za-z '\n' | \
tr A-Z a-z | \
sort | \
uniq -c | \
sort -rn | \
head -n ${1} # ${1} refers to the first parameter of the script, in this case the integer K

What the script does is spawn a series of programs (6 in total in this case) and connect them to form a pipeline. The 6 programs run in parallel and consume each other's outputs in the pipeline to perform the described task. Let's see what each stage does: the first tr squeezes every run of non-alphabetic characters into a single newline, so each word ends up on its own line; the second tr converts all uppercase letters to lowercase; sort brings identical words next to each other; uniq -c collapses each run of identical lines into one line prefixed with its count; sort -rn orders those lines numerically by count, highest first; and head keeps only the first K lines.

We can use this script (assuming it's saved as an executable file) in a UNIX shell command line environment as shown below:

# to show 10 most frequent words and their counts in input.txt
$ cat input.txt | ./ 10

The simplicity of the shell script demonstrates the benefit of modular coordination among small programs, or the benefit of the UNIX programming philosophy in general.

Read further: Prof. Bentley's previous column, Programming pearls: Literate programming.

Controlling processes

In order to support this style of coordinated task solving with multiple programs, we need the capability to easily control and manage multiple independent programs. There is one program whose sole purpose is to provide this capability -- the shell.

The shell uses a number of system calls to perform these process coordination tasks. We will discuss them one at a time.

Process creation: fork()

Recall that the fork() system call creates a new process. The new process has a copy of the old process's state (at the time of the system call). Let's consider the following program:

int main() {
	pid_t p1 = fork();
	assert(p1 >= 0);

	printf("Hello from pid %d\n", getpid());
	return 0;
}

Running this program gives outputs like this:

Hello from pid 35931
Hello from pid 35932

This is not surprising at all. We may recognize that the process with the lower pid is the process that calls fork() (the parent process), and the one with the higher pid is the newly created process (the child process).

If we run the program multiple times though, sometimes we also see the following outputs:

Hello from pid 35933
<shell prompt> $ Hello from pid 35934

In this case a shell prompt gets printed before the child gets to print its message. This indicates that the shell only waits for the parent to finish before printing a prompt and accepting new commands. When the shell "waits" for a process (the parent, in this case) in this way, we say that the process executes in the foreground within the shell.

We know from the last lecture that fork() performs the following: it creates a new process whose memory and registers are copies of the parent's (at the time of the call); in the parent, fork() returns the pid of the new child, while in the child it returns 0 (and it returns -1 on failure).

Here is a program that calls fork() twice:

int main() {
	pid_t p1 = fork();
	assert(p1 >= 0);

	pid_t p2 = fork();
	assert(p2 >= 0);

	printf("Hello from pid %d\n", getpid());
	return 0;
}

Running this program gives us outputs like this:

Hello from pid 73285
Hello from pid 73286
Hello from pid 73288
<shell prompt> $ Hello from pid 73287

We have 4 processes running in this case, and the shell again only waits for the parent to finish. We can visualize it like:

                    1st fork()
                   /         \
             (p1!=0)         (p1==0)
              |                   |
          2nd fork()            2nd fork()
          /        \            /        \
       (p2!=0)  (p2==0)      (p2!=0)  (p2==0)
          |        |            |        |
       parent   child2       child1  grandchild

Or we can draw a process hierarchy diagram (shown on the board during lecture):

                      proc0
                     /     \
                 1st fork  2nd fork
                   /           \
               (proc1)       (proc2)
                  |
               2nd fork
                  |
               (proc3)

proc0: parent
proc1: child1
proc2: child2
proc3: grandchild

Let's take a look at another program, forkmix-syscall:

#include <assert.h>
#include <string.h>
#include <unistd.h>

int main() {
	pid_t p1 = fork();
	assert(p1 >= 0);

	const char* text;
	if (p1 == 0) {
		text = "BABY\n";
	} else {
		text = "mama\n";
	}

	for (int i = 0; i != 1000000; ++i) {
		ssize_t s = write(STDOUT_FILENO, text, strlen(text));
		assert(s == (ssize_t) strlen(text));
	}
	return 0;
}
Running the program prints out interleaving lines of BABY and mama to the screen. Sometimes the interleaving is perfect (we see exactly one BABY followed by exactly one mama, and so on), sometimes we see consecutive printouts of BABY or mama. It does seem that every line is either BABY or mama.

We can verify this using the shell knowledge we just learned. Running

./forkmix-syscall | sort | uniq

generates exactly 2 lines (BABY and mama), so our assumption seems correct. It is guaranteed that every line is either BABY or mama. This is because of the atomicity guarantee of the write() system call.

Let's now look at a very similar program, forkmix-stdio. The program is almost identical to forkmix-syscall, except that it uses stdio's fputs() instead of the write() system call to write out data.

When we run this program, first, we observe that it is much faster (because it makes far fewer system calls!); second, the output doesn't look quite right this time. When we again run the sort-uniq pipeline against its output, we get much weirder results: lines like maBABY, mamaBABY, maY, and so on. So it looks like even characters are now interleaved! Why?

Recall that stdio uses an internal user-level buffer that is 4096 bytes large. Once the buffer fills up, the whole 4KB block is written out using a single write() system call, which is atomic. The problem is that the strings we are writing, "BABY\n" and "mama\n", are 5 bytes long, and 4096 is not divisible by 5. The buffer therefore fills up in the middle of a string, splitting that string across two write() calls, and the other process's output can land in between. It's also worth noting that the two processes here (parent and child) do not share a stdio buffer, because the buffer lives in each process's own isolated address space.

Now let's look at our last forkmix program. Again the program is similar, except that it now opens a file (using stdio) and writes to that file instead of to the screen (stdout).

Note the following few lines in the program:

FILE* f = fopen(argv[1], "w");

pid_t p1 = fork();
assert(p1 >= 0);

So the program opened the file before calling fork(). This shouldn't give us any surprising results compared to forkmix-stdio. But what if we called fork() before opening the file?

Once we move the fopen() call after the fork, we observe that the output file now contains only half of the expected number of characters.

The reason for this effect is that when we call fopen() after fork(), the parent and the child get two separate offsets into the same file (each maintained by the kernel). The two processes then essentially race with each other to fill up the file, and the slower process overwrites what the other has already written at that offset. Since both processes write the same number of bytes starting from offset 0, the final length of the file is the number of bytes written by just one process -- that is the final value of both file offsets.

(diagrams to follow)