2017/Shell3

From CS61

Pipes, Forks, and Zombies

Pipe History

The idea of pipes

Doug McIlroy described the concept of pipes long before they were implemented.

           Summary--what's most important.
   To put my strongest concerns into a nutshell:
 1. We should have some ways of coupling programs like
 garden hose--screw in another segment when it becomes when
 it becomes necessary to massage data in another way.
 This is the way of IO also.
 2. Our loader should be able to do link-loading and
 controlled establishment.
 3. Our library filing scheme should allow for rather
 general indexing, responsibility, generations, data path
 switching.
 4. It should be possible to get private system components
 (all routines are system components) for buggering around with.
                                               M. D. McIlroy
                                               October 11, 1964

Literate Programming

Don Knuth is one of the most influential figures in computer science. He wrote The Art of Computer Programming and created the TeX typesetting system. Knuth enjoyed both writing and programming so much that he developed literate programming, a style in which you write prose about a program as you write its code, so that text and program are produced simultaneously. However, the idea did not take off, because it added a large amount of overhead to simple tasks like text parsing. McIlroy responded to Knuth's work from a systems programming perspective: he completed the same text parsing task in 6 lines of shell code using pipes. While Knuth approached the problem from an algorithms perspective, McIlroy approached it from a systems perspective, chaining intermediate outputs together to arrive at the answer.

Pipes matter!

McIlroy Knuth

The Less Program

The seq program prints consecutive numbers. Given two arguments, it prints every number from the first through the second. (The standard seq, given a single argument N, prints the numbers from 1 through N.)

 $ seq 2 5
 2
 3
 4
 5

If we pipe the output of seq to less, only one screenful of output is displayed, because less shows just enough output to fill the screen and then waits for input. The seq program appears to be paused. However, is it still running?

 $ seq 2 100000000 | less
 2
 3
 4
 5
 6
 :█

If we pipe seq to less and look at the list of processes using ps aux, we can see that seq is not running: it is blocked, because the pipe is full and less has stopped reading. Using strace to examine what happens next reveals that, after a series of write system calls, seq receives a SIGPIPE signal. A SIGPIPE is delivered when a process writes to a pipe that has no readers, which is what happens once less quits. The default action for SIGPIPE is to terminate the process. In this way, pipes automatically kill programs whose output is no longer needed, which explains why seq is killed when it is piped to less.
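The SIGPIPE behavior can be reproduced without seq or less. The following is a minimal sketch, assuming a POSIX system; the function name signal_from_readerless_write is hypothetical and not part of the lecture code. It closes the read end of a pipe before forking, so the child's write has no reader, and then reports which signal terminated the child.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns the signal that killed the child,
// 0 if the child exited without being signaled, or -1 on error.
int signal_from_readerless_write(void) {
    int pipefd[2];
    if (pipe(pipefd) < 0) {
        return -1;
    }
    close(pipefd[0]);                  // no process holds the read end any more

    pid_t p = fork();
    if (p < 0) {
        close(pipefd[1]);
        return -1;
    }
    if (p == 0) {
        signal(SIGPIPE, SIG_DFL);      // make sure the default action applies
        ssize_t n = write(pipefd[1], "x", 1);  // no readers: SIGPIPE is raised here
        _exit(n == 1 ? 0 : 1);         // reached only if the write did not kill us
    }

    close(pipefd[1]);
    int status;
    if (waitpid(p, &status, 0) != p) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Called from a main function, signal_from_readerless_write() should return SIGPIPE. The child resets SIGPIPE to its default action first, because with SIGPIPE ignored the write would fail with EPIPE instead of killing the process.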

Using Pipe to Implement Waitpid

 waitpid(p, &status, 0); // block until p exits or there is a signal 

Given that we don't care about status, how can we use pipes to create a blocking call that unblocks when the process dies? The child inherits the write end of a pipe, and the parent reads from the read end. When the child exits, we want the call to read to return 0, which happens once the child has exited and all write ends of the pipe are closed.

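 #include <sys/types.h>
 #include <unistd.h>

 int main() {
   int pipefd[2];
   pipe(pipefd);                // pipefd[0] is the read end, pipefd[1] the write end
   pid_t p = fork();
   if (p == 0) {
     // placeholder child program (an assumption); it inherits the write end
     execlp("sleep", "sleep", "1", (char*) NULL);
     _exit(1);                  // reached only if exec fails
   }
   close(pipefd[1]);            // the parent's write end must be closed
   char buf;
   read(pipefd[0], &buf, 1);    // the read syscall will block until the child exits
                                // when the child exits, this will return 0
   close(pipefd[0]);            // pipe hygiene
 }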

Parents, Children, and Zombies

Process Hierarchy

Every process has a single parent. The root of the process hierarchy (or process tree) is a process called init, which has pid 1. This is the only process that cannot be killed. The waitpid system call retrieves a child process's exit status. The exit status is stored in the child's process structure until the parent collects it. waitpid collects the status and recycles the process structure, so that the structure can be reused for another process.
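How a parent retrieves an exit status with waitpid can be sketched as follows. This is a minimal example assuming a POSIX system; the helper name exit_status_of_child is hypothetical. The child exits immediately with the given code, and the parent decodes that code from the status word that waitpid fills in.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that exits with `code` and
// returns the exit status the parent collects, or -1 on error.
int exit_status_of_child(int code) {
    pid_t p = fork();
    if (p < 0) {
        return -1;
    }
    if (p == 0) {
        _exit(code);                   // child: terminate with the requested status
    }

    int status;
    if (waitpid(p, &status, 0) != p) { // blocks until the child has exited
        return -1;
    }
    // WIFEXITED is true for a normal exit; WEXITSTATUS extracts the code
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, exit_status_of_child(7) should return 7. Only the low 8 bits of an exit code are reported this way, so the sketch is meaningful for codes from 0 to 255.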

ManyFork

The manyfork program tries to call fork 10000 times. However, if we run ./manyfork, only about 3400 processes are created. Running sudo ./manyfork, which gives the program more privileges, results in about 6890 processes. The operating system is protecting the user from runaway programs. If we look at the processes created by the manyfork program, we see that most of them are defunct.
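The source of manyfork is not shown here, so the following is only a hypothetical reconstruction of the idea, assuming a POSIX system; fork_many is an assumed name. It calls fork up to n times, stops at the first failure, and returns how many children were created. Unlike manyfork as described, it reaps its children before returning, so it leaves no defunct processes behind.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical sketch: tries to create n children and returns the
// number of fork calls that succeeded.
int fork_many(int n) {
    int created = 0;
    for (int i = 0; i < n; ++i) {
        pid_t p = fork();
        if (p < 0) {
            perror("fork");            // report why the system refused
            break;
        }
        if (p == 0) {
            _exit(0);                  // child: exit immediately
        }
        ++created;
    }

    // Reap every child; waitpid fails with ECHILD once none remain.
    while (waitpid(-1, NULL, 0) > 0 || errno == EINTR) {
    }
    return created;
}
```

A call such as fork_many(10000) approximates the experiment above. On a system with a limit on the number of processes, the returned count may be smaller than requested, with fork typically failing with errno set to EAGAIN.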

Zombies

The manyfork program does not wait for its children using waitpid. This creates zombie processes. A zombie process is a process that has terminated but has not yet been waited upon by its parent. The ps command lets us identify these zombie processes. Below is a sample of the output of ps after running the manyfork program.

 user 78623 0.0 0.0 0 0 pts/0 Z+ 15:44 0:00 [manyfork] <defunct>
 user 78624 0.0 0.0 0 0 pts/0 Z+ 15:44 0:00 [manyfork] <defunct>
 user 78625 0.0 0.0 0 0 pts/0 Z+ 15:44 0:00 [manyfork] <defunct>

The Z+ in the STAT column tells us that these processes are zombies. Zombie processes still hold resources, namely process IDs and process structures. When a child outlives its parent, the child is reparented to the init process with pid 1. The init process collects orphaned children in this way. The job of init is to call waitpid on orphaned children to collect their resources.
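That an unreaped child keeps its process ID can be checked directly. The following is a minimal sketch, assuming a POSIX system; zombie_keeps_pid is a hypothetical name. It uses the pipe technique from earlier to learn that the child has exited without calling waitpid, then probes the child's pid with kill(pid, 0), which sends no signal and only tests whether the pid exists, both before and after reaping.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns 1 if the child's pid exists while the child
// is unreaped and is gone after waitpid, 0 otherwise, or -1 on error.
int zombie_keeps_pid(void) {
    int pipefd[2];
    if (pipe(pipefd) < 0) {
        return -1;
    }
    pid_t p = fork();
    if (p < 0) {
        return -1;
    }
    if (p == 0) {
        _exit(0);                      // child: exit; its pipe ends close
    }

    close(pipefd[1]);                  // parent's write end must be closed
    char buf;
    // read returns 0 once the child's write end is closed by its exit
    while (read(pipefd[0], &buf, 1) < 0 && errno == EINTR) {
    }
    close(pipefd[0]);

    int before = kill(p, 0);           // succeeds: the pid is still allocated
    if (waitpid(p, NULL, 0) != p) {    // reap the child, releasing its pid
        return -1;
    }
    int after = kill(p, 0);            // fails: no such process any more
    return before == 0 && after == -1 && errno == ESRCH;
}
```

The expected result is 1. The second probe could in principle succeed if the system reused the pid for a new process right after the reap, but that is very unlikely in such a short window.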