Introduction

Today's section:

Reviews exec, fork, and dup2 system calls
Covers piping

Pull the latest version of the sections repository by entering in the terminal

git pull

The cs61-section/s11 directory contains the files for the exercises throughout this section.

System Calls

fork

pid_t fork(void);

Creates a new child process
Copies all the memory of the parent to the child
Note that in the virtual memory assignment, you will copy all user-accessible, writeable memory in the parent to the child. Most OSs use copy-on-write, where all writeable memory is marked as readonly in both parent and child, and then copied when either parent or child attempts to write to it.
Copies the parent’s file descriptor table to the child. This means the child has the same files open as the parent. Also, these open files share state, so if the parent seeks in one of these open files, the seek will affect both parent and child.
Starts child process and returns 0.
Resumes parent process and returns the PID of the child.

Try to create a basic conditional skeleton for fork before looking at the one below.

Solution here.

exec

int execvp(const char \*file, char \*const argv[]);

Takes in a file (either a path to a binary like “./prog” or something like “ls”, which it will look the path to), and a NULL terminated array of arguments to the program (like main’s argv, with argv[argc] (the end of the array) == NULL)
Executes the given program, replacing the current memory of the calling program with the memory for the given program
On success, execvp will not return, because the calling program “turns into” the program being executed.

If a command is passed as an argument to the program, what do we add to our skeleton to run that command in a child process? Assume that we are only passing a command with no argument (e.g. ls but not wc -c asdf.txt)

Solution here.

waitpid

pid_t waitpid(pid_t pid, int \*status, int options)

Simple case: waitpid(pid, NULL, 0);
- Returns when the process with process id pid exits
We can pass in various values in as the last argument to vary the behavior of waitpid. A flag of particular interest may be WNOHANG, which allows waitpid to return immediately without waiting.
We can also pass in an int status by reference to get various information about the process after it exits. For instance the following will print the child’s exit status (i.e. the return value of main or the number passed to an exit() call:

   int status;
   waitpid(pid, &status, 0);
   // make sure it exited normally (vs e.g. being killed)
   if (WIFEXITED(status))
       printf(“%d\n”, WEXITSTATUS(status));

Make the parent in our skeleton wait for the child process to finish running and then print "done"

Solution here.

dup2

int dup2(int oldfd, int newfd);

Makes newfd a copy of oldfd (closing the current newfd if necessary)
i.e. reading and writing from newfd will read and write from the file referred to by oldfd.

What if we want to redirect the stdin and stdout of the program we are running in the child process to use an infile and outfile that we pass as arguments to our program? (e.g. ./prog wc in.txt out.txt should output the character, word, and line counts of in.txt to the file out.txt)

Solution here.

Piping

Every pipe has a 'read' end and a 'write' end. Before you implement your pipe, make sure you know which side of the pipe is being used in each process!

pipe System Call

int pipe(int pipefd[2]);

Accepts an uninitialized array of 2 ints
After calling pipe, the first element of the array pipefd[0] will be the “read” end of the pipe, and the second element pipefd[1] will be the “write” end of the pipe. So anything written to pipefd[1] can then be read from pipefd[0].

Instead of an infile and an outfile, we want to take text that will be passed to the parent and pipe it to the stdin for the child process.

Solution here.

Exercise: Process Management

Here's the skeleton of a shell function implement a simple two-command pipeline, such as cmd1 \| cmd2

void simple_pipe(const char* cmd1, char* const* argv1, const char* cmd2, char* const* argv2) {
    int pipefd[2], r, status;

[A]

    pid_t child1 = fork();
    if (child1 == 0) {

[B]

        execvp(cmd1, argv1);
    }
    assert(child1 > 0);

[C]

    pid_t child2 = fork();
    if (child2 == 0) {

[D]

        execvp(cmd2, argv2);
    }
    assert(child2 > 0);

[E]

And here are some system calls:

[1] close(pipefd[0]);
[2] close(pipefd[1]);
[3] dup2(pipefd[0], STDIN_FILENO);
[4] dup2(pipefd[0], STDOUT_FILENO);
[5] dup2(pipefd[1], STDIN_FILENO);
[6] dup2(pipefd[1], STDOUT_FILENO);
[7] pipe(pipefd);
[8] r = waitpid(child1, &status, 0);
[9] r = waitpid(child2, &status, 0);

Your task is to assign system call IDs, such as "1", to slots, such as "A", to achieve several behaviors, including a correct pipeline and several incorrect pipelines.

Implement a correct foreground pipeline.
Implement a pipeline so that, given arguments corresponding to echo foo \| wc -c, the wc process reads “foo” from its standard input but does not exit thereafter.
Implement a pipeline so that, given arguments corresponding to echo foo \| wc -c, “foo” is printed to the shell’s standard output and the wc process prints “0”. (In a correctly implemented pipeline, wc would print 4, which is the number of characters in “foo\n”.)
Implement a pipeline that appears to work correctly on echo foo \| wc -c, but always blocks forever if the left-hand command outputs more than 65536 characters
Implement a pipeline so that, given arguments corresponding to echo foo \| wc -c, both echo and wc report a “Bad file descriptor” error. (This error, which corresponds to EBADF, is returned when a file descriptor is not valid or does not support the requested operation.)

Solutions here.

Exercise: The Sieve of Eratosthenes

Today's exercise will walk you through building a pretty cool implementation of a prime number generator. The algorithm that we'll use is called the Sieve of Eratosthenes.

Before we start coding, let's get a feel for how the algorithm works.

Write down the numbers 2 through 25, in order. Repeat the following steps until all numbers are either circled or crossed out.

1. Circle the first number that is neither circled nor crossed out.

2. Cross out all numbers after that number that are multiples of the number you just circled (i.e., when you circle 2, you will cross out all the even numbers).

You should be left with all the prime numbers less than 25 circled.

Now we'll build a program to do this (somewhat inefficiently, but in a particularly entertaining manner, giving you practice with fork, exec, and pipe.

seq: a good start

You'll find that there is a program on your system called seq that generates a stream of numbers. Check out the man page for seq and prove to yourself that you can create a stream of integers that we use in our sieve program.

filtermultiples: remove multiples

The first step of the sieve is removing multiples of the first prime, 2. filtermultiples is a program that takes in a number as an argument for which it will remove all multiples of that number from stdin. For example, running ./filtermultiples 2 and then typing 2 3 4 5 6 7 (pressing enter after each number) will yield

3
5
7

because we are removing all multiples of 2 from the arguments 2 3 4 5 6 7.

We have provided a skeleton in filtermultiples.c that will read each number in one by one. Your job is to write the rest of filtermultiples.c so that it outputs the subset of numbers that are not multiples of the first number read in.

Let's say we want to output all the numbers from 2 to 1000000 that are not multiples of 2. We don't want to type out every number from 2 to 1000000, so we can use seq to pipe the numbers 2 through 1000000 to ./filtermultiples 2 and output all of the numbers that are not divisible by 2 as follows

seq 2 1000000 | ./filtermultiples 2

What if we want to remove all multiples of 2 and 3?

seq 2 1000000 | ./filtermultiples 2 | ./filtermultiples 3

We can keep piping with subsequent primes to remove more and more composite numbers.

Make sure you understand why this works before moving on

sieve: building a simple sieve

Our sieve should output all prime numbers from 2 to 1000000. We have included a skeleton in sieve.c that already gets the numbers from 2 through 1000000 for you. It also has the basic sieve structure for taking the first number that is not a multiple and assigning it as the next prime number. Your job is to write the function add_filter that will take a stream of numbers that are not multiples of the last prime and output a stream of numbers that are not multiples of the new prime.

Hint: you will need to use pipe, fork, dup2, and exec! Think about how you want to redirect stdin and stdout and where you want them to be redirected.

After writing add_filter, running ./sieve should output all prime numbers from 2 through 1000000!