Section 8: Pipes – CS 61 2019

This section is about pipes. We work through exercises on writing pipelines in the shell and on determining the characteristics of pipes from command lines. Then we turn to implementing pipes.

Exercise: The following command line contains one pipe. What does it do?

ls | head -n 2

Exercise: Write a sequence of commands that prints the names of the two files in the current directory that come last in alphabetical order. For example, in a directory with files named a through z, you should print

z
y

Exercise: Count the number of lines in the file words (and print the count).

Exercise: Print every unique line in the file words exactly once.

Exercise: Count the number of unique lines in the file words (and print the count).

Pipe characteristics

In this section, we ask you to develop command lines that could help you determine how pipelines work. For each question, describe a command line that could answer that question. Then give the answer.

You’ll probably want to look at the shell utilities from last time.

Exercise: Can a pipeline contain more than two processes?

Exercise: How many pipes can be chained together in one line? Can you find a limit?

Exercise: Does the right-hand command in a pipeline start before or after the left-hand command starts?

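They start at roughly the same time, but the left-hand command usually starts first. The easiest way to see this is to make both commands print a message to standard error, here by having each cat try to open a file that does not exist. The order of the two error messages suggests the order in which the commands started: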
cat /tmp/notthere1 | cat /tmp/notthere2

Exercise: Does the right-hand command in a pipeline start before or after the left-hand command exits?

If the left-hand command exits very quickly, the right-hand command might start after it exits. In general, though, the processes run in parallel, so the right-hand command starts before the left-hand command exits. You can see this by slowing down the left-hand command; here hello is printed immediately, well before sleep finishes:
sleep 10 | echo hello

Exercise: Does the shell run waitpid on the left-hand command before starting the right-hand command?

Exercise: Which of the commands in a pipe pair does the shell wait for, the left-hand command, the right-hand command, or both commands?

Exercise: What is the exit status of a pipeline?

It is the exit status of the rightmost command. For example:

false | false | false | cat && echo Zero-status
false | false | false | cat /tmp/notthere1 || echo Nonzero-status
true | cat && echo Zero-status
true | cat /tmp/notthere1 || echo Nonzero-status
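The same rule can be reproduced in a program. The following is a minimal C++ sketch, not a description of how any particular shell is implemented. The helper names `spawn` and `pipeline_status` are invented for illustration. The sketch runs a two-command pipeline and returns the raw wait status of the right-hand command. It reaps both children to avoid zombies; that is a choice of the sketch and says nothing about which children a real shell waits for. It also assumes the caller's standard file descriptors (0 through 2) are open.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: start `argv` in a child process. `in_fd`/`out_fd`
// replace the child's stdin/stdout (pass -1 to leave a stream unchanged).
// `unused` is the pipe end this child does not need; it must be closed so
// the pipe can reach end-of-file. Assumes fds 0-2 are open in the caller.
static pid_t spawn(char* const argv[], int in_fd, int out_fd, int unused) {
    pid_t p = fork();
    if (p == 0) {
        if (in_fd >= 0) {
            dup2(in_fd, STDIN_FILENO);
            close(in_fd);
        }
        if (out_fd >= 0) {
            dup2(out_fd, STDOUT_FILENO);
            close(out_fd);
        }
        if (unused >= 0) {
            close(unused);
        }
        execvp(argv[0], argv);
        _exit(127);   // reached only if execvp failed
    }
    return p;
}

// Run `left | right`. Returns the raw wait status of `right`, or -1 on error.
int pipeline_status(char* const left[], char* const right[]) {
    assert(left[0] != nullptr && right[0] != nullptr);
    int pfd[2];
    if (pipe(pfd) < 0) {
        return -1;
    }
    pid_t lp = spawn(left, -1, pfd[1], pfd[0]);
    pid_t rp = spawn(right, pfd[0], -1, pfd[1]);
    // the parent uses neither pipe end
    close(pfd[0]);
    close(pfd[1]);

    int lstatus = 0, rstatus = -1;
    if (rp > 0 && waitpid(rp, &rstatus, 0) != rp) {
        rstatus = -1;
    }
    if (lp > 0) {
        waitpid(lp, &lstatus, 0);   // reaped, but its status is ignored
    }
    return rstatus;
}
```

Calling `pipeline_status` with `false` on the left and `true` on the right yields a status for which `WEXITSTATUS` is 0, matching the shell examples above.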

You may also want to use more advanced features of production shells like bash and zsh. Specifically, the parenthesis (subshell) feature allows an entire command line, enclosed in parentheses, to be treated as a single command. This adds the following rule to the shell grammar:

command ::= "(" commandline ")"

To implement a subshell, the parent shell process forks a child process, which handles the embedded commandline.
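The fork-based approach can be sketched in C++ as follows. This is only an illustration under stated assumptions: `run_subshell` is a hypothetical name, and the `body` callback stands in for "execute the embedded command line and return its status", which a real shell would do with its own command-line evaluator.

```cpp
#include <cassert>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `body` in a forked child, as a shell might run
// the command line inside parentheses. Returns the raw wait status of the
// child, or -1 if fork or waitpid fails.
int run_subshell(const std::function<int()>& body) {
    pid_t p = fork();
    if (p < 0) {
        return -1;
    }
    if (p == 0) {
        // child: the embedded command line's status becomes our exit status
        _exit(body());
    }
    // parent: the subshell behaves like a single foreground command
    int status;
    if (waitpid(p, &status, 0) != p) {
        return -1;
    }
    return status;
}
```

Because the body runs in a separate process, any state it changes (variables, working directory) is not visible to the parent after the subshell finishes.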

Exercise: What is the exit status of a subshell, if the embedded command line ends with a foreground conditional?

It is the exit status of the terminating foreground conditional.
( false ) || echo Nonzero-status
( true && false ) || echo Nonzero-status
( true ) && echo Zero-status

Exercise: What is the exit status of a subshell, if the embedded command line ends with a background conditional?

It is zero.

( false & ) && echo Zero-status

Exercise: Write a pipeline that generates exactly N characters of output.

Exercise: Describe how to figure out the size of the pipe buffer through shell command-based experiments.

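Run the following command line several times with different values of N, where /tmp/thisfits is a file that does not exist. Because sleep never reads its standard input, everything head writes must sit in the pipe buffer. If N bytes fit, head exits at once and cat prints its error message immediately; if not, head blocks and the message is delayed until sleep exits. The maximum N that does not delay the error message is the pipe buffer size.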
( yes | head -c N ; cat /tmp/thisfits ) | sleep 10
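The same quantity can be measured directly from a program. The sketch below is one possible approach rather than the method the exercise intends. The function name `pipe_capacity` is invented here. It makes the write end nonblocking and writes one byte at a time until the kernel reports that a write would block. The result is system-dependent, so no particular value is assumed.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: count how many bytes fit in a fresh pipe before a
// write would block. Returns the count, or -1 on an unexpected error.
long pipe_capacity() {
    int pfd[2];
    if (pipe(pfd) < 0) {
        return -1;
    }
    // make writes fail with EAGAIN instead of blocking when the pipe is full
    int flags = fcntl(pfd[1], F_GETFL);
    if (flags < 0 || fcntl(pfd[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    long n = 0;
    int err = 0;
    char c = 'y';
    for (;;) {
        ssize_t r = write(pfd[1], &c, 1);
        if (r == 1) {
            ++n;
        } else if (r < 0 && errno == EINTR) {
            continue;          // interrupted by a signal; retry
        } else {
            err = errno;       // EAGAIN means the pipe buffer is full
            break;
        }
    }
    close(pfd[0]);
    close(pfd[1]);
    return err == EAGAIN ? n : -1;
}
```

The read end stays open while writing, so the writes fill the buffer rather than raising SIGPIPE.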

Exercise: Write a C++ program called showstatus that starts a subcommand described by its command line arguments, then prints its numeric exit status in hexadecimal to standard error. For instance, ./showstatus echo foo should print foo to standard output and 0x0 to standard error. Print the whole integer value of the exit status, without using WEXITSTATUS.

#include <cstdio>
#include <unistd.h>
#include <sys/wait.h>

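int main(int argc, char** argv) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s COMMAND [ARG...]\n", argv[0]);
        return 1;
    }
    pid_t p = fork();
    if (p < 0) {
        perror("fork");
        return 1;
    }
    if (p == 0) {
        // child: run the subcommand; execvp returns only on failure
        execvp(argv[1], argv + 1);
        _exit(1);
    }
    // parent: wait until the child has been reaped
    int status;
    while (waitpid(p, &status, 0) != p) {
    }
    // print the raw wait status, not just the WEXITSTATUS bits
    fprintf(stderr, "0x%x\n", status);
}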

Exercise: Use showstatus to determine the exit status of a command that writes to a pipe whose read end is closed.

./showstatus yes | echo foo
On Mac OS X and Linux this prints 0xd.
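The value 0xd is decimal 13, which is the number of SIGPIPE on both systems. The raw status can be decoded with the standard wait macros. The sketch below is an illustration: `describe_status` and `status_of_broken_pipe_writer` are hypothetical names. The second function reproduces the situation inside one program by forking a child that writes to a pipe with no readers.

```cpp
#include <cassert>
#include <csignal>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: turn a raw wait status into a readable description.
std::string describe_status(int status) {
    if (WIFEXITED(status)) {
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        return "killed by signal " + std::to_string(WTERMSIG(status));
    }
    return "other";
}

// Hypothetical helper: fork a child that writes to a pipe whose read end is
// closed in every process. Returns the child's raw wait status, or -1.
int status_of_broken_pipe_writer() {
    int pfd[2];
    if (pipe(pfd) < 0) {
        return -1;
    }
    close(pfd[0]);   // close the only read end before forking
    pid_t p = fork();
    if (p < 0) {
        close(pfd[1]);
        return -1;
    }
    if (p == 0) {
        signal(SIGPIPE, SIG_DFL);   // make sure SIGPIPE has its default action
        char c = 'y';
        ssize_t r = write(pfd[1], &c, 1);
        (void) r;
        _exit(0);   // reached only if the write did not terminate the child
    }
    close(pfd[1]);
    int status;
    if (waitpid(p, &status, 0) != p) {
        return -1;
    }
    return status;
}
```

For a status produced this way, `WIFSIGNALED` is true and `WTERMSIG` equals SIGPIPE.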

Exercise: Does a command that writes to a pipe whose read end is closed exit normally?

Subprocess

Exercise: Write a function that implements a version of Python’s subprocess.Popen functionality. Your function should have the following signature:

// subprocess(file, argv, pfd)
//    Run the command `file` with arguments `argv` in a child process.
//    Three pipes are opened between this process and the child process:
//    one for the child’s standard input, one for its standard output,
//    and one for its standard error. The `pfd` argument is populated
//    with this process’s pipe ends.
//
//    * Data written by this process to `pfd[0]` is read from the child’s
//      standard input.
//    * Data written to the child’s standard output is read by this
//      process from `pfd[1]`.
//    * Data written to the child’s standard error is read by this
//      process from `pfd[2]`.
//
//    Returns the process ID of the child or -1 on error.
pid_t subprocess(const char* file, char* const argv[], int pfd[3]);

You will use the pipe and dup2 system calls, among others.

Here’s one solution. It uses a common pattern in C programs called “goto error”: all the error handling code is in one place, and goto is used to jump to that place from multiple locations where errors could occur.

#include <sys/types.h>
#include <unistd.h>

pid_t subprocess(const char* file, char* const argv[], int pfd[3]) {
    // create pipes
    int inpfd[2] = {-1, -1}, outpfd[2] = {-1, -1}, errpfd[2] = {-1, -1};
    pid_t p = -1;
    if (pipe(inpfd) < 0
        || pipe(outpfd) < 0
        || pipe(errpfd) < 0) {
        goto error;
    }

    // create child
    p = fork();
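    if (p == 0) {
        // child: install each pipe end as a standard stream, then close
        // both original descriptors so only the dup2 copies remain
        dup2(inpfd[0], STDIN_FILENO);
        close(inpfd[0]);
        close(inpfd[1]);
        dup2(outpfd[1], STDOUT_FILENO);
        close(outpfd[0]);
        close(outpfd[1]);
        dup2(errpfd[1], STDERR_FILENO);
        close(errpfd[0]);
        close(errpfd[1]);
        execvp(file, argv);
        // execvp returns only on failure
        _exit(1);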
    } else if (p < 0) {
        goto error;
    } else {
        // parent: close the child's pipe ends and give our ends to the caller
        close(inpfd[0]);
        pfd[0] = inpfd[1];
        close(outpfd[1]);
        pfd[1] = outpfd[0];
        close(errpfd[1]);
        pfd[2] = errpfd[0];

        // return pid
        return p;
    }

  error:
    if (inpfd[0] >= 0) {
        close(inpfd[0]);
        close(inpfd[1]);
    }
    if (outpfd[0] >= 0) {
        close(outpfd[0]);
        close(outpfd[1]);
    }
    if (errpfd[0] >= 0) {
        close(errpfd[0]);
        close(errpfd[1]);
    }
    return -1;
}
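
A caller of such a function has to follow a protocol on its side too: close any write end it does not need, read until end-of-file, and then reap the child. The sketch below shows that protocol with a reduced, one-pipe variant so that it is self-contained. `capture_stdout` is a hypothetical helper, not part of the solution above, and it assumes the caller's standard file descriptors are open.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `argv` in a child, append everything the child
// writes to standard output to `*out`, and return the child's raw wait
// status (or -1 on error). Assumes fds 0-2 are open in the caller.
int capture_stdout(char* const argv[], std::string* out) {
    assert(argv[0] != nullptr && out != nullptr);
    int pfd[2];
    if (pipe(pfd) < 0) {
        return -1;
    }
    pid_t p = fork();
    if (p < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (p == 0) {
        // child: the pipe's write end becomes standard output
        dup2(pfd[1], STDOUT_FILENO);
        close(pfd[0]);
        close(pfd[1]);
        execvp(argv[0], argv);
        _exit(127);   // reached only if execvp failed
    }

    // parent: close our copy of the write end, or read() never sees EOF
    close(pfd[1]);
    char buf[4096];
    for (;;) {
        ssize_t n = read(pfd[0], buf, sizeof(buf));
        if (n > 0) {
            out->append(buf, n);
        } else if (n < 0 && errno == EINTR) {
            continue;   // interrupted by a signal; retry
        } else {
            break;      // end-of-file or a real error
        }
    }
    close(pfd[0]);

    int status;
    if (waitpid(p, &status, 0) != p) {
        return -1;
    }
    return status;
}
```

With three pipes, as in subprocess, a caller that both writes and reads must also take care not to block on one pipe while the child is blocked on another; the one-pipe variant avoids that issue.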