This section is about pipes. We run through a bunch of exercises showing how to write pipelines on the shell, and how to determine characteristics of pipes from command lines. Then we turn to implementing pipes.
Exercise: The following command line contains one pipe. What does it do?
ls | head -n 2
It prints the first two items listed in the output of ls.
Exercise: Write a sequence of commands that prints the names of the two files in the current directory that come last in alphabetical order. For example, in a directory with files named a through z, you should print:
z
y
ls | sort -r | head -n 2
Much nicer than redirecting to and from files, and it leaves less garbage around.
Exercise: Count the number of lines in the file named words (and print the count).
wc -l words
Exercise: Print every unique line in the file named words exactly once.
sort -u words
Exercise: Count the number of unique lines in the file named words (and print the count).
sort -u words | wc -l
Pipe characteristics
In this section, we ask you to develop command lines that could help you determine how pipelines work. For each question, describe a command line that could answer that question. Then give the answer.
You’ll probably want to look at the shell utilities from last time.
Exercise: Can a pipeline contain more than two processes?
Sure:
echo holli | tr o e | tr i o
Exercise: How many pipes can be chained together in one line? Can you find a limit?
Technically, there’s no fixed limit, but in practice operating systems impose limits on the total number of file descriptors allowed, shells impose limits on the length of command lines, and there are RAM and CPU limits.
Exercise: Does the right-hand command in a pipeline start before or after the left-hand command starts?
They start at roughly the same time, but usually the left-hand command starts first. The easiest way to see this is to get both commands to print a message to standard error:
cat /tmp/notthere1 | cat /tmp/notthere2
Exercise: Does the right-hand command in a pipeline start before or after the left-hand command exits?
If the left-hand command exits really quickly, then sure, the right-hand command might start after the left-hand command exits. But in general the processes run in parallel, so the right-hand command starts before the left-hand command exits. You can see this by slowing down the left-hand command:
sleep 10 | echo hello
This prints hello right away, about 10 seconds before sleep exits.
Exercise: Does the shell run waitpid on the left-hand command before starting the right-hand command?
It must not! If the shell waited for the left-hand command to exit before moving on, then the right-hand command would be delayed.
Exercise: Which of the commands in a pipe pair does the shell wait for, the left-hand command, the right-hand command, or both commands?
Standard shells like bash and zsh wait for both. Compare these two command lines; each one takes about 10 seconds to finish:
sleep 2 | sleep 10
sleep 10 | sleep 2
Please note that your sh61 shell only needs to wait for the right-hand command.
Exercise: What is the exit status of a pipeline?
It is the exit status of the rightmost command. For example:
false | false | false | cat && echo Zero-status
false | false | false | cat /tmp/notthere1 || echo Nonzero-status
true | cat && echo Zero-status
true | cat /tmp/notthere1 || echo Nonzero-status
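The last few answers fit together: the shell starts both commands without calling waitpid in between, waits for the right-hand command, and reports that command's status. Here is a minimal sketch of that sequence for a two-command pipeline. The function name run_pipeline and its interface are hypothetical (they are not part of sh61), and error handling is kept to a bare minimum.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// run_pipeline(left, right)
//    Hypothetical helper: run `left | right` and return the raw wait
//    status of `right`, or -1 on error. Both arguments are
//    NULL-terminated argument vectors.
int run_pipeline(char* const left[], char* const right[]) {
    int pfd[2];
    if (pipe(pfd) < 0) {
        return -1;
    }

    // Start the left-hand command; its standard output is the write end.
    pid_t lp = fork();
    if (lp == 0) {
        dup2(pfd[1], STDOUT_FILENO);
        close(pfd[0]);
        close(pfd[1]);
        execvp(left[0], left);
        _exit(1);
    }

    // Start the right-hand command right away: no waitpid in between.
    pid_t rp = lp < 0 ? -1 : fork();
    if (rp == 0) {
        dup2(pfd[0], STDIN_FILENO);
        close(pfd[0]);
        close(pfd[1]);
        execvp(right[0], right);
        _exit(1);
    }

    // The parent closes both ends; otherwise the reader never sees EOF.
    close(pfd[0]);
    close(pfd[1]);
    if (lp < 0 || rp < 0) {
        return -1;
    }

    // Wait only for the right-hand command; its status is the pipeline's.
    int status;
    pid_t w;
    do {
        w = waitpid(rp, &status, 0);
    } while (w < 0 && errno == EINTR);
    return w == rp ? status : -1;
}
```

Calling run_pipeline with the argument vectors for true and false should return a status whose WEXITSTATUS is 1, mirroring the rule that the rightmost command decides. This sketch never reaps the left-hand child; bash and zsh wait for both.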
You may also want to use more advanced features of production shells like bash and zsh. Specifically, the parenthesis (subshell) feature allows an entire command line, enclosed in parentheses, to be treated as a single command. This adds the following rule to the shell grammar:
command ::= "(" commandline ")"
To implement a subshell, the parent shell process forks a child process, which handles the embedded commandline.
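The fork-based approach can be sketched as follows. This is only an illustration: run_subshell is a hypothetical name, and the commandline callback stands in for whatever code your shell uses to run an embedded command line and compute its exit status.

```cpp
#include <cerrno>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// run_subshell(commandline)
//    Hypothetical helper. `commandline` stands in for the shell's own
//    code that runs an embedded command line and returns its exit
//    status. Returns the child's raw wait status, or -1 on error.
int run_subshell(const std::function<int()>& commandline) {
    pid_t p = fork();
    if (p < 0) {
        return -1;
    }
    if (p == 0) {
        // Child: act as the shell for the embedded command line, then
        // exit with that command line's status.
        _exit(commandline());
    }

    // Parent: treat the child like any other single command.
    int status;
    pid_t w;
    do {
        w = waitpid(p, &status, 0);
    } while (w < 0 && errno == EINTR);
    return w == p ? status : -1;
}
```

Because the embedded command line runs in a separate process, the parent sees only the child's wait status; anything the child changes in its own memory stays in the child.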
Exercise: What is the exit status of a subshell, if the embedded command line ends with a foreground conditional?
It is the exit status of the terminating foreground conditional.
( false ) || echo Nonzero-status
( true && false ) || echo Nonzero-status
( true ) && echo Zero-status
Exercise: What is the exit status of a subshell, if the embedded command line ends with a background conditional?
It is zero.
( false & ) && echo Zero-status
Exercise: Write a pipeline that generates exactly N characters of output.
yes | head -c N
Exercise: Describe how to figure out the size of the pipe buffer through shell command-based experiments.
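Run the following command line multiple times with different values of N:
( yes | head -c N ; cat /tmp/thisfits ) | sleep 10
Since sleep never reads its standard input, head blocks as soon as the pipe is full, which delays cat and its error message (this assumes /tmp/thisfits does not exist) until sleep exits. The maximum N that does not delay printing the error message is the pipe buffer size.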
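The same measurement can be made from a program instead of the shell. The sketch below fills a fresh pipe one byte at a time and counts how many bytes fit before a write would block. It assumes the write end can be put into nonblocking mode with fcntl; the name pipe_capacity is hypothetical.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// pipe_capacity()
//    Hypothetical helper: return the number of bytes a fresh pipe
//    accepts before a write would block, or -1 on error.
long pipe_capacity() {
    int pfd[2];
    if (pipe(pfd) < 0) {
        return -1;
    }

    // Make the write end nonblocking, so a full buffer is reported as
    // EAGAIN instead of putting this process to sleep.
    int flags = fcntl(pfd[1], F_GETFL);
    if (flags < 0 || fcntl(pfd[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    long n = 0;
    char c = 'y';
    int err = 0;
    while (true) {
        ssize_t w = write(pfd[1], &c, 1);
        if (w == 1) {
            ++n;
        } else if (w < 0 && errno == EINTR) {
            continue;
        } else {
            err = errno;   // expected: EAGAIN, meaning the buffer is full
            break;
        }
    }

    close(pfd[0]);
    close(pfd[1]);
    return (err == EAGAIN || err == EWOULDBLOCK) ? n : -1;
}
```

The value returned should be close to the largest N found by the shell experiment on the same machine. It is system-dependent; 65536 is a common result on Linux.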
Exercise: Write a C++ program called showstatus that starts a subcommand described by its command line arguments, then prints its numeric exit status in hexadecimal to standard error. For instance, ./showstatus echo foo should print foo to standard output and 0x0 to standard error. Print the whole integer value of the exit status, without using WEXITSTATUS.
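#include <cstdio>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char** argv) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s COMMAND [ARG...]\n", argv[0]);
        return 1;
    }

    pid_t p = fork();
    if (p < 0) {
        perror("fork");
        return 1;
    }
    if (p == 0) {
        // child: run the subcommand; `_exit` is reached only if `execvp` fails
        execvp(argv[1], argv + 1);
        _exit(1);
    }

    // parent: wait for the child, then print its raw wait status
    int status;
    while (waitpid(p, &status, 0) != p) {
    }
    fprintf(stderr, "0x%x\n", status);
}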
Exercise: Use showstatus to determine the exit status of a command that writes to a pipe whose read end is closed.
./showstatus yes | echo foo
On Mac OS X and Linux this prints 0xd.
Exercise: Does a command that writes to a pipe whose read end is closed exit normally?
No, it doesn’t. The status 0xd means the process was terminated by signal 13, which is SIGPIPE. All normal exit statuses have the lower 8 bits of the exit status set to 0. You could also check this with WIFEXITED.
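The same experiment can be run without a shell, and the status decoded with the wait macros rather than by reading bits. This is a sketch under stated assumptions: both function names are hypothetical, and the child restores the default SIGPIPE action in case the calling process ignores that signal.

```cpp
#include <cerrno>
#include <csignal>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// closed_pipe_write_status()
//    Hypothetical helper: fork a child that writes one byte to a pipe
//    with no readers, and return the child's raw wait status (-1 on error).
int closed_pipe_write_status() {
    int pfd[2];
    if (pipe(pfd) < 0) {
        return -1;
    }
    // Close the read end before forking, so no process holds it open.
    close(pfd[0]);

    pid_t p = fork();
    if (p < 0) {
        close(pfd[1]);
        return -1;
    }
    if (p == 0) {
        signal(SIGPIPE, SIG_DFL);   // default action: terminate
        char c = 'y';
        ssize_t w = write(pfd[1], &c, 1);
        (void) w;
        _exit(0);   // reached only if the write did not end the child
    }
    close(pfd[1]);

    int status;
    pid_t w;
    do {
        w = waitpid(p, &status, 0);
    } while (w < 0 && errno == EINTR);
    return w == p ? status : -1;
}

// describe_status(status)
//    Hypothetical helper: turn a raw wait status into a short description.
std::string describe_status(int status) {
    if (WIFEXITED(status)) {
        return "exited with status " + std::to_string(WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        return "killed by signal " + std::to_string(WTERMSIG(status));
    } else {
        return "neither exited nor killed";
    }
}
```

Passing the result of closed_pipe_write_status() to describe_status should report death by signal rather than a normal exit, which matches what showstatus prints for yes.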
Subprocess
Exercise: Write a function that implements a version of Python’s subprocess.Popen functionality. Your function should have the following signature:
// subprocess(file, argv, pfd)
// Run the command `file` with arguments `argv` in a child process.
// Three pipes are opened between this process and the child process:
// one for the child’s standard input, one for its standard output,
// and one for its standard error. The `pfd` argument is populated
// with this process’s pipe ends.
//
// * Data written by this process to `pfd[0]` is read from the child’s
// standard input.
// * Data written to the child’s standard output is read by this
// process from `pfd[1]`.
// * Data written to the child’s standard error is read by this
// process from `pfd[2]`.
//
// Returns the process ID of the child or -1 on error.
pid_t subprocess(const char* file, char* const argv[], int pfd[3]);
You will use the pipe and dup2 system calls, among others.
Here’s one solution. It uses a common pattern in C programs called “goto error”: all the error handling code is in one place, and goto is used to jump to that place from multiple locations where errors could occur.
pid_t subprocess(const char* file, char* const argv[], int pfd[3]) {
    // create pipes
    int inpfd[2] = {-1, -1}, outpfd[2] = {-1, -1}, errpfd[2] = {-1, -1};
    pid_t p = -1;
    if (pipe(inpfd) < 0
        || pipe(outpfd) < 0
        || pipe(errpfd) < 0) {
        goto error;
    }

    // create child
    p = fork();
    if (p == 0) {
        dup2(inpfd[0], STDIN_FILENO);
        close(inpfd[0]);
        close(inpfd[1]);

        dup2(outpfd[1], STDOUT_FILENO);
        close(outpfd[0]);
        close(outpfd[1]);

        dup2(errpfd[1], STDERR_FILENO);
        close(errpfd[0]);
        close(errpfd[1]);

        execvp(file, argv);
        _exit(1);
    } else if (p < 0) {
        goto error;
    } else {
        // clean up file descriptors
        close(inpfd[0]);
        pfd[0] = inpfd[1];
        close(outpfd[1]);
        pfd[1] = outpfd[0];
        close(errpfd[1]);
        pfd[2] = errpfd[0];
        // return pid
        return p;
    }

 error:
    if (inpfd[0] >= 0) {
        close(inpfd[0]);
        close(inpfd[1]);
    }
    if (outpfd[0] >= 0) {
        close(outpfd[0]);
        close(outpfd[1]);
    }
    if (errpfd[0] >= 0) {
        close(errpfd[0]);
        close(errpfd[1]);
    }
    return -1;
}
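A caller of subprocess usually needs to collect everything the child writes. One way to do that, sketched here, is a helper that reads a file descriptor until end-of-file. The name read_all is hypothetical and is not part of the exercise.

```cpp
#include <cerrno>
#include <string>
#include <unistd.h>

// read_all(fd)
//    Hypothetical helper: read from `fd` until end-of-file or error and
//    return everything that was read.
std::string read_all(int fd) {
    std::string out;
    char buf[4096];
    while (true) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            out.append(buf, n);
        } else if (n < 0 && errno == EINTR) {
            continue;   // interrupted by a signal; try again
        } else {
            break;      // n == 0 is end-of-file; n < 0 is an error
        }
    }
    return out;
}
```

After subprocess returns, a caller might write the child's input to pfd[0], close pfd[0] so the child sees end-of-file, and then call read_all(pfd[1]). Reading the two output pipes one after the other is only safe for small outputs: a child that fills the pipe nobody is reading will block.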