- 1 Introduction
- 2 Process I/O
- 3 More about the Shell
- 4 Exercise: Parsing
- Gives an overview of I/O streams
- Provides examples of common shell utilities and shell syntax
- Explores the sh61 syntax and some structures for parsing it
Pull the latest version of the sections repository by entering in the terminal
The cs61-sections/s10 directory contains a few files we'll use throughout the section notes, along with some sample structures we might use to parse sh61 command lines.
The three I/O streams that are given by default to every process are stdin, stdout, and stderr.
- stdin is the default input stream for the program. When your C program expects input from the console or wants to read something without opening a file, it usually reads from stdin.
- stdout is the default output stream for the program. When your C program prints something to the console, it is writing to stdout.
- stderr is the default output stream for errors. Why is this useful if we already have a stdout? Sometimes stdout is redirected to a file or to another program. If errors were printed to stdout, they would be redirected too, and we might not notice them until it's too late.
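To make the stdout/stderr split concrete, here is a minimal C sketch. The helper name print_quotient and its error message are hypothetical, chosen only for illustration: normal results go to one stream and diagnostics go to another, so redirecting the first does not hide the second.

```c
#include <stdio.h>

// Hypothetical helper: print the quotient a / b to `out`, or a diagnostic
// to `err` if b is zero. Returns 0 on success, -1 on error.
int print_quotient(FILE* out, FILE* err, int a, int b) {
    if (b == 0) {
        // Diagnostics go to the error stream, so they remain visible even
        // when the regular output stream is redirected elsewhere.
        fprintf(err, "error: division by zero\n");
        return -1;
    }
    fprintf(out, "%d\n", a / b);
    return 0;
}
```

A program would typically call this as print_quotient(stdout, stderr, 6, 3). Passing the streams as parameters is a design choice made here so the behavior is easy to check; it is not required.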
For example, if we run $ echo foo, the echo program writes foo to stdout, which by default is the terminal; it reads nothing from stdin and writes nothing to stderr. Note that in this example, foo is a command-line argument, not read in from stdin.
Redirections let us direct a program's input and output from and to files of our choosing. For example, let's run $ echo foo > temp.txt. If you open up temp.txt, you'll find that it has a single line containing foo. The right angle bracket told our shell to redirect echo's stdout to temp.txt, overwriting any previous contents.
If instead we wanted to append to a file, we could use a double right angle bracket: $ echo bar >> temp.txt. If you run that command, you'll find that temp.txt now has a second line, saying bar.
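In C terms, the difference between > and >> comes down to the flags passed to open. The sketch below is an assumption about how a shell might implement this, not sh61's actual code; the helper names open_for_redirect and write_line are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: open `path` the way a shell might for `>`
// (append == 0, truncate the file) or `>>` (append != 0, keep contents).
// Returns a file descriptor, or -1 on error.
int open_for_redirect(const char* path, int append) {
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
    return open(path, flags, 0666);
}

// Hypothetical helper: write `line` to `path` using the chosen mode.
// Returns 0 on success, -1 on error.
int write_line(const char* path, const char* line, int append) {
    int fd = open_for_redirect(path, append);
    if (fd < 0) {
        return -1;
    }
    size_t len = strlen(line);
    ssize_t n = write(fd, line, len);
    close(fd);
    return n == (ssize_t) len ? 0 : -1;
}
```

Calling write_line("temp.txt", "foo\n", 0) and then write_line("temp.txt", "bar\n", 1) leaves the same two lines in the file as the two echo commands above.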
We can also redirect the contents of a file to a program's stdin. Let's run $ wc -w < temp.txt. wc is a common Unix program for counting lines, words, and characters; the -w command-line argument tells it to count only words. With no file argument, wc reads from stdin, which is usually the console, but here we have redirected stdin to a file.
But what if we wanted to direct the output of one program directly to the input of another, without sending the output to a file first? We can do this using an operating system abstraction called pipes. Pipes are a way of linking an I/O stream from one program directly to the I/O stream of another program, in real time. Each pipe has a read end and a write end; characters written to the write end can be immediately read from the read end. Each program interacts with its end of a pipe just like any other file descriptor; it can call read, write, close, etc. When a program calls read on the read end of a pipe, the read call blocks until something is written to the write end of the pipe, at which point the read call returns. (If every copy of the write end has been closed, read returns 0 instead, signaling end of file.)
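The read end and write end can be seen in a few lines of C. This is a minimal sketch within a single process; the helper name pipe_roundtrip is hypothetical, and it assumes the message is small enough to fit in the pipe's kernel buffer, so the write does not block.

```c
#include <string.h>
#include <unistd.h>

// Write `msg` into a fresh pipe, then read it back from the other end.
// Copies at most `cap - 1` bytes into `out` and NUL-terminates it.
// Returns the number of bytes read, or -1 on error.
ssize_t pipe_roundtrip(const char* msg, char* out, size_t cap) {
    int pfd[2];
    if (cap == 0 || pipe(pfd) < 0) {
        return -1;
    }
    // pfd[0] is the read end; pfd[1] is the write end.
    ssize_t w = write(pfd[1], msg, strlen(msg));
    close(pfd[1]);  // no writers remain, so a read on an empty pipe sees EOF
    if (w < 0) {
        close(pfd[0]);
        return -1;
    }
    ssize_t n = read(pfd[0], out, cap - 1);
    close(pfd[0]);
    if (n < 0) {
        return -1;
    }
    out[n] = '\0';
    return n;
}
```

Closing the write end before reading matters: with an empty message, read returns 0 (end of file) rather than blocking forever.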
Let's run $ echo "foo bar baz" | wc -w. You should find 3 printed out at the terminal. What's going on here? We have used the | character to tell the shell to create a pipe between the echo and wc programs, and to redirect echo's stdout to the write end of the pipe and wc's stdin to the read end of the pipe.
In particular, the shell has used a system call named dup2 to set the file descriptor associated with stdout in echo to the write end of the pipe, and to set the file descriptor associated with stdin in wc to the read end of the pipe. echo and wc have no idea that they aren't writing to and reading from the console!
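The following C sketch shows one way a shell could wire up a two-command pipeline with pipe, fork, dup2, and exec. It is an illustration under stated assumptions, not sh61's implementation: the function name count_words_via_pipeline is hypothetical, echo and wc are assumed to be on the PATH, and a second pipe is added only so the caller can read wc's answer.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run the equivalent of `echo TEXT | wc -w` and return the count wc prints,
// or -1 on error.
int count_words_via_pipeline(const char* text) {
    int link[2], result[2];
    if (pipe(link) < 0) {
        return -1;
    }
    if (pipe(result) < 0) {
        close(link[0]);
        close(link[1]);
        return -1;
    }

    pid_t left = fork();
    if (left == 0) {
        // Left side: stdout becomes the write end of the linking pipe.
        dup2(link[1], STDOUT_FILENO);
        close(link[0]); close(link[1]); close(result[0]); close(result[1]);
        execlp("echo", "echo", text, (char*) NULL);
        _exit(127);  // reached only if exec failed
    }

    pid_t right = fork();
    if (right == 0) {
        // Right side: stdin becomes the read end of the linking pipe, and
        // stdout is captured through the second pipe.
        dup2(link[0], STDIN_FILENO);
        dup2(result[1], STDOUT_FILENO);
        close(link[0]); close(link[1]); close(result[0]); close(result[1]);
        execlp("wc", "wc", "-w", (char*) NULL);
        _exit(127);
    }

    // The parent must close its copies of the pipe ends; otherwise wc
    // would never see end of file on its stdin.
    close(link[0]); close(link[1]); close(result[1]);

    char buf[64];
    ssize_t n = read(result[0], buf, sizeof(buf) - 1);
    close(result[0]);

    int status;
    if (left > 0) {
        waitpid(left, &status, 0);
    }
    if (right > 0) {
        waitpid(right, &status, 0);
    }
    if (left < 0 || right < 0 || n <= 0) {
        return -1;
    }
    buf[n] = '\0';
    return atoi(buf);  // atoi skips any leading spaces in wc's output
}
```

Note that each child closes every pipe descriptor after calling dup2. Leftover open write ends are a common reason a pipeline hangs.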
More about the Shell
Useful Shell Utilities
Here are some common shell utilities that you may find useful in your daily life and for testing your shell:
|cat|Write standard input to standard output.|
|wc|Count lines, words, and characters in standard input, write result to standard output.|
|head -n N|Print first N lines of standard input.|
|tail -n N|Print last N lines of standard input.|
|printf|Print arguments with printf-style formatting.|
|true|Always succeed (exit with status 0).|
|false|Always fail (exit with status 1).|
|sort|Sort lines in input.|
|uniq|Drop duplicate lines in input (or print only duplicate lines).|
|tr|Change (translate) characters.|
|curl|Download URL and write result to standard output.|
|sleep N|Pause for N seconds, then exit with status 0.|
|cut|Cut selected portions of each line of a file.|
Common Shell Syntax
Features of normal shells and sh61:
- command1 ; command2. Sequencing. Run command1, and when it finishes, run command2.
- command1 & command2. Backgrounding. Start command1, but don't wait for it to finish. Run command2 right away.
- command1 && command2. On success. Run command1. If it finishes by exiting with status 0, run command2.
- command1 || command2. On failure. Run command1. If it terminates abnormally (e.g., it is killed by a signal after a segfault) or exits with a status ≠ 0, then run command2.
- command1 | command2. Pipe. Run command1 and command2 in parallel. command1’s standard output is hooked up to command2’s standard input. Thus, command2 reads what command1 wrote. The exit status of the pipeline is the exit status of command2.
- command > file. stdout redirection. Run command with its standard output writing to file. The file is truncated before the command is run.
- command < file. stdin redirection. Run command with its standard input reading from file.
- command 2> file. stderr redirection. Run command with its standard error writing to file.
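The success and failure rules for && and || map onto the status reported by waitpid. The C sketch below is a minimal illustration; the helper names run_succeeds, run_and, and run_or are hypothetical, and it relies on C's own && and || operators short-circuiting the same way the shell operators do.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run the program argv[0] with arguments argv and wait for it to finish.
// Returns 1 if the child exited normally with status 0 ("success" in shell
// terms), and 0 otherwise (nonzero exit status, killed by a signal, or
// fork/exec failure).
int run_succeeds(char* const argv[]) {
    pid_t pid = fork();
    if (pid < 0) {
        return 0;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  // reached only if exec failed
    }
    int status;
    if (waitpid(pid, &status, 0) != pid) {
        return 0;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

// `a && b`: run b only if a succeeded.
int run_and(char* const a[], char* const b[]) {
    return run_succeeds(a) && run_succeeds(b);
}

// `a || b`: run b only if a failed.
int run_or(char* const a[], char* const b[]) {
    return run_succeeds(a) || run_succeeds(b);
}
```

For example, with argument vectors for true and false, run_and(true, false) reports failure and run_or(false, true) reports success, matching true && false and false || true at a shell prompt.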
Features of normal shells, but not sh61:
- var=value. Sets a variable to a value.
- $var. Variable reference. Replaced with the variable’s value. There are several special variables; for instance, $? expands to the numeric exit status of the most recently executed foreground pipeline, and $$ expands to the shell’s own process ID.
- command >> file. Run command with its standard output appending to file. The file is not truncated before the command is run.
- command 2>&1. Run command with its standard error redirected to go to the same place as standard output.
- command 1>&2. Run command with its standard output redirected to go to the same place as standard error. Thus, echo Error 1>&2 prints Error to standard error.
- (command1; command2). Parentheses group commands into a “subshell.” The entire subshell can have redirections, and can have its output put into a pipe.
- command1 $(command2). Command substitution. The shell runs command2, then substitutes its output into the command line, so that the output is passed as arguments to command1.
Shell Exercises 1
- Print the contents of the files fork_1.c, fork_2.c, and fork_3.c in the cs61-sections/s10 directory, in order, using a single command line.
- Repeat #1, but store the result in a file called cs61.
- Do #2 again but produce a different command line.
- What is the exit status of true?
- What is the exit status of false?
- What is the exit status of curl http://ipinfo.io/ip?
- What is the exit status of curl cs61://ipinfo.io/ip?
- curl a URL and print Success if the download succeeds or Fail if the download fails.
- Repeat the above, but downloading the URL to a file called ip.
- Count the number of lines in the file words. Hint: use the wc utility.
- Print every unique line in the file words exactly once.
- Count the number of unique lines in the file words.
- Write a command that could help you discover whether a shell really executes the two sides of a pipeline in parallel. Describe the result if (1) the shell executed the left side to completion first (and buffered the output for the right side to read), (2) the shell executed the sides in parallel.
So how does the shell do all its magic? Let's talk about the syntax for sh61 and some ways you might represent it in a data structure.
This grammar, taken from the problem set, represents command lines in sh61:
commandline ::= list | list ";" | list "&"
list ::= conditional | list ";" conditional | list "&" conditional
conditional ::= pipeline | conditional "&&" pipeline | conditional "||" pipeline
pipeline ::= command | pipeline "|" command
command ::= [word or redirection]...
redirection ::= redirectionop filename
redirectionop ::= "<" | ">" | "2>"
This is an example of a BNF grammar. A BNF grammar gives recursive definitions for a few terms (the "words" of the grammar). The ::= indicates definition (i.e., commandline is defined to be list | list ";" | list "&"). On the definition side, the | is a logical or. For example, in sh61's grammar, a commandline is composed of a list, or a list followed by a semicolon, or a list followed by an ampersand.
You may notice that in some of the later definitions, the term being defined is used in the definition. This recursive definition allows for lists or trees of terms to be chained together. Let's take the definition of list for example:
list ::= conditional | list ";" conditional | list "&" conditional
This reads "a list is a conditional, or a list followed by a semicolon and then a conditional, or a list followed by an ampersand and then a conditional." But this means that the list is just a bunch of conditionals, linked by semicolons or ampersands! Notice that the other recursive definitions in sh61's grammar also follow this pattern. In other words:
- A list is a series of conditionals, concatenated by ; or &.
- A conditional is a series of pipelines, concatenated by && or ||.
- A pipeline is a series of commands, concatenated by |.
- A redirection is one of <, >, or 2>, followed by a filename.
What about the definition of command? [word or redirection]... seems a bit vague; in this case, you should use your intuition. When you type a command into the terminal, it's just a series of words representing the program name and its arguments, possibly followed by some number of redirections.
Representing a Parsed Command Line
We now consider two ways to represent our parsed data in a data structure appropriate for our shell: a tree representation and a list representation. Students often are biased toward the tree representation, which precisely represents the structure of the grammar, but the list representation is in some ways easier to handle! The tradeoff is simplicity vs. execution time: the list representation requires more work to answer certain important questions about commands. (But command lines are small enough in practice that the extra work doesn’t matter.)
Again, the overall structure of a line is:
- A command is composed of words and redirections.
- A pipeline is composed of commands joined by |.
- A conditional is composed of pipelines joined by && or ||.
- A list is composed of conditionals joined by ; or & (and the last command in the list might or might not end with ; or &).
The following tree-formatted data structure precisely represents this grammar structure.
- A command contains its executable, arguments, and any redirections.
- A pipeline is a linked list of commands. Since commands in a pipeline are always joined by |, the linked list contains all the structure we need.
- A conditional is a linked list of pipelines. But since adjacent pipelines can be connected by either && or ||, we need to store whether the link is && or ||.
- A list is a linked list of conditionals, each flagged as foreground (i.e., joined by ;) or background (i.e., joined by &). Note that while the conditional linktype doesn’t matter for the last pipeline in a conditional, the background flag does matter for the last conditional in a list, since the last conditional in a list might be run in the foreground or background.
For instance, consider this command line:
a & b | c && d ; e | f &
which comprises three conditionals, four pipelines, and six commands. (Exercise: What are the four pipelines? What are the three conditionals?)
In tree structure, that would look like this. (The list—the “linked list of conditionals”—is the whole first line.)
(We’re not showing the multiple words and redirections that would be part of each command.)
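As a quick check on the counts quoted for the sample command line, the C sketch below tallies commands, pipelines, and conditionals from a list of tokens. It is only an illustration: the struct and function names are hypothetical, and it assumes a well-formed line that has already been split into tokens and contains no redirections.

```c
#include <string.h>

// Counts of grammar units found in a tokenized command line.
struct counts {
    int commands;
    int pipelines;
    int conditionals;
};

// Hypothetical helper: count units in tokens[0..n-1], where each token is
// either an operator (";", "&", "&&", "||", "|") or a word.
struct counts count_units(const char* const tokens[], int n) {
    struct counts c = {0, 0, 0};
    int in_command = 0;      // inside a run of words?
    int in_pipeline = 0;     // inside a pipeline?
    int in_conditional = 0;  // inside a conditional?
    for (int i = 0; i < n; ++i) {
        const char* t = tokens[i];
        if (strcmp(t, ";") == 0 || strcmp(t, "&") == 0) {
            // ";" and "&" end the current conditional (and everything in it).
            in_command = in_pipeline = in_conditional = 0;
        } else if (strcmp(t, "&&") == 0 || strcmp(t, "||") == 0) {
            // "&&" and "||" end the current pipeline.
            in_command = in_pipeline = 0;
        } else if (strcmp(t, "|") == 0) {
            // "|" ends the current command.
            in_command = 0;
        } else {
            // A word starts a new unit at every level that is not yet open.
            if (!in_command) { ++c.commands; in_command = 1; }
            if (!in_pipeline) { ++c.pipelines; in_pipeline = 1; }
            if (!in_conditional) { ++c.conditionals; in_conditional = 1; }
        }
    }
    return c;
}
```

Run on the tokens of a & b | c && d ; e | f &, this reports six commands, four pipelines, and three conditionals. The nesting of the three flags mirrors the nesting of the grammar: each operator closes its own level and every level beneath it.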
- Sketch a set of C structures corresponding to this design.
- How can one traverse your C structures to decide which commands to run at which times?
- How can one traverse them to determine which conditionals should be executed in the foreground or the background?
- Given a command in a pipeline, how can one examine the command C structure to determine whether the command is on the left-hand side of a pipe?
- How about the right-hand side of a pipe?
Flat Linked List
Alternatively, we can create a single linked list of all of the commands. In this case, we also store the connecting operator (one of ;, &, |, &&, or ||). For our sample command line
a & b | c && d ; e | f &
that might look like this:
We now consider how we would parse and traverse this data structure.
- Write a set of C structures corresponding to this design.
- Given a command structure corresponding to the first command in a conditional, how can shell code determine whether the conditional should be executed in the foreground or the background?
- Given a command structure in a pipeline, how can shell code determine whether the command is on the left-hand side of a pipe?
- Given a command structure in a pipeline, how can shell code determine whether the command is on the right-hand side of a pipe?
- Sketch out code for parsing a command line into these C structures.