Introduction

Today's section:

Gives an overview of I/O streams
Provides examples of common shell utilities and shell syntax
Explores the sh61 syntax and some structures for parsing it

Pull the latest version of the sections repository by entering in the terminal

git pull

The cs61-section/s10 directory contains a few files we'll use throughout the section notes and some sample structures we might use to parse sh61 command lines.

Process I/O

Basics

The three I/O streams that are given by default to every process are stdin, stdout, and stderr.

stdin is the default input stream for the program. When your C program expects input from the console or wants to read something without opening a file, it usually reads from stdin.
stdout is the default output stream for the program. When your C program prints something to the console, it is writing to stdout.
stderr is the default output stream for errors. Why is this useful if we already have a stdout? Sometimes programs want to redirect stdout to another file or program. If we print errors to stdout, we won't notice until it's too late.

For example, if we run $ echo foo, the following diagram describes how the echo program interacts with stdin, stdout, and stderr:

Note that in this example, foo is a command-line argument, not read in from stdin.

Redirections

Redirections to redirect a program’s input and output from and to files of our choosing. For example, let's run $ echo foo > temp.txt. If you open up temp.txt, you'll find that it has a single line containing foo. The right angle bracket told our shell to redirect echo's stdout to temp.txt and rewrite it:

If instead we wanted to append to a file, we could use a double right angle bracket: $ echo bar >> temp.txt. If you run that command, you'll find that temp.txt now has a second line, saying bar.

We can also redirect the contents of a file to a program's stdin. Let's run $ wc -w < temp.txt. wc is a common Unix program used for counting words; the -w command-line argument tells it to count the number of words. wc usually takes input directly from the console, but here we have redirected stdin to a file:

Pipes

But what if we wanted to direct the output of one file directly to the input of another, without sending the output to a file first? We can do this using an operating system abstraction called pipes. Pipes are a way of linking an I/O stream from one program directly to the I/O stream of another program, in real time. Each pipe has a read end and a write end; characters written to the write end can be immediately read from the read end. Each program interacts with its end of a pipe just like any other file descriptor; it can call read, write, close, etc. When a program calls read on the read end of a pipe, the read call blocks until something is written to the write end of the pipe, at which point the read call returns.

Let's run $ echo "foo bar baz" | wc -w. You should find 3 printed out at the terminal. What's going on here? We have used the | character to tell the shell to create a pipe between the echo and wc programs, and to redirect echo's stdout to the write end of the pipe, and wc's stdin to the read end of the pipe:

In particular, the shell has used a system call named dup2 to actually set the file descriptor associated with stdout in echo to the write end of the pipe, and to set the file descriptor associated with stdin in wc to the read end of the pipe. echo and wc have no idea that they aren't writing to and reading from the console!

More about the Shell

Useful Shell Utilities

Here's some common shell utilities that you may find useful in your daily life and for testing your shell:

Shell Program	Description
`cat`	Write standard input to standard output.
`wc`	Count lines, words, and characters in standard input, write result to standard output.
`head -n N`	Print first N lines of standard input.
`tail -n N`	Print last N lines of standard input.
`echo ARG1 ARG2...`	Print arguments.
`printf FORMAT ARG...`	Print arguments with printf-style formatting.
`true`	Always succeed (exit with status 0).
`false`	Always fail (exit with status 1).
`sort`	Sort lines in input.
`uniq`	Drop duplicate lines in input (or print only duplicate lines).
`tr`	Change characters; e.g., `tr a-z A-Z` makes all letters uppercase.
`ps`	List processes.
`curl URL`	Download URL and write result to standard output.
`sleep N`	Pause for N seconds, then exit with status 0.
`cut`	Cut selected portions of each line of a file.

Common Shell Syntax

Features of normal shells and sh61:

command1 ; command2. Sequencing. Run command1, and when it finishes, run command2.
command1 & command2. Backgrounding. Start command1, but don't wait for it to finish. Run command2 right away.
command1 && command2. On success. Run command1. If it finishes by exiting with status 0, run command2.
command1 || command2. On failure. Run command1. If it finishes by not exiting (e.g., with a segfault), or by exiting with a status ≠ 0, then run command2.
command1 | command2. Pipe. Run command1 and command2 in parallel. command1’s standard output is hooked up to command2’s standard input. Thus, command2 reads what command1 wrote. The exit status of the pipeline is the exit status of command2.
command > file. stdout redirection. Run command with its standard output writing to file.
command < file. stdin redirection. Run command with its standard input reading from file. The file is truncated before the command is run.
command 2> file. stderr redirection. Run command with its standard error writing to file.

Features of normal shells, but not sh61:

var=value. Sets a variable to a value.
$var. Variable reference. Replaced with the variable’s value. There are several special variables; for instance, \$? expands to the numeric exit status of the most recently executed foreground pipeline, and \$\$ expands to the shell’s own process ID.
command >> file. Run command with its standard output appending to file. The file is not truncated before the command is run.
command 2>&1. Run command with its standard error redirected to go to the same place as standard output.
command 1>&2. Run command with its standard output redirected to go to the same place as standard error. Thus, echo Error 1\>&2 prints Error to standard error.
(command1; command2). Parentheses group commands into a “subshell.” The entire subshell can have redirections, and can have its output put into a pipe.
command1 $(command2). Command substitution. The shell runs command2, then passes its output as the first argument to command1.

Precedence:

| (highest)
&&, ||
;, & (lowest)

Shell Exercises 1

Print the contents of the files fork_1.c, fork_2.c, and fork_3.c in the cs61-sections/s10 directory, in order, using a single command line.
Repeat #1, but store the result in a file called cs61.
Do #2 again but produce a different command line.
What is the exit status of true?
What is the exit status of false?
What is the exit status of curl http://ipinfo.io/ip?
What is the exit status of curl cs61://ipinfo.io/ip?
curl a URL and print Success if the download succeeds or Fail if the download fails.
Repeat the above, but downloading the URL to a file called ip.
Count the number of lines in the file words. Hint: use the wc utility.
Print every unique line in the file words exactly once.
Count the number of unique lines in the file words.
Write a command that could help you discover whether a shell really executes the two sides of a pipeline in parallel. Describe the result if (1) the shell executed the left side to completion first (and buffered the output for the right side to read), (2) the shell executed the sides in parallel.

Solutions here.

Exercise: Parsing

So how does the shell do all its magic? Let's talk about the syntax for sh61 and some ways you might represent it in a data structure.

sh61 Grammar

This, taken from the problem set, is a grammar representing command lines in sh61:

commandline ::= list
          |  list ";"
          |  list "&"

list     ::=  conditional
          |   list ";" conditional
          |   list "&" conditional

conditional ::=  pipeline
          |   conditional "&&" pipeline
          |   conditional "||" pipeline

pipeline ::=  command
          |   pipeline "|" command

command  ::=  [word or redirection]...

redirection  ::=  redirectionop filename
redirectionop  ::=  "<"  |  ">"  |  "2>"

This is an example of a BNF Grammar. A BNF grammar gives recursive definitions for a few terms (the "words" of the grammar). The ::= indicates definition (i.e., commandline is defined to be list | list ";" | list "&". On the definition side, the | is a logical or. For example, in sh61's grammar, a commandline is composed of a list, or a list followed by a semicolon, or a list followed by an ampersand.

You may notice that in some of the later definitions, the term being defined is used in the definition. This recursive definition allows for lists or trees of terms to be chained together. Let's take the definition of list for example:

list     ::=  conditional
          |   list ";" conditional
          |   list "&" conditional

This reads "a list is a conditional, or a list followed by a semicolon and then a conditional, or a list followed by an ampersand and then a conditional." But this means that the list is just a bunch of conditionals, linked by semicolons or ampersands! Notice that the other recursive definitions in sh61's grammar also follow this pattern. In other words:

A list is a series of conditionals, concatenated by ; or &.
A conditional is a series of pipelines, concatenated by && or ||.
A pipeline is a series of commands, concatenated by |.
A redirection is one of <, >, or 2>, followed by a filename.

What about the definition of command? [word or redirection]... seems a bit vague; in this case, you should use your intuition. When you type a command into the terminal, it's just a series of words representing the program name and its arguments, possibly followed by some number of redirection commands.

Representing a Parsed Command Line

We now consider two ways to represent our parsed data in a data structure appropriate for our shell: a tree representation and a list representation. Students often are biased toward the tree representation, which precisely represents the structure of the grammar, but the list representation is in some ways easier to handle! The tradeoff is simplicity vs. execution time: the list representation requires more work to answer certain important questions about commands. (But command lines are small enough in practice that the extra work doesn’t matter.)

Again, the overall structure of a line is:

A command is composed of words and redirections.
A pipeline is composed of commands joined by \|.
A conditional is composed of pipelines joined by && or \|\|
A list is composed of conditionals joined by ; or & (and the last command in the list might or might not end with &).

Tree representation

The following tree-formatted data structure precisely represents this grammar structure.

A command contains its executable, arguments, and any redirections.
A pipeline is a linked list of commands. Since commands in a pipeline are always joined by \|, the linked list contains all the structure we need.
A conditional is a linked list of pipelines. But since adjacent conditionals can be connected by either && or \|\|, we need to store whether the link is && or \|\|.
A list is a linked list of conditionals, each flagged as foreground (i.e., joined by ;) or background (i.e., joined by &). Note that while the conditional linktype doesn’t matter for the last pipeline in a conditional, the background flag does matter for the last conditional in a list, since the last conditional in a list might be run in the foreground or background.

For instance, consider this command line:

a & b | c && d ; e | f &

which comprises three conditionals, four pipelines, and six commands. (Exercise: What are the four pipelines? What are the three conditionals?)

In tree structure, that would look like this. (The list—the “linked list of conditionals”—is the whole first line.)

(We’re not showing the multiple words and redirections that would be part of each command.)

Questions

Sketch a set of C structures corresponding to this design.
How can one traverse your C structures to decide which commands to run at which times?
How can one traverse them to determine which conditionals should be executed in the foreground or the background?
Given a command in a pipeline, how can one examine the command C structure to determine whether the command is on the left-hand side of a pipe?
How about the right-hand side of a pipe?

Flat Linked List

Alternatively, we can create a single linked list of all of the commands. In this case, we also store the connecting operator (one of & ; \| && or \|\|). For our sample command line

a & b | c && d ; e | f &

that might look like this:

We now consider how we would parse and traverse this data structure.

Questions

Write a set of C structures corresponding to this design.
Given a command structure corresponding to the first command in a conditional, how can shell code determine whether the conditional should be executed in the foreground or the background?
Given a command structure in a pipeline, how can shell code determine whether the command is on the left-hand side of a pipe?
Given a command structure in a pipeline, how can shell code determine whether the command is on the right-hand side of a pipe?
Sketch out code for parsing a command line into these C structures.

Sample solutions here.