Section 5

The UNIX shell 🐚

Please get the latest section code from the cs61-sections repo.

Open up a new terminal window on your computer. Congratulations, you’ve opened up a UNIX Shell (unless you opened Command Prompt)! So far, you’ve used this wonderful utility to run around your filesystem, possibly open some code, build your pset, and/or play around with Git. Isn’t it magical? Whoever built such a tool must’ve been true geniuses.

Fortunately, you too are a genius, because you will be implementing a (slightly simpler) shell for pset 5.

Digging deeper into The Inner Life of a Cell Shell

If you looked at the above video (it’s not too relevant to CS 61), you might now be in awe of the fluid mosaic, phospholipid bilayer, etc. etc.—we certainly are. We also think the UNIX shell is an interesting work of art. For example, the shell can:

Interpret commands, e.g. ls,
Run some programs, e.g. sudo apt-get install sl -y && sl, and
It can even fork bomb your computer! 😈

Deep down, though, the shell is just doing the same thing in all of these cases: it’s parsing the input you type into the shell, forking a subprocess to run the command, waiting for that process to finish running, and subsequently execute the next part of the command. Let’s start off by playing around with the shell to familiarize yourself with it.

In pset 5, you will be given the rudimentary tools to parse the command line input and some shell code (🥁🥁) to get you started. You will:

Use fork() to create subprocesses, use exec() to load the program’s code, and examine the output of exit() to determine the “success” of the execution.
Interpret command line syntax to emulate more complex behavior, e.g. a && b will only run b if a is successful.
Use pipes to direct input to and output from a program.

Shell exercises

Refer to the shell descriptions at the bottom of these section notes to complete these exercises. We’ll start off on one to warm up, and move on to more throughout the section.

Exercise: Use only shell commands to print the contents of eve_returns.cc and evil.sh into the console, in that order.

Process I/O and redirections

As you saw in pset 3, I/O is very important! It’s so important, in fact, that reading from and writing to memory is a kernel-only power, and it’s invoked by processes through a syscall. You may also have noticed that new files had file descriptors starting at 3—what happened to 0, 1, and 2? By convention, file descriptor 0 is STDIN, 1 is STDOUT, and 2 is STDERR.

How, then, are STDIN, STDOUT, and STDERR created, handled, and used? Remember that they’re simply file descriptors, so all the strace calls you looked at in pset 3, including the read() and write() syscalls, can be used for this purpose. With that out of the way... how can we play around with these three file descriptors?

Redirections allow us to change the behavior of a program’s STDIN and STDOUT—in particular, their sources. Try running these three commands:

$ echo hello kitty
$ echo hello kitty > cs61.txt
$ cat < cs61.txt

The first echo will print

hello kitty

to the terminal.

Exercise: What effects does the second echo have?

Exercise: What effects does the cat have?

As it turns out, program > sometxt will cause program to print its standard output into a file called sometxt (creating the file if it does not exist, and clobbering any of its previous contents if it does exist), and program < sometxt will cause program to read from sometxt as its standard input. How convenient!

More kinds of redirection

>> is like >, but will append to the file if it already exists.

<< is not a thing.

2> sometxt means “redirect file descriptor 2 to sometxt for writing.” Note that there is no space between 2 and >.

2>&1 means “redirect file descriptor 2 to file descriptor 1.” This causes all of STDERR to be printed directly to STDOUT instead. Again, note that there is no space between > and &.

Exercise: Repeat Shell Exercise #1, but store the result in a file called cs61.txt.

Exercise: Do it again, but with a different sequence of commands.

Pipes

Among the redirections briefly mentioned above is a funny-looking one: 2>&1, which redirects file descriptor 2 into file descriptor 1. File descriptors are actually rather arbitrary... what if we could redirect one process’s file descriptor into a different process’s file descriptor? More importantly, what if we could redirect command1’s STDOUT into command2’s STDIN?

Of course, as with all loaded, leading questions, this question is a bit disingenuous—there’s a reason why this section is titled “Pipes.” Let’s do a quick investigation of these magical pipes.

Consider this line of shell commands:

ls | head -n 2

Exercise: What does this do?

Pipes take information from one process and spit it into another. Aside from being more robust against typos, pipes also have another distinct advantage: in command1 | command2, command2 can start working on the output of command1 before command1 has fully completed!

Pipes can also be chained together more than one at a time:

ls | head -n 2 | sed 's/[a-z]/?/g'

will print out the first two items listed in ls, but with all their lowercase letters replaced by question marks.

Question: How many pipes can be chained together in one line?

We call a sequence of processes piped to each other a pipeline.

Use only shell commands to complete the following tasks. Refer to the the shell descrptions to help you find and use the proper shell utilities.

Exercise: Count the number of lines in the file words (and print the count).

Exercise: Print every unique line in the file words exactly once.

Exercise: Count the number of unique lines in the file words (and print the count).

Exercise: Write a command that could help you discover whether a shell really executes the two sides of a pipeline in parallel. Describe the result if (1) the shell executed the left side to completion first (and buffered the output for the right side to read), (2) the shell executed the sides in parallel.

Putting commands together

Pipes aren’t the only way to put commands together! Shells also allow us to run commands in sequences called command lists. And some of those ways let us detect and respond to failures.

First, the ; operator says “run this, then run that.” The semicolon waits for the left-hand command to complete. Then it executes the right-hand command.

Exercise: Write a single shell command list that repeats Shell Exercise #1, but runs two separate commands in a single command list.

Now, shell commands, and processes in general, can either succeed or fail. A process indicates its status when it exits by passing an integer (between 0 and 255) to the exit() library function. By convention, the zero status indicates success, and all non-zero statuses indicate failure. That is, successful processes are all alike; every failing process fails in its own way.

To test different statuses, we often use the true and false programs. true just exits with successful status: exit(0). false just exits with the unsuccessful status 1: exit(1).

Exercise: Write C++ code that compiles to a program equivalent to true.

Exercise: Write C++ code that compiles to a program equivalent to false, but don’t call exit explicitly.

In well-written programs, errors that cause the program to exit should cause a failing exit status. For instance, cat will fail if it is passed a nonexistent file.

Exit status is commonly checked by the shell && and || operators. These are like ;—they execute the left-hand command first, then turn to the right-hand command—but unlike ;, they check the status of the left-hand command, and only run the right-hand command if the left-hand command has an expected status.

Exercise: Use true, false, and echo to figure out what statuses && and || expect.

The && operator executes the right-hand command if the left-hand command succeeds. The || operator executes the right-hand command if the left-hand command fails. We might figure that out with commands like this:
true && echo left side succeeded 1
false && echo left side failed 2
true || echo left side succeeded 3
false || echo left side failed 4
That will print:
left side succeeded 1
left side failed 4

In command lists with multiple conditionals, the conditionals run left to right.

Exercise: What status does false && echo foo return? Use more conditionals to figure it out.

Exercise: Run and print the numerical exit status of true to the shell. (If you’re working through on your own, read manual pages here, or look below at our documentation.)

More exploration

In lecture, we discussed some of the shell utilities—if you’d like to discover more and have more practice with them, here are some exercises to help you along your way.

Use only shell commands to complete the following tasks.

Exercise: Run and print the exit status of false to the shell.

Exercise: Print the exit status of curl http://ipinfo.io/ip to the shell.

Exercise: Print the exit status of curl cs61://ipinfo.io/ip to the shell.

Exercise: curl a URL and print Success if the download succeeds, or Failure if the download fails.

Exercise: Do it again, but store the downloaded result in a file called ip.

Representing a parsed command line

In sh61, commands are parsed into tokens. The sh61 grammar discusses more about how sh61 parses commands and how to understand that process some more. For now, let’s take a look at what we can do with a parsed command line.

There are two common ways to represent our parsed command line in a data structure appropriate for our shell, a tree representation and a list representation. Students often are biased toward the tree representation, which precisely represents the structure of the grammar, but the list representation can be easier to handle! The tradeoff is simplicity vs. execution time: the list representation requires more work to answer some questions about commands, but command lines are small enough in practice that the extra work doesn’t matter.

The overall structure of a line, as described by the grammar, is:

A command is composed of words and redirections.
A pipeline is composed of commands joined by |.
A conditional is composed of pipelines joined by && or ||.
A list is composed of conditionals joined by ; or & (and the last command in the list might or might not end with &).

Tree representation

The following tree-formatted data structure precisely represents this grammar structure.

A command contains its executable, arguments, and any redirections.
A pipeline is a linked list of commands, all of which are joined by |.
A conditional is a linked list of pipelines. But since adjacent conditionals can be connected by either && or ||, we need to store whether the link is && or ||.
A list is a linked list of conditionals, each flagged as foreground (i.e., joined by ;) or background (i.e., joined by &). Note that while the conditional linktype doesn’t matter for the last pipeline in a conditional, the background flag does matter for the last conditional in a list, since the last conditional in a list might be run in the foreground or background.

For instance, consider this tree structure for a command line (the words and redirections that are there, but not being shown for the sake of simplifying the image somewhat):

Exercise: Assuming that a, b, c, d, e, and f are distinct commands, what could be a command line that would produce this tree structure?

Exercise: Sketch a set of C++ structures corresponding to this design.

Here’s one:
struct command {
    std::vector<std::string> args;
    command* next_in_pipeline;

    // initialize members to nullptr by default (good practice)
    command() {
        next_in_pipeline = nullptr;
    }
};
struct pipeline {
    command* child;
    pipeline* next_in_conditional;
    bool is_or;
    pipeline() {
        child = nullptr;
        next_in_conditional = nullptr;
    }
};
struct conditional {
    pipeline* child;
    conditional* next_in_list;
    bool is_background;
    conditional() {
        child = nullptr;
        next_in_list = nullptr;
    }
};
There are many other solutions. Logically, we need a struct for conditional chains that indicates whether the chain should be run in the foreground or background; a struct for pipelines; and a struct for commands. Each struct needs to hold the necessary pointers to form a linked list, as well as a pointer to the first item of its sublist (i.e., the overall command line struct needs a pointer to the head of the lists linked list, each list struct needs to hold a pointer to the head of its conditionals linked list, etc).

Exercise: Given a C++ structure corresponding to a conditional chain, explain how to tell whether that chain should be executed in the foreground or the background.

Exercise: Given a C++ structure corresponding to a command, how can one determine whether the command is on the left-hand side of a pipe?

Exercise (left to the reader): Given a C++ structure corresponding to a command, how can one determine whether the command is on the right-hand side of a pipe?

Flat linked list representation

Alternatively, we can create a single linked list of all of the commands. In this case, we also store the connecting operator (one of & ; | && or ||). For our example command line above, the flat linked list structure might look something like:

Exercise: Write a set of C++ structures corresponding to this design.

Here’s an example:

struct command {
    std::vector<std::string> args;
    command* next;
    int op;         // Operator following this command. Equals one of the TOKEN_ constants;
                    // always TOKEN_SEQUENCE or TOKEN_BACKGROUND for last in list.

    // Initialize `next` and `op` when a `command` is allocated.
    command() {
        next = nullptr;
        op = TOKEN_SEQUENCE;
    }
};

Much simpler!

Another good one, with fewer pointers and thus fewer opportunities for memory mistakes, but harder to traverse (you need C++ iterators):

struct command {
    std::vector<std::string> args;
    int op;         // as above
};
using command_list = std::list<command>;

Exercise: Given a C++ structure corresponding to a conditional chain, explain how to tell whether that chain should be executed in the foreground or the background.

bool chain_in_background(command* c) {
    while (c->op != TOKEN_SEQUENCE && c->op != TOKEN_BACKGROUND) {
        c = c->next;
    }
    return c->op != TOKEN_BACKGROUND;
}
Things are slightly more complicated with a flat linked list. We are still executing commands sequentially, but some subsequences of commands may need to be executed in the foreground or the background. During parsing, we can skip ahead in the command line to find the end of our current list (i.e., our current chain of conditionals).

Exercise (left to the reader): Given a C++ structure corresponding to a command, how can one determine whether the command is on the left-hand side of a pipe?

Exercise (left to the reader): Given a C++ structure corresponding to a command, how can one determine whether the command is on the right-hand side of a pipe?

Exercise (left to the reader): Sketch out code for parsing a command line into these C structures.

Documentation and References

Feel free to refer to these during section and as you work on pset 5.

`sh61` shell subset

For pset 5, you will be expected to implement a subset of common UNIX shell features. Try a few of them now, and start thinking about how you might consider implementing them!

Syntax	Description
`command1 ; command2`	Sequencing. Run `command1`, and when it finishes, run `command2`.
`command1 & command2`	Backgrounding. Start `command1`, but don’t wait for it to finish. Run `command2` right away.
`command1 && command2`	And conditional. Run `command1`; if it finishes by exiting with status `0`, run `command2`.
`command1 \|\| command2`	Or conditional. Run `command1`; if it finishes by exiting with a status ≠ `0`, then run `command2`.
`command1 \| command2`	Pipe. Run `command1` and `command2` in parallel. `command1`’s standard output is hooked up to `command2`’s standard input. Thus, `command2` reads what `command1` wrote. The exit status of the pipeline is the exit status of `command2`.
`command > file`	STDOUT redirection. Run `command` with its standard output writing to file. The file is truncated before the command is run.
`command < file`	STDIN redirection. Run `command` with its standard input reading from file.
`command 2> file`	STDERR redirection. Run `command` with its standard error writing to file. The file is truncated before the command is run.

Useful shell utilities

Here are some commonly-installed, commonly-used programs that you can call directly from the shell. Documentation for these programs can be accessed via the man page, e.g. man cat, or often also through a help switch, e.g. cat --help.

Shell Program	Description
`cat`	Write standard input to standard output.
`wc`	Count lines, words, and characters in standard input, write result to standard output.
`head -n N`	Print first `N` lines of standard input.
`tail -n N`	Print last `N` lines of standard input.
`echo ARG1 ARG2...`	Print arguments.
`printf FORMAT ARG...`	Print arguments with printf-style formatting.
`true`	Always succeed (exit with status `0`).
`false`	Always fail (exit with status `1`).
`sort`	Sort lines in input.
`uniq`	Drop duplicate lines in input (or print only duplicate lines).
`tr`	Change characters; e.g., `tr a-z A-Z` makes all letters uppercase.
`ps`	List processes.
`curl URL`	Download `URL` and write result to standard output.
`sleep N`	Pause for `N` seconds, then exit with status `0`.
`cut`	Cut selected portions of each line of a file.

Non-`sh61` shell features

Here are some commonly-used features of a more fully-implemented shell (like the one you generally use!), but won’t be part of the sh61 assignment. Take a look, some of them are incredibly powerful!

Syntax	Description
`var=value`	Sets a variable to a value.
`$var`	Variable reference. Replaced with the variable’s value. There are several special variables; for instance:
`$?`	The numeric exit status of the most recently executed foreground pipeline.
`$$`	The shell’s own process ID.
`command >> file`	Run `command` with its standard output appending to `file`. The file is not truncated before the command is run.
`command 2>&1`	Run `command` with its standard error redirected to go to the same place as standard output.
`command 1>&2`	Run `command` with its standard output redirected to go to the same place as standard error. Thus, `echo Error 1>&2` prints `Error` to standard error.
`(command1; command2)`	Parentheses group commands into a “subshell.” The entire subshell can have redirections, and can have its output put into a pipe.
`command1 $(command2)`	Command substitution. The shell runs `command2`, then passes its output as the first argument to `command1`.

`sh61` grammar

This is the BNF grammar for sh61’s parser, as found in pset 5:

commandline ::= list
          |  list ";"
          |  list "&"

list     ::=  conditional
          |   list ";" conditional
          |   list "&" conditional

conditional ::=  pipeline
          |   conditional "&&" pipeline
          |   conditional "||" pipeline

pipeline ::=  command
          |   pipeline "|" command

command  ::=  [word or redirection]...

redirection  ::=  redirectionop filename
redirectionop  ::=  "<"  |  ">"  |  "2>"

A BNF grammar gives recursive definitions for a few terms (the “words” of the grammar). The ::= indicates definition (i.e., commandline is defined to be list | list ";" | list "&". On the definition side, the | is a logical or. For example, in sh61’s grammar, a commandline is composed of a list, or a list followed by a semicolon, or a list followed by an ampersand.

You may notice that in some of the later definitions, the term being defined is used in the definition. This recursive definition allows for lists or trees of terms to be chained together. Let’s take the definition of list for example:

list     ::=  conditional
          |   list ";" conditional
          |   list "&" conditional

This reads “a list is a conditional, or a list followed by a semicolon and then a conditional, or a list followed by an ampersand and then a conditional.” But this means that the list is just a bunch of conditionals, linked by semicolons or ampersands! Notice that the other recursive definitions in sh61’s grammar also follow this pattern. In other words:

A list is a series of conditionals, concatenated by ; or &.
A conditional is a series of pipelines, concatenated by && or ||.
A pipeline is a series of commands, concatenated by |.
A redirection is one of <, >, or 2>, followed by a filename.

What about the definition of command? [word or redirection]... seems a bit vague; in this case, you should use your intuition. When you type a command into the terminal, it’s just a series of words representing the program name and its arguments, possibly followed by some number of redirection commands.