The UNIX shell 🐚
Please get the latest section code from the
cs61-sections
repo.
Open up a new terminal window on your computer. Congratulations, you’ve opened up a UNIX Shell (unless you opened Command Prompt)! So far, you’ve used this wonderful utility to run around your filesystem, possibly open some code, build your pset, and/or play around with Git. Isn’t it magical? Whoever built such a tool must’ve been true geniuses.
Fortunately, you too are a genius, because you will be implementing a (slightly simpler) shell for pset 5.
Digging deeper into The Inner Life of a Cell Shell
If you looked at the above video (it’s not too relevant to CS 61), you might now be in awe of the fluid mosaic, phospholipid bilayer, etc. etc.—we certainly are. We also think the UNIX shell is an interesting work of art. For example, the shell can:
- Interpret commands, e.g.
ls
, - Run some programs, e.g.
sudo apt-get install sl -y && sl
, and - It can even fork bomb your computer! 😈
Deep down, though, the shell is just doing the same thing in all of these cases: it’s parsing the input you type into the shell, forking a subprocess to run the command, waiting for that process to finish running, and subsequently execute the next part of the command. Let’s start off by playing around with the shell to familiarize yourself with it.
In pset 5, you will be given the rudimentary tools to parse the command line input and some shell code (🥁🥁) to get you started. You will:
- Use
fork()
to create subprocesses, useexec()
to load the program’s code, and examine the output ofexit()
to determine the “success” of the execution. - Interpret command line syntax to emulate more complex behavior, e.g.
a && b
will only runb
ifa
is successful. - Use pipes to direct input to and output from a program.
Shell exercises
Refer to the shell descriptions at the bottom of these section notes to complete these exercises. We’ll start off on one to warm up, and move on to more throughout the section.
Exercise: Use only shell commands to print the contents of
eve_returns.cc
and evil.sh
into the console, in that order.
cat eve_returns.cc evil.sh
Process I/O and redirections
As you saw in pset 3, I/O is very important! It’s so important, in fact, that
reading from and writing to memory is a kernel-only power, and it’s invoked by
processes through a syscall
. You may also have noticed that new files had file
descriptors starting at 3—what happened to 0, 1, and 2? By convention, file
descriptor 0 is STDIN
, 1 is STDOUT
, and 2 is STDERR
.
How, then, are STDIN
, STDOUT
, and STDERR
created, handled, and used?
Remember that they’re simply file descriptors, so all the strace
calls you
looked at in pset 3, including the read()
and write()
syscalls, can be used
for this purpose. With that out of the way... how can we play around with
these three file descriptors?
Redirections allow us to change the behavior of a program’s STDIN
and
STDOUT
—in particular, their sources. Try running these three commands:
$ echo hello kitty
$ echo hello kitty > cs61.txt
$ cat < cs61.txt
The first echo will print
hello kitty
to the terminal.
Exercise: What effects does the second echo
have?
Nothing is printed to the terminal, but a file called
cs61.txt
now contains the string,hello kitty
.
Exercise: What effects does the cat
have?
The string,
hello kitty
is printed to the terminal, despite nothing in the shell command containing the string,hello kitty
! Strangely enough, this is also the exact string that we had printed tocs61.txt
earlier. Coincidence? I think not.
As it turns out, program > sometxt
will cause program
to print its standard
output into a file called sometxt
(creating the file if it does not exist, and
clobbering any of its previous contents if it does exist), and
program < sometxt
will cause program
to read from sometxt
as its standard
input. How convenient!
More kinds of redirection
>>
is like>
, but will append to the file if it already exists.
<<
is not a thing.
2> sometxt
means “redirect file descriptor 2 to sometxt for writing.” Note that there is no space between2
and>
.
2>&1
means “redirect file descriptor 2 to file descriptor 1.” This causes all ofSTDERR
to be printed directly toSTDOUT
instead. Again, note that there is no space between>
and&
.
Exercise: Repeat Shell Exercise #1, but store the result in a file called
cs61.txt
.
cat eve_returns.cc evil.sh > cs61.txt
Exercise: Do it again, but with a different sequence of commands.
cat eve_returns.cc > cs61.txt cat evil.sh >> cs61.txt
Or
cat eve_returns.cc > cs61.txt ; cat evil.sh >> cs61.txt
There’s an infinite number of possibilities.
Pipes
Among the redirections briefly mentioned above is a funny-looking one: 2>&1
,
which redirects file descriptor 2 into file descriptor 1. File descriptors are
actually rather arbitrary... what if we could redirect one process’s file
descriptor into a different process’s file descriptor? More importantly, what
if we could redirect command1
’s STDOUT
into command2
’s STDIN
?
Of course, as with all loaded, leading questions, this question is a bit disingenuous—there’s a reason why this section is titled “Pipes.” Let’s do a quick investigation of these magical pipes.
Consider this line of shell commands:
ls | head -n 2
Exercise: What does this do?
The same thing we were doing earlier with two redirections: finding the first two items listed in the output of
ls
! Isn’t this so much nicer than redirecting into and out from a file?
Pipes take information from one process and spit it into
another. Aside from being more robust against typos, pipes also have another
distinct advantage: in command1 | command2
, command2
can start working on
the output of command1
before command1
has fully completed!
Pipes can also be chained together more than one at a time:
ls | head -n 2 | sed 's/[a-z]/?/g'
will print out the first two items listed in ls
, but with all their lowercase
letters replaced by question marks.
Question: How many pipes can be chained together in one line?
Technically, infinite. Practically, this is constrained mostly by memory and CPU... but assuming infinite RAM and infinite CPU, you could in theory have as many pipes chained together as you’d like.
We call a sequence of processes piped to each other a pipeline.
Use only shell commands to complete the following tasks. Refer to the the shell descrptions to help you find and use the proper shell utilities.
Exercise: Count the number of lines in the file words
(and print the count).
wc -l words
Exercise: Print every unique line in the file words
exactly once.
sort -u words
Exercise: Count the number of unique lines in the file words
(and print the count).
sort -u words | wc -l
Exercise: Write a command that could help you discover whether a shell really executes the two sides of a pipeline in parallel. Describe the result if (1) the shell executed the left side to completion first (and buffered the output for the right side to read), (2) the shell executed the sides in parallel.
Any number of solutions, but
sleep 10 | echo foo
is a classic one. If the pipeline is executed left-to-right with intermediate buffering, we’d see nothing printed for 10 seconds while thesleep
counted down, then we might seefoo
printed afterward. If the pipeline is indeed executed in parallel, then we’d seefoo
printed immediately instead.
Putting commands together
Pipes aren’t the only way to put commands together! Shells also allow us to run commands in sequences called command lists. And some of those ways let us detect and respond to failures.
First, the ;
operator says “run this, then run that.” The semicolon waits for the left-hand
command to complete. Then it executes the right-hand command.
Exercise: Write a single shell command list that repeats Shell Exercise #1, but runs two separate commands in a single command list.
cat eve_returns.cc > cs61.txt ; cat evil.sh >> cs61.txt
Now, shell commands, and processes in general, can either succeed or fail. A process indicates
its status when it exits by passing an integer (between 0 and 255) to the exit()
library function. By convention,
the zero status indicates success, and all non-zero statuses indicate failure. That is, successful
processes are all alike; every failing process fails in its own way.
To test different statuses, we often use the true
and false
programs. true
just exits with
successful status: exit(0)
. false
just exits with the unsuccessful status 1: exit(1)
.
Exercise: Write C++ code that compiles to a program equivalent to true
.
#include <stdlib.h> int main() { exit(0); }
Exercise: Write C++ code that compiles to a program equivalent to false
, but don’t call exit
explicitly.
int main() { return 1; }
In well-written programs, errors that cause the program to exit should cause a failing exit status. For instance,
cat
will fail if it is passed a nonexistent file.
Exit status is commonly checked by the shell &&
and ||
operators. These are like ;
—they execute the left-hand
command first, then turn to the right-hand command—but unlike ;
, they check the status of the left-hand
command, and only run the right-hand command if the left-hand command has an expected status.
Exercise: Use true
, false
, and echo
to figure out what statuses &&
and ||
expect.
The
&&
operator executes the right-hand command if the left-hand command succeeds. The||
operator executes the right-hand command if the left-hand command fails. We might figure that out with commands like this:true && echo left side succeeded 1 false && echo left side failed 2 true || echo left side succeeded 3 false || echo left side failed 4
That will print:
left side succeeded 1 left side failed 4
In command lists with multiple conditionals, the conditionals run left to right.
Exercise: What status does false && echo foo
return? Use more conditionals to figure it out.
It returns the status of
false
, which is 1.false && echo foo || echo bar
andfalse && echo foo && echo quux
would help you figure out thatfalse && echo foo
has a failing exit status; you’d have to use other features, such as$?
, to figure out the exact status.
Exercise: Run and print the numerical exit status of true
to the shell. (If you’re working
through on your own, read manual pages here, or look below at our documentation.)
true; echo $?
More exploration
In lecture, we discussed some of the shell utilities—if you’d like to discover more and have more practice with them, here are some exercises to help you along your way.
Use only shell commands to complete the following tasks.
Exercise: Run and print the exit status of false
to the shell.
false; echo $?
Exercise: Print the exit status of curl http://ipinfo.io/ip
to the shell.
curl http://ipinfo.io/ip; echo $?
Exercise: Print the exit status of curl cs61://ipinfo.io/ip
to the shell.
curl cs61://ipinfo.io/ip; echo $?
Exercise: curl
a URL and print Success
if the download succeeds, or Failure
if
the download fails.
curl lcdf.org && echo "Success" || echo "Failure"
Exercise: Do it again, but store the downloaded result in a file called ip
.
curl lcdf.org > ip && echo "Success" || echo "Failure"
Representing a parsed command line
In sh61
, commands are parsed into tokens. The sh61
grammar
discusses more about how sh61
parses commands and how to understand that
process some more. For now, let’s take a look at what we can do with a parsed
command line.
There are two common ways to represent our parsed command line in a data structure appropriate for our shell, a tree representation and a list representation. Students often are biased toward the tree representation, which precisely represents the structure of the grammar, but the list representation can be easier to handle! The tradeoff is simplicity vs. execution time: the list representation requires more work to answer some questions about commands, but command lines are small enough in practice that the extra work doesn’t matter.
The overall structure of a line, as described by the grammar, is:
- A
command
is composed ofword
s andredirection
s. - A
pipeline
is composed ofcommand
s joined by|
. - A
conditional
is composed ofpipeline
s joined by&&
or||
. - A
list
is composed ofconditional
s joined by;
or&
(and the lastcommand
in thelist
might or might not end with&
).
Tree representation
The following tree-formatted data structure precisely represents this grammar structure.
- A
command
contains its executable, arguments, and any redirections. - A
pipeline
is a linked list ofcommand
s, all of which are joined by|
. - A
conditional
is a linked list ofpipelines
. But since adjacentconditional
s can be connected by either&&
or||
, we need to store whether the link is&&
or||
. - A
list
is a linked list ofconditional
s, each flagged as foreground (i.e., joined by;
) or background (i.e., joined by&
). Note that while theconditional
linktype doesn’t matter for the last pipeline in aconditional
, the background flag does matter for the lastconditional
in alist
, since the lastconditional
in alist
might be run in the foreground or background.
For instance, consider this tree structure for a command line (the word
s and
redirection
s that are there, but not being shown for the sake of simplifying
the image somewhat):
Exercise: Assuming that a
, b
, c
, d
, e
, and f
are distinct
commands, what could be a command line that would produce this tree structure?
a & b | c && d ; e | f &
Exercise: Sketch a set of C++ structures corresponding to this design.
Here’s one:
struct command { std::vector<std::string> args; command* next_in_pipeline; // initialize members to nullptr by default (good practice) command() { next_in_pipeline = nullptr; } }; struct pipeline { command* child; pipeline* next_in_conditional; bool is_or; pipeline() { child = nullptr; next_in_conditional = nullptr; } }; struct conditional { pipeline* child; conditional* next_in_list; bool is_background; conditional() { child = nullptr; next_in_list = nullptr; } };
There are many other solutions. Logically, we need a struct for conditional chains that indicates whether the chain should be run in the foreground or background; a struct for pipelines; and a struct for commands. Each struct needs to hold the necessary pointers to form a linked list, as well as a pointer to the first item of its sublist (i.e., the overall command line struct needs a pointer to the head of the lists linked list, each list struct needs to hold a pointer to the head of its conditionals linked list, etc).
Exercise: Given a C++ structure corresponding to a conditional chain, explain how to tell whether that chain should be executed in the foreground or the background.
For our solution, that’s as simple as
c->is_background
.
Exercise: Given a C++ structure corresponding to a command, how can one determine whether the command is on the left-hand side of a pipe?
For our solution, a command
c
is on the left-hand side of a pipe iffc->next_in_pipeline != nullptr
.
Exercise (left to the reader): Given a C++ structure corresponding to a command, how can one determine whether the command is on the right-hand side of a pipe?
Flat linked list representation
Alternatively, we can create a single linked list of all of the commands. In
this case, we also store the connecting operator (one of &
;
|
&&
or
||
). For our example command line above, the flat linked list structure might
look something like:
Exercise: Write a set of C++ structures corresponding to this design.
Here’s an example:
struct command { std::vector<std::string> args; command* next; int op; // Operator following this command. Equals one of the TOKEN_ constants; // always TOKEN_SEQUENCE or TOKEN_BACKGROUND for last in list. // Initialize `next` and `op` when a `command` is allocated. command() { next = nullptr; op = TOKEN_SEQUENCE; } };
Much simpler!
Another good one, with fewer pointers and thus fewer opportunities for memory mistakes, but harder to traverse (you need C++ iterators):
struct command { std::vector<std::string> args; int op; // as above }; using command_list = std::list<command>;
Exercise: Given a C++ structure corresponding to a conditional chain, explain how to tell whether that chain should be executed in the foreground or the background.
bool chain_in_background(command* c) { while (c->op != TOKEN_SEQUENCE && c->op != TOKEN_BACKGROUND) { c = c->next; } return c->op != TOKEN_BACKGROUND; }
Things are slightly more complicated with a flat linked list. We are still executing commands sequentially, but some subsequences of commands may need to be executed in the foreground or the background. During parsing, we can skip ahead in the command line to find the end of our current list (i.e., our current chain of conditionals).
Exercise (left to the reader): Given a C++ structure corresponding to a command, how can one determine whether the command is on the left-hand side of a pipe?
Exercise (left to the reader): Given a C++ structure corresponding to a command, how can one determine whether the command is on the right-hand side of a pipe?
Exercise (left to the reader): Sketch out code for parsing a command line into these C structures.
Documentation and References
Feel free to refer to these during section and as you work on pset 5.
sh61
shell subset
For pset 5, you will be expected to implement a subset of common UNIX shell features. Try a few of them now, and start thinking about how you might consider implementing them!
Syntax | Description |
---|---|
command1 ; command2 |
Sequencing. Run command1 , and when it finishes, run command2 . |
command1 & command2 |
Backgrounding. Start command1 , but don’t wait for it to finish. Run command2 right away. |
command1 && command2 |
And conditional. Run command1 ; if it finishes by exiting with status 0 , run command2 . |
command1 || command2 |
Or conditional. Run command1 ; if it finishes by exiting with a status ≠ 0 , then run command2 . |
command1 | command2 |
Pipe. Run command1 and command2 in parallel. command1 ’s standard output is hooked up to command2 ’s standard input. Thus, command2 reads what command1 wrote. The exit status of the pipeline is the exit status of command2 . |
command > file |
STDOUT redirection. Run command with its standard output writing to file. The file is truncated before the command is run. |
command < file |
STDIN redirection. Run command with its standard input reading from file. |
command 2> file |
STDERR redirection. Run command with its standard error writing to file. The file is truncated before the command is run. |
Useful shell utilities
Here are some commonly-installed, commonly-used programs that you can call
directly from the shell. Documentation for these programs can be accessed via
the man
page, e.g. man cat
, or often also through a help switch, e.g.
cat --help
.
Shell Program | Description |
---|---|
cat |
Write standard input to standard output. |
wc |
Count lines, words, and characters in standard input, write result to standard output. |
head -n N |
Print first N lines of standard input. |
tail -n N |
Print last N lines of standard input. |
echo ARG1 ARG2... |
Print arguments. |
printf FORMAT ARG... |
Print arguments with printf-style formatting. |
true |
Always succeed (exit with status 0 ). |
false |
Always fail (exit with status 1 ). |
sort |
Sort lines in input. |
uniq |
Drop duplicate lines in input (or print only duplicate lines). |
tr |
Change characters; e.g., tr a-z A-Z makes all letters uppercase. |
ps |
List processes. |
curl URL |
Download URL and write result to standard output. |
sleep N |
Pause for N seconds, then exit with status 0 . |
cut |
Cut selected portions of each line of a file. |
Non-sh61
shell features
Here are some commonly-used features of a more fully-implemented shell (like the
one you generally use!), but won’t be part of the sh61
assignment. Take a
look, some of them are incredibly powerful!
Syntax | Description |
---|---|
var=value |
Sets a variable to a value. |
$var |
Variable reference. Replaced with the variable’s value. There are several special variables; for instance: |
$? |
The numeric exit status of the most recently executed foreground pipeline. |
$$ |
The shell’s own process ID. |
command >> file |
Run command with its standard output appending to file . The file is not truncated before the command is run. |
command 2>&1 |
Run command with its standard error redirected to go to the same place as standard output. |
command 1>&2 |
Run command with its standard output redirected to go to the same place as standard error. Thus, echo Error 1>&2 prints Error to standard error. |
(command1; command2) |
Parentheses group commands into a “subshell.” The entire subshell can have redirections, and can have its output put into a pipe. |
command1 $(command2) |
Command substitution. The shell runs command2 , then passes its output as the first argument to command1 . |
sh61
grammar
This is the BNF grammar
for sh61
’s parser, as found in pset 5:
commandline ::= list | list ";" | list "&" list ::= conditional | list ";" conditional | list "&" conditional conditional ::= pipeline | conditional "&&" pipeline | conditional "||" pipeline pipeline ::= command | pipeline "|" command command ::= [word or redirection]... redirection ::= redirectionop filename redirectionop ::= "<" | ">" | "2>"
A BNF grammar gives recursive definitions for a few terms (the “words” of the
grammar). The ::=
indicates definition (i.e., commandline
is defined to be
list | list ";" | list "&"
. On the definition side, the |
is a logical or.
For example, in sh61
’s grammar, a commandline
is composed of a list
, or a
list
followed by a semicolon, or a list
followed by an ampersand.
You may notice that in some of the later definitions, the term being defined is
used in the definition. This recursive definition allows for lists or trees of
terms to be chained together. Let’s take the definition of list
for example:
list ::= conditional | list ";" conditional | list "&" conditional
This reads “a list
is a conditional
, or a list
followed by a semicolon and
then a conditional
, or a list
followed by an ampersand
and then a
conditional
.” But this means that the list
is just a bunch of
conditional
s, linked by semicolons or ampersands! Notice that the other
recursive definitions in sh61
’s grammar also follow this pattern. In other
words:
- A
list
is a series ofconditionals
, concatenated by;
or&
. - A
conditional
is a series ofpipelines
, concatenated by&&
or||
. - A
pipeline
is a series ofcommand
s, concatenated by|
. - A
redirection
is one of<
,>
, or2>
, followed by afilename
.
What about the definition of command
? [word or redirection]
... seems a bit
vague; in this case, you should use your intuition. When you type a command into
the terminal, it’s just a series of words representing the program name and its
arguments, possibly followed by some number of redirection commands.