In this assignment, you use fork
, exec
, and several other interesting
system calls to build a command shell. You will implement simple commands,
background commands, conditional commands (&&
and ||
), redirections, and
pipes, as well as implement command interruption.
- Due Friday 11/19 11:59pm EST for college
- Due Saturday 11/20 11:59pm EST for DCE
Your shell implements a subset of the bash shell’s syntax, and is generally compatible with bash for the features they share. You may be interested in a tutorial for Unix shells.
Get the code
Start with the cs61-psets
Git repository you used for Problem Set
4 and run git pull handout main
to
merge our code, which is in the pset5
subdirectory, with
your previous work. If you have any “conflicts” from Problem Set 4, resolve
them before continuing further. Run git push
to save
your work back to GitHub.
You may also create a new cs61-psets
repository for this assignment. Don’t
forget to enter your repository URL on the grading server.
Shell grammar
There are two main phases to writing a shell. The shell must parse commands from a command line string, and then it must execute those commands.
We’ve already completed some of the parsing phase for you. The shell_parser
and shell_token_iterator
structures parse tokens from the command line, and
can differentiate between normal words like “echo
” and special control
operators like “;
” or “>
”. But currently, parse_line
treats every token
like a normal command word, so “echo foo ; echo bar | wc
” would print “foo ; echo bar | wc
”! Real shells allow users to build up interesting commands
from collections of commands, connected by control operators like &&
and
|
. Part of your task is to complete the parsing phase. You can complete it
all at once, but you don’t need to; see below for staging hints.
sh61 command lines follow this grammar. Each
input line is a “commandline
” defined as follows:
commandline ::= list | list ";" | list "&" list ::= conditional | list ";" conditional | list "&" conditional conditional ::= pipeline | conditional "&&" pipeline | conditional "||" pipeline pipeline ::= command | pipeline "|" command command ::= word | redirection | command word | command redirection redirection ::= redirectionop filename redirectionop ::= "<" | ">" | "2>"
This grammar says, for example, that the command “echo foo && echo bar & echo baz
” is parsed and executed as follows:
{ { echo foo } && { echo bar } } & { echo baz }
The &&
is “inside” the background command, so “echo foo && echo bar
” runs in the background and “echo baz
” runs in the
foreground.
A robust shell will detect errors in its input and handle them gracefully, but for this problem set, we promise that all the inputs we use in tests follow the grammar above.
About BNF grammars
Command execution
The main part of this assignment is actually implementing the shell.
If you’re confused about a shell feature and the tutorials and manuals
don’t help, run some experiments! The bash
shell (which is default on
Docker and on Mac OS X) is compatible with our shell.
See the commonly-used programs for testing ideas.
Run your shell by typing ./sh61
and entering commands at the prompt.
Exit your shell by typing Control-D at the prompt or by going to another
window and entering killall sh61
.
Part 1: Simple commands
(You can implement features in whatever order you prefer, but our suggested order may help structure your work.)
First, support simple commands like “echo foo
” by changing command::run
and run_list
. You’ll use the following system calls: fork
, execvp
,
and waitpid
(consult the man pages for details on how to call them and
what they do). Also read the function definitions in sh61.hh
.
Testing note. In this pset, you will need to change
parse_line
to create a possibly-complex in-memory structure corresponding to a command line, and changerun_list
and friends to execute that command line. You will likely have bugs in both these tasks, and you may mistake a parsing bug for an execution bug or vice versa. And of course you may have an undefined-behavior bug. So make sure you think through your testing strategy, and at least check out the relevant part of section.
You may also need to exit a forked copy of the shell (for example, if execvp
fails). To exit a child process, call the _exit
system call. For instance,
call _exit(1)
or, equivalently, _exit(EXIT_FAILURE)
to exit with status 1.
Your code for this problem set should never call the
exit
function.exit
(without the underscore) performs cleanup actions, including messing with open stdio files, that shouldn’t happen in child processes (they should only happen in the parent shell). If you callexit
instead of_exit
from child processes, you may see weird behavior where, for example, the parent shell re-executes parts of a command file.
Part 2: Background commands
Next, add support to run commands in the background, such as sleep 1 &
. This
allows the shell to continue accepting new commands without waiting for the
background command to complete. This will require changes to parse_line
(to
detect control operators) and run_list
, as well as, most likely, to struct command
.
Part 3: Command lists
Support command lists, which are chains of commands linked by ;
and &
.
The semicolon runs two commands in sequence—the first command must complete
before the second begins. The ampersand runs two commands in parallel by
running the first command in the background. These operators have the same
precedence and associate
to the left.
This part will require changes to run_list
, struct command
, and
parse_line
at least. Check Section 10 for hints and
exercises on how to represent command lists.
Simplicity note. As you add more features to your shell, your code will naturally get more complicated. But try to resist unnecessary complication. Don’t copy whole blocks of code—avoid this pattern:
if (feature 1 is enabled) { step 1 of child process creation; step 2 of child process creation; step 3a of child process creation; step 4 of child process creation; step 5a of child process creation; } else if (feature 2 is enabled) { step 1 of child process creation; step 2 of child process creation; step 3b of child process creation; step 4 of child process creation; step 5b of child process creation; } else { step 1a of child process creation; step 2 of child process creation; step 3c of child process creation; step 4b of child process creation; step 5a of child process creation; }
Instead, aim for something like this:
step 1 of child process creation; step 2 of child process creation; if (feature 1 is enabled) { step 3a of child process creation; } else { step 3b of child process creation; } step 4 of child process creation; if (feature 1 is enabled) { step 5a of child process creation; } else if (feature 2 is enabled) { step 5b of child process creation; } ...
The second style is more robust to changes in requirements, because adding another feature requires local changes, as opposed to many distributed changes across slightly-different blocks of code.
Our solutions, with all features enabled, have two lines that call
fork()
, two lines that callwaitpid()
, and one line that callsexecvp()
.
Part 4: Conditionals
Support conditionals, which are chains of commands linked by &&
and/or ||
. These operators run two commands, but the second command is
run conditionally, based on the status of the first command. For
example:
$ true ; echo print
# The second command always runs, because ';' is an
# unconditional control operator.
print
$ false ; echo print
print
$ true && echo print
# With &&, though, the 2nd command runs ONLY if
# the first command exits with status 0.
# (WIFEXITED(status) && WEXITSTATUS(status) == 0)
print
$ false && echo print
# (prints nothing)
$ true || echo print
# With ||, the 2nd command runs ONLY if the first
# command (1) DOES NOT exit, or (2) exits with status
# nonzero. || is the opposite of &&.
# (prints nothing)
$ false || echo print
print
The &&
and ||
operators have higher precedence than ;
and &
, so
a command list can contain many conditionals. &&
and ||
have the
same precedence and they associate to the left. The exit status of a
conditional is taken from the last command executed in that conditional.
For example, true || false
has status 0 (the exit status of true
)
and true && false
has exit status 1 (the exit status of false
).
Check out how conditionals work in the background; for instance, try this command:
$ sleep 10 && echo foo & echo bar
Depending on your command representation, background conditional support may
require your command::run
function to look ahead into later command
structures.
To support conditionals, you’ll probably make changes to run_list
,
struct command
, and parse_line
. You’ll also use the WIFEXITED
and
WEXITSTATUS
macros defined in man waitpid
.
Part 5: Pipelines
Support pipelines, which are chains of commands linked by |
. The
pipe operator |
runs two commands in parallel, connecting the standard
output of the left command to the standard input of the right command.
The |
operator has higher precedence than &&
and ||
, so a
conditional can contain several pipelines. Unlike conditionals and
lists, the commands in the pipeline run in parallel. The shell
starts all the pipeline’s commands; the exit status of the pipeline is
taken from that last command.
If a foreground pipeline has two or more commands, your shell must wait at least for the last command to complete. Normal shells, such as Linux
bash
, wait for all commands in the pipeline to complete before moving on (though only the last command’s status matters); your shell may wait for these other commands to complete, but it is also OK to just wait for the last command.
To support pipelines, you’ll need to use some new system calls—namely pipe
,
dup2
, and close
—and changes to command::run
, run_list
, and struct command
.
Part 6: Zombie processes
Your shell should eventually reap all its zombie processes using
waitpid
.
Hint: You must reap all zombies eventually, but you don’t need to
reap them immediately. We don’t recommend using signal handlers to
reap zombies, since a signal handler can interfere with the waitpid
calls used to wait for foreground processes to complete. A well-placed
waitpid
loop will suffice to reap zombies; where should it go?
Part 7: Redirections
Support redirections, where some of a command’s file descriptors are sent to disk files. You must handle three kinds of redirection:
< filename
: The command’s standard input is taken fromfilename
.> filename
: The command’s standard output is sent tofilename
.2> filename
: The command’s standard error is sent tofilename
.
For instance, echo foo > x
writes foo
into the file named x
.
shell_token_iterator
represents redirect operators, such as >
and 2>
, as
type TOKEN_REDIRECT_OP
. You’ll need to change parse_line
to detect
redirections and store them in struct command
. Each redirection operator
must be followed by a filename, which is a TOKEN_NORMAL
token. You’ll also
change command::run
to set up the redirections, using system calls open
,
dup2
, and close
.
The shell sets up a command’s redirections before executing the command. If a redirection fails (because the file can’t be opened), the shell doesn’t actually run the command. Instead, the child process that would normally have run the command prints an error message to standard error and exits with status 1. Your shell should behave this way too. For example:
$ echo > /tmp/directorydoesnotexist/foo
/tmp/directorydoesnotexist/foo: No such file or directory
$ echo > /tmp/directorydoesnotexist/foo && echo print
/tmp/directorydoesnotexist/foo: No such file or directory
$ echo > /tmp/directorydoesnotexist/foo || echo print
/tmp/directorydoesnotexist/foo: No such file or directory
print
How to figure out the right error message? Try man strerror
!
Hint: Your calls to open
will have different arguments depending
on what type of redirection is used. How to figure out what those
arguments are? Well, you could use the manual page; or you could simply
use strace
to check the regular shell’s behavior. Try this:
$ strace -o strace.txt -f sh -c "echo foo > output.txt"
The strace output is placed in file strace.txt
. Look at that file.
Which flags were provided to open
for output.txt
? Try this with
different redirection types.
Part 8: Interruption
Support interruption: pressing Control-C to the shell should kill the current command line, if there is one.
Control-C is part of job control, the aspects of the Unix operating system that help users interact with sets of processes. Job control complicated and involves process groups, controlling terminals, and signals. Luckily, Control-C is not too hard to handle on its own. You will need to take the following steps:
- All processes in each pipeline must have the same process group (see below).
- Your shell should use the
claim_foreground
function to inform the OS about the currently active foreground pipeline.
If you do these things, then if the user presses Control-C while the
shell is executing a foreground pipeline, every process in that
pipeline will receive the SIGINT
signal.
About process groups
Job control is designed to create a common-sense mapping between operating
system processes and command-line commands. This gets interesting because
processes spawn new helper processes: when a user kills a command with
Control-C, the helper processes should also die transcend. Unix’s
solution uses process groups, where a process group is a set of processes.
The Control-C key sends a signal to all members of the current foreground
process group, not just the current foreground process.
Each process is a member of exactly one process group. Process groups are
identified by number: process group IDs are positive integers. Furthermore,
process group IDs are drawn from the space of process IDs. Process groups are
initially inherited—they start out equal to the parent’s process group—but the
setpgid
system call can change it:
setpgid(pid, pgid)
sets processpid
’s process group topgid
. Process groups use the same ID space as process IDs, so you’ll often see code likesetpgid(pid, pid)
.setpgid(0, 0)
means the same thing assetpgid(getpid(), getpid())
. This divorces the current process from its old process group and puts it into the process group named for itself.
A system call kill(-pgid, signal_number)
will send a signal to all processes
in group pgid
, but you are not likely to need the kill
system call in this
problem set. This is because when a user types Control-C into a terminal, Unix
automatically sends the SIGINT
signal to all members of the process
group associated with that terminal. The signal typically causes any currently
executing commands to exit. (Their waitpid status will have
WIFSIGNALED(status) != 0
and WTERMSIG(status) == SIGINT
.)
Changing sh61
to use process groups
-
Add calls to
setpgid
incommand::run
. The goal: Every process in a foreground command pipeline must be part of the same process group, and this process group’s ID must be different from the parent shell’s process group ID or process ID.-
The first child process in a foreground command pipeline should define a brand-new process group, whose ID equals its own process ID. (
setpgid(0, 0)
in the child.) -
command::run
should export this process group in some way so thatrun_list
can use it. -
The second, third, and subsequent child processes in the foreground command pipeline should use the process group established by the first child process. (
setpgid(0, [PGID])
in the child, where[PGID]
is the group established by the first child process.)
-
-
Add calls to
claim_foreground
that are executed while waiting for a pipeline to complete. The goal: Cause Control-C to send SIGINT to the current foreground pipeline, reverting to the original behavior once the foreground pipeline is done.-
Before calling
waitpid
for a foreground pipeline, callclaim_foreground(PGID)
, wherePGID
is the process group established by the pipeline’s first child process. This attaches the terminal to the process group namedPGID
, which makes Control-C send SIGINT to the foreground pipeline. -
After
waitpid
returns, callclaim_foreground(0)
. This detaches the terminal from process group and causes Control-C to revert to the shell.
-
-
Make sure you handle race conditions! Remember that after
fork()
, either the child or parent process might run first. If you’re not careful, thenclaim_foreground
might execute before the process group was populated. That could cause an error. In a correct implementation,command::run
will callsetpgid
twice. -
Finally, if the shell itself gets a
SIGINT
signal, it should cancel the current command line and print a new prompt. This will require adding a signal handler.Hint: We strongly recommend that signal handlers do almost nothing. A signal handler might be invoked at any moment, including in the middle of a function or library call; memory might be in an arbitrary intermediate state. Since these states are dangerous, Unix restricts signal handler actions. Most standard library calls are disallowed, including
printf
. (A signal handler that callsprintf
would work most of the time—but one time in a million the handler would invoke nasal demons.) The complete list of library calls allowed in signal handlers can be found inman 7 signal
. For this problem set, you can accomplish everything you need with a one-line signal handler that writes a global variable of typevolatile sig_atomic_t
(which is a synonym forvolatile int
).
Control-C behavior varies slightly from shell to shell. For instance, in old versions of the
bash
shell, typing Control-C while the shell is executing “sleep 10 ; echo Sleep failed
” or “sleep 10 || echo Sleep failed
” will printSleep failed
, but in most current shells, nothing is printed—when the shell detects a child failed because ofSIGINT
, it cancels the whole command line. Your shell may exhibit either behavior. However, if you press Control-C during “sleep 10 && echo Sleep succeeded
”, the message does not print on any shell, and you must not print the message either.
Part 9: The cd
command
Your shell should support the cd directory
command. The cd
command is special; why?
Checking your work
Use make check
to check your work; make SAN=1 check
to check your work
with sanitizers; and make SAN=1 LSAN=1 check
to check your work with
sanitizers and memory leak detection.
You may also run make check-TEST
to run a specific test,
or (for example) make check-simple
to run all the SIMPLE
tests. As always,
it can be great to create your own tests!
Extra credit
There are numerous ways you can extend your shell. For instance, you can add support for:
Complex redirections. Our parsing code understands more redirections than your code is required to support. Add support for more redirections.
Subshells. A subshell adds the following production to the grammar:
command ::= "(" list ")" [redirection]...
This executes the list
in a grouped subshell—that is, a child shell
process. All the commands in the subshell may have their file
descriptors redirected as a group.
Variable substitution.
Control structures. Design and implement analogues of the if
,
for
, and while
control structures common to many programming
languages. For example, your if
structure should execute several
commands in order, but only if some condition is true—for example, only
if a command exits with status 0.
Shell functions. Design and implement a way for shell users to write
their own “functions.” Once a function f
is defined, typing f
at the
command line will execute the function (rather than searching for an executable
named f
). For example, the user might write a function echo_twice
that
printed its arguments twice, by running the echo
command twice. Discuss
how other command line arguments will be passed to the shell function.
Or anything else that strikes your fancy. Read up about existing shells (bash, zsh, dash, Windows PowerShell, etc.) for ideas.
Turnin
Hand in your code by editing README.md
and AUTHORS.md
, committing
your changes, and pushing the result to GitHub. Update the grading
server with your current
partner information.
Common shell utilities
Here are some commonly-installed, commonly-used programs that you can call
directly from the shell. You may find them useful for testing. Documentation
for these programs can be accessed via the man
page, e.g. man cat
, or
often also through a help switch, e.g. cat --help
.
Shell Program | Description |
---|---|
cat |
Write named files (or standard input) to standard output. |
wc |
Count lines, words, and characters in named files (or standard input). |
head -n N |
Print first N lines of standard input. |
head -c N |
Print first N characters of standard input. |
tail -n N |
Print last N lines of standard input. |
echo ARG1 ARG2... |
Print arguments to standard output. |
printf FORMAT ARG... |
Print arguments with printf-style formatting. |
true |
Always succeed (exit with status 0 ). |
false |
Always fail (exit with status 1 ). |
sort |
Sort lines in input. |
uniq |
Drop duplicate lines in input (or print only duplicate lines). |
tr |
Change characters; e.g., tr a-z A-Z makes all letters uppercase. |
ps |
List processes. |
curl URL |
Download URL and write result to standard output. |
sleep N |
Pause for N seconds, then exit with status 0 . |
cut |
Cut selected portions of each line of a file. |
grep PATTERN |
Print lines in named files (or standard input) that match a regular expression PATTERN . |
tac |
Write lines in named files (or standard input) in reverse order to standard output. |
This lab was originally created for CS 61, but every course has its own shell lab.