Problem set 5 – CS 61 2018

In this assignment, you use fork, exec, and several other interesting system calls to build a command shell. You will implement simple commands, background commands, conditional commands (&& and ||), redirections, and pipes, as well as implement command interruption.

Due Wed 11/21 11:59pm (extension 2 days later)

Your shell implements a subset of the bash shell’s syntax, and is generally compatible with bash for the features they share. You may be interested in a tutorial for Unix shells.

Get the code

Start with the cs61-psets Git repository you used for Problem Set 4 and run git pull handout master This will merge our code, which is in the pset5 subdirectory, with your previous work. If you have any “conflicts” from Problem Set 4, resolve them before continuing further. Run git push to save your work back to GitHub.

You may also create a new cs61-psets repository for this assignment.

Don’t forget to enter your repository URL on the grading server.

Shell grammar

There are two main phases to writing a shell. The shell must parse commands from a command line string, and then it must execute those commands.

We’ve already completed some of the parsing phase for you. The parse_shell_token function returns the next “token” from the command line, and it differentiates between normal words like “echo” and special control operators like “;” or “>”. But currently, eval_line treats every token like a normal command word, so “echo foo ; echo bar | wc” would print “foo ; echo bar | wc”! Real shells allow users to build up interesting commands from collections of commands, connected by control operators like && and |. Part of your task is to complete the parsing phase. You can complete it all at once, but you don’t need to; see below for staging hints.

sh61 command lines follow this grammar. Each command is a “commandline” defined as follows:

commandline ::= list
          |  list ";"
          |  list "&"

list     ::=  conditional
          |   list ";" conditional
          |   list "&" conditional

conditional ::=  pipeline
          |   conditional "&&" pipeline
          |   conditional "||" pipeline

pipeline ::=  command
          |   pipeline "|" command

command  ::=  [word or redirection]...

redirection  ::=  redirectionop filename
redirectionop  ::=  "<"  |  ">"  |  "2>"

This grammar says, for example, that the command “echo foo && echo bar & echo baz” is parsed and executed as follows:

   {   { echo foo } && { echo bar }   } & { echo baz }

That is, the && is “inside” the background command, so “echo foo && echo bar” runs in the background and “echo baz” runs in the foreground.

A robust shell will detect errors in its input and handle them gracefully, but for this problem set, we promise that all the inputs we use in tests follow the grammar above.

Command execution

The main part of this assignment is actually implementing the shell.

If you’re confused about a shell feature and the tutorials and manuals don’t help, run some experiments! The bash shell (which is default on the appliance and on Mac OS X) is compatible with our shell. You may find the following commands particularly useful for testing; find out what they do by reading their manual pages and be creative with how you combine them.

echo (print arguments to standard output)
true (exit with status 0)
false (exit with status 1)
sleep N (wait for N seconds then exit)
sort (sort lines)
wc (count lines on standard input)
cat (print one or more files to standard output)

Run your shell by typing ./sh61 and entering commands at the prompt. Exit your shell by typing Control-D at the prompt or by going to another window and entering killall sh61.

We suggest you implement shell features in this order.

Part 1: Simple commands

Support simple commands like “echo foo” by changing command::make_child and command::run. You’ll use the following system calls: fork, execvp, and waitpid (consult the man pages for details on how to call them and what they do). Also read the function definitions in sh61.hh.

You may also need to exit a forked copy of the shell (for example, if execvp fails). To exit a child process, call the _exit function. For instance, call _exit(1) or, equivalently, _exit(EXIT_FAILURE) to exit with status 1.

Your code for this problem set should never call the normal exit function. This function performs cleanup actions, including messing with open stdio files, that shouldn’t happen in child processes (they should only happen in the parent shell). If you call exit instead of _exit from child processes, you may see weird behavior where, for example, the parent shell re-executes parts of a command file.

Part 2: Background commands

Next, add support to run processes in the background, such as sleep 1 &. A background process allows the shell to continue accepting new commands without waiting for the prior command to complete. This will require changes to eval_line (to detect control operators) and command::run, as well as, most likely, to struct command.

Part 3: Command lists

Support command lists, which are chains of commands linked by ; and &. The semicolon runs two commands in sequence—the first command must complete before the second begins. The ampersand runs two commands in parallel by running the first command in the background. These operators have the same precedence and associate to the left.

This part will require changes to command::run, struct command, and eval_line at least. Check the section material for hints and exercises on how to represent command lists.

Part 4: Conditionals

Support conditionals, which are chains of commands linked by && and/or ||. These operators run two commands, but the second command is run conditionally, based on the status of the first command. For example:

$ true ; echo print      # The second command always runs, because ';' is an
                         # unconditional control operator.
print
$ false ; echo print
print
$ true && echo print     # With &&, though, the 2nd command runs ONLY if
                         # the first command exits with status 0.
print
$ false && echo print
                         # (prints nothing)
$ true || echo print     # With ||, the 2nd command runs ONLY if the first
                         # command DOES NOT exit with status 0.
                         # (prints nothing)
$ false || echo print
print

The && and || operators have higher precedence than ; and &, so a command list can contain many conditionals. && and || have the same precedence and they associate to the left. The exit status of a conditional is taken from the last command executed in that conditional. For example, true || false has status 0 (the exit status of true) and true && false has exit status 1 (the exit status of false).

Check out how conditionals work in the background; for instance, try this command:

$ sleep 10 && echo foo & echo bar

To support conditionals, you’ll probably make changes to command::run, struct command, and eval_line. You’ll also use the WIFEXITED and WEXITSTATUS macros defined in man waitpid.

Part 5: Pipelines

Support pipelines, which are chains of commands linked by |. The pipe operator | runs two commands in parallel, connecting the standard output of the left command to the standard input of the right command.

The | operator has higher precedence than && and ||, so a conditional can contain several pipelines. Unlike conditionals and lists, the commands in the pipeline run in parallel. The shell starts all the pipeline’s commands, but only waits for the last command in the pipeline to finish. The exit status of the pipeline is taken from that last command.

To support pipelines, you’ll need to use some new system calls—namely pipe, dup2, and close—and changes to command::make_child, command::run, and struct command.

Part 6: Zombie processes

Your shell should eventually reap all its zombie processes using waitpid.

Hint: You must reap all zombies eventually, but you don’t need to reap them immediately. We don’t recommend using signal handlers to reap zombies, since a signal handler can interfere with the waitpid calls used to wait for foreground processes to complete. A well-placed waitpid loop will suffice to reap zombies; where should it go?

Part 7: Redirections

Support redirections, where some of a command’s file descriptors are sent to disk files. You must handle three kinds of redirection:

< filename: The command’s standard input is taken from filename.
> filename: The command’s standard output is sent to filename.
2> filename: The command’s standard error is sent to filename.

For instance, echo foo > x writes foo into the file named x.

The parse_shell_token command returns redirection operators as type TOKEN_REDIRECTION. You’ll need to change eval_line to detect redirections and store them in struct command. Each redirection operator must be followed by a filename (a TOKEN_NORMAL token). You’ll also change command::make_child to set up the redirections, using system calls open, dup2, and close.

The shell sets up a command’s redirections before executing the command. If a redirection fails (because the file can’t be opened), the shell doesn’t actually run the command. Instead, the child process that would normally have run the command prints an error message to standard error and exits with status 1. Your shell should behave this way too. For example:

$ echo > /tmp/directorydoesnotexist/foo
/tmp/directorydoesnotexist/foo: No such file or directory
$ echo > /tmp/directorydoesnotexist/foo && echo print
/tmp/directorydoesnotexist/foo: No such file or directory
$ echo > /tmp/directorydoesnotexist/foo || echo print
/tmp/directorydoesnotexist/foo: No such file or directory
print

How to figure out the right error message? Try man strerror!

Hint: Your calls to open will have different arguments depending on what type of redirection is used. How to figure out what those arguments are? Well, you could use the manual page; or you could simply use strace to check the regular shell’s behavior. Try this:

$ strace -o strace.txt -f sh -c "echo foo > output.txt"

The strace output is placed in file strace.txt. Look at that file. Which flags were provided to open for output.txt? Try this with different redirection types.

Part 8: Interruption

Support interruption: pressing Control-C to the shell should kill the current command line, if there is one.

Control-C is an initial step into job control, the aspects of the Unix operating system that help users interact with sets of processes. Job control is a complicated affair involving process groups, controlling terminals, and signals. Luckily, Control-C is not too hard to handle on its own. You will need to take the following steps:

All processes in each pipeline must have the same process group (see below).
Your shell should use the claim_foreground function to inform the OS about the currently active foreground pipeline.
If the user presses Control-C while the shell is executing a foreground pipeline, every process in that pipeline will receive the SIGINT signal. This will kill them.

What are process groups? Job control is designed to create a common-sense mapping between operating system processes and command-line commands. This gets interesting because processes spawn new helper processes. If a user kills a command with Control-C, the helper processes should also die. Unix’s solution uses process groups, where a process group is a set of processes. The Control-C key kills all members of the current foreground process group, not just the current foreground process.

Each process is a member of exactly one process group. This group is initially inherited from the process’s parent, but the setpgid system call can change it.

setpgid(pid, pgid) sets process pid’s process group to pgid. Process groups use the same ID space as process IDs, so you’ll often see code like setpgid(pid, pid).
setpgid(0, 0) means the same thing as setpgid(getpid(), getpid()). This divorces the current process from its old process group and puts it into the process group named for itself.

To kill all processes in group pgid, use the system call kill(-pgid, signal_number).

(Note that one process can change another process’s group. Process isolation restricts this functionality somewhat, but it’s safe for the shell to change its children’s process groups.)

For interrupt handling, each process in a foreground command pipeline must be part of the same process group. This will require that you call setpgid in command::make_child. In fact, you should call it twice, at two different locations in command::make_child, to avoid race conditions (why?).

Once this is done, your shell should call claim_foreground before waiting for a command. This function makes the terminal dispatch Control-C to the process group you choose. Call claim_foreground(pgid) before waiting for the foreground pipeline, and call claim_foreground(0) once the foreground pipeline is complete. This function manipulates the terminal so that commands like man kill will work inside your shell.

When a user types Control-C into a terminal, the Unix system automatically sends the SIGINT signal to all members of that terminal’s foreground process group. This will cause any currently executing commands to exit. (Their waitpid status will have WIFSIGNALED(status) != 0 and WTERMSIG(status) == SIGINT.)

Finally, if the shell itself gets a SIGINT signal, it should cancel the current command line and print a new prompt. This will require adding a signal handler.

Hint: We strongly recommend that signal handlers do almost nothing. A signal handler might be invoked at any moment, including in the middle of a function or library call; memory might be in an arbitrary intermediate state. Since these states are dangerous, Unix restricts signal handler actions. Most standard library calls are disallowed, including printf. (A signal handler that calls printf would work most of the time—but one time in a million the handler would invoke nasal demons.) The complete list of library calls allowed in signal handlers can be found in man 7 signal. For this problem set, you can accomplish everything you need with a one-line signal handler that writes a global variable of type volatile sig_atomic_t (which is a synonym for volatile int).

Control-C and command lines interact in different ways on different operating systems. For instance, try typing Control-C while the shell is executing “sleep 10 ; echo Sleep failed” (or “sleep 10 || echo Sleep failed”). The answer depends: Mac OS X prints “Sleep failed”, but Linux does not! Your shell may exhibit either behavior. But note that if you press Control-C during “sleep 10 && echo Sleep succeeded”, the message does not print on any OS, and you must not print the message either.

Part 9: The `cd` command

Your shell should support the cd directory command. The cd command is special; why?

Checking your work

Use make check to check your work; make SAN=1 check to check your work with sanitizers; and make SAN=1 LSAN=1 check to check your work with sanitizers and memory leak detection.

You may also run make check-TEST to run a specific test, or (for example) make check-simple to run all the SIMPLE tests. As always, it can be great to create your own tests!

Extra credit

There are numerous ways you can extend your shell. For instance, you can add support for:

Complex redirections. Our parsing code understands more redirections than your code is required to support. Add support for more redirections.

Subshells. A subshell adds the following production to the grammar:

   command  ::=  "(" list ")" [redirection]...

This executes the list in a grouped subshell—that is, a child shell process. All the commands in the subshell may have their file descriptors redirected as a group.

Variable substitution.

Control structures. Design and implement analogues of the if, for, and while control structures common to many programming languages. For example, your if structure should execute several commands in order, but only if some condition is true—for example, only if a command exits with status 0.

Shell functions. Design and implement a way for shell users to write their own “functions.” Once a function f is defined, typing f at the command line will execute the function (rather than searching for an executable named f). For example, the user might write a function echo_twice that printed its arguments twice, by running the echo command twice. Discuss how other command line arguments will be passed to the shell function.

Or anything else that strikes your fancy. Read up about existing shells (bash, zsh, dash, Windows PowerShell, etc.) for ideas.

Turnin

Hand in your code by editing README.md and AUTHORS.md, committing your changes, and pushing the result to GitHub. Update the grading server with your current partner information.

This lab was originally created for CS 61, but every course has its own shell lab.

Problem set 5: Shell