2013/Assignment5

From CS61
Jump to: navigation, search
Computer Science 61 and E61
Systems Programming and Machine Organization
This is the 2013 version of the course. Main site

Assignment 5: Shell

  • Assigned Tue 11/12
  • Due Fri 11/22 11:59pm (1 day later for extension)
  • This assignment may be completed in pairs.

Introduction

In the last assignment, you implemented fork. In this assignment, you’ll use fork—and several other interesting system calls—to build an important application: the sh61 shell! Your shell will read commands on its standard input and execute them. You will implement simple commands, background commands, conditional commands (&& and ||), redirections, and pipes. You will also implement command interruption. Your shell implements a subset of the bash shell’s syntax, and is generally compatible with bash for the features they share. You may be interested in a tutorial for Unix shells.

Get the code

Start with the cs61-psets Git repository you created for Assignment 4.

First, ensure that your repository has a handout remote. Type

git remote show handout

If this reports an error, run

git remote add handout git://code.seas.harvard.edu/cs61/cs61-psets.git

Then run git pull handout master. This will merge our Assignment 5 code with your previous work. If you have any “conflicts” from Assignment 4, resolve them before continuing further. Run git push to save your work back to code.seas.

You may also create a new cs61-psets repository for this assignment. You’ll need to tell us using the grading server if you do.

Shell grammar

There are two main phases to writing a shell. The shell must first parse commands from a command line string, and then it must execute those commands.

We’ve already completed a lot of the parsing phase for you. The parse_shell_token function returns the next “token” from the command line, and it differentiates between normal words like “echo” and special control operators like “;” or “>”. However, the parsing phase is incomplete. Currently, eval_command_line treats every token like a normal command word, so “echo foo ; echo bar | wc” would print out “foo ; echo bar | wc”! Real shells allow users to build up interesting commands from collections of commands, connected by control operators like && and |. Part of your task is to complete the parsing phase. You don’t need to complete it all at once, though; see below for staging hints.

Specifically, sh61 command lines follow this grammar. Each command is a “list” defined as follows:

 list     ::=  cmdgroup
           |   cmdgroup ";" [list]
           |   cmdgroup "&" [list]
 
 cmdgroup ::=  pipeline
           |   pipeline "&&" cmdgroup
           |   pipeline "||" cmdgroup
 
 pipeline ::=  command
           |   command "|" pipeline
 
 command  ::=  [word or redirection]...
 
 redirection  ::=  redirectionop filename
 redirectionop  ::=  "<"  |  ">"  |  number "<"  |  number ">"

This grammar says, for example, that the command “echo foo && echo bar & echo baz” is parsed as follows:

   {   { echo foo } && { echo bar }   } & { echo baz }

That is, the && is “inside” the background command, so “echo foo && echo bar” should be run in the background with “echo baz” run in the foreground.

A robust shell would detect errors in its input and handle them gracefully, but in our tests, we will only provide inputs that follow the grammar above.

Command execution

The main part of the lab is actually implementing the shell.

If you’re confused about a shell feature and the tutorials and manuals don’t help, run some experiments! The bash shell (which is default on the appliance and on Mac OS X) is compatible with our shell. You may find the following commands particularly useful for testing; find out what they do by reading their manual pages and be creative with how you combine them.

  • cat (print one or more files to standard output)
  • echo (print arguments to standard output)
  • true (exit with status 0)
  • false (exit with status 1)
  • sleep (wait for N seconds then exit)
  • sort (sort lines)

Run your shell by typing ./sh61 and entering commands at the prompt. Exit your shell by typing Control-D at the prompt, or by going to another window and entering killall sh61 .

We suggest you implement shell features in this order.

Simple commands

Support simple commands like “echo foo” by completing eval_command. You’ll use the following system calls: fork, execvp, and waitpid (run man systemcall to learn about them).

Background commands

Support background commands, such as sleep 1 &, which the shell runs without waiting for them to complete. This will require changes to eval_command_line (to detect control operators) and to eval_command.

Command lists

Support command lists, which are chains of commands linked by ; and &. The semicolon runs two commands in sequence—the first command must complete before the second begins. The ampersand runs two commands in parallel by running the first command in the background. This will require changes to eval_command_line at least, and it will probably require changes to struct command.

Command groups

Support command groups, which are chains of commands linked by && and ||. These conditional control operators run two commands, but the second command is run conditionally, based on the status of the first command. For example:

 $ true ; echo print      # The second command always runs, because ';' is an
                          # unconditional control operator.
 print
 $ false ; echo print
 print
 $ true && echo print     # With &&, though, the 2nd command runs ONLY if
                          # the first command exits with status 0.
 print
 $ false && echo print
                          # (prints nothing)
 $ true || echo print     # With ||, the 2nd command runs ONLY if the first
                          # command DOES NOT exit with status 0.
                          # (prints nothing)
 $ false || echo print
 print

The && and || operators have higher precedence than ; and &, so a command list can contain many command groups. To support command groups, you’ll probably make changes to eval_command, eval_command_line, and struct command. You’ll also use the WIFEXITED and WEXITSTATUS macros defined in man waitpid.

Pipelines

Support pipelines, which are chains of commands linked by |. The pipe operator | runs two commands in parallel, connecting the standard output of the left command to the standard input of the right command.

The | operator has higher precedence than && and ||, so a command group can contain several pipelines. To support pipelines, you’ll need some new system calls—namely pipe, dup2, and close—and changes to eval_command at least.

Zombie processes

Your shell should eventually reap all its zombie processes using waitpid.

Hint: You must reap all zombies eventually, but you don’t need to reap them immediately. We don’t recommend using signal handlers to reap zombies, since a signal handler can interfere with the waitpids used to wait for foreground processes to complete. A well-placed waitpid loop will suffice to reap zombies; where should it go?

Redirections

Support redirections, where some of a command’s file descriptors are sent to disk files. For instance, echo foo > x writes foo into the file named x.

The parse_shell_token command returns redirection operators as type TOKEN_REDIRECTION. You’ll need to change eval_command_line to detect redirections and store them in struct command. Each redirection operator must be followed by a filename (a TOKEN_NORMAL token). You’ll also change eval_command to set up the redirections, using system calls open, dup2, and close.

The shell sets up a command’s redirections before executing the command. If a redirection fails (because the file can’t be opened), the shell doesn’t actually run the command. Instead, the child process that would normally have run the command prints an error message to standard error and exits with status 1. Your shell should behave this way too. For example:

 $ echo > /tmp/directorydoesnotexist/foo
 /tmp/directorydoesnotexist/foo: No such file or directory
 $ echo > /tmp/directorydoesnotexist/foo && echo print
 /tmp/directorydoesnotexist/foo: No such file or directory
 $ echo > /tmp/directorydoesnotexist/foo || echo print
 /tmp/directorydoesnotexist/foo: No such file or directory
 print

Hint: Your calls to open will have different arguments depending on what type of redirection is used. How to figure out what those arguments are?? Well, you could use the manual page; or you could simply use strace to check the regular shell’s behavior. Try this:

 $ strace -o strace.txt -f sh -c "echo foo > output.txt"

The strace output is placed in file strace.txt. Look at that file. Which flags were provided to open for output.txt? Try this with different redirection types.

Command group interruption

Support command group interruption: pressing Control-C to the shell should kill the current (foreground) command group, if there is one.

Control-C is an initial step into job control, the aspects of the Unix operating system that help users interact with sets of processes. Job control is a complicated affair involving process groups, controlling terminals, and signals. Luckily, Control-C is not too hard to handle on its own. You will need to take the following steps:

  • Your shell should put each command group into its own process group (see below).
  • Your shell should use the set_foreground function to inform the OS about the currently active foreground process group.
  • If the user presses Control-C while the shell is executing a foreground command group, every process in that command group will receive the SIGINT signal. The shell should cancel that command group (and, optionally, the rest of the command line).
  • If SIGINT is received at another time, your shell should clear the current partial command line and print another prompt.

What are process groups? The goal of job control is to create a common-sense mapping between operating system processes and command-line commands. This gets interesting because processes spawn new helper processes; if a user kills a command with Control-C, the helper processes should also die. Unix’s solution uses process groups, where a process group is a set groups of processes that can be killed simultaneously with one system call. Each process is a member of exactly one process group, which it initially inherits from its parent. However, a process can “divorce” itself from its process group with the call setpgid(0, 0). This creates a new process group for the process. (The new group’s ID equals getpid(); in fact setpgid(0, 0) is an abbreviation for setpgid(getpid(), getpid()).)

For interrupt handling, you want each foreground command group to correspond to a separate process group. To do this, you will need to do two things. First, all child processes in the command group must inherit from a single parent process, the command parent, which is the leader of its own process group. Adjust your code so each command group inherits from a command parent, and make sure the command parent calls setpgid(0, 0) before birthing any children. Second, your shell should call set_foreground when the foreground command group changes. The shell must call set_foreground(COMMAND_PARENT_PID) before it waits for the command parent, and set_foreground(getpid()) after the command parent exits. The set_foreground function manipulates process groups and sets the terminal’s foreground process group so that commands like man kill will work inside the shell.

Tough question to test your understanding: The set_foreground(cp_pid) function contains a call to setpgid(cp_pid, cp_pid). This will often change nothing, since the command parent runs setpgid(0, 0), which has the same effect. Why is it important to call setpgid in both places?

When a user types Control-C into a terminal, the Unix system automatically sends the SIGINT signal to all members of that terminal’s foreground process group. This will cause any currently executing commands to exit. (Their waitpid status will have WIFSIGNALED(status) != 0 and WTERMSIG(status) == SIGINT.) You must ensure that the sh61 process(es) managing that foreground process group also handle the signal correctly. For instance, try typing Control-C while the shell is executing “sleep 10 || echo Sleep failed”. Even though sleep 10 fails (because it got the SIGINT signal), the shell should not execute the echo command. To detect this case, it is easiest to use signal handlers.

Hint 1: We strongly recommend that signal handlers do almost nothing. A signal handler might be invoked at any moment, including in the middle of a function or library call; memory might be in an arbitrary intermediate state. Since these states are dangerous, Unix restricts signal handler actions. Most standard library calls are disallowed, including printf. (A signal handler that calls printf would work most of the time—but one time in a million the handler could go truly nuts.) The complete list of library calls allowed in signal handlers can be found in man 7 signal. For this problem set, you only need to read and write global variables of type sig_atomic_t (which is a synonym for int). Our signal handler is 1 line long. You could also call kill if you wanted (man 2 kill to learn its arguments).

Hint 2: The first time you call set_foreground your shell may put itself into a coma. Ouch! Use gdb and/or read man tcsetpgrp to understand what’s happening and to fix it. (The problem is the SIGTTOU signal.)

The cd command

Your shell should support the cd directory command. The cd command is special; why?

Checking your work

Use make check to check your work. You may also run make check-N to run a specific test, or, as always, create your own tests!

Extra credit

Extra credit opportunities are rich for this lab. It doesn’t take much work to extend your shell into something much closer to a real shell! In particular, you can add support for:

Subshells. A subshell adds the following production to the grammar:

   command  ::=  "(" list ")" [redirection]...

This executes the list in a grouped subshell—that is, a child shell process. All the commands in the subshell may have their file descriptors redirected as a group.

Control structures. Design and implement analogues of the if, for, and while control structures common to many programming languages. For example, your if structure should execute several commands in order, but only if some condition is true—for example, only if a command exits with status 0.

Shell functions. Design and implement a way for shell users to write their own "functions". Once a function f is defined, typing f at the command line will execute the function, rather than a binary program named f. For example, the user might write a function echo_twice that printed its arguments twice, by running the echo command twice. Discuss how other command line arguments will be passed to the shell function.

More to come!

Turnin

Hand in your code by editing README.txt, committing your changes, and pushing the result to code.seas. Update the grading server with your current partner information.



This lab was originally created for CS 61, but every course has its own shell lab.