Section 10: Command line representation

The shell grammar defines the underlying hierarchical structure of command lines. The overall structure of a line, as described by the grammar, is:

• A list is composed of conditionals joined by ; or & (and the last command in the list might or might not end with &).
• A conditional is composed of pipelines joined by && or ||.
• A pipeline is composed of commands joined by |.
• A command is composed of words and redirections.

So lists are comprised of conditionals, which are comprised of pipelines, which are comprised of commands.

There are two common ways to represent our parsed command line in a data structure appropriate for our shell, a tree representation and a list representation. Students often are biased toward the tree representation, which precisely represents the structure of the grammar, but the list representation is simpler in some ways. It’s useful to consider them both.

Neither the tree representation nor the list representation will exactly match the parent/child relationships implied by parent process IDs for the shell process and its children! In-memory command line representations should be designed for ease of traversal; parent/child process relationships are determined by other constraints, such as which processes need to wait for others.

Before going further:

EXERCISE SET. Work through the BNF grammar exercises.

Tree representation

The following tree-formatted data structure precisely represents this grammar structure.

• A command contains its executable, arguments, and any redirections.
• A pipeline is a linked list of commands, all of which are joined by |.
• A conditional is a linked list of pipelines. But since adjacent conditionals can be connected by either && or ||, we need to store whether the link is && or ||.
• A list is a linked list of conditionals, each flagged as foreground (i.e., joined by ;) or background (i.e., joined by &). Note that while the conditional link type doesn’t matter for the last pipeline in a conditional, the background flag does matter for the last conditional in a list, since the last conditional in a list might be run in the foreground or background.

For instance, consider this tree structure for a command line (the words and redirections that are there, but not being shown for the sake of simplifying the image somewhat):

EXERCISE. Assuming that a, b, c, d, e, and f are distinct commands, what could be a command line that would produce this tree structure?

EXERCISE. Sketch C++ structure definitions corresponding to this design, starting from this:

struct command {
std::vector<std::string> args;
command* next_in_pipeline = nullptr;
command* prev_in_pipeline = nullptr;
};


EXERCISE. Given a C++ structure corresponding to a conditional chain, explain how to tell whether that chain should be executed in the foreground or the background.

EXERCISE. Given a C++ structure corresponding to a command, how can one determine whether the command is on the left-hand side of a pipe?

EXERCISE. Given a C++ structure corresponding to a command, how can one determine whether the command is on the right-hand side of a pipe?

EXERCISE. How should we destroy a conditional, in other words, free all its memory and the memory of related structures? How should the conditional::~conditional() destructor function be implemented?

Alternatively, we can create a single linked list of all of the commands. In this case, we also store the connecting operator (one of & ; | && or ||). For our example command line above, the flat linked list structure might look something like:

Traversing this structure is slightly more time-consuming than in the tree representation, but the structure is much easier to build, and the additional CPU time spent traversing it is in the noise given how short command lines typically are.

EXERCISE. Write C++ structure definitions corresponding to this design.

EXERCISE. Given a C++ structure corresponding to a conditional chain, explain how to tell whether that chain should be executed in the foreground or the background.

EXERCISE. Given a C++ structure corresponding to a command, how can one determine whether the command is on the left-hand side of a pipe?

EXERCISE. Given a C++ structure corresponding to a command, how can one determine whether the command is on the right-hand side of a pipe?

EXERCISE. Sketch out code for parsing a command line into these C++ structures.

In our experience, students experience two different kinds of error with this problem set: first, creating incorrect parsed representations for a command line; and second, executing incorrect system calls for a given representation. It is worth thinking explicitly with your peers about how you will test these aspects of your problem set! There’s no one right answer.

EXERCISE. Discuss what general tools you might use for testing your code. Are there kinds of error the compiler might help you find?

EXERCISE. Discuss how you might test your parsing function, parse_line. What does it mean for the function to be correct? How could you test it for different kinds of error?

EXERCISE. Discuss how you might test your execution functions, command::run and run_list. What does it mean for the functions to be correct? How could you test it for different kinds of error?

Inferring properties of command lines

Some properties of command line execution can be observed by executing carefully-constructed command lines of common Unix utilities, such as these:

Program Description
true Silently succeed (exit with status 0).
false Silently fail (exit with status 1).
cat Write file(s) to standard output.
wc Count lines, words, and characters in standard input.
head -n N Print first N lines of standard input.
tail -n N Print last N lines of standard input.
yes Print y (or any arguments) forever.
sleep N Pause for N seconds.
exit [STATUS] Exit the shell with status STATUS (default 0).
sh -c "COMMANDLINE" Run another shell executing "COMMANDLINE".

For example, the behaviors of the && and || conditional operators can be explored with different combinations of false, true, and echo, among other programs.

# Demonstrate that A && B runs B iff A exits with status 0
$true && echo X X$ false && echo X
$sh -c "exit 2" && echo X # Demonstrate that A || B runs B iff A does not exit with status 0$ true || echo X
$false || echo X X$ sh -c "exit 2" || echo X
X


In these exercises, you’ll attempt to explore more complex shell properties.

EXERCISE. Demonstrate that the exit status of a conditional equals the exit status of the last-executed command in that conditional.

Hint: The $? shell variable equals the exit status of the most-recently-executed foreground conditional. For instance: $ true
$echo$?
0
$false$ echo $? 1$ sh -c "exit 2"
$echo$?
2


EXERCISE. Demonstrate that a conditional chain, such as A && B || C, executes from left to right.

EXERCISE. Demonstrate that in a two-process pipeline, such as A | B, processes A and B execute in parallel.

EXERCISE. Demonstrate the same thing for three-or-more-process pipelines.

EXERCISE. Demonstrate that the exit status of a pipeline equals the exit status of its rightmost command. (Try to avoid using \$?.)

Subprocess

Finally, this exercise (which is not about command line representation) will test your understanding of file descriptor system calls as well as processes.

EXERCISE. Write a function that implements a version of Python’s subprocess.Popen functionality. Your function should have the following signature:

// subprocess(file, argv, pfd)
//    Run the command file with arguments argv in a child process.
//    Three pipes are opened between this process and the child process:
//    one for the child’s standard input, one for its standard output,
//    and one for its standard error. The pfd argument is populated
//    with this process’s pipe ends.
//
//    * Data written by this process to pfd[0] is read from the child’s
//      standard input.
//    * Data written to the child’s standard output is read by this
//      process from pfd[1].
//    * Data written to the child’s standard error is read by this
//      process from pfd[2].
//
//    Returns the process ID of the child or -1 on error.
pid_t subprocess(const char* file, char* const argv[], int pfd[3]);


You will use the pipe and dup2 system calls, among others.