Command line representation
The shell grammar defines the underlying hierarchical structure of command lines. The overall structure of a line, as described by the grammar, is:
- A
list
is composed ofconditional
s joined by;
or&
(and the lastcommand
in thelist
might or might not end with&
). - A
conditional
is composed ofpipeline
s joined by&&
or||
. - A
pipeline
is composed ofcommand
s joined by|
. - A
command
is composed ofword
s andredirection
s.
So list
s are comprised of conditional
s, which are comprised of
pipeline
s, which are comprised of command
s. But must this structure be
reflected in your implementation?
There are two common ways to represent our parsed command line in a data structure appropriate for our shell, a tree representation and a list representation. Students often are biased toward the tree representation, which precisely represents the structure of the grammar, but the list representation can be easier to handle! The tradeoff is simplicity vs. execution time: the list representation requires more work to answer some questions about commands, but command lines are small enough in practice that the extra work doesn’t matter.
Tree representation
The following tree-formatted data structure precisely represents this grammar structure.
- A
command
contains its executable, arguments, and any redirections. - A
pipeline
is a linked list ofcommand
s, all of which are joined by|
. - A
conditional
is a linked list ofpipelines
. But since adjacentconditional
s can be connected by either&&
or||
, we need to store whether the link is&&
or||
. - A
list
is a linked list ofconditional
s, each flagged as foreground (i.e., joined by;
) or background (i.e., joined by&
). Note that while theconditional
link type doesn’t matter for the last pipeline in aconditional
, the background flag does matter for the lastconditional
in alist
, since the lastconditional
in alist
might be run in the foreground or background.
For instance, consider this tree structure for a command line (the word
s and
redirection
s that are there, but not being shown for the sake of simplifying
the image somewhat):
Exercise: Assuming that a
, b
, c
, d
, e
, and f
are distinct
commands, what could be a command line that would produce this tree structure?
a & b | c && d ; e | f &
Exercise: Sketch a set of C++ structures corresponding to this design.
Here’s one:
struct command { std::vector<std::string> args; command* command_sibling = nullptr; }; struct pipeline { command* command_child = nullptr; pipeline* pipeline_sibling = nullptr; bool is_or = false; }; struct conditional { pipeline* pipeline_child = nullptr; conditional* conditional_sibling = nullptr; bool is_background = false; };
There are many other solutions. Logically, we need a struct for conditional chains that indicates whether the chain should be run in the foreground or background; a struct for pipelines; and a struct for commands. Each struct needs to hold the necessary pointers to form a linked list of siblings. Furthermore, conditionals and pipleines need a pointer to the first item of their sublists.
Exercise: Given a C++ structure corresponding to a conditional chain, explain how to tell whether that chain should be executed in the foreground or the background.
For our solution, that’s as simple as
c->is_background
.
Exercise: Given a C++ structure corresponding to a command, how can one determine whether the command is on the left-hand side of a pipe?
For our solution, a command
c
is on the left-hand side of a pipe iffc->command_sibling != nullptr
.
Exercise: Given a C++ structure corresponding to a command, how can one determine whether the command is on the right-hand side of a pipe?
Exercise: How should we destroy a conditional
, in other
words, free all its memory and the memory of related structures?
How should the conditional::~conditional()
destructor function
be implemented?
Linked list representation
Alternatively, we can create a single linked list of all of the commands. In
this case, we also store the connecting operator (one of &
;
|
&&
or
||
). For our example command line above, the flat linked list structure might
look something like:
Again, this is the structure we recommend (we don’t require it). Traversing this structure is slightly more time-consuming than in the tree representation, but the structure is much easier to build, and the additional CPU time spent traversing it is in the noise given how short command lines typically are.
Exercise: Write a set of C++ structures corresponding to this design.
Here’s an example:
struct command { std::vector<std::string> args; command* next = nullptr; int link = TYPE_SEQUENCE; // Operator following this command. // Always TYPE_SEQUENCE or TYPE_BACKGROUND for last in list. };
Much simpler!
Another good one, with fewer pointers and thus fewer opportunities for memory mistakes, but harder to traverse (you need C++ iterators):
struct command { std::vector<std::string> args; int op; // as above }; using command_list = std::list<command>;
Exercise: Given a C++ structure corresponding to a conditional chain, explain how to tell whether that chain should be executed in the foreground or the background.
bool chain_in_background(command* c) { while (c->op != TYPE_SEQUENCE && c->op != TYPE_BACKGROUND) { c = c->next; } return c->op == TYPE_BACKGROUND; }
Things are slightly more complicated with a flat linked list. We are still executing commands sequentially, but some subsequences of commands may need to be executed in the foreground or the background. During parsing, we can skip ahead in the command line to find the end of our current list (i.e., our current chain of conditionals).
Exercise: Given a C++ structure corresponding to a command, how can one determine whether the command is on the left-hand side of a pipe?
Exercise: Given a C++ structure corresponding to a command, how can one determine whether the command is on the right-hand side of a pipe?
Exercise: Sketch out code for parsing a command line into these C++ structures.