# BNF Grammars

The input language for sh61 command lines is described in terms of a BNF grammar, where BNF stands for Backus–Naur Form, or Backus Normal Form. BNF is a declarative notation for describing a language, which in this context just means a set of strings. BNF notation consists of three pieces:

• Terminals, such as "x", are strings of characters that must exactly match characters in the input.
• Nonterminals (or symbols for short), such as lettera, represent sets of strings. One of the nonterminals is called the root or start symbol of the grammar. By convention, the root is the first nonterminal mentioned in the grammar.
• Rules, such as lettera ::= "a" or word ::= letter word, define how nonterminals and strings relate.

A string is in the language if it can be obtained by following the rules of the grammar starting from a single instance of the root symbol. Any context-free language can be described this way.

Here’s an example grammar that can recognize exactly two sentences, namely "cs61 is awesome" and "cs61 is terrible":

cs61 ::= "cs61 is " emotionword
emotionword ::= "awesome" | "terrible"

It’s useful to be able to read BNF grammars because they’re very common in computer science and programming. Many specification documents use BNF to define languages, including programming languages, configuration files, and even network packet formats.

These exercises show features of BNF grammars using simple examples. Peek at some answers and then try to answer the questions on your own.

Exercise: Define a grammar for the empty language, which is a language containing no strings. (There is no valid sentence in the empty language.)

Exercise: Define a grammar for the children language (because children should be “not heard”). The empty string is the only valid sentence in this language.

Exercise: Define a grammar for the letter language. A letter is a lower-case Latin letter between a and z.

Exercise: Define a grammar for the word language. A word is a sequence of one or more letters. You may refer to the letter nonterminal defined above.

Exercise: Define two different grammars for the word language.

Exercise: Define a grammar for the parenthesis language, where all sentences in the parenthesis language consist of balanced pairs of left and right parentheses. Some examples:

Valid Invalid
() ())(
(()) (
()() )(
(((()())()())) ))))))))))(

## The shell grammar

This is the BNF grammar for sh61, as found in problem set 5. (Note that this grammar doesn’t define words, and doesn’t define where whitespace is required. That kind of looseness is typical in systems grammars; in your case helpers.cc already implements the necessary rules.)

commandline ::= list
|  list ";"
|  list "&"

list     ::=  conditional
|   list ";" conditional
|   list "&" conditional

conditional ::=  pipeline
|   conditional "&&" pipeline
|   conditional "||" pipeline

pipeline ::=  command
|   pipeline "|" command

command  ::=  word
|   redirection
|   command word
|   command redirection

redirection  ::=  redirectionop filename
redirectionop  ::=  "<"  |  ">"  |  "2>"

The recursive definitions allow for lists or trees of terms to be chained together. Taking list as an example:

list     ::=  conditional
|   list ";" conditional
|   list "&" conditional

This reads “a list is a conditional, or a list followed by a semicolon and then a conditional, or a list followed by an ampersand and then a conditional.” But this means that the list is just a bunch of conditionals, linked by semicolons or ampersands! Notice that the other recursive definitions in sh61’s grammar also follow this pattern. In other words:

• A list is a series of conditionals, concatenated by ; or &.
• A conditional is a series of pipelines, concatenated by && or ||.
• A pipeline is a series of commands, concatenated by |.
• A redirection is one of <, >, or 2>, followed by a filename.

We’ve written out a full definition of command, but BNF-style grammars seen in the wild often use shorthand, such as:

command ::= [word or redirection]...

This shorthand, like the full definition, says that a command is a series of words representing the program name and its arguments, interspersed with redirection commands.