# BNF grammars

The input language for sh61 command lines is described in terms of a BNF grammar, where BNF stands for Backus–Naur Form, or Backus Normal Form. BNF is a declarative notation for describing a language, meaning simply a set of strings. BNF notation is built from:

• Terminals, such as "x", which must exactly match characters in the input.
• Nonterminals (or symbols for short), such as lettera, which represent sets of strings. One of the nonterminals is called the root or start symbol of the grammar. By convention, the root is the first nonterminal mentioned in the grammar.
• Rules, such as lettera ::= "a" or word ::= letter word, which define how nonterminals and strings relate.

A string is in the language if it can be obtained by following the rules of the grammar starting from a single instance of the root symbol.

It’s useful to be able to read BNF grammars because they’re very common in computer science and programming. Many specification documents use BNF to define languages, including programming languages, configuration files, and even network packet formats. Any context-free language can be described using a BNF grammar.

## Example

This grammar that can recognize exactly two strings, namely "cs61 is awesome" and "cs61 is terrible":

cs61 ::= "cs61 is " emotion
emotion ::= "awesome" | "terrible"


The vertical bar, |, is shorthand for alternate definitions of a nonterminal, so this defines the exact same language. (A given language may be defined by many grammars.)

cs61 ::= "cs61 is " emotion
emotion ::= "awesome"
emotion ::= "terrible"


The language includes "cs61 is awesome" because:

• The start symbol, cs61, has one rule, cs61 ::= "cs61 is " emotion, so the string must match that rule.
• The rule starts with terminal "cs61 is ", so the first part of the string must match that terminal exactly. It does, leaving "awesome".
• The rule ends with nonterminal emotion, so the remaining part of the string, "awesome", must match a rule for emotion.
• The first emotion rule matches "awesome" exactly!

We can visualize this derivation as a parse tree that shows how the string is formed. The start symbol is at the top; each tree branching point expands one nonterminal using a rule.

                   cs61
/      \
/        \
"cs61 is "  emotion
|
"awesome"


The language does not include "cs121 is great" or "cs61 is meh"; there is no way to build up either of those strings using the grammar.

## Recursive example

Grammars often contain recursive rules. This grammar recognizes one or more kinds of zoo animal, separated by commas:

zoo     ::= species
|  zoo ", " species
species ::= "lions" | "tigers" | "bears"


The string "lions, tigers, bears, tigers, lions" is in the language because of the following parse tree:

                                 ________ zoo __________
/                \      \
_________ zoo ________        |   species
/               \      \       |      |
______ zoo _______       |   species   |      |
/           \      \      |      |      |      |
__ zoo __         |   species  |      |      |      |
/    |    \        |      |     |      |      |      |
zoo    |   species   |      |     |      |      |      |
|     |      |      |      |     |      |      |      |
species  |      |      |      |     |      |      |      |
|     |      |      |      |     |      |      |      |
"lions" ", " "tigers" ", " "bears" ", " "tigers" ", " "lions"



## Precedence example

Grammars can define the precedence of infix operators. For example, the “PEMDAS rule” says that Multiplication has higher precedence than Addition, so that the expression “1+2*3” equals 7, not 9 (“1+(2*3)”, not “(1+2)*3”). This grammar makes that precedence explicit:

expr   ::= factor
|  expr "+" factor
factor ::= number
|  factor "*" number
number ::= digit
|  number digit
digit  ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"


The grammar represents the string 1+2*3 as 1 plus 2*3, because the 2*3 part is grouped together into a single nonterminal.

    ______ expr ____
/     /          \
|     |       _ factor _
expr   |      /    /     \
|     |     |     |     |
factor  |   factor  |     |
|     |     |     |     |
number  |   number  |   number
|     |     |     |     |
digit   |   digit   |   digit
|     |     |     |     |
"1"   "+"   "2"   "*"   "3"


## Exercises

EXERCISE. Define a grammar for the empty language, which is a language containing no strings. (There is no valid sentence in the empty language.)

EXERCISE. Define a grammar for the children language (because children should be “not heard”). The empty string is the only valid sentence in this language.

EXERCISE. Define a grammar for the letter language. A letter is a lower-case Latin letter between a and z.

EXERCISE. Define a grammar for the word language. A word is a sequence of one or more letters. You may refer to the letter nonterminal defined above.

EXERCISE. Define a different grammar for the word language.

EXERCISE. Define a grammar for the parenthesis language, where all strings in the parenthesis language consist of balanced pairs of left and right parentheses. Some examples:

Valid Invalid
() ())(
(()) (
()() )(
(((()())()())) ))))))))))(

## Shell grammar

This is the BNF grammar for sh61, as found in problem set 5.

commandline ::= list
|  list ";"
|  list "&"

list     ::=  conditional
|   list ";" conditional
|   list "&" conditional

conditional ::=  pipeline
|   conditional "&&" pipeline
|   conditional "||" pipeline

pipeline ::=  command
|   pipeline "|" command

command  ::=  word
|   redirection
|   command word
|   command redirection

redirection  ::=  redirectionop filename
redirectionop  ::=  "<"  |  ">"  |  "2>"


This grammar doesn’t define the word nonterminal, and doesn’t define where whitespace is allowed or required. Such looseness is not atypical in grammars; in the case of the shell, our parsing functions in helpers.cc implement the necessary rules. Shorthand such as command ::= [word | redirection]... is also common.

The recursive definitions specify that:

• A list is one or more conditionals, separated by ; or &.
• A conditional is one or more pipelines, separated by && or ||.
• A pipeline is one or more commands, separated by |.