## LL(1) parsing

**Introduction:**

Using the rule that we choose a production N -> α on input symbol c if

- c ∈ FIRST(a), or
- Nullable(a) and c ∈ FOLLOW(N).

If we can always choose a production uniquely by using these rules, this is called LL(1) parsing – the first L indicates the reading direction (left-to-right), the second L indicates the derivation order (left) and the 1 indicates that there is a one symbol look ahead. A grammar that can be parsed using LL(1) parsing is called an LL(1) grammar.

There are two implementation methods of LL(1) parsers as programs: **Recursive descent**, where grammar structure is directly translated into the structure of a program, and **a table based approach **that encodes the decision process in a table.

**Recursive descent: **As the name indicates, recursive descent uses recursive functions to implementpredictive parsing. The central idea is that each non-terminal in the grammar isimplemented by a function in the program.

Each such function looks at the next input symbol in order to choose one of the productions for the non-terminal. The right-hand side of the chosen production is then used for parsing in the following way:

A terminal on the right-hand side is matched against the next input symbol. If they match, we move on to the following input symbol and the next symbol on the right hand side, otherwise an error is reported.

A non-terminal on the right-hand side is handled by calling the corresponding function and, after this calls returns, continuing with the next symbol on the right hand side.

When there are no more symbols on the right-hand side, the function returns.

As an example, figure 3.16 shows pseudo-code for a recursive descent parser for Stringaabbbcc. We have constructed this program by the following process:

We have first added a production T’ -> T$ and calculated FIRST and FOLLOW for all productions.

T’ has only one production, so the choice is trivial. However, we have added a check on the next input symbol anyway, so we can report an error if it is not in FIRST (T’). This is shown in the function parse T’.

For the parse T function, we look at the productions for T. As FIRST(R) = {b}, the production T -> R is chosen on the symbol b. Since R is also Nullable, we must choose this production also on symbols in FOLLOW (T), i.e., c or $. FIRST (aTc) = {a}, so we select T -> aTc on an a. On all other symbols we report an error.

For parse R, we must choose the empty production on symbols in FOLLOW(R) (c or $). The production R -> bR is chosen on input b. Again, all other symbols produce an error.

The function match takes as argument a symbol, which it tests for equality with the next input symbol. If they are equal, the following symbol is read into the variable next. We assume next is initialized to the first input symbol before parseT’ is called.

The program in figure 3.16 only checks if the input is valid. It can easily be extended to construct a syntax tree by letting the parse functions return the sub-trees for the parts of input that they parse.