Lecture 007

Predictive Parsing

Predictive Parsing: recursive descent, but the parser can "predict" which production rule to use without backtracking

LL(k) grammar: left-to-right scan of the input, left-most derivation, k tokens of lookahead.

A predictive parser accepts LL(k) grammars. In practice, we only use k = 1.

Predictive parsing just requires a grammar rewrite (left-factoring) so that we always know which rule to apply:

  E -> T + E | T
  T -> int | int * T | (E)

  E -> TE'
  E' -> +E | epsilon
  T -> int T' | (E)
  T' -> epsilon | * T
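The rewritten grammar can be parsed by plain recursive descent with one token of lookahead and no backtracking. A minimal sketch (function names like `parse_E` are my own; `$` marks end of input):

```python
# Predictive recursive-descent parser for the rewritten grammar:
#   E  -> T E'
#   E' -> + E | epsilon
#   T  -> int T' | ( E )
#   T' -> * T | epsilon
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected}, got {peek()}")
        pos += 1

    def parse_E():          # E -> T E'
        parse_T()
        parse_E2()

    def parse_E2():         # E' -> + E | epsilon
        if peek() == "+":   # one token of lookahead picks the rule
            eat("+")
            parse_E()

    def parse_T():          # T -> int T' | ( E )
        if peek() == "int":
            eat("int")
            parse_T2()
        elif peek() == "(":
            eat("(")
            parse_E()
            eat(")")
        else:
            raise SyntaxError(f"unexpected token {peek()}")

    def parse_T2():         # T' -> * T | epsilon
        if peek() == "*":
            eat("*")
            parse_T()

    parse_E()
    if peek() != "$":
        raise SyntaxError("trailing input")
    return True

print(parse(["int", "*", "int", "+", "int"]))  # True
```

Note that each function decides which alternative to take by looking at only the next token, which is exactly what "predictive" means.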

Parse Table

We would like a lookup table that, given a non-terminal and the next token, tells us which rule to apply.

For example, for the rewritten grammar above we would like the following table (writing X for E' and Y for T'):

          int      *      +      (      )      $
  X                      +E             eps    eps
  T      int Y                  (E)
  Y               * T    eps            eps    eps

Example Parsing Using Parse Table
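The table-driven parsing loop itself is simple: keep a stack of expected symbols, match terminals against the input, and expand non-terminals using the table. A sketch, using the table above encoded as a dict (the E row follows from E -> T X; the encoding is my own):

```python
# Table-driven LL(1) parsing loop. X stands for E', Y for T'.
TABLE = {
    ("E", "int"): ["T", "X"],   ("E", "("): ["T", "X"],
    ("X", "+"): ["+", "E"],     ("X", ")"): [],  ("X", "$"): [],
    ("T", "int"): ["int", "Y"], ("T", "("): ["(", "E", ")"],
    ("Y", "*"): ["*", "T"],     ("Y", "+"): [], ("Y", ")"): [], ("Y", "$"): [],
}
TERMINALS = {"int", "+", "*", "(", ")", "$"}

def ll1_parse(tokens):
    tokens = tokens + ["$"]
    stack = ["$", "E"]            # start symbol on top
    i = 0
    while stack:
        top = stack.pop()
        if top in TERMINALS:      # match terminal against input
            if top != tokens[i]:
                raise SyntaxError(f"expected {top}, got {tokens[i]}")
            i += 1
        else:                     # expand non-terminal using the table
            rhs = TABLE.get((top, tokens[i]))
            if rhs is None:
                raise SyntaxError(f"no rule for ({top}, {tokens[i]})")
            stack.extend(reversed(rhs))  # push RHS right-to-left
    return True

print(ll1_parse(["int", "*", "int", "+", "int"]))  # True
```

The RHS is pushed right-to-left so that its leftmost symbol ends up on top of the stack and is expanded or matched next.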

But how can we generate the table?

We populate T[A, t] = alpha (for a production A -> alpha) in two cases, using the First and Follow sets defined below:

  1. t is in First(alpha): seeing t, we can start deriving alpha.
  2. eps is in First(alpha) and t is in Follow(A): alpha can vanish, so t must be a token that can follow A. (This case also covers t = $ when $ is in Follow(A).)

First Set

First Set: given a string X (a mix of terminals and non-terminals), its first set (which contains some terminals and possibly epsilon) is:

\text{First}(X) = \{t | X \to^* t\alpha\} \cup \{\epsilon | X \to^* \epsilon\}

Intuitively, the first set tells you which terminals a string derived from X can begin with.

Algorithm Sketch:

  1. First(t) = {t} for every terminal t.
  2. eps is in First(X) if X -> eps, or if X -> A1 ... An and eps is in every First(Ai).
  3. If X -> A1 ..., then First(X) contains First(A1) \ {eps}; if eps is in First(A1), it also contains First(A2) \ {eps}, and so on.
  4. Repeat until no first set changes.

Example: give the first set for the grammar below:

E -> TX
T -> (E) |  int Y
X -> + E | eps
Y -> * T | eps

First, the first set of a terminal is its own singleton set:

First(+) = {+}
First(*) = {*}
First(() = {(}
First()) = {)}
First(int) = {int}

The first set of a non-terminal contains the first set of the first symbol of each of its productions:

First(E) contains First(T)
First(T) contains First(() and First(int) = {(, int}

Now, since First(T) does not contain eps, we do not add First(X) to First(E).

So First(E) = First(T) = {(, int}

First(X) = {+, eps}
First(Y) = {*, eps}
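The steps above can be sketched as a fixed-point computation over the grammar; the dict-of-lists grammar encoding and the string "eps" for epsilon are my own choices:

```python
# Fixed-point computation of First sets for the example grammar.
GRAMMAR = {
    "E": [["T", "X"]],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "X": [["+", "E"], []],        # [] encodes epsilon
    "Y": [["*", "T"], []],
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                # iterate until no set grows
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                for sym in prod:
                    if sym not in grammar:          # terminal
                        add = {sym}
                        nullable = False
                    else:                           # non-terminal
                        add = first[sym] - {"eps"}
                        nullable = "eps" in first[sym]
                    if not add <= first[nt]:
                        first[nt] |= add
                        changed = True
                    if not nullable:
                        break
                else:
                    # every symbol of the RHS was nullable (or RHS is eps)
                    if "eps" not in first[nt]:
                        first[nt].add("eps")
                        changed = True
    return first
```

Running `first_sets(GRAMMAR)` reproduces the sets derived above: First(E) = First(T) = {(, int}, First(X) = {+, eps}, First(Y) = {*, eps}.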

Follow Set

Follow Set: which tokens can follow a non-terminal X in some derivation from the start symbol S:

\text{Follow}(X) = \{t | S \to^* \beta X t \delta\}


Algorithm Sketch

  1. $ is in Follow(S) for the start symbol S.
  2. For each production A -> alpha X beta: Follow(X) contains First(beta) \ {eps}.
  3. If eps is in First(beta) (or beta is empty): Follow(X) contains Follow(A).
  4. Repeat until no follow set changes. (Note: a follow set never contains eps.)

Follow Set Example
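The Follow computation is another fixed-point loop, this time scanning each production for occurrences of non-terminals. A sketch for the same grammar, reusing the First sets computed above (the encoding is my own):

```python
# Fixed-point computation of Follow sets for the example grammar.
GRAMMAR = {
    "E": [["T", "X"]],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "X": [["+", "E"], []],        # [] encodes epsilon
    "Y": [["*", "T"], []],
}
FIRST = {"E": {"(", "int"}, "T": {"(", "int"},
         "X": {"+", "eps"}, "Y": {"*", "eps"}}

def follow_sets(grammar, first, start="E"):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")              # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue        # Follow is only defined for non-terminals
                    nullable = True     # can everything after sym vanish?
                    for nxt in prod[i + 1:]:
                        if nxt not in grammar:       # terminal
                            add = {nxt}
                            nullable = False
                        else:
                            add = first[nxt] - {"eps"}
                            nullable = "eps" in first[nxt]
                        if not add <= follow[sym]:   # First(beta) rule
                            follow[sym] |= add
                            changed = True
                        if not nullable:
                            break
                    if nullable and not follow[a] <= follow[sym]:
                        follow[sym] |= follow[a]     # Follow(A) rule
                        changed = True
    return follow
```

For this grammar the result is Follow(E) = Follow(X) = {), $} and Follow(T) = Follow(Y) = {+, ), $}.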

Parsing Table Construction

For each production rule A \to \alpha, do:

  1. For each terminal t in First(alpha): add A -> alpha to T[A, t].
  2. If eps is in First(alpha): for each t in Follow(A) (including $), add A -> alpha to T[A, t].

Parsing Table Building Example
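The two construction cases translate directly into code. A sketch using the First and Follow sets computed above (the helper `first_of_string` generalizes First to a string of symbols; all names are my own):

```python
# LL(1) parsing table construction from First and Follow sets.
GRAMMAR = {
    "E": [["T", "X"]],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "X": [["+", "E"], []],        # [] encodes epsilon
    "Y": [["*", "T"], []],
}
FIRST = {"E": {"(", "int"}, "T": {"(", "int"},
         "X": {"+", "eps"}, "Y": {"*", "eps"}}
FOLLOW = {"E": {")", "$"}, "X": {")", "$"},
          "T": {"+", ")", "$"}, "Y": {"+", ")", "$"}}

def first_of_string(alpha, first):
    # First of a symbol string: take First of each prefix symbol until
    # one is not nullable; eps only if the whole string can vanish.
    out = set()
    for sym in alpha:
        if sym not in first:          # terminal
            out.add(sym)
            return out
        out |= first[sym] - {"eps"}
        if "eps" not in first[sym]:
            return out
    out.add("eps")
    return out

def build_table(grammar, first, follow):
    table = {}
    for a, prods in grammar.items():
        for alpha in prods:
            fa = first_of_string(alpha, first)
            for t in fa - {"eps"}:    # case 1: t in First(alpha)
                assert (a, t) not in table, "conflict => not LL(1)"
                table[(a, t)] = alpha
            if "eps" in fa:           # case 2: alpha can derive eps
                for t in follow[a]:   # Follow(A) already contains $
                    assert (a, t) not in table, "conflict => not LL(1)"
                    table[(a, t)] = alpha
    return table
```

The asserts make the LL(1) check concrete: if two productions ever land in the same cell, the grammar is not LL(1).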

Note that an LL(1) parsing table can only be built for an LL(1) grammar: if any table cell ends up with more than one production, the grammar is not LL(1).

Example of non-LL(1) invalid Parsing Table

The only mechanical way to check whether a grammar is LL(1) is to build the parsing table. (Although quick necessary checks include: unambiguous, not left-recursive, left-factored, and more.)

LL(1) grammars are too weak to describe most modern languages.

Bottom-Up Parsing

Bottom-up parsing is the preferred method: it can be just as efficient, and it is more general than (deterministic) top-down parsing.

Bottom-up Parsing: reduces a string to the start symbol by inverting production rules (each inverted step is called a reduction).

Bottom-up parser traces a rightmost derivation in reverse.

Bottom-up Parse Tree

Note that the corresponding derivation expands the rightmost non-terminal first (e.g. in T + E, we choose to expand E first), which is why the parser traces it in reverse.

Shift-Reduce Parsing

Consequence of the rightmost derivation: when the working string is \alpha \beta \omega and the next production rule to invert is X \to \beta, then \omega must consist only of terminals.

So we can let the terminals in \omega come from the input stream of tokens!

Shift: move the next token from the token stream onto the right end of our working string (the stack).

Reduce: apply a production rule in reverse, replacing the right-hand side at the top of the stack with the left-hand side non-terminal.

Example of Shift-Reduce Parsing

How do we know when and where to shift and reduce?

Shift-reduce Conflict: the parser is free to choose either shift or reduce in the next step. (Almost expected in practice.)

Reduce-reduce Conflict: more than one reduction is possible at the same point, which usually indicates the grammar is bad.

Shift pushes a terminal onto the stack. Reduce pops 0 or more symbols (the right-hand side of a rule) off the stack and pushes the produced non-terminal onto the stack.
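The stack mechanics can be sketched directly. Here the shift/reduce move sequence for "int * int + int" under the original grammar (E -> T + E | T, T -> int | int * T | (E)) is supplied by hand as an oracle; a real parser decides the moves itself, which is the subject of LR parsing:

```python
# Shift-reduce stack mechanics, with the move sequence given by hand.
def shift_reduce(tokens, moves):
    stack, i = [], 0
    for move in moves:
        if move == "shift":
            stack.append(tokens[i])      # push next input token
            i += 1
        else:                            # ("reduce", lhs, rhs_len)
            _, lhs, n = move
            del stack[len(stack) - n:]   # pop the RHS ...
            stack.append(lhs)            # ... push the LHS
    return stack

tokens = ["int", "*", "int", "+", "int"]
moves = ["shift", "shift", "shift",
         ("reduce", "T", 1),             # int -> T
         ("reduce", "T", 3),             # int * T -> T
         "shift", "shift",
         ("reduce", "T", 1),             # int -> T
         ("reduce", "E", 1),             # T -> E
         ("reduce", "E", 3)]             # T + E -> E
print(shift_reduce(tokens, moves))       # ['E']
```

Reading the reductions bottom-to-top gives exactly the rightmost derivation of the input, confirming that shift-reduce parsing traces it in reverse.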
