# Lecture 007

## Predictive Parsing

Predictive Parsing: recursive descent in which the parser can "predict" which production rule to use by looking at the next tokens. Such a parser will:

• never guess wrong

• never backtrack

LL(k) grammar: left-to-right parsing, left-most derivation, look ahead $k$ tokens.

A predictive parser accepts LL(k) grammars. In practice, we only use $k = 1$.

Making a grammar suitable for predictive parsing is largely a matter of rewriting it (left-factoring the common prefixes) so that the next token always tells us which rule to apply:

from:

    E -> T + E | T
    T -> int | int * T | (E)

to:

    E -> TE'
    E' -> +E | epsilon
    T -> int T' | (E)
    T' -> epsilon | * T
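To see why the rewritten grammar needs no backtracking, here is a minimal sketch of a recursive-descent recognizer for it, with one function per non-terminal. The class name `PredictiveParser`, the helper `accepts`, and the representation of tokens as plain strings (with `$` as an end marker) are assumptions made for this illustration; they are not part of the lecture.

```python
from typing import List


class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class PredictiveParser:
    """Recognizer for E -> T E', E' -> + E | eps, T -> int T' | ( E ), T' -> * T | eps."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens + ["$"]  # "$" marks the end of input
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_E(self) -> None:  # E -> T E'
        self.parse_T()
        self.parse_E_prime()

    def parse_E_prime(self) -> None:
        if self.peek() == "+":  # E' -> + E
            self.expect("+")
            self.parse_E()
        # otherwise E' -> eps: consume nothing

    def parse_T(self) -> None:
        if self.peek() == "int":  # T -> int T'
            self.expect("int")
            self.parse_T_prime()
        elif self.peek() == "(":  # T -> ( E )
            self.expect("(")
            self.parse_E()
            self.expect(")")
        else:
            raise ParseError(f"unexpected token {self.peek()!r}")

    def parse_T_prime(self) -> None:
        if self.peek() == "*":  # T' -> * T
            self.expect("*")
            self.parse_T()
        # otherwise T' -> eps: consume nothing


def accepts(tokens: List[str]) -> bool:
    """True if the whole token list is derivable from E."""
    parser = PredictiveParser(list(tokens))
    try:
        parser.parse_E()
        parser.expect("$")
    except ParseError:
        return False
    return parser.pos == len(parser.tokens)
```

Every choice between alternatives is made by inspecting a single token with `peek()`, which is exactly the $k = 1$ lookahead; for example, `accepts(["int", "*", "int", "+", "int"])` should succeed without ever undoing a decision.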


### Parse Table

We would like to have a lookup table that, given the non-terminal we are currently expanding and the next input token, tells us which rule to apply.

For example, we would like to have a table like the following:

| | int | * | + | ( | ) | \$ |
|---|---|---|---|---|---|---|
| E | TX | | | TX | | |
| X | | | +E | | eps | eps |
| T | int Y | | | (E) | | |
| Y | | * T | eps | | eps | eps |

Here X and Y are the non-terminals written E' and T' above, and \$ marks the end of the input.

But how can we generate the table? We populate $T[A, t] = \alpha$ in two cases:

• First Set ($t \in \text{First}(\alpha)$): populate $\iff \alpha \to^* t \beta$

• Follow Set ($t \in \text{Follow}(A)$): populate $\iff S \to^* \beta A t \sigma \land A \to \alpha \land \alpha \to^* \epsilon$

### First Set

First Set: given a string $X$ (a mix of terminals and non-terminals), its first set (which contains terminals and possibly $\epsilon$) is:

$$\text{First}(X) = \{t \mid X \to^* t\alpha\} \cup \{\epsilon \mid X \to^* \epsilon\}$$

Intuitively, the first set tells you which tokens a string derived from $X$ can start with.

Algorithm Sketch:

• $\text{First}(t) = \{t\}$ for a terminal $t$

• $\epsilon \in \text{First}(X)$ if $X \to \epsilon \lor (X \to A_1 \dots A_n \land (\forall 1 \leq i \leq n)(\epsilon \in \text{First}(A_i)))$

• $\text{First}(\alpha) \subseteq \text{First}(X)$ if $(X \to A_1 \dots A_n\alpha \land (\forall 1 \leq i \leq n)(\epsilon \in \text{First}(A_i)))$

Example: give the first set for the grammar below:

    E -> TX
    T -> (E) | int Y
    X -> + E | eps
    Y -> * T | eps

Firstly, the first set of a terminal is its singleton set:

    First(+) = {+}
    First(*) = {*}
    First(() = {(}
    First()) = {)}
    First(int) = {int}

The first set of a non-terminal contains the first set of the first symbol of each of its productions:

    First(E) contains First(T)
    First(T) contains First(() and First(int) = {(, int}

Now since First(T) does not contain eps, we should not add First(X) to First(E). So

    First(E) = First(T) = {(, int}
    First(X) = {+, eps}
    First(Y) = {*, eps}

### Follow Set

Follow Set: which tokens can appear immediately after $X$ in some derivation from the start symbol $S$

$$\text{Follow}(X) = \{t \mid S \to^* \beta X t \delta\}$$

Observation:

• $\text{First}(B) - \{\epsilon\} \subseteq \text{Follow}(A) \land \text{Follow}(X) \subseteq \text{Follow}(B)$ if $X \to AB$

• $\text{Follow}(X) \subseteq \text{Follow}(A)$ if $X \to AB \land B \to^* \epsilon$

• $\$ \in \text{Follow}(S)$ if $S$ is the start symbol.

Algorithm Sketch:

• $\$ \in \text{Follow}(S)$

• $\text{First}(\beta) - \{\epsilon\} \subseteq \text{Follow}(X)$ for each production $A \to \alpha X \beta$

• $\text{Follow}(A) \subseteq \text{Follow}(X)$ for each production $A \to \alpha X \beta$ where $\epsilon \in \text{First}(\beta)$

### Parsing Table Construction

For each production rule $A \to \alpha$, do:

• for each terminal $t \in \text{First}(\alpha)$, $T[A, t] = \alpha$

• if $\epsilon \in \text{First}(\alpha)$, for each $t \in \text{Follow}(A)$, $T[A, t] = \alpha$

• if $\epsilon \in \text{First}(\alpha) \land \$ \in \text{Follow}(A)$, $T[A, \$] = \alpha$
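The First, Follow, and table-construction rules above can be sketched as fixed-point loops that keep applying the rules until nothing changes. This is only an illustrative sketch: the dictionary encoding of the grammar, the function names, and the choice to store each table cell as a list of productions (so that a cell with more than one entry becomes visible) are assumptions of this example, not something prescribed by the lecture.

```python
from collections import defaultdict

EPS, END = "eps", "$"

# Example grammar from the notes; each right-hand side is a list of symbols.
GRAMMAR = {
    "E": [["T", "X"]],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "X": [["+", "E"], [EPS]],
    "Y": [["*", "T"], [EPS]],
}


def first_of_string(symbols, first):
    """First set of a symbol sequence, given the First sets of the non-terminals."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in first else {sym}  # terminal: singleton
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol was nullable (or the sequence is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no First set grows
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # $ is in Follow(S)
    changed = True
    while changed:  # repeat until no Follow set grows
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # only non-terminals have Follow sets
                    rest = first_of_string(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # the rest can vanish: inherit Follow(lhs)
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def build_table(grammar, first, follow):
    table = defaultdict(list)
    for lhs, productions in grammar.items():
        for rhs in productions:
            rhs_first = first_of_string(rhs, first)
            lookaheads = rhs_first - {EPS}
            if EPS in rhs_first:
                lookaheads |= follow[lhs]  # includes $ when $ is in Follow(lhs)
            for t in lookaheads:
                table[(lhs, t)].append(rhs)
    return dict(table)


def conflicts(table):
    """Cells holding more than one production."""
    return {cell: rules for cell, rules in table.items() if len(rules) > 1}


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
TABLE = build_table(GRAMMAR, FIRST, FOLLOW)
```

For the example grammar, `conflicts(TABLE)` should come back empty. Running the same functions on the unfactored grammar `E -> T + E | T` would put both `E` productions in the same cell, which is one way to see why the rewrite was needed.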

Note that a conflict-free LL(1) parsing table (at most one rule per entry) can only be built for an LL(1) grammar.

The mechanical way to check whether a grammar is LL(1) is to build the parsing table and see whether any entry receives more than one rule. (Quick necessary conditions include: the grammar is unambiguous, not left-recursive, already left-factored, and more.)
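Once a conflict-free table exists, using it is a simple stack-driven loop: pop the top of the stack, and either match a terminal against the input or replace a non-terminal by the rule found in the table. The sketch below hard-codes the table for the grammar `E -> TX`, `T -> (E) | int Y`, `X -> + E | eps`, `Y -> * T | eps`; the name `ll1_parse`, the use of an empty list for an eps rule, and the returned list of applied rules are choices made for this illustration.

```python
# LL(1) table for the example grammar; an empty list stands for an eps rule.
TABLE = {
    ("E", "int"): ["T", "X"], ("E", "("): ["T", "X"],
    ("X", "+"): ["+", "E"], ("X", ")"): [], ("X", "$"): [],
    ("T", "int"): ["int", "Y"], ("T", "("): ["(", "E", ")"],
    ("Y", "*"): ["*", "T"], ("Y", "+"): [], ("Y", ")"): [], ("Y", "$"): [],
}
NONTERMINALS = {"E", "X", "T", "Y"}


def ll1_parse(tokens, start="E"):
    """Return the rules applied (a leftmost derivation) or raise ValueError."""
    tokens = list(tokens) + ["$"]  # "$" marks the end of input
    stack = ["$", start]           # top of the stack is the end of the list
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise ValueError(f"no rule for ({top}, {look})")
            applied.append((top, tuple(rhs)))
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1                     # matched a terminal (or the final "$")
        else:
            raise ValueError(f"expected {top!r}, got {look!r}")
    if pos != len(tokens):
        raise ValueError("input continues after the end marker")
    return applied
```

Calling `ll1_parse(["int", "*", "int"])` should apply `E -> TX` first and then expand `T`, `Y`, `T`, `Y`, and `X` in that order, always consulting only the current non-terminal and one token of lookahead.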

LL(1) grammars are too weak to describe most modern programming languages.

## Bottom-Up Parsing

Bottom-up parsing is the preferred method in practice: it is more general than (deterministic) top-down parsing and can be just as efficient.

Bottom-up Parsing: reduces a string to the start symbol by applying production rules in reverse. (Each such step is called a reduction.)

A bottom-up parser traces a rightmost derivation in reverse.

Note that in a rightmost derivation we always expand the rightmost non-terminal first (e.g. in T + E we expand E before T).
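As a worked example, take the grammar `E -> T + E | T`, `T -> int | int * T | (E)` from the predictive-parsing section and the input `int * int + int` (an input chosen here purely for illustration). Its rightmost derivation is:

$$
\begin{aligned}
E &\Rightarrow T + E \\
  &\Rightarrow T + T \\
  &\Rightarrow T + \text{int} \\
  &\Rightarrow \text{int} * T + \text{int} \\
  &\Rightarrow \text{int} * \text{int} + \text{int}
\end{aligned}
$$

Reading these lines from bottom to top gives the sequence of reductions a bottom-up parser performs: it starts from the token string and ends at $E$.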

### Shift-Reduce Parsing

Consequence of tracing a rightmost derivation: if the current string is $\alpha \beta \omega$ and the next rule to apply in reverse is $X \to \beta$, then $\omega$ must consist only of terminals. (Otherwise $X$ would not have been the rightmost non-terminal in $\alpha X \omega$.)

So $\omega$ can simply be the part of the input stream of tokens that has not been read yet!

Shift: move the next token from the token stream onto the right end of our working string.

Reduce: apply a production rule in reverse at the right end of the working string, replacing the rule's right-hand side with its left-hand side.

How do we know when and where to shift and reduce?

Shift-reduce Conflict: when the parser could either shift or reduce at the next step. (This is fairly common, almost expected.)

Reduce-reduce Conflict: when the parser could reduce by more than one production rule at the next step, which usually indicates a problem with the grammar.

Shift pushes a terminal onto the stack. Reduce pops 0 or more symbols (the right-hand side of a rule) off the stack and pushes the rule's left-hand side non-terminal onto the stack.
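The stack mechanics can be illustrated with a small sketch that replays a hand-written sequence of shift and reduce actions. It deliberately does not decide when to shift or reduce (that question is still open at this point in the notes); the function name `replay`, the action encoding, and the sample input `int * int + int` are assumptions made for this example.

```python
def replay(tokens, actions):
    """Apply hand-written shift/reduce actions; return the stack after each step."""
    stack, pos, trace = [], 0, []
    for action in actions:
        if action == "shift":
            stack.append(tokens[pos])  # push the next input token
            pos += 1
        else:
            lhs, rhs = action          # reduce by the rule lhs -> rhs
            cut = len(stack) - len(rhs)
            if cut < 0 or stack[cut:] != list(rhs):
                raise ValueError(f"cannot reduce by {lhs} -> {rhs}")
            del stack[cut:]            # pop the right-hand side ...
            stack.append(lhs)          # ... and push the left-hand side
        trace.append(list(stack))
    return trace


# Grammar: E -> T + E | T ; T -> int | int * T | ( E )
TOKENS = ["int", "*", "int", "+", "int"]
ACTIONS = [
    "shift", "shift", "shift",     # int * int
    ("T", ["int"]),                # int * T
    ("T", ["int", "*", "T"]),      # T
    "shift", "shift",              # T + int
    ("T", ["int"]),                # T + T
    ("E", ["T"]),                  # T + E
    ("E", ["T", "+", "E"]),        # E
]

TRACE = replay(TOKENS, ACTIONS)
for step in TRACE:
    print(" ".join(step))
```

Each reduce only ever touches the top of the stack, and the sequence of reductions is the rightmost derivation of the input read backwards.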
