Lecture 008

Handle

How do we decide when to shift and when to reduce?

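Consider the input int * int + int with the production rules

    E -> T + E | T
    T -> int * T | int | (E)

If we reduce greedily, then at int | * int + int we reduce int to T and reach T | * int + int. We are now stuck: no production has a right-hand side in which T is followed by *, so no sequence of shifts and reductions can get us back to E.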

Handle: a handle is a sequence of symbols on top of the stack such that reducing it is never a mistake (i.e. the parse can still get to E afterwards).

There is no known efficient algorithm for recognizing handles in general.
For some context-free grammars, there are heuristics that always guess the handle correctly.

Note that handles only appear at the top of the stack, never inside it.

Grammar

Grammars, from the most general class to the most restricted:

All context-free grammars (CFGs)
Unambiguous CFGs
LR(k) CFGs: deterministic
LALR: a simplification of LR(k)
Simple LR (SLR) grammars: the class we care about

Prefix

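Viable prefix: $\alpha$ is a viable prefix if there is some $\omega$ such that $\alpha | \omega$ is a valid configuration of the shift-reduce parser. A viable prefix never extends past the right end of the handle.

For any grammar, the set of viable prefixes is a regular language. We can show this by constructing a DFA that recognizes the viable prefixes.

Item (LR(0) item): an item is a production with a dot placed somewhere in its right-hand side. A right-hand side with $n$ symbols has $n+1$ dot positions, so the production has $n+1$ items.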

Example: the items for $T \to (E)$ are:

    T -> .(E)
    T -> (.E)
    T -> (E.)
    T -> (E).

Example: the only item for $T \to \epsilon$ is $T \to .$

The items of a production describe the possible states of a parser that may eventually use that production.

Example: the item $T \to (E.)$ says that the parser has seen (E so far (the stack ends in (E|) and could use the production rule $T \to (E)$, so we hope to see $)$ next.
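
A minimal sketch of enumerating the items of a production. The tuple representation `(lhs, rhs, dot)` and the helper names `lr0_items` and `show` are assumptions made for illustration, not notation from the lecture.

```python
def lr0_items(lhs, rhs):
    """Return all LR(0) items of the production lhs -> rhs.

    An item is represented as (lhs, rhs, dot), where dot counts how many
    right-hand-side symbols are to the left of the dot.
    """
    rhs = tuple(rhs)
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Format an item with a visible dot, e.g. T -> ( . E )."""
    lhs, rhs, dot = item
    symbols = list(rhs[:dot]) + ["."] + list(rhs[dot:])
    return f"{lhs} -> {' '.join(symbols)}"


# The four items of T -> (E), then the single item of T -> epsilon.
for item in lr0_items("T", ["(", "E", ")"]) + lr0_items("T", []):
    print(show(item))
```

A right-hand side of length $n$ yields `n + 1` items because `dot` ranges over `0..n`, which is why the empty production still has one item.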

Structure of the stack: the stack is a sequence of prefixes of right-hand sides of production rules:

$\text{Prefix}_1 \text{Prefix}_2 ... \text{Prefix}_n | ...$

Observe:

$\text{Prefix}_i$ is a prefix of the right-hand side $\alpha_i$ of some production $X_i \to \alpha_i$
$\text{Prefix}_i$ will eventually reduce to $X_i$ (if there is no error)
The $X_i$ obtained from $\text{Prefix}_i$ will then combine with $\text{Prefix}_{i-1}$ to form a longer prefix of $\alpha_{i - 1}$ (i.e. there is a production $X_{i - 1} \to \text{Prefix}_{i - 1} X_i \beta$ for some $\beta$)
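By induction, everything above $\text{Prefix}_i$ on the stack ($\text{Prefix}_{i+1} \dots \text{Prefix}_n$) will eventually reduce to part of the missing suffix of $\alpha_i$.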

Recognizing Handles

To recognize handles, we use an NFA that tries all possible routes when we feed it the current stack. The NFA accepts if and only if the current stack is a viable prefix, and the items it ends in tell us whether a handle is on top of the stack. (similar to DPLab in 15210)

Here are the steps:

Add a dummy production $S' \to S$ to the set of all productions $G$.
The NFA states are the LR(0) items of $G$ (including the items of the added production).
For each item $E \to \alpha . X \beta$ (where $X$ is any terminal or non-terminal symbol), add the transition $(E \to \alpha . X \beta) \to^X (E \to \alpha X . \beta)$. This represents seeing $X$ on the stack and going on to check whether the rest of the right-hand side can be satisfied.
For each item $E \to \alpha . X \beta$ (where $X$ is a non-terminal symbol) and each production $X \to \gamma$, add the $\epsilon$-transition $(E \to \alpha . X \beta) \to^\epsilon (X \to .\gamma)$. This represents going on to check whether $X$ itself can be satisfied.
Every state is an accepting state (the NFA can still reject by getting stuck on a symbol with no transition).
The start state is $S' \to .S$.
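
The construction above can be sketched by simulating the NFA directly: we track the set of items the NFA could currently be in, which amounts to doing the subset construction on the fly. The grammar is the E/T grammar from earlier in these notes; the names `GRAMMAR`, `epsilon_closure`, `step` and `run` are made up for this sketch.

```python
# Grammar from the notes, plus the dummy production S' -> E.
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}
START = ("S'", ("E",), 0)  # the item S' -> .E


def epsilon_closure(items):
    """Follow epsilon-moves: from X -> a . Y b go to every Y -> . g."""
    closure, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:  # dot is before a non-terminal
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in closure:
                    closure.add(new)
                    todo.append(new)
    return frozenset(closure)


def step(items, symbol):
    """Take the transition labeled `symbol` from every item that allows it."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return epsilon_closure(moved)


def run(stack):
    """Feed the stack to the NFA; an empty result means the NFA got stuck."""
    state = epsilon_closure({START})
    for symbol in stack:
        state = step(state, symbol)
    return state


print(bool(run(["int", "*"])))  # a viable prefix: some items remain
print(bool(run(["T", "*"])))    # the stuck stack from the first example
```

Since every state is accepting, "accept" here just means the returned set is non-empty.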

Valid item: the item $X \to \beta . \gamma$ is valid for a viable prefix $\alpha \beta$ if there is a rightmost derivation

$S' \to^* \alpha X \omega \to \alpha \beta \gamma \omega$

In other words, after seeing $\alpha \beta$ on the stack, the production $X \to \beta \gamma$ may still be used, provided the upcoming tokens from the stream can produce $\gamma$.

Note that after reading a viable prefix, the DFA ends in the state made up of exactly the items that are valid for that prefix.

Example: item $T \to (.E)$ is valid for (, ((, (((, ...

Example of DFA converted from NFA

SLR Parsing

LR(0) parsing: at configuration $\alpha | t$, where $t$ is the next input token

reduce by $X \to \beta$ : if the DFA, run on $\alpha$, ends in a state containing the item $X \to \beta.$
shift: if the DFA ends in a state containing an item $X \to \beta.t\omega$

However, there might be a reduce/reduce conflict or shift/reduce conflict.

For example, if a state contains both

    E -> T.      // this tells us to reduce T to E
    E -> T. + E  // this tells us to shift, hoping to see + E next

then there is a shift/reduce conflict.

We fix the issue by adding one more condition to the reduce rule.

SLR parsing: at configuration $\alpha | t$, where $t$ is the next input token

reduce by $X \to \beta$ : if the DFA ends in a state containing the item $X \to \beta.$ and $t \in \text{Follow}(X)$
shift: if the DFA ends in a state containing an item $X \to \beta.t\omega$

If there are still conflicts, then the grammar is not SLR.
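
A small sketch of how the Follow condition removes the conflict in the state `{E -> T., E -> T. + E}`. The Follow sets below were worked out by hand for the E/T grammar in these notes (an assumption of this sketch, not something computed by the code), and `actions` is a hypothetical helper name.

```python
# Follow sets computed by hand (assumption) for:
#   E -> T + E | T        T -> int * T | int | (E)
FOLLOW = {"E": {"$", ")"}, "T": {"$", ")", "+"}}


def actions(state, lookahead, use_follow=True):
    """Return the set of moves allowed in `state` on `lookahead`.

    Items are (lhs, rhs, dot). With use_follow=False this is the LR(0) rule;
    with use_follow=True it is the SLR rule.
    """
    moves = set()
    for lhs, rhs, dot in state:
        if dot < len(rhs) and rhs[dot] == lookahead:
            moves.add("shift")
        elif dot == len(rhs) and (not use_follow or lookahead in FOLLOW[lhs]):
            moves.add(f"reduce {lhs} -> {' '.join(rhs)}")
    return moves


# The conflicting state from the example: E -> T. and E -> T. + E
STATE = {("E", ("T",), 1), ("E", ("T", "+", "E"), 1)}

print(actions(STATE, "+", use_follow=False))  # LR(0): two moves, a conflict
print(actions(STATE, "+"))                    # SLR: only one move remains
print(actions(STATE, "$"))                    # SLR: only one move remains
```

More than one move in the returned set means a conflict; under the SLR rule the reduction is dropped on `+` because `+` is not in Follow(E).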

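You can see a shift/reduce conflict behind precedence. Take E * E | + E as an example, in a grammar with productions E -> E * E and E -> E + E:

we can reduce, using the item E -> E * E.
we can shift, using the item E -> E. + E

Declaring that * has higher precedence than + resolves this conflict in favor of the reduction. So a precedence declaration is really conflict resolution, but for more complicated grammars its effect can be less obvious.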

SLR Parsing Algorithm

Let $M$ be the DFA for the viable prefixes of $G$.
Let $|x_1...x_n\$$ be the initial configuration.
Repeat until the configuration is $S | \$$:
1. Let $\alpha | \omega$ be the current configuration.
2. Run $M$ on the current stack $\alpha$.
3. (We don't need to check whether $M$ rejects here, since the next step covers it.)
4. $M$ ends in a state with items $I$; let $a$ be the next input.
    1. Shift if $X \to \beta . a \gamma \in I$.
    2. Reduce if $X \to \beta . \in I \land a \in \text{Follow}(X)$.
    3. Report a parsing error if neither applies (this includes the situation where $\alpha$ is not a viable prefix).
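
The loop above can be sketched as follows for the E/T grammar from these notes. This is a minimal version under stated assumptions: the Follow sets are written out by hand rather than computed, the DFA is simulated by tracking item sets, and the names `closure`, `goto`, `run_dfa` and `slr_parse` are invented for the sketch. It reruns the DFA on the whole stack at every step, exactly as the steps describe.

```python
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}
FOLLOW = {"E": {"$", ")"}, "T": {"$", ")", "+"}}  # computed by hand
START = ("S'", ("E",), 0)


def closure(items):
    """Add X -> . g for every non-terminal X that appears right after a dot."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def goto(state, symbol):
    """DFA transition: move the dot over `symbol` wherever possible."""
    return closure({(lhs, rhs, dot + 1) for lhs, rhs, dot in state
                    if dot < len(rhs) and rhs[dot] == symbol})


def run_dfa(stack):
    """Run the DFA on the stack; the empty set means 'not a viable prefix'."""
    state = closure({START})
    for symbol in stack:
        state = goto(state, symbol)
    return state


def slr_parse(tokens):
    """Return the reductions used to parse tokens, or raise SyntaxError."""
    stack, rest, reductions = [], list(tokens) + ["$"], []
    while not (stack == ["E"] and rest == ["$"]):
        state, a = run_dfa(stack), rest[0]
        shift = any(dot < len(rhs) and rhs[dot] == a for _, rhs, dot in state)
        reduces = [(lhs, rhs) for lhs, rhs, dot in state
                   if dot == len(rhs) and lhs != "S'" and a in FOLLOW[lhs]]
        if shift + len(reduces) > 1:
            raise ValueError("conflict: the grammar is not SLR")
        if shift:
            stack.append(rest.pop(0))
        elif reduces:
            lhs, rhs = reduces[0]
            del stack[len(stack) - len(rhs):]  # pop the handle
            stack.append(lhs)                  # push the non-terminal
            reductions.append(f"{lhs} -> {' '.join(rhs)}")
        else:
            raise SyntaxError(f"parse error at {a!r}")
    return reductions


print(slr_parse("int * int + int".split()))
```

The returned list is the sequence of reductions in the order they are applied, i.e. a rightmost derivation in reverse.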

To make the parser more efficient, we can store on the stack not just each symbol but also the DFA state reached after reading it. Then we don't need to rerun the DFA from the bottom of the stack every time there is a small change at the top of the stack.
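
A sketch of this optimization, again for the E/T grammar from these notes and with hand-written Follow sets. Each stack entry is a `(symbol, state)` pair, so a shift or a reduction costs a single DFA transition from the cached state. The names `slr_accepts`, `closure` and `goto` are assumptions of this sketch, and it assumes the grammar is SLR (it does not check for conflicts).

```python
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}
FOLLOW = {"E": {"$", ")"}, "T": {"$", ")", "+"}}  # computed by hand
START = ("S'", ("E",), 0)
ACCEPT = ("S'", ("E",), 1)  # the item S' -> E.


def closure(items):
    """Add X -> . g for every non-terminal X that appears right after a dot."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def goto(state, symbol):
    """DFA transition: move the dot over `symbol` wherever possible."""
    return closure({(l, r, d + 1) for l, r, d in state
                    if d < len(r) and r[d] == symbol})


def slr_accepts(tokens):
    """Return True if tokens parse; each stack entry caches its DFA state."""
    stack = [(None, closure({START}))]  # bottom marker holding the start state
    rest = list(tokens) + ["$"]
    while True:
        state, a = stack[-1][1], rest[0]
        if a == "$" and ACCEPT in state:
            return True
        if any(d < len(r) and r[d] == a for _, r, d in state):
            stack.append((rest.pop(0), goto(state, a)))  # one DFA step
            continue
        reduces = [(l, r) for l, r, d in state
                   if d == len(r) and l != "S'" and a in FOLLOW[l]]
        if not reduces:
            return False                                 # parsing error
        lhs, rhs = reduces[0]
        del stack[len(stack) - len(rhs):]                # pop the handle
        stack.append((lhs, goto(stack[-1][1], lhs)))     # resume from cached state


print(slr_accepts("int * ( int + int )".split()))
```

After a reduction, the state under the popped handle is already on the stack, so the parser continues from there instead of rescanning the whole stack.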

LR(1) is more powerful than SLR because the lookahead is built into the item. E.g. $[T \to . \text{int} * T,\ \$]$ is an item with lookahead $\$$ (the lookahead can be any terminal). This allows more fine-grained lookahead than using the entire Follow set.
