Lecture 008

Handle

How do we decide when to shift and when to reduce?

Consider int * int + int, with production rules

E -> T + E | T
T -> int * T | int | (E)

If we reduce greedily, then from int | * int + int we reduce int to T and reach T | * int + int. Now we are stuck: no right-hand side contains T followed by *, so this configuration can never be reduced to E.

Handle: a handle is a sequence of symbols on the stack such that reducing it is never a mistake (ie. the parse can still get to E afterwards).

Note that handles only appear at the top of the stack, never inside.

Grammar

CF Grammars

Grammars:

Prefix

Viable Prefix: \alpha is a viable prefix if, for some \omega, \alpha | \omega is a valid state of the shift-reduce parser. A viable prefix never extends past the right end of the handle.

For any grammar, the set of viable prefixes is a regular language. We can show this by constructing a DFA that recognizes viable prefixes.

Item (LR(0) item): an item is a production with a dot placed somewhere in its right-hand side. A right-hand side with n symbols has n+1 dot positions, so it gives n+1 items.

Example: the items for T \to (E) are:

T -> .(E)
T -> (.E)
T -> (E.)
T -> (E).

Example: the only item for T \to \epsilon is T \to .
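The two examples above can be reproduced with a few lines of Python. This is only a sketch: the tuple representation `(lhs, rhs, dot)` and the helper names `lr0_items` and `show` are assumptions made for illustration, not something from the lecture.

```python
def lr0_items(lhs, rhs):
    """Return every LR(0) item of the production lhs -> rhs as (lhs, rhs, dot)."""
    rhs = tuple(rhs)
    # The dot can sit before each symbol or after the last one: n + 1 positions.
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item in the notation used in these notes, e.g. T -> (.E)."""
    lhs, rhs, dot = item
    return f"{lhs} -> {''.join(rhs[:dot])}.{''.join(rhs[dot:])}"


for item in lr0_items("T", ["(", "E", ")"]):
    print(show(item))               # the four items of T -> (E)
print(show(lr0_items("T", [])[0]))  # T -> .
```

An epsilon production has an empty right-hand side, so n = 0 and there is exactly one item.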

The items of a production describe the possible states of a parser that can eventually use that production.

Example: the item T \to (E.) says that the top of the stack is currently (E, that we could use the production rule T \to (E), and so we hope to see ) next.

Structure of the stack: the stack contains many prefixes of the right hand side of production rules:

\text{Prefix}_1 \text{Prefix}_2 ... \text{Prefix}_n | ...

Observe:

Example of Items

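Recognizing Handles

To recognize handles, we use an NFA to try all possible routes by feeding the NFA our current stack. The NFA accepts if and only if the current stack is a viable prefix, and an item with the dot at the far right marks a possible handle on top of the stack. (similar to DPLab in 15210)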

Here are the steps

  1. We add a dummy production S' \to S to the set of all productions G
  2. The NFA states are the LR(0) items of G (including the items of the added production)
  3. For an item E \to \alpha . X \beta (where X is any terminal or non-terminal symbol), we add the transition (E \to \alpha . X \beta) \to^X (E \to \alpha X . \beta) (this represents that, after seeing X, we go on to check whether the rest of the production can be satisfied)
  4. For an item E \to \alpha . X \beta (where X is any non-terminal symbol) and a production X \to \gamma, we add the \epsilon-transition (E \to \alpha . X \beta) \to^\epsilon (X \to .\gamma) (this represents that we go on to check whether X can be satisfied)
  5. Every state is an accepting state (the NFA can still reject by reading a symbol with no transition)
  6. The start state is S' \to .S
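The construction above can be sketched in Python for the grammar E -> T + E | T, T -> int * T | int | (E). The representation is an assumption for illustration: items are `(lhs, rhs, dot)` tuples, the dummy production is written S' -> E, and `moves`, `eps_closure` and `run_nfa` are hypothetical helper names. Since every state is accepting, the NFA accepts exactly when at least one path survives.

```python
GRAMMAR = {
    "S'": [("E",)],  # step 1: dummy production
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}
START = ("S'", ("E",), 0)  # step 6: the item S' -> .E


def moves(item):
    """Yield (label, next_item) for one NFA state; label None is an epsilon move."""
    lhs, rhs, dot = item
    if dot == len(rhs):
        return  # dot at the end: no outgoing transitions
    x = rhs[dot]
    yield x, (lhs, rhs, dot + 1)      # step 3: move the dot over X
    for gamma in GRAMMAR.get(x, []):  # step 4: only fires when X is a non-terminal
        yield None, (x, gamma, 0)


def eps_closure(items):
    """All items reachable from `items` using only epsilon moves."""
    seen, todo = set(items), list(items)
    while todo:
        for label, nxt in moves(todo.pop()):
            if label is None and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def run_nfa(stack):
    """Return the set of NFA states alive after reading the stack bottom to top."""
    current = eps_closure({START})
    for symbol in stack:
        stepped = {nxt for item in current
                   for label, nxt in moves(item) if label == symbol}
        current = eps_closure(stepped)
    return current  # empty set: every path died (step 5), so the stack is rejected


print(bool(run_nfa(["int", "*"])))  # True
print(bool(run_nfa(["T", "*"])))    # False
```

The second call is the stuck configuration T | * int + int from the start of the lecture: no item has T followed by *, so every path dies.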

Example NFA to determine handles

Valid Item: item X \to \beta . \gamma is valid for a viable prefix \alpha \beta if we have

S' \to^* \alpha X \omega \to \alpha \beta \gamma \omega

This says that after seeing \alpha \beta on the stack, the production X \to \beta \gamma can still be used if the remaining tokens from the stream match. In that case X \to \beta . \gamma is a valid item.

Note that after reading a viable prefix \alpha, the DFA ends in the state whose items are exactly the items valid for \alpha.

Example: item T \to (.E) is valid for (, ((, (((, ...
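The example can be checked mechanically by building the DFA directly with the usual closure and goto operations, which is equivalent to applying the subset construction to the NFA of items. This is a sketch under assumptions: the grammar is the running one (E -> T + E | T, T -> int * T | int | (E)), items are `(lhs, rhs, dot)` tuples, and `closure`, `goto`, `build_dfa` and `valid_items` are illustrative names.

```python
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}


def closure(items):
    """Add X -> .gamma for every non-terminal X that appears right after a dot."""
    result, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot < len(rhs):
            for gamma in GRAMMAR.get(rhs[dot], []):
                new = (rhs[dot], gamma, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def goto(state, symbol):
    """DFA transition: move the dot over `symbol` in every item that allows it."""
    moved = {(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == symbol}
    return closure(moved)


def build_dfa():
    """Explore all reachable DFA states; return the start state and the edges."""
    start = closure({("S'", ("E",), 0)})
    states, edges, todo = {start}, {}, [start]
    while todo:
        state = todo.pop()
        for symbol in {r[d] for _, r, d in state if d < len(r)}:
            target = goto(state, symbol)
            edges[state, symbol] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    return start, edges


def valid_items(prefix):
    """Items valid for `prefix`; the empty set if it is not a viable prefix."""
    state, edges = build_dfa()
    for symbol in prefix:
        state = edges.get((state, symbol))
        if state is None:
            return frozenset()
    return state


item = ("T", ("(", "E", ")"), 1)  # T -> (.E)
for n in (1, 2, 3):
    print(item in valid_items(["("] * n))  # True each time
```

Reading another ( from the state reached after ( leads back to the same state, which is why the item stays valid for any number of open parentheses.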

Example of DFA converted from NFA

SLR Parsing

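LR(0) parsing: at state \alpha | t, let s be the DFA state reached on input \alpha. Reduce by X \to \beta if s contains the item X \to \beta . ; shift if s contains an item X \to \beta . t \omega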

However, there might be a reduce/reduce conflict or shift/reduce conflict.

For example, if a state contains both

E -> T.      // this tells us to reduce T to E
E -> T. + E  // this tells us to shift if the next token is +

then there is a shift/reduce conflict.

We fix the issue by adding one more rule.

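SLR parsing: at state \alpha | t, let s be the DFA state reached on input \alpha. Reduce by X \to \beta if s contains the item X \to \beta . and t \in \text{Follow}(X); shift if s contains an item X \to \beta . t \omega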

If there are still conflicts, then the grammar is not SLR.
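To see the extra rule at work, the sketch below lists the moves allowed in the conflicting state {E -> T. , E -> T. + E}, first with the LR(0) rules and then with the SLR rule that a reduction by X is only allowed when the next token is in Follow(X). The function name `actions`, the tuple encoding of items, and the hand-computed Follow(E) = { ), $ } for the grammar E -> T + E | T, T -> int * T | int | (E) are assumptions of this example.

```python
TERMINALS = {"int", "+", "*", "(", ")"}
# Follow(E), worked out by hand for the running grammar (an assumption here).
FOLLOW = {"E": {")", "$"}}


def actions(state, token, follow=None):
    """List the moves allowed in `state` when the next token is `token`.

    follow=None applies the LR(0) rules; passing a Follow table applies SLR.
    """
    found = []
    for lhs, rhs, dot in state:
        if dot < len(rhs) and rhs[dot] == token and token in TERMINALS:
            found.append("shift")
        elif dot == len(rhs) and (follow is None or token in follow[lhs]):
            found.append(f"reduce {lhs} -> {' '.join(rhs)}")
    return sorted(set(found))


# The state containing E -> T. and E -> T. + E
state = {("E", ("T",), 1), ("E", ("T", "+", "E"), 1)}

print(actions(state, "+"))          # ['reduce E -> T', 'shift']  (LR(0) conflict)
print(actions(state, "+", FOLLOW))  # ['shift']
print(actions(state, "$", FOLLOW))  # ['reduce E -> T']
```

A state is conflict-free under SLR when `actions` never returns more than one move for any token.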

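A shift/reduce conflict also shows up with operator precedence. Take E * E + E as an example: at E * E | + E the parser can either reduce E * E or shift +, and which one is right depends on the precedence of * and +.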

SLR Parsing

  1. Let M be the DFA for viable prefixes
  2. Let |x_1...x_n\$ be the initial configuration
  3. Repeat until the configuration is S | \$
    1. Let \alpha | \omega be the current configuration
    2. Run M on the current stack \alpha
    3. (We don't need to check separately whether M rejects, since the error case below covers it)
    4. M ends at a state with items I; let a be the next input
      1. Shift if X \to \beta . a \gamma \in I
      2. Reduce if X \to \beta . \in I \land a \in \text{Follow}(X)
      3. Report a parsing error if neither applies (this includes the situation where \alpha is not a viable prefix)
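The loop above can be sketched in Python for the grammar E -> T + E | T, T -> int * T | int | (E). Several things here are assumptions rather than part of the lecture: the Follow sets are written out by hand, the DFA is run on the fly with `closure` and `goto` instead of being precomputed, and the names `run_dfa` and `slr_parse` are illustrative. As in the steps above, M is rerun on the whole stack at every iteration.

```python
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}
# Follow sets worked out by hand for this grammar (an assumption of the sketch).
FOLLOW = {"S'": {"$"}, "E": {")", "$"}, "T": {"+", ")", "$"}}


def closure(items):
    """Add X -> .gamma for every non-terminal X that appears right after a dot."""
    result, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot < len(rhs):
            for gamma in GRAMMAR.get(rhs[dot], []):
                new = (rhs[dot], gamma, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def goto(state, symbol):
    """DFA transition: move the dot over `symbol` in every item that allows it."""
    return closure({(l, r, d + 1) for l, r, d in state
                    if d < len(r) and r[d] == symbol})


def run_dfa(stack):
    """Run M on the stack; an empty result means it is not a viable prefix."""
    state = closure({("S'", ("E",), 0)})
    for symbol in stack:
        state = goto(state, symbol)
    return state


def slr_parse(tokens):
    """Return the reductions used, in order, or raise SyntaxError."""
    stack, rest, trace = [], list(tokens) + ["$"], []
    while not (stack == ["E"] and rest == ["$"]):  # configuration S | $
        items, a = run_dfa(stack), rest[0]
        reduces = [(l, r) for l, r, d in items
                   if d == len(r) and a in FOLLOW[l]]
        if any(d < len(r) and r[d] == a for _, r, d in items):
            stack.append(rest.pop(0))              # shift
        elif reduces:
            lhs, rhs = reduces[0]
            del stack[len(stack) - len(rhs):]      # pop the handle
            stack.append(lhs)                      # push the left-hand side
            trace.append(f"{lhs} -> {' '.join(rhs)}")
        else:
            raise SyntaxError(f"unexpected {a!r} with stack {stack}")
    return trace


for step in slr_parse(["int", "*", "int", "+", "int"]):
    print(step)
```

The sketch takes the shift whenever both a shift and a reduce are possible. For this grammar the two never coincide, but a real implementation would report such a case as a conflict.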

SLR Parsing Example

To make running the DFA cheaper, we can store not just a symbol in each stack slot but a pair of the symbol and the DFA state reached after reading it. Then we don't have to rerun the DFA from the bottom of the stack every time the top of the stack changes a little.
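A minimal sketch of this idea, again for the grammar E -> T + E | T, T -> int * T | int | (E): each stack entry is a `(symbol, state)` pair, so a shift or a reduce needs only one DFA step from the state already stored on top. The helper names `push` and `reduce_by`, and the `None` placeholder symbol at the bottom of the stack, are assumptions of this example.

```python
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}


def closure(items):
    """Add X -> .gamma for every non-terminal X that appears right after a dot."""
    result, todo = set(items), list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot < len(rhs):
            for gamma in GRAMMAR.get(rhs[dot], []):
                new = (rhs[dot], gamma, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def goto(state, symbol):
    """DFA transition: move the dot over `symbol` in every item that allows it."""
    return closure({(l, r, d + 1) for l, r, d in state
                    if d < len(r) and r[d] == symbol})


def push(stack, symbol):
    """Push (symbol, state); the state is one DFA step from the state on top."""
    stack.append((symbol, goto(stack[-1][1], symbol)))


def reduce_by(stack, lhs, rhs):
    """Pop the handle, then push lhs from the state that is exposed underneath."""
    del stack[len(stack) - len(rhs):]
    push(stack, lhs)


START = closure({("S'", ("E",), 0)})
stack = [(None, START)]  # bottom entry holds the DFA start state
for token in ["int", "*", "int"]:
    push(stack, token)
reduce_by(stack, "T", ("int",))
reduce_by(stack, "T", ("int", "*", "T"))

print([symbol for symbol, _ in stack[1:]])  # ['T']
print(stack[-1][1] == goto(START, "T"))     # True: same state as rerunning the DFA
```

The last line checks that the cached state on top of the stack matches what a fresh run of the DFA over the whole stack would produce.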

SLR Action Table

SLR Parsing Algorithm

LR(1) is more powerful than SLR since the lookahead is built into the item. E.g. T -> . int * T, $ is an item whose lookahead is $ (the lookahead can be any terminal). This allows more fine-grained lookahead than using the entire Follow set.
