Consider int * int + int, with production rules
E -> T + E | T
T -> int * T | int | (E)
If we greedily reduce int to T at the first opportunity, we reach the configuration T | * int + int
and get stuck, since no production has a right-hand side that begins with T *.
Handle: a handle is a sequence of symbols on top of the stack such that reducing it is never a mistake (i.e. the parse can still reach the start symbol E).
There is no known efficient algorithm for recognizing handles in arbitrary grammars.
For some context-free grammars, there are heuristics that always guess the handles correctly.
Note that handles only appear at the top of the stack, never inside.
Grammar
CF Grammars
Grammars:
All context-free grammars (CFGs)
Unambiguous CFGs
LR(k) CFGs: deterministic
LALR(k) CFGs: a simplification of LR(k)
Simple LR (SLR) CFGs: the class we care about
Prefix
Viable Prefix: \alpha is a viable prefix if there is some \omega such that \alpha | \omega is a reachable configuration of the shift-reduce parser. A viable prefix never extends past the right end of the handle.
For any grammar, the set of viable prefixes is a regular language. We can show this by constructing a DFA that recognizes the viable prefixes.
Item (LR(0) item): an item is a production with a dot at some position in its right-hand side. A production whose right-hand side has n symbols has n + 1 items.
Example: the items for T \to (E) are:
T -> .(E)
T -> (.E)
T -> (E.)
T -> (E).
Example: the only item for T \to \epsilon is T \to .
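The item construction above can be sketched in a few lines of Python. This is only an illustration: the function names lr0_items and show, and the (lhs, rhs, dot) tuple encoding of an item, are assumptions made for this sketch and are not part of the notes.

```python
def lr0_items(lhs, rhs):
    """Return all LR(0) items of the production lhs -> rhs.

    rhs is a tuple of symbols; the empty tuple stands for epsilon.
    Each item is encoded as (lhs, rhs, dot), where dot is the dot position.
    """
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item with the dot written as '.'."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot])}.{' '.join(rhs[dot:])}"


# The four items of T -> (E), then the single item of T -> epsilon.
for item in lr0_items("T", ("(", "E", ")")):
    print(show(item))
print(show(lr0_items("T", ())[0]))
```

A right-hand side with n symbols yields n + 1 items because range(len(rhs) + 1) enumerates every gap, including the two ends.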
The items of a production describe the possible states of a parser that may eventually use that production.
Example: the item T \to (E.) says that the stack currently ends in (E and that we may be using the production T \to (E), so we hope to see ) next.
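Structure of the stack: the stack is a sequence of prefixes of right-hand sides of productions, \text{Prefix}_1 \text{Prefix}_2 ... \text{Prefix}_n, where \text{Prefix}_i is a prefix of \alpha_i for some production X_i \to \alpha_i:
\text{Prefix}_i will eventually reduce to X_i (if there is no error).
The X_i reduced from \text{Prefix}_i then extends \text{Prefix}_{i-1} to a longer prefix of \alpha_{i-1} (i.e. there is a production X_{i-1} \to \text{Prefix}_{i-1} X_i \beta for some \beta).
By induction, \text{Prefix}_{k+1} ... \text{Prefix}_n eventually reduces to the missing part of \alpha_k.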
Example of Items
Recognizing Handles
To recognize handles, we feed the current stack to an NFA, which tries all possible routes through the items. The NFA accepts if and only if the current stack is a viable prefix, and the items it ends in tell us whether a handle is on top of the stack. (similar to DPLab in 15210)
Here are the steps:
We add a dummy production S' \to S to the set of productions G, where S is the start symbol.
The NFA states are the LR(0) items of G (including the items of the added production).
For each item E \to \alpha . X \beta (where X is any terminal or non-terminal symbol), we add the transition (E \to \alpha . X \beta) \to^X (E \to \alpha X . \beta). (This represents moving past X and then checking whether the rest of the production can be matched.)
For each item E \to \alpha . X \beta (where X is a non-terminal symbol) and each production X \to \gamma, we add the \epsilon-transition (E \to \alpha . X \beta) \to^\epsilon (X \to . \gamma). (This represents going on to check whether X itself can be matched.)
Every state is an accepting state (the NFA can still reject by getting stuck on a symbol with no transition).
The start state is S' \to . S
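The construction above can be sketched by simulating the NFA directly on the stack, tracking the set of items we could currently be in. This is a minimal sketch, not a definitive implementation: it uses the example grammar from the top of these notes with E as the start symbol (so the dummy production is S' -> E), and the names GRAMMAR, epsilon_closure, step and is_viable_prefix are made up for illustration.

```python
# Example grammar, plus the dummy production S' -> E (assumed start symbol: E).
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}
START = ("S'", ("E",), 0)  # the item S' -> .E, encoded as (lhs, rhs, dot)


def epsilon_closure(items):
    """Follow epsilon-transitions: from X -> a.Yb go to Y -> .g for every Y -> g."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:  # dot is before a non-terminal
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in result:
                    result.add(new)
                    work.append(new)
    return result


def step(items, symbol):
    """Take the transition on symbol: move the dot past symbol where possible."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return epsilon_closure(moved)


def is_viable_prefix(stack):
    """Every NFA state accepts, so we reject only when no transition exists."""
    current = epsilon_closure({START})
    for symbol in stack:
        current = step(current, symbol)
        if not current:
            return False
    return True


print(is_viable_prefix(["int", "*"]))  # stack after shifting int and *
print(is_viable_prefix(["T", "*"]))    # stack after the greedy reduction of int
```

Tracking a set of items at each step is the same idea as the subset construction, so this simulation also shows what each DFA state would contain.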
Example: NFA for recognizing handles
Valid Item: the item X \to \beta . \gamma is valid for a viable prefix \alpha \beta if there is a rightmost derivation
S' \to^* \alpha X \omega \to \alpha \beta \gamma \omega
In other words, after seeing \alpha \beta on the stack, the production X \to \beta \gamma may still be used once we read more tokens from the stream, so X \to \beta . \gamma is a valid item.
Note that, run on a viable prefix, the DFA ends in a state containing exactly the items valid for that prefix.
Example: item T \to (.E) is valid for (, ((, (((, ...
Example of DFA converted from NFA
SLR Parsing
LR(0) parsing: at configuration \alpha | t (t is the next input token)
reduce by X \to \beta: if the DFA, run on \alpha, ends in a state containing the item X \to \beta . (dot at the end)
shift: if the DFA ends in a state containing an item X \to \beta . t \omega
However, there might be a reduce/reduce conflict or a shift/reduce conflict.
For example, if a state contains both
E -> T. // this tells us to reduce T to E
E -> T. + E // this tells us to shift if the next token is +
then there is a shift/reduce conflict whenever the next token is +.
We fix the issue by adding one more rule.
SLR parsing: at configuration \alpha | t
reduce by X \to \beta: if the DFA, run on \alpha, ends in a state containing the item X \to \beta . (dot at the end) and t \in \text{Follow}(X)
shift: if the DFA ends in a state containing an item X \to \beta . t \omega
If there are still conflicts under these rules, then the grammar is not SLR.
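The reduce rule depends on \text{Follow}(X), the set of terminals that can appear immediately after X in some sentential form. A rough sketch of computing it for the example grammar follows. It assumes the grammar has no \epsilon productions (true for this grammar, not in general), and the names GRAMMAR, first_sets and follow_sets are made up for illustration.

```python
# Example grammar with the dummy production S' -> E (assumed start symbol: E).
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}


def first_sets(grammar):
    """First(X) for each non-terminal. ASSUMES there are no epsilon productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                head = body[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start="S'"):
    """Follow(X) for each non-terminal, with $ marking the end of input."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # Follow is only defined for non-terminals
                    if i + 1 < len(body):
                        nxt = body[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[nt]  # sym ends the body: inherit Follow(nt)
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR)
print(sorted(follow["E"]), sorted(follow["T"]))
```

For the conflicting state above, + is not in Follow(E), so with + as the next token the SLR rule rules out reducing by E -> T and only the shift remains.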
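A shift/reduce conflict also shows up with operator precedence. Take E * E + E, with an ambiguous grammar containing E -> E * E and E -> E + E, when the stack holds E * E and the next token is +:
we can reduce, using the item E -> E * E.
we can shift, using the item E -> E. + E
Choosing the reduce gives * higher precedence than +. Declaring precedence is really conflict resolution; for more complicated grammars, the resolution may not behave like precedence.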
SLR Parsing
Let M be the DFA for the viable prefixes of G
Let | x_1 ... x_n \$ be the initial configuration
Repeat until the configuration is S | \$
Let \alpha | \omega be the current configuration
Run M on the current stack \alpha
(we don't need to check here whether M rejects, since step 3 below covers that case)
M ends in a state with item set I; let a be the next input token
1. Shift if X \to \beta . a \gamma \in I
2. Reduce by X \to \beta if X \to \beta . \in I \land a \in \text{Follow}(X)
3. Report a parsing error if neither applies (this includes the case where \alpha is not a viable prefix)
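The loop above can be sketched as follows for the example grammar. This is a hedged illustration rather than a production parser: the DFA M is simulated on the fly by tracking item sets, the Follow sets are written out by hand for this grammar, E plays the role of the start symbol S, and the names GRAMMAR, FOLLOW, closure, run_dfa and slr_parse are all made up for this sketch.

```python
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int", "*", "T"), ("int",), ("(", "E", ")")],
}
# Follow sets for this grammar, worked out by hand (an assumption of the sketch).
FOLLOW = {"S'": {"$"}, "E": {"$", ")"}, "T": {"+", "$", ")"}}


def closure(items):
    """Add X -> .g for every non-terminal X that appears right after a dot."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in result:
                    result.add(new)
                    work.append(new)
    return result


def run_dfa(stack):
    """Run M on the whole stack; return the item set I (empty if M rejects)."""
    state = closure({("S'", ("E",), 0)})
    for symbol in stack:
        moved = {(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == symbol}
        state = closure(moved)
    return state


def slr_parse(tokens):
    """Return the reductions performed, in order, or raise SyntaxError."""
    stack, rest, reductions = [], list(tokens) + ["$"], []
    while not (stack == ["E"] and rest == ["$"]):  # until configuration is S | $
        items = run_dfa(stack)
        a = rest[0]
        shift = any(d < len(r) and r[d] == a for _, r, d in items)
        reduces = [(l, r) for l, r, d in items
                   if d == len(r) and l != "S'" and a in FOLLOW[l]]
        if shift and not reduces:
            stack.append(rest.pop(0))              # 1. shift
        elif len(reduces) == 1 and not shift:
            lhs, rhs = reduces[0]                  # 2. reduce by lhs -> rhs
            del stack[len(stack) - len(rhs):]
            stack.append(lhs)
            reductions.append(f"{lhs} -> {' '.join(rhs)}")
        else:                                      # 3. error (or a conflict)
            raise SyntaxError(f"parse error at {a!r} with stack {stack}")
    return reductions


print(slr_parse(["int", "*", "int", "+", "int"]))
```

Rerunning run_dfa on the whole stack at every step mirrors the algorithm as stated; it is simple but repeats a lot of work.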
SLR Parsing Example
To make the parser more efficient, we can store on the stack not just each symbol but a pair of the symbol and the DFA state reached after reading it. This way we don't need to rerun the DFA from the start after every small change to the top of the stack.
SLR Action Table
SLR Parsing Algorithm
LR(1) is more powerful than SLR since the lookahead is built into the items. E.g. T -> . int * T, $ is an LR(1) item whose lookahead is $ (the lookahead can be any terminal). This allows finer-grained lookahead than the entire Follow set.