How do we decide whether to shift or to reduce?
Consider the input
int * int + int
with production rules
E -> T + E | T
T -> int * T | int | (E)
If we reduce greedily, turning the first int into T, we reach the state T | * int + int and get stuck: no production has a right-hand side in which T is followed by the token *.
Handle: a handle is a sequence of symbols on top of the stack such that reducing it still allows further reductions all the way back to the start symbol E (i.e. reducing a handle never leads the parse astray).
There is no known efficient algorithm for recognizing handles in general.
For some classes of context-free grammars, however, there are heuristics that always identify handles correctly.
Note that handles only appear at the top of the stack, never inside.
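To make handles concrete, here is a minimal Python sketch that replays a hand-chosen sequence of shift/reduce actions on int * int + int. The function `replay` and production labels such as `"T->int"` are hypothetical helpers for illustration; the code only checks that each reduction matches the top of the stack, it does not find handles by itself.

```python
# Grammar from the notes: E -> T + E | T,  T -> int * T | int | (E)
# Production labels are hypothetical names used only in this sketch.
PRODUCTIONS = {
    "E->T+E": ("E", ["T", "+", "E"]),
    "E->T": ("E", ["T"]),
    "T->int*T": ("T", ["int", "*", "T"]),
    "T->int": ("T", ["int"]),
    "T->(E)": ("T", ["(", "E", ")"]),
}


def replay(tokens, actions):
    """Apply each action ("shift" or a production label) and record
    (stack, remaining input) after every step."""
    stack, rest = [], list(tokens)
    trace = [(list(stack), list(rest))]
    for act in actions:
        if act == "shift":
            stack.append(rest.pop(0))
        else:
            lhs, rhs = PRODUCTIONS[act]
            # A reduction may only replace the symbols on top of the stack.
            if len(stack) < len(rhs) or stack[-len(rhs):] != rhs:
                raise ValueError(f"{act} does not match the top of {stack}")
            del stack[-len(rhs):]
            stack.append(lhs)
        trace.append((list(stack), list(rest)))
    return trace


if __name__ == "__main__":
    steps = ["shift", "shift", "shift", "T->int", "T->int*T",
             "shift", "shift", "T->int", "E->T", "E->T+E"]
    for stack, rest in replay(["int", "*", "int", "+", "int"], steps):
        print(" ".join(stack), "|", " ".join(rest))
```

Each reduce in this sequence replaces a handle; the run ends with just E on the stack and no input left.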
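Grammars, from the largest class to the smallest (each class contains the next):
All Context Free Grammars (CFG)
Unambiguous CFGs
LR(k) CFGs: deterministic
LALR(k) CFGs: a simplification of LR(k)
SLR(k) (Simple LR) CFGs: the class we care about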
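Viable Prefix: \alpha is a viable prefix if there is some \omega such that \alpha | \omega is a valid state of the shift-reduce parser. A viable prefix never extends past the right end of the handle (it is "viable" because its tail is a prefix of the handle).
For any grammar, the set of viable prefixes is a regular language. We can show this by constructing a DFA that recognizes the viable prefixes.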
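Item (LR(0) item): an item of a production is the production with a dot placed somewhere in its right-hand side. A production whose right-hand side has n symbols has n+1 items, one for each dot position.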
Example: the items for T \to (E) are:
T -> .(E)
T -> (.E)
T -> (E.)
T -> (E).
Example: the only item for T \to \epsilon is T \to .
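A small Python sketch of this definition; `lr0_items` is a hypothetical helper that returns the items of one production as strings, one per dot position:

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of lhs -> rhs: a right-hand side with n symbols
    has n + 1 dot positions."""
    items = []
    for dot in range(len(rhs) + 1):
        body = list(rhs[:dot]) + ["."] + list(rhs[dot:])
        items.append(f"{lhs} -> {' '.join(body)}")
    return items


print(lr0_items("T", ["(", "E", ")"]))
print(lr0_items("T", []))  # ['T -> .']
```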
The items of a production describe the possible states of the parser in which it can eventually use that production.
Example: the item T \to (E.) says that the stack ends with (E, i.e. the parser is in state (E | \omega, and that it could use the production T \to (E), so it hopes to see ) next.
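Structure of the stack: the stack is a concatenation of prefixes of right-hand sides of productions,
\text{Prefix}_1 \text{Prefix}_2 \cdots \text{Prefix}_n
Observe:
\text{Prefix}_i is a prefix of \alpha_i, the right-hand side of some production X_i \to \alpha_i
\text{Prefix}_i will eventually reduce to X_i (if there is no error)
The X_i reduced from \text{Prefix}_i will eventually combine with \text{Prefix}_{i-1} to form a longer prefix of \alpha_{i-1} (i.e. there is a production X_{i-1} \to \text{Prefix}_{i-1} X_i \beta for some \beta)
By induction, \text{Prefix}_i \cdots \text{Prefix}_n eventually reduces to the missing part of \alpha_{i-1}.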
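To recognize viable prefixes (and therefore to locate handles), we build an NFA whose states are items and feed it our current stack; the NFA tries all possible routes at once. The NFA accepts if and only if the current stack is a viable prefix. (This is similar to DPLab in 15210.)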
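Here are the steps to build the NFA:
Add a dummy production S' \to S, where S is the original start symbol (E in our grammar).
The NFA states are the items of the grammar, including those of S' \to S.
For each item A \to \alpha . X \beta, add a transition on X: A \to \alpha . X \beta \xrightarrow{X} A \to \alpha X . \beta.
For each item A \to \alpha . X \beta and each production X \to \gamma, add an \epsilon-transition A \to \alpha . X \beta \xrightarrow{\epsilon} X \to . \gamma.
Every state is an accepting state.
The start state is S' \to . S.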
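A minimal Python sketch of this NFA for the grammar E -> T + E | T, T -> int * T | int | (E), augmented with the dummy production S' -> E. Items are represented as hypothetical (production index, dot position) pairs. Instead of building the NFA explicitly, `is_viable_prefix` simulates it by tracking the set of currently reachable states, which is the same idea as the subset construction:

```python
# Grammar from the notes, augmented with the dummy start production S' -> E.
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("int", "*", "T")),
    ("T", ("int",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def eps_moves(item):
    """Epsilon-transitions: from A -> alpha . X beta to every X -> . gamma."""
    prod, dot = item
    rhs = GRAMMAR[prod][1]
    if dot < len(rhs) and rhs[dot] in NONTERMINALS:
        return [(i, 0) for i, (lhs, _) in enumerate(GRAMMAR) if lhs == rhs[dot]]
    return []


def sym_move(item, symbol):
    """Transition on `symbol`: A -> alpha . X beta goes to A -> alpha X . beta."""
    prod, dot = item
    rhs = GRAMMAR[prod][1]
    if dot < len(rhs) and rhs[dot] == symbol:
        return (prod, dot + 1)
    return None


def eps_closure(states):
    seen, work = set(states), list(states)
    while work:
        for nxt in eps_moves(work.pop()):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def is_viable_prefix(symbols):
    """Simulate the NFA on the stack contents. Every state is accepting, so
    the input is accepted iff at least one NFA state is still alive."""
    current = eps_closure({(0, 0)})  # start state S' -> . E
    for sym in symbols:
        moved = {m for m in (sym_move(it, sym) for it in current) if m is not None}
        current = eps_closure(moved)
        if not current:
            return False
    return True


print(is_viable_prefix(["int", "*"]))  # True:  T -> int * . T
print(is_viable_prefix(["T", "*"]))    # False: the stuck state T | * int + int
```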
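Valid Item: item X \to \beta . \gamma is valid for a viable prefix \alpha \beta if there is a rightmost derivation S' \Rightarrow^* \alpha X \omega \Rightarrow \alpha \beta \gamma \omega, where S' is the (augmented) start symbol.
In words: after seeing \alpha \beta on the stack, if the parser could still complete the production X \to \beta \gamma by reading more tokens from the stream, then X \to \beta . \gamma is a valid item.
Note that after reading a viable prefix, the DFA terminates in a state that contains exactly the valid items for that prefix.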
Example: item T \to (.E) is valid for the viable prefixes (, ((, (((, ...
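The DFA can be sketched with the usual closure/goto formulation of the subset construction (a standard technique swapped in here, not necessarily the exact presentation from lecture): `closure` follows the epsilon-transitions, `goto` follows one symbol, and each DFA state is the resulting set of items. The names `closure`, `goto`, `valid_items`, `show`, and `build_dfa` are hypothetical. Running `valid_items` on (, ((, or ((( yields a state containing T -> ( . E ), matching the example.

```python
# Grammar from the notes, augmented with the dummy start production S' -> E.
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("int", "*", "T")),
    ("T", ("int",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add X -> . gamma for every item whose dot is before nonterminal X."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    return closure({(p, d + 1) for p, d in state
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})


START = closure({(0, 0)})  # DFA start state: closure of S' -> . E


def valid_items(prefix):
    """Run the DFA on a prefix; the final state is its set of valid items
    (empty if the prefix is not viable)."""
    state = START
    for sym in prefix:
        state = goto(state, sym)
    return state


def show(item):
    prod, dot = item
    lhs, rhs = GRAMMAR[prod]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


def build_dfa():
    """Return every DFA state (set of LR(0) items) reachable from START."""
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    states, work = [START], [START]
    while work:
        state = work.pop()
        for s in symbols:
            nxt = goto(state, s)
            if nxt and nxt not in states:
                states.append(nxt)
                work.append(nxt)
    return states


for prefix in (["("], ["(", "("], ["(", "(", "("]):
    print("".join(prefix), sorted(show(i) for i in valid_items(prefix)))
```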
LR(0) parsing: at state \alpha | t (stack \alpha, next input token t), run the DFA on \alpha and
reduce by X \to \beta: if the DFA terminates in a state containing X \to \beta.
shift: if the DFA terminates in a state containing X \to \beta . t \omega
However, there might be a reduce/reduce conflict or shift/reduce conflict.
For example, if a state contains both
E -> T.      // tells us to reduce by E -> T
E -> T. + E  // tells us to shift +
then there is a shift/reduce conflict.
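A sketch of how an LR(0) parser would detect these conflicts in a single DFA state. A state is written here as a set of hypothetical (lhs, rhs, dot) triples, and `TERMINALS` is assumed to be the terminal alphabet of the grammar E -> T + E | T, T -> int * T | int | (E) plus the end marker $:

```python
# Terminals of the grammar E -> T + E | T, T -> int * T | int | (E), plus $.
TERMINALS = {"int", "*", "+", "(", ")", "$"}


def lr0_conflicts(state):
    """Return the kinds of LR(0) conflict present in a DFA state.

    `state` is a set of (lhs, rhs, dot) items, rhs a tuple of symbols."""
    # Complete items (dot at the end) ask for a reduce, regardless of input.
    reduces = [(lhs, rhs) for lhs, rhs, dot in state if dot == len(rhs)]
    # Items with a terminal right after the dot ask for a shift.
    shifts = {rhs[dot] for _, rhs, dot in state
              if dot < len(rhs) and rhs[dot] in TERMINALS}
    conflicts = []
    if len(reduces) > 1:
        conflicts.append("reduce/reduce")
    if reduces and shifts:
        conflicts.append("shift/reduce")
    return conflicts


# The state from the example: E -> T.  and  E -> T. + E
print(lr0_conflicts({("E", ("T",), 1), ("E", ("T", "+", "E"), 1)}))  # ['shift/reduce']
```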
SLR parsing fixes this by adding one more condition to the reduce rule.
SLR parsing: at state \alpha | t
reduce by X \to \beta: if the DFA terminates in a state containing X \to \beta. and t \in \text{Follow}(X)
shift: if the DFA terminates in a state containing X \to \beta . t \omega
If there are still conflicts, then the grammar is not SLR.
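A Python sketch of the SLR decision, assuming the same grammar augmented with S' -> E and no epsilon productions (which keeps the FIRST sets simple). `first_sets`, `follow_sets`, and `slr_actions` are hypothetical names. For this grammar it gives Follow(E) = {$, )} and Follow(T) = {+, $, )}, so the state {E -> T. + E, E -> T.} shifts on + and reduces on $ or ), and the LR(0) conflict disappears.

```python
# Grammar from the notes, augmented with S' -> E; it has no epsilon productions.
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("int", "*", "T")),
    ("T", ("int",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST of each nonterminal; with no epsilon productions only the
    first symbol of each right-hand side matters."""
    first = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            x = rhs[0]
            add = first[x] if x in NONTERMINALS else {x}
            if not add <= first[lhs]:
                first[lhs] |= add
                changed = True
    return first


def follow_sets(first):
    follow = {a: set() for a in NONTERMINALS}
    follow["S'"].add("$")  # the end-of-input marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, b in enumerate(rhs):
                if b not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):  # FIRST of the next symbol follows b
                    nxt = rhs[i + 1]
                    add = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                 # b ends the rule: Follow(lhs) follows b
                    add = follow[lhs]
                if not add <= follow[b]:
                    follow[b] |= add
                    changed = True
    return follow


def slr_actions(state, t, follow):
    """All SLR actions for a DFA state (set of (lhs, rhs, dot)) on token t;
    more than one action means a conflict."""
    actions = set()
    for lhs, rhs, dot in state:
        if dot == len(rhs) and t in follow[lhs]:
            actions.add(("reduce", lhs, rhs))
        elif dot < len(rhs) and rhs[dot] == t:
            actions.add(("shift", t))
    return actions


follow = follow_sets(first_sets())
state = {("E", ("T",), 1), ("E", ("T", "+", "E"), 1)}  # E -> T.  and  E -> T. + E
print(slr_actions(state, "+", follow))  # {('shift', '+')}
print(slr_actions(state, "$", follow))  # {('reduce', 'E', ('T',))}
```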
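Precedence declarations are a common way to deal with shift-reduce conflicts. Take E * E + E as an example (with an ambiguous grammar such as E -> E + E | E * E | int): in the state E * E | + E, the DFA state contains both
E -> E * E.   // we can reduce
E -> E. + E   // we can shift +
Declaring that * has higher precedence than + tells the parser to reduce here. So a "precedence declaration" is really a conflict-resolution rule; for simple expression grammars the two coincide, but for more complicated grammars things are different.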
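One common way to turn precedence declarations into conflict resolution is the rule used by yacc-style parser generators (sketched here, not taken from the notes): compare the precedence of the production being reduced (taken from its operator) with the precedence of the lookahead token, and use associativity to break ties. The table `PRECEDENCE` and the function `resolve` are hypothetical:

```python
# Hypothetical precedence table: a higher level binds tighter.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left")}


def resolve(rule_op, lookahead):
    """Resolve a shift/reduce conflict between reducing E -> E rule_op E
    and shifting the operator `lookahead`."""
    rule_level, _ = PRECEDENCE[rule_op]
    tok_level, assoc = PRECEDENCE[lookahead]
    if rule_level > tok_level:
        return "reduce"  # the rule binds tighter: finish it first
    if rule_level < tok_level:
        return "shift"   # the incoming operator binds tighter
    # Same level: left-associative operators reduce, right-associative shift.
    return "reduce" if assoc == "left" else "shift"


print(resolve("*", "+"))  # state E * E | + E: reduce
print(resolve("+", "*"))  # state E + E | * E: shift
```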
SLR Parsing
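Rerunning the DFA on the entire stack after every shift or reduce is wasteful, because each action only changes the top of the stack. To improve efficiency, we store each stack symbol together with the DFA state reached after it. After a reduce, the parser resumes from the state stored just below the reduced symbols instead of running the DFA again from the bottom.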
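A minimal sketch of an SLR parser that keeps (symbol, DFA state) pairs on the stack, for the grammar from these notes augmented with S' -> E. The Follow sets are hard-coded (worked out by hand for this grammar), and names such as `build_dfa` and `parse` are hypothetical. After a reduce, the parser looks up the goto transition from the state now on top of the stack instead of rerunning the DFA:

```python
from collections import deque

# Grammar from the notes, augmented with a dummy start production S' -> E.
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("int", "*", "T")),
    ("T", ("int",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({s for _, rhs in GRAMMAR for s in rhs})
# Follow sets worked out by hand for this grammar.
FOLLOW = {"S'": {"$"}, "E": {"$", ")"}, "T": {"+", "$", ")"}}


def closure(items):
    """Add X -> . gamma for every item whose dot is before nonterminal X."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, sym):
    return closure({(p, d + 1) for p, d in state
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym})


def build_dfa():
    """Number the LR(0) DFA states (start = 0) and record the transitions."""
    start = closure({(0, 0)})
    index, states, trans = {start: 0}, [start], {}
    queue = deque([start])
    while queue:
        st = queue.popleft()
        for sym in SYMBOLS:
            nxt = goto(st, sym)
            if not nxt:
                continue
            if nxt not in index:
                index[nxt] = len(states)
                states.append(nxt)
                queue.append(nxt)
            trans[(index[st], sym)] = index[nxt]
    return states, trans


def parse(tokens):
    states, trans = build_dfa()
    tokens = list(tokens) + ["$"]
    stack = [(None, 0)]  # (symbol, DFA state reached after it); bottom = start
    pos = 0
    while True:
        state, t = stack[-1][1], tokens[pos]
        reduces = [p for p, d in states[state]
                   if d == len(GRAMMAR[p][1]) and t in FOLLOW[GRAMMAR[p][0]]]
        if reduces:
            # The grammar is SLR, so at most one reduce applies here.
            lhs, rhs = GRAMMAR[reduces[0]]
            if lhs == "S'":
                return True  # reducing to S' with lookahead $ means accept
            del stack[-len(rhs):]
            # Resume from the state now on top instead of rerunning the DFA.
            stack.append((lhs, trans[(stack[-1][1], lhs)]))
        elif (state, t) in trans:
            stack.append((t, trans[(state, t)]))  # shift
            pos += 1
        else:
            return False  # no action applies: syntax error


print(parse(["int", "*", "int", "+", "int"]))
print(parse(["int", "int"]))
```

The first call accepts. The second is rejected: after shifting int, the lookahead int is not in Follow(T) and there is no shift on int from that state.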
LR(1) is more powerful than SLR because the lookahead is built into the items. For example,
T -> . int * T, $
is an LR(1) item whose lookahead is $ (in general the lookahead can be any terminal). This allows finer-grained lookahead than the entire Follow set.
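A sketch of LR(1) closure, assuming the same grammar augmented with S' -> E. Items are hypothetical (production index, dot, lookahead) triples; when the dot is before a nonterminal B in [A -> \alpha . B \beta, a], closure adds [B -> . \gamma, b] for every b in FIRST(\beta a). The FIRST sets are hard-coded because this grammar has no epsilon productions. Starting from [S' -> . E, $], the closure contains [T -> . int * T, $] and [T -> . int * T, +] but not [T -> . int * T, )], even though ) is in Follow(T):

```python
# Grammar from the notes, augmented with S' -> E.
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("int", "*", "T")),
    ("T", ("int",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# FIRST sets worked out by hand; the grammar has no epsilon productions.
FIRST = {"S'": {"int", "("}, "E": {"int", "("}, "T": {"int", "("}}


def first_of(seq, lookahead):
    """FIRST(seq lookahead); only valid because nothing derives epsilon."""
    if not seq:
        return {lookahead}
    x = seq[0]
    return FIRST[x] if x in NONTERMINALS else {x}


def lr1_closure(items):
    """Closure of a set of LR(1) items (production index, dot, lookahead)."""
    result, work = set(items), list(items)
    while work:
        prod, dot, a = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # [A -> alpha . B beta, a] adds [B -> . gamma, b] for b in FIRST(beta a)
            for b in first_of(rhs[dot + 1:], a):
                for i, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (i, 0, b) not in result:
                        result.add((i, 0, b))
                        work.append((i, 0, b))
    return result


start = lr1_closure({(0, 0, "$")})
# Lookaheads attached to T -> . int * T (production 3, dot 0) in the start state:
print(sorted(b for p, d, b in start if (p, d) == (3, 0)))  # ['$', '+']
```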