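Note that the language \{ (^i )^i \mid i \in \mathbb{N} \} of balanced nested parentheses is not regular, so no regular expression can describe it, yet this kind of nesting is common in programming languages.
Parser: takes a string of tokens and produces a parse tree.
Note that in some compilers, the lexer and the parser are combined into one component.
Context-Free Grammar (CFG): a set of rewrite rules. Whenever a non-terminal appears in a string, replacing it with the right-hand side of one of its productions gives another string the grammar can derive. A CFG consists of: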
a set of terminals: T (in the context of programming languages, terminals are tokens)
a set of non-terminals: N
a start symbol S \in N
a set of productions: X \to Y_1 ... Y_n for X \in N and Y_i \in N \cup T \cup \{\epsilon\}
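The four components above can be written down directly as data. The following is a minimal sketch, not a standard API: the `Grammar` class, its field names, and the `is_well_formed` check are all hypothetical names chosen for illustration, and an empty right-hand side `()` is assumed to stand for \epsilon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar as (T, N, S, productions). Hypothetical helper."""
    terminals: frozenset
    nonterminals: frozenset
    start: str
    # Each production is (lhs, rhs); rhs is a tuple of symbols, () means epsilon.
    productions: tuple

    def is_well_formed(self) -> bool:
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals                 # S is in N
            and not (self.terminals & self.nonterminals)    # T and N are disjoint
            and all(
                lhs in self.nonterminals and all(sym in symbols for sym in rhs)
                for lhs, rhs in self.productions
            )
        )


# Nested parentheses: S -> ( S ) | epsilon
PARENS = Grammar(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("(", "S", ")")), ("S", ())),
)

# A broken grammar: the left-hand side "X" is not a declared non-terminal.
BROKEN = Grammar(
    terminals=frozenset({"a"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("X", ("a",)),),
)
```

Here `PARENS.is_well_formed()` holds, while `BROKEN.is_well_formed()` does not, because every production must have a single non-terminal on its left-hand side.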
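How a context-free grammar works: start with the string consisting only of the start symbol S, then repeatedly replace some non-terminal with the right-hand side of one of its productions until no non-terminals remain.
Example: in COOL, where lowercase symbols are terminals and UPPERCASE symbols are non-terminals: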
EXPR -> if EXPR then EXPR else EXPR fi
EXPR -> while EXPR loop EXPR pool
EXPR -> id
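The last one is just an identifier.
So the language of a context-free grammar G with start symbol S is L(G) = \{ a_1 ... a_n \mid a_i \in T and S \to^* a_1 ... a_n \}, that is, every string of terminals that can be derived from S.
Example: nested parentheses can be expressed as:
- start symbol: S
- productions: S \to (S), S \to \epsilon
- terminals: \{(, )\}
- non-terminals: \{S\}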
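Membership in the language of this grammar, S \to (S) and S \to \epsilon, can be checked with one recursive function per non-terminal. This is a minimal sketch; the function names `parse_s` and `in_language` are made up for illustration, and the input is assumed to be a plain string of characters rather than lexer tokens.

```python
def parse_s(text: str, pos: int = 0) -> int:
    """Match S starting at pos and return the position just after it."""
    if pos < len(text) and text[pos] == "(":
        # Production S -> ( S )
        pos = parse_s(text, pos + 1)
        if pos < len(text) and text[pos] == ")":
            return pos + 1
        raise SyntaxError(f"expected ')' at position {pos}")
    # Production S -> epsilon: consume nothing.
    return pos


def in_language(text: str) -> bool:
    """True if the whole string is derivable from S."""
    try:
        return parse_s(text) == len(text)
    except SyntaxError:
        return False
```

Note that this grammar only generates i opening parentheses followed by i closing ones, so `in_language("(())")` is true while `in_language("()()")` is false.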
So far, a context-free grammar only answers a decision problem (is this string in the language?), but we also need to build a parse tree and to handle errors gracefully.
Unlike with regular languages, it is sometimes necessary to rewrite the grammar into a form the parser can accept.
Parse Tree: leaves are terminals and interior nodes are non-terminals. (For a binary operation, the node has three children, since the operator token is included as a child.)
Left-most derivation: at each step we replace the left-most non-terminal.
Right-most derivation: at each step we replace the right-most non-terminal.
Other derivations: other replacement orders exist as well.
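The difference between the derivation orders can be seen by replaying them mechanically. The sketch below assumes the small grammar E \to E + E, E \to E * E, E \to id and derives id * id + id twice; the helper `derive` and the constants `PLUS`, `TIMES`, `ID` are hypothetical names used only here.

```python
NONTERMINALS = {"E"}

# Right-hand sides of the assumed productions for E.
PLUS = ("E", "+", "E")
TIMES = ("E", "*", "E")
ID = ("id",)


def derive(choices, pick):
    """Apply each right-hand side in `choices` to the non-terminal that
    `pick` selects (min = left-most, max = right-most)."""
    form = ["E"]
    steps = [" ".join(form)]
    for rhs in choices:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = pick(positions)
        form = form[:i] + list(rhs) + form[i + 1:]
        steps.append(" ".join(form))
    return steps


leftmost = derive([PLUS, TIMES, ID, ID, ID], min)
rightmost = derive([PLUS, ID, TIMES, ID, ID], max)

for step in leftmost:
    print(step)
```

Both derivations end in the same string and describe the same parse tree; only the order in which the non-terminals are expanded differs.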
Ambiguous: a grammar is ambiguous if some string has more than one parse tree.
Example: id * id + id
has two parse trees under the following rules (together with E \to id), so the grammar is ambiguous:
E \to E * E
E \to E + E
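One way to see the ambiguity concretely is to count parse trees. The sketch below assumes the grammar E \to E * E, E \to E + E, E \to id and a pre-tokenized input; `count_parse_trees` is a hypothetical helper that tries every operator as the root of the tree and multiplies the counts of the two sides.

```python
from functools import lru_cache


def count_parse_trees(tokens) -> int:
    """Number of parse trees for `tokens` under E -> E * E | E + E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo: int, hi: int) -> int:
        # Counts trees deriving tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root of the tree.
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id * id + id".split()))
```

A count above 1 for any string means the grammar is ambiguous; a count of 0 means the string is not in the language.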
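When a programming language's grammar is ambiguous, you are leaving it up to the compiler to pick one of several possible interpretations.
To fix an ambiguous grammar, we can rewrite it to enforce precedence by introducing a second non-terminal E' for the tighter-binding operator, for example E \to E' + E | E' and E' \to id * E' | id.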
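A minimal sketch of how such a rewritten grammar removes the ambiguity, assuming the stratified rules E \to E' + E | E' and E' \to id * E' | id (one possible rewrite, not necessarily the only one). The function names are hypothetical, and trees are represented as nested tuples `(operator, left, right)`.

```python
def parse_expr(tokens):
    """Parse a full token list as E and return its parse tree."""
    tree, pos = _parse_e(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


def _parse_e(tokens, pos):
    # E -> E' + E | E'
    left, pos = _parse_e_prime(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = _parse_e(tokens, pos + 1)
        return ("+", left, right), pos
    return left, pos


def _parse_e_prime(tokens, pos):
    # E' -> id * E' | id
    if pos >= len(tokens) or tokens[pos] != "id":
        raise SyntaxError(f"expected id at position {pos}")
    pos += 1
    if pos < len(tokens) and tokens[pos] == "*":
        right, pos = _parse_e_prime(tokens, pos + 1)
        return ("*", "id", right), pos
    return "id", pos


print(parse_expr("id * id + id".split()))
```

Because every + is produced by E and every * by E', a multiplication can never have an addition as a direct operand, so each string has exactly one tree.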
Example: we want if E then if E then E else E to parse as if E then (if E then E else E), where the else is associated with the closest unmatched then. The following rules are used:
E -> MIF // matched then
| UIF // unmatched then
MIF -> if E then MIF else MIF
| OTHER
UIF -> if E then E
| if E then MIF else UIF
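The same association can be obtained in a hand-written parser by always letting the innermost if consume a following else. The sketch below is an assumption-laden illustration, not the grammar above: conditions and non-if expressions are single tokens, and `parse_stmt` is a hypothetical name. Trees are tuples `("if", cond, then_branch)` or `("if", cond, then_branch, else_branch)`.

```python
KEYWORDS = {"if", "then", "else"}


def parse_stmt(tokens, pos=0):
    """Parse one expression starting at pos; return (tree, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "if":
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError(f"expected 'then' after condition at {pos + 1}")
        cond = tokens[pos + 1]
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Greedy step: the innermost open 'if' takes the next 'else'.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    if tokens[pos] in KEYWORDS:
        raise SyntaxError(f"unexpected keyword {tokens[pos]!r} at {pos}")
    return tokens[pos], pos + 1


tree, end = parse_stmt("if c1 then if c2 then a else b".split())
print(tree)
```

In the resulting tree the else branch `b` belongs to the inner `if c2`, and the outer `if c1` has no else branch.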
It is impossible to automatically convert an ambiguous grammar into an unambiguous one. Instead, we usually keep the ambiguous grammar and declare operators as left or right associative, for example %left + and %left *. However, the parser does not actually understand associativity, so it does not always behave the way the declarations suggest. Be cautious!