Lecture 005

Parsing

Note that the language \{(^i)^i \mid i \in \mathbb{N}\} is not a regular language (no regular expression describes it), yet this kind of nesting is common in programming languages.
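A finite automaton has only finitely many states, so it cannot remember an arbitrary count i. The minimal Python sketch below recognizes the language with an unbounded counter instead. The function name `is_nested_parens` is hypothetical, and it assumes i may be 0, so the empty string is accepted.

```python
def is_nested_parens(s: str) -> bool:
    """Return True iff s has the form (^i)^i for some i >= 0."""
    i = 0
    # Count the leading '(' characters; this count is unbounded.
    while i < len(s) and s[i] == "(":
        i += 1
    # The remainder must be exactly that many ')' characters.
    return s[i:] == ")" * i


for example in ["((()))", "(()", "()()", ""]:
    print(repr(example), is_nested_parens(example))
```

Note that ()() is balanced but rejected, because it is not of the form (^i)^i.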

Parser: takes a string of tokens (produced by the lexer) and produces a parse tree.

Note that in some compilers, the lexer and the parser are combined into one component.

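Context-Free Grammar: a set of productions of the form X -> Y_1 ... Y_n, where X is a non-terminal. Wherever X appears, it may be replaced by the right-hand side of any of its productions, regardless of the symbols around it (hence "context-free"); every string of terminals obtained this way is in the language.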

How a Context-Free Grammar Works:

  1. begin with the start symbol S
  2. replace any non-terminal X in the string by the right-hand side of one of its productions X -> Y_1 ... Y_n
  3. repeat step 2 until only terminals remain
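The three steps above can be sketched in Python for the toy grammar S -> ( S ) | ε (an assumed example; `GRAMMAR` and `derive` are hypothetical names). The sketch always rewrites the leftmost non-terminal, which is one convenient choice for step 2, and the caller supplies which production to use at each step as an index.

```python
# Assumed toy grammar: each non-terminal maps to its right-hand sides.
# S -> ( S ) | epsilon   (an empty list stands for epsilon)
GRAMMAR = {"S": [["(", "S", ")"], []]}


def derive(start, choices):
    """Return each sentential form of a leftmost derivation.

    choices[k] is the index of the production applied at step k.
    """
    form = [start]                      # step 1: the start symbol
    steps = [" ".join(form)]
    for choice in choices:
        # step 2: find the leftmost non-terminal and replace it
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:pos] + GRAMMAR[form[pos]][choice] + form[pos + 1:]
        steps.append(" ".join(form) or "ε")
    # step 3: a finished derivation contains only terminals
    if any(sym in GRAMMAR for sym in form):
        raise ValueError("derivation unfinished: non-terminals remain")
    return steps


print(derive("S", [0, 0, 1]))
```

For instance, derive("S", [0, 0, 1]) returns ['S', '( S )', '( ( S ) )', '( ( ) )'].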

Example: in COOL, lowercase words are terminals and UPPERCASE words are non-terminals:

EXPR -> if EXPR then EXPR else EXPR fi

EXPR -> while EXPR loop EXPR pool

EXPR -> id

The last production says that an identifier is itself an expression.
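A token string belongs to this small language exactly when it can be derived from EXPR. Because each production starts with a different token (if, while, id), a recursive-descent recognizer can pick the production by looking at a single token. The sketch below is an illustration only: it assumes the lexer has already turned every identifier into the token id, and `parse_expr` and `in_language` are hypothetical names.

```python
def parse_expr(tokens, pos=0):
    """Match one EXPR starting at tokens[pos]; return the next position.

    Raises SyntaxError if no production of EXPR matches.
    """
    def expect(tok, p):
        if p < len(tokens) and tokens[p] == tok:
            return p + 1
        found = tokens[p] if p < len(tokens) else "end of input"
        raise SyntaxError(f"expected {tok!r} at position {p}, found {found!r}")

    if pos < len(tokens) and tokens[pos] == "if":
        # EXPR -> if EXPR then EXPR else EXPR fi
        pos = parse_expr(tokens, pos + 1)
        pos = expect("then", pos)
        pos = parse_expr(tokens, pos)
        pos = expect("else", pos)
        pos = parse_expr(tokens, pos)
        return expect("fi", pos)
    if pos < len(tokens) and tokens[pos] == "while":
        # EXPR -> while EXPR loop EXPR pool
        pos = parse_expr(tokens, pos + 1)
        pos = expect("loop", pos)
        pos = parse_expr(tokens, pos)
        return expect("pool", pos)
    # EXPR -> id
    return expect("id", pos)


def in_language(tokens):
    """True iff the whole token list is derivable from EXPR."""
    try:
        return parse_expr(tokens) == len(tokens)
    except SyntaxError:
        return False


print(in_language("while id loop if id then id else id fi pool".split()))
print(in_language("if id then id fi".split()))  # missing else
```

The SyntaxError message also hints at the error reporting a real parser needs, beyond a yes/no answer.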

So, for a context-free grammar G with start symbol S and set of terminals T, the language of G is:

L(G) = \{ a_1 \dots a_n \mid (\forall i.\ a_i \in T) \land S \to^{*} a_1 \dots a_n \}

where \to^{*} means "derives in any number of steps".

Example: the nested-parentheses language (^i)^i can be expressed as:

  - start symbol: S
  - productions: S \to (S), S \to \epsilon
  - terminals: \{(, )\}
  - non-terminals: \{S\}
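To connect this grammar with the definition of L(G), the sketch below enumerates every string of L(G) up to a length bound, using breadth-first search over leftmost derivations. It assumes, as holds for this grammar, that sentential forms cannot keep growing without adding terminals; for other grammars the search may not terminate. `GRAMMAR` and `language_up_to` are hypothetical names.

```python
from collections import deque

# The grammar from the example: S -> ( S ) | epsilon.
# Non-terminals are the dictionary keys; every other symbol is a terminal.
GRAMMAR = {"S": [("(", "S", ")"), ()]}


def language_up_to(grammar, start, max_len):
    """Return the strings of L(G) with at most max_len terminals, shortest first.

    Assumes forms cannot grow forever without gaining terminals.
    """
    results = set()
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        form = queue.popleft()
        nts = [i for i, sym in enumerate(form) if sym in grammar]
        if not nts:
            results.add("".join(form))  # only terminals: a member of L(G)
            continue
        pos = nts[0]                    # expand the leftmost non-terminal
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # prune forms that already hold too many terminals
            if sum(sym not in grammar for sym in new) > max_len:
                continue
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=len)


print(language_up_to(GRAMMAR, "S", 6))
```

Every string printed has the form (^i)^i, as expected from S \to (S) | \epsilon.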

So far, a context-free grammar only answers a decision problem (is the string in L(G)?), but a parser must also build a parse tree, and it needs good error handling.

Unlike with regular languages, the form of the grammar matters: in some cases the grammar must be rewritten (without changing the language it describes) before a given context-free parser will accept it.

Parse Tree: leaves are terminals and interior nodes are non-terminals. (For a binary operation such as E -> E + E, a node has three children, because the operator itself is a token in the tree.)
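A minimal parse-tree representation, as a sketch: the `Node` class and the rules E -> E + E | id are assumptions for illustration. Interior nodes carry a non-terminal and their children, leaves carry terminals, and reading the leaves from left to right gives back the input tokens.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Parse-tree node: interior nodes hold a non-terminal, leaves a terminal."""
    symbol: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def fringe(self) -> List[str]:
        """Leaves read left to right: the token string this tree parses."""
        if self.is_leaf():
            return [self.symbol]
        return [tok for child in self.children for tok in child.fringe()]


# Tree for "id + id" under the assumed rules E -> E + E | id.
# The root E has three children: E, the '+' token itself, and E.
tree = Node("E", [Node("E", [Node("id")]), Node("+"), Node("E", [Node("id")])])
print(tree.fringe())  # ['id', '+', 'id']
```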

Ambiguous: a grammar is ambiguous if some string has more than one parse tree.

Example: id * id + id is ambiguous with respect to rules such as E -> E + E | E * E | id, since it can be grouped either as (id * id) + id or as id * (id + id).
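To see why the two readings matter, the sketch below writes both parse trees for id * id + id as nested tuples, assuming the rules E -> E + E | E * E | id and giving the three ids the hypothetical values 2, 3 and 4. Both trees have the same leaves, yet they evaluate differently.

```python
# Two parse trees for  id * id + id  under the assumed rules
# E -> E + E | E * E | id, with the ids given the values 2, 3, 4.
# Each interior node is written as (left, operator, right).
times_first = ((2, "*", 3), "+", 4)   # (id * id) + id
plus_first = (2, "*", (3, "+", 4))    # id * (id + id)


def evaluate(tree):
    """Evaluate bottom-up; a bare int is a leaf (an id)."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b  # only + and * occur here


def fringe(tree):
    """Leaves read left to right."""
    if isinstance(tree, int):
        return [tree]
    left, op, right = tree
    return fringe(left) + [op] + fringe(right)


print(fringe(times_first) == fringe(plus_first))   # same input string
print(evaluate(times_first), evaluate(plus_first))
```

The first tree gives (2 * 3) + 4 = 10 and the second 2 * (3 + 4) = 14, so the choice of tree changes the program's meaning.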

When a programming language's grammar is ambiguous, it is left up to the compiler to pick one of several possible interpretations.

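To fix an ambiguous grammar, we can enforce an order between operators by introducing a new non-terminal E' and using it instead of E in some productions, so that each non-terminal handles one precedence level.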
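One common way to do this, shown here as an assumed example rather than the lecture's exact grammar, is E -> E' + E | E', E' -> T * E' | T, T -> id | ( E ). The recursive-descent sketch below (hypothetical function names) returns a fully parenthesized string so the grouping is visible.

```python
# Assumed stratified grammar (one non-terminal per precedence level):
#   E  -> E' + E | E'
#   E' -> T * E' | T
#   T  -> id | ( E )
# Since * only appears under E', every * is grouped below every +.

def parse_stratified(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def e():          # E -> E' + E | E'
        left = e_prime()
        if peek() == "+":
            eat("+")
            return f"({left} + {e()})"
        return left

    def e_prime():    # E' -> T * E' | T
        left = t()
        if peek() == "*":
            eat("*")
            return f"({left} * {e_prime()})"
        return left

    def t():          # T -> id | ( E )
        if peek() == "(":
            eat("(")
            inner = e()
            eat(")")
            return inner
        eat("id")
        return "id"

    tree = e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_stratified("id * id + id".split()))  # ((id * id) + id)
print(parse_stratified("id + id * id".split()))  # (id + (id * id))
```

Because these rules are right-recursive, the sketch groups id + id + id as (id + (id + id)): precedence is now fixed, but associativity is a separate choice.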

Example of enforcing a particular parse tree


Example: we want

if
  E
then
  if E then E else E

where the else is associated with the closest unmatched then. The following grammar is used:

E -> MIF // matched then
   | UIF // unmatched then
MIF -> if E then MIF else MIF
     | OTHER
UIF -> if E then E
     | if E then MIF else UIF
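A hand-written parser can usually get the same grouping without the MIF/UIF split: when it sees else, it attaches it to the innermost if that is still open. The sketch below uses that greedy technique, which is a substitute for the grammar above rather than a translation of it, for S -> if S then S [else S] | OTHER. The function name `parse_if` is hypothetical.

```python
def parse_if(tokens):
    """Parse  S -> if S then S [else S] | OTHER  into nested tuples.

    The innermost open if takes the else greedily, so each else is
    matched with the closest unmatched then.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def stmt():
        if peek() == "if":
            eat("if")
            cond = stmt()
            eat("then")
            then_branch = stmt()
            if peek() == "else":        # greedy: grab the else right here
                eat("else")
                return ("if", cond, then_branch, stmt())
            return ("if", cond, then_branch)
        eat("OTHER")
        return "OTHER"

    tree = stmt()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_if("if OTHER then if OTHER then OTHER else OTHER".split()))
```

The else ends up inside the inner if, matching the layout in the example.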

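It is impossible, in general, to automatically convert an ambiguous grammar into an unambiguous one. Instead, parser generators usually let us declare precedence and associativity, e.g. %left + and %left * (in Bison-style tools, operators declared later have higher precedence, so here * binds tighter than +).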

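However, the parser does not really understand associativity: these declarations are only used to resolve conflicts while the parse tables are built, so they do not always behave the way associativity would suggest. Be cautious!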
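Parser generators apply these declarations inside their LR tables, which is too much to show here. As a swapped-in technique, the precedence-climbing sketch below produces the grouping that %left + followed by %left * asks for: * binds tighter than +, and both group to the left. The precedence table and the function name are assumptions.

```python
# Assumed precedence table mirroring  %left +  then  %left *:
# a higher number binds tighter, and both operators are left associative.
PRECEDENCE = {"+": 1, "*": 2}


def parse_with_precedence(tokens):
    pos = 0

    def parse_operand():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "id":
            found = tokens[pos] if pos < len(tokens) else None
            raise SyntaxError(f"expected 'id', found {found!r}")
        pos += 1
        return "id"

    def climb(min_prec):
        nonlocal pos
        left = parse_operand()
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # Left associativity: the right operand may only contain
            # operators that bind strictly tighter than op.
            right = climb(PRECEDENCE[op] + 1)
            left = f"({left} {op} {right})"
        return left

    tree = climb(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_with_precedence("id + id + id".split()))  # ((id + id) + id)
print(parse_with_precedence("id + id * id".split()))  # (id + (id * id))
```

Changing the recursive call to climb(PRECEDENCE[op]) would make the operator right associative instead.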
