Lecture 021

Context-Free Grammar & Parsing

Process

ATO: from input to char stream
lexical analysis: from char stream to token stream (using regular expressions)
parsing: from token stream to abstract syntax tree
type checking: from abstract syntax tree to typed abstract syntax tree
evaluation: from typed abstract syntax tree to value stream
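
A minimal sketch of the lexical-analysis step, assuming a tiny arithmetic language. The token names (INT, PLUS, ...) and the `tokenize` helper are made up for illustration and are not from the lecture; the only library used is Python's standard `re` module.

```python
import re

# Hypothetical token set for a tiny arithmetic language (an assumption).
TOKEN_SPEC = [
    ("INT", r"\d+"),
    ("PLUS", r"\+"),
    ("STAR", r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
]
# One regular expression with a named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a character stream into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace is matched but not emitted
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("1 + 2*3"))
# [('INT', '1'), ('PLUS', '+'), ('INT', '2'), ('STAR', '*'), ('INT', '3')]
```

Each token kind is a regular expression, which is why this stage only needs the power of a finite automaton.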

Questions

context-free (is the context the state of an automaton?)
can a programming language be implemented by a regular grammar?
are there any strings that can't be captured by a context-free grammar?

Context-Free

mixture (my own term; the standard name is sentential form): a string of zero or more terminals and variables, in any order

derivable:

mixture a is derivable from mixture b iff applying rules to b, one at a time, yields a (in some number of steps)
L(G) = {a \in sigma* | S =>* a} (every terminal string derivable from the start variable S in zero or more steps)

3 ways to think about computation:

automata
language
functions

Grammar

Ambiguous: a grammar is ambiguous if some string has more than one leftmost derivation (equivalently, more than one parse tree)

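another way to test for a regular language (the pumping lemma):

take a sufficiently long string in the language
some substring of it can be repeated any number of times (or removed), and the result is still in that language
this is mostly used in reverse: if no such substring exists, the language is not regular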

Regular: {a^n}
Context-free: {a^n b^n}
Not context-free: {a^n b^n c^n}

Bottom-up operator-precedence parser: uses a stack to hold pending operations

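Shift-reduce parser: has two stacks, one for symbols and the other for operations
Recursive-descent parsing (building a tree): one parse function for every non-terminal
Expression = Expression + Expression | Expression * Expression (ambiguous and left-recursive, bad)
the rules below remove the ambiguity and encode precedence; they are still left-recursive, so a recursive-descent parser handles the repetition with a loop
Expression = Term | Expression + Term
Term = Factor | Term * Factor
Factor = integer | ( Expression )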
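
A minimal recursive-descent sketch for the Expression / Term / Factor rules, with one function per non-terminal. Assumptions not in the notes: tokens are already split into a Python list of ints and the strings '+', '*', '(' and ')', a nested Expression inside a Factor is wrapped in parentheses, the tree is a nested tuple, and the left-recursive rules are implemented as loops (a direct recursive call on the left would never terminate).

```python
def parse(tokens):
    """Parse a token list into a nested-tuple tree such as ('+', 1, ('*', 2, 3))."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expression():
        # Expression = Term | Expression + Term, written as Term ('+' Term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())  # left-associative
        return node

    def term():
        # Term = Factor | Term * Factor, written as Factor ('*' Factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        # Factor = integer | ( Expression )
        token = peek()
        if isinstance(token, int):
            return advance()
        if token == "(":
            advance()
            node = expression()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse([1, "+", 2, "*", 3]))            # ('+', 1, ('*', 2, 3))
print(parse(["(", 1, "+", 2, ")", "*", 3]))  # ('*', ('+', 1, 2), 3)
```

Because `term` is called from inside `expression`, multiplication binds tighter than addition without any precedence table.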

Computing Theory

Computer

computer: takes a stream of input and produces a stream of output, using its internal state

it has an initial state
computer can only read one char at a time
once a char is read, the char is consumed (no backtrack)

states:

accepting state (output true=1)
non-accept state (output false=0)
transition: the char that must be read to move from the current state to the next state (the one that demands this character)

Deterministic Finite Automata (DFA)

Deterministic: every state has exactly one transition for every possible input symbol
Finite: finite number of states
Automata: machine

Q: finite set of states
sigma: finite input alphabet
delta: the transition function (Q * sigma -> Q, where delta is total)
q0: the start state
F: the set of accepting states (a subset of Q)

computation: the ordered sequence of states traversed by the automaton
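
A minimal sketch of running a DFA, assuming the transition function is stored as a Python dict keyed by (state, char). The `run_dfa` helper and the "even number of 1s" machine are illustrative assumptions, not from the lecture.

```python
def run_dfa(delta, start, accepting, text):
    """Return (accepted, trace), where trace is the sequence of states visited."""
    state = start
    trace = [state]
    for char in text:  # one char at a time, never revisited
        state = delta[(state, char)]  # raises KeyError if delta is not total
        trace.append(state)
    return state in accepting, trace


# Hypothetical example: strings over {0, 1} containing an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

print(run_dfa(EVEN_ONES, "even", {"even"}, "11"))
# (True, ['even', 'odd', 'even'])
print(run_dfa(EVEN_ONES, "even", {"even"}, "1011"))
# (False, ['even', 'odd', 'odd', 'even', 'odd'])
```

The returned trace is the computation in the sense above, and the machine accepts when the last state in it is an accepting state.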

Language

language of (): M -> set

input: a specific automaton
output: the set of strings it accepts

language: a set of strings (here, the strings an automaton accepts)

regular language: a language for which there exists some DFA that accepts exactly its strings

non-regular language:

example: https://www.cs.wcupa.edu/rkline/fcs/re-pump.html (recognizing it would need infinitely many states, because unbounded memory is required)

Non-Deterministic Finite Automata (NFA)

Non-Deterministic Finite Automata: can jump from one state to another without consuming any character (an epsilon transition). (useful when concatenating two DFAs)

from one state, there can be more than one way to travel given a char
allows a compact implementation of Star and Union

Note: every NFA can be converted to an equivalent DFA (subset construction: each DFA state stands for a set of NFA states)
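
A minimal sketch of simulating an NFA by tracking the set of states it could currently be in, which is the same idea the NFA-to-DFA conversion relies on. The dict-based encoding, the helper names, and the example machine for a* b* are assumptions for illustration.

```python
def epsilon_closure(states, epsilon):
    """All states reachable from `states` using only jumps that consume no input."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in epsilon.get(state, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def run_nfa(delta, epsilon, start, accepting, text):
    """Accept if at least one of the possible paths ends in an accepting state."""
    current = epsilon_closure({start}, epsilon)
    for char in text:
        moved = set()
        for state in current:
            moved |= delta.get((state, char), set())  # no entry = dead path
        current = epsilon_closure(moved, epsilon)
    return bool(current & accepting)


# Hypothetical example: a* followed by b*, built by linking two loops with a free jump.
DELTA = {("A", "a"): {"A"}, ("B", "b"): {"B"}}
EPSILON = {"A": {"B"}}

print(run_nfa(DELTA, EPSILON, "A", {"B"}, "aabb"))  # True
print(run_nfa(DELTA, EPSILON, "A", {"B"}, "ba"))    # False
```

Since there are only finitely many subsets of states, each possible value of `current` can be treated as a single DFA state.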

Grammar

Grammar: a formal way to produce a language (the set of actual strings) by repeatedly replacing variables until only terminals remain

Variables: symbols that get replaced by applying rules
Terminals: symbols of the final string, which are never replaced (the right-hand side of a rule may mix terminals and variables)

Regular Grammar

(associated with non-deterministic finite automata)
Start Variable: the variable you start with
Rules: (Variable * Terminal) only 4 types of rules are allowed

A -> \empty
A -> a (where a is a terminal)
A -> B (where B is a variable)
A -> aB (where a is a terminal) OR A -> Ba (where a is a terminal), but the same form must be used throughout the grammar

Note: every regular grammar generates a regular language, so it can be converted to an NFA and therefore to a DFA

Context Free Grammar (CFG)

(non-deterministic finite automata with a stack = push-down automata; the stack is unbounded, so arbitrarily many pushes are possible)
Rules: (Variable * Terminal) 5 types of rules are allowed (both forms of the last regular rule can now be mixed)

A -> \empty
A -> a (where a is a terminal)
A -> B (where B is a variable)
A -> aB (where a is a terminal)
A -> Ba (where a is a terminal)

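more generally, a context-free grammar allows any rule of a single form (the rule types above are special cases of it)

A -> x (where x is any string of variables and terminals; the left side is always a single variable)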

Now we can have L = {\empty, 01, 0011, 000111, ...} = {0^n 1^n | n >= 0}, which is not regular (for example via S -> 0S1 | \empty)
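
A minimal sketch of how one grammar for this language works, assuming the rules S -> 0S1 | \empty (this particular grammar and both helper names are illustrative choices). `derive` lists each step of a derivation, and `in_language` checks membership by undoing S -> 0S1 from the outside in.

```python
def derive(n):
    """Derivation steps for 0^n 1^n using S -> 0S1 (n times), then S -> empty."""
    current = "S"
    steps = [current]
    for _ in range(n):
        current = current.replace("S", "0S1", 1)
        steps.append(current)
    current = current.replace("S", "", 1)
    steps.append(current)
    return steps


def in_language(text):
    """Recognize {0^n 1^n | n >= 0} by stripping a matching outer 0...1 pair."""
    while text:
        if not (text.startswith("0") and text.endswith("1")):
            return False
        text = text[1:-1]
    return True


print(derive(2))            # ['S', '0S1', '00S11', '0011']
print(in_language("0011"))  # True
print(in_language("001"))   # False
```

The variable sits in the middle of the string, so each step adds a 0 and a 1 together. That pairing is what a finite number of states cannot keep track of.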

Context Free Language: a language produced by some context-free grammar

All regular languages are context-free languages
All regular grammars are context-free grammars

Turing-equivalent computation

Finite Automata + 2 stacks

so that you can look at any depth of the stored data without losing it, by popping from one stack and pushing onto the other (unlimited pop + push)
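
A minimal sketch of why two stacks are enough: together they behave like a tape with a movable head, where moving the head means popping from one stack and pushing onto the other. The class name, the method names, and the "_" blank symbol are assumptions for illustration.

```python
class TwoStackTape:
    """A tape with a movable head, built from two stacks (Python lists)."""

    BLANK = "_"  # assumed blank symbol

    def __init__(self, text=""):
        self.left = []  # cells left of the head; top of stack = nearest cell
        # The head cell is the top of the right stack.
        self.right = list(reversed(text)) or [self.BLANK]

    def read(self):
        return self.right[-1]

    def write(self, symbol):
        self.right[-1] = symbol

    def move_right(self):
        self.left.append(self.right.pop())
        if not self.right:  # extend the tape with a blank on demand
            self.right.append(self.BLANK)

    def move_left(self):
        self.right.append(self.left.pop() if self.left else self.BLANK)

    def contents(self):
        return "".join(self.left + self.right[::-1])


tape = TwoStackTape("abc")
tape.move_right()
tape.write("X")
tape.move_left()
print(tape.read(), tape.contents())  # a aXc
```

Nothing is lost when the head moves, which is the difference from a single stack, where reaching a deep item means discarding everything above it.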
