A regular expression gives us a tool to decide whether a given string belongs to a language (a set of strings).
However, this is not enough. Our goal is to split a long string into tokens of different token classes (different language sets).
Steps to do Lexical Analysis:
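Write a regular expression for each token class, ordered by priority (else can be seen as keyword or identifier; it is a keyword, since keyword has higher priority)
Construct R = Keyword + Identifier + ... by union
Match the longest prefix of the input that belongs to R (== is read as a comparison rather than two assignments)
Regular expressions are implemented using finite automata.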
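The two tie-breaking rules in the steps above (priority between classes, longest match within the input) can be sketched with Python's `re` module. This is only an illustration: the token class names, the patterns, and the `tokenize` helper are assumptions made for this sketch, and it tries each class separately instead of building one combined automaton.

```python
import re

# Token classes in priority order: earlier entries win ties.
# Names and patterns are illustrative assumptions, not a fixed specification.
# Python alternation picks the first alternative that matches, not the longest,
# so "==" is listed before "=" inside the OPERATOR class.
TOKEN_CLASSES = [
    ("KEYWORD", r"if|else|while"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("OPERATOR", r"==|="),
    ("WHITESPACE", r"[ \t\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_CLASSES]


def tokenize(text):
    """Split text into (class, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_lexeme = None, ""
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            # A strictly longer match wins (longest match); on equal length
            # the earlier, higher-priority class is kept.
            if match and len(match.group()) > len(best_lexeme):
                best_name, best_lexeme = name, match.group()
        if best_name is None:
            raise ValueError(f"no token class matches at position {pos}")
        if best_name != "WHITESPACE":
            tokens.append((best_name, best_lexeme))
        pos += len(best_lexeme)
    return tokens
```

With these assumed classes, `tokenize("else x == 1")` classifies `else` as a keyword and `==` as a single operator, while `tokenize("elsewhere")` yields one identifier because the longer match wins.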
Deterministic Finite Automata (DFA): no \epsilon moves, and exactly one transition per input symbol in each state (the input determines the path, so execution is faster)
Nondeterministic Finite Automata (NFA): \epsilon moves are allowed, and a state can have multiple transitions for one input symbol (the input does not determine the path, but the machine is smaller)
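To make the difference concrete, here is a small sketch that runs a DFA and an NFA for the same language, strings over {a, b} that end in "ab". The two machines, the state numbering, and the use of an empty string as the epsilon label are assumptions of this example.

```python
EPS = ""  # label used for epsilon moves in this sketch (an assumption)

# DFA: one target state per (state, symbol) pair.
DFA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
DFA_ACCEPT = {2}

# NFA: a set of target states per (state, label) pair, epsilon moves allowed.
NFA = {
    (0, EPS): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
}
NFA_ACCEPT = {3}


def run_dfa(text):
    """Follow the single path that the input determines."""
    state = 0
    for ch in text:
        if (state, ch) not in DFA:
            return False
        state = DFA[(state, ch)]
    return state in DFA_ACCEPT


def eps_closure(states):
    """All states reachable from `states` using only epsilon moves."""
    stack = list(states)
    closure = set(states)
    while stack:
        state = stack.pop()
        for target in NFA.get((state, EPS), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def run_nfa(text):
    """Track every state the NFA could be in after each symbol."""
    current = eps_closure({0})
    for ch in text:
        moved = set()
        for state in current:
            moved |= NFA.get((state, ch), set())
        current = eps_closure(moved)
    return bool(current & NFA_ACCEPT)
```

The DFA keeps a single current state, while the NFA simulation carries a set of states and recomputes an epsilon closure at every step, which is where the speed difference comes from.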
Our conversion goes as follows: regular expression -> NFA -> DFA
Translating regular expressions to NFA:
\epsilon: a start state with an \epsilon transition to an accepting state
a: a start state with a transition on a to an accepting state
AB: say we have a machine for A and a machine for B; to build AB, we make the final state of A non-accepting and add an \epsilon transition from it to the start state of B
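A+B: add a new start state with \epsilon transitions to the start states of A and B, and \epsilon transitions from the final states of A and B to a new accepting state (see image above)
A*: add a new start state and a new accepting state; the new start state has \epsilon transitions to the start state of A and to the new accepting state, and the final state of A has \epsilon transitions back to the start state of A and to the new accepting state (see image above)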
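The five construction rules can be sketched as a small builder in which each rule returns a (start, final) pair of state numbers. The `NFABuilder` class, its method names, and the `accepts` helper are hypothetical names for this sketch; the union and star cases follow the usual Thompson-style construction, which is an assumption about what the image shows.

```python
from collections import defaultdict

EPS = None  # label used for epsilon moves in this sketch (an assumption)


class NFABuilder:
    """Builds NFA fragments; each fragment is a (start, final) state pair."""

    def __init__(self):
        self.edges = defaultdict(set)  # (state, label) -> set of target states
        self.count = 0

    def new_state(self):
        self.count += 1
        return self.count - 1

    def epsilon(self):
        start, final = self.new_state(), self.new_state()
        self.edges[(start, EPS)].add(final)
        return start, final

    def symbol(self, a):
        start, final = self.new_state(), self.new_state()
        self.edges[(start, a)].add(final)
        return start, final

    def concat(self, A, B):
        # A's final state stops being final and moves on to B's start.
        self.edges[(A[1], EPS)].add(B[0])
        return A[0], B[1]

    def union(self, A, B):
        start, final = self.new_state(), self.new_state()
        self.edges[(start, EPS)] |= {A[0], B[0]}
        self.edges[(A[1], EPS)].add(final)
        self.edges[(B[1], EPS)].add(final)
        return start, final

    def star(self, A):
        start, final = self.new_state(), self.new_state()
        self.edges[(start, EPS)] |= {A[0], final}  # enter A or skip it
        self.edges[(A[1], EPS)] |= {A[0], final}   # repeat A or leave
        return start, final


def accepts(nfa, fragment, text):
    """Simulate the fragment on text by tracking the set of reachable states."""
    start, final = fragment

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            state = stack.pop()
            for target in nfa.edges.get((state, EPS), ()):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

    current = closure({start})
    for ch in text:
        moved = set()
        for state in current:
            moved |= nfa.edges.get((state, ch), set())
        current = closure(moved)
    return final in current
```

For example, (a+b)*ab can be assembled as `nfa.concat(nfa.concat(nfa.star(nfa.union(nfa.symbol("a"), nfa.symbol("b"))), nfa.symbol("a")), nfa.symbol("b"))` on a fresh `nfa = NFABuilder()`, and the resulting fragment passed to `accepts`.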
We implement a DFA using an adjacency matrix (a transition table indexed by state and input symbol) or an adjacency list, the same as in graph problems.
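As a rough illustration of the two representations, the sketch below stores the same DFA both ways. The example machine (it accepts a, then any number of b, then c), the three-letter alphabet, and the `DEAD` marker for a missing transition are assumptions made for this example.

```python
ALPHABET = "abc"
COLUMN = {ch: i for i, ch in enumerate(ALPHABET)}  # symbol -> matrix column
DEAD = -1          # marks "no transition" in the matrix (an assumption)
ACCEPTING = {2}

# Adjacency matrix: one row per state, one column per symbol.
MATRIX = [
    [1, DEAD, DEAD],     # state 0: a -> 1
    [DEAD, 1, 2],        # state 1: b -> 1, c -> 2
    [DEAD, DEAD, DEAD],  # state 2: no outgoing transitions
]

# Adjacency list: each state stores only the edges that exist.
ADJACENCY = [
    [("a", 1)],
    [("b", 1), ("c", 2)],
    [],
]


def accepts_matrix(text):
    """One table lookup per input symbol."""
    state = 0
    for ch in text:
        col = COLUMN.get(ch)
        if col is None:
            return False
        state = MATRIX[state][col]
        if state == DEAD:
            return False
    return state in ACCEPTING


def accepts_list(text):
    """Scan the current state's outgoing edges for a matching label."""
    state = 0
    for ch in text:
        for label, target in ADJACENCY[state]:
            if label == ch:
                state = target
                break
        else:
            return False  # no edge for this symbol
    return state in ACCEPTING
```

The matrix gives a constant-time lookup per symbol but stores an entry for every state and symbol pair, while the list stores only existing edges and searches them on each step.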
Or if you want to save space: