Lecture 003

Lexical Analysis

Token Class (or Class)

Job of Lexical Analysis:

Example: x=0;\n\twhile (x < 10) { \n \tx++; \n } has

Fortran and Lookahead

In Fortran, white space is completely ignored:

Lookahead (left to right scan): So in fortran, in order to understand whether DO is a token on its own, we need to look for whether we have , or ..

A good implementation of compiler avoids lookahead.

Fortran is deigned this way because we accidentally put blankspace in punchcard.

It is impossible to completely avoid lookahead because when looking at =, you don't know whether it will be == in the end.

Programming Language 1: it is a programming language designed to have no keyword reserved. Consider the following program"

Unbounded lookahead: no way to bound the length of lookahead.

// QUESTION: why not just use white space??? Avoid lookahead?

Regular Language

We use regular language to define a token class. We use regular expression tyo match a lexemes to a token class.

The following is the grammar of regular expression

R = \epsilon
  | `c` in Sigma
  | R + R
  | RR
  | R*

Meaning Function

Meaning Function

Meaning Function

Meaning Function: L : syntax -> semantics

Semantics: language set, which is a set of string

There are many ways to arrange a sentence to express the same idea.

Why we need meaning function:

Note that L is never one-to-many. Otherwise, our programming language is ill-defined.

Regular Expression for Token Classes

Regular Expression for Token Class:

Some other regular expression syntax:

Table of Content