Lecture 009

Semantic Analysis:

all identifiers are declared
type check
inheritance relationship
classes defined only once
method in class defined only once
reserved identifiers are not misused
...

Scope

Static Scope: scope depends only on the program text, not run-time behavior.

Dynamic Scope: scope depends on the execution of the program (e.g. early Lisp dialects).

Constructs that introduce identifiers:

class declarations (class name)
method definitions (method name)
let expression
formal parameters
attribute (fields in class) definitions
case expressions

Not all identifiers follow the most-closely-nested rule. A class name must be accessible from anywhere: it is globally visible and can be used before it is defined. This means that we need multiple passes (several simple passes are generally easier to maintain than one complex pass).

Symbol Table

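For a simple symbol table, we can use a stack.

a let node has an init branch; when we encounter it, we push the declared symbol (variable) onto the stack
we visit the other descendants of let
after visiting all descendants, we return to the let node and pop the symbol from the stack

Be careful: redefinition must be rejected among a function's formal parameters (e.g. f(x : Int, x : Int)), which a plain stack of symbols cannot detect. We therefore track whole scopes, and check_scope(x) tests only the current one.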

We need to implement: enter_scope(), find_symbol(x), add_symbol(x), check_scope(x), exit_scope()
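
The five operations above can be sketched as a stack of dictionaries, one dictionary per scope. This is a minimal illustration rather than a reference implementation; the class name `SymbolTable` and the `info` argument (whatever is stored for a symbol, here a type name) are assumptions made for the example.

```python
class SymbolTable:
    """A stack of scopes; each scope maps an identifier to its stored info."""

    def __init__(self):
        self.scopes = []

    def enter_scope(self):
        # Start a new, innermost scope.
        self.scopes.append({})

    def exit_scope(self):
        # Discard the innermost scope and every symbol declared in it.
        self.scopes.pop()

    def add_symbol(self, name, info):
        # Declare `name` in the innermost scope.
        self.scopes[-1][name] = info

    def find_symbol(self, name):
        # Search from the innermost scope outwards (most-closely-nested rule).
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def check_scope(self, name):
        # True only if `name` is declared in the innermost scope.
        return name in self.scopes[-1]


table = SymbolTable()
table.enter_scope()
table.add_symbol("x", "Int")
table.enter_scope()
table.add_symbol("x", "String")        # shadows the outer x
inner = table.find_symbol("x")         # innermost declaration wins
duplicate = table.check_scope("x")     # x is already declared in this scope
table.exit_scope()
outer = table.find_symbol("x")         # the outer x is visible again
```

Calling check_scope before add_symbol is how a duplicate formal parameter would be reported, since all parameters of one function live in the same scope.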

Types

The user declares types for identifiers. The compiler infers a type for every expression (type inference).

Type:

a set of values
a set of operations on those values

It is not obvious why we need types, though. Assembly language will perform any operation on any operands without raising a type error.

Types are an invention to help human reasoning.

Different Languages on Typing:

Statically Typed: C, Java
- avoids the overhead of run-time type checking
- the compiler catches programmer errors
Dynamically Typed: Python, Perl, Lisp
- less restrictive
- fast prototyping
Untyped: no type checking (machine code)

In practice, statically typed and dynamically typed languages converge:
- static people use "escapes" to do unsafe type casts
- dynamic people write optimizations and add static typing for debugging

Type Checking

Formalism:

regular expressions
context-free grammars
logical rules of inference

We will use logical rules of inference for type checking.

$\frac{\vdash \text{Hypothesis} ... \vdash \text{Hypothesis}}{\vdash \text{Conclusion}}$

The above reads: if it is provable that every Hypothesis is true, then it is provable that the Conclusion is true.
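
As a concrete instance of this form, here is the usual rule for integer addition, assuming the language has an integer type written $\text{Int}$ (the type name is an assumption for this example):

$\frac{\vdash e_1 : \text{Int} \quad \vdash e_2 : \text{Int}}{\vdash e_1 + e_2 : \text{Int}}$

It reads: if it is provable that $e_1$ has type $\text{Int}$ and that $e_2$ has type $\text{Int}$, then it is provable that $e_1 + e_2$ has type $\text{Int}$.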

Sound: a type system is sound if, whenever $\vdash e : T$, $e$ actually evaluates to a value of type $T$.

Types are computed in a bottom-up pass over the abstract syntax tree (AST).

Free Variable

Free Variable: a variable that is not defined within the expression

Example: x is free in let y in x + y, but y is not.
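
The definition can be turned into a short recursive function. The sketch below assumes a hypothetical tuple-based AST invented for this example: `("var", name)`, `("int", value)`, `("add", left, right)` and `("let", name, init, body)`, where `init` may be `None`. It also assumes the initializer is evaluated outside the scope of the variable being declared.

```python
def free_vars(expr):
    """Return the set of variable names that occur free in `expr`."""
    kind = expr[0]
    if kind == "var":
        return {expr[1]}
    if kind == "int":
        return set()
    if kind == "add":
        return free_vars(expr[1]) | free_vars(expr[2])
    if kind == "let":
        _, name, init, body = expr
        # The let binds `name` in the body only, not in the initializer.
        init_free = free_vars(init) if init is not None else set()
        return init_free | (free_vars(body) - {name})
    raise ValueError(f"unknown expression kind: {kind}")


# let y in x + y
example = ("let", "y", None, ("add", ("var", "x"), ("var", "y")))
result = free_vars(example)
```

Here `result` contains only x, because the let binds y in its body.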

Type Environment: a function $O : \text{ObjectIdentifier} \to \text{Types}$ that gives types to free variables. We write $O \vdash e : T$ to mean that, under the assumptions in $O$, $e$ has type $T$.

Overriding Types: $O[T / x]$ is $O$ modified so that $x$ maps to type $T$ and all other entries stay the same.
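
One simple way to model $O[T / x]$ in code is a dictionary copy, so the original environment is left untouched for the enclosing scope. The helper name `extend` is an assumption made for this sketch.

```python
def extend(env, name, typ):
    """Return a copy of `env` in which `name` maps to `typ` (models O[T/x])."""
    new_env = dict(env)
    new_env[name] = typ
    return new_env


O = {"x": "Int", "y": "Bool"}
O2 = extend(O, "x", "String")   # x is overridden, y is unchanged
```

Because `extend` copies, `O` still maps x to Int after the call, which is what an enclosing expression needs once the inner scope ends.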

The type environment is passed down the AST from the root to the leaves.

Subtyping

Subtyping: we define a relation $\leq$ on classes: $X \leq Y$ if $X$ (child) inherits from $Y$ (parent), directly or through a chain of parents, and $X \leq X$.

Consider if e0 then e1 else e2 fi: the result can have the type of either e1 or e2. The only thing we can guarantee statically is their closest common ancestor (the smallest supertype of both types).

lub(X, Y): least upper bound (closest common ancestor) of X and Y
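
With single inheritance, the least upper bound can be found by walking the two parent chains. The sketch below assumes a hypothetical class hierarchy stored as a child-to-parent map called `PARENT`; the class names are made up for the example.

```python
# Hypothetical hierarchy: Object is the root; B and C inherit from A.
PARENT = {"Object": None, "A": "Object", "B": "A", "C": "A", "D": "Object"}


def ancestors(cls):
    """Return [cls, parent, grandparent, ..., root]."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain


def is_subtype(x, y):
    """X <= Y if Y is X itself or one of X's ancestors."""
    return y in ancestors(x)


def lub(x, y):
    """Return the closest common ancestor of x and y."""
    x_chain = set(ancestors(x))
    # The first ancestor of y that is also an ancestor of x is the least one.
    for cls in ancestors(y):
        if cls in x_chain:
            return cls
    raise ValueError("classes do not share a root")


if_type = lub("B", "C")   # type of: if e0 then (a B) else (a C) fi
```

In this hierarchy `lub("B", "C")` is A, while `lub("B", "D")` falls back to Object.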

Signature

Function (method) and object identifiers typically live in different namespaces (there is no conflict if you declare a variable that has the same name as a function).

$M(C, f) = (T_1, ..., T_n, T_{n + 1})$

The above means that class $C$ has a method $f$ with inputs $x_1, ..., x_n$ of types $T_1, ..., T_n$ and a return type of $T_{n + 1}$.
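
In code, $M$ can be a table keyed by (class, method) whose values are the tuples $(T_1, ..., T_n, T_{n + 1})$. The sketch below is illustrative only: the class `Counter` and its methods are made up, and argument types must match the parameter types exactly here, whereas a full checker would allow subtypes and also look up inherited methods.

```python
# Hypothetical method environment: (class, method) -> (T_1, ..., T_n, T_{n+1}).
M = {
    ("Counter", "add"): ("Int", "Counter"),   # add(x : Int) : Counter
    ("Counter", "value"): ("Int",),           # value() : Int
}


def signature(M, cls, method):
    """Split a stored tuple into (parameter types, return type)."""
    *param_types, return_type = M[(cls, method)]
    return param_types, return_type


def check_call(M, cls, method, arg_types):
    """Return the type of a call, assuming exact matches (no subtyping)."""
    param_types, return_type = signature(M, cls, method)
    if list(arg_types) != param_types:
        raise TypeError(f"bad arguments for {cls}.{method}: {arg_types}")
    return return_type


call_type = check_call(M, "Counter", "add", ["Int"])
```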

Full Type Environment:

$O$ : object mapping
$M$ : methods mapping
$C$ : current class for SELF_TYPE

Only the dispatch rules actually use the method signatures $M$ (and only rules involving SELF_TYPE use $C$); all other rules need only $O$.

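The type-checking rule for Let-Init can be written as code:

TypeCheck(OMC, let x : T <- e0 in e1) = {
  T0 = TypeCheck(OMC, e0);              // type of the initializer, checked without x in scope
  T1 = TypeCheck(OMC.add(x : T), e1);   // type of the body, checked with x : T in scope
  assert subtype(T0, T);                // the initializer must conform to the declared type T
  return T1;                            // the let has the type of its body
}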
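
A runnable version of the same idea, as a sketch only: it checks the initializer against the declared type T and returns the type of the body. The tuple-based AST (`("int", n)`, `("string", s)`, `("var", name)`, `("add", l, r)`, `("let", name, declared_type, init, body)`) and the tiny class hierarchy are assumptions made for this example, and only the object environment $O$ is threaded through, since no dispatch or SELF_TYPE occurs.

```python
# Hypothetical hierarchy for the example: Int and String inherit from Object.
PARENT = {"Object": None, "Int": "Object", "String": "Object"}


def is_subtype(x, y):
    """X <= Y if Y is X itself or one of X's ancestors."""
    while x is not None:
        if x == y:
            return True
        x = PARENT[x]
    return False


def type_check(O, expr):
    """Return the type of `expr` under environment O, or raise TypeError."""
    kind = expr[0]
    if kind == "int":
        return "Int"
    if kind == "string":
        return "String"
    if kind == "var":
        if expr[1] not in O:
            raise TypeError(f"undeclared identifier: {expr[1]}")
        return O[expr[1]]
    if kind == "add":
        for operand in expr[1:]:
            if type_check(O, operand) != "Int":
                raise TypeError("operands of + must be Int")
        return "Int"
    if kind == "let":
        _, name, declared, init, body = expr
        t0 = type_check(O, init)               # initializer: x not yet in scope
        if not is_subtype(t0, declared):
            raise TypeError(f"{t0} does not conform to {declared}")
        return type_check({**O, name: declared}, body)   # body under O[T/x]
    raise ValueError(f"unknown expression kind: {kind}")


# let x : Int <- 1 in x + 2
well_typed = type_check({}, ("let", "x", "Int", ("int", 1),
                             ("add", ("var", "x"), ("int", 2))))
```

Passing `{**O, name: declared}` builds a fresh environment for the body, so the binding of x disappears automatically when the recursive call returns.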
