Semantic Analysis:
all identifiers are declared
type check
inheritance relationship
classes defined only once
method in class defined only once
reserved identifiers are not misused
...
Static Scope: scope depends only on the program text, not run-time behavior.
Dynamic Scope: scope depends on the execution of the program (e.g., Lisp).
Identifiers are introduced by:
class declarations (class name)
method definitions (method name)
let expression
formal parameters
attribute (fields in class) definitions
case expressions
Not all identifiers follow the most-closely-nested rule. A class name must be accessible from anywhere: it is globally visible and can be used before it is defined. This means that we need multiple passes over the program (in general, several simple passes are easier to maintain than one complex pass).
For a simple symbol table, we can use a stack.
Whenever we have a let expression, one of its branches will be the init.
We add a symbol (the variable) to the stack when we encounter the init.
We visit the other descendants of the let.
After visiting all descendants, we are back at the let node and pop the symbol from the stack.
Be careful: we do not want redefinition among the formal parameters of a function.
We need to implement: enter_scope(), find_symbol(x), add_symbol(x), check_scope(x), exit_scope()
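A minimal sketch of these five operations, assuming the table is a Python list of dictionaries (one per scope). The extra info argument of add_symbol (here a type name) and the choice to return None for an undeclared name are assumptions; the notes do not specify either.

```python
class SymbolTable:
    """Stack of scopes; each scope maps an identifier to recorded info."""

    def __init__(self):
        self.scopes = []

    def enter_scope(self):
        # Start a new innermost scope.
        self.scopes.append({})

    def exit_scope(self):
        # Leave the innermost scope and discard its symbols.
        self.scopes.pop()

    def add_symbol(self, name, info):
        # Declare name in the innermost scope (info is an assumed extra argument).
        self.scopes[-1][name] = info

    def find_symbol(self, name):
        # Search from the innermost scope outward; None if undeclared.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def check_scope(self, name):
        # True if name is already declared in the innermost scope.
        return name in self.scopes[-1]


table = SymbolTable()
table.enter_scope()
table.add_symbol("x", "Int")
table.enter_scope()
table.add_symbol("x", "String")    # shadows the outer x
inner = table.find_symbol("x")
table.exit_scope()
outer = table.find_symbol("x")
```

check_scope looks only at the innermost scope, which is what a redefinition check for formal parameters needs: shadowing an outer name is allowed, declaring the same name twice in one scope is not.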
The user declares types for identifiers. The compiler infers a type for every expression (type inference).
Type:
a set of values
a set of operations on those values
It is not obvious why we need types, though. Assembly language can apply any operation to any data without raising an error.
Types are an invention to help human reasoning.
Different Languages on Typing:
Statically Typed: C, Java
Dynamically Typed: Python, Perl, Lisp
Untyped: no type checking (machine code)
In practice, statically typed and dynamically typed languages converge:
static typing people use "escapes" to do unsafe type casts
dynamic typing people write optimizations and add static typing for debugging
Formalism:
regular expression
context free grammars
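logical rules of inference
We will use logical rules of inference in type checking. A rule has the form
\frac{\vdash \text{Hypothesis}_1 \quad \dots \quad \vdash \text{Hypothesis}_n}{\vdash \text{Conclusion}}
This reads: if it is provable that Hypothesis_1, ..., Hypothesis_n are true, then it is provable that the Conclusion is true.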
Sound: a type system is sound if, whenever \vdash e : T is provable, e actually evaluates to a value of type T.
Types are computed in a bottom-up pass over the abstract syntax tree (AST).
Free Variable: a variable that is not defined within the expression
Example: x is free in let y in x + y, but y is not.
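The definition can be checked mechanically. Below is a small sketch that computes the free variables of an expression, assuming a hypothetical tuple encoding of the AST with only three node kinds ("var", "add", "let"); the encoding is illustrative and not part of the notes.

```python
def free_vars(expr):
    """Return the set of variables that occur free in expr."""
    kind = expr[0]
    if kind == "var":
        # ("var", name): a variable on its own is free.
        return {expr[1]}
    if kind == "add":
        # ("add", left, right): free in either operand.
        return free_vars(expr[1]) | free_vars(expr[2])
    if kind == "let":
        # ("let", name, body): the let binds name inside body.
        return free_vars(expr[2]) - {expr[1]}
    raise ValueError(f"unknown expression kind: {kind}")


# let y in x + y
example = ("let", "y", ("add", ("var", "x"), ("var", "y")))
result = free_vars(example)
```

For the example above, result contains x only, since y is bound by the let.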
Type Environment: a function O : \text{ObjectIdentifier} \to \text{Types} that gives types to free variables. We write O \vdash e : T to mean that, assuming the free variables have the types given by O, e has type T.
Overriding Types: O[T / x] means we modify O so that x maps to type T and all other entries stay the same.
The type environment is passed down from the root of the AST to the leaves.
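One way to picture O[T / x] is as a copy of a dictionary with a single entry replaced. The sketch below assumes the environment is a plain Python dict from identifier names to type names; the helper name override is hypothetical.

```python
def override(env, name, typ):
    """Return O[T / x]: a copy of env in which name maps to typ."""
    new_env = dict(env)      # copy, so the caller's environment is untouched
    new_env[name] = typ
    return new_env


O = {"x": "Int", "y": "Bool"}
O2 = override(O, "x", "String")
```

Returning a copy rather than mutating in place means the environment seen by a parent node is unchanged after a child has been checked under an extended environment.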
Subtyping: we define a relation \leq on classes, where X \leq Y if X (child) inherits from Y (parent).
Consider if e0 then e1 else e2 fi. The result type can be either the type of e1 or the type of e2. The only thing we can guarantee is the common parent of the types of e1 and e2 (the smallest supertype of both).
lub(X, Y): least upper bound (the closest common parent of X and Y)
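With single inheritance, both the subtype test and lub can be computed by walking parent links. The sketch below assumes a hypothetical class hierarchy stored as a child-to-parent dictionary rooted at Object, and treats \leq as reflexive and transitive.

```python
# Hypothetical hierarchy: each class maps to its parent; Object is the root.
PARENT = {
    "Object": None,
    "Int": "Object",
    "Animal": "Object",
    "Dog": "Animal",
    "Cat": "Animal",
}


def ancestors(cls):
    """Return cls followed by its chain of parents up to the root."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain


def subtype(x, y):
    """X <= Y: y is x itself or one of its ancestors."""
    return y in ancestors(x)


def lub(x, y):
    """Least upper bound: the first ancestor of x that is also an ancestor of y."""
    y_ancestors = set(ancestors(y))
    for cls in ancestors(x):
        if cls in y_ancestors:
            return cls
    raise ValueError("no common ancestor")
```

Under this hierarchy, an if expression whose branches have types Dog and Cat would be given type lub(Dog, Cat), which is Animal.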
Function (method) and object identifiers typically live in different namespaces (there is no conflict if you declare a variable that has the same name as a method).
M(C, f) = (T_1, \dots, T_n, T_{n+1})
This means that in class C we have a method f with inputs x_1, ..., x_n of types T_1, ..., T_n and a return type of T_{n+1}.
Full Type Environment:
O: object mapping
M: methods mapping
C: current class for SELF_TYPE
Only the dispatch rules need the method signatures M; all other rules need only O.
To write code for TypeCheck for Let-Init, we can do:
TypeCheck(OMC, let x : T <- e0 in e1) = {
    T0 = TypeCheck(OMC, e0);
    T1 = TypeCheck(OMC.add(x : T), e1);
    assert subtype(T0, T);  // the initializer must conform to the declared type T
    return T1;
}
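The Let-Init check can be made runnable as a small Python sketch. It assumes a hypothetical tuple encoding of the AST, a three-class hierarchy, and an environment that carries only O (M and C are omitted because this rule does not use them). The check is that the type of the initializer conforms to the declared type T, while the result is the type of the body.

```python
# Hypothetical hierarchy: each class maps to its parent; Object is the root.
PARENT = {"Object": None, "Int": "Object", "String": "Object"}


def subtype(x, y):
    """X <= Y: walk up from x until y is found or the root is passed."""
    while x is not None:
        if x == y:
            return True
        x = PARENT[x]
    return False


def type_check(O, expr):
    """Return the type of expr under environment O, or raise TypeError."""
    kind = expr[0]
    if kind == "int":                      # ("int", value)
        return "Int"
    if kind == "str":                      # ("str", value)
        return "String"
    if kind == "var":                      # ("var", name)
        if expr[1] not in O:
            raise TypeError(f"undeclared identifier {expr[1]}")
        return O[expr[1]]
    if kind == "let":                      # ("let", x, T, e0, e1)
        _, name, declared, init, body = expr
        t0 = type_check(O, init)           # e0 is checked in the outer environment
        if not subtype(t0, declared):      # require T0 <= T
            raise TypeError(f"{t0} does not conform to {declared}")
        return type_check({**O, name: declared}, body)   # O[T / x] for e1
    raise ValueError(f"unknown expression kind: {kind}")


# let x : Object <- 1 in x
ok = ("let", "x", "Object", ("int", 1), ("var", "x"))
# let x : Int <- "a" in x
bad = ("let", "x", "Int", ("str", "a"), ("var", "x"))
```

Checking ok yields Object (the declared type of x, not Int), and checking bad raises TypeError because String does not conform to Int.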