Review

Integer

Comparison

If there is a mixed signed and unsigned in expression, everything will be cast to unsigned (therefore using unsigned relation).

-x = ~x + 1 (we plus 1 because ~x + x = -1) (This rule is good except for T_Min)

Multiplication

Unsigned: w bits -> 2w bits

Signed: w bits -> 2w bits

Multiplication: discard bits

Power of 2 Multiplication:

Power of 2 Division:

Endian

Endian 2

Endian 2

Normalized

Normalized: exponent neither all 0s nor all 1s

Exponent (E)

Bias(constant) = 2^{k-1} - 1

exp: the actual bits stored as exponent

E = exp(unsigned) - Bias

Significand (M)

frac: the actual bits stored as significand

M = 1.frac

Denormalized

Denormalized: exponent all 0s

Exponent (E)

E(constant) = 1 - Bias = 1 - (2^{k-1} - 1) = -(2^{k-1}-2)

Significand (M)

M = 0.frac

Special

Special: exponent all 1s

infinity: exp = all 1s and frac = all 0s Nan: exp = all 1s and frac != all 0s

Assembly

movq [Source] [Dest] (unix) - immediate (constant integer): $0x400, $-533 - register: %rax, %r13 (but %rsp is reserved for special use) - memory: 8 bytes memory: (%rax) (cannot move form memory to memory)

D(Rb, Ri, S) = (Rb + Ri * S) + D (leaq: move but without memory access)

pushq [src] (7th argument is at %rsp)

call [label]

ret: pop %rip

TODO: what instruction should we remember?

TODO: linking

Array

multi dimensional array does not align up memory within itself

Memory

Stack: run-time. 8MB limit Heap: malloc() Data: global, static variable, string constant Text: Code, Libraries

SRAM: refresh in spacial locality (so memory access with spacial locality is good)

(memory address has 4 useless 0s)

TODO: union in lecture 008

Cold (compulsory) miss: when starts empty, first reference to the block (nothing in the cache right now)

Capacity miss: when working set is larger than the cache (no enough cache space to store all needed data for a program, cache is full and your data is not here)

Conflict miss: when multiple objects all map to the same level k block (hash collision: block i at Lk+1 must be placed in block i mod 4 at Lk, referencing blocks 0, 8, 0, 8,... will miss every time)

Write-back + Write-allocate + Copy-on-Write (Anonymous file)

When consider miss rate, we simulate most inner loop execute approach infinite time. Final miss rate is the sum of miss rates.

Page Table

Purpose of Virtual Memory

N = 2^n: number of address in virtual address space M = 2^m: number of addresses in physical address space P = 2^p: page size (bytes)

Virtual Address (VA)

Physical Address (PA)

%cr3: control register, store address of page table in physical memory

Address Example: 9+9+9+9+12=48 virtual address, 40+12=52 physical address, 4KB page size (2^12 byte page table, 8 byte payload per entry, 2^9 entries)

Optimization

Optimization Blockers:

Optimization

  1. use correct data structure for asymptotic complexity
  2. Good compiler flag (-O3)
  3. Watch out for optimization blocker
    • procedure calls
    • memory reference
  4. Optimize inner most loops
  5. exploit instruction-level parallelism
  6. avoid unpredictable branches
  7. cache friendly

Linking

TODO: lecture 13

Table of Content