# Review

## Integer

### Comparison

If there is a mixed signed and unsigned in expression, everything will be cast to unsigned (therefore using unsigned relation).

-x = ~x + 1 (we plus 1 because ~x + x = -1) (This rule is good except for T_Min)

### Multiplication

Unsigned: w bits -> 2w bits

• range: [0, (w^w - 1)^2 = 2^{2w+1} + 1]

Signed: w bits -> 2w bits

• range: [-2^{2w - 2} + 2^{w - 1}, 2^{2w - 2}]

• signed vs unsigned: discarded bits are different, but lower preserved bits are the same

Power of 2 Multiplication:

• u << k = u * 2^k (both signed and unsigned)

Power of 2 Division:

• u >> k = roundDown(u / 2^k)

• (x + (0x01 << k) - 1) >> k = roundZero(x / 2 ^k)

### Normalized

Normalized: exponent neither all 0s nor all 1s

#### Exponent (E)

Bias(constant) = 2^{k-1} - 1

exp: the actual bits stored as exponent

• [1, 2^{k}-2]

• float: [1, 254]

• double: [1, 2046]

• other: [1, 2^k]

• increase as the represented number increase

E = exp(unsigned) - Bias

• [-(2^{k-1}-2), 2^{k-1}-1]

• float: [-126, 127]

• double: [-1022, 1023]

#### Significand (M)

frac: the actual bits stored as significand

• increase as the represented number increase

M = 1.frac

• float/double: [1.0, 2.0)

### Denormalized

Denormalized: exponent all 0s

#### Exponent (E)

E(constant) = 1 - Bias = 1 - (2^{k-1} - 1) = -(2^{k-1}-2)

• original exp = 0, original E = -127

• now exp = 1, now E = -126

M = 0.frac

### Special

Special: exponent all 1s

infinity: exp = all 1s and frac = all 0s Nan: exp = all 1s and frac != all 0s

## Assembly

• %rsp: stack pointer

• %rbp: base pointer

• %r10, %r11: caller-saved but not argument

• condition codes: set by cmpq, testq, jne(not equal / not zero)

• CF: Carry Flag (for unsigned)
• ZF: Zero Flag
• SF: Sign Flag (for signed)
• OF: Overflow Flag (for signed)
• ja (unsigned) may be optimized to do if (x > 6 || x < 0) when x is signed.
• cmov: val = Test ? Then_Expr : Else_Expr when safe (calculate both result. don't risk deference, side effects, expensive calculation)

movq [Source] [Dest] (unix) - immediate (constant integer): $0x400, $-533 - register: %rax, %r13 (but %rsp is reserved for special use) - memory: 8 bytes memory: (%rax) (cannot move form memory to memory)

D(Rb, Ri, S) = (Rb + Ri * S) + D (leaq: move but without memory access)

pushq [src] (7th argument is at %rsp)

• sub 0x08, %rsp

• mov [src], (%rsp)

call [label]

• push %rip

• jmp [label]

ret: pop %rip

TODO: what instruction should we remember?

## Array

multi dimensional array does not align up memory within itself

## Memory

Stack: run-time. 8MB limit Heap: malloc() Data: global, static variable, string constant Text: Code, Libraries

SRAM: refresh in spacial locality (so memory access with spacial locality is good)

(memory address has 4 useless 0s)

TODO: union in lecture 008

Cold (compulsory) miss: when starts empty, first reference to the block (nothing in the cache right now)

Capacity miss: when working set is larger than the cache (no enough cache space to store all needed data for a program, cache is full and your data is not here)

Conflict miss: when multiple objects all map to the same level k block (hash collision: block i at Lk+1 must be placed in block i mod 4 at Lk, referencing blocks 0, 8, 0, 8,... will miss every time)

Write-back + Write-allocate + Copy-on-Write (Anonymous file)

When consider miss rate, we simulate most inner loop execute approach infinite time. Final miss rate is the sum of miss rates.

### Page Table

Purpose of Virtual Memory

• Uses main memory efficiently by using DRAM as cache tool on disk

• lazy mode
• shared memory for library
• Simplifies memory management to provide abstraction of linear address space to each program

• Isolate one process' memory from another (or from kernel) to provide security

$N = 2^n$: number of address in virtual address space $M = 2^m$: number of addresses in physical address space $P = 2^p$: page size (bytes)

• VPO: virtual page offset (same as PPO)

• VPN: virtual page number (what gets translated from)

• TLBL: TBL index
• TLBT: TLB tag

• PPO: physical page offset (same as VPO)

• PPN: physical page number (what gets translated to )

%cr3: control register, store address of page table in physical memory

• $VPO = PPO = log_2(\text{page size})$ because a level of page table only stores PPN (assume PPO are zeros, therefore force table to be page-size-aligned) and some extra bits

• $log_2(\frac{\text{page size}}{\text{8 byte payload}}) * n + \text{VPO} = \text{Virtual Address}$

## Optimization

Optimization Blockers:

• Procedure Calls: cannot optimize procedural call out of loop (side-effects)

• use inline function
• Memory Aliasing: two input might be same region of memory

• add restrict keyword

Optimization

1. use correct data structure for asymptotic complexity
2. Good compiler flag (-O3)
3. Watch out for optimization blocker
• procedure calls
• memory reference
4. Optimize inner most loops
5. exploit instruction-level parallelism
6. avoid unpredictable branches
7. cache friendly