If there is a mixed signed and unsigned in expression, everything will be cast to unsigned (therefore using unsigned relation).
-x = ~x + 1 (we plus 1 because ~x + x = -1) (This rule is good except for T_Min)
Unsigned: w bits -> 2w bits
Signed: w bits -> 2w bits
Multiplication: discard bits
Power of 2 Multiplication:
Power of 2 Division:
u >> k = roundDown(u / 2^k)
(x + (0x01 << k) - 1) >> k = roundZero(x / 2 ^k)
Normalized: exponent neither all 0s nor all 1s
Bias(constant) = 2^{k-1} - 1
exp: the actual bits stored as exponent
[1, 2^{k}-2]
float: [1, 254]
double: [1, 2046]
other: [1, 2^k]
increase as the represented number increase
E = exp(unsigned) - Bias
[-(2^{k-1}-2), 2^{k-1}-1]
float: [-126, 127]
double: [-1022, 1023]
frac: the actual bits stored as significand
M = 1.frac
Denormalized: exponent all 0s
E(constant) = 1 - Bias = 1 - (2^{k-1} - 1) = -(2^{k-1}-2)
original exp = 0, original E = -127
now exp = 1, now E = -126
M = 0.frac
Special: exponent all 1s
infinity: exp = all 1s and frac = all 0s Nan: exp = all 1s and frac != all 0s
%rsp
: stack pointer
%rbp
: base pointer
%r10, %r11
: caller-saved but not argument
condition codes: set by cmpq
, testq
, jne
(not equal / not zero)
CF
: Carry Flag (for unsigned)ZF
: Zero FlagSF
: Sign Flag (for signed)OF
: Overflow Flag (for signed)ja
(unsigned) may be optimized to do if (x > 6 || x < 0)
when x is signed.cmov
: val = Test ? Then_Expr : Else_Expr
when safe (calculate both result. don't risk deference, side effects, expensive calculation)movq [Source] [Dest]
(unix)
- immediate (constant integer): $0x400
, $-533
- register: %rax
, %r13
(but %rsp
is reserved for special use)
- memory: 8 bytes memory: (%rax)
(cannot move form memory to memory)
D(Rb, Ri, S) = (Rb + Ri * S) + D
(leaq
: move but without memory access)
pushq [src]
(7th argument is at %rsp
)
sub 0x08, %rsp
mov [src], (%rsp)
call [label]
push %rip
jmp [label]
ret
: pop %rip
TODO: what instruction should we remember?
TODO: linking
multi dimensional array does not align up memory within itself
Stack: run-time. 8MB limit Heap: malloc() Data: global, static variable, string constant Text: Code, Libraries
SRAM: refresh in spacial locality (so memory access with spacial locality is good)
(memory address has 4 useless 0s)
TODO: union in lecture 008
Cold (compulsory) miss: when starts empty, first reference to the block (nothing in the cache right now)
Capacity miss: when working set is larger than the cache (no enough cache space to store all needed data for a program, cache is full and your data is not here)
Conflict miss: when multiple objects all map to the same level k block (hash collision: block i
at Lk+1
must be placed in block i mod 4
at Lk
, referencing blocks 0, 8, 0, 8,...
will miss every time)
Write-back + Write-allocate + Copy-on-Write (Anonymous file)
When consider miss rate, we simulate most inner loop execute approach infinite time. Final miss rate is the sum of miss rates.
Purpose of Virtual Memory
Uses main memory efficiently by using DRAM as cache tool on disk
Simplifies memory management to provide abstraction of linear address space to each program
Isolate one process' memory from another (or from kernel) to provide security
N = 2^n: number of address in virtual address space M = 2^m: number of addresses in physical address space P = 2^p: page size (bytes)
Virtual Address (VA)
VPO: virtual page offset (same as PPO)
VPN: virtual page number (what gets translated from)
Physical Address (PA)
PPO: physical page offset (same as VPO)
PPN: physical page number (what gets translated to )
%cr3
: control register, store address of page table in physical memory
Address Example: 9+9+9+9+12=48 virtual address, 40+12=52 physical address, 4KB page size (2^12 byte page table, 8 byte payload per entry, 2^9 entries)
VPO = PPO = log_2(\text{page size}) because a level of page table only stores PPN (assume PPO are zeros, therefore force table to be page-size-aligned) and some extra bits
log_2(\frac{\text{page size}}{\text{8 byte payload}}) * n + \text{VPO} = \text{Virtual Address}
Optimization Blockers:
Procedure Calls: cannot optimize procedural call out of loop (side-effects)
Memory Aliasing: two input might be same region of memory
restrict
keywordOptimization
TODO: lecture 13
Table of Content