Reading from Address:
(primitive datatypes should not span multiple cache lines, but this can happen)
Direct Mapped: there is 1 line per set
E-Way Associative: there are E lines per set (no two lines within a set have the same tag)
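The set/tag lookup above can be sketched as address arithmetic. A minimal sketch, assuming a cache with `num_sets` sets and `block_size`-byte blocks (both powers of two); the function name and parameters are illustrative, not from the source:

```python
def decompose(addr, num_sets=64, block_size=64):
    """Split an address into (tag, set index, block offset).

    Assumed geometry: 64 sets x 64-byte blocks (typical L1-like values).
    """
    offset = addr % block_size                      # which byte within the block
    set_index = (addr // block_size) % num_sets     # which set to search
    tag = addr // (block_size * num_sets)           # compared against stored tags
    return tag, set_index, offset

# Example: address 0x12345 with the default geometry
tag, s, off = decompose(0x12345)   # -> (18, 13, 5)
```

A direct-mapped cache checks the single line in set `s`; an E-way cache compares the tag against all E lines in that set.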
Background: copies of data exist in L1, L2, L3, Main-Mem, Disk
Write-hit Policy
Write-through: update all copies in all locations
Write-back: defer the write until the line is replaced (evicted)
Dirty bit: one per line, indicating whether one or more bytes have been updated
Write-miss Policy
Write-allocate: load into cache, then update
No-write-allocate: write straight to memory, without loading to cache
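The traffic difference between the two write-hit policies can be sketched with a toy trace. A minimal sketch under stated assumptions: a single cached line, a hypothetical trace of writes followed by an eviction; function and trace names are illustrative:

```python
def memory_writes(trace, policy):
    """Count writes that reach main memory for one cached line.

    policy: "through" (write-through) or "back" (write-back).
    Assumes every "w" in the trace hits the same cached line.
    """
    dirty = False
    writes = 0
    for op in trace:
        if op == "w":
            if policy == "through":
                writes += 1        # write-through: memory updated on every write
            else:
                dirty = True       # write-back: just mark the line dirty
        elif op == "evict" and dirty:
            writes += 1            # write-back pays a single write at eviction
            dirty = False
    return writes

trace = ["w", "w", "w", "w", "evict"]
# write-through: 4 memory writes; write-back: 1 (only at eviction)
```

This is why write-back pairs naturally with write-allocate: repeated writes to a cached line cost only one memory write.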
Typical Combination:
i-cache: stores code, which doesn't change (so writes are not an issue)
d-cache: stores data
Miss Rate: percentage of accesses that miss
Hit Time: time to check and deliver a line to the CPU (typically 4 cycles for L1, 10 cycles for L2)
Miss Penalty: typically 50-200 cycles for main memory
(Therefore, 99% hits is twice as good as 97% hits: average access time is dominated by the miss penalty, which is why we talk about miss rate rather than hit rate)
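The "twice as good" claim follows from the standard average memory access time formula. A quick sketch, assuming a 1-cycle hit time and 100-cycle miss penalty (illustrative values consistent with the ranges above):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: every access pays the hit time,
    and a fraction miss_rate additionally pays the miss penalty."""
    return hit_time + miss_rate * miss_penalty

# 99% hits -> 1 + 0.01 * 100 = 2 cycles on average
# 97% hits -> 1 + 0.03 * 100 = 4 cycles on average: twice as slow
a99 = amat(1, 0.01, 100)
a97 = amat(1, 0.03, 100)
```

Since the miss penalty dwarfs the hit time, a small change in miss rate moves the average a lot, so the miss rate is the number worth tracking.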
Read throughput: read bandwidth
Memory Mountain: measured read throughput as a function of spatial and temporal locality
Aggressive Pre-fetching: guessing the access pattern to fetch data before the CPU requests it
Spatial Locality: throughput slopes downward as stride increases
Temporal Locality: throughput steps down from L1 to L2 to L3... as more bytes need to be accessed (the working set grows)
We only care about the innermost loop
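The stride effect on spatial locality can be estimated by counting how many distinct cache lines a strided scan touches. A minimal sketch, assuming 8-byte elements and 64-byte lines (illustrative values; the function name is not from the source):

```python
def stride_miss_rate(n_elems, stride, elem_size=8, line_size=64):
    """Estimate the miss rate of a cold strided scan over an array:
    the first touch of each cache line is a miss, later touches hit."""
    lines = set()
    accesses = 0
    for i in range(0, n_elems, stride):
        lines.add(i * elem_size // line_size)   # which line this element lives in
        accesses += 1
    return len(lines) / accesses

# stride 1: 8 elements share each 64-byte line -> miss rate 1/8
# stride 8: every access lands on a fresh line  -> miss rate 1
```

This is exactly the downward slope along the stride axis of the memory mountain: larger strides waste more of each fetched line.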
From a YouTube visualization:
red indicates a cache miss
yellow indicates a cache hit
Optimized
No Blocking: $\frac{9}{8}n^3$ misses
Blocking: $\frac{1}{4B}n^3$ misses
Use the largest block size $B$ such that $3B^2 < C$ (fit three blocks in the cache!)
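The blocking idea above can be sketched as a tiled matrix multiply: work on $B \times B$ tiles so that three tiles (one each from A, B, and C) stay resident in the cache. A minimal sketch, assuming $n$ is a multiple of the block size; plain Python lists for clarity, not speed:

```python
def blocked_matmul(A, Bmat, n, B=2):
    """Blocked (tiled) n x n matrix multiply, C = A * Bmat.

    Assumes n is a multiple of the block size B. The three inner loops
    touch only one B x B tile of each matrix, so all the data they need
    fits in cache when 3*B^2 elements fit (the 3B^2 < C rule above).
    """
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, B):           # tile row of C
        for jj in range(0, n, B):       # tile column of C
            for kk in range(0, n, B):   # tile along the shared dimension
                for i in range(ii, ii + B):
                    for j in range(jj, jj + B):
                        s = C[i][j]
                        for k in range(kk, kk + B):
                            s += A[i][k] * Bmat[k][j]
                        C[i][j] = s
    return C
```

The unblocked version streams entire rows and columns through the cache each iteration; the blocked version reuses each loaded tile $B$ times, which is where the $\frac{1}{4B}n^3$ miss count comes from.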