Lecture 011

Basics of Malloc

void *malloc(size_t size):

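for small heap allocations
success: returns a pointer to a memory block of at least size bytes, aligned to 16 bytes (on x86-64); if size == 0, returns NULL (standard C leaves this case implementation-defined: either NULL or a unique pointer that may be passed to free)
fail: returns NULL and sets errno to ENOMEM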

void free(void *p):

returns the block pointed to by p to the pool of available memory
requires: p must come from a previous call to malloc, calloc, or realloc

Other functions:

calloc: like malloc, but initializes the allocated block to zero
realloc: changes the size of a previously allocated block
sbrk: used only internally by the allocator to grow or shrink the heap through a system call
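
A minimal usage sketch of the library calls above. The helper names `copy_ints`, `grow_ints`, and `is_aligned16` are hypothetical and exist only for this illustration; the calls themselves are the standard `<stdlib.h>` functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of the first n ints of src, or NULL on failure. */
int *copy_ints(const int *src, size_t n) {
    assert(src != NULL);
    int *dst = malloc(n * sizeof *dst);  /* contents are uninitialized */
    if (dst == NULL) return NULL;        /* malloc failed (or n == 0 on some systems) */
    memcpy(dst, src, n * sizeof *dst);
    return dst;
}

/* Resizes a buffer with realloc. On failure NULL is returned and buf is untouched. */
int *grow_ints(int *buf, size_t new_n) {
    int *tmp = realloc(buf, new_n * sizeof *tmp);
    return tmp;
}

/* True if p sits on a 16-byte boundary, the alignment described above. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p % 16) == 0;
}
```

If `grow_ints` returns NULL, the original buffer is still valid and the caller must still free it. Every pointer obtained from `malloc`, `calloc`, or `realloc` is released with exactly one call to `free`.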

Programming Assignment

Simple Implementation

In mm-reference.c, implement mm_malloc and mm_free

word size: 8 bytes (nothing is smaller than 8 bytes)
alignment: 2 words = 16 bytes

Explicit Allocators:

no restriction on the requested size
cannot reorder or buffer allocation requests
must place blocks in free memory (allocated blocks may not overlap)
must satisfy 16-byte alignment
cannot modify allocated memory, but can freely manipulate free space
cannot move allocated blocks (except in realloc, which may move only the block passed to it)

Performance

Aggregate payload: $P_k$

malloc(p) results in a block with a payload of p bytes
$P_k$ is the sum of the payloads that are allocated and not yet freed after request $k$

Current heap size: $H_k$

$H_k$ is monotonically non-decreasing
the heap grows only when the allocator calls sbrk
the heap size includes overhead

Overhead (after k+1 requests): $O_k = H_k / (\max_{i \leq k} P_i) - 1.0$

fraction of heap space that is not program data

Utilization: $\frac{\text{peak payload}}{\text{heap size}}$
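
The two metrics above can be computed from a recorded trace. The sketch below assumes the trace is given as two arrays holding $P_i$ and $H_i$ after each request; the type `heap_stats_t` and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Peak payload (max over i <= k of P_i) and current heap size H_k for one trace. */
typedef struct {
    size_t peak_payload;
    size_t heap_size;
} heap_stats_t;

/* payload[i] holds P_i and heap[i] holds H_i after request i, for i = 0..n-1. */
heap_stats_t summarize(const size_t *payload, const size_t *heap, size_t n) {
    assert(payload != NULL && heap != NULL && n > 0);
    heap_stats_t s = { 0, heap[n - 1] };
    for (size_t i = 0; i < n; i++) {
        if (payload[i] > s.peak_payload) s.peak_payload = payload[i];
    }
    return s;
}

/* Overhead: heap size divided by peak payload, minus 1.0. */
double overhead(heap_stats_t s) {
    assert(s.peak_payload > 0);
    return (double)s.heap_size / (double)s.peak_payload - 1.0;
}

/* Utilization: peak payload divided by heap size. */
double utilization(heap_stats_t s) {
    assert(s.heap_size > 0);
    return (double)s.peak_payload / (double)s.heap_size;
}
```

For a trace with payloads 100, 300, 200, 400, 150 and a final heap size of 512 bytes, the peak payload is 400, so the overhead is 512 / 400 - 1.0 = 0.28 and the utilization is 400 / 512, about 0.78.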

[Figure: simple benchmark visualization, 10 blocks]

Fragmentation

Internal Fragmentation

alignment requirements (padding)
headers that store bookkeeping information about an allocated block (before or after the payload); since the header is one word (8 bytes), a block can span at most $2^{64}$ bytes of virtual memory
other policies, such as a minimum size for an allocated block
[Figure: perfect fit]
Perfect fit: 1.5% overhead (assuming allocated blocks can be moved freely)
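
The three sources of internal fragmentation listed above (alignment padding, the header, and a minimum block size) can be seen in a per-request calculation. This is a sketch under stated assumptions: allocated blocks carry a header but no footer, and the minimum block size `MIN_BLOCK` is 32 bytes. Both choices, and all names, are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define WSIZE     ((size_t)8)   /* one header word */
#define ALIGNMENT ((size_t)16)  /* block sizes are multiples of 16 bytes */
#define MIN_BLOCK ((size_t)32)  /* assumed minimum block size */

/* Rounds n up to the nearest multiple of m. */
size_t round_up(size_t n, size_t m) {
    assert(m > 0);
    return m * ((n + m - 1) / m);
}

/* Block size needed for a payload: header plus payload, aligned, at least MIN_BLOCK. */
size_t block_size_for(size_t payload) {
    size_t asize = round_up(payload + WSIZE, ALIGNMENT);
    return asize < MIN_BLOCK ? MIN_BLOCK : asize;
}

/* Bytes of the block that do not hold program data. */
size_t internal_fragmentation(size_t payload) {
    return block_size_for(payload) - payload;
}
```

Under these assumptions a 24-byte request fits a 32-byte block exactly (8 bytes of header), while a 25-byte request needs a 48-byte block and wastes 23 bytes.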

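External Fragmentation: the free memory is sufficient in aggregate, but it is split into non-contiguous pieces, none of which is large enough for the request

Best Fit: 8.3% overhead
[Figure: best fit]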

Implementation

Tracking the Size

Header: a word stored before the payload that records the block size (in bytes, including the header itself, the payload, and any padding)

note: the header itself is not 16-byte aligned; the payload is. TODO: why 16-byte alignment?
[Figure: visualization of header]

Tracking the Next Free Block

Implicit list: use the block size (in the header) plus an extra tag (marking the block allocated or free) to walk the heap as a linked list; a footer is added when the list must also be walked backward for coalescing. Not used in practice for malloc/free because allocation takes linear time.

more jumps
allocate: linear time in the worst case (in the total number of blocks)
free: constant time in the worst case
overhead: depends on the placement policy
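
The linear-time walk over an implicit list can be sketched on a toy heap stored as an array of words. The layout here is an assumption made for the example: each block starts with one header word holding the size in bytes with the allocated tag in the lowest bit, and a header of size 0 marks the end of the heap. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word_t;

#define WSIZE     ((size_t)8)
#define NOT_FOUND ((size_t)-1)

/* Size in bytes stored in a header word (low 4 bits masked off). */
size_t word_size(word_t h)  { return (size_t)(h & ~(word_t)0xF); }

/* Allocated tag stored in the lowest bit of a header word. */
int    word_alloc(word_t h) { return (int)(h & 0x1); }

/* Walks headers starting at word index `first`, hopping by each block's size.
 * Returns the word index of the first free block of at least asize bytes,
 * or NOT_FOUND once the size-0 end marker is reached. */
size_t implicit_first_fit(const word_t *heap, size_t first, size_t asize) {
    assert(heap != NULL);
    size_t i = first;
    while (word_size(heap[i]) != 0) {
        size_t size = word_size(heap[i]);
        if (!word_alloc(heap[i]) && size >= asize) return i;
        i += size / WSIZE;  /* jump to the next header */
    }
    return NOT_FOUND;
}
```

Every block, allocated or free, is visited on the way, which is why the search cost grows with the total number of blocks rather than the number of free blocks.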

Explicit list: store the address of the next free block inside the freed space

fewer jumps
allocate: linear time in the worst case (in the number of free blocks only, so faster when memory is nearly full)
overhead: more internal fragmentation, because the smallest block must be at least 4 words (header, two pointers, footer) instead of 2 (header plus a 1-word payload)
requires: the pointers are stored only in freed blocks

Segregated free list: multiple linked lists, each tracking free blocks of a different size class

allocate: O(log(n)) for power-of-two size classes
overhead: first-fit search within a class approximates best-fit search (the two are equivalent when every size has its own class)

Blocks sorted by size: a balanced tree (e.g., a red-black tree) with the pointers stored inside each free block and the block length used as the key

Policy Summary

Placement policy:

First fit: search the list from the beginning and choose the first free block that fits; tends to leave fragments near the beginning of the list
Next fit: like first fit, but start the search where the previous search finished; worse fragmentation
Best fit: choose the free block that fits with the fewest bytes left over (still greedy)
[Figure: fit lines performance]
[Figure: segregated free lists]
These policies trade lower throughput for less fragmentation
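
The three placement policies can be compared on a plain array of free-block sizes. This is only an illustration of the selection rule, not an allocator: the array stands in for the free list, and the function names and the `NO_FIT` marker are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NO_FIT ((size_t)-1)

/* First fit: the lowest index whose free block is large enough. */
size_t first_fit_index(const size_t *free_sizes, size_t n, size_t asize) {
    for (size_t i = 0; i < n; i++) {
        if (free_sizes[i] >= asize) return i;
    }
    return NO_FIT;
}

/* Next fit: like first fit, but start at *rover and wrap around once.
 * On success *rover is moved to the chosen index. */
size_t next_fit_index(const size_t *free_sizes, size_t n, size_t asize,
                      size_t *rover) {
    assert(rover != NULL);
    for (size_t step = 0; step < n; step++) {
        size_t i = (*rover + step) % n;
        if (free_sizes[i] >= asize) {
            *rover = i;
            return i;
        }
    }
    return NO_FIT;
}

/* Best fit: the fitting block that leaves the fewest bytes over. */
size_t best_fit_index(const size_t *free_sizes, size_t n, size_t asize) {
    size_t best = NO_FIT;
    for (size_t i = 0; i < n; i++) {
        if (free_sizes[i] < asize) continue;
        if (best == NO_FIT || free_sizes[i] < free_sizes[best]) best = i;
    }
    return best;
}
```

With free sizes 64, 32, 128, 48 and a 48-byte request, first fit picks the 64-byte block at index 0, best fit picks the exact 48-byte block at index 3, and next fit with the rover at index 1 picks the 128-byte block at index 2.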

Splitting policy:

How much internal fragmentation are we willing to tolerate?

Coalescing policy:

Immediate coalescing: coalesce each time free is called
Deferred coalescing: try to improve performance of free by deferring coalescing until needed.

Segregated List (Seglist)

Seglist: keep a separate free list for each size class (one class per size for small sizes, one class per power-of-two range $[2^i + 1, 2^{i+1}]$ for larger sizes)

size classes: 32, 48, 64, 80, 96, 112, 128, 256, 512, 1024, ... (the assignment allows 128 bytes of global space, enough for 16 list pointers)
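
One way to map a block size to a class index for the classes listed above is sketched below. It assumes 16 classes, one class per 16-byte step up to 128, power-of-two upper bounds from 256 onward, and a final class that catches everything larger. The function name and these boundary choices are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CLASSES ((size_t)16)   /* 16 list pointers = 128 bytes */
#define MIN_BLOCK   ((size_t)32)
#define ALIGNMENT   ((size_t)16)
#define LAST_SMALL  ((size_t)128)  /* largest size with its own class */

/* Maps an aligned block size to its size-class index in [0, NUM_CLASSES). */
size_t size_class(size_t asize) {
    assert(asize >= MIN_BLOCK && asize % ALIGNMENT == 0);
    if (asize <= LAST_SMALL) {
        return (asize - MIN_BLOCK) / ALIGNMENT;  /* 32 -> 0, 48 -> 1, ..., 128 -> 6 */
    }
    size_t index = 7;     /* class 7 holds sizes in (128, 256] */
    size_t upper = 256;
    while (asize > upper && index < NUM_CLASSES - 1) {
        upper *= 2;       /* next class covers up to twice as much */
        index++;
    }
    return index;         /* the last class holds every larger size */
}
```

The small classes are found with one subtraction and one division; the larger ones take a number of steps that grows with the logarithm of the size.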

Placement policy:

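search the appropriate list for a block of size $m \geq n$
if found: split the block and place the leftover fragment on the appropriate list; else: try the next larger class
repeat until a block is found; if none is found, request more memory with sbrk()

Freeing policy: coalesce, then place the block on the appropriate list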
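
The search order of the placement policy above (own class first, then each larger class) can be sketched with a toy seglist. To keep the example short it uses only 4 hypothetical classes, singly-linked nodes, and no splitting or coalescing; `free_node_t`, `toy_class`, `seg_insert`, and `seg_find_fit` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CLASSES ((size_t)4)  /* toy value for the example */

typedef struct free_node {
    size_t size;                 /* block size in bytes */
    struct free_node *next;      /* next free block in the same class */
} free_node_t;

free_node_t *seg_lists[4];       /* one list head per class, initially NULL */

/* Toy classes: up to 32, up to 64, up to 128, and everything larger. */
size_t toy_class(size_t size) {
    if (size <= 32) return 0;
    if (size <= 64) return 1;
    if (size <= 128) return 2;
    return 3;
}

/* Freeing side: place a block at the head of the list for its class. */
void seg_insert(free_node_t *node) {
    assert(node != NULL);
    size_t c = toy_class(node->size);
    node->next = seg_lists[c];
    seg_lists[c] = node;
}

/* Searches the class for n, then each larger class. Unlinks and returns the
 * first block with size >= n, or NULL if the caller must extend the heap. */
free_node_t *seg_find_fit(size_t n) {
    for (size_t c = toy_class(n); c < NUM_CLASSES; c++) {
        free_node_t **link = &seg_lists[c];
        while (*link != NULL) {
            if ((*link)->size >= n) {
                free_node_t *hit = *link;
                *link = hit->next;
                hit->next = NULL;
                return hit;
            }
            link = &(*link)->next;
        }
    }
    return NULL;
}
```

A NULL result corresponds to the last step of the policy, where the allocator would request more memory with sbrk().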

Explicit Free List

Explicit List:

allocated block (multiple of 16 bytes)
- header (1 word)
  - previous allocated (1 bit)
  - allocated (1 bit)
- payload
free block (multiple of 16 bytes)
- header (1 word)
  - previous allocated (1 bit)
  - allocated (1 bit)
- previous free address (1 word = 8 bytes)
- next free address (1 word = 8 bytes)
- footer (1 word)
  - previous allocated (1 bit)
  - allocated (1 bit)

Splitting Policy: the leftover free block stays where the original free block was in the list

[Figure: splitting policy]

Freeing Policy:

Address-ordered: O(n) time to insert, don't use
Address-unordered: O(1) time to insert, worse fragmentation
- Last-in-first-out: insert at the beginning of the list
- First-in-first-out: insert at the end of the list
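
The two address-unordered insertion policies differ only in which end of the doubly-linked list receives the freed block. A minimal sketch follows; the types `free_block_t` and `free_list_t` and the function names are hypothetical, and in a real allocator the two pointers would live inside the free block itself.

```c
#include <assert.h>
#include <stddef.h>

typedef struct free_block {
    struct free_block *prev;  /* previous free block, NULL at the head */
    struct free_block *next;  /* next free block, NULL at the tail */
} free_block_t;

typedef struct {
    free_block_t *head;
    free_block_t *tail;
} free_list_t;

/* Last-in-first-out: the freed block becomes the new head. O(1). */
void insert_lifo(free_list_t *list, free_block_t *b) {
    assert(list != NULL && b != NULL);
    b->prev = NULL;
    b->next = list->head;
    if (list->head != NULL) list->head->prev = b;
    else list->tail = b;
    list->head = b;
}

/* First-in-first-out: the freed block becomes the new tail. O(1). */
void insert_fifo(free_list_t *list, free_block_t *b) {
    assert(list != NULL && b != NULL);
    b->next = NULL;
    b->prev = list->tail;
    if (list->tail != NULL) list->tail->next = b;
    else list->head = b;
    list->tail = b;
}

/* Unlinks a block, for example when it is allocated or coalesced. O(1). */
void remove_block(free_list_t *list, free_block_t *b) {
    assert(list != NULL && b != NULL);
    if (b->prev != NULL) b->prev->next = b->next;
    else list->head = b->next;
    if (b->next != NULL) b->next->prev = b->prev;
    else list->tail = b->prev;
    b->prev = NULL;
    b->next = NULL;
}
```

Both insertions and the removal touch a constant number of pointers, which is where the O(1) cost of the address-unordered policy comes from.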

Coalescing Policy:

[Figures: coalescing cases 1 to 4, doubly-linked list]
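
The four coalescing cases can be sketched with boundary tags on a toy heap stored as an array of words. The assumptions here are for illustration only: every block has both a header and a footer, sentinel words marked allocated sit at both ends of the heap, and the case numbering follows the common textbook order (1: both neighbors allocated, 2: next free, 3: previous free, 4: both free). Free-list bookkeeping is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word_t;

#define WSIZE ((size_t)8)

word_t pack(size_t size, int alloc) { return (word_t)size | (alloc ? 1u : 0u); }
size_t size_of(word_t w)            { return (size_t)(w & ~(word_t)0xF); }
int    alloc_of(word_t w)           { return (int)(w & 0x1); }

/* Writes matching header and footer for the block whose header is heap[i]. */
void write_block(word_t *heap, size_t i, size_t size, int alloc) {
    assert(size >= 16 && size % 16 == 0);
    heap[i] = pack(size, alloc);
    heap[i + size / WSIZE - 1] = pack(size, alloc);
}

/* Merges the free block at word index i with any free neighbors.
 * Returns the word index of the merged block's header. */
size_t coalesce(word_t *heap, size_t i) {
    assert(i >= 1 && !alloc_of(heap[i]));
    size_t size = size_of(heap[i]);
    word_t prev_footer = heap[i - 1];             /* footer of the previous block */
    word_t next_header = heap[i + size / WSIZE];  /* header of the next block */

    if (alloc_of(prev_footer) && alloc_of(next_header)) {
        return i;                                 /* case 1: nothing to merge */
    }
    if (!alloc_of(next_header)) {
        size += size_of(next_header);             /* cases 2 and 4: absorb next */
    }
    if (!alloc_of(prev_footer)) {
        size_t prev_size = size_of(prev_footer);  /* cases 3 and 4: absorb previous */
        i -= prev_size / WSIZE;
        size += prev_size;
    }
    write_block(heap, i, size, 0);
    return i;
}
```

Because the sentinels at both ends are marked allocated, `coalesce` never needs a special check for the first or last block of the heap.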

Implicit List

Implicit list:

header: one word that stores the size of the whole block (header, payload, and footer; always a multiple of 16 bytes = 2 words)
- TODO: can header take up space reserved for alignment?
- allocated tag: 1 bit, stored in the lowest bit of the header, recording whether the block is allocated or free (block sizes are multiples of 16 bytes, so the low four bits of the size word are always zero; the lowest one holds the tag)
- [Figure: previous allocated tag]
- previous allocated tag: 1 bit, updated during free; the previous block is read and coalesced only when this tag is 0
  - [Figures: free cases 1 to 4]
  - during free, if the tag is 1, do nothing to the previous block (preserve the [previous allocated tag])
  - if the tag is 0, merge with the previous block by reading the previous footer (preserve the previous footer)
  - after free, update the next block's [previous allocated tag]
payload: the start of the payload is aligned to 16 bytes
footer: one word, a copy of the header, used to coalesce left (backward); exists only in free blocks
beginning and end of the heap: a heap header and footer act as sentinels, so that freeing a block never has to check whether it sits at the beginning or end of the heap
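
The header encoding described above can be sketched as a few bit operations. The allocated tag in the lowest bit comes from the notes; placing the previous allocated tag in the second-lowest bit is an assumption for this example, as are all the names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word_t;

#define ALLOC_MASK      ((word_t)0x1)    /* lowest bit: this block is allocated */
#define PREV_ALLOC_MASK ((word_t)0x2)    /* assumed position of the previous allocated tag */
#define SIZE_MASK       (~(word_t)0xF)   /* sizes are multiples of 16 */

/* Packs a block size and both tags into one header word. */
word_t pack_header(size_t size, bool alloc, bool prev_alloc) {
    assert((size & 0xF) == 0);
    word_t w = (word_t)size;
    if (alloc) w |= ALLOC_MASK;
    if (prev_alloc) w |= PREV_ALLOC_MASK;
    return w;
}

size_t header_size(word_t h)       { return (size_t)(h & SIZE_MASK); }
bool   header_alloc(word_t h)      { return (h & ALLOC_MASK) != 0; }
bool   header_prev_alloc(word_t h) { return (h & PREV_ALLOC_MASK) != 0; }

/* Returns h with the previous allocated tag replaced, as done on the next
 * block's header after its left neighbor is freed or allocated. */
word_t set_prev_alloc(word_t h, bool prev_alloc) {
    return prev_alloc ? (h | PREV_ALLOC_MASK) : (h & ~PREV_ALLOC_MASK);
}
```

A 48-byte allocated block whose left neighbor is free is stored as 0x31: the size 0x30 with the lowest bit set and the previous allocated bit clear.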

Implicit list visualizations:

[Figures: overall picture with start and end block, splitting, right coalesce, boundary tags]

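#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word_t;
typedef struct block {
  word_t header;             // block size | allocated bit
  unsigned char payload[0];  // zero-length array (GNU extension): starts right after the header
} block_t;

// @brief from a block to its payload
void *get_payload(block_t *block) {
  return (void *)(block->payload);
}

// @brief from a payload pointer back to its block
block_t *get_block(void *bp) {
  return (block_t *) ((unsigned char *) bp - offsetof(block_t, payload));
}

// @brief lowest header bit: true if allocated, false if free
bool get_alloc(block_t *block) {
  return block->header & 0x01;
}

// @brief block size in bytes (low 4 bits of the header masked off)
uint64_t get_size(block_t *block) {
  return block->header & ~(word_t)0xF;
}

block_t new_block(...) {
  ...
  block->header = size | 0x01;  // store the size and mark the block allocated
}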

static block_t *find_next(block_t *block) {
  return (block_t *) ((unsigned char *) block + get_size(block));
}

// @brief choose first free block that fits (11.9% overhead)
//        O(n_block) time
//        may cause fragments at the beginning
static block_t *first_fit(size_t asize) {
  block_t *block;
  for (block = heap_start; block != heap_end; block = find_next(block)) {
    if (!get_alloc(block) && asize <= get_size(block)) return block;
  }
  return NULL;
}

// @brief search starting where previous search finished (21.6% overhead)
//        faster than [first_fit] to avoid re-scanning
//        but worse fragmentation
static block_t *next_fit(size_t asize) {

}

// @brief search entire list, choose best free block (8.3% overhead)
//        with fewest bytes left over
//        improves memory utilization
//        slower than [first_fit]
//        greedy algorithm, does not guarantee optimality
static block_t *best_fit(size_t asize) {

}

// @brief perfect fit, not achievable (1.6% overhead)
static block_t *perfect_fit(size_t asize) {

}

// @WARNING: incomplete code
void write_header(block_t *block, size_t asize, bool allocated) {
  ...
}
static void split_block(block_t *block, size_t asize) {
  size_t block_size = get_size(block);

  if ((block_size - asize) >= min_block_size) {
    write_header(block, asize, true);
    write_footer(block, asize, true);
    block_t *block_next = find_next(block);
    // below code should assert (block_size - asize > 0)
    write_header(block_next, block_size - asize, false);
    write_footer(block_next, block_size - asize, false);
    ...
  }
  ...
}

// free will `coalesce_right` two free blocks together

// @brief from header of a block to its footer
const size_t dsize = 2*sizeof(word_t);
static word_t *header_to_footer(block_t *block) {
  size_t asize = get_size(block);
  return (word_t *) (block->payload + asize - dsize);
}

static word_t *find_prev_footer(block_t *block) {
  return &(block->header) - 1;
}

void *mm_malloc(size_t size) {
  // a request for 0 bytes returns NULL
  if (size == 0) return NULL;

  // block size must be a multiple of 16 bytes so that
  // every payload starts on a 16-byte boundary
  // round_up(n, m) = m * ((n + m - 1) / m)
  size_t asize = round_up(size + dsize, dsize);
  block_t *block = find_fit(asize);

  // no fit: a full allocator would extend the heap with sbrk here
  if (block == NULL) return NULL;

  size_t block_size = get_size(block);
  write_header(block, block_size, true);
  write_footer(block, block_size, true);

  split_block(block, asize);

  return get_payload(block);
}

void mm_free(void *bp) {
  if (bp == NULL) return;  // free(NULL) is a no-op

  block_t *block = get_block(bp);
  size_t size = get_size(block);

  write_header(block, size, false);
  write_footer(block, size, false);

  coalesce_block(block);
}

Additional Information

D. Knuth, The Art of Computer Programming, vol 1, 3rd edition, Addison Wesley, 1997 (The classic reference on dynamic storage allocation)

Wilson et al, “Dynamic Storage Allocation: A Survey and Critical Review”, Proc. 1995 Int’l Workshop on Memory Management, Kinross, Scotland, Sept, 1995. (Comprehensive survey on csapp.cs.cmu.edu)
