Lecture 007

Macros

'#define': use it for a number that cannot be derived from the program itself, often a hardware specification constant. If the hardware settings change later, you update one definition instead of doing a find-and-replace (and likely missing some occurrences).
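As a small illustration, here is a sketch that assumes an 80x25 text console. The names CONSOLE_WIDTH, CONSOLE_HEIGHT, and cell_index are made up for this example and do not come from the lecture.

```c
#include <assert.h>

/* Hypothetical hardware constants: an 80x25 text console is assumed. */
#define CONSOLE_WIDTH  80
#define CONSOLE_HEIGHT 25

/* Index of the character cell at (row, col) in a row-major buffer. */
int cell_index(int row, int col)
{
    assert(row >= 0 && row < CONSOLE_HEIGHT);
    assert(col >= 0 && col < CONSOLE_WIDTH);
    return row * CONSOLE_WIDTH + col;
}
```

If the console size changed, only the two #define lines would need to be edited.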

Multiline Macros

Do not put multiple statements in a #define, because:

#define INC_TWICE(x) ++x; ++x

if (ac) INC_TWICE(ac);

expands to

if (ac) ++ac; ++ac;

which is not what was intended: only the first ++ac is guarded by the if, and the second always runs.
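If a multi-statement macro is unavoidable, a common C idiom is to wrap the body in do { ... } while (0) so that it expands to a single statement. The sketch below assumes that idiom; inc_twice_demo is a hypothetical helper added only to show the effect.

```c
#include <assert.h>

/* The do { ... } while (0) wrapper makes the expansion one statement,
 * and (x) is parenthesized so any lvalue expression can be passed. */
#define INC_TWICE(x) do { ++(x); ++(x); } while (0)

void inc_twice_demo(void)
{
    int ac = 0;
    if (ac)
        INC_TWICE(ac);   /* skipped entirely: ac stays 0 */
    assert(ac == 0);

    ac = 1;
    if (ac)
        INC_TWICE(ac);   /* both increments run: 1 -> 3 */
    else
        --ac;            /* the else still pairs with the if */
    assert(ac == 3);
}
```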

Parentheses

#define TWICE(x) 2*x

TWICE(1+q) /* expands to 2*1+q, which is not 2*(1+q) */
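The usual fix is to parenthesize every use of the argument and the whole macro body. A minimal sketch follows; twice_demo is a hypothetical helper.

```c
#include <assert.h>

/* Parenthesize each argument use and the entire body. */
#define TWICE(x) (2 * (x))

void twice_demo(void)
{
    int q = 3;
    assert(TWICE(1 + q) == 8);   /* 2 * (1 + 3) */
    assert(24 / TWICE(q) == 4);  /* 24 / (2 * 3); needs the outer parentheses */
}
```

The inner parentheses protect against low-precedence operators inside the argument, and the outer ones protect against operators next to the macro call.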

Plus Plus

#define MAX(x,y) ((x) > (y) ? (x) : (y))
MAX(a++, b++) /* expands to ((a++) > (b++) ? (a++) : (b++)): the selected argument is incremented twice, and the result is its value after only the first increment */
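One common way to keep an argument from being evaluated more than once is to use a function instead of a macro. The sketch below assumes int arguments; max_int and max_demo are hypothetical names.

```c
#include <assert.h>

/* A function evaluates each argument exactly once before the call. */
static inline int max_int(int x, int y)
{
    return x > y ? x : y;
}

void max_demo(void)
{
    int a = 1, b = 5;
    int m = max_int(a++, b++);  /* each increment happens exactly once */
    assert(m == 5);
    assert(a == 2 && b == 6);
}
```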

Synchronization

1 Memory Barrier Instruction

The processor pipeline may reorder memory operations as an optimization. To prevent this, we need memory barrier instructions.

MFENCE, LFENCE, SFENCE (x86); ISB, DMB, DSB (ARM); EIEIO (IBM PowerPC)

Memory models and memory barriers are a larger topic than these notes cover. Also see "Release Consistency".

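But for now, as a simplification, it is safe to assume x86 does not reorder memory operations. (Strictly speaking, x86 can still let a later load complete before an earlier store to a different address.)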
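Where ordering does matter, portable C code can request it with C11 fences from <stdatomic.h> instead of naming a specific instruction. The compiler then emits whatever the target needs. The sketch below hands one integer from a writer to a reader; payload, ready, publish, and try_consume are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* ordinary data being handed over */
static atomic_int ready;   /* flag announcing that payload is valid */

/* Writer: store the data, then raise the flag. */
void publish(int value)
{
    payload = value;
    /* keep the payload store ordered before the flag store */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out if the data has been published. */
int try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    /* keep the payload load ordered after the flag load */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```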

2 Operations

We have multiple implementations of each method:

uniprocessor vs multiprocessor
special hardware vs special algorithm
different OS techniques
performance tuning for special cases

Atomic Instruction Sequence

Mutex Assumption:

short sequence of instructions
if another thread interleaves the same sequence, bad
if another thread interleaves another related sequence, bad
probability of collision is low

The instruction sequence should be faster than a syscall when there is no collision.
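To make the assumptions above concrete, here is a minimal sketch of a spinlock built on the C11 atomic_flag test-and-set. It is one possible way to protect a short sequence without a syscall, not the lecture's own implementation; all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag lock_word = ATOMIC_FLAG_INIT;
static int counter;  /* shared data protected by lock_word */

/* One atomic test-and-set: returns true if we took the lock. */
bool spin_try_lock(void)
{
    return !atomic_flag_test_and_set_explicit(&lock_word, memory_order_acquire);
}

/* Spin until the lock is free; cheap when collisions are rare. */
void spin_lock(void)
{
    while (!spin_try_lock())
        continue;
}

void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock_word, memory_order_release);
}

void increment(void)
{
    spin_lock();
    ++counter;       /* the short sequence that must not be interleaved */
    spin_unlock();
}

void spin_demo(void)
{
    spin_lock();
    assert(!spin_try_lock());  /* already held: a second attempt fails */
    spin_unlock();
    assert(spin_try_lock());   /* free again: the attempt succeeds */
    spin_unlock();
}
```

In the uncontended case the lock and unlock are each a single atomic operation, which fits the assumption that collisions are rare.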

Voluntary De-scheduling

A process may want the scheduler not to run it.

The code exhibits the following pattern:

LOCK_WORLD();
while (!(ready = scan_world())) {
  UNLOCK_WORLD();
  WAIT_FOR(event);
  LOCK_WORLD();
}
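With POSIX threads, the same shape is usually written with a mutex and a condition variable. pthread_cond_wait releases the mutex and sleeps as one step, then reacquires the mutex before returning. The sketch below assumes the "world" is a single flag; world_lock, world_changed, ready, wait_until_ready, and mark_ready are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t world_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t world_changed = PTHREAD_COND_INITIALIZER;
static bool ready;  /* stands in for the result of scan_world() */

/* Block until ready is true; the scheduler runs other threads meanwhile. */
void wait_until_ready(void)
{
    pthread_mutex_lock(&world_lock);
    while (!ready)  /* re-check the condition after every wakeup */
        pthread_cond_wait(&world_changed, &world_lock);  /* unlock, sleep, relock */
    assert(ready);  /* holds here because we still own the lock */
    pthread_mutex_unlock(&world_lock);
}

/* Change the world and wake a waiter. */
void mark_ready(void)
{
    pthread_mutex_lock(&world_lock);
    ready = true;
    pthread_cond_signal(&world_changed);
    pthread_mutex_unlock(&world_lock);
}
```

Typically one thread calls wait_until_ready() while another calls mark_ready(). If the flag is already set, the waiter returns without sleeping.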

signal

3 Critical Section Properties

Mutual Exclusion: at most one thread is executing each critical section at any time.

Progress: the procedure that decides who runs the critical section next should finish within bounded time. (Note that threads that don't want to enter the critical section might never answer, so a decision that waits on them could block forever.)

Bounded Waiting: "I can guarantee that I get into the critical section after other threads have entered it no more than 2 times."

2-process Solution

This solution assumes:

multiple threads (1 CPU with scheduling, or multiple CPUs)
we have shared memory, but no locking/atomic instructions
no thread "runs at zero speed"

Notation: i is us and j is the other process (for T0, i = 0 and j = 1). These are thread-local variables.

volatile int turn = 0;
while (turn != i) continue;
... critical section ...
turn = j;

The code above satisfies mutual exclusion, but not progress or bounded waiting: if T0 never tries to enter the critical section, T1 will wait forever.

However, the following code (known as Peterson's algorithm) magically works:

volatile bool want[2] = {false, false};
volatile int turn = 0;

// entry section
want[i] = true;
turn = j;
while (want[j] && turn == j) continue;

// critical section
... critical section ...

// exit section
want[i] = false;
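On modern hardware, volatile alone does not stop the store to want[i] from being reordered with the later load of want[j]. The sketch below is therefore a variant of the entry and exit sections above that uses C11 sequentially consistent atomics. The function names peterson_lock and peterson_unlock are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool want[2];  /* want[i]: thread i wants to enter */
static atomic_int turn;      /* which thread goes first on a tie */

/* Entry section for thread i (0 or 1). */
void peterson_lock(int i)
{
    assert(i == 0 || i == 1);
    int j = 1 - i;
    atomic_store(&want[i], true);  /* sequentially consistent store */
    atomic_store(&turn, j);        /* give the other thread priority */
    while (atomic_load(&want[j]) && atomic_load(&turn) == j)
        continue;                  /* busy-wait */
}

/* Exit section for thread i. */
void peterson_unlock(int i)
{
    assert(i == 0 || i == 1);
    atomic_store(&want[i], false);
}
```

Thread 0 would bracket its critical section with peterson_lock(0) and peterson_unlock(0), and thread 1 would do the same with index 1.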

If you can synchronize between 2 processes, you can synchronize between N processes in $O(\log N)$ time using a tree structure.

People do not use this algorithm in practice.

N-process "Bakery Algorithm"

People do not use this algorithm in practice either.

Inspired by the "take a number" system used for waiting in line at a bakery.

// entry section
// pick a number
// (note that on modern hardware, we need memory barrier here)
choosing[i] = true;
number[i] = max(number[0], ..., number[N-1]) + 1;
choosing[i] = false;

// proving that I have the smallest number
for (j = 0; j < N; ++j) {
  while (choosing[j]) continue; // wait if j is picking a number
  while (number[j] != 0 &&
         (number[j] < number[i] ||
         (number[j] == number[i] && j < i))) continue; // wait if j has smaller number
}

// critical section
... critical section ...

// exit section
number[i] = 0;
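The pseudocode above can be sketched as compilable C. This version uses C11 sequentially consistent atomics in place of explicit memory barriers and assumes a fixed thread count N = 4. The value of N and the names bakery_lock and bakery_unlock are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define N 4  /* hypothetical number of threads */

static atomic_bool choosing[N];  /* choosing[i]: thread i is picking a number */
static atomic_int number[N];     /* 0 means "not waiting" */

void bakery_lock(int i)
{
    assert(i >= 0 && i < N);

    /* pick a number one larger than every number currently held */
    atomic_store(&choosing[i], true);
    int max = 0;
    for (int k = 0; k < N; ++k) {
        int n = atomic_load(&number[k]);
        if (n > max)
            max = n;
    }
    int mine = max + 1;
    atomic_store(&number[i], mine);
    atomic_store(&choosing[i], false);

    /* wait until no thread holds a smaller (number, index) pair */
    for (int j = 0; j < N; ++j) {
        while (atomic_load(&choosing[j]))
            continue;  /* j is still picking a number */
        for (;;) {
            int nj = atomic_load(&number[j]);
            if (nj == 0 || nj > mine || (nj == mine && j >= i))
                break; /* j is not ahead of us */
        }
    }
}

void bakery_unlock(int i)
{
    assert(i >= 0 && i < N);
    atomic_store(&number[i], 0);
}
```

Ties in the chosen number are broken by thread index, which is why the comparison uses the pair (number, index).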
