Lecture 008

Include

What belongs in a .h file:

type declarations (not definitions)
exported interface routines ("public methods")
constants (#define or enum)
macros
data items exported by module (try to avoid this)

no code

Internal data structures and internal-only exports belong in a [NAME]_private.h file, which you do not release to the public.
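The split described above can be sketched in a single listing. Everything here is hypothetical and for illustration only: the `istack` module, its function names, and the three file names shown in the comments are assumptions, not part of the lecture.

```c
#include <stdlib.h>

/* ---- istack.h: public interface (hypothetical module) ---- */
#define ISTACK_MAX_DEPTH 16                /* constant */
#define ISTACK_OK(rc) ((rc) == 0)          /* macro */
typedef struct istack istack;              /* type declaration, no layout */
istack *istack_create(void);               /* exported interface routines */
int istack_push(istack *s, int value);     /* 0 on success, -1 if full */
int istack_pop(istack *s, int *out);       /* 0 on success, -1 if empty */
void istack_destroy(istack *s);

/* ---- istack_private.h: internal layout, not released ---- */
struct istack {
    int depth;                             /* number of items stored */
    int items[ISTACK_MAX_DEPTH];
};

/* ---- istack.c: the code lives here, never in a header ---- */
istack *istack_create(void) { return calloc(1, sizeof(istack)); }

int istack_push(istack *s, int value) {
    if (s->depth == ISTACK_MAX_DEPTH) return -1;
    s->items[s->depth++] = value;
    return 0;
}

int istack_pop(istack *s, int *out) {
    if (s->depth == 0) return -1;
    *out = s->items[--s->depth];
    return 0;
}

void istack_destroy(istack *s) { free(s); }
```

Clients that include only the public part can hold an `istack *` but cannot touch `depth` or `items`, so the layout can change without breaking them.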

Atomic Instruction

A typical code sequence looks like:

mutex_lock(&data->lock);
// some code that depends on the data object
mutex_unlock(&data->lock);

In the old days, the memory bus had a pin called LOCK. While it was held low, no other processor could read or write memory. People don't do this anymore.

A lock might be implemented with the XCHG instruction, which behaves like:

// test&set
int32 xchg(int32 *lock, int32 val) {
  register int32 old;
  old = *lock; /* "bus is locked" */
  *lock = val; /* "bus is locked" */
  return old;
}

There are other instructions such as XADD, CMPXCHG, CMPXCHG8B (consult intel-sys.pdf and intel-isr.pdf)
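In portable C, these instructions are usually reached through C11 `<stdatomic.h>` rather than written by hand. The sketch below shows the three operations that an x86 compiler typically lowers to XCHG, LOCK XADD, and LOCK CMPXCHG. The wrapper names (`swap_demo`, `add_demo`, `cas_demo`) are made up for illustration, and the exact instruction chosen is up to the compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically store val and return the previous value (XCHG-like). */
int swap_demo(atomic_int *word, int val) {
    return atomic_exchange(word, val);
}

/* Atomically add delta and return the previous value (XADD-like). */
int add_demo(atomic_int *word, int delta) {
    return atomic_fetch_add(word, delta);
}

/* Atomically replace expected with desired; report whether the swap
 * happened (CMPXCHG-like). */
bool cas_demo(atomic_int *word, int expected, int desired) {
    return atomic_compare_exchange_strong(word, &expected, desired);
}
```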

Using mutex

int lock_avaliable = 1;
bool i_won = xchg(&lock_avaliable, 0);

// loop if you must...
while(!xchg(&lock_avaliable, 0)) continue;

... critical section ...

// unlocking
xchg(&lock_avaliable, 1); /* expect 0, otherwise we have bug */
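The fragments above can be packaged as a small spinlock. This is a minimal sketch, assuming C11 atomics stand in for `xchg()`; the `spinlock` type and the `spin_*` names are hypothetical, and the flag keeps the notes' convention that 1 means "available".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int available;   /* 1 = free, 0 = held */
} spinlock;

void spin_init(spinlock *l) { atomic_init(&l->available, 1); }

/* One attempt: we won if the value we swapped out was the 1. */
bool spin_trylock(spinlock *l) {
    return atomic_exchange(&l->available, 0) == 1;
}

/* Loop if you must. */
void spin_lock(spinlock *l) {
    while (!spin_trylock(l)) continue;
}

/* Put the 1 back; finding anything but 0 means the lock was not held. */
void spin_unlock(spinlock *l) {
    int old = atomic_exchange(&l->available, 1);
    assert(old == 0);
    (void)old;              /* silence unused warning under NDEBUG */
}
```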

Mutual Exclusion: (Yes) there is only one 1 in the system.
Progress: (Yes) someone will get the 1 eventually.
Bounded Waiting: (No) who gets it is purely a matter of chance.

So we let the winner (the thread that holds the lock) decide who runs next.

// unlocking
int j = (i + 1) % n;
while ((j != i) && !waiting[j]) {
  j = (j + 1) % n; // find next process waiting
}
if (j == i) {
  // nobody is waiting: release the lock
  xchg(&lock_avaliable, 1); /* expect 0, otherwise we have bug */
} else {
  // hand the lock directly to j (the lock itself stays held)
  waiting[j] = false;
}
return;
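The notes show only the unlock half. A sketch of the full hand-off scheme follows, assuming the classic matching lock side, in which a thread spins until it either wins the exchange or is released by the previous holder. `NTHREADS`, `bw_lock`, and `bw_unlock` are hypothetical names, `lock_available` plays the role of the notes' lock flag, and C11 atomics stand in for `xchg()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NTHREADS 4                  /* assumed upper bound n */

atomic_int  lock_available = 1;     /* 1 = free, 0 = held */
atomic_bool waiting[NTHREADS];      /* zero-initialized: nobody waiting */

void bw_lock(int i) {
    int got = 0;
    atomic_store(&waiting[i], true);
    /* Leave when we grab the 1 ourselves, or when the previous
     * holder clears waiting[i] to hand the lock to us. */
    while (atomic_load(&waiting[i]) && !got)
        got = atomic_exchange(&lock_available, 0);
    atomic_store(&waiting[i], false);
}

void bw_unlock(int i) {
    int j = (i + 1) % NTHREADS;
    while (j != i && !atomic_load(&waiting[j]))
        j = (j + 1) % NTHREADS;     /* find the next waiting thread */
    if (j == i) {
        /* Nobody is waiting: really release the lock. */
        int old = atomic_exchange(&lock_available, 1);
        assert(old == 0);
        (void)old;
    } else {
        /* Hand off: j leaves its loop, the flag stays 0. */
        atomic_store(&waiting[j], false);
    }
}
```

Because the scan starts at `i + 1` and wraps around, a waiting thread is passed over at most `NTHREADS - 1` times, which is what gives bounded waiting.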

Note that for atomicity to hold, every access to this memory location must use atomic instructions, including the unlock.

Issue:

everybody must know the size of the thread population n (or an upper bound)
in the common case, where we expect zero threads to be waiting, unlocking still scans every slot and takes $O(n)$

Uniprocessor Environment

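Lock: the xchg() loop wastes time, because on a single processor the lock holder cannot run (and so cannot release the lock) while we spin.
Unlock:

most likely we (the thread that just unlocked) will get the lock again right away (unfair)
if we decide which thread runs next, it is still up to the OS scheduler to actually schedule that thread; if the scheduler is unfair, the chosen thread may never run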
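One common way to reduce the wasted spinning on a uniprocessor is to give up the CPU after each failed attempt. The sketch below assumes a POSIX system (`sched_yield()` from `<sched.h>`); the names `yl_available`, `yield_lock`, and `yield_unlock` are hypothetical. This is not something the lecture prescribes, and it does not address fairness: the scheduler still decides who runs next.

```c
#include <assert.h>
#include <sched.h>          /* sched_yield(), POSIX */
#include <stdatomic.h>

atomic_int yl_available = 1;    /* 1 = free, 0 = held */

void yield_lock(void) {
    /* On failure, let another thread (ideally the holder) run
     * instead of burning the rest of our time slice. */
    while (atomic_exchange(&yl_available, 0) != 1)
        sched_yield();
}

void yield_unlock(void) {
    int old = atomic_exchange(&yl_available, 1);
    assert(old == 0);           /* otherwise the lock was not held */
    (void)old;
}
```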

Multiprocessor Environment

Lock: spin-waiting is useful when the critical section is short.
Unlock: fairness depends on the memory hardware.

Snooping (Load-linked, Store-conditional)

Since bus-lock is harmful, a solution is to split XCHG into 2 parts:

Load-linked(addr): fetch the old value
Store-conditional(addr, val): store the new value
- if nobody else stored to that address in between, the store succeeds
- if someone touched it, the instruction fails and sets an error code

This relies on snooping: the hardware watches for conflicting traffic.
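C has no direct load-linked/store-conditional, but the retry pattern can be approximated with a C11 weak compare-and-swap, which compilers for LL/SC machines typically implement as an LL/SC pair and which is allowed to fail spuriously, much like a store-conditional. In this sketch `xchg_via_cas` is a hypothetical name. Note one difference from real LL/SC: a CAS compares values, so it cannot tell that the location was written and then restored in between.

```c
#include <stdatomic.h>

/* Build an atomic exchange from "read, then store only if unchanged". */
int xchg_via_cas(atomic_int *addr, int val) {
    int old = atomic_load(addr);    /* plays the role of load-linked */
    /* Plays the role of store-conditional: on failure, old is
     * refreshed with the current value and we simply retry. */
    while (!atomic_compare_exchange_weak(addr, &old, val))
        continue;
    return old;
}
```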

Lock Bit

Lock Bit:

locks the bus for 32 instructions
disables interrupts
running past 32 instructions triggers an exception
any exception (e.g. page fault, divide by zero) unlocks the bus

The benefit is that you can implement any of the above primitives (test&set, compare&swap, semaphore) within 32 instructions.

Syscall

On a uniprocessor: the kernel can simply disable interrupts.
On a multiprocessor: the kernel can send an inter-processor interrupt (IPI) to the other CPUs and make them wait until its work is done.

But a syscall is expensive: we want to be fast when nobody else wants the lock.

Read more: "Fast Mutual Exclusion for Uniprocessors" by Bershad, Redell, Ellis.
