What belongs in a .h file:
type declarations (not definitions)
exported interface routines ("public methods")
constants (#define or enum)
macros
data items exported by module (try to avoid this)
no code
Internal data structures, and anything exported only for internal use, belong in a [NAME]_private.h file, which you don't release to the public.
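The split above can be sketched in one compilable file, with comments marking where each file would begin. The module name counter, the constant COUNTER_MAX and all three routines are hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* ---- counter.h (public): declarations only, no code ---- */
typedef struct counter counter_t;      /* opaque type declaration */
enum { COUNTER_MAX = 100 };            /* constant */
counter_t *counter_create(void);       /* exported interface routines */
int counter_increment(counter_t *c);
void counter_destroy(counter_t *c);

/* ---- counter_private.h (not released): internal layout ---- */
struct counter {
    int value;
};

/* ---- counter.c: the code lives here ---- */
counter_t *counter_create(void) {
    return calloc(1, sizeof(counter_t));   /* value starts at 0 */
}

/* Increment, saturating at COUNTER_MAX; returns the new value. */
int counter_increment(counter_t *c) {
    assert(c != NULL);
    if (c->value < COUNTER_MAX) {
        c->value++;
    }
    return c->value;
}

void counter_destroy(counter_t *c) {
    free(c);
}
```

Clients that include only the public part can hold a counter_t pointer but cannot look inside it, so the layout can change without breaking them.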
A typical code sequence looks like:
mutex_lock(&data->lock);
// some code that depends on the data object
mutex_unlock(&data->lock);
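A minimal sketch of the same pattern, swapping in POSIX pthread_mutex_lock/pthread_mutex_unlock for the mutex_lock/mutex_unlock calls above (the notes do not say which library those come from). The struct account and account_deposit are hypothetical:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical data object that carries its own lock. */
struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Add amount to the balance and return the new balance. */
long account_deposit(struct account *data, long amount) {
    int rc = pthread_mutex_lock(&data->lock);
    assert(rc == 0);
    (void)rc;
    data->balance += amount;       /* code that depends on the data object */
    long result = data->balance;   /* copy out while still holding the lock */
    rc = pthread_mutex_unlock(&data->lock);
    assert(rc == 0);
    return result;
}
```

Copying the balance into a local before unlocking matters: reading data->balance after the unlock would be an unprotected access.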
In the old days, the memory bus had a pin called lock. While it was held low, no other processor could read or write memory. People don't do this anymore.
A lock might be implemented with the XCHG instruction:
// test&set: the hardware performs the whole body as one atomic step
int32 xchg(int32 *lock, int32 val) {
    register int32 old;
    old = *lock; /* "bus is locked" */
    *lock = val; /* "bus is locked" */
    return old;
}
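The C function above only describes what the instruction does; compiled as ordinary C it would not be atomic. In portable C11 the same operation is available as atomic_exchange from <stdatomic.h>. A minimal sketch (xchg_demo is a hypothetical helper that shows the return values):

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically store val into *lock and return the previous value. */
int xchg(atomic_int *lock, int val) {
    return atomic_exchange(lock, val);
}

void xchg_demo(void) {
    atomic_int lock = 1;
    assert(xchg(&lock, 0) == 1);   /* we took the 1 */
    assert(xchg(&lock, 0) == 0);   /* second attempt finds it already taken */
}
```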
There are other such instructions, e.g. XADD, CMPXCHG, CMPXCHG8B (consult intel-sys.pdf and intel-isr.pdf).
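For a feel of what XADD and CMPXCHG provide, here is a sketch using their closest C11 counterparts, atomic_fetch_add and atomic_compare_exchange_strong. The wrapper names are hypothetical, and the compiler decides which instruction is actually emitted:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* XADD-like: add delta to *p and return the value it held before. */
int fetch_and_add(atomic_int *p, int delta) {
    return atomic_fetch_add(p, delta);
}

/* CMPXCHG-like: if *p == expected, store desired and return true;
   otherwise leave *p alone and return false. */
bool compare_and_swap(atomic_int *p, int expected, int desired) {
    return atomic_compare_exchange_strong(p, &expected, desired);
}

void atomics_demo(void) {
    atomic_int counter = 5;
    assert(fetch_and_add(&counter, 3) == 5);     /* counter is now 8 */
    assert(!compare_and_swap(&counter, 5, 0));   /* stale expectation: no store */
    assert(compare_and_swap(&counter, 8, 0));    /* matches: counter becomes 0 */
}
```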
Using it as a mutex:
int lock_avaliable = 1;
// one attempt: i_won is true if we took the 1
bool i_won = xchg(&lock_avaliable, 0);
// loop if you must...
while (!xchg(&lock_avaliable, 0)) continue;
// ... critical section ...
// unlocking
xchg(&lock_avaliable, 1); /* expect 0, otherwise we have a bug */
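Packaged as functions, the fragments above might look like the following sketch. It uses C11 atomic_exchange in place of xchg, and the names spin_mutex, spin_init, spin_trylock, spin_lock and spin_unlock are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 1 means the lock is free, 0 means somebody holds it. */
struct spin_mutex {
    atomic_int available;
};

void spin_init(struct spin_mutex *m) {
    atomic_init(&m->available, 1);
}

/* One attempt: true if we took the 1 out of the lock word. */
bool spin_trylock(struct spin_mutex *m) {
    return atomic_exchange(&m->available, 0) == 1;
}

void spin_lock(struct spin_mutex *m) {
    while (!spin_trylock(m)) {
        /* busy-wait until the holder puts the 1 back */
    }
}

void spin_unlock(struct spin_mutex *m) {
    int old = atomic_exchange(&m->available, 1);
    assert(old == 0);   /* otherwise we unlocked a lock nobody held */
    (void)old;
}
```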
Mutual Exclusion: (Yes) there is only one 1 in the system.
Progress: (Yes) someone will get the 1 eventually.
Bounded Waiting: (No) who wins is purely a matter of chance.
So we let the winner decide who runs next.
// unlocking (i is my index, n is the thread population)
int j = (i + 1) % n;
while ((j != i) && !waiting[j]) {
    j = (j + 1) % n; // find the next waiting thread
}
if (j == i) {
    // no one is waiting: release the lock
    xchg(&lock_avaliable, 1); /* expect 0, otherwise we have a bug */
} else {
    // hand the lock to j: it stops spinning and enters
    waiting[j] = false;
}
return;
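The unlock code above needs a matching lock side, which these notes do not show. The sketch below pairs it with the usual textbook lock loop for bounded-waiting test&set, which is an assumption about what the lecture intended. NTHREADS, lock_word, bw_lock and bw_unlock are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NTHREADS 4                      /* assumed upper bound n */

static atomic_int lock_word = 1;        /* 1 = free, 0 = held */
static atomic_bool waiting[NTHREADS];   /* waiting[i]: thread i wants the lock */

void bw_lock(int i) {
    atomic_store(&waiting[i], true);
    bool won = false;
    /* Leave the loop by winning the exchange or by being handed the lock. */
    while (atomic_load(&waiting[i]) && !won) {
        won = (atomic_exchange(&lock_word, 0) == 1);
    }
    atomic_store(&waiting[i], false);
}

void bw_unlock(int i) {
    int j = (i + 1) % NTHREADS;
    while (j != i && !atomic_load(&waiting[j])) {
        j = (j + 1) % NTHREADS;         /* find the next waiting thread */
    }
    if (j == i) {
        int old = atomic_exchange(&lock_word, 1);   /* nobody waiting: release */
        assert(old == 0);
        (void)old;
    } else {
        atomic_store(&waiting[j], false);           /* hand the lock to j */
    }
}
```

On a hand-off the lock word stays 0 the whole time, so no third thread can slip in between the unlocker and thread j.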
Note that for atomicity to work, every access to this memory location must use atomic instructions, including the unlock.
Issues:
Everybody must know the size of the thread population n (or an upper bound).
Even when we expect zero threads to be waiting (the common case under our assumptions), unlocking takes O(n).
On a uniprocessor:
Lock: the xchg() loop wastes time, because the lock holder cannot run while we spin.
Unlock:
Most likely we will get the lock again right away (unfair).
If we decide which thread runs next, it is still up to the OS scheduler to schedule that thread. So if the scheduler is unfair, the right thread will never run.
On a multiprocessor:
Lock: spin-waiting is useful because the critical section is short.
Unlock: fairness depends on the memory hardware.
Since locking the bus is harmful, one solution is to split XCHG into two parts:
Load-linked(addr): fetch the old value
Store-conditional(addr, val): store the new value, but only if nothing else has written addr since the load-linked

The above relies on snooping: the processor watches for conflicting traffic.
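C has no portable load-linked/store-conditional, so the sketch below imitates the pattern with a plain load followed by atomic_compare_exchange_weak. The weak form is allowed to fail spuriously, which is what lets compilers map it onto an LL/SC pair on machines that have one. One difference to keep in mind: compare-exchange checks that the value is unchanged, whereas store-conditional fails on any intervening write. The function names are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>

/* xchg rebuilt from a load followed by a conditional store. */
int xchg_llsc_style(atomic_int *addr, int val) {
    int old = atomic_load(addr);   /* "load-linked": fetch the old value */
    /* "store-conditional": succeeds only if *addr still holds old.
       On failure, old is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(addr, &old, val)) {
        /* conflicting traffic (or a spurious failure): try again */
    }
    return old;
}

void xchg_llsc_demo(void) {
    atomic_int word = 1;
    assert(xchg_llsc_style(&word, 0) == 1);
    assert(atomic_load(&word) == 0);
}
```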
Lock Bit:
locks the bus for at most 32 instructions
disables interrupts
running past 32 instructions triggers an exception
any exception (e.g. page fault, divide by zero) unlocks the bus
The benefit is that you can implement any of the above (test&set, compare&swap, semaphore) in 32 instructions.
On a uniprocessor: the kernel can easily disable interrupts.
On a multiprocessor: the kernel can issue an inter-processor interrupt (IPI) to the other CPUs and make them wait until my work is done.
But a system call is expensive: we want to be fast when nobody else wants the lock.
Read more: "Fast Mutual Exclusion for Uniprocessors" by Bershad, Redell, Ellis.
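To illustrate the goal of being fast when nobody else wants the lock, here is a rough sketch with an uncontended fast path that stays in user space and a slow path that calls into the kernel only on contention. This is not the technique from the paper cited above; it just combines atomic_exchange with POSIX sched_yield. The names lock_free, slow_path_entries, fast_lock and fast_unlock are hypothetical:

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int lock_free = 1;           /* 1 = free, 0 = held */
static atomic_long slow_path_entries = 0;  /* statistics only */

void fast_lock(void) {
    /* Fast path: one atomic instruction, no kernel involvement. */
    while (atomic_exchange(&lock_free, 0) != 1) {
        /* Slow path: the lock is held, so pay for a system call and
           let some other thread (ideally the holder) run. */
        atomic_fetch_add(&slow_path_entries, 1);
        sched_yield();
    }
}

void fast_unlock(void) {
    int old = atomic_exchange(&lock_free, 1);
    assert(old == 0);   /* otherwise we unlocked a lock nobody held */
    (void)old;
}
```

When the lock is free, fast_lock costs a single atomic exchange and the counter stays at zero.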