COP 5611: OPERATING SYSTEMS -Spring 2001- Ted Baker
#define spin_lock_init(x) do { (x)->lock = 0; } while(0)
The above is semantically the same as the assignment "(x)->lock = 0;".
The "do { ... } while(0)" is a syntactic wrapper that prevents spin_lock_init from being used as an expression, i.e., it makes sure the macro belongs to the syntactic category of statements. In general, it is important for the execution order of synchronization operations to be uniquely defined, as it is for statement execution order. In contrast, the compiler is free to reorder the evaluation of subexpressions, subject to the sequencing rules of the language.
This is a conventional way of enclosing a block of code in a macro. To see why simple braces "{ ... }" would not do as well, consider the following:
if (foo) spin_lock_init(x); else do_something_else();
This looks (and should be) perfectly legal, but if you protected the macro only with '{ }' you would get the following:
if (foo) { (x)->lock = 0; }; else do_something_else();
The {} block will be interpreted as the if-block (ok so far). The semicolon will be taken as a null statement. That is, since it is not an "else" the compiler concludes that the "if" doesn't have a matching "else". When it then sees the "else" on the next line it will treat that as a syntax error. A "do {} while(0)" is the correct way of making sure that the semicolon following the macro always gets eaten by the parser. A decent compiler will not generate any extra machine instructions for this degenerate form of loop, so it does not add execution time overhead.
#ifndef __SMP__
...
#define spin_lock_init(lock)	do { } while(0)
...
#else /* __SMP__ */
...
#define spin_lock_init(x)	do { (x)->lock = 0; } while(0)
...
#endif /* __SMP__ */
If the system is compiled without the symbol __SMP__ defined (the symbol selects support for Symmetric Multi-Processor, i.e., shared-memory multiprocessor, systems), there is no need for spin-locks, so in that case null bodies are provided for all the operations on spinlocks.
typedef struct { volatile unsigned int lock; } spinlock_t;
The value of x.lock is an integer. The value is zero if-and-only-if the lock is in the unlocked state.
What does the word "volatile" mean here? Why is this essential?
#define spin_lock_string \
	"\n1:\t" \
	"lock ; btsl $0,%0\n\t" \
	"jc 2f\n" \
	".section .text.lock,\"ax\"\n" \
	"2:\t" \
	"testb $1,%0\n\t" \
	"jne 2b\n\t" \
	"jmp 1b\n" \
	".previous"
If the new-line and tab characters are expanded, the above in-line assembly language code appears as follows:
1:			// a label
	lock		// makes the next instruction atomic
	btsl $0,%0	// bit test and set long
	jc 2f
	.section .text.lock,"ax"
2:
	testb $1,%0
	jne 2b
	jmp 1b
	.previous
The "bit test and set long" instruction is explained in terms of register transfers as follows:
CF <- Bit(BitBase, BitOffset) Bit(BitBase, BitOffset) <- 1
That is, the Carry Flag (CF) receives the prior value of bit 0 of the operand, and, in any case, bit 0 is then set to 1. Here, $0 denotes an immediate operand with value zero (the bit offset), and %0 refers to the first operand of the inline-assembly template, which is the lock word in memory.
Thus, the net effect of the first three instructions is to jump to label 2 if the lock is not free, and otherwise seize the lock.
.section .text.lock,"ax"
The above pseudo-operation causes the assembler to put the following block of code, up to the next .previous pseudo-operation, out-of-line, in a different text (i.e., code) section. We expect that usually the lock will be free, so that execution falls through the jc 2f instruction to the code following the .previous pseudo-operation. In the local-label references, a trailing "f" means the nearest matching label searching forward and a trailing "b" the nearest one searching backward; thus jc 2f jumps ahead to the out-of-line spin loop. Because that target lies in a different text section, the assembler must encode the conditional jump with a long (32-bit) displacement rather than a short one.
2:	testb $1,%0	// AND the low-order byte with 1; sets ZF if bit 0 is clear
	jne 2b		// keep spinning while bit 0 is still 1 (lock still held)
	jmp 1b		// lock looks free: go back to label 1 and retry the btsl
The above code does the "spinning", i.e., it repeatedly checks the value of the lock until the holder frees it, which the holder does by setting the value of the lock to zero. This could have been done using a simple loop on the btsl, but each execution of that instruction forces a "locked cycle" on the memory bus, which hurts performance for all CPUs. It is much more efficient to do a normal fetch of the lock's value until it is updated, since the fetch will be satisfied from the CPU's local cache until the value in memory is changed by the lock holder. The CPU has bus-snooping logic that will notice the write, update (or invalidate) the local cache, and cause the new value of the lock to be fetched when it has changed. Of course, another CPU might also be trying to grab the lock, so it is necessary to go back to the btsl in order to repeat the attempt at atomically seizing the lock.
This kind of lock is not perfect. In particular, it has the following problems:
(1) Processor time is wasted busy-waiting while another CPU holds the lock.
(2) There is no ordering among waiting CPUs, so a waiter may be passed over indefinitely (starvation).
(3) The lock takes no account of priorities, so a high-priority activity may be kept spinning while a lower-priority one holds the lock (priority inversion).
The effects of problem (1) can be bounded if the lengths of the critical sections (blocks of code over which the lock is held) for each lock are kept very short, and the frequencies of execution of critical sections are kept low.
Problems (2) and (3) can be addressed by using a more sophisticated locking algorithm that takes into account priorities. It is an interesting question how much additional execution time overhead is incurred by using such an algorithm (which is necessarily somewhat slower), and how the processor utilization lost due to this overhead compares against the gain in utilizable processing capacity due to reduced priority inversion.
#define spin_unlock_string \ "lock ; btrl $0,%0"
The above Bit Test and Reset (btrl) instruction stores the value of the zeroth bit of the operand into the CF flag, and clears the bit in the memory operand. Clearing bit 0 of the lock word releases the lock: the plain reads performed by spinning CPUs will observe the new value and return to their btsl attempts.