Implementing synchronization

Deadlock

Deadlock is a synchronization bug that involves an infinite wait. This makes it rather easy to detect. Deadlock occurs when the following four conditions are met.

Mutual Exclusion: Only one thread can hold a mutex at a time
Hold & Wait : This happens when a thread holding a lock blocks waiting
No Preemption: There is no way for a thread that holds a lock to release it
Circular Wait: A thread is waiting for a resource that is being held by another resource

If any one of these conditions does not hold, there will not be deadlock.

Wait Graph

A wait graph is a representation of resource and thread dependencies that can be used to detect deadlock.

◻ : A square represents a resource
◯ : A circle represents a thread.
◻ → ◯: The resource is held by the thread
◯ ← ◻: The thread is waiting for the resource

Lock Ordering

Threads I and II are trying to execute the exch instruction. Thread I wants to execute exch a b and thread II is trying to execute exch b a. Access to a and b are protected with mutexes. The locks involved in this scenario are as follows:

1 TI lock[a]
2 TI lock[b]
3 TII lock[a]
4 TII lock[b]

Executing these instructions in the order 1, 2, 3, 4 will result in no deadlock.

Executing these instructions in the order 1, 3, 2, 4 will result in a cycle in the waitgraph and thus deadlock.

Acquiring locks in different orders can lead to cycles in the wait graph. One way to avoid deadlock is to ensure that all code acquires locks in a predefinied order. Lock ordering defines a property such that if a thread holds more than one mutex, then there is a single order in which they were acquired.

Implementing a Bank

Imagine that you have a bank that supports the following transactions.

set : set the bank balance for the bank account to a value
get : return the bank balance associated with a bank account

Two people are trying to withdraw funds from a shared bank account. Person A and Person B both issue a get request and see that the bank account has $10 in it. Then, Person A and Person B both withdraw $5 from the bank account and set the balance to $5. Theoretically, the bank account should be left with $0 inside it. However, in this scenario the bank account is left with $5 in the account.

The problem here is that the increment and decrement operations are not atomic. Between the time person B checks their account balance and the time person B executes a set request to withdraw $5, the bank account balance has changed.

Compare and Swap

The compare and swap (CAS) instruction allows us to perform computations atomically. The compare and swap instruction is atomic but it can be decomposed in the following way.

 int compare_and_swap(unsigned *value, unsigned expected, unsigned desired) { 
   if (*value == expected) { 
     *value = desired; 
     return 1;
   }
   else { 
     return 0; 
   }
 }

Compare and swap will only succeed if the value to be edited matches the expected value. If the expected value does not match the current value, the call will fail. Typically, compare and swap is called in a loop because after a failed call, the solution is to update the value parameter and try again.

Referring back to the bank example, we can replace the set instruction with a CAS instruction.

Person A and Person B issue get requests normally and see that there are $10 in the bank account.
Person A issues CAS(balance, 10, 5) to withdraw $5 and set the new balance to $5.
Person B issues the same CAS(balance, 10, 5) instruction but it will fail because the instruction expected the balance to be $10 but it is $5.

At this point, another get request should be issued to update the balance and call CAS again with a new balance value.

Atomic instructions

This section concerns the behavior of an x86-64 processor, not the C abstract machine.

Many instructions have atomic effect
- Loading an aligned 8-byte value (or 1, 2, 4-byte value)
- Storing an aligned 8-byte value (or 1, 2, 4-byte value)
- Thus, when multiple threads simultaneously store 8-byte values at the same address, any load of that address will see a value that was stored—not, like, some crazy combination of values
- (Note: this only applies to aligned loads and stores—if a load or store crosses a cache-line boundary, a load can see a crazy combination of values!)
But many instructions do not have atomic effect!
- Consider incq (%rax), which increments the 64-bit integer value whose address is %rax
- That instruction logically comprises a load (uint64_t tmp = \*(%rax);), an arithmetic operation (tmp = tmp + 1;), and a store (\*(%rax) = tmp;)
- The load and the store each have atomic effect (assuming an aligned address)—but the instruction as a whole does not!
- Say a memory location holds zero initially, and then two threads each call incq on that location 100,000 times each. If incq was atomic, the final value stored in that location would always be 200,000. But in real processors, the final result might be much less!
- x86-64 instructions that combine reads and writes have non-atomic effects by default.
So how can we build a lock?
- Mutual exclusion locks effectively require reading and writing the same variable in one atomic step!
- Although smart people have figured out lock implementations that appear to work without atomic read/write instructions (see Lamport’s bakery algorithm), those algorithms do not work on modern machines!
The solution is atomic instructions.
- x86-64 supports special instructions that implement reading and writing in one atomic step.
- This is usually indicated by a lock prefix on the instruction.
- For example, lock incq (%rax) does have atomic effect.
C compilers give programmers access to atomic instructions through special functions.
The most fundamental atomic instruction is compare-and-swap, also known as cmpxchg (compare-exchange).
- Using compare-and-swap, you can implement any other atomic function (though not necessarily as efficiently!). TYPE cmpxchg(TYPE* object, TYPE expected, TYPE desired) { // The following is implemented in one atomic step: TYPE actual = *object; if (actual == expected) *object = desired; return actual; }
- Compare-and-swap can, for example, implement “atomic multiply by 7 and add 2”. Here’s how: uint64_t atomic_mul7_add2(uint64_t* object) { while (1) { uint64_t expected = *object; if (cmpxchg(object, expected, expected * 7 + 2) == expected) return; } }
- Compare-and-swap is almost always used in a loop, because the swap might fail. In this it resembles the condition-variable wait operation.
- In GCC, this is called __sync_val_compare_and_swap or (though the interface is slightly different) __atomic_compare_exchange.