Overview
In this lecture, we discuss multithreading, low-level synchronization, and synchronization objects.
Increment recap
- Four threads, each executing
++n10,000,000 times - Results like 30,000,000 (!?) or 12,583,593 (!?!?!?)
- A data race is a race condition involving memory access
- Two or more threads access the same memory location concurrently;
- Without explicit synchronization;
- And at least one access is a write (safe to read the same location).
- Data races cause undefined behavior
- Thou shalt not data race
Atomic data types address data races
- Accesses involving normal data types, such as
intandlong, are vulnerable to data races - Atomic data types, such as
std::atomic<int>, are not- “Atomos” means “indivisible”
- Operations on atomic data types are explicitly synchronized
- Concurrent accesses do not cause data races
- Support operations like
++and.exchange() - The result is always “correct”
- …What does “correct” mean?
Serializability
- Concurrent operations should often obey a correctness condition called serializability
- An implementation is serializable when:
- The result of running operations concurrently
- Always equals the result of running the same operations one at a time,
in some order
- That is, serially
Serializable increment
- What are the possible outcomes of running the operations in
incr-atomic.ccone at a time, in some order?
incr-atomic.cc
std::atomic<T>types in C++ support atomic read-modify-write operationsstd::atomic<T> x; ... ++x ...: the++xhas atomic effect and is an explicit synchronization operation- Compiles to
lock addl $0x1, (%rdi)- The
lockprefix makes this an atomic instruction
- The
Atomic data types are necessary, but not sufficient
- Processor caches mean that normal memory accesses are not serializable
- Atomic operations force processor caches to serialize memory accesses
- At the granularity of single instructions
- Foundation of all synchronization
- But atomic data types are limited in scope
- Simple 1–8 byte values
- Many synchronization problems don’t map directly onto atomic operations
- We need synchronization objects that leverage atomic data types, and operating system functions, to build useful semantics
CS 61 Bank
- Each account has a balance
- No account balance may go negative
Basic account operations
deposit(amt)- Add
amtto account balance
- Add
try_withdraw(amt)- If account balance is less than
amt, return false - Otherwise, deduct
amtfrom balance and return true
- If account balance is less than
C++ sketch
class acct {
long bal;
void deposit(long amt) {
bal += amt;
}
bool try_withdraw(long amt) {
if (bal < amt) {
return false;
}
bal -= amt;
return true;
}
};
Concurrent access
- Data races!
- Violates the Fundamental Law of Synchronization
- Undefined behavior
Atomics eliminate data races
class acct {
std::atomic<long> bal;
// ... same `deposit` and `try_withdraw`
};
- No data races!
Atomics don’t eliminate all race conditions
- Despite absence of data races, this code allows account balances to go negative
- This is because when code executes concurrently and there are no data races, the overall effect of a set of threads is the same as some interleaving of the C++-level expressions (reads and writes)
- Concurrently-executing
try_withdrawcalls on the same account can cause trouble
Race condition example
acct awith initial balance 10- Thread T1 executes
a.try_withdraw(8) - Concurrently, thread T2 executes
a.try_withdraw(7)
- Thread T1 executes
- Possible order:
T1: a.try_withdraw(8) a.bal T2: a.try_withdraw(7) if (a.bal < 8) ... 10 10 if (a.bal < 7) ... a.bal -= 8; 2 -5 a.bal -= 7; -5 return true; return true; -5
Critical sections
- The
try_withdrawcode implicitly assumes thatbalis unmodified during its execution - That is, the
ifandbal -= amtstatements oftry_withdrawform a critical section: no other thread should modifybalduring those statements - To avoid race conditions, we must enforce this critical section
- Mutexes (locks)
How to delimit a critical section
- Start from an access to a shared object (an object that is subject to reads and writes from different threads)
- Expand to include all nearby code that refers to that object
Global mutex
std::mutex bm;
class acct {
std::atomic<long> bal;
void deposit(long amt) {
bm.lock();
bal += amt;
bm.unlock();
}
bool try_withdraw(long amt) {
bm.lock();
if (bal < amt) {
bm.unlock();
return false;
}
bal -= amt;
bm.unlock();
return true;
}
};
bmprotects access to any account’s balance- No two account balances may be modified simultaneously
Functions and lock state
- Irritatingly,
try_withdrawmust release the lock in two places:bool try_withdraw(long amt) { bm.lock(); if (bal < amt) { bm.unlock(); return false; } bal -= amt; bm.unlock(); return true; } - Why?
- In general, functions should leave the state of all system mutexes the same as upon entry
- (Functions should not modify system state unnecessarily)
- So if a function acquires a lock (
bm.lock()), it should generally release it before returning (bm.unlock()) - On all paths
Guard pattern
- Commonly, a function will acquire a lock on entry, and should release that lock just before returning
- If your locks work that way, a C++ coding pattern called lock guards will improve your code
bool try_withdraw(long amt) { std::unique_lock guard(bm); if (bal < amt) { return false; } bal -= amt; return true; } - The lock
bmis acquired when theguardobject is initialized, and released whenguardis destroyed (just before function return) - No need for explicit
unlock(), no worries about multiple exit paths
Mutexes and atomics
- Now that a mutex
bmprotects the account balance, we no longer need atomics
std::mutex bm;
class acct {
long bal;
void deposit(long amt) {
std::unique_lock guard(bm);
bal += amt;
}
bool try_withdraw(long amt) {
std::unique_lock guard(bm);
if (bal < amt) {
return false;
}
bal -= amt;
return true;
}
};
- When synchronization for on object is provided by mutexes, it is generally
better to leave that object non-atomic
- Atomics tell the reader “this object might be accessed at any time”
- Correctness is harder to reason about, induces a feeling of dread
- Sanitizers can’t catch “data races” on atomics (because there are no data races on atomics)
- But in this case, we need a separate method to read the balance
long balance() const { std::unique_lock guard(bm); return bal; }
Lock granularity
- The global mutex is an instance of coarse-grained locking
- One mutex covers many different objects
- Simpler to code (acquire one mutex to protect most state)
- Less space required for synchronization objects
- Less parallelism
- If we want more parallelism, implement fine-grained locking
- One mutex covers few objects
- More difficult to code (programmer must know how synchronization objects match state; may need to acquire several locks to protect several objects)
- More space required for synchronization objects
- More parallelism
Fine-grained account locking
class acct {
long bal;
std::mutex am;
void deposit(long amt) {
std::unique_lock guard(am);
bal += amt;
}
bool try_withdraw(long amt) {
std::unique_lock guard(am);
if (bal < amt) {
return false;
}
bal -= amt;
return true;
}
};
Compare–exchange
class acct {
std::atomic<long> bal;
void deposit(long amt) {
bal += amt;
}
bool try_withdraw(long amt) {
long tbal = bal;
while (true) {
if (tbal < amt) {
return false;
}
if (bal.compare_exchange_strong(tbal, tbal - amt)) {
return true;
}
}
}
}