Overview
In this lecture, we discuss multithreading, low-level synchronization, and synchronization objects.
Update on racer programs
- On an unloaded machine, ./racer-block ran more than 1,000,000 times without error
- On a busy machine, ./racer-block experienced an error about 1 in 300 times!
- ./racer-blockvar experienced an error about 1 in 750 times
- A version of ./racer-blockvar that tried to minimize the race condition experienced an error 1 in 25,000 times
- Maybe, in practice, 1 in 25,000 times on a heavily-loaded machine (1 in ≫1,000,000 times on an unloaded machine) is acceptable?
- But often any race condition problem is a serious error!
Multitasking, multiprocessing, multithreading
- Multitasking
- A computer’s resources are divided among multiple tasks
- The CPU runs one task at a time
- The operating system switches between tasks: saves one task (process) state, runs another
- Each process (task) has one logical thread of control
- Each process memory image is accessed by one thread at a time (simple!)
- Software handles switching between tasks
- Multiprocessing
- Computer has two or more CPUs
- (Or a CPU and a GPU or another kind of accelerator)
- Each CPU runs one task at a time, but multiple tasks run simultaneously
- Still, each process memory image accessed by one thread at a time
- Software may need to coordinate the CPUs
- Multithreading
- A process can have two or more logical threads of control
- Those threads access a process memory image simultaneously
- Goal: Use CPU resources for a single task
- Goal: Use more natural APIs for multitasking (e.g., timeouts)
- Need synchronization to coordinate those threads!
Example threads
#include <cstdio>
#include <thread>

void threadfunc1() {
    printf("Hello from thread 1\n");
}

void threadfunc2() {
    printf("Hello from thread 2\n");
}

int main() {
    std::thread t1(threadfunc1);
    std::thread t2(threadfunc2);
    t1.join();
    t2.join();
}
- std::thread th(f...) creates a thread th running f(...)
- th.join() waits for th to complete
Thread functions can take arguments
#include <cstdio>
#include <thread>

void threadfunc(int n) {
    printf("Hello from thread %d\n", n);
}

int main() {
    std::thread t1(threadfunc, 1);
    std::thread t2(threadfunc, 2);
    t1.join();
    t2.join();
}
- Passing reference arguments is a PITA (need std::ref; see threadref.cc)
Threads run concurrently
- Like processes in the case of fork: no guarantee they run in a specific order, unless that order is enforced by the programmer
synch1/threadex3.cc
Processes vs. threads
fork()
- Cloned program image
- New identity
- Cloned environment view
- Same underlying environment
std::thread
- Same program image! (Share the same primary memory)
- Same identity!
- Same environment view!
- Same underlying environment
- What’s different?
- Different local variables
- New set of registers (held in kernel)
- New stack (allocated in primary memory)
synch1/threadstack.cc
Detached threads
- A thread can be joinable or detached
- A joinable thread must be joined (via th.join()) before the program exits
- th.detach() detaches the thread
- It cannot be joined
- It automatically dies when the program exits
synch1/threadjoin.cc
Advantages of threads for synchronization
- Coexisting in the same, non-isolated memory image makes some forms of synchronization easier to express and understand
- Threads allow a single process to compute using multiple CPUs, which is sometimes critical for good performance
- It also opens disgusting avenues for bugs
A toy task
- Increment a variable 40,000,000 times
incr-basic.cc
What?!
- Look at incr-basic.cc
- Use objdump -d incr-basic.o to look at the assembly
incr-basic-noopt
Data races
- A data race is a kind of race condition
- It occurs when two or more threads in a single process access the same memory location concurrently
- Without explicit synchronization (explained soon)
- At least one access is a write (safe to read the same location)
- Data races cause undefined behavior
The Fundamental Law of Synchronization
THOU SHALT NOT HAVE DATA RACES!!!
- If two threads access the same memory location concurrently and without explicit synchronization, both accesses must be reads
Explicit synchronization
- Data races cause problems at many levels: compiler, cache and processor hardware, …
- Explicit synchronization operations force compiler and cache to behave correctly in the presence of concurrent writes
- Explicit synchronization operations are specified by the C++ standard
- Certain operations on certain synchronization-related data types
- For example, atomic data types
Atomic operations
- Some computer operations are atomic operations
- These operations have atomic effect
- An atomic operation occurs at a single indivisible instant
- Every other operation in the system appears to occur either entirely before or entirely after the atomic operation
- Atomic effects are possible at many interfaces
- Primary memory
- System calls (e.g., pipe, open, fork)
addl $0x1,(%rdi)
- Doesn’t this instruction have atomic effect on x86-64 hardware?
- No
- The CPU implements this instruction as three “micro-ops”
- First, load (%rdi) into a secret register
- Second, compute on that secret register
- Third, store the result to (%rdi)
- Furthermore, these micro-ops use the x86-64 processor cache in a way that’s vulnerable to anomalies in the case of data races
- But
lock addl $0x1,(%rdi)
does have atomic effect!
Atomic instructions
- Simple loads and stores have atomic effect
- If they are aligned
- Read-modify-write instructions do not typically have atomic effect
- CPUs have special, more-expensive read-modify-write instructions that do have atomic effect
incr-atomic.cc
std::atomic
- std::atomic<T> types in C++ support atomic read-modify-write operations
- std::atomic<T> x; ... ++x ...: the ++x has atomic effect and is an explicit synchronization operation
- Compiles to lock addl $0x1, (%rdi)
- The lock prefix makes this an atomic instruction
- Concurrent accesses to std::atomics do not cause data races
Mutual exclusion
- Atomic effect can be modeled as exclusion
- One thread of control (CPU or process thread) gains temporary exclusive access to some data
- Atomic instructions and atomic data types provide inexpensive exclusion, but to tiny data chunks (e.g., single integers)
- Synchronization objects are higher-level concepts that help implement exclusion on larger or more complex data types
std::mutex
- A std::mutex synchronization object provides the interface of a lock
- m.lock(): Acquire exclusive access to the mutex
- Block until no other thread has exclusive access to the mutex
- Atomically mark mutex as acquired
- m.unlock(): Release access to the mutex
- Mark mutex as released
- All code between m.lock() and m.unlock() is subject to mutual exclusion
incr-mutex.cc
std::scoped_lock
- Acquire a mutex (or several mutexes) for a block of code
- Automatically releases the mutex when the block exits
incr-scopedlock.cc
Question
- How can we implement a mutex?
- Polling? Blocking?
incr-spinlock.cc
- The spinlock version polls
- The mutex version blocks