Overview
In this lecture, we discuss multithreading, low-level synchronization, and synchronization objects.
Update on racer programs
- On an unloaded machine, ./racer-block ran more than 1,000,000 times without error
- On a busy machine, ./racer-block experienced an error about 1 in 300 times!
- ./racer-blockvar experienced an error about 1 in 750 times
- A version of ./racer-blockvar that tried to minimize the race condition experienced an error 1 in 25,000 times
- Maybe, in practice, 1 in 25,000 times on a heavily-loaded machine (1 in ≫1,000,000 times on an unloaded machine) is acceptable?
- But often any race condition problem is a serious error!
Multitasking, multiprocessing, multithreading
- Multitasking
- A computer’s resources are divided among multiple tasks
- The CPU runs one task at a time
- The operating system switches between tasks: saves one task (process) state, runs another
- Each process (task) has one logical thread of control
- Each process memory image is accessed by one thread at a time (simple!)
- Software handles switching between tasks
- Multiprocessing
- Computer has two or more CPUs
- (Or a CPU and a GPU or another kind of accelerator)
- Each CPU runs one task at a time, but multiple tasks run simultaneously
- Still, each process memory image accessed by one thread at a time
- Software may need to coordinate the CPUs
- Multithreading
- A process can have two or more logical threads of control
- Those threads access a process memory image simultaneously
- Goal: Use CPU resources for a single task
- Goal: Use more natural APIs for multitasking (e.g., timeouts)
- Need synchronization to coordinate those threads!
Example threads
#include <cstdio>
#include <thread>

void threadfunc1() {
    printf("Hello from thread 1\n");
}

void threadfunc2() {
    printf("Hello from thread 2\n");
}

int main() {
    std::thread t1(threadfunc1);
    std::thread t2(threadfunc2);
    t1.join();
    t2.join();
}
- std::thread th(f...) creates a thread th running f(...)
- th.join() waits for th to complete
Thread functions can take arguments
#include <cstdio>
#include <thread>

void threadfunc(int n) {
    printf("Hello from thread %d\n", n);
}

int main() {
    std::thread t1(threadfunc, 1);
    std::thread t2(threadfunc, 2);
    t1.join();
    t2.join();
}
- Passing reference arguments is a PITA (need std::ref; see threadref.cc)
Threads run concurrently
- Like processes in the case of fork: no guarantee they run in a specific order, unless that order is enforced by the programmer
synch1/threadex3.cc
Processes vs. threads
fork()
- Cloned program image
- New identity
- Cloned environment view
- Same underlying environment
std::thread
- Same program image! (Share the same primary memory)
- Same identity!
- Same environment view!
- Same underlying environment
- What’s different?
- Different local variables
- New set of registers (held in kernel)
- New stack (allocated in primary memory)
synch1/threadstack.cc
Detached threads
- A thread can be joinable or detached
- A joinable thread must be joined (via th.join()) before the program exits
- th.detach() detaches the thread
- It cannot be joined
- It automatically dies when the program exits
synch1/threadjoin.cc
Advantages of threads for synchronization
- Coexisting in the same, non-isolated memory image makes some forms of synchronization easier to express and understand
- Threads allow a single process to compute using multiple CPUs, which is sometimes critical for good performance
- It also opens disgusting avenues for bugs
A toy task
- Increment a variable 40,000,000 times
incr-basic.cc
What?!
- Look at incr-basic.cc
- Use objdump -d incr-basic.o to look at the assembly
incr-basic-noopt
Data races
- A data race is a kind of race condition
- It occurs when two or more threads in a single process access the same memory location concurrently
- Without explicit synchronization (explained soon)
- At least one access is a write (safe to read the same location)
- Data races cause undefined behavior
The Fundamental Law of Synchronization
THOU SHALT NOT HAVE DATA RACES!!!
- If two threads access the same memory location concurrently and without explicit synchronization, both accesses must be reads
Explicit synchronization
- Data races cause problems at many levels: compiler, cache and processor hardware, …
- Explicit synchronization operations force compiler and cache to behave correctly in the presence of concurrent writes
- Explicit synchronization operations are specified by the C++ standard
- Certain operations on certain synchronization-related data types
- For example, atomic data types
Atomic operations
- Some computer operations are atomic operations
- These operations have atomic effect
- An atomic operation occurs at a single indivisible instant
- Every other operation in the system appears to occur either entirely before or entirely after the atomic operation
- Atomic effects are possible at many interfaces
- Primary memory
- System calls (e.g., pipe, open, fork)
addl $0x1,(%rdi)
- Doesn’t this instruction have atomic effect on x86-64 hardware?
- No
- The CPU implements this instruction as three “micro-ops”
- First, load (%rdi) into a secret register
- Second, compute on that secret register
- Third, store the result to (%rdi)
- Furthermore, these micro-ops use the x86-64 processor cache in a way that’s vulnerable to anomalies in the case of data races
- But
lock addl $0x1,(%rdi)
does have atomic effect!
Atomic instructions
- Simple loads and stores have atomic effect
- If they are aligned
- Read-modify-write instructions do not typically have atomic effect
- CPUs have special, more-expensive read-modify-write instructions that do have atomic effect
incr-atomic.cc
std::atomic
- std::atomic<T> types in C++ support atomic read-modify-write operations
- std::atomic<T> x; ... ++x ...: the ++x has atomic effect and is an explicit synchronization operation
- Compiles to lock addl $0x1, (%rdi)
- The lock prefix makes this an atomic instruction
- Concurrent accesses to std::atomics do not cause data races
Mutual exclusion
- Atomic effect can be modeled as exclusion
- One thread of control (CPU or process thread) gains temporary exclusive access to some data
- Atomic instructions and atomic data types provide inexpensive exclusion, but to tiny data chunks (e.g., single integers)
- Synchronization objects are higher-level concepts that help implement exclusion on larger or more complex data types
std::mutex
- A std::mutex synchronization object provides the interface of a lock
- m.lock(): Acquire exclusive access to the mutex
- Block until no other thread has exclusive access to the mutex
- Atomically mark mutex as acquired
- m.unlock(): Release access to the mutex
- Mark mutex as released
- All code between m.lock() and m.unlock() is subject to mutual exclusion
incr-mutex.cc
std::scoped_lock
- Acquire a mutex (or several mutexes) for a block of code
- Automatically releases the mutex when the block exits
incr-scopedlock.cc
Question
- How can we implement a mutex?
- Polling? Blocking?
incr-spinlock.cc
- The spinlock version polls
- The mutex version blocks