Overview
In this lecture, we discuss multithreading, low-level synchronization, and synchronization objects.
Multitasking, multiprocessing, multithreading
- Multitasking
- A computer’s resources are divided among multiple tasks
- The CPU runs one task at a time
- The operating system switches between tasks: saves one task (process) state, runs another
- Each process (task) has one logical thread of control
- Each process's memory image is accessed by one thread at a time (simple!)
- Software handles switching between tasks
- Multiprocessing
- Computer has two or more CPUs
- (Or a CPU and a GPU or another kind of accelerator)
- Each CPU runs one task at a time, but multiple tasks run simultaneously
- Still, each process's memory image is accessed by one thread at a time
- Software may need to coordinate the CPUs
- Multithreading
- A process can have two or more logical threads of control
- Those threads access the process's memory image simultaneously
- Goal: Use multiple CPUs' resources for a single task
- Goal: Use more natural APIs for multitasking (e.g., timeouts)
- Problem: Synchronization
Example threads
#include <cstdio>
#include <thread>
void print1() {
    printf("Hello from thread 1\n");
}

void print2() {
    printf("Hello from thread 2\n");
}

int main() {
    std::thread t1(print1);
    std::thread t2(print2);
    t1.join();
    t2.join();
}
- std::thread th(f...) creates a thread th running f(...)
- th.join() waits for th to complete
Thread functions can take arguments
#include <cstdio>
#include <thread>
void printn(int n) {
    printf("Hello from thread %d\n", n);
}

int main() {
    std::thread t1(printn, 1);
    std::thread t2(printn, 2);
    t1.join();
    t2.join();
}
- Passing reference arguments is a PITA (need std::ref; see threadex3.cc)
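For instance, here is a minimal sketch of the idea behind threadex3.cc (the function and variable names are made up, not the actual file contents). std::thread copies its arguments by default, so a thread function that takes an int& needs std::ref; passing the variable directly will not compile.

#include <cstdio>
#include <thread>
#include <functional>

// Hypothetical example: the thread function takes its first argument by reference.
void add_to(int& x, int delta) {
    x += delta;
}

int main() {
    int value = 0;
    // std::ref wraps `value` so the thread operates on the caller's variable
    // instead of a private copy.
    std::thread t(add_to, std::ref(value), 10);
    t.join();
    printf("value = %d\n", value);   // prints 10
}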
Detached threads
- A thread can be joinable or detached
- A joinable thread must be joined (via th.join()) before the program exits
- th.detach() detaches the thread
  - It cannot be joined
  - It automatically dies when the program exits
synch1/threadex4.cc
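A minimal sketch in the spirit of that file (contents assumed, not the actual synch1/threadex4.cc): the thread is detached, so it is never joined; the sleep only gives it a chance to run before main returns.

#include <cstdio>
#include <thread>
#include <chrono>

void background() {
    // This thread is never joined; it dies when the program exits.
    printf("Hello from a detached thread\n");
}

int main() {
    std::thread t(background);
    t.detach();              // t is no longer joinable
    // Give the detached thread a chance to run before main returns.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}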
Processes vs. threads
fork()
- Cloned program image
- New identity
- Cloned environment view
- Same underlying environment
std::thread
- Same program image! (Threads share the same primary memory)
- Same identity!
- Same environment view!
- Same underlying environment
- What’s different?
  - Different local variables
  - New set of registers (held in kernel)
  - New stack (allocated in primary memory)
Same program image, different stacks
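A small illustration of this point (hypothetical, not one of the course files): both threads see the same global variable at the same address, but each thread's local variable lives on its own stack, so the local addresses differ.

#include <cstdio>
#include <thread>

int global = 42;                 // one copy in the shared program image

void worker(int id) {
    int local = id;              // a separate copy on each thread's stack
    printf("thread %d: global=%d at %p, local=%d at %p\n",
           id, global, (void*) &global, local, (void*) &local);
}

int main() {
    std::thread t1(worker, 1);
    std::thread t2(worker, 2);
    t1.join();
    t2.join();
}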
Advantages of threads for synchronization
- Coexisting in the same, non-isolated memory image makes some forms of synchronization easier to express and understand
- It also opens disgusting avenues for bugs
A toy task
- Increment a variable 40,000,000 times
incr-basic.cc
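A sketch of what incr-basic.cc might contain (the thread count and loop bound are assumptions): several threads increment an unsynchronized shared counter, for 40,000,000 increments in total.

#include <cstdio>
#include <thread>

unsigned long n = 0;

void threadfunc() {
    // Unsynchronized increments of a shared variable: a data race,
    // which is exactly what this demo is meant to expose.
    for (int i = 0; i != 10'000'000; ++i) {
        ++n;
    }
}

int main() {
    std::thread th[4];
    for (int i = 0; i != 4; ++i) {
        th[i] = std::thread(threadfunc);
    }
    for (int i = 0; i != 4; ++i) {
        th[i].join();
    }
    printf("%lu\n", n);   // expect 40000000; the actual output is usually far smaller
}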
What?!
- Look at incr-basic.cc
- Use objdump -d incr-basic.o to look at the assembly
incr-basic-noopt
Data races
- A data race is a kind of race condition
- It occurs when two or more threads in a single process access the same memory location concurrently
- Without explicit synchronization (explained soon)
- At least one access is a write (it is safe for multiple threads to read the same location)
- Result: undefined behavior
The Fundamental Law of Synchronization
THOU SHALT NOT HAVE DATA RACES!!!
- If two threads access the same memory location concurrently and without synchronization, both accesses must be reads
Atomic operations
- Some computer operations are atomic operations
- These operations have atomic effect
- An atomic operation occurs at a single indivisible instant
- Every other operation in the system appears to occur either entirely before or entirely after the atomic operation
- Atomic effects are possible at many interfaces
- Primary memory
- System calls (e.g., pipe, open, fork)
addl $0x1,(%rdi)
- Doesn’t this operation have atomic effect?
- No
- The CPU implements this operation as three “micro-ops”
  - First, load (%rdi) into a secret register
  - Second, compute on that secret register
  - Third, store the result to (%rdi)
- Example
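A hypothetical sketch (not one of the course files) that writes the three micro-ops out as separate C++ statements; the comment shows an interleaving that loses an update. This version still has a data race, so it is for illustration only.

#include <cstdio>
#include <thread>

unsigned long n = 0;

// The increment with its three micro-ops made explicit.
// A bad interleaving loses an update, e.g. when n == 5:
//   T1: tmp = n (5)       T2: tmp = n (5)
//   T1: tmp + 1 (6)       T2: tmp + 1 (6)
//   T1: n = 6             T2: n = 6     <- one increment vanished
void threadfunc() {
    for (int i = 0; i != 10'000'000; ++i) {
        unsigned long tmp = n;   // load (%rdi) into a "secret register"
        tmp = tmp + 1;           // compute on that register
        n = tmp;                 // store the result back to (%rdi)
    }
}

int main() {
    std::thread t1(threadfunc), t2(threadfunc);
    t1.join();
    t2.join();
    printf("%lu\n", n);   // expect 20000000; usually much less
}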
Atomic instructions
- Simple loads and stores have atomic effect
- If they are aligned
- Read-modify-write instructions do not typically have atomic effect
- CPUs have special, more-expensive read-modify-write instructions that do have atomic effect
incr-atomic.cc
- std::atomic<T> types in C++ support atomic read-modify-write operations
- std::atomic<T> x; ... ++x ...: the ++x has atomic effect
  - Compiles to lock addl $0x1, (%rdi)
  - The lock prefix makes this an atomic instruction
- Concurrent accesses to std::atomics do not cause data races
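A minimal sketch in the spirit of incr-atomic.cc (details assumed): the only change from the racy version is the type of n.

#include <cstdio>
#include <thread>
#include <atomic>

std::atomic<unsigned long> n{0};

void threadfunc() {
    for (int i = 0; i != 10'000'000; ++i) {
        ++n;   // atomic read-modify-write: compiles to lock addl $0x1,(%rdi)
    }
}

int main() {
    std::thread th[4];
    for (int i = 0; i != 4; ++i) {
        th[i] = std::thread(threadfunc);
    }
    for (int i = 0; i != 4; ++i) {
        th[i].join();
    }
    printf("%lu\n", n.load());   // always prints 40000000
}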
Mutual exclusion
- Atomic effect can be modeled as exclusion
- One thread of control (CPU or process thread) gains temporary exclusive access to some data
- Atomic instructions and atomic data types provide inexpensive exclusion, but only for tiny chunks of data (e.g., single integers)
- Synchronization objects are higher-level concepts that help implement exclusion on larger or more complex data types
std::mutex
- A std::mutex synchronization object provides the interface of a lock
- m.lock(): Acquire exclusive access to the mutex
  - Block until no other thread has exclusive access to the mutex
  - Atomically mark the mutex as acquired
- m.unlock(): Release access to the mutex
  - Mark the mutex as released
- All code between m.lock() and m.unlock() is subject to mutual exclusion
incr-mutex.cc
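A minimal sketch in the spirit of incr-mutex.cc (details assumed): the mutex serializes access to n, so no increments are lost.

#include <cstdio>
#include <thread>
#include <mutex>

unsigned long n = 0;
std::mutex m;        // protects n

void threadfunc() {
    for (int i = 0; i != 10'000'000; ++i) {
        m.lock();    // gain exclusive access to n
        ++n;
        m.unlock();  // release it so other threads can proceed
    }
}

int main() {
    std::thread th[4];
    for (int i = 0; i != 4; ++i) {
        th[i] = std::thread(threadfunc);
    }
    for (int i = 0; i != 4; ++i) {
        th[i].join();
    }
    printf("%lu\n", n);   // always prints 40000000
}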
std::scoped_lock
- Acquires a mutex (or several mutexes) for a block of code
- Automatically releases the mutex(es) when the block exits
incr-scopedlock.cc
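A minimal sketch in the spirit of incr-scopedlock.cc (details assumed); std::scoped_lock needs C++17.

#include <cstdio>
#include <thread>
#include <mutex>

unsigned long n = 0;
std::mutex m;        // protects n

void threadfunc() {
    for (int i = 0; i != 10'000'000; ++i) {
        // The constructor locks m; the destructor unlocks it when the
        // block exits, even on early return or exception.
        std::scoped_lock guard(m);
        ++n;
    }
}

int main() {
    std::thread th[4];
    for (int i = 0; i != 4; ++i) {
        th[i] = std::thread(threadfunc);
    }
    for (int i = 0; i != 4; ++i) {
        th[i].join();
    }
    printf("%lu\n", n);   // always prints 40000000
}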
Question
- How can we implement a mutex?
- Polling? Blocking?
incr-spinlock.cc
- The spinlock version polls
- The mutex version blocks
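A minimal spinlock sketch in the spirit of incr-spinlock.cc (details assumed), built on std::atomic_flag: lock() polls, burning CPU until it wins the flag, whereas std::mutex::lock() can put the waiting thread to sleep.

#include <cstdio>
#include <thread>
#include <atomic>

struct spinlock {
    std::atomic_flag f = ATOMIC_FLAG_INIT;
    void lock() {
        // Poll: atomically set the flag; retry as long as it was already
        // set (i.e., some other thread holds the lock).
        while (f.test_and_set(std::memory_order_acquire)) {
            // spin until the flag is cleared
        }
    }
    void unlock() {
        f.clear(std::memory_order_release);
    }
};

unsigned long n = 0;
spinlock m;          // protects n

void threadfunc() {
    for (int i = 0; i != 10'000'000; ++i) {
        m.lock();
        ++n;
        m.unlock();
    }
}

int main() {
    std::thread th[4];
    for (int i = 0; i != 4; ++i) {
        th[i] = std::thread(threadfunc);
    }
    for (int i = 0; i != 4; ++i) {
        th[i].join();
    }
    printf("%lu\n", n);   // always prints 40000000, but waiters burn CPU
}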