Section 9: Threads and atomics

Threads in C++

std::thread's constructor takes the name of the function to run on the new thread and any arguments, like std::thread t(func, param1, ...).

Given a thread t, t.join() blocks the current thread until the thread being joined exits. (This is roughly analogous to waitpid.) t.detach() sets the t thread free. Each thread object must be detached or joined before it goes out of scope (is destroyed).

An example of how we might combine these is:

#include <thread>

void foo(...) {
    ...
}

void bar(...) {
    ...
}

int main() {
    {
        std::thread t1(foo, ...);
        t1.detach();
    }

    {
        std::thread t2(bar, ...);
        t2.join();
    }
}

Exercise: What can this code print? (sample01.cc)

int main() {
    for (int i = 0; i != 3; ++i) {
        std::thread t(printf, "%d\n", i + 1);
        t.join();
    }
}

Exercise: What can this code print? (sample02.cc)

int main() {
    for (int i = 0; i != 3; ++i) {
        std::thread t(printf, "%d\n", i + 1);
        t.detach();
    }
}

Exercise: What can this code print? (sample03.cc)

int main() {
    for (int i = 0; i != 3; ++i) {
        std::thread t(printf, "%d\n", i + 1);
    }
}

Threads in C

C doesn't have built-in threads or synchronization, but a common library is pthread (POSIX threads). Since we're using C++ threads in this class we won't talk about details here, but be aware that it exists in case you're ever working in C. (On Linux, standard C++ threads are implemented using the pthread library.)

Mutexes in C++

Recall the high-level definition of a mutex: we want some way to synchronize access to data that is accessed simultaneously by multiple threads to avoid race conditions. Mutexes (also known as locks) achieve this by providing mutual exclusion, meaning that only one thread can be running in a particular critical section at any time.

The abstract interface for a mutex or lock is:

lock: takes hold of the lock in a blocking fashion
try lock: takes hold of the lock in a polling fashion; returns a boolean indicating whether the lock attempt succeeded
unlock: instantly lets go of the lock

Note that it is disallowed to lock a lock that you already hold or unlock a lock that you do not hold. Depending on the implementation or language you're using, these actions might explicitly fail or—even worse—silently fail, and cause deadlock or incorrect synchronization.

If you've ever had the pleasure of participating in an icebreaker, you can think of a single lock as the object that is passed around to allow people to speak. You have to acquire the object before you can speak, and you release it so that another person can use it once you're done. Because there's one object and only one person can hold it at once, only one person can speak at a time, achieving mutual exclusion.

For multiple mutexes, a better analogy is the locks on a set of bathroom stalls. When you enter a stall (the critical section) you lock the lock associated with that stall. Then, you perform whatever operation needs to be synchronized inside the stall. Finally, once you're done, you unlock the stall and go on your merry way. This system ensures that each stall can only have one person in it at a time. Here the difference between lock and try lock is better motivated: if one particular stall is full, we might want to check to see if other stalls are available instead of waiting for someone to leave that one.

`std::mutex`

Here are some of the tools you can use to synchronize multithreaded C++ code (and here is some even more comprehensive documentation).

C++ has a few different kinds of locks built in. std::mutex is the basic mutex. Once we declare and construct a std::mutex m, we can call:

m.lock()
m.try_lock()
m.unlock()

which correspond exactly to the abstract lock operations defined above. A typical use might look like this, though later we're going to talk about an easier way to get the same behavior:

#include <mutex>

int i;
std::mutex m; // protects write access to i

void increase() {
    m.lock();
    i++;
    m.unlock();
}

...

Fancier mutex semantics

std::recursive_mutex is like a normal mutex, but a thread that holds a lock may recursively acquire it again (as long as they recursively release it the same number of times). This might be useful if you need to acquire several locks, and sometimes some of those locks might actually be the same lock: with recursive locks, you don't need to keep track of which ones you already hold, to avoid acquiring the same lock twice.

std::shared_lock is an instantiation of a different kind of lock altogether: a readers-writer lock, or a shared-exclusive lock. The idea behind this kind of lock is that rather than just have one level of access--mutual exclusion--there are two levels of access: shared access and exclusive access. When a thread holds the lock with shared access, that guarantees that all other threads that hold the lock have shared access. When a thread holds the lock with exclusive access, that guarantees that it is the only thread to have any kind of access. We call this a readers-writer lock because the two levels of access often correspond to multiple reader threads, which require the shared data not to change while they're accessing it, but don't modify it themselves; and a single writer thread, which needs to be the only thread with access to the data it's modifying. You do not need the functionality of std::shared_lock for the problem set, and in fact it is not particularly helpful for this particular application.

The only mutex you need for the problem set is std::mutex, though std::recursive_mutex may simplify some logic for simpong61.

`std::lock` avoids deadlock

C++ also defines some handy tools to acquire locks while avoiding deadlock. Instead of calling .lock() on the mutex itself, you can pass several mutexes as arguments to the function std::lock, which magically acquires the locks in an order that avoids deadlock. For example:

#include <mutex>

std::mutex a;
std::mutex b;

void worker1() {
    std::lock(a, b);
    ...
    a.unlock(); // the order of these doesn't matter,
    b.unlock(); // since unlocking can't cause deadlock
}

void worker2() {
    std::lock(b, a); // look ma, no deadlock!
    ...
    b.unlock();
    a.unlock();
}

...

`std::unique_lock` and `std::scoped_lock` unlock for you

The std::scoped_lock and std::unique_lock objects automate some aspects of locking, because they are capable of automatically unlocking. This is really convenient—it’s very easy to forget a call to unlock if a function has multiple return statements in it.

To use a scoped lock, you declare an object at the point where you would normally call lock(). This object is typically called a guard, since it exists to “guard” the lock. You must pass in a reference to the std::mutex that you intend to lock. For example, here’s a version of the increase function from above:

#include <mutex>

int i;
std::mutex m; // protects write access to i

void increase() {
    std::unique_lock guard(m);
    i++;
}

The declaration of guard will block until m can be locked, and then m is automatically unlocked when the function returns and guard goes out of scope (the variable stops existing). This is an instance of an idiom called Resource acquisition is initialization (RAII), which is the idea that an object's resources are automatically acquired and released by its constructor and destructor, which are triggered by scope.

Here’s how std::scoped_lock (which is the multi-lock version of std::unique_lock) would simplify the code above:

#include <mutex>

std::mutex a;
std::mutex b;

void worker1() {
    std::scoped_lock guard(a, b);
    ...
}

void worker2() {
    std::scoped_lock guard(b, a);
    ...
}

...

The std::unique_lock has .lock(), .try_lock(), and .unlock() member functions, in addition to the initialization/deinitialization style, and these can be useful sometimes. (The std::scoped_lock has no member functions at all: you can only use it by declaring it and then letting it go out of scope later.)

Condition variables in C++

Suppose you have several threads waiting for some event to happen, which doesn't neatly correspond to a single unlock event. (For example, to go back to the bathroom example, we might want people lining up to wait for any stall to open, not some particular stall door to be unlocked.) We could achieve this behavior naively by polling, but we've seen that this is wasteful. Condition variables are a synchronization primitive built on top of mutexes that allows us to form a queue of threads, which wait in line for a wake up call. (Note that condition variables usually do not actually guarantee that they function as FIFO queues; an arbitrary thread may be the head of the queue.)

The abstract interface for a condition variable is:

wait: go to sleep on the condition variable and a held lock until signaled
notify one or signal: wake up one thread waiting on the CV
notify all or broadcast: wake up all threads waiting on the CV

Once we create a condition variable std::condition_variable cv in C++, we can call:

cv.wait(std::unique_lock<std::mutex>& lock)
cv.notify_one()
cv.notify_all()

Notice that cv.wait requires that you use std::unique_lock to lock your mutex. The lock must be locked when calling wait(). wait() unlocks the associated mutex while waiting, and re-locks the mutex before it returns.

Condition variables are almost always used in loops that look like this:

std::unique_lock<std::mutex> lock;
...
while (!CONDITION) {
    cv.wait(lock);
}

In this code:

CONDITION is some expression, such as phase != 1.
The CONDITION can safely be evaluated when lock is locked.
cv is associated with CONDITION in the following way: Any code that might change the truth value of CONDITION will call cv.notify_all().

The loop is important because in most cases, a condition variable wait can wake up spuriously—in other words, by the time the wait operation returns, the condition has gone back to false.

If you want to use a condition variable with something other than a std::mutex, you'll have to use std::condition_variable_any instead.

Synchronization: Counting to three

This program is called onetwothree.cc:

#include <thread>
#include <mutex>
#include <condition_variable>
#include <cstdio>

/* G1 */

void a() {
    /* A1 */
    printf("1\n");
    /* A2 */
}

void b() {
    /* B1 */
    printf("2\n");
    /* B2 */
}

void c() {
    /* C1 */
    printf("3\n");
    /* C2 */
}

int main() {
    /* M1 */
    std::thread at(a);
    std::thread bt(b);
    std::thread ct(c);
    /* M2 */
    ct.join();
    bt.join();
    at.join();
}

Exercise: What are the possible outputs of this program as is?

Exercise: The program will never exit with the output 1\n2\n. Why not?

Exercise: Add code to this program so that the program only prints 1\n2\n3\n. Your code must only use atomics, and you can only change the locations marked with comments.

G1: std::atomic<int> phase;
A2: phase.store(1);
B1: while (phase.load() != 1) {}
B2: phase.store(2);
C1: while (phase.load() != 2) {}

Exercise: Does your atomics-based code wait using blocking or polling?

Exercise: Add code to this program so that the program only prints 1\n2\n3\n. Your code must only use std::mutex and/or std::condition_variable, and you must not ever unlock a mutex in a thread different from the thread that locked the mutex. Again, only change the locations marked with comments.

G1: int phase = 0; std::mutex mutex; std::condition_variable cv;

A2: std::unique_lock<std::mutex> guard(mutex);
    phase = 1;
    cv.notify_all();

B1: std::unique_lock<std::mutex> guard(mutex);
    while (phase != 1) {
        cv.wait(lock);
    }

B2: phase = 2;
    cv.notify_all();

C1: std::unique_lock<std::mutex> guard(mutex);
    while (phase != 2) {
        cv.wait(lock);
    }

Deadlock

A deadlock is a situation in which two or more threads block forever, because each thread is waiting for a resource that’s held by another thread.

Deadlock in multithreaded programming commonly involves mutual exclusion locks. For instance, thread 1 might have mutex m1 locked while it tries to acquire mutex m2, while thread 2 has m2 locked while it tries to acquire m1. Neither thread will ever make progress because each is waiting for the other.

There are simple ways to avoid deadlock in practice. One is to lock at most one mutex at a time: no deadlock! Another is to have a fixed order in which mutexes are locked, called a lock order. For instance, given lock order m1, m2, then thread 2 above cannot happen (it’s acquiring locks against the lock order).

Deadlocks can also occur with other resources, such as space in pipe buffers (remember the extra credit in pset 3?) and even pavement.

Bathroom synchronization

Ah, the bathroom. Does anything deserve more careful consideration? Is anything more suitable for poop jokes?

Bathroom 1: Stall privacy

In this exercise, the setting is a bathroom with K stalls numbered 0 through K–1. A number of users enter the bathroom; each has a preferred stall. Each user can be modeled as an independent thread that repeatedly calls the following code:

void user(int preferred_stall) {
    poop_into(preferred_stall);
}

Each user has a preferred stall that never changes, and two users should never poop_into the same stall at the same time.

You probably can intuitively smell that this bathroom model has a synchronization problem. Answer the following questions without referring to the solutions first.

Question: What synchronization property is missing?

Question: Provide synchronization using a single mutex, i.e., using coarse-grained locking. Your solution should be correct (it should avoid gross conditions [which are race conditions in the context of a bathroom-themed synchronization problem]), but it need not be efficient. Your solution will consist of (1) a definition for the mutex, which must be a global variable, and (2) new code for user(). You don’t need to sweat the syntax.

Question: Provide synchronization using fine-grained locking. Your solution should be more efficient than the previous solution. Specifically, it should allow the bathroom to achieve full utilization: it should be possible for all K stalls to be in use at the same time. You will need to change both the global mutex—there will be more than one global mutex—and the code for user().

Bathroom 2: Stall indifference

The setting is the same, except that now users have no predetermined preferred stall:

void user() {
    int preferred_stall = random() % K;
    poop_into(preferred_stall);
}

Question: Adapt your solution from Bathroom 1 to work for this variant.

Question: As written, the user code picks a preferred stall and then waits for that stall to become available. This isn’t as efficient as possible, since multiple users might pick the same stall by chance. Write a synchronization solution where each user chooses any available stall. First, use a single mutex coupled with a global data structure representing stall occupancy.

std::mutex bmutex;
bool occupied[K];

void user() {
    int preferred_stall = -1;
    while (preferred_stall == -1) {
        std::unique_lock<std::mutex> guard(bmutex);
        for (int stall = 0; stall != K; ++stall) {
            if (!occupied[stall]) {
                preferred_stall = stall;
                occupied[stall] = true;
                break;
            }
        }
    }

    poop_into(preferred_stall);

    std::unique_lock<std::mutex> guard(bmutex);
    occupied[preferred_stall] = false;
}

Question: Repeat this function, but this time use multiple mutexes and no other global data structure.

std::mutex smutex[K];

void user() {
    while (true) {
        int preferred_stall = random() % K;
        if (smutex[preferred_stall].try_lock()) {
            poop_into(preferred_stall);
            smutex[preferred_stall].unlock();
            return;
        }
    }
}

Question: Compare these two implementations in terms of the number of std::mutex lock attempts it might take to poop.

Question: These solutions implement a polling style. Why?

Question: Implement a blocking implementation. You will need a condition_variable.

std::mutex bmutex;
std::condition_variable cv;
bool occupied[K];

int find_unoccupied_stall() {
    for (int i = 0; i != K; ++i) {
        if (!occupied[i]) {
            return i;
        }
    }
    return -1;
}

void user() {
    std::unique_lock<std::mutex> lock(bmutex);
    int preferred_stall;
    while (true) {
        preferred_stall = find_unoccupied_stall();
        if (preferred_stall != -1) {
            occupied[preferred_stall] = true;
            break;
        }
        cv.wait(lock);
    }
    lock.unlock();

    poop_into(preferred_stall);

    lock.lock();
    occupied[preferred_stall] = false;
    cv.notify_all();
    lock.unlock();
}

Bathroom 3: Pooping philosophers

To facilitate the communal philosophical life, Plato decreed that philosophy should be performed in a bathroom with K thrones and K curtains, arranged like so:

Each philosopher sits on a throne with two curtains, one on each side. The philosophers normally argue with one another. But sometimes a philosopher’s got to poop, and to poop one needs privacy. A pooping philosopher must first draw the curtains on either side; philosopher X will need curtains X and (X+1) mod K. For instance, here, philosophers 0 and 2 can poop simultaneously, but philosopher 4 cannot poop until philosopher 0 is done (because philosopher 4 needs curtain 0), and philosopher 1 cannot poop until philosophers 0 and 2 are both done:

Philosophers’ bathroom with curtains drawn

Question: In high-level terms, what is the mutual exclusion problem here—that is, what resources are subject to mutual exclusion?

A philosopher could be described like this:

void philosopher(int x) {
    while (true) {
        argue();
        draw curtains x and (x + 1) % K;
        poop();
        release curtains x and (x + 1) % K;
    }
}

Question: Write a correctly-synchronized version of this philosopher. Your solution should not suffer from deadlock, and philosophers should block (not poll) while waiting to poop. First, use std::scoped_lock—which can (magically) lock more than one mutex at a time while avoiding deadlock.

Question: Can you do it without std::scoped_lock?

Sure; we just enforce our own lock order.

std::mutex curtains[K];
void philosopher(int x) {
    while (true) {
        argue();
        int l1 = std::min(x, (x + 1) % K);
        int l2 = std::max(x, (x + 1) % K);
        std::unique_lock<std::mutex> guard1(curtains[l1]);
        std::unique_lock<std::mutex> guard2(curtains[l2]);
        poop();
    }
}