Threads and Concurrency

Thread Overview

A thread is a logical flow of control that runs in the context of a process

A process can have multiple threads, each of which can be run on a separate CPU in order to speed up execution of the process
- However, in order to be meaningfully faster, one must be able to break the process into distinct parts that do not heavily rely on each other, if at all. When doing CPU bound tasks, it only makes sense to have as many threads as CPUs, as otherwise you will have to pay the cost of scheduling all the threads but will not be able to actually run them at once. However, if you have a mix of CPU and I/O tasks in your program, it can make sense to have more threads than cores, so that while some thread is waiting I/O, the other threads can be doing useful work on the CPU.
Threads within the same process share the same virtual address space and file descriptors
Each thread has its own stack, stack pointer, program counter, and other CPU registers
A thread is the unit that the OS uses to schedule the processor

The following figure shows the memory organization of a single-threaded and a multi-threaded process.

Useful POSIX Thread APIs

This link to the pthread library http://pubs.opengroup.org/onlinepubs/009695399/basedefs/pthread.h.html

provides an API to create/destroy threads
provides spinlock, mutex, and CV API
note that parts of the API may or may not be implemented, depending on the flavor of UNIX you are using (e.g., macOS does not have pthread_spin_\*)

Creating and Destroying Threads

int pthread_create(
  pthread_t *restrict thread,
  const pthread_attr_t *restrict attr,
  void *(*start_routine)(void *),
  void *restrict arg);
)

thread is a pointer to a thread-ID variable with type pthread_t. This variable is often local (but doesn’t have to be). Once the thread is created, pthread_create will set \*thread to equal the ID of the newly-created thread.
attr are optional attributes; we’ll always pass NULL.
When the thread starts executing, it will call start_routine(arg). Thus, start_routine is the thread function, and arg is an argument used to pass local data.

void pthread_exit(void* value_ptr)

Terminate the thread with a value that can be returned when another thread calls pthread_join on this thread’s ID

int pthread_join(pthread_t thread, void** value_ptr)

Wait for the thread to exit, collect the exit value and store it in \*value_ptr (pass NULL for value_ptr if you don’t care about the exit value), and then clean up the exited thread’s resources

int pthread_detach(pthread_t pthread)

Detach the thread so that when that pthread exits, resources are automatically cleaned up
Cannot call pthread_join on a detached thread

For more details and examples on how to use these functions, you can access their man pages, e.g. man pthread_create.

Important Note

In order to avoid segmentation faults, when using pthread_create, the arguments are passed as a void\* — a generic pointer! This has two major consequences of which you should be aware:

Anything can be passed in: any pointer can be autocast to a void\*, and there is consequently no typechecking when you pass arguments via pthread_create.
The void\* argument of start_routine can be cast to anything: a void\* can also be cast to any other pointer type without compiler warnings.

This means that you, as the programmer, must explicitly be aware of the types being passed in. If you expect a size_t\* going in, then you must manually ensure that all calls pass in a size_t\*. The compiler will not do this for you. Otherwise, you might cast into the wrong type, and consequently encounter segmentation faults left and right.

Synchronization

Concurrency can cause race conditions, an undesirable side effect. Therefore, we need to synchronize access to shared data across threads within our program. To do so, we must ensure mutual exclusion in critical sections:

Critical section: a piece of code that accesses shared resource that must not be concurrently accessed by more than one thread
Mutual exclusion: property of code such that no two threads are in the same critical section at the same time.

Mutual exclusion requires hardware support (e.g. atomic test and set, atomic compare and swap, load-link/store-conditional). Given this basic hardware support, we can build higher level synchronization primitives. We will discuss 5 types of synchronization objects next: mutex, condition variable, spinlock, semaphore, and monitor.

Mutex (Mutual exclusion lock)

Only one thread can “own” the mutex (be between lock & unlock) at any given moment
Only the owning thread, which locked the mutex, may unlock the mutex
Other threads will block until they can acquire the mutex
Used to provide mutual exclusion on critical sections

Operations on a mutex (we give the general operation first, then the pthread-specific function name):

lock(mutex \*): block until the mutex is unlocked, then atomically lock it (and set its state as “locked by this thread”).
- e.g., pthread_mutex_lock(pthread_mutex_t\*);
unlock(mutex \*): the mutex must be locked by this thread. The call unlocks the mutex (and sets its state to “unlocked”).
- e.g., pthread_mutex_unlock(pthread_mutex_t\*);
trylock(mutex \*): locks the mutex if it is not currently locked; otherwise returns an error.
- pthread_mutex_trylock(pthread_mutex_t\*): The pthread version returns 0 on success (i.e., the mutex is now locked by this thread), or EBUSY on error (i.e., the mutex was locked by a different thread). This call never blocks. Note that in pthread functions, error conditions are not returned via the global errno variable; rather, they are returned directly. (This is why functions like pthread_create() return their return values via pointer arguments.)

Condition Variable

Condition variable (CV): a synchronization object that represents a condition (a boolean expression that can be either true or false)

Condition variables allow threads to safely block until a condition becomes true
They are always paired with a mutex; that is, the predicate/condition should be covered by the CV’s associated mutex, meaning that changes to the predicate’s value require holding that mutex.
Think of a condition variable as a waiting area, where you use a mutex to protect access to a shared resource, and others wait until that shared resource enters a particular state (defined by the condition)
- When you wake up, you atomically grab the mutex and proceed (that’s why wait requires a mutex)
- Use signal to wake up one thread to proceed in the waiting area
- Use broadcast to wake up all threads in the waiting area
- A thread that makes the predicate become true should call signal or broadcast to inform any waiting threads.
- However, due to race conditions, it is almost always possible for the condition to become false again in between the signal and the moment when the waiting thread wakes up. For this reason, wait is almost always called in a loop that checks the condition.
Use a condition variable when you want to wait for the program to enter some specific condition, and use a lock when you simply want mutual exclusion

Operations on a CV:

wait(cv\*, mutex\*): the mutex must be locked by this thread. The call atomically unlocks the mutex and adds the current thread to the CV’s wait set. Once it’s woken up, the thread re-locks the mutex before returning from wait. What would happen if the thread slept with its mutex locked?
- e.g., pthread_cond_wait(pthread_cond_t\*, pthread_mutex_t\*)
signal(cv\*): wake up one thread in the CV’s wait set. It is not an error if no threads are waiting on the CV.
- e.g., pthread_cond_signal(pthread_cond_t\*);
broadcast(cv\*): wake up all threads in the CV’s wait set (if any).
- e.g., pthread_cond_broadcast(pthread_cond_t\*);

For example, let's understand the following code together:

myLock is the mutex that protects the access to the shared resource counter
Thread A uses a condition variable myCV to wait until counter is no longer less than 10. When the condition becomes true, it wakes up and atomically grab myLock to proceed
Note that we wrap the wait() call in a while loop because the thread may wake up prematurely when the condition is not true
Thread B atomically increments counter using the lock. If the count reaches 10, it signals one thread waiting on myCV to wake up (i.e., Thread A).

Check out this link for more details: http://cs61.seas.harvard.edu/wiki/2016/ConditionVariables

Spinlock

A special-purpose mutex for very short critical sections—otherwise it is a waste of CPU power
Just like a mutex, only one thread can own a spinlock at any given moment
Just like a mutex, only the owning thread may unlock the spinlock
Unlike mutex, which causes other threads to block while waiting to acquire the mutex, other threads will spin (e.g. while (1) loop) until they can acquire the spin lock
It is used to provide mutual exclusion on short critical sections that do not put the thread to sleep, so do not use it when doing I/O

Semaphore

Used to keep track of how many of a set of resources are available
Binary semaphores (where the value can be either 0 or 1) are used to implement locks
Value can only be changed using special functions:
- P decrements the semaphore if the value is non-zero, meaning a resource has been acquired
- V increments the semaphore, meaning a resource has been freed

Monitor

A higher-level synchronization primitive
It is a class, an object, or a module that wraps synchronization beneath an interface
Caller does not need to perform the synchronization
Might be familiar from Java's synchronized, which is a per-object monitor

Deadlock

A deadlock is not a synchronization primitive, but rather a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does. Requirements for a deadlock:
- Mutual exclusion on resources, i.e. resources cannot be shared between threads
- Hold and wait: a thread holds onto the first resource while waiting for the next one, as opposed to giving up the first when it sees that the second is not available
- No preemption: the operating system cannot deallocate resources once they have been allocated—deallocation must be done by the thread that has the resource
- Circular wait graph: Thread 1 holds resource A and is waiting on resource B, and Thread 2 holds resource B and is waiting for resource A
  - Key: resources are not guaranteed to be obtained in a specific order! If Thread 1 and Thread 2 both needed to obtain resource A before obtaining resource B, the wait graph would not be circular
All of these conditions must hold true for deadlock to occur, so breaking any one of them will remove the threat of deadlock

Exercise: Bathroom Synchronization

A series of fun bathroom-themed conceptual exercises for you to get familiar with synchronization primitives!

Stall Privacy

In this exercise, the setting is a bathroom with K stalls numbered 0 through K–1. A number of users enter the bathroom; each has a preferred stall. Each user can be modeled as an independent thread running the following code:

void user(int preferred_stall) {
    poop_into(preferred_stall);
}

You probably can intuitively smell that this bathroom model has a synchronization problem. Answer the following questions without referring to the solutions first.

Q1. What synchronization property is missing?

Q2. Where is the critical section in user, and when do two users conflict?

Q3. Provide synchronization using a single mutex, i.e., using coarse-grained locking. You can use pthread syntax for this mutex, or make up your own syntax (as long as the meaning is clear). Your solution should be correct (it should avoid gross conditions [which are race conditions in the context of a bathroom-themed synchronization problem]), but it need not be efficient. Your solution will consist of (1) a definition for the mutex, which must be a global variable, and (2) new code for user().

Q4. Provide synchronization using fine-grained locking. Your solution should be more efficient than the previous solution. Specifically, it should allow the bathroom to achieve full utilization: it should be possible for all K stalls to be in use at the same time. You will need to change both the global mutex — there will be more than one global mutex — and the code for user().

Stall choice

We use the same setting as above, except that users have no preferred stall:

void user(void) {
    int preferred_stall = random() % K;
    poop_into(preferred_stall);
}

Q5. Adapt this function to use your synchronization solution from Q4.

Q6. As written, user picks a preferred stall and then waits for that stall to become available. This isn’t as efficient as possible, since multiple users might pick the same stall by chance. Write a synchronization solution where each user chooses any available stall. First, use a single mutex coupled with a global data structure representing stall occupancy.

Any correct solution will contain at least two different calls to lock (or trylock) functions and two different calls to unlock functions. A solution with only one lock call has either a race condition, a deadlock, or underutilization (at most one user in the bathroom at a time).

Q7. Second, repeat your solution using multiple mutexes and no additional global variables.

Q8. Compare these solutions in terms of the number of synchronization calls made per user.

Bathroom capacity

Your solution in Q8 allows an arbitrary number of users to wait in the bathroom for a stall. The fire marshal says no no: at most K + 4 occupants are allowed at a time (one per stall, plus 4 using the sink). We want a user model like this:

void user(void) {
    while (1) {
        if (the bathroom has fewer than K+4 occupants) {
            enter bathroom;
            poop in empty stall;
            leave bathroom;
            return;
        }
    }
}

Q9. Which parts of this user model must be executed atomically to avoid race conditions where the bathroom has more than K+4 occupants? That is, which critical section avoids overoccupancy?

Q10. Write a synchronization solution that avoids both overoccupancy and gross (race) conditions. Use only one or more mutexes — no condition variables. Tell us what global state you use.

As synchronization problems get more complicated, more code is required to solve them. You may want to sketch pseudocode or copy and paste working code from earlier exercises.

Q11. Is your solution blocking or polling?

Q12. Improve your solution’s CPU usage by adding one or more condition variables. In your solution, a user thread should block until there’s a very good chance that the bathroom has fewer than K+4 occupants. Specify what predicate is associated with each condition variable.

Plumbing protection

Q13. Harvard has old plumbing, and if more than 3 users flush at the same time, the bathroom will explode. Write code that avoids this gross condition, starting from your solution to Q12. Your user threads should block until they have a chance to flush, and it should be possible for 3 users to flush simultaneously. This will involve another condition variable (why?).

void user(void) {
    while (1) {
        if (the bathroom has fewer than K+4 occupants) {
            enter bathroom;
            poop in empty stall;
            while (3 or more users are flushing) {
                block; // heh
            }
            flush;
            leave bathroom;
            return;
        }
    }
}

Poopers and cleaners

Bathrooms sometimes need to be cleaned. In this version of the synchronization problem, there are two kinds of users, poopers and cleaners. A bathroom may have one or more poopers in it, or one or more cleaners, but it should never have both a pooper and a cleaner. The threads can be modeled this way:

void pooper(void) {
    while (1) {
        if (the bathroom has no cleaners) {
            enter bathroom;
            poop;
            leave bathroom;
            return;
        }
    }
}

void cleaner(void) {
    while (1) {
        if (the bathroom has no poopers) {
            enter bathroom;
            clean;
            leave bathroom;
            return;
        }
    }
}

For simplicity’s sake, we’ve dropped most of the earlier parts (stall synchronization, bathroom capacity, and plumbing protection).

Q14. Write a solution for this problem that involves mutexes only. As usual, this solution will cause some users to poll.

Q15. Write a solution for this problem involving one condition variable. A user should block until they have some chance to enter the bathroom. However, since you have only one condition variable, your solution will necessarily involve false wakeups (you may wake up a user who cannot enter the bathroom yet).

Q16. Many solutions for Q15 should involve the broadcast operation, and would go wrong if they used signal. Does your solution have this property? Explain.

Q17. Write a solution for this problem involving two condition variables.

Pooping philosophers

To facilitate the communal philosophical life, Plato decreed that philosophy should be performed in a bathroom with K thrones and K curtains, arranged like so:

Each philosopher sits on a throne with two curtains, one on each side. The philosophers normally argue with one another. But sometimes a philosopher’s got to poop, and to poop one needs privacy. A pooping philosopher must first draw the curtains on either side; philosopher X will need curtains X and (X+1) mod K. For instance, here, philosophers 0 and 2 can poop simultaneously, but philosopher 4 cannot poop until philosopher 0 is done (because philosopher 4 needs curtain 0), and philosopher 1 cannot poop until philosophers 0 and 2 are both done:

Q18. In high-level terms, what is the mutual exclusion problem here—that is, what resources are subject to mutual exclusion?

Q19. A philosopher could be described like this:

void philosopher(int x) {
    while (1) {
        argue();
        draw curtains x and (x + 1) % K;
        poop();
        release curtains x and (x + 1) % K;
    }
}

Write a correctly-synchronized version of this philosopher. Your solution should not suffer from deadlock, and philosophers should block (not poll) when they are waiting to poop.

We strongly suggest you attempt each question before checking the answers!

Solutions: https://cs61.seas.harvard.edu/wiki/2017/Section12Solutions