Threads and Concurrency
Thread Overview
A thread is a logical flow of control that runs in the context of a process
- A process can have multiple threads, each of which can be run on a
separate CPU in order to speed up execution of the process
- However, in order to be meaningfully faster, one must be able to break the process into distinct parts that do not heavily rely on each other, if at all. When doing CPU bound tasks, it only makes sense to have as many threads as CPUs, as otherwise you will have to pay the cost of scheduling all the threads but will not be able to actually run them at once. However, if you have a mix of CPU and I/O tasks in your program, it can make sense to have more threads than cores, so that while some thread is waiting I/O, the other threads can be doing useful work on the CPU.
- Threads within the same process share the same virtual address space and file descriptors
- Each thread has its own stack, stack pointer, program counter, and other CPU registers
- A thread is the unit that the OS uses to schedule the processor
The following figure shows the memory organization of a single-threaded and a multi-threaded process.
Useful POSIX Thread APIs
This link to the pthread
library
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/pthread.h.html
- provides an API to create/destroy threads
- provides spinlock, mutex, and CV API
- note that parts of the API may or may not be implemented, depending on
the flavor of UNIX you are using (e.g., macOS does not have
pthread_spin_\*
)
Creating and Destroying Threads
int pthread_create(
pthread_t *restrict thread,
const pthread_attr_t *restrict attr,
void *(*start_routine)(void *),
void *restrict arg);
)
thread
is a pointer to a thread-ID variable with typepthread_t
. This variable is often local (but doesn’t have to be). Once the thread is created,pthread_create
will set\*thread
to equal the ID of the newly-created thread.attr
are optional attributes; we’ll always passNULL
.- When the thread starts executing, it will call
start_routine(arg)
. Thus,start_routine
is the thread function, andarg
is an argument used to pass local data.
void pthread_exit(void* value_ptr)
- Terminate the thread with a value that can be returned when another
thread calls
pthread_join
on this thread’s ID
int pthread_join(pthread_t thread, void** value_ptr)
- Wait for the
thread
to exit, collect the exit value and store it in\*value_ptr
(passNULL
forvalue_ptr
if you don’t care about the exit value), and then clean up the exited thread’s resources
int pthread_detach(pthread_t pthread)
- Detach the thread so that when that
pthread
exits, resources are automatically cleaned up - Cannot call
pthread_join
on a detached thread
For more details and examples on how to use these functions, you can
access their man
pages, e.g. man pthread_create
.
Important Note
In order to avoid segmentation faults, when using pthread_create
,
the arguments are passed as a void\*
— a generic pointer! This has
two major consequences of which you should be aware:
- Anything can be passed in: any pointer can be autocast to a
void\*
, and there is consequently no typechecking when you pass arguments viapthread_create
. - The
void\*
argument ofstart_routine
can be cast to anything: avoid\*
can also be cast to any other pointer type without compiler warnings.
This means that you, as the programmer, must explicitly be aware of
the types being passed in. If you expect a size_t\*
going in, then
you must manually ensure that all calls pass in a size_t\*
. The
compiler will not do this for you. Otherwise, you might cast into the
wrong type, and consequently encounter segmentation faults left and
right.
Synchronization
Concurrency can cause race conditions, an undesirable side effect. Therefore, we need to synchronize access to shared data across threads within our program. To do so, we must ensure mutual exclusion in critical sections:
- Critical section: a piece of code that accesses shared resource that must not be concurrently accessed by more than one thread
- Mutual exclusion: property of code such that no two threads are in the same critical section at the same time.
Mutual exclusion requires hardware support (e.g. atomic test and set, atomic compare and swap, load-link/store-conditional). Given this basic hardware support, we can build higher level synchronization primitives. We will discuss 5 types of synchronization objects next: mutex, condition variable, spinlock, semaphore, and monitor.
Mutex (Mutual exclusion lock)
- Only one thread can “own” the mutex (be between lock & unlock) at any given moment
- Only the owning thread, which locked the mutex, may unlock the mutex
- Other threads will block until they can acquire the mutex
- Used to provide mutual exclusion on critical sections
Operations on a mutex (we give the general operation first, then the
pthread
-specific function name):
lock(mutex \*)
: block until the mutex is unlocked, then atomically lock it (and set its state as “locked by this thread”).- e.g.,
pthread_mutex_lock(pthread_mutex_t\*);
- e.g.,
unlock(mutex \*)
: the mutex must be locked by this thread. The call unlocks the mutex (and sets its state to “unlocked”).- e.g.,
pthread_mutex_unlock(pthread_mutex_t\*);
- e.g.,
trylock(mutex \*)
: locks the mutex if it is not currently locked; otherwise returns an error.pthread_mutex_trylock(pthread_mutex_t\*)
: The pthread version returns 0 on success (i.e., the mutex is now locked by this thread), orEBUSY
on error (i.e., the mutex was locked by a different thread). This call never blocks. Note that in pthread functions, error conditions are not returned via the globalerrno
variable; rather, they are returned directly. (This is why functions likepthread_create()
return their return values via pointer arguments.)
Condition Variable
Condition variable (CV): a synchronization object that represents a
condition (a boolean expression that can be either true
or
false
)
- Condition variables allow threads to safely block until a condition
becomes
true
- They are always paired with a mutex; that is, the predicate/condition should be covered by the CV’s associated mutex, meaning that changes to the predicate’s value require holding that mutex.
- Think of a condition variable as a waiting area, where you use a mutex
to protect access to a shared resource, and others wait until that
shared resource enters a particular state (defined by the condition)
- When you wake up, you atomically grab the mutex and proceed (that’s why wait requires a mutex)
- Use signal to wake up one thread to proceed in the waiting area
- Use broadcast to wake up all threads in the waiting area
- A thread that makes the predicate become true should call
signal
orbroadcast
to inform any waiting threads. - However, due to race conditions, it is almost always possible for
the condition to become
false
again in between the signal and the moment when the waiting thread wakes up. For this reason,wait
is almost always called in a loop that checks the condition.
- Use a condition variable when you want to wait for the program to enter some specific condition, and use a lock when you simply want mutual exclusion
Operations on a CV:
wait(cv\*, mutex\*)
: the mutex must be locked by this thread. The call atomically unlocks the mutex and adds the current thread to the CV’s wait set. Once it’s woken up, the thread re-locks the mutex before returning fromwait
. What would happen if the thread slept with its mutex locked?- e.g.,
pthread_cond_wait(pthread_cond_t\*, pthread_mutex_t\*)
- e.g.,
signal(cv\*)
: wake up one thread in the CV’s wait set. It is not an error if no threads are waiting on the CV.- e.g.,
pthread_cond_signal(pthread_cond_t\*);
- e.g.,
broadcast(cv\*)
: wake up all threads in the CV’s wait set (if any).- e.g.,
pthread_cond_broadcast(pthread_cond_t\*);
- e.g.,
For example, let's understand the following code together:
myLock
is the mutex that protects the access to the shared resourcecounter
- Thread A uses a condition variable
myCV
to wait untilcounter
is no longer less than 10. When the condition becomes true, it wakes up and atomically grabmyLock
to proceed - Note that we wrap the wait() call in a
while
loop because the thread may wake up prematurely when the condition is not true - Thread B atomically increments counter using the lock. If the count
reaches 10, it signals one thread waiting on
myCV
to wake up (i.e., Thread A).
Check out this link for more details: http://cs61.seas.harvard.edu/wiki/2016/ConditionVariables
Spinlock
- A special-purpose mutex for very short critical sections—otherwise it is a waste of CPU power
- Just like a mutex, only one thread can own a spinlock at any given moment
- Just like a mutex, only the owning thread may unlock the spinlock
- Unlike mutex, which causes other threads to block while waiting to
acquire the mutex, other threads will spin (e.g.
while (1)
loop) until they can acquire the spin lock - It is used to provide mutual exclusion on short critical sections that do not put the thread to sleep, so do not use it when doing I/O
Semaphore
- Used to keep track of how many of a set of resources are available
- Binary semaphores (where the value can be either 0 or 1) are used to implement locks
- Value can only be changed using special functions:
P
decrements the semaphore if the value is non-zero, meaning a resource has been acquiredV
increments the semaphore, meaning a resource has been freed
Monitor
- A higher-level synchronization primitive
- It is a class, an object, or a module that wraps synchronization beneath an interface
- Caller does not need to perform the synchronization
- Might be familiar from Java's
synchronized
, which is a per-object monitor
Deadlock
- A deadlock is not a synchronization primitive, but rather a
situation in which two or more competing actions are each waiting for
the other to finish, and thus neither ever does. Requirements for a
deadlock:
- Mutual exclusion on resources, i.e. resources cannot be shared between threads
- Hold and wait: a thread holds onto the first resource while waiting for the next one, as opposed to giving up the first when it sees that the second is not available
- No preemption: the operating system cannot deallocate resources once they have been allocated—deallocation must be done by the thread that has the resource
- Circular wait graph: Thread 1 holds resource A and is waiting on
resource B, and Thread 2 holds resource B and is waiting for
resource A
- Key: resources are not guaranteed to be obtained in a specific order! If Thread 1 and Thread 2 both needed to obtain resource A before obtaining resource B, the wait graph would not be circular
- All of these conditions must hold true for deadlock to occur, so breaking any one of them will remove the threat of deadlock
Exercise: Bathroom Synchronization
A series of fun bathroom-themed conceptual exercises for you to get familiar with synchronization primitives!
Stall Privacy
In this exercise, the setting is a bathroom with K stalls numbered 0 through K–1. A number of users enter the bathroom; each has a preferred stall. Each user can be modeled as an independent thread running the following code:
void user(int preferred_stall) {
poop_into(preferred_stall);
}
You probably can intuitively smell that this bathroom model has a synchronization problem. Answer the following questions without referring to the solutions first.
Q1. What synchronization property is missing?
Q2. Where is the critical section in user, and when do two users conflict?
Q3. Provide synchronization using a single mutex, i.e., using
coarse-grained locking. You can use pthread
syntax for this
mutex, or make up your own syntax (as long as the meaning is clear).
Your solution should be correct (it should avoid gross conditions
[which are race conditions in the context of a bathroom-themed
synchronization problem]), but it need not be efficient. Your
solution will consist of (1) a definition for the mutex, which must be a
global variable, and (2) new code for user()
.
Q4. Provide synchronization using fine-grained locking. Your
solution should be more efficient than the previous solution.
Specifically, it should allow the bathroom to achieve full
utilization: it should be possible for all K stalls to be in use at the
same time. You will need to change both the global mutex — there will be
more than one global mutex — and the code for user()
.
Stall choice
We use the same setting as above, except that users have no preferred stall:
void user(void) {
int preferred_stall = random() % K;
poop_into(preferred_stall);
}
Q5. Adapt this function to use your synchronization solution from Q4.
Q6. As written, user
picks a preferred stall and then waits for
that stall to become available. This isn’t as efficient as possible,
since multiple users might pick the same stall by chance. Write a
synchronization solution where each user chooses any available stall.
First, use a single mutex coupled with a global data structure
representing stall occupancy.
Any correct solution will contain at least two different calls to
lock
(or trylock
) functions and two different calls to
unlock
functions. A solution with only one lock call has either a
race condition, a deadlock, or underutilization (at most one user in
the bathroom at a time).
Q7. Second, repeat your solution using multiple mutexes and no additional global variables.
Q8. Compare these solutions in terms of the number of synchronization calls made per user.
Bathroom capacity
Your solution in Q8 allows an arbitrary number of users to wait in the bathroom for a stall. The fire marshal says no no: at most K + 4 occupants are allowed at a time (one per stall, plus 4 using the sink). We want a user model like this:
void user(void) {
while (1) {
if (the bathroom has fewer than K+4 occupants) {
enter bathroom;
poop in empty stall;
leave bathroom;
return;
}
}
}
Q9. Which parts of this user model must be executed atomically to avoid race conditions where the bathroom has more than K+4 occupants? That is, which critical section avoids overoccupancy?
Q10. Write a synchronization solution that avoids both overoccupancy and gross (race) conditions. Use only one or more mutexes — no condition variables. Tell us what global state you use.
As synchronization problems get more complicated, more code is required to solve them. You may want to sketch pseudocode or copy and paste working code from earlier exercises.
Q11. Is your solution blocking or polling?
Q12. Improve your solution’s CPU usage by adding one or more condition variables. In your solution, a user thread should block until there’s a very good chance that the bathroom has fewer than K+4 occupants. Specify what predicate is associated with each condition variable.
Plumbing protection
Q13. Harvard has old plumbing, and if more than 3 users flush at the same time, the bathroom will explode. Write code that avoids this gross condition, starting from your solution to Q12. Your user threads should block until they have a chance to flush, and it should be possible for 3 users to flush simultaneously. This will involve another condition variable (why?).
void user(void) {
while (1) {
if (the bathroom has fewer than K+4 occupants) {
enter bathroom;
poop in empty stall;
while (3 or more users are flushing) {
block; // heh
}
flush;
leave bathroom;
return;
}
}
}
Poopers and cleaners
Bathrooms sometimes need to be cleaned. In this version of the synchronization problem, there are two kinds of users, poopers and cleaners. A bathroom may have one or more poopers in it, or one or more cleaners, but it should never have both a pooper and a cleaner. The threads can be modeled this way:
void pooper(void) {
while (1) {
if (the bathroom has no cleaners) {
enter bathroom;
poop;
leave bathroom;
return;
}
}
}
void cleaner(void) {
while (1) {
if (the bathroom has no poopers) {
enter bathroom;
clean;
leave bathroom;
return;
}
}
}
For simplicity’s sake, we’ve dropped most of the earlier parts (stall synchronization, bathroom capacity, and plumbing protection).
Q14. Write a solution for this problem that involves mutexes only. As usual, this solution will cause some users to poll.
Q15. Write a solution for this problem involving one condition variable. A user should block until they have some chance to enter the bathroom. However, since you have only one condition variable, your solution will necessarily involve false wakeups (you may wake up a user who cannot enter the bathroom yet).
Q16. Many solutions for Q15 should involve the broadcast operation, and would go wrong if they used signal. Does your solution have this property? Explain.
Q17. Write a solution for this problem involving two condition variables.
Pooping philosophers
To facilitate the communal philosophical life, Plato decreed that philosophy should be performed in a bathroom with K thrones and K curtains, arranged like so:
Each philosopher sits on a throne with two curtains, one on each side. The philosophers normally argue with one another. But sometimes a philosopher’s got to poop, and to poop one needs privacy. A pooping philosopher must first draw the curtains on either side; philosopher X will need curtains X and (X+1) mod K. For instance, here, philosophers 0 and 2 can poop simultaneously, but philosopher 4 cannot poop until philosopher 0 is done (because philosopher 4 needs curtain 0), and philosopher 1 cannot poop until philosophers 0 and 2 are both done:
Q18. In high-level terms, what is the mutual exclusion problem here—that is, what resources are subject to mutual exclusion?
Q19. A philosopher could be described like this:
void philosopher(int x) {
while (1) {
argue();
draw curtains x and (x + 1) % K;
poop();
release curtains x and (x + 1) % K;
}
}
Write a correctly-synchronized version of this philosopher. Your solution should not suffer from deadlock, and philosophers should block (not poll) when they are waiting to poop.
We strongly suggest you attempt each question before checking the answers!
Solutions: https://cs61.seas.harvard.edu/wiki/2017/Section12Solutions