Synchronization exercises

Exercises not as directly relevant to this year’s class are marked with ⚠️.

SYNCH-1. Threads

The following code performs a matrix multiplication, c = ab, where a, b, and c are all square matrices of dimension sz. It uses the cache-friendly ikj index ordering.

struct square_matrix {
    size_t sz;
    double* v;    // array of `sz * sz` doubles, row-major order

    double& elt(size_t i, size_t j) {
        assert(i < this->sz && j < this->sz);
        return this->v[i * this->sz + j];
    }
};

void matrix_multiply(square_matrix& c, square_matrix& a, square_matrix& b) {
    assert(a.sz == b.sz && b.sz == c.sz);
    for (size_t x = 0; x != c.sz * c.sz; ++x) {
        c.v[x] = 0.0;
    }
    for (size_t i = 0; i != c.sz; ++i) {
        for (size_t k = 0; k != c.sz; ++k) {
            for (size_t j = 0; j != c.sz; ++j) {
                c.elt(i, j) += a.elt(i, k) * b.elt(k, j);
            }
        }
    }
}

But matrix multiplication is a naturally parallelizable problem.

QUESTION SYNCH-1A. Complete this code, which should perform the multiplication using c.sz parallel threads, one per row of c.

void matrix_multiply_thread(square_matrix& c, square_matrix& a, square_matrix& b,
                            size_t i) {
    /* YOUR CODE HERE */
}

void matrix_multiply_p(square_matrix& c, square_matrix& a, square_matrix& b) {
    assert(a.sz == b.sz && b.sz == c.sz);
    for (size_t x = 0; x != c.sz * c.sz; ++x) {
        c.v[x] = 0.0;
    }
    std::vector<std::thread> ts;
    for (size_t i = 0; i != c.sz; ++i) {
        ts.push_back(std::thread(matrix_multiply_thread, std::ref(c), std::ref(a), std::ref(b), i));
    }
    for (size_t i = 0; i != c.sz; ++i) {
        ts[i].join();
    }
}

For the next two parts, consider this alternate code for parallel matrix_multiply, and assume a correct matrix_multiply_thread implementation.

void matrix_multiply_alt(square_matrix& c, square_matrix& a, square_matrix& b) {
    assert(a.sz == b.sz && b.sz == c.sz);
    for (size_t x = 0; x != c.sz * c.sz; ++x) {
        c.v[x] = 0.0;
    }
    std::vector<std::thread> ts;
    for (size_t i = 0; i != c.sz; ++i) {
        ts.push_back(std::thread(matrix_multiply_thread, std::ref(c), std::ref(a), std::ref(b), i));
        ts[i].join();
    }
}

QUESTION SYNCH-1B. True or false? matrix_multiply_alt produces correct results.

QUESTION SYNCH-1C. How does matrix_multiply_alt differ from matrix_multiply_p?

QUESTION SYNCH-1D. On single-core machines, the kij order performs almost as fast as the ikj order. Describe the changes that would be required to make matrix_multiply_p and matrix_multiply_thread use kij order (and produce correct results).

SYNCH-2. Synchronization and concurrency

Most synchronization objects have at least two operations. Mutual-exclusion locks support lock and unlock; condition variables support wait and signal. One of the earliest synchronization objects invented, the semaphore, supports P and V.

In this problem, you’ll work with a synchronization object with only one operation, which we call a hemiphore. Hemiphores behave like the following; it is very important that you understand this pseudocode.

struct hemiphore {
    int value = 0;

    // Block until the hemiphore has `this->value >= bound`, then ATOMICALLY
    // increment its value by `delta`.
    void H(int bound, int delta) {
        // This is pseudocode; a real hemiphore implementation would block, not spin, and would
        // ensure that the test and the increment happen in one atomic step.
        // You may assume that `this->value` never overflows.
        while (this->value < bound) {
            sched_yield();
        }
        this->value += delta;
    }
};

Application code should access the hemiphore only through the H operation.

QUESTION SYNCH-2A. Use hemiphores to implement mutual-exclusion locks. Fill out the code below. (You may not need to fill in every empty slot. You may use standard C constants; for example, INT_MIN is the smallest possible value for a variable of type int, which on an x86-64 machine is −2147483648.)

struct mutex {
    hemiphore h;
    // YOUR CODE HERE (if necessary)

    // Initialize the mutex.
    mutex() {
        // YOUR CODE HERE (if necessary)
    }

    // Lock the mutex.
    void lock() {
        // YOUR CODE HERE
    }

    // Unlock the mutex, which must be locked.
    void unlock() {
        // YOUR CODE HERE
    }
};

QUESTION SYNCH-2B. Use hemiphores to implement condition variables. Fill out the code below. You may assume that the implementation of mutex is your hemiphore-based implementation from above (so, for instance, wait may access the hemiphore m.h). See the Hints at the end of the question.

struct condition_variable {
    mutex internal;
    hemiphore h;
    // YOUR CODE HERE (if necessary)

    // Initialize the condition variable.
    condition_variable() {
        // YOUR CODE HERE (if necessary)
    }

    // Wake up one thread waiting on the condition variable (if any are waiting).
    void notify_one() {
        // YOUR CODE HERE
    }

    // Block until the condition variable is signaled. The mutex argument `m` must be locked by
    // the current thread; it is unlocked before the wait begins and re-locked after the wait ends.
    // There must be no sleep-wakeup race conditions: if thread 1 has `m` locked and executes
    // `cv.wait(m)`, no other thread is waiting on `cv`, and thread 2 executes `m.lock();
    // cv.notify_one(); m.unlock()`, then thread 1 will always receive the signal (i.e., wake up).
    void wait(mutex& m) {
        // A true C++ condition_variable would take a `std::unique_lock`—we elide that here.
        // YOUR CODE HERE
    }
};

Hints. For full credit:

QUESTION SYNCH-2C. Use C++ standard mutexes and condition variables to implement hemiphores. Fill out the code below; see the hints after the question.

struct hemiphore {
    std::mutex m;
    std::condition_variable_any cv;
    int value = 0;
    // YOUR CODE HERE (if needed)

    // Initialize the hemiphore.
    hemiphore() {
        // YOUR CODE HERE (if needed)
    }

    // Block until the hemiphore has `this->value >= bound`, then ATOMICALLY
    // increment its value by `delta`.
    void H(int bound, int delta) {
        // YOUR CODE HERE
    }
};

QUESTION SYNCH-2D. Consider the following two threads, which use a shared hemiphore h with initial value 0.

Thread 1                      Thread 2

h.H(1000, 1);                 while (1) {
printf("Thread 1 done\n");        h.H(0, 1);
                                  h.H(0, -1);
                              }

Thread 2 will never block, and the hemiphore’s value will alternate between 1 and 0. Thread 1 will never reach the printf, because the hemiphore’s value never reaches 1000. However, in most people’s first implementation of hemiphores using standard mutexes and condition variables, Thread 1 will not block. Every call to h.H in Thread 2 will effectively wake up Thread 1. Though Thread 1 will then check the hemiphore’s value and immediately go back to sleep, doing so wastes CPU time.

Design an implementation of hemiphores using pthread mutexes and condition variables that solves this problem. In your revised implementation, Thread 1 above should block forever. For full credit, write C++ code (without worrying too much about C++ syntax). For partial credit, write pseudocode or English describing your design.

Hint. One working implementation uses a vector of “waiter” objects, where each waiter object is on a different thread’s stack, as initially sketched below. You can use such objects or not as you please.

struct hemiphore_waiter {
    // YOUR CODE HERE (if necessary)
};

struct hemiphore {
    std::mutex m;
    int value = 0;
    std::vector<hemiphore_waiter*> waiters;
    // YOUR CODE HERE (if necessary)

    hemiphore() {
        // YOUR CODE HERE (if necessary)
    }

    void H(int bound, int delta) {
        hemiphore_waiter hw;
        // YOUR CODE HERE
    }
};

SYNCH-3. Pipes and synchronization

In the following questions, you will implement a mutex using a pipe, and a limited type of pipe using a mutex.

QUESTION SYNCH-3A. In this question, you are to implement mutex functionality using a pipe. Fill in the definitions of the mutex operations. You may assume that no errors occur.

struct pipe_mutex {
    int fd[2];
    // YOUR CODE HERE (if necessary)

    pipe_mutex() {
        int r = pipe(this->fd);
        assert(r == 0);
        // YOUR CODE HERE (if necessary)
    }

    void lock() {
        // YOUR CODE HERE
    }

    void unlock() {
        // YOUR CODE HERE
    }
};

In the next questions, you will help implement pipe functionality using an in-memory buffer and a mutex. This “mutex pipe” will only work between threads of the same process (in contrast to a regular pipe, which also works between processes). An initial implementation of mutex pipes is as follows; you will note that it contains no mutexes.

        struct mutex_pipe {
/* 1*/      char bbuf_[BUFSIZ];
/* 2*/      size_t bpos_;
/* 3*/      size_t blen_;

            mutex_pipe() {
/* 4*/          this->bpos_ = this->blen_ = 0;
/* 5*/          memset(this->bbuf_, 0, BUFSIZ);
            }

            // Read up to `sz` bytes from this mutex_pipe into `buf` and return the number of bytes
            // read. If no bytes are available, wait until at least one byte can be read.
            ssize_t read(char* buf, size_t sz) {
/* 6*/           size_t pos = 0;
/* 7*/           while (pos < sz && (pos == 0 || this->blen_ != 0)) {
/* 8*/               if (this->blen_ != 0) {
/* 9*/                   buf[pos] = this->bbuf_[this->bpos_];
/*10*/                   ++this->bpos_;
/*11*/                   this->bpos_ = this->bpos_ % BUFSIZ;
/*12*/                   --this->blen_;
/*13*/                   ++pos;
/*14*/               }
/*15*/           }
/*16*/           return pos;
            }

            // Write up to `sz` bytes from `buf` into this mutex_pipe and return the number of bytes
            // written. If no space is available, wait until at least one byte can be written.
            ssize_t write(const char* buf, size_t sz) {
/*17*/          size_t pos = 0;
/*18*/          while (pos < sz && (pos == 0 || this->blen_ < BUFSIZ)) {
/*19*/              if (this->blen_ != BUFSIZ) {
/*20*/                  size_t bindex = this->bpos_ + this->blen_;
/*21*/                  bindex = bindex % BUFSIZ;
/*22*/                  this->bbuf_[bindex] = buf[pos];
/*23*/                  ++this->blen_;
/*24*/                  ++pos;
/*25*/              }
/*26*/          }
/*27*/          return pos;
            }
        };

QUESTION SYNCH-3B. What’s another name for this data structure?

It would be wise to work through an example. For example, assume BUFSIZ == 4, and figure out how the following calls would behave.

mutex_pipe mp;
mp.write("Hi", 2);
mp.read(buf, 4);
mp.write("Test", 4);
mp.read(buf, 3);

First let’s reason about this code in the absence of threads.

QUESTION SYNCH-3C. Which of the following changes could, if made in isolation, result in undefined behavior when a mutex pipe was used? Circle all that apply.

  1. Removing line 4
  2. Removing line 5
  3. Removing “|| this->blen_ < BUFSIZ” from line 18
  4. Removing line 21
  5. Removing lines 23 and 24

QUESTION SYNCH-3D. Which of the following changes could, if made in isolation, cause a mutex_pipe::read to return incorrect data (that is, the byte sequence produced by read will not equal the byte sequence passed to write)? Circle all that apply.

  1. Removing line 4
  2. Removing line 5
  3. Removing “|| this->blen_ < BUFSIZ” from line 18
  4. Removing line 21
  5. Removing lines 23 and 24

QUESTION SYNCH-3E. Which of the following changes could, if made in isolation, cause a call to mutex_pipe::write to never return (when a correct implementation would return)? Circle all that apply.

  1. Removing line 4
  2. Removing line 5
  3. Removing “|| this->blen_ < BUFSIZ” from line 18
  4. Removing line 21
  5. Removing lines 23 and 24

QUESTION SYNCH-3F. Write an invariant for a mutex_pipe’s blen_ member. An invariant is a statement about the value of blen_ that is always true. Write your invariant in the form of an assertion; for full credit give the most specific true invariant you can. (“blen_ is a unsigned integer” is unspecific, but true; “blen_ == 4” is specific, but false.)

QUESTION SYNCH-3G. Write an invariant for bpos_. For full credit give the most specific true invariant you can.


In the remaining questions, you will add synchronization objects and operations to make your mutex pipe work in a multithreaded program.

QUESTION SYNCH-3H. Add a std::mutex to the mutex_pipe and use it to protect the mutex pipe from race condition bugs. For full credit, your solution must not deadlock—if one thread is reading from a pipe and another thread is writing to the pipe, then both threads must eventually make progress. Describe all changes required, with reference to specific line numbers.

QUESTION SYNCH-3I. Your solution to the last question likely has poor utilization. For instance, a thread that calls mutex_pipe on an empty mutex pipe will spin forever, rather than block. Introduce one or more condition variables so that read will block until data is available. Write one or more snippets of C code and give line numbers after which the snippets should appear.

SYNCH-4. Race conditions

Most operating systems support process priority levels, where the kernel runs higher-priority processes more frequently than lower-priority processes. A hypothetical Unix-like operating system called “Boonix” has two priority levels, normal and batch. A Boonix parent process changes the priority level of one of its children with this system call:

Note that a process cannot change its own batch status.

You’re writing a Boonix shell that can run commands with batch priority. If c->isbatch is nonzero, then c should run with batch priority, as should its children. Your command::make_child function looks like this:

        void command::make_child() {
/* 1*/      ... // maybe create a pipe
/* 2*/      this->pid = fork();
/* 3*/      if (this->pid == 0) {
/* 4*/          ... // handle pipes and redirections
/* 5*/          (void) execvp(...);
/* 6*/          perror("execvp");
/* 7*/          _exit(EXIT_FAILURE);
/* 8*/      }
/* 9*/      assert(this->pid > 0);
/*10*/      if (this->isbatch) {
/*11*/          setbatch(this->pid);
/*12*/      }
/*13*/      ... // clean up pipes and such
        }

This shell has two race conditions, one more serious.

QUESTION SYNCH-4A. In some cases, a child command will change to batch priority after it starts running. Briefly describe how this can occur.

QUESTION SYNCH-4B. In some cases, a child command, or one of its own forked children, could run forever with normal priority. Briefly describe how this can occur.


In the remaining questions, you will fix these race conditions in three different ways. The first uses a new system call:

QUESTION SYNCH-4C. Use isbatch to prevent both race conditions. Write a snippet of C code and give the line number after which it should appear. You should need one code snippet.

QUESTION SYNCH-4D. Use the pipe system call and friends to prevent both race conditions. Write snippets of C code and give the line numbers after which they should appear. You should need several snippets. Make sure you clean up any extraneous file descriptors before running the command or returning from command::make_child.

QUESTION SYNCH-4E. Why should the pipe solution be preferred to the isbatch solution? A sentence, or the right single word, will suffice.

QUESTION SYNCH-4F. Suggest a change to the setbatch system call’s behavior that could fix both race conditions, and say how to use this new setbatch in command::make_child. Write one or more snippets of C code and give the line numbers after which they should appear.

SYNCH-5. Minimal minimal minimal synchronization synchronization synchronization

Minimalist composer Philip Glass, who prefers everything minimal, proposes the following implementation of condition variables based on mutexes. He’s only implementing wait and notify_one at first.

struct pg_condition_variable {
    std::mutex cvm;

    pg_condition_variable() {
        // start the mutex in LOCKED state!
        this->cvm.lock();
    }

    void wait(std::mutex& m) {
        m.unlock();
        this->cvm.lock();   // will block until a thread calls `notify_one`
        m.lock();
    }

    void notify_one() {
        this->cvm.unlock();
    }
};

Philip wants to use his condition variables to build a bank, where accounts support these operations:

Here’s Philip’s code.

      struct pg_acct {
          unsigned long balance;
          std::mutex m;
          pg_condition_variable cv;

          void deposit(unsigned amt) {
/*D1*/        this->m.lock();
/*D2*/        this->balance += amt;
/*D3*/        this->cv.notify_one();
/*D4*/        this->m.unlock();
          }

          void withdraw(unsigned amt) {
/*W1*/        this->m.lock();
/*W2*/        while (this->balance < amt) {
/*W3*/            this->cv.wait(this->m);
/*W4*/        }
/*W5*/        this->balance -= amt;
/*W6*/        this->m.unlock();
          }
      };

Philip’s friend Pauline Oliveros just shakes her head. “You got serious problems,” she says, pointing at this section of the C++ standard:

The expression m.unlock() shall…have the following semantics:

Requires: The calling thread shall own the mutex.

This means that the when m.unlock() is called, the calling thread must have previously locked the mutex, with no intervening unlocks. The penalty for deviance is undefined behavior.

QUESTION SYNCH-5A. Briefly explain how Philip’s code can trigger undefined behavior.


To fix this problem, Philip changes his condition variable and account to use a new type, fast_mutex, instead of std::mutex. This type is inspired by Linux’s “fast” mutexes. It’s OK to unlock a fast_mutex more than once, and it’s OK to unlock a fast_mutex on a different thread than the thread that locked it.

A fast_mutex has one important member, value, which can be 0 (unlocked) or 1 (locked).

Below, we've begun to write out an execution where Philip’s code is called by two threads. We write the line numbers each thread executes and the values in a after each line. We’ve left lines blank for you to fill in if you need to.

T1 T2 a.balance a.m.value a.cv.cvm.value
Initial state: 5 0 1
a.deposit(10) a.withdraw(12)
after W1 5 1 1
after D1 5 1 1
(T1 blocks on a.m)
 
 
 
 
 
 

QUESTION SYNCH-5B. Assuming T2’s call to withdraw eventually completes, what are the final values for a.balance, a.m.value, and a.cv.cvm.value?

QUESTION SYNCH-5C. In such an execution, which line of code (W1–5) unblocks Thread T1?

QUESTION SYNCH-5D. In such an execution, which, if any, line(s) of code (D1–4 and/or W1–5) set a->cv.cvm.value to zero?


QUESTION SYNCH-5E. For any collection of deposit and withdraw calls, Philip’s code will always ensure that the balance is valid. (There are other problems—a withdraw call might block forever when it shouldn’t—but the balance will be OK.) Why? List all that apply.

  1. Access to balance is protected by a condition variable.
  2. Access to balance is protected by a mutex.
  3. Arithmetic expressions like this->balance += amt; have atomic effect.

SYNCH-6. Weensy threads

This is problem is difficult!!

Betsy Ross is changing her WeensyOS to support threads. There are many ways to implement threads, but Betsy wants to implement threads using the ptable array. “After all,” she says, “a thread is just like a process, except it shares the address space of some other process!”

Betsy has defined a new system call, sys_create_thread, that starts a new thread running a given thread function, with a given argument, and a given initial stack pointer:

typedef void* (*thread_function)(void*);
pid_t sys_create_thread(thread_function f, void* arg, void* stack_ptr);

The system call’s return value is the ID of the new thread, which Betsy thinks should use the process ID space.

Betsy’s kernel contains the following code:

// in syscall()
case SYSCALL_FORK:
    return handle_fork(current);

case SYSCALL_CREATE_THREAD:
    return handle_create_thread(current);


uint64_t handle_fork(proc* p) {
    // Find a free process; return `nullptr` if all out
    proc* np = find_free_process();
    if (!np) {
        return -1;
    }

    // Copy the input page table and allocate new pages using `vmiter`
    np->pagetable = copy_pagetable(p->pagetable);
    if (!np->pagetable) {
        return -1;
    }

    // Finish up
    np->regs = p->regs;
    np->regs.reg_rax = 0;
    np->state = P_RUNNABLE;
    return np->pid;
}

uint64_t handle_create_thread(proc* p) {
    // Whoops! Got a revolution to run, back later
    return -1;
}

QUESTION SYNCH-6A. Complete her handle_create_thread implementation. Assume for now that the thread function never exits, and don’t worry about reference counting issues (for page tables, for instance). You may use the helper functions shown above, including find_free_process and copy_pagetable, if you need them; or you may use any functions or objects from the WeensyOS handout code, including vmiter and kalloc.

Recall that system call arguments are passed according to the x86-64 calling convention: first argument in %rdi, second in %rsi, third in %rdx, etc.

QUESTION SYNCH-6B. Betsy’s friend Prince Dimitri Galitzin thinks Betsy should give processes even more flexibility. He suggests that the sys_create_thread system call should take a full set of registers (as x86_64_registers*), rather than just a new instruction pointer and a new stack pointer. That way, the creating thread can supply all registers to the new thread. But Betsy points out that this design would allow a thread to violate kernel isolation by providing carefully-planned register values for x86_64_registers.

Which x86-64 registers could be used in Dimitri’s design to violate kernel isolation? List all that apply.

  1. reg_rax, which contains the thread’s %rax register.
  2. reg_rip, which contains the thread’s instruction pointer.
  3. reg_cs, which contains the thread’s privilege level, which is 3 for unprivileged.
  4. reg_rflags, which contains the EFLAGS_IF flag, which indicates that the thread runs with interrupts enabled.
  5. reg_rsp, which contains the thread’s stack pointer.

Now Betsy wants to handle thread exit. She introduces two new system calls, sys_exit_thread and sys_join_thread:

void sys_exit_thread(void* exit_value);
void* sys_join_thread(pid_t thread);

sys_exit_thread causes the calling thread to exit with the given exit value. sys_join_thread behaves like pthread_join or waitpid. If thread corresponds is a thread of the same process, and thread has exited, sys_join_thread cleans up the thread and returns its exit value; otherwise, sys_join_thread returns (void*) -1.

(The exit_value feature differs from C++ threads, which don’t have exit values.)

QUESTION SYNCH-6C. Is the sys_join_thread specification blocking or polling?

Betsy makes the following changes to WeensyOS internal structures to support thread exit.

  1. She adds a void* p_exit_value member to struct proc.
  2. She adds a new process state, P_EXITED, that corresponds to exited threads.

QUESTION SYNCH-6D. Complete the case for SYSCALL_EXIT_THREAD. (Don’t worry about the last thread in a process; you may assume it always calls sys_exit rather than sys_exit_thread.)

case SYSCALL_EXIT_THREAD:

QUESTION SYNCH-6E. Complete the following helper function.

// Test whether `test_pid` is the PID of a thread in the same process as `p`.
// Return 1 if it is; return 0 if `test_pid` is an illegal PID, it corresponds to
// a freed process, or it corresponds to a thread in a different process.
int is_thread_in(pid_t test_pid, proc* p) {

QUESTION SYNCH-6F. Complete the case for SYSCALL_JOIN_THREAD in syscall(). Remember that a thread may be successfully joined at most once: after it is joined, its PID is made available for reallocation.

case SYSCALL_JOIN_THREAD:

QUESTION SYNCH-6G. Advanced extra credit. In Weensy threads, if a thread returns from its thread function, it will execute random code, depending on what random garbage was stored in its initial stack in the return address position. But Betsy thinks she can implement better behavior entirely at user level, where the value returned from the thread function will automatically be passed to sys_thread_exit. She wants to make two changes:

  1. She’ll write a two- or three-instruction function called thread_exit_vector.
  2. Her create_thread library function will write a single 8-byte value to the thread’s new stack before calling sys_create_thread.

Explain how this will work. What instructions will thread_exit_vector contain? What 8-byte value will create_thread write to the thread’s new stack? And where will that value be written relative to sys_create_thread’s stack_ptr argument?

SYNCH-7. Fair synchronization

C++ standard mutexes are unfair: some threads might succeed in locking the mutex more often than others. For example, a simple experiment on Linux shows that if threads repeatedly try to lock the same mutex, some threads lock the mutex 1.13x more often than others. (Other kinds of lock are even less fair: some threads can lock a spinlock 3.91x more often than others!)

QUESTION SYNCH-7A. What is the name of the problem that would occur if one particular thread never locked the mutex, even though other threads locked and unlocked the mutex infinitely often?

To avoid unfairness, threads must take turns. One fair mutex implementation is called a ticket lock; this (incorrect, unsynchronized) code shows the basic idea.

struct ticket_mutex {
    unsigned now = 0;   // “now serving”
    unsigned next = 0;  // “next ticket”

    void lock() {
        unsigned t = this->next;    // mark this thread’s place in line
        ++this->next;               // next thread gets new place
        while (this->now != t) {    // wait for my turn
        }
    }

    void unlock() {
        ++this->now;
    }
};

QUESTION SYNCH-7B. Describe an instance of undefined behavior that could occur if multiple threads called ticket_mutex::lock on the same ticket mutex at the same time.

QUESTION SYNCH-7C. Fix lock and unlock using C++ atomics. Alternately, for partial credit, say which regions of code must be executed atomically.


The ticket lock implementation above uses polling. That will perform well if critical sections are short, but blocking is preferable if critical sections are long. Here’s a different ticket lock implementation:

      struct ticket_mutex {
/*T1*/    unsigned now;
/*T2*/    unsigned next;
/*T3*/    std::mutex m;

          void lock() {
/*L1*/        this->m.lock();
/*L2*/        unsigned t = this->next++;
/*L3*/        while (this->now != t) {
/*L4*/            this->m.unlock();
/*L5*/            sched_yield();
/*L6*/            this->m.lock();
/*L7*/        }
/*L8*/        this->m.unlock();
          }

          void unlock() {
/*U1*/        this->m.lock();
/*U2*/        ++this->now;
/*U3*/        this->m.unlock();
          }
      };

This ticket lock implementation uses std::mutex, which blocks, but the implementation itself uses polling.

QUESTION SYNCH-7D. Which line or lines of code mark this implementation as using polling?

QUESTION SYNCH-7E. Change the implementation to truly block. Include line numbers indicating where your code will go.

QUESTION SYNCH-7F. Most solutions to part E wake up blocked threads more than is strictly necessary. The ideal number of blocking calls is one: each thread should block at most once and wake up only when its turn comes. But the simplest correct solution will wake up each blocked thread a number of times proportional to the number of blocked threads.

Change your solution so that when there are 4 or fewer threads, every thread wakes up only when its turn comes. (Your solution must, of course, work correctly for any number of threads.) If your solution already works this way, you need not do anything here.

SYNCH-8. Futex

In class, we implemented a mutual-exclusion lock using atomics like this:

       class mutex {
/* P1*/    std::atomic<int> value_ = 0;
/* P2*/    void lock() {
/* P3*/        while (value_.swap(1) == 1) {
/* P4*/        }
/* P5*/    }
/* P6*/    void unlock() {
/* P7*/        value_.store(0);
/* P8*/    }
       };

Recall that atomic<T>::swap(T x) atomically swaps this atomic’s value with x and returns the old value.

QUESTION SYNCH-8A. Which line of code marks this as a polling mutex, or spinlock, rather than a blocking mutex? Give the best single-line answer.

QUESTION SYNCH-8B. Which of the following statements are true, assuming that the surrounding program uses mutex correctly? List all that apply.

  1. The mutex is locked if the atomic variable value_ holds value 1.
  2. It is OK to replace the loop condition in line P3 with “value_.load() == 1”.
  3. It is OK to replace the loop condition in line P3 with “value_.swap(1) >= 0”.
  4. It is OK to replace line P7 with “value_.swap(0);”.
  5. None of the above.

In the rest of this part, you will build a blocking mutex (like the real std::mutex): a call to lock should block (without using CPU) until the mutex is unlocked. This requires operating system support.

QUESTION SYNCH-8C. Gō Mifune proposes to add the following system calls:

He uses them as follows:

       class mutex {
/* X1*/    std::atomic<int> value_ = 0;
/* X2*/    void lock() {
/* X3*/        while (value_.swap(1) == 1) {
/* X4*/            xblock();
/* X5*/        }
/* X6*/    }
/* X7*/    void unlock() {
/* X8*/        value_.store(0);
/* X9*/        xwake();
/*X10*/    }
       };

But this implementation suffers from a sleep–wakeup race condition. Describe how this can occur with two threads by listing lines of code executed by the threads, starting from the following and ending with T2 blocked forever and the mutex unlocked.

Initially T1 has locked the mutex.
T2: X2
T1: X7

QUESTION SYNCH-8D. Julia Evans suggests instead using system calls inspired by Linux’s futex.

She uses them as follows:

       class mutex {
/* F1*/    std::atomic<int> value_ = 0;
/* F2*/    void lock() {
/* F3*/        while (value_.swap(1) == 1) {
/* F4*/            futex_wait(value_, 1);
/* F5*/        }
/* F6*/    }
/* F7*/    void unlock() {
/* F8*/        value_.store(0);
/* F9*/        futex_wake(value_);
/*F10*/    }
       };

In what ways does Julia’s solution improve over Gō’s implementation? List all that apply.

  1. It properly ensures mutual exclusion for each mutex (and Gō’s does not).
  2. It is free of sleep–wakeup race conditions.
  3. The futex_wait() call on line F4 can suffer from fewer mistaken wakeups than the xblock() call on line X4. (A “mistaken wakeup” occurs when a thread wakes up only to immediately block again.)
  4. In the best case, a lock/unlock cycle requires zero system calls.
  5. None of the above.

SYNCH-9. System call implementation

QUESTION SYNCH-9A. Which registers hold (1) the system call number, (2) the first argument, and (3) the return value for WeensyOS system calls? (Refer to kernel.cc or process.hh if unsure.)


Julia Evans decides to implement the futex-like system calls from the previous problem on her thread-enabled version of WeensyOS. Again:

Note that the x arguments are passed as if they were pointers.

QUESTION SYNCH-9B. List all that apply.

  1. The kernel should validate that the x arguments are multiples of 4.
  2. The kernel should validate that the x arguments are valid addresses that point to user-readable memory.
  3. The kernel should validate that the expected argument is a multiple of 4.
  4. The kernel should validate that the expected argument is a valid address that points to user-readable memory.
  5. None of the above.

QUESTION SYNCH-9C. Here’s a badly-broken initial implementation of the futex_wait system call in WeensyOS.

       case SYSCALL_FUTEX_WAIT: {
/* W1*/    uintptr_t x = current->regs.{some register};
/* W2*/    uintptr_t expected = current->regs.{some register};
/* W3*/    ... return -1 if x and/or expected are invalid ...
/* W4*/
/* W5*/    std::atomic<int>* x_ptr = (std::atomic<int>*) x;
/* W6*/    if (x_ptr->load() != expected) {
/* W7*/        return -1;
/* W8*/    } else {
/* W9*/        while (true) {
/*W10*/        }
/*W11*/        return 0;
/*W12*/    }
       }

What problems does this code have? List all that apply.

  1. std::atomic methods are implemented using system calls, so std::atomic cannot be used in the kernel.
  2. The x_ptr address computed on line W5 is a physical address, so it cannot be dereferenced.
  3. The x_ptr address computed on line W5 came from a different address space, so dereferencing it may access the wrong memory.
  4. The return statements on lines W7 and W11 return incorrect values.
  5. If lines W9–10 ever execute, the kernel will hang and make no further progress.
  6. None of the above.

Here’s a correct implementation of the futex_wake system call in WeensyOS.

       struct proc { ...
           uintptr_t waitaddr;
       }; ...

       case SYSCALL_FUTEX_WAKE: {
/* K1*/    uintptr_t x = current->regs.{some register};
/* K2*/    ... return -1 if x is invalid ...
/* K3*/
/* K4*/    uintptr_t wakeaddr = vmiter(current, x).pa();
/* K5*/    int n = 0;
/* K6*/    for (pid_t pid = 1; pid != NPROC; ++pid) {
/* K7*/        if (procs[pid].waitaddr == wakeaddr && procs[pid].state == P_BLOCKED) {
/* K8*/            procs[pid].state = P_RUNNABLE;
/* K9*/            procs[pid].waitaddr = 0;
/*K10*/            procs[pid].regs.reg_rax = 0;
/*K11*/            ++n;
/*K12*/        }
/*K13*/    }
/*K14*/    return n;
       }

QUESTION SYNCH-9D. Does proc::waitaddr contain physical or virtual addresses?

QUESTION SYNCH-9E. Which line or lines of code unblock waiting processes?

QUESTION SYNCH-9F. Which line or lines of code determine system call return values?

QUESTION SYNCH-9G. Complete this implementation of futex_wait corresponding to this implementation of futex_wake. You will use P_BLOCKED (a process state for blocked processes) and schedule() (a function that runs some runnable process).

case SYSCALL_FUTEX_WAIT: {
    uintptr_t x = current->regs.{some register};
    uintptr_t expected = current->regs.{some register};
    ... return -1 if x and/or expected is invalid ...

SYNCH-10. Barrier synchronization

This question concerns a synchronization object called a barrier. Barriers are useful for phased computations. N threads work in parallel; as each thread completes its work, it “arrives” at the barrier, then blocks. When the Nth thread arrives, all N threads are released and continue on to the next phase.

Barriers have two operations:

QUESTION SYNCH-10A. A one-use barrier can only be used once—after the N threads arrive, the barrier object cannot be used again (any further calls to arrive_and_wait cause undefined behavior). Use atomic variables to implement one_use_barrier::arrive_and_wait(). Your function should poll (it need not block).

struct one_use_barrier {
    size_t n_;
    std::atomic<size_t> k_ = 0;

    one_use_barrier(size_t n) { this->n_ = n; }

    void arrive_and_wait() {

QUESTION SYNCH-10B. Most solutions allow multiple threads to simultaneously access one_use_barrier::n_, which is not an atomic variable. Why is this OK?

QUESTION SYNCH-10C. What are the problems with polling synchronization operations? List all that apply.

  1. Polling violates mutual exclusion.
  2. Polling causes high CPU usage, leaving less CPU time available for other work.
  3. Polling causes deadlock.
  4. None of the above.

QUESTION SYNCH-10D. Complete the following blocking one-use barrier.

struct blocking_one_use_barrier {
    size_t n_;
    size_t k_ = 0;
    std::mutex m_;
    std::condition_variable_any cv_;

    blocking_one_use_barrier(size_t n) { this->n_ = n; }

    void arrive_and_wait() {

QUESTION SYNCH-10E. Multi-use barriers can be used multiple times: once all N threads unblock, the barrier can be used again by the same N threads. This is surprisingly tricky to get right, but a common solution involves a boolean sense variable. Complete this multi-use barrier using atomics and polling.

struct multi_use_barrier {
    size_t n_;
    std::atomic<size_t> k_ = 0;
    std::atomic<bool> sense_ = false;

    multi_use_barrier(size_t n) { this->n_ = n; }

    void arrive_and_wait() {
        bool my_sense = this->sense_.load();

SYNCH-11. Pool synchronization

QUESTION SYNCH-11A. List the true statements.

  1. When a process exits, all of its child processes also exit.
  2. When a thread completes, all other threads in the process also complete.
  3. When a process exits, all of its threads exit whether or not they are complete.
  4. When a process exits via exit, all standard I/O buffers are automatically flushed.
  5. When a process exits via _exit, all standard I/O buffers are automatically flushed.
  6. None of the above.

QUESTION SYNCH-11B. Which of the following functions and system calls can block? List all that apply.

  1. fork
  2. getpid
  3. read
  4. waitpid with the WNOHANG flag
  5. std::thread constructor
  6. std::thread::join
  7. std::thread::detach
  8. std::mutex::lock
  9. std::mutex::unlock
  10. std::condition_variable::wait
  11. std::condition_variable::notify_all
  12. std::atomic increment
  13. None of the above

QUESTION SYNCH-11C. Which system call has behavior closest to that of std::thread``::join? Select one.

  1. fork
  2. read
  3. getpid
  4. kill
  5. waitpid with the WNOHANG flag
  6. waitpid without the WNOHANG flag

The next three questions concern this program.

std::atomic<int> x = 0; // global variable

void thread1() {
    x = 1;
}

void thread2() {
    x = 2;
}

int main() {
    std::thread t1(thread1);   /* 1 */
    std::thread t2(thread2);   /* 2 */
    t1.join();                 /* 3 */
    t2.join();                 /* 4 */
    printf("%d\n", x.load());
}

QUESTION SYNCH-11D. Does this program invoke undefined behavior, yes or no?

QUESTION SYNCH-11E. What are all the possible values the program can print (on those runs where it does not invoke undefined behavior)?

QUESTION SYNCH-11F. Order the numbered lines so that the program would only ever print 2. (Your answer should be the numbers 1–4 in some order.)

QUESTION SYNCH-11G. This problem models water polo players as threads. Each thread runs the player function and represents one of the players on a team. Each player thread:

  1. Waits until there are fewer than 5 players in the pool, then gets in the pool.
  2. Plays the game until a substitution is called.
  3. Gets out of the pool and goes back to step 1.

Your task is to complete this code. There must never be more than 5 players in the pool, but up to 5 players should be able to play simultaneously. There must be no undefined behavior or deadlock, and for full credit, players on the sidelines should block rather than poll. (No polling in the pool.)

struct pool {
    // (1) Your code here
}

void player(pool* p) {
    while (true) {
        // Block until there are fewer than 5 players in the pool, then get in.
        // (2) Your code here

        // Play.
        while (!take_substitution()) {
            play();
        }

        // Get out of the pool.
        // (3) Your code here
    }
}

SYNCH-12. Promises

Promises are objects that help manage asynchronous and parallel computations, making them easier to program. In this section, you’ll implement a promise system specialized for stringfuncs, which are functions that take a single std::string argument and return a std::string as result.

QUESTION SYNCH-12A. First, complete the run_async function using async_helper.

typedef std::string (*stringfunc)(std::string);

static void async_helper(stringfunc f, std::string arg, std::string* result) {
    *result = f(arg);
}

// run_async(f, arg)
//    Run `f(arg)` on a separate thread. Block this thread until `f(arg)` completes,
//    then return the result of `f(arg)`.
std::string run_async(stringfunc f, std::string arg) {
    // Your code here. You will use `async_helper` and `std::thread`.
}

QUESTION SYNCH-12B. Promises behave like run_async, but the stringfunc’s return value is cached. For instance:

promise61* p = make_promise(encrypt, input); // creates a new thread running `encrypt(input)`
std::string s = p->get();        // waits for that thread to complete, then returns the result of `encrypt(input)`
std::string s2 = p->get();       // returns the same cached result
assert(s == s2);                 // will always succeed

Complete the following promise system. (Don’t worry about memory leaks.)

promise61* make_promise(stringfunc f, std::string arg) {
    promise61* p = new promise61;
    // (1) Your code here
    return p;
}

struct promise61 {
    std::thread t_;
    std::string result_;

    std::string get() {
        this->resolve();
        return this->result_;
    }

    void resolve() {
        if (this->t_.joinable()) {
            // (2) Your code here
        }
    }
};

Hints:

QUESTION SYNCH-12C. Promises can be chained using a then method. This automatically starts a new computation when a previous promise completes. For example:

promise61* result_p = make_promise(encrypt, input)->then(encode);
std::string result = result_p->get();
  1. First, make_promise(encrypt, input) will start a thread to compute encrypt(input).
  2. When that thread completes, producing result1 = encrypt(input), a new thread is started to compute encode(result1). This is the chained promise: then returns a new promise that starts after the previous promise resolves, and uses the previous promise’s result as its argument.
  3. Thus, the final result of result_p->get() will equal the composed computation encode(encrypt(input)).

This extension of Part C implements chained promises. Complete it.

promise61* make_promise(stringfunc f, std::string arg) {
    promise61* p = new promise61;
    // (1) Your code here (likely the same as in part B)
    return p;
}

struct promise61 {
    std::thread t_;
    std::string result_;
    promise61* prev_ = nullptr;  // only used for chained promises
    stringfunc f_ = nullptr;     // only used for chained promises

    promise61* then(stringfunc f) {
        promise61* next = new promise61;
        next->prev_ = this;
        next->f_ = f;
        return next;
    }

    std::string get() {
        this->resolve();
        return this->result_;
    }

    void resolve() {
        // (2) Your code here (likely different from part B)
    }
}

QUESTION SYNCH-12D (EXTRA CREDIT). Great implementations of promises execute chains as soon as possible. For example:

promise61* p = make_promise(encrypt, input)->then(encode)->then(add_signature);
expensive_computation();         // all 3 promise threads might complete while this runs!
std::string result = p->get();   // might return cached value right away!

But your Part C implementation only starts chained work when get() is called. (In the above example, only encrypt(input) will run in parallel with expensive_computation(); the promises for encode and add_signature will not even start running until p->get() is called.)

Update your implementation so that each promise in a chain begins execution automatically when the previous promise completes. This is quite difficult and you should leave it for last. Write pseudocode first, or describe in prose how a solution would work.