
2024 CS 61 Final


Notes

Assume a Linux operating system running on the x86-64 architecture unless otherwise stated.

If you get stuck, move on to the next question. When unsure, briefly explain your thinking for potential partial credit.


1. Our governing body (15 points)

The Harvard Corporation has attempted to complete pset 6. Their implementation has a lot of problems, but they refuse to admit they’ve done anything wrong. Help prove that they have problems by attacking their implementation.

QUESTION 1A. Phase 1 of the problem set advises students to implement the io61_try_lock, io61_lock, and io61_unlock functions “using a single std::recursive_mutex per file.” The Harvard Corporation likes to go its own way, and instead implemented their own coarse-grained lock using atomics. Here are the relevant excerpts of their code:

struct io61_file { ...
    std::atomic<int> lock;
};

int io61_try_lock(io61_file* f, off_t off, off_t len, int locktype) {
    if (len == 0
        || f->lock.exchange(1) == 0) {
        return 0;
    }
    return -1;
}

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    while (io61_try_lock(f, off, len, locktype) != 0) {
    }
    return 0;
}

int io61_unlock(io61_file* f, off_t off, off_t len) {
    if (len != 0) {
        f->lock = 0;
    }
    return 0;
}

List the true statements.

  1. This implementation has undefined behavior caused by data races on the f->lock field.
  2. This implementation is blocking (rather than polling).
  3. This implementation correctly provides mutual exclusion (i.e., while one thread has acquired an exclusive range lock using io61_lock(f, off, len, LOCK_EX), no other thread can acquire an overlapping range lock on the same file).
  4. The f->lock field implements a recursive mutex.
  5. None of the above.

2pt 3 only.

  1. False: there are no data races on atomics.
  2. False: there are no blocking calls in the code (operations on atomics do not block).
  3. True!
  4. False: f->lock is only ever equal to 1 or 0; recursive mutexes must count the number of times a lock is recursively held.
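For contrast, here is a minimal sketch of the approach the handout advises, using one std::recursive_mutex per file. The member name mu is an assumption, not part of the Corporation's code:

struct io61_file { ...
    std::recursive_mutex mu;   // hypothetical name for the per-file lock
};

int io61_try_lock(io61_file* f, off_t off, off_t len, int locktype) {
    if (len == 0 || f->mu.try_lock()) {
        return 0;
    }
    return -1;
}

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    if (len != 0) {
        f->mu.lock();          // blocks rather than polls
    }
    return 0;
}

int io61_unlock(io61_file* f, off_t off, off_t len) {
    if (len != 0) {
        f->mu.unlock();
    }
    return 0;
}

Because the mutex is recursive, a thread that already holds the lock can acquire it again, which avoids the self-deadlock explored in the next question.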

QUESTION 1B. Phase 1 is tested using the ftxxfer program. When the Harvard Corporation runs ./ftxxfer on their code, they observe deterministic behavior—the exact same result every time, regardless of scheduling order, random number choice, or number of threads. What result do they see, and why? Explain briefly.

2pt They see deadlock on every transfer attempt: line ftxxfer.cc:30 will wait forever. This is because the ftxxfer program locks two accounts. The second lock attempt at ftxxfer.cc:30 attempts to acquire f->lock, the exclusive non-recursive lock that was already acquired on line ftxxfer.cc:29. No transfers are successfully performed.
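For illustration, the two acquisitions at those lines plausibly look something like this (the offsets and names here are assumptions, not the actual ftxxfer source):

io61_lock(f, acct1_off, acct_len, LOCK_EX);  // ftxxfer.cc:29: sets f->lock = 1
io61_lock(f, acct2_off, acct_len, LOCK_EX);  // ftxxfer.cc:30: spins forever, since f->lock is already 1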

QUESTION 1C. The Corporation decides to forge ahead to Phase 2, fine-grained locking. They want to implement fine-grained locking with an array of atomic lock bytes, one lock byte per byte of file data.

struct io61_file { ...
    off_t size;                         // size of file
    std::atomic<unsigned char>* locks;  // array of `size` lock bytes
};

int io61_try_lock(io61_file* f, off_t off, off_t len, int locktype) {
    for (off_t x = off; x < off + len; ++x) {
        if (f->locks[x].exchange(1) == 1) {
            return -1;
        }
    }
    return 0;
}

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    /* ... same as above ... */
}

int io61_unlock(io61_file* f, off_t off, off_t len) {
    for (off_t x = off; x < off + len; ++x) {
        f->locks[x] = 0;
    }
    return 0;
}

List the true statements; assume that f->size and f->locks are initialized correctly.

  1. This implementation uses fine-grained locking.
  2. This implementation behaves correctly when len == 0.
  3. This implementation has the same behavior on ./ftxxfer as the previous implementation.
  4. An application can cause undefined behavior by passing unexpected arguments to io61_try_lock. (Expected arguments always obey off >= 0, off <= f->size, len >= 0, and off + len <= f->size.)
  5. None of the above.

3pt 1, 2, 4.

  1. True: Locking is fine-grained because each byte of the file has its own lock.
  2. True: When len == 0, both io61_try_lock and io61_unlock always succeed without modifying memory, which is the expected behavior.
  3. False: No deadlock! With one lock byte per byte of file data, ftxxfer’s two non-overlapping account ranges can both be locked, so the self-deadlock from Question 1B disappears.
  4. True: If one passes off > f->size, the functions will access memory out of bounds of the f->locks array.

QUESTION 1D. The Corporation’s io61_try_lock can behave incorrectly even when passed expected arguments (off >= 0 && off <= f->size && len >= 0 && off + len <= f->size). Correct the code, or, for partial credit, describe what’s wrong with the Corporation’s version.

3pt If the attempt to acquire a lock fails, then io61_try_lock doesn’t clean up the partially-acquired lock. This leaves some bytes in the file locked inappropriately, even though io61_try_lock returned -1. This fixes it:

int io61_try_lock(io61_file* f, off_t off, off_t len, int locktype) {
    for (off_t x = off; x < off + len; ++x) {
        if (f->locks[x].exchange(1) == 1) {
            for (off_t y = off; y < x; ++y) {
                f->locks[y] = 0;
            }
            return -1;
        }
    }
    return 0;
}

QUESTION 1E. In some cases, even the unfixed io61_try_lock from Question 1C always behaves correctly. Write an assertion involving off, len, and f that defines those cases. For full credit your assertion should be as broad as possible.

1 ec pt

assert(off >= 0 && off <= f->size && len >= 0 && len <= 1 && off + len <= f->size);

The most important part of this assertion is that len <= 1: the code always works for 0- and 1-byte lock regions, but doesn’t always work for any larger lock regions.

QUESTION 1F. The Corporation moves on to phase 3, blocking. It’s gotten this far (not far):

std::mutex file_mutex;

struct io61_file { ...
    off_t size;                         // size of file
    std::atomic<unsigned char>* locks;  // array of `size` lock bytes
    std::condition_variable_any cv;
};

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    while (io61_try_lock(f, off, len, locktype) != 0) {
    }
    return 0;
}

void io61_unlock(io61_file* f, off_t off, off_t len) {
    for (off_t x = off; x < off + len; ++x) {
        f->locks[x] = 0;
    }
    f->cv.notify_all();
}

Change the Corporation’s io61_lock to correctly block. Do not add any synchronization objects. You may assume that all previous problems with io61_try_lock have been fixed.

3pt Here are two potential solutions, each using std::condition_variable_any::wait on the file_mutex. Note that the condition variable cannot wait on one of the f->locks: these are atomic integers, not full std::mutex locks.

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    file_mutex.lock();
    while (io61_try_lock(f, off, len, locktype) != 0) {
        f->cv.wait(file_mutex);
    }
    file_mutex.unlock();
    return 0;
}

Or

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    std::unique_lock guard(file_mutex);
    while (io61_try_lock(f, off, len, locktype) != 0) {
        f->cv.wait(guard);
    }
    return 0;
}

QUESTION 1G. Briefly describe one way in which the Corporation’s synchronization plan uses fine-grained locking, and one way in which it uses coarse-grained locking.

2pt The Corporation uses coarse-grained locking for file_mutex: all calls to io61_lock synchronize on that one global mutex, regardless of file. It uses fine-grained locking to protect individual byte ranges within each file (one atomic lock byte per byte of data).

2. Server synchronization (10 points)

A web server is a process that accepts connections from clients, reads data from those connections, performs computations on that data, and sends responses back on those connections. This isn’t unlike the weensy database we saw in the synchronization unit. A dirt-simple server might look like this:

int main() {
    while (true) {
        int fd = accept_incoming_request();  // accept a new connection, return its socket file descriptor
        handle_client(fd);                   // read data, write response
        close(fd);
    }
    return 0;
}

In this code, both accept_incoming_request() and handle_client() can block.

QUESTION 2A. How many concurrent connections can the dirt-simple server handle at a time?

1pt One! Each incoming request is processed to completion before the next connection is accepted.

QUESTION 2B. Change the main loop to handle each connection in a new child process. This is the multi-process model commonly seen in early web servers. We’ve started the code for you; you may assume that fork() always succeeds.

int main() {
    while (true) {
        int fd = accept_incoming_request();

        // YOUR CODE HERE, including at least one `fork()`

        // clean up zombies
        int status;
        while (waitpid(-1, &status, WNOHANG) > 0) {
        }
    }
}

2pt

pid_t p = fork();
if (p == 0) {
    handle_client(fd);
    close(fd);  // Optional (`_exit` automatically closes fd)
    _exit(0);   // Or `return 0` (since we are in main)
}
close(fd);      // Required in parent!

The important things are to (1) close(fd) in the parent (otherwise the parent will leak file descriptors, and eventually new connections will fail), and (2) _exit in the child.

QUESTION 2C. Change the main loop to handle each connection in a separate thread. This is the multi-threaded model also often seen in early web servers. We’ve started the code for you.

Hint: Remember that in some situations it is illegal for a std::thread object to go out of scope!

void handle_client_and_close(int fd) {
    handle_client(fd);
    close(fd);
}

int main() {
    while (true) {
        int fd = accept_incoming_request();
        // YOUR CODE HERE
    }
}

2pt

std::thread th(handle_client_and_close, fd);
th.detach();

Note that joining the thread with th.join() would return to the behavior in Question 2A, where the server can process only one connection at a time.

QUESTION 2D. True or false? An advantage of the multi-process model is that bugs in handle_client will only crash the child process handling a single connection. In contrast, in the multi-threaded model, a bug in handle_client will often crash the whole server.

1pt True

QUESTION 2E. True or false? An advantage of the multi-threaded model is that different connections can easily share state. In contrast, if the weensy database used a simple multi-process model, then changes made by one client would be hidden from all other clients.

1pt True

QUESTION 2F. A simple multi-threaded server can create an unbounded number of threads, which is dangerous. In this version, we create a thread pool instead. The main thread creates 30 worker threads and passes incoming connections off to these threads; at most 30 connections can be handled concurrently.

int incoming_fd = -1;

void worker() {
    while (true) {
        // poll for connection, then claim it
        int fd = incoming_fd;
        while (fd < 0) {
            fd = incoming_fd;
        }
        incoming_fd = -1;

        // handle claimed connection
        handle_client(fd);
        close(fd);
    }
}

int main() {
    for (int i = 0; i != 30; ++i) {
        std::thread th(worker);
        th.detach();
    }

    while (true) {
        incoming_fd = accept_incoming_request();
        // wait until some worker claims the connection
        while (incoming_fd >= 0) {
        }
    }
}

Does this code violate the Fundamental Law of Synchronization (that is, does it have any undefined behavior due to data races)? Explain briefly.

1pt Yes. Multiple threads read and write the shared incoming_fd variable without synchronization.

QUESTION 2G. Describe how to add correct synchronization to this thread-pool server. List specific synchronization objects you’d add, and explain briefly how they should be used in worker and main.

2pt I’d add a std::mutex and std::condition_variable_any to protect incoming_fd. worker would lock the mutex before accessing incoming_fd and use cv.wait() to block until an incoming_fd was available. main would lock the mutex after accept_incoming_request() returned, publish the new fd, and then use cv.wait() to block until incoming_fd was claimed by some worker. Each kind of thread would notify the other.

All we asked for was an explanation; nevertheless, here’s some code.

int incoming_fd = -1;
std::mutex m;
std::condition_variable_any cv;

void worker() {
    while (true) {
        // wait for connection, then claim it
        std::unique_lock guard(m);
        int fd = incoming_fd;
        while (fd < 0) {
            cv.wait(guard);
            fd = incoming_fd;
        }
        incoming_fd = -1;
        cv.notify_all();
        guard.unlock();

        // handle connection
        handle_client(fd);
        close(fd);
    }
}

int main() {
    // create thread pool
    for (int i = 0; i != 30; ++i) {
        std::thread th(worker);
        th.detach();
    }

    while (true) {
        // accept new connection
        int new_fd = accept_incoming_request();

        // post new connection
        std::unique_lock guard(m);
        incoming_fd = new_fd;
        cv.notify_one(); // or notify_all()

        // wait until a worker has claimed the connection
        while (incoming_fd >= 0) {
            cv.wait(guard);
        }
    }
}

This will feature many spurious wakeups per connection, since the worker thread’s cv.notify_all() will wake up other waiting workers as well as the main thread. A solution with two condition variables can avoid this problem:

int incoming_fd = -1;
std::mutex m;
std::condition_variable_any worker_cv;
std::condition_variable_any main_cv;

void worker() {
    while (true) {
        // wait for connection, then claim it
        std::unique_lock guard(m);
        int fd = incoming_fd;
        while (fd < 0) {
            worker_cv.wait(guard);
            fd = incoming_fd;
        }
        incoming_fd = -1;
        main_cv.notify_all();
        guard.unlock();

        // handle connection
        handle_client(fd);
        close(fd);
    }
}

int main() {
    // create thread pool
    for (int i = 0; i != 30; ++i) {
        std::thread th(worker);
        th.detach();
    }

    while (true) {
        // accept new connection
        int new_fd = accept_incoming_request();

        // post new connection
        std::unique_lock guard(m);
        incoming_fd = new_fd;
        worker_cv.notify_one(); // or notify_all()

        // wait until a worker has claimed the connection
        while (incoming_fd >= 0) {
            main_cv.wait(guard);
        }
    }
}

3. System calls (10 points)

QUESTION 3A. Which of the following system calls are likely to block? List all that apply.

  1. read on a pipe whose pipe buffer is empty
  2. write to a pipe whose pipe buffer is full
  3. fork when there are no available process slots
  4. waitpid when the current process has no children yet
  5. None of the above

2pt 1 and 2 only. fork and waitpid return errors in the given situations.

QUESTION 3B. Which of the following statements about execv are true? List all that apply.

  1. It must always be paired with fork
  2. It does not return unless it encounters an error
  3. It replaces the current process’s program image with a new program
  4. It closes the current process’s file descriptors except for standard input, standard output, and standard error
  5. None of the above

2pt 2 and 3. 1 is false; execv is almost always combined with fork, but it can be used independently. 4 is false; it leaves all file descriptors open (except those explicitly marked as “close-on-exec”).

QUESTION 3C. When will the return value of a read system call equal 0? List all that apply.

  1. When the byte of data at the file position equals '\0'
  2. When the file descriptor is the read end of a pipe and either the pipe buffer is empty or the write end of the pipe is closed
  3. When the file descriptor is the read end of a pipe and both the pipe buffer is empty and the write end of the pipe is closed
  4. When the file descriptor is the write end of a pipe
  5. When the read system call is interrupted by a signal before it can read any data
  6. None of the above

1pt Only 3.

  1. False: read returns the number of bytes read, not the value of a byte.
  2. False: When the pipe buffer is empty but the write end of the pipe is still open, read blocks rather than returning 0.
  3. True!
  4. False: read will return an error in this situation (EBADF, since the write end of a pipe is not open for reading).
  5. False: read will return an error in this situation (EINTR).
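For concreteness, here is a small standalone program (not from the exam) demonstrating statement 3: once the write end is closed and the buffer has been drained, read returns 0.

#include <unistd.h>
#include <cassert>

int main() {
    int pfd[2];
    int r = pipe(pfd);
    assert(r == 0);
    ssize_t w = write(pfd[1], "hi", 2);
    assert(w == 2);
    close(pfd[1]);                       // close the write end

    char buf[4];
    ssize_t n = read(pfd[0], buf, sizeof(buf));
    assert(n == 2);                      // drains the pipe buffer
    n = read(pfd[0], buf, sizeof(buf));
    assert(n == 0);                      // empty buffer + closed write end: EOF
}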

QUESTION 3D. In class, we explored some racer programs whose goal is to wait until a child process exits or a timeout has elapsed, whichever comes first. The racer-block program attempted to use system call interruption to solve this problem. Here’s a version of its code, compressed to omit error checking.

    static volatile sig_atomic_t got_signal = 0;
    void signal_handler(...) {
S1:     got_signal = 1;
    }

    int main() {
M1:     set_signal_handler(SIGCHLD, signal_handler);
M2:     pid_t p = fork();
M3:     if (p == 0) { /* child process code */ }
M4:     usleep(timeout_in_microseconds);
M5:     if (elapsed_time_in_microseconds() >= timeout_in_microseconds) {
M6:         printf("SLOW\n");   /* child did not exit before timeout */
M7:     } else {
M8:         printf("quick\n");  /* child exited before timeout */
M9:     }
    }

Which line contains a system call that the programmer expects to be interrupted whenever a child process exits before the timeout?

1pt Line M4.

QUESTION 3E. In what circumstance might this program print an incorrect result (SLOW when the child exited meaningfully before the timeout, or quick when the child exited meaningfully after the timeout)? Explain briefly, referring to specific lines of code.

2pt If the child exits right away, the parent process might receive a signal between line M2 and line M4. That will mean the usleep system call begins after the child has exited. usleep will not be interrupted by the signal, and will run to completion; the program will print SLOW instead of quick.

QUESTION 3F. The racer race condition can be solved by designing new system calls. In this question, we add system calls that can temporarily mask signals, causing them to be delayed until a future time. The system calls are:

  • sigmask(int signum)
    Mask signal signum. If the process is sent this signal, the kernel records the signal as pending, but does not interrupt the process.

  • sigunmask(int signum)
    Unmask signal signum and immediately deliver any pending signum to the process.

  • pusleep(useconds_t timeout)
    Unmask all signals and sleep for timeout. This happens effectively atomically, with no sleep–wakeup race conditions, so if the process has a pending signal when pusleep is called, the kernel immediately delivers the pending signal and pusleep returns with error code EINTR. The original set of masked signals is restored before return.

Use these system calls to fix the race condition in the racer-block code above, or explain briefly how this could be done.

2pt Replace line M4 with:

sigmask(SIGCHLD);
if (!got_signal) {
    pusleep(timeout_in_microseconds);
}
sigunmask(SIGCHLD);

This would work too:

sigmask(SIGCHLD);
pid_t wp = waitpid(p, &status, WNOHANG);
if (wp < 0) {
    pusleep(timeout_in_microseconds);
}
sigunmask(SIGCHLD);

The important part is to mask the SIGCHLD signal before calling pusleep.

4. Testing a shell (15 points)

Your shell problem set was tested with a check.pl script that runs shell commands and examines their output (including both standard output and standard error). In this problem you will reason about some of check.pl’s checks.

QUESTION 4A. Check REDIR17 and its expected output are:

  • Command: echo > out.txt the redirection can occur anywhere && cat out.txt
  • Expected output: the redirection can occur anywhere

Match the actual outputs, on the left, with the implementations that might generate those outputs, on the right. You might use an implementation more than once or not at all.

(When a feature is “not understood”, that means the implementation treats the feature’s tokens like normal command arguments; for instance, an implementation that does not understand conditionals would treat && tokens like normal arguments. The actual outputs translate newlines into spaces, which is what check.pl does.)

    1. (empty)
    2. > out.txt the redirection can occur anywhere && cat out.txt
    3. > out.txt the redirection can occur anywhere cat: out.txt: No such file or directory
    4. > out.txt the redirection can occur anywhere
    A. Conditionals incorrectly implemented, redirection correctly implemented
    B. Conditionals incorrectly implemented, redirection not understood
    C. Conditionals correctly implemented, redirection not understood
    D. Conditionals and redirection not understood
    E. None of the above

2pt 1—A, 2—D, 3—C, 4—B

If redirection is not understood, then the echo command will print > out.txt the redirection can occur anywhere (including the redirection character). So implementations B–D cannot match output 1.

  1. If there’s no output, then redirection seems to have worked well enough: the file out.txt contains the redirection can occur anywhere. However, the command that reads out.txt, cat out.txt, might not execute if conditionals are incorrectly implemented. The best match is A.
  2. Since the && token was printed like it was normal, the best match is D (conditionals not understood).
  3. This output contains text printed by cat, so the shell tried to run cat. The best match is C.
  4. This output has no evidence of any attempt to run cat. The best match is B.

QUESTION 4B. Check PIPE5 runs the command yes | head -n 5.

Match the actual outputs, on the left, with the implementations that might generate those outputs, on the right. You might use an implementation more than once or not at all.

    1. y y y y y y y y … forever
    2. yes: invalid option -- 'n' Try 'yes --help' for more information.
    3. y y y y y, but shell hangs rather than printing another prompt
    4. y y y y y
    A. Pipes correctly implemented
    B. Pipes not understood
    C. Pipes incorrectly hooked up to command standard input and/or output
    D. Pipes correctly hooked up to command standard input and/or output, but parent shell process does not implement pipe hygiene
    E. None of the above

2pt 1—C, 2—B, 3—D, 4—A

The expected output is y y y y y: the first five lines of yes’s output, which is an infinite stream of ys. So output 4 matches with implementation A.

If pipes aren’t understood, then yes will be given arguments like yes "|" "head" "-n" "5", and it might complain that it doesn’t understand option -n. So output 2 matches with implementation B.

If pipes are correctly hooked up, yes normally exits with head: once head exits, yes tries to write to a pipe with no readers, which raises a SIGPIPE signal that by default causes yes to exit. But if the parent shell does not implement pipe hygiene (it never closes its copy of the pipe’s read end), the pipe always has a reader, so yes never receives SIGPIPE and eventually blocks forever on a full pipe buffer. So output 3 matches with implementation D.

Finally, output 1 looks like yes is writing its standard output directly to the terminal, rather than to head. This matches with implementation C.
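To make the hygiene requirement concrete, here is a hedged sketch of how a shell might set up yes | head -n 5 (a minimal standalone program, not sh61’s actual structure):

#include <unistd.h>
#include <sys/wait.h>

int main() {
    int pfd[2];
    pipe(pfd);

    pid_t p1 = fork();
    if (p1 == 0) {                       // child 1 runs `yes`
        dup2(pfd[1], STDOUT_FILENO);
        close(pfd[0]);
        close(pfd[1]);
        execlp("yes", "yes", (char*) nullptr);
        _exit(1);
    }

    pid_t p2 = fork();
    if (p2 == 0) {                       // child 2 runs `head -n 5`
        dup2(pfd[0], STDIN_FILENO);
        close(pfd[0]);
        close(pfd[1]);
        execlp("head", "head", "-n", "5", (char*) nullptr);
        _exit(1);
    }

    // Pipe hygiene: the parent closes both ends. If it kept pfd[0] open,
    // the pipe would always have a reader, so `yes` would never receive
    // SIGPIPE and would block forever, as in output 3.
    close(pfd[0]);
    close(pfd[1]);
    waitpid(p1, nullptr, 0);
    waitpid(p2, nullptr, 0);
}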

QUESTION 4C. Check BG7 runs this command:

sh -c "sleep 0.1; /bin/echo Second" & sh -c "sleep 0.05; /bin/echo First" & sleep 0.15

List the true statements about this command.

  1. sleep 0.15 is executed in the foreground.
  2. sleep 0.15 ensures that there is enough time for both First and Second to be printed before the parent sh61 returns to the prompt.
  3. sh -c "sleep 0.1; /bin/echo Second" will finish executing before sh -c "sleep 0.05; /bin/echo First" begins executing.
  4. sleep 0.05 will finish executing before /bin/echo First begins executing.
  5. This command line should take a minimum of 0.30 seconds to execute.
  6. None of the above.

2pt 1, 2, 4

  1. True: The last command in this command line has no & background marker.
  2. True: Assuming the background processes start instantaneously, the first one will run for about 0.1 seconds and the second one (in parallel) for about 0.05 seconds.
  3. False: The two background commands will start almost simultaneously.
  4. True: The ; sequence connector ensures sleep 0.05 finishes before /bin/echo First begins.
  5. False: The command line will take about 0.15 seconds.

QUESTION 4D. How many different sh61 descendant processes (children, children of children, etc.) will be created as sh61 executes check BG7? Explain briefly.

3pt A good answer is nine.

  • A background sh61 to manage sh -c "sleep 0.1; /bin/echo Second"
    • A child process to run sh -c "sleep 0.1; /bin/echo Second"
      • A grandchild process to run sleep 0.1
      • A grandchild process to run /bin/echo Second
  • A background sh61 to manage sh -c "sleep 0.05; /bin/echo First"
    • A child process to run sh -c "sleep 0.05; /bin/echo First"
      • A grandchild process to run sleep 0.05
      • A grandchild process to run /bin/echo First
  • A child process to run sleep 0.15

Other answers are possible. For instance, an optimized shell might not fork a background sh61 for the two background conditionals, because those background conditionals are simple (they execute a single command). That would leave 7 descendant processes.

QUESTION 4E. This code is from a student’s incorrect implementation of background commands.

// This function runs the conditional `sec` to completion.
void run_conditional(shell_parser sec);

void run_list(shell_parser sec) {
    shell_parser condsec = sec.first_conditional();
    while (condsec) {
        if (condsec.op() != TYPE_BACKGROUND) {
            run_conditional(condsec);
        } else {
            pid_t p = fork();
            assert(p >= 0);
            if (p == 0) {
                run_conditional(condsec);
                _exit(0);
            }
            [[maybe_unused]] int status;
            waitpid(p, &status, 0);
        }
        condsec.next_conditional();
    }
}

What would the associated shell print on check BG7, and how long will it take to print it? Explain briefly.

2pt It will print Second (at about 0.1 seconds) and then First (at about 0.15 seconds), and the whole command line will take about 0.3 seconds, because this shell waits for background conditionals as well as foreground conditionals.

QUESTION 4F. How could the code in the previous question be fixed?

2pt Remove the waitpid. The comment indicates that run_conditional already contains the relevant waitpid (since it “runs the conditional sec to completion”).
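Concretely, the background branch of the loop becomes (unchanged lines omitted):

} else {
    pid_t p = fork();
    assert(p >= 0);
    if (p == 0) {
        run_conditional(condsec);
        _exit(0);
    }
    // No waitpid: the parent moves on immediately, so the
    // background conditional runs in parallel with the rest.
}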

QUESTION 4G. This similar code has a different problem.

// This function runs the conditional `sec` to completion.
void run_conditional(shell_parser sec);

void run_list(shell_parser sec) {
    shell_parser condsec = sec.first_conditional();
    while (condsec) {
        if (condsec.op() != TYPE_BACKGROUND) {
            run_conditional(condsec);
        } else {
            pid_t p = fork();
            assert(p >= 0);
            if (p == 0) {
                run_conditional(condsec);
            }
        }
        condsec.next_conditional();
    }
}

What will this code print on check BG7? Explain briefly.

2pt Something like First First Second or First Second First, followed by lots of shell prompts. Since the subshells created for background commands don’t exit, they proceed through the rest of the command line and then return to (a forked copy of) the main shell loop.

5. Storage patterns (10 points)

QUESTION 5A. In the third lecture in Unit 1, we saw some strange performance anomalies when sorting a list of integers via insertion sort using an intsort program. What are the best explanations for the anomalies we saw? List all that apply.

  1. The system calls required to allocate memory are inherently expensive.
  2. Processor caches handle sequential memory accesses more efficiently than random memory accesses.
  3. Dynamically-allocated objects that are allocated in rapid sequence tend to receive addresses that are nearby in virtual address space.
  4. Insertion sort is an inefficient sorting algorithm.
  5. None of the above.

2pt 2, 3

  1. False. Memory allocation is pretty inexpensive (compared, say, to disk reads); furthermore, the process requests a lot of memory from the OS at once (a batching optimization), so there won’t be many system calls.
  2. True.
  3. True.
  4. True, but not an explanation for the anomalies we saw.

QUESTION 5B. In which situations might a stdio cache be flushed? List all that apply.

  1. When a new write request would overflow the configured cache size.
  2. When the underlying file data changed because a different process wrote it.
  3. When a seek request moves the file position outside the range covered by the current cache.
  4. When the corresponding file is closed.
  5. Upon encountering Bélády’s anomaly.
  6. None of the above.

2pt 1, 3, 4

2 is false because, as we saw, the stdio cache is not coherent: it doesn’t detect underlying changes. 5 is false because Bélády’s anomaly has nothing to do with why a cache might flush.

QUESTION 5C. The remaining questions ask you to analyze strace output and characterize the behavior of the programs that produced those traces. Your answer should refer to the following reference strings; an answer might say “A for reads, B for writes” or “A then B for reads, no writes”.

Reference strings:

  • A: [0–4095], [4096–8191], [8192–12287], … (4096-byte blocks, sequential)
  • B: 0, 1, 2, 3, 4, … (1-byte blocks, sequential)
  • C: 0, 4096, … (1-byte blocks, every 4096th byte)
  • D: …, 4, 3, 2, 1, 0 (1-byte blocks, reverse sequential)
  • None: (empty)

Trace 1:

...
read(0, "C", 1)                         = 1
fstat(0, {st_mode=S_IFREG|0644, st_size=8188, ...}) = 0
lseek(0, 4096, SEEK_SET)                = 4096
read(0, "a", 1)                         = 1
lseek(0, 0, SEEK_SET)                   = 0
read(0, "CS 61 is an introduction to the fu"..., 4096) = 4096
read(0, "atisfies the Programming 2 and Sys"..., 4096) = 4092
...
  • Which reference strings (A–D or None) best describe the reads in this trace?
  • Which reference strings best describe the writes?
  • Could this program be using a typical stdio cache for reads and/or writes?

2pt C then A for reads; no writes; stdio cache unlikely for reads.

The trace first reads a single byte, then skips ahead to offset 4096 and reads another byte. The skipping matches reference string C.

After seeking back to position 0, the trace tries to read two full blocks of data. This matches reference string A.

Stdio caching is unlikely because stdio would read a full block, not a byte, during the initial C portion of the trace.

QUESTION 5D. Trace 2:

...
lseek(0, 8187, SEEK_SET)                = 8187
read(0, "\n", 1)                        = 1
write(1, "\n", 1)                       = 1
lseek(0, 8186, SEEK_SET)                = 8186
read(0, "t", 1)                         = 1
write(1, "t", 1)                        = 1
lseek(0, 8185, SEEK_SET)                = 8185
read(0, "c", 1)                         = 1
write(1, "c", 1)                        = 1
lseek(0, 8184, SEEK_SET)                = 8184
read(0, "\205", 1)                      = 1
write(1, "\205", 1)                     = 1
lseek(0, 8183, SEEK_SET)                = 8183
read(0, ";", 1)                         = 1
write(1, ";", 1)                        = 1
lseek(0, 8182, SEEK_SET)                = 8182
read(0, "k", 1)                         = 1
write(1, "k", 1)                        = 1
lseek(0, 8181, SEEK_SET)                = 8181
read(0, "e", 1)                         = 1
write(1, "e", 1)                        = 1
...
  • Which reference strings (A–D or None) best describe the reads in this trace?
  • Which reference strings best describe the writes?
  • Could this program be using a typical stdio cache for reads and/or writes?

2pt D for reads, B for writes, stdio cache unlikely for either.

The reads in this trace (on file descriptor 0) use offsets that count down from 8187 and read a byte at a time. This matches string D. The writes do not jump around, but occur a byte at a time. This matches string B. Caching is unlikely because all accesses use bytes; stdio would use 4096-byte blocks.
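For reference, a byte-at-a-time reverse cat written without stdio would produce a trace like this one. A minimal sketch (a guess at the traced program, not its actual source):

#include <unistd.h>
#include <sys/stat.h>

int main() {
    struct stat st;
    fstat(STDIN_FILENO, &st);
    for (off_t pos = st.st_size - 1; pos >= 0; --pos) {
        char c;
        lseek(STDIN_FILENO, pos, SEEK_SET);
        read(STDIN_FILENO, &c, 1);       // reads follow string D
        write(STDOUT_FILENO, &c, 1);     // writes follow string B
    }
}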

QUESTION 5E. Trace 3:

...
read(0, "CS 61 is an introduction to the fu"..., 4096) = 4096
read(0, "may be used as one of the four CS "..., 4096) = 4094
write(1, "CS 61 is an introduction to the fu"..., 4096) = 4096
read(0, "", 4096)                       = 0
write(1, "may be used as one of the four CS "..., 4094) = 4094
  • Which reference strings (A–D or None) best describe the reads in this trace?
  • Which reference strings best describe the writes?
  • Could this program be using a typical stdio cache for reads and/or writes?

2pt A for reads and writes, stdio cache possible for both.

This is straightforward. Note that the application-level reference string could also be B for both reads and writes if a stdio cache were in use: the trace would then show the cache’s 4096-byte refills and flushes rather than the application’s byte-at-a-time requests.

6. Kernel (10 points)

QUESTION 6A. List the true statements about user processes on x86-64 operating systems.

  1. A user process cannot read or write memory containing kernel code or data unless the kernel explicitly allows it.
  2. When a user process executes an invalid instruction, such as dividing by zero, the kernel receives control through an exception and can decide to terminate the process.
  3. The %cs register’s lowest 2 bits determine the privilege level of a process, and user processes operate at the highest privilege level ((%cs & 3) == 0).
  4. A user process can execute the syscall instruction to safely raise the processor’s privilege level and transfer control to the kernel.
  5. User processes can configure the computer’s timer interrupt frequency to optimize their own CPU usage.
  6. If a user process enters an infinite loop, timer interrupts ensure that the kernel regains control and can schedule other processes.
  7. None of the above.

2pt 1, 2, 4, 6

3 is false because user processes operate at the lowest privilege level: ((%cs & 3) == 3). 5 is false because user processes cannot fully configure the timer interrupt frequency.

QUESTION 6B. List the true statements about system calls on WeensyOS.

  1. System calls have a different calling convention than normal function calls.
  2. The syscall instruction transfers control to the kernel, which starts executing in privileged mode at the virtual address defined by the %rdx register.
  3. User processes pass arguments to the kernel’s system call implementation by loading those arguments into registers before executing the syscall instruction.
  4. The kernel can return results from a system call by loading them into specific registers in current->regs.
  5. The kernel can return results from a system call by executing a return statement from its syscall function.
  6. None of the above.

2pt 1, 3, 4, 5

2 is false: it would be unsafe to let user processes determine where to start the kernel; the kernel picks up at a preconfigured address it set during boot.

QUESTION 6C. The next three questions present partially-obscured address-space loops using vmiter objects. The original loops were taken from correct and complete code on the WeensyOS problem set.

vmiter it1(/* some page table */, 0);
vmiter it2(/* some page table */, 0);

for (; it1.va() < PROC_START_ADDR && it2.va() < PROC_START_ADDR; it1 += PAGESIZE, it2 += PAGESIZE) {
    int r = it2.try_map(it1.pa(), it1.perm());
    if (r < 0) {
        /* do something sensible */
    }
}

List the true statements about this loop.

  1. A loop like this could be used in kernel_start to create kernel_pagetable, an identity-mapped page table.
  2. A loop like this could be used in process_setup to initialize some or all of an initial process address space.
  3. A loop like this could be used in a SYSCALL_FORK implementation to initialize some or all of a child process’s address space.
  4. A loop like this could be used in a SYSCALL_EXIT implementation to free process memory.
  5. Within the loop body, it1.va() == it2.va().
  6. None of the above.

2pt 2, 3, 5. In both process_setup and SYSCALL_FORK, a process’s page table should match kernel_pagetable up to PROC_START_ADDR.

1 is false because kernel_pagetable is the first meaningful page table the kernel creates: there’s no other page table to copy from. 4 is false because SYSCALL_EXIT involves one page table, not two.

QUESTION 6D.

vmiter it(/* some page table */, 0);

for (; it.va() < MEMSIZE_PHYSICAL; it += PAGESIZE) {
    int perm;
    if (it.va() == 0) {
        perm = 0;
    } else if (it.va() < PROC_START_ADDR && it.va() != CONSOLE_ADDR) {
        perm = PTE_P | PTE_W;  
    } else {
        perm = PTE_P | PTE_W | PTE_U;
    }
    int r = it.try_map(it.va(), perm);
    if (r < 0) {
        /* do something sensible */
    }
}

List the true statements about this loop.

  1. A loop like this could be used in kernel_start to create kernel_pagetable, an identity-mapped page table.
  2. A loop like this could be used in process_setup to initialize some or all of an initial process address space.
  3. A loop like this could be used in a SYSCALL_FORK implementation to initialize some or all of a child process’s address space.
  4. A loop like this could be used in a SYSCALL_EXIT implementation to free process memory.
  5. Within the loop body, it.va() < MEMSIZE_VIRTUAL.
  6. None of the above.

2pt The best answer is 1, 5. In process page tables, user-accessible mappings are reference counted, so it would be weird to set arbitrary virtual addresses to user-accessible. 5 is true because MEMSIZE_PHYSICAL < MEMSIZE_VIRTUAL.

QUESTION 6E.

vmiter it1(/* some page table */, 0);
vmiter it2(/* some page table */, 0);

for (; it1.va() < MEMSIZE_VIRTUAL && it2.va() < MEMSIZE_VIRTUAL; it1 += PAGESIZE, it2 += PAGESIZE) {
    if (!it1.user() || !it1.writable() || it1.va() == CONSOLE_ADDR) {
        /* do something sensible */
    } else {
        /* do something sensible */
    }
}

List the true statements about this loop.

  1. A loop like this could be used in kernel_start to create kernel_pagetable, an identity-mapped page table.
  2. A loop like this could be used in process_setup to initialize some or all of an initial process address space.
  3. A loop like this could be used in a SYSCALL_FORK implementation to initialize some or all of a child process’s address space.
  4. A loop like this could be used in a SYSCALL_EXIT implementation to free process memory.
  5. None of the above.

2pt The best answer is 3: the loop is copying mappings from one page table to another. Two cases are required because inaccessible or read-only pages should be shared, but user-writable pages must have their contents copied. Although a loop somewhat like this could be used in SYSCALL_EXIT, it would only need one vmiter.
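For reference, here is a hedged sketch of how the two cases might be filled in for SYSCALL_FORK. It assumes WeensyOS’s kalloc allocator and the vmiter members present() and kptr() in addition to those shown above; error handling is omitted:

for (; it1.va() < MEMSIZE_VIRTUAL && it2.va() < MEMSIZE_VIRTUAL; it1 += PAGESIZE, it2 += PAGESIZE) {
    if (!it1.user() || !it1.writable() || it1.va() == CONSOLE_ADDR) {
        // Share the mapping: parent and child refer to the same physical page
        if (it1.present()) {
            it2.try_map(it1.pa(), it1.perm());
        }
    } else {
        // Copy the page: user-writable data must be private to the child
        void* pg = kalloc(PAGESIZE);
        memcpy(pg, it1.kptr(), PAGESIZE);
        it2.try_map((uintptr_t) pg, it1.perm());
    }
}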

7. Assembly (10 points)

Intel is trying to sell the next generation of its Fartium Core Trio processor. On this still-terrible processor, the only instructions that work are the ones on this list:

  1. addq %rsi, %rax
  2. cmpq %rsi, %rdi
  3. decq %rsi
  4. incq %rax
  5. je L1
  6. jl L2
  7. jmp L3
  8. leaq (%rax,%rdi,2), %rsi
  9. movl (%rdi,%rax,4), %edi
  10. retq
  11. xchgq %rax, %rdi
  12. xorq %rax, %rax

The instructions, registers, and arguments must match exactly. For example, addq %rsi, %rax works fine, but addq %rdi, %rax will explode the processor.

QUESTION 7A. Which Fartium Core Trio instructions examine the condition flags (i.e., behave differently based on condition flags)? List all that apply (by number or name) or say “none.”

2pt je, jl

QUESTION 7B. Which Fartium Core Trio instructions read primary memory (not including memory used to store instructions)? List all that apply or say “none.”

2pt movl (retq also does, but it’s OK to leave it off)

QUESTION 7C. In the remaining questions, your job is to correctly implement C++ functions using only the instructions on this list. Your implementations must have the same semantics as the C++ functions, but may perform much worse than one might expect. Your response can be a list of instruction numbers (“2, 11, 9, 10”) or names (“cmpq; xchgq; movl; retq”), intermixed with labels L1–L3 as necessary. Assume that the Fartium Core Trio uses the normal x86-64 calling convention.

int return_one() {
    return 1;
}

2pt

xorq %rax, %rax
incq %rax
retq

QUESTION 7D.

long return_minus_one() {
    return -1;
}

2pt

xorq %rax, %rax
xchgq %rax, %rdi
xorq %rax, %rax
leaq (%rax,%rdi,2), %rsi    # %rsi is now 0
decq %rsi                   # %rsi is now -1
addq %rsi, %rax             # %rax is now 0 + -1 = -1
retq

QUESTION 7E.

long max(long a, long b) {
    if (a > b) {
        return a;
    } else {
        return b;
    }
}

2pt Here’s a good answer:

     cmpq %rsi, %rdi
     jl L2
     # if we get here, `%rdi - %rsi >= 0`, so we should return %rdi
     xchgq %rax, %rdi
     retq
L2:  xorq %rax, %rax
     addq %rsi, %rax
     retq

Or you can use a loop to increase %rdi until it is no less than %rsi!

L3:  cmpq %rsi, %rdi
     jl L2
     xchgq %rax, %rdi
     retq
L2:  xchgq %rax, %rdi
     incq %rax
     xchgq %rax, %rdi
     jmp L3

8. Data representation (15 points)

QUESTION 8A. Given this code:

int array[10000];
int* p1 = &array[1];

Provide initializations of a new variable, p2, so that p2 - p1 has the given values according to the C++ abstract machine, or say that doing so is impossible. Your answer will comprise five initializations, one per value.

  1. p2 - p1 == 0
  2. p2 - p1 == 3
  3. p2 - p1 == -3
  4. p2 - p1 causes undefined behavior
  5. p2 - p1 returns a value dependent on the processor architecture, but does not cause undefined behavior

3pt

  1. int* p2 = p1;
  2. int* p2 = &array[4];
  3. Impossible (array[-2] does not exist)
  4. int p2[1]; (p2 and p1 point into different objects, so subtracting them is undefined behavior)
  5. Impossible (the C++ abstract machine makes this expression either well-defined or undefined behavior; its result never depends on the processor architecture)

QUESTION 8B. Which of the marked printf lines in the following code contain violations of the live object rule? List all that apply or say “none.”

#include <cstdio>
int globalArray[5];

int* return_pointer_to_parameter(int x) {
    return &x;
}

int main() {
    int* ptr = &globalArray[5];
  A:  printf("%d\n", *ptr);
  B:  printf("%p\n", ptr);

    ptr = return_pointer_to_parameter(10);
  C:  printf("%d\n", *ptr);

    ptr = new int(12);
  D:  printf("%d\n", *ptr);

    delete ptr;
  E:  printf("%p\n", ptr);

    ptr = nullptr;
  F:  printf("%d\n", *ptr);

    return 0;
}

2pt A, C, and F.

The initialization of ptr produces a pointer “one past the end” of the global array. This pointer doesn’t point to an object, so dereferencing it at A violates the rule. Printing it, as at B, is fine, though.

At C, we dereference a pointer to an object that has gone out of scope.

At D, we print an uninitialized integer. This is bad news, but not a violation of the live object rule—the pointer points to a live object.

At E, we print the pointer value of a pointer that was just deallocated. Again, unwise, but not a violation of the live object rule, because we don’t dereference the pointer.

At F, we dereference a null pointer: classic live object rule violation.

QUESTION 8C. The next three questions concern an implementation of pset 1 that allocates memory from a single default_buffer of size 256 bytes.

Write a m61_malloc call that should always fail on this implementation.

1pt m61_malloc(257); This call tries to allocate more memory than there is available.

QUESTION 8D. Write two or more m61_malloc calls where at least one should always fail on this implementation, even though the total user-accessible memory requested is less than 256 bytes.

2pt m61_malloc(1); m61_malloc(254); The second allocation should fail because of alignment: returned pointers must be 16-byte aligned, so the first allocation effectively consumes 16 bytes of buffer, and the 254-byte allocation would then need the buffer to extend to offset 16 + 254 = 270, more than the 256 bytes available.

QUESTION 8E. Write a sequence of m61_malloc and m61_free calls where at least one of the m61_malloc calls should fail unless the implementation correctly coalesces freed allocations.

2pt

auto p1 = m61_malloc(100);
auto p2 = m61_malloc(100);
m61_free(p1);
m61_free(p2);
m61_malloc(200);

The final allocation can succeed only if the allocator coalesces the two adjacent freed regions into one; neither 100-byte region alone can satisfy the 200-byte request.

QUESTION 8F. The UTF-8/Ethics lecture presented four design ideas for Unicode encodings. They were:

  1. Idea #1: Unary encoding
  2. Idea #2: Byte pairs
  3. Idea #3: Byte triples
  4. Idea #4: Self-synchronizing byte quadruplets

Which of these encodings would cause clear economic harm to some group if it were the only available text encoding, and which would cause clear allocative harm? Explain briefly.

2pt Idea #2 causes clear allocative harm: characters above 0x1FFF cannot be represented, which includes most Chinese characters; entire groups cannot represent their language in computer-accessible text. Idea #1 causes clear economic harm, but not necessarily allocative harm: every character can be represented, but possibly using an enormous number of bytes. Though ideas #3 and #4 cause slight economic harm relative to UTF-8 for groups whose languages use characters above 0x7F, it’s nothing compared to idea #1.

QUESTION 8G. The UTF-8 encoding is defined as follows:

  • Encode C ≤ 0x7F in the single byte with value ⟨C⟩
  • Encode 0x80 ≤ C ≤ 0x7FF in two bytes:
    • Let C0 equal bits 0–5 of C, and C1 equal bits 6–10
    • Use bytes ⟨0xC0 + C1⟩ ⟨0x80 + C0⟩
  • Encode 0x800 ≤ C ≤ 0xFFFF in three bytes:
    • Let C0 equal bits 0–5 of C, C1 bits 6–11, and C2 bits 12–15
    • Use bytes ⟨0xE0 + C2⟩ ⟨0x80 + C1⟩ ⟨0x80 + C0⟩
  • Encode 0x10000 ≤ C ≤ 0x10FFFF in four bytes:
    • Let C0 equal bits 0–5 of C, C1 bits 6–11, C2 bits 12–17, and C3 bits 18–20
    • Use bytes ⟨0xF0 + C3⟩ ⟨0x80 + C2⟩ ⟨0x80 + C1⟩ ⟨0x80 + C0⟩

Does the UTF-8 encoding resemble big-endian or little-endian integer representation?

1pt It’s more like big endian, because the most significant bits appear in bytes with lower addresses.

QUESTION 8H. This function intends to encode a Unicode character u into UTF-8 in the character buffer buf. Complete the missing case so that the function works for any u between 0 and 0xFFFF.

// Returns the number of bytes encoded
int encode_utf8(unsigned u, unsigned char* buf) {
    if (u <= 0x7F) {
        buf[0] = u;
        return 1;
    } else if (u <= 0xFFFF) {
        // YOUR CODE HERE
    } else {
        assert(false);
    }
}

2pt

if (u <= 0x7FF) {
    buf[0] = 0xC0 + (u >> 6);
    buf[1] = 0x80 + (u & 0x3F);
    return 2;
} else {
    buf[0] = 0xE0 + (u >> 12);
    buf[1] = 0x80 + ((u >> 6) & 0x3F);
    buf[2] = 0x80 + (u & 0x3F);
    return 3;
}
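A quick sanity check of the completed function: U+20AC (the euro sign) must encode as the three bytes E2 82 AC.

unsigned char buf[4];
int n = encode_utf8(0x20AC, buf);
assert(n == 3 && buf[0] == 0xE2 && buf[1] == 0x82 && buf[2] == 0xAC);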

9. The end (5 points)

QUESTION 9A. Is there any TF or fellow student you’d like to shout out? Any answer but no answer will receive full credit.

5pt Everyone is awesome