
2024 CS 61 Final


Notes

Assume a Linux operating system running on the x86-64 architecture unless otherwise stated.

If you get stuck, move on to the next question. When unsure, briefly explain your thinking for potential partial credit.


1. Our governing body (15 points)

The Harvard Corporation has attempted to complete pset 6. Their implementation has a lot of problems, but they refuse to admit they’ve done anything wrong. Help prove that they have problems by attacking their implementation.

QUESTION 1A. Phase 1 of the problem set advises students to implement the io61_try_lock, io61_lock, and io61_unlock functions “using a single std::recursive_mutex per file.” The Harvard Corporation likes to go its own way, and instead implemented their own coarse-grained lock using atomics. Here are the relevant excerpts of their code:

struct io61_file { ...
    std::atomic<int> lock;
};

int io61_try_lock(io61_file* f, off_t off, off_t len, int locktype) {
    if (len == 0
        || f->lock.exchange(1) == 0) {
        return 0;
    }
    return -1;
}

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    while (io61_try_lock(f, off, len, locktype) != 0) {
    }
    return 0;
}

int io61_unlock(io61_file* f, off_t off, off_t len) {
    if (len != 0) {
        f->lock = 0;
    }
    return 0;
}

List the true statements.

  1. This implementation has undefined behavior caused by data races on the f->lock field.
  2. This implementation is blocking (rather than polling).
  3. This implementation correctly provides mutual exclusion (i.e., while one thread has acquired an exclusive range lock using io61_lock(f, off, len, LOCK_EX), no other thread can acquire an overlapping range lock on the same file).
  4. The f->lock field implements a recursive mutex.
  5. None of the above.

2pt 3 only.

  1. False: there are no data races on atomics.
  2. False: there are no blocking calls in the code (operations on atomics do not block).
  3. True!
  4. False: f->lock is only ever equal to 1 or 0; recursive mutexes must count the number of times a lock is recursively held.
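For contrast, here is a minimal sketch of the approach the handout advises, using one std::recursive_mutex per file. The member name mu is an assumption, not part of the Corporation's code:

struct io61_file { ...
    std::recursive_mutex mu;   // hypothetical name for the per-file lock
};

int io61_try_lock(io61_file* f, off_t off, off_t len, int locktype) {
    if (len == 0 || f->mu.try_lock()) {
        return 0;
    }
    return -1;
}

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    if (len != 0) {
        f->mu.lock();          // blocks rather than polls
    }
    return 0;
}

int io61_unlock(io61_file* f, off_t off, off_t len) {
    if (len != 0) {
        f->mu.unlock();
    }
    return 0;
}

Because the mutex is recursive, a thread that already holds the lock can acquire it again, which avoids the self-deadlock explored in the next question.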

QUESTION 1B. Phase 1 is tested using the ftxxfer program. When the Harvard Corporation runs ./ftxxfer on their code, they observe deterministic behavior—the exact same result every time, regardless of scheduling order, random number choice, or number of threads. What result do they see, and why? Explain briefly.

2pt They see deadlock on every transfer attempt: line ftxxfer.cc:30 will wait forever. This is because the ftxxfer program locks two accounts. The second lock attempt at ftxxfer.cc:30 attempts to acquire f->lock, the exclusive non-recursive lock that was already acquired on line ftxxfer.cc:29. No transfers are successfully performed.
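For illustration, the two acquisitions at those lines plausibly look something like this (the offsets and names here are assumptions, not the actual ftxxfer source):

io61_lock(f, acct1_off, acct_len, LOCK_EX);  // ftxxfer.cc:29: sets f->lock = 1
io61_lock(f, acct2_off, acct_len, LOCK_EX);  // ftxxfer.cc:30: spins forever, since f->lock is already 1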

QUESTION 1C. The Corporation decides to forge ahead to Phase 2, fine-grained locking. They want to implement fine-grained locking with an array of atomic lock bytes, one lock byte per byte of file data.

struct io61_file { ...
    off_t size;                         // size of file
    std::atomic<unsigned char>* locks;  // array of `size` lock bytes
};

int io61_try_lock(io61_file* f, off_t off, off_t len, int locktype) {
    for (off_t x = off; x < off + len; ++x) {
        if (f->locks[x].exchange(1) == 1) {
            return -1;
        }
    }
    return 0;
}

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    /* ... same as above ... */
}

int io61_unlock(io61_file* f, off_t off, off_t len) {
    for (off_t x = off; x < off + len; ++x) {
        f->locks[x] = 0;
    }
    return 0;
}

List the true statements; assume that f->size and f->locks are initialized correctly.

  1. This implementation uses fine-grained locking.
  2. This implementation behaves correctly when len == 0.
  3. This implementation has the same behavior on ./ftxxfer as the previous implementation.
  4. An application can cause undefined behavior by passing unexpected arguments to io61_try_lock. (Expected arguments always obey off >= 0, off <= f->size, len >= 0, and off + len <= f->size.)
  5. None of the above.

3pt 1, 2, 4.

  1. True: Locking is fine-grained because each byte of the file has its own lock.
  2. True: When len == 0, both io61_try_lock and io61_unlock always succeed without modifying memory, which is the expected behavior.
  3. False: No deadlock! With one lock byte per byte of file data, ftxxfer’s two non-overlapping account ranges can both be locked, so the self-deadlock from Question 1B disappears.
  4. True: If one passes off > f->size, the functions will access memory out of bounds of the f->locks array.

QUESTION 1D. The Corporation’s io61_try_lock can behave incorrectly even when passed expected arguments (off >= 0 && off <= f->size && len >= 0 && off + len <= f->size). Correct the code, or, for partial credit, describe what’s wrong with the Corporation’s version.

3pt If the attempt to acquire a lock fails, then io61_try_lock doesn’t clean up the partially-acquired lock. This leaves some bytes in the file locked inappropriately, even though io61_try_lock returned -1. This fixes it:

int io61_try_lock(io61_file* f, off_t off, off_t len, int locktype) {
    for (off_t x = off; x < off + len; ++x) {
        if (f->locks[x].exchange(1) == 1) {
            for (off_t y = off; y < x; ++y) {
                f->locks[y] = 0;
            }
            return -1;
        }
    }
    return 0;
}

QUESTION 1E. In some cases, even the unfixed io61_try_lock from Question 1C always behaves correctly. Write an assertion involving off, len, and f that defines those cases. For full credit your assertion should be as broad as possible.

1 ec pt

assert(off >= 0 && off <= f->size && len >= 0 && len <= 1 && off + len <= f->size);

The most important part of this assertion is that len <= 1: the code always works for 0- and 1-byte lock regions, but doesn’t always work for any larger lock regions.

QUESTION 1F. The Corporation moves on to phase 3, blocking. It’s gotten this far (not far):

std::mutex file_mutex;

struct io61_file { ...
    off_t size;                         // size of file
    std::atomic<unsigned char>* locks;  // array of `size` lock bytes
    std::condition_variable_any cv;
};

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    while (io61_try_lock(f, off, len, locktype) != 0) {
    }
    return 0;
}

void io61_unlock(io61_file* f, off_t off, off_t len) {
    for (off_t x = off; x < off + len; ++x) {
        f->locks[x] = 0;
    }
    f->cv.notify_all();
}

Change the Corporation’s io61_lock to correctly block. Do not add any synchronization objects. You may assume that all previous problems with io61_try_lock have been fixed.

3pt Here are two potential solutions, each using std::condition_variable_any::wait on the file_mutex. Note that the condition variable cannot wait on one of the f->locks: these are atomic integers, not full std::mutex locks.

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    file_mutex.lock();
    while (io61_try_lock(f, off, len, locktype) != 0) {
        f->cv.wait(file_mutex);
    }
    file_mutex.unlock();
    return 0;
}

Or

int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    std::unique_lock guard(file_mutex);
    while (io61_try_lock(f, off, len, locktype) != 0) {
        f->cv.wait(guard);
    }
    return 0;
}

QUESTION 1G. Briefly describe one way in which the Corporation’s synchronization plan uses fine-grained locking, and one way in which it uses coarse-grained locking.

2pt The Corporation uses coarse-grained locking for file_mutex: all calls to io61_lock synchronize on that one global mutex, regardless of file. It uses fine-grained locking to protect individual byte ranges within each file (one atomic lock byte per byte of data).

2. Server synchronization (10 points)

A web server is a process that accepts connections from clients, reads data from those connections, performs computations on that data, and sends responses back on those connections. This isn’t unlike the weensy database we saw in the synchronization unit. A dirt-simple server might look like this:

int main() {
    while (true) {
        int fd = accept_incoming_request();  // accept a new connection, return its socket file descriptor
        handle_client(fd);                   // read data, write response
        close(fd);
    }
    return 0;
}

In this code, both accept_incoming_request() and handle_client() can block.

QUESTION 2A. How many concurrent connections can the dirt-simple server handle at a time?

1pt One! Each incoming request is processed to completion before the next connection is accepted.

QUESTION 2B. Change the main loop to handle each connection in a new child process. This is the multi-process model commonly seen in early web servers. We’ve started the code for you; you may assume that fork() always succeeds.

int main() {
    while (true) {
        int fd = accept_incoming_request();

        // YOUR CODE HERE, including at least one `fork()`

        // clean up zombies
        int status;
        while (waitpid(-1, &status, WNOHANG) > 0) {
        }
    }
}

2pt

pid_t p = fork();
if (p == 0) {
    handle_client(fd);
    close(fd);  // Optional (`_exit` automatically closes fd)
    _exit(0);   // Or `return 0` (since we are in main)
}
close(fd);      // Required in parent!

The important things are to (1) close(fd) in the parent (otherwise the parent will leak file descriptors, and eventually new connections will fail), and (2) _exit in the child.

QUESTION 2C. Change the main loop to handle each connection in a separate thread. This is the multi-threaded model also often seen in early web servers. We’ve started the code for you.

Hint: Remember that in some situations it is illegal for a std::thread object to go out of scope!

void handle_client_and_close(int fd) {
    handle_client(fd);
    close(fd);
}

int main() {
    while (true) {
        int fd = accept_incoming_request();
        // YOUR CODE HERE
    }
}

2pt

std::thread th(handle_client_and_close, fd);
th.detach();

Note that joining the thread with th.join() would return to the behavior in Question 2A, where the server can process only one connection at a time.

QUESTION 2D. True or false? An advantage of the multi-process model is that bugs in handle_client will only crash the child process handling a single connection. In contrast, in the multi-threaded model, a bug in handle_client will often crash the whole server.

1pt True

QUESTION 2E. True or false? An advantage of the multi-threaded model is that different connections can easily share state. In contrast, if the weensy database used a simple multi-process model, then changes made by one client would be hidden from all other clients.

1pt True

QUESTION 2F. A simple multi-threaded server can create an unbounded number of threads, which is dangerous. In this version, we create a thread pool instead. The main thread creates 30 worker threads and passes incoming connections off to these threads; at most 30 connections can be handled concurrently.

int incoming_fd = -1;

void worker() {
    while (true) {
        // poll for connection, then claim it
        int fd = incoming_fd;
        while (fd < 0) {
            fd = incoming_fd;
        }
        incoming_fd = -1;

        // handle claimed connection
        handle_client(fd);
        close(fd);
    }
}

int main() {
    for (int i = 0; i != 30; ++i) {
        std::thread th(worker);
        th.detach();
    }

    while (true) {
        incoming_fd = accept_incoming_request();
        // wait until some worker claims the connection
        while (incoming_fd >= 0) {
        }
    }
}

Does this code violate the Fundamental Law of Synchronization (that is, does it have any undefined behavior due to data races)? Explain briefly.

1pt Yes. Multiple threads read and write the shared incoming_fd variable without synchronization.

QUESTION 2G. Describe how to add correct synchronization to this thread-pool server. List specific synchronization objects you’d add, and explain briefly how they should be used in worker and main.

2pt I’d add a std::mutex and std::condition_variable_any to protect incoming_fd. worker would lock the mutex before accessing incoming_fd and use cv.wait() to block until an incoming_fd was available. main would lock the mutex after accept_incoming_request() returned, publish the new fd, and then use cv.wait() to block until incoming_fd was claimed by some worker. Each kind of thread would notify the other.

All we asked for was an explanation; nevertheless, here’s some code.

int incoming_fd = -1;
std::mutex m;
std::condition_variable_any cv;

void worker() {
    while (true) {
        // wait for connection, then claim it
        std::unique_lock guard(m);
        int fd = incoming_fd;
        while (fd < 0) {
            cv.wait(guard);
            fd = incoming_fd;
        }
        incoming_fd = -1;
        cv.notify_all();
        guard.unlock();

        // handle connection
        handle_client(fd);
        close(fd);
    }
}

int main() {
    // create thread pool
    for (int i = 0; i != 30; ++i) {
        std::thread th(worker);
        th.detach();
    }

    while (true) {
        // accept new connection
        int new_fd = accept_incoming_request();

        // post new connection
        std::unique_lock guard(m);
        incoming_fd = new_fd;
        cv.notify_one(); // or notify_all()

        // wait until a worker has claimed the connection
        while (incoming_fd >= 0) {
            cv.wait(guard);
        }
    }
}

This will feature many spurious wakeups per connection, since the worker thread’s cv.notify_all() will wake up other waiting workers as well as the main thread. A solution with two condition variables can avoid this problem:

int incoming_fd = -1;
std::mutex m;
std::condition_variable_any worker_cv;
std::condition_variable_any main_cv;

void worker() {
    while (true) {
        // wait for connection, then claim it
        std::unique_lock guard(m);
        int fd = incoming_fd;
        while (fd < 0) {
            worker_cv.wait(guard);
            fd = incoming_fd;
        }
        incoming_fd = -1;
        main_cv.notify_all();
        guard.unlock();

        // handle connection
        handle_client(fd);
        close(fd);
    }
}

int main() {
    // create thread pool
    for (int i = 0; i != 30; ++i) {
        std::thread th(worker);
        th.detach();
    }

    while (true) {
        // accept new connection
        int new_fd = accept_incoming_request();

        // post new connection
        std::unique_lock guard(m);
        incoming_fd = new_fd;
        worker_cv.notify_one(); // or notify_all()

        // wait until a worker has claimed the connection
        while (incoming_fd >= 0) {
            main_cv.wait(guard);
        }
    }
}

3. System calls (10 points)

QUESTION 3A. Which of the following system calls are likely to block? List all that apply.

  1. read on a pipe whose pipe buffer is empty
  2. write to a pipe whose pipe buffer is full
  3. fork when there are no available process slots
  4. waitpid when the current process has no children yet
  5. None of the above

2pt 1 and 2 only. fork and waitpid return errors in the given situations.

QUESTION 3B. Which of the following statements about execv are true? List all that apply.

  1. It must always be paired with fork
  2. It does not return unless it encounters an error
  3. It replaces the current process’s program image with a new program
  4. It closes the current process’s file descriptors except for standard input, standard output, and standard error
  5. None of the above

2pt 2 and 3. 1 is false; execv is almost always combined with fork, but it can be used independently. 4 is false; it leaves all file descriptors open (except those explicitly marked as “close-on-exec”).

QUESTION 3C. When will the return value of a read system call equal 0? List all that apply.

  1. When the byte of data at the file position equals '\0'
  2. When the file descriptor is the read end of a pipe and either the pipe buffer is empty or the write end of the pipe is closed
  3. When the file descriptor is the read end of a pipe and both the pipe buffer is empty and the write end of the pipe is closed
  4. When the file descriptor is the write end of a pipe
  5. When the read system call is interrupted by a signal before it can read any data
  6. None of the above

1pt Only 3.

  1. False: read returns the number of bytes read, not the value of a byte.
  2. False: When the pipe buffer is empty but the write end of the pipe is still open, read blocks rather than returning 0.
  3. True!
  4. False: read will return an error in this situation (EBADF, since the write end of a pipe is not open for reading).
  5. False: read will return an error in this situation (EINTR).
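For concreteness, here is a small standalone program (not from the exam) demonstrating statement 3: once the write end is closed and the buffer has been drained, read returns 0.

#include <unistd.h>
#include <cassert>

int main() {
    int pfd[2];
    int r = pipe(pfd);
    assert(r == 0);
    ssize_t w = write(pfd[1], "hi", 2);
    assert(w == 2);
    close(pfd[1]);                       // close the write end

    char buf[4];
    ssize_t n = read(pfd[0], buf, sizeof(buf));
    assert(n == 2);                      // drains the pipe buffer
    n = read(pfd[0], buf, sizeof(buf));
    assert(n == 0);                      // empty buffer + closed write end: EOF
}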

QUESTION 3D. In class, we explored some racer programs whose goal is to wait until a child process exits or a timeout has elapsed, whichever comes first. The racer-block program attempted to use system call interruption to solve this problem. Here’s a version of its code, compressed to omit error checking.

    static volatile sig_atomic_t got_signal = 0;
    void signal_handler(...) {
S1:     got_signal = 1;
    }

    int main() {
M1:     set_signal_handler(SIGCHLD, signal_handler);
M2:     pid_t p = fork();
M3:     if (p == 0) { /* child process code */ }
M4:     usleep(timeout_in_microseconds);
M5:     if (elapsed_time_in_microseconds() >= timeout_in_microseconds) {
M6:         printf("SLOW\n");   /* child did not exit before timeout */
M7:     } else {
M8:         printf("quick\n");  /* child exited before timeout */
M9:     }
    }

Which line contains a system call that the programmer expects to be interrupted whenever a child process exits before the timeout?

1pt Line M4.

QUESTION 3E. In what circumstance might this program print an incorrect result (SLOW when the child exited meaningfully before the timeout, or quick when the child exited meaningfully after the timeout)? Explain briefly, referring to specific lines of code.

2pt If the child exits right away, the parent process might receive a signal between line M2 and line M4. That will mean the usleep system call begins after the child has exited. usleep will not be interrupted by the signal, and will run to completion; the program will print SLOW instead of quick.

QUESTION 3F. The racer race condition can be solved by designing new system calls. In this question, we add system calls that can temporarily mask signals, causing them to be delayed until a future time. The system calls are:

  • sigmask(int signum)
    Mask signal signum. If the process is sent this signal, the kernel records the signal as pending, but does not interrupt the process.

  • sigunmask(int signum)
    Unmask signal signum and immediately deliver any pending signum to the process.

  • pusleep(useconds_t timeout)
    Unmask all signals and sleep for timeout. This happens effectively atomically, with no sleep–wakeup race conditions, so if the process has a pending signal when pusleep is called, the kernel immediately delivers the pending signal and pusleep returns with error code EINTR. The original set of masked signals is restored before return.

Use these system calls to fix the race condition in the racer-block code above, or explain briefly how this could be done.

2pt Replace line M4 with:

sigmask(SIGCHLD);
if (!got_signal) {
    pusleep(timeout_in_microseconds);
}
sigunmask(SIGCHLD);

This would work too:

sigmask(SIGCHLD);
pid_t wp = waitpid(p, &status, WNOHANG);
if (wp < 0) {
    pusleep(timeout_in_microseconds);
}
sigunmask(SIGCHLD);

The important part is to mask the SIGCHLD signal before calling pusleep.

4. Testing a shell (15 points)

Your shell problem set was tested with a check.pl script that runs shell commands and examines their output (including both standard output and standard error). In this problem you will reason about some of check.pl’s checks.

QUESTION 4A. Check REDIR17 and its expected output are:

  • Command: echo > out.txt the redirection can occur anywhere && cat out.txt
  • Expected output: the redirection can occur anywhere

Match the actual outputs, on the left, with the implementations that might generate those outputs, on the right. You might use an implementation more than once or not at all.

(When a feature is “not understood”, that means the implementation treats the feature’s tokens like normal command arguments; for instance, an implementation that does not understand conditionals would treat && tokens like normal arguments. The actual outputs translate newlines into spaces, which is what check.pl does.)

    1. (empty)
    2. > out.txt the redirection can occur anywhere && cat out.txt
    3. > out.txt the redirection can occur anywhere cat: out.txt: No such file or directory
    4. > out.txt the redirection can occur anywhere
    A. Conditionals incorrectly implemented, redirection correctly implemented
    B. Conditionals incorrectly implemented, redirection not understood
    C. Conditionals correctly implemented, redirection not understood
    D. Conditionals and redirection not understood
    E. None of the above

2pt 1—A, 2—D, 3—C, 4—B

If redirection is not understood, then the echo command will print > out.txt the redirection can occur anywhere (including the redirection character). So implementations B–D cannot match output 1.

  1. If there’s no output, then redirection seems to have worked well enough: the file out.txt contains the redirection can occur anywhere. However, the command that reads out.txt, cat out.txt, might not execute if conditionals are incorrectly implemented. The best match is A.
  2. Since the && token was printed like it was normal, the best match is D (conditionals not understood).
  3. This output contains text printed by cat, so the shell tried to run cat. The best match is C.
  4. This output has no evidence of any attempt to run cat. The best match is B.

QUESTION 4B. Check PIPE5 runs the command yes | head -n 5.

Match the actual outputs, on the left, with the implementations that might generate those outputs, on the right. You might use an implementation more than once or not at all.

    1. y y y y y y y y … forever
    2. yes: invalid option -- 'n' Try 'yes --help' for more information.
    3. y y y y y, but shell hangs rather than printing another prompt
    4. y y y y y
    A. Pipes correctly implemented
    B. Pipes not understood
    C. Pipes incorrectly hooked up to command standard input and/or output
    D. Pipes correctly hooked up to command standard input and/or output, but parent shell process does not implement pipe hygiene
    E. None of the above

2pt 1—C, 2—B, 3—D, 4—A

The expected output is y y y y y: the first five lines of yes’s output, which is an infinite stream of ys. So output 4 matches with implementation A.

If pipes aren’t understood, then yes will be given arguments like yes "|" "head" "-n" "5", and it might complain that it doesn’t understand option -n. So output 2 matches with implementation B.

If pipes are correctly hooked up, yes normally exits with head: once head exits, yes tries to write to a pipe with no readers, which raises a SIGPIPE signal that by default causes yes to exit. But if the parent shell does not implement pipe hygiene (it never closes its copy of the pipe’s read end), the pipe always has a reader, so yes never receives SIGPIPE and eventually blocks forever on a full pipe buffer. So output 3 matches with implementation D.

Finally, output 1 looks like yes is writing its standard output directly to the terminal, rather than to head. This matches with implementation C.
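To make the hygiene requirement concrete, here is a hedged sketch of how a shell might set up yes | head -n 5 (a minimal standalone program, not sh61’s actual structure):

#include <unistd.h>
#include <sys/wait.h>

int main() {
    int pfd[2];
    pipe(pfd);

    pid_t p1 = fork();
    if (p1 == 0) {                       // child 1 runs `yes`
        dup2(pfd[1], STDOUT_FILENO);
        close(pfd[0]);
        close(pfd[1]);
        execlp("yes", "yes", (char*) nullptr);
        _exit(1);
    }

    pid_t p2 = fork();
    if (p2 == 0) {                       // child 2 runs `head -n 5`
        dup2(pfd[0], STDIN_FILENO);
        close(pfd[0]);
        close(pfd[1]);
        execlp("head", "head", "-n", "5", (char*) nullptr);
        _exit(1);
    }

    // Pipe hygiene: the parent closes both ends. If it kept pfd[0] open,
    // the pipe would always have a reader, so `yes` would never receive
    // SIGPIPE and would block forever, as in output 3.
    close(pfd[0]);
    close(pfd[1]);
    waitpid(p1, nullptr, 0);
    waitpid(p2, nullptr, 0);
}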

QUESTION 4C. Check BG7 runs this command:

sh -c "sleep 0.1; /bin/echo Second" & sh -c "sleep 0.05; /bin/echo First" & sleep 0.15

List the true statements about this command.

  1. sleep 0.15 is executed in the foreground.
  2. sleep 0.15 ensures that there is enough time for both First and Second to be printed before the parent sh61 returns to the prompt.
  3. sh -c "sleep 0.1; /bin/echo Second" will finish executing before sh -c "sleep 0.05; /bin/echo First" begins executing.
  4. sleep 0.05 will finish executing before /bin/echo First begins executing.
  5. This command line should take a minimum of 0.30 seconds to execute.
  6. None of the above.

2pt 1, 2, 4

  1. True: The last command in this command line has no & background marker.
  2. True: Assuming the background processes start instantaneously, the first one will run for about 0.1 seconds and the second one (in parallel) for about 0.05 seconds.
  3. False: The two background commands will start almost simultaneously.
  4. True: The ; sequence connector ensures sleep 0.05 finishes before /bin/echo First begins.
  5. False: The command line will take about 0.15 seconds.

QUESTION 4D. How many different sh61 descendant processes (children, children of children, etc.) will be created as sh61 executes check BG7? Explain briefly.

3pt A good answer is nine.

  • A background sh61 to manage sh -c "sleep 0.1; /bin/echo Second"
    • A child process to run sh -c "sleep 0.1; /bin/echo Second"
      • A grandchild process to run sleep 0.1
      • A grandchild process to run /bin/echo Second
  • A background sh61 to manage sh -c "sleep 0.05; /bin/echo First"
    • A child process to run sh -c "sleep 0.05; /bin/echo First"
      • A grandchild process to run sleep 0.05
      • A grandchild process to run /bin/echo First
  • A child process to run sleep 0.15

Other answers are possible. For instance, an optimized shell might not fork a background sh61 for the two background conditionals, because those background conditionals are simple (they execute a single command). That would leave 7 descendant processes.

QUESTION 4E. This code is from a student’s incorrect implementation of background commands.

// This function runs the conditional `sec` to completion.
void run_conditional(shell_parser sec);

void run_list(shell_parser sec) {
    shell_parser condsec = sec.first_conditional();
    while (condsec) {
        if (condsec.op() != TYPE_BACKGROUND) {
            run_conditional(condsec);
        } else {
            pid_t p = fork();
            assert(p >= 0);
            if (p == 0) {
                run_conditional(condsec);
                _exit(0);
            }
            [[maybe_unused]] int status;
            waitpid(p, &status, 0);
        }
        condsec.next_conditional();
    }
}

What would the associated shell print on check BG7, and how long will it take to print it? Explain briefly.

2pt It will print Second (at about 0.1 seconds) and then First (at about 0.15 seconds), and the whole command line will take about 0.3 seconds, because this shell waits for background conditionals as well as foreground conditionals.

QUESTION 4F. How could the code in the previous question be fixed?

2pt Remove the waitpid. The comment indicates that run_conditional already contains the relevant waitpid (since it “runs the conditional sec to completion”).
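Concretely, the background branch of the loop becomes (unchanged lines omitted):

} else {
    pid_t p = fork();
    assert(p >= 0);
    if (p == 0) {
        run_conditional(condsec);
        _exit(0);
    }
    // No waitpid: the parent moves on immediately, so the
    // background conditional runs in parallel with the rest.
}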

QUESTION 4G. This similar code has a different problem.

// This function runs the conditional `sec` to completion.
void run_conditional(shell_parser sec);

void run_list(shell_parser sec) {
    shell_parser condsec = sec.first_conditional();
    while (condsec) {
        if (condsec.op() != TYPE_BACKGROUND) {
            run_conditional(condsec);
        } else {
            pid_t p = fork();
            assert(p >= 0);
            if (p == 0) {
                run_conditional(condsec);
            }
        }
        condsec.next_conditional();
    }
}

What will this code print on check BG7? Explain briefly.

2pt Something like First First Second or First Second First, followed by lots of shell prompts. Since the subshells created for background commands don’t exit, they proceed through the rest of the command line and then return to (a forked copy of) the main shell loop.

5. Storage patterns (10 points)

QUESTION 5A. In the third lecture in Unit 1, we saw some strange performance anomalies when sorting a list of integers via insertion sort using an intsort program. What are the best explanations for the anomalies we saw? List all that apply.

  1. The system calls required to allocate memory are inherently expensive.
  2. Processor caches handle sequential memory accesses more efficiently than random memory accesses.
  3. Dynamically-allocated objects that are allocated in rapid sequence tend to receive addresses that are nearby in virtual address space.
  4. Insertion sort is an inefficient sorting algorithm.
  5. None of the above.

2pt 2, 3

  1. False. Memory allocation is pretty inexpensive (compared, say, to disk reads); furthermore, the process requests a lot of memory from the OS at once (a batching optimization), so there won’t be many system calls.
  2. True.
  3. True.
  4. True, but not an explanation for the anomalies we saw.

QUESTION 5B. In which situations might a stdio cache be flushed? List all that apply.

  1. When a new write request would overflow the configured cache size.
  2. When the underlying file data changed because a different process wrote it.
  3. When a seek request moves the file position outside the range covered by the current cache.
  4. When the corresponding file is closed.
  5. Upon encountering Bélády’s anomaly.
  6. None of the above.

2pt 1, 3, 4

2 is false because, as we saw, the stdio cache is not coherent: it doesn’t detect underlying changes. 5 is false because Bélády’s anomaly has nothing to do with why a cache might flush.

QUESTION 5C. The remaining questions ask you to analyze strace output and characterize the behavior of the programs that produced those traces. Your answer should refer to the following reference strings; an answer might say “A for reads, B for writes” or “A then B for reads, no writes”.

Reference strings:

  • A: [0–4095], [4096–8191], [8192–12287], … (4096-byte blocks, sequential)
  • B: 0, 1, 2, 3, 4, … (1-byte blocks, sequential)
  • C: 0, 4096, … (1-byte blocks, every 4096th byte)
  • D: …, 4, 3, 2, 1, 0 (1-byte blocks, reverse sequential)
  • None: (empty)

Trace 1:

...
read(0, "C", 1)                         = 1
fstat(0, {st_mode=S_IFREG|0644, st_size=8188, ...}) = 0
lseek(0, 4096, SEEK_SET)                = 4096
read(0, "a", 1)                         = 1
lseek(0, 0, SEEK_SET)                   = 0
read(0, "CS 61 is an introduction to the fu"..., 4096) = 4096
read(0, "atisfies the Programming 2 and Sys"..., 4096) = 4092
...
  • Which reference strings (A–D or None) best describe the reads in this trace?
  • Which reference strings best describe the writes?
  • Could this program be using a typical stdio cache for reads and/or writes?

2pt C then A for reads; no writes; stdio cache unlikely for reads.

The trace first reads a single byte, then skips ahead to offset 4096 and reads another byte. The skipping matches reference string C.

After seeking back to position 0, the trace tries to read two full blocks of data. This matches reference string A.

Stdio caching is unlikely because stdio would read a full block, not a byte, during the initial C portion of the trace.

QUESTION 5D. Trace 2:

...
lseek(0, 8187, SEEK_SET)                = 8187
read(0, "\n", 1)                        = 1
write(1, "\n", 1)                       = 1
lseek(0, 8186, SEEK_SET)                = 8186
read(0, "t", 1)                         = 1
write(1, "t", 1)                        = 1
lseek(0, 8185, SEEK_SET)                = 8185
read(0, "c", 1)                         = 1
write(1, "c", 1)                        = 1
lseek(0, 8184, SEEK_SET)                = 8184
read(0, "\205", 1)                      = 1
write(1, "\205", 1)                     = 1
lseek(0, 8183, SEEK_SET)                = 8183
read(0, ";", 1)                         = 1
write(1, ";", 1)                        = 1
lseek(0, 8182, SEEK_SET)                = 8182
read(0, "k", 1)                         = 1
write(1, "k", 1)                        = 1
lseek(0, 8181, SEEK_SET)                = 8181
read(0, "e", 1)                         = 1
write(1, "e", 1)                        = 1
...
  • Which reference strings (A–D or None) best describe the reads in this trace?
  • Which reference strings best describe the writes?
  • Could this program be using a typical stdio cache for reads and/or writes?

2pt D for reads, B for writes, stdio cache unlikely for either.

The reads in this trace (on file descriptor 0) use offsets that count down from 8187 and read a byte at a time. This matches string D. The writes do not jump around, but occur a byte at a time. This matches string B. Caching is unlikely because all accesses use bytes; stdio would use 4096-byte blocks.
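For reference, a byte-at-a-time reverse cat written without stdio would produce a trace like this one. A minimal sketch (a guess at the traced program, not its actual source):

#include <unistd.h>
#include <sys/stat.h>

int main() {
    struct stat st;
    fstat(STDIN_FILENO, &st);
    for (off_t pos = st.st_size - 1; pos >= 0; --pos) {
        char c;
        lseek(STDIN_FILENO, pos, SEEK_SET);
        read(STDIN_FILENO, &c, 1);       // reads follow string D
        write(STDOUT_FILENO, &c, 1);     // writes follow string B
    }
}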

QUESTION 5E. Trace 3:

...
read(0, "CS 61 is an introduction to the fu"..., 4096) = 4096
read(0, "may be used as one of the four CS "..., 4096) = 4094
write(1, "CS 61 is an introduction to the fu"..., 4096) = 4096
read(0, "", 4096)                       = 0
write(1, "may be used as one of the four CS "..., 4094) = 4094
  • Which reference strings (A–D or None) best describe the reads in this trace?
  • Which reference strings best describe the writes?
  • Could this program be using a typical stdio cache for reads and/or writes?

2pt A for reads and writes, stdio cache possible for both.

This is straightforward. Note that the application-level reference string could also be B for both reads and writes if a stdio cache were in use: the trace would then show the cache’s 4096-byte refills and flushes rather than the application’s byte-at-a-time requests.

6. Kernel (10 points)

QUESTION 6A. List the true statements about user processes on x86-64 operating systems.

  1. A user process cannot read or write memory containing kernel code or data unless the kernel explicitly allows it.
  2. When a user process executes an invalid instruction, such as dividing by zero, the kernel receives control through an exception and can decide to terminate the process.
  3. The %cs register’s lowest 2 bits determine the privilege level of a process, and user processes operate at the highest privilege level ((%cs & 3) == 0).
  4. A user process can execute the syscall instruction to safely raise the processor’s privilege level and transfer control to the kernel.
  5. User processes can configure the computer’s timer interrupt frequency to optimize their own CPU usage.
  6. If a user process enters an infinite loop, timer interrupts ensure that the kernel regains control and can schedule other processes.
  7. None of the above.

2pt 1, 2, 4, 6

3 is false because user processes operate at the lowest privilege level: ((%cs & 3) == 3). 5 is false because user processes cannot fully configure the timer interrupt frequency.

QUESTION 6B. List the true statements about system calls on WeensyOS.

  1. System calls have a different calling convention than normal function calls.
  2. The syscall instruction transfers control to the kernel, which starts executing in privileged mode at the virtual address defined by the %rdx register.
  3. User processes pass arguments to the kernel’s system call implementation by loading those arguments into registers before executing the syscall instruction.
  4. The kernel can return results from a system call by loading them into specific registers in current->regs.
  5. The kernel can return results from a system call by executing a return statement from its syscall function.
  6. None of the above.

2pt 1, 3, 4, 5

2 is false: it would be unsafe to let user processes determine where to start the kernel; the kernel picks up at a preconfigured address it set during boot.

QUESTION 6C. The next three questions present partially-obscured address-space loops using vmiter objects. The original loops were taken from correct and complete code on the WeensyOS problem set.

vmiter it1(/* some page table */, 0);
vmiter it2(/* some page table */, 0);

for (; it1.va() < PROC_START_ADDR && it2.va() < PROC_START_ADDR; it1 += PAGESIZE, it2 += PAGESIZE) {
    int r = it2.try_map(it1.pa(), it1.perm());
    if (r < 0) {
        /* do something sensible */
    }
}

List the true statements about this loop.

  1. A loop like this could be used in kernel_start to create kernel_pagetable, an identity-mapped page table.
  2. A loop like this could be used in process_setup to initialize some or all of an initial process address space.
  3. A loop like this could be used in a SYSCALL_FORK implementation to initialize some or all of a child process’s address space.
  4. A loop like this could be used in a SYSCALL_EXIT implementation to free process memory.
  5. Within the loop body, it1.va() == it2.va().
  6. None of the above.

2pt 2, 3, 5. In both process_setup and SYSCALL_FORK, a process’s page table should match kernel_pagetable up to PROC_START_ADDR.

1 is false because kernel_pagetable is the first meaningful page table the kernel creates: there’s no other page table to copy from. 4 is false because SYSCALL_EXIT involves one page table, not two.

QUESTION 6D.

vmiter it(/* some page table */, 0);

for (; it.va() < MEMSIZE_PHYSICAL; it += PAGESIZE) {
    int perm;
    if (it.va() == 0) {
        perm = 0;
    } else if (it.va() < PROC_START_ADDR && it.va() != CONSOLE_ADDR) {
        perm = PTE_P | PTE_W;  
    } else {
        perm = PTE_P | PTE_W | PTE_U;
    }
    int r = it.try_map(it.va(), perm);
    if (r < 0) {
        /* do something sensible */
    }
}

List the true statements about this loop.

  1. A loop like this could be used in kernel_start to create kernel_pagetable, an identity-mapped page table.
  2. A loop like this could be used in process_setup to initialize some or all of an initial process address space.
  3. A loop like this could be used in a SYSCALL_FORK implementation to initialize some or all of a child process’s address space.
  4. A loop like this could be used in a SYSCALL_EXIT implementation to free process memory.
  5. Within the loop body, it.va() < MEMSIZE_VIRTUAL.
  6. None of the above.

2pt The best answer is 1, 5. In process page tables, user-accessible mappings are reference counted, so it would be weird to set arbitrary virtual addresses to user-accessible. 5 is true because MEMSIZE_PHYSICAL < MEMSIZE_VIRTUAL.

QUESTION 6E.

vmiter it1(/* some page table */, 0);
vmiter it2(/* some page table */, 0);

for (; it1.va() < MEMSIZE_VIRTUAL && it2.va() < MEMSIZE_VIRTUAL; it1 += PAGESIZE, it2 += PAGESIZE) {
    if (!it1.user() || !it1.writable() || it1.va() == CONSOLE_ADDR) {
        /* do something sensible */
    } else {
        /* do something sensible */
    }
}

List the true statements about this loop.

  1. A loop like this could be used in kernel_start to create kernel_pagetable, an identity-mapped page table.
  2. A loop like this could be used in process_setup to initialize some or all of an initial process address space.
  3. A loop like this could be used in a SYSCALL_FORK implementation to initialize some or all of a child process’s address space.
  4. A loop like this could be used in a SYSCALL_EXIT implementation to free process memory.
  5. None of the above.

2pt The best answer is 3: the loop is copying mappings from one page table to another. Two cases are required because inaccessible or read-only pages should be shared, but user-writable pages must have their contents copied. Although a loop somewhat like this could be used in SYSCALL_EXIT, it would only need one vmiter.
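For reference, here is a hedged sketch of how the two cases might be filled in for SYSCALL_FORK. It assumes WeensyOS’s kalloc allocator and the vmiter members present() and kptr() in addition to those shown above; error handling is omitted:

for (; it1.va() < MEMSIZE_VIRTUAL && it2.va() < MEMSIZE_VIRTUAL; it1 += PAGESIZE, it2 += PAGESIZE) {
    if (!it1.user() || !it1.writable() || it1.va() == CONSOLE_ADDR) {
        // Share the mapping: parent and child refer to the same physical page
        if (it1.present()) {
            it2.try_map(it1.pa(), it1.perm());
        }
    } else {
        // Copy the page: user-writable data must be private to the child
        void* pg = kalloc(PAGESIZE);
        memcpy(pg, it1.kptr(), PAGESIZE);
        it2.try_map((uintptr_t) pg, it1.perm());
    }
}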

7. Assembly (10 points)

Intel is trying to sell the next generation of its Fartium Core Trio processor. On this still-terrible processor, the only instructions that work are the ones on this list:

  1. addq %rsi, %rax
  2. cmpq %rsi, %rdi
  3. decq %rsi
  4. incq %rax
  5. je L1
  6. jl L2
  7. jmp L3
  8. leaq (%rax,%rdi,2), %rsi
  9. movl (%rdi,%rax,4), %edi
  10. retq
  11. xchgq %rax, %rdi
  12. xorq %rax, %rax

The instructions, registers, and arguments must match exactly. For example, addq %rsi, %rax works fine, but addq %rdi, %rax will explode the processor.

QUESTION 7A. Which Fartium Core Trio instructions examine the condition flags (i.e., behave differently based on condition flags)? List all that apply (by number or name) or say “none.”

2pt je, jl

QUESTION 7B. Which Fartium Core Trio instructions read primary memory (not including memory used to store instructions)? List all that apply or say “none.”

2pt movl (retq also does, but it’s OK to leave it off)

QUESTION 7C. In the remaining questions, your job is to correctly implement C++ functions using only the instructions on this list. Your implementations must have the same semantics as the C++ functions, but may perform much worse than one might expect. Your response can be a list of instruction numbers (“2, 11, 9, 10”) or names (“cmpq; xchgq; movl; retq”), intermixed with labels L1–L3 as necessary. Assume that the Fartium Core Trio uses the normal x86-64 calling convention.

int return_one() {
    return 1;
}

2pt

xorq %rax, %rax
incq %rax
retq

QUESTION 7D.

long return_minus_one() {
    return -1;
}

2pt

xorq %rax, %rax
xchgq %rax, %rdi
xorq %rax, %rax
leaq (%rax,%rdi,2), %rsi    # %rsi is now 0
decq %rsi                   # %rsi is now -1
addq %rsi, %rax             # %rax is now 0 + -1 = -1
retq

QUESTION 7E.

long max(long a, long b) {
    if (a > b) {
        return a;
    } else {
        return b;
    }
}

2pt Here’s a good answer:

     cmpq %rsi, %rdi
     jl L2
     # if we get here, `%rdi - %rsi >= 0`, so we should return %rdi
     xchgq %rax, %rdi
     retq
L2:  xorq %rax, %rax
     addq %rsi, %rax
     retq

Or you can use a loop to increase %rdi until it is no less than %rsi!

L3:  cmpq %rsi, %rdi
     jl L2
     xchgq %rax, %rdi
     retq
L2:  xchgq %rax, %rdi
     incq %rax
     xchgq %rax, %rdi
     jmp L3

8. Data representation (15 points)

QUESTION 8A. Given this code:

int array[10000];
int* p1 = &array[1];

Provide initializations of a new variable, p2, so that p2 - p1 has the given values according to the C++ abstract machine, or say that doing so is impossible. Your answer will comprise five initializations, one per value.

  1. p2 - p1 == 0
  2. p2 - p1 == 3
  3. p2 - p1 == -3
  4. p2 - p1 causes undefined behavior
  5. p2 - p1 returns a value dependent on the processor architecture, but does not cause undefined behavior

3pt

  1. int* p2 = p1;
  2. int* p2 = &array[4];
  3. Impossible (array[-2] does not exist)
  4. int p2[1]; (p2 and p1 point into different objects, so subtracting them is undefined behavior)
  5. Impossible (the C++ abstract machine makes this expression either well-defined or undefined behavior; its result never depends on the processor architecture)

QUESTION 8B. Which of the marked printf lines in the following code contain violations of the live object rule? List all that apply or say “none.”

#include <cstdio>
int globalArray[5];

int* return_pointer_to_parameter(int x) {
    return &x;
}

int main() {
    int* ptr = &globalArray[5];
  A:  printf("%d\n", *ptr);
  B:  printf("%p\n", ptr);

    ptr = return_pointer_to_parameter(10);
  C:  printf("%d\n", *ptr);

    ptr = new int(12);
  D:  printf("%d\n", *ptr);

    delete ptr;
  E:  printf("%p\n", ptr);

    ptr = nullptr;
  F:  printf("%d\n", *ptr);

    return 0;
}

2pt A, C, and F.

The initialization of ptr produces a pointer “one past the end” of the global array. This pointer doesn’t point to an object, so dereferencing it at A violates the rule. Printing it, as at B, is fine, though.

At C, we dereference a pointer to an object that has gone out of scope.

At D, we print an uninitialized integer. This is bad news, but not a violation of the live object rule—the pointer points to a live object.

At E, we print the pointer value of a pointer that was just deallocated. Again, unwise, but not a violation of the live object rule, because we don’t dereference the pointer.

At F, we dereference a null pointer: classic live object rule violation.

QUESTION 8C. The next three questions concern an implementation of pset 1 that allocates memory from a single default_buffer of size 256 bytes.

Write a m61_malloc call that should always fail on this implementation.

1pt m61_malloc(257); This call tries to allocate more memory than there is available.

QUESTION 8D. Write two or more m61_malloc calls where at least one should always fail on this implementation, even though the total user-accessible memory requested is less than 256 bytes.

2pt m61_malloc(1); m61_malloc(254); The second allocation should fail because of alignment: returned pointers must be 16-byte aligned, so the first allocation effectively consumes 16 bytes of buffer, and the 254-byte allocation would then need the buffer to extend to offset 16 + 254 = 270, more than the 256 bytes available.

QUESTION 8E. Write a sequence of m61_malloc and m61_free calls where at least one of the m61_malloc calls should fail unless the implementation correctly coalesces freed allocations.

2pt

auto p1 = m61_malloc(100);
auto p2 = m61_malloc(100);
m61_free(p1);
m61_free(p2);
m61_malloc(200);

The final allocation can succeed only if the allocator coalesces the two adjacent freed regions into one; neither 100-byte region alone can satisfy the 200-byte request.

QUESTION 8F. The UTF-8/Ethics lecture presented four design ideas for Unicode encodings. They were:

  1. Idea #1: Unary encoding
  2. Idea #2: Byte pairs
  3. Idea #3: Byte triples
  4. Idea #4: Self-synchronizing byte quadruplets

Which of these encodings would cause clear economic harm to some group if it were the only available text encoding, and which would cause clear allocative harm? Explain briefly.

2pt Idea #2 causes clear allocative harm: characters above 0x1FFF cannot be represented, which includes most Chinese characters; entire groups cannot represent their language in computer-accessible text. Idea #1 causes clear economic harm, but not necessarily allocative harm: every character can be represented, but possibly using an enormous number of bytes. Though ideas #3 and #4 cause slight economic harm relative to UTF-8 for groups whose languages use characters above 0x7F, it’s nothing compared to idea #1.

QUESTION 8G. The UTF-8 encoding is defined as follows:

  • Encode C ≤ 0x7F in the single byte with value ⟨C⟩
  • Encode 0x80 ≤ C ≤ 0x7FF in two bytes:
    • Let C0 equal bits 0–5 of C, and C1 equal bits 6–10
    • Use bytes ⟨0xC0 + C1⟩ ⟨0x80 + C0⟩
  • Encode 0x800 ≤ C ≤ 0xFFFF in three bytes:
    • Let C0 equal bits 0–5 of C, C1 bits 6–11, and C2 bits 12–15
    • Use bytes ⟨0xE0 + C2⟩ ⟨0x80 + C1⟩ ⟨0x80 + C0⟩
  • Encode 0x10000 ≤ C ≤ 0x10FFFF in four bytes:
    • Let C0 equal bits 0–5 of C, C1 bits 6–11, C2 bits 12–17, and C3 bits 18–20
    • Use bytes ⟨0xF0 + C3⟩ ⟨0x80 + C2⟩ ⟨0x80 + C1⟩ ⟨0x80 + C0⟩

Does the UTF-8 encoding resemble big-endian or little-endian integer representation?

1pt It’s more like big endian, because the most significant bits appear in bytes with lower addresses.

QUESTION 8H. This function intends to encode a Unicode character u into UTF-8 in the character buffer buf. Complete the missing case so that the function works for any u between 0 and 0xFFFF.

// Returns the number of bytes encoded
int encode_utf8(unsigned u, unsigned char* buf) {
    if (u <= 0x7F) {
        buf[0] = u;
        return 1;
    } else if (u <= 0xFFFF) {
        // YOUR CODE HERE
    } else {
        assert(false);
    }
}

2pt

if (u <= 0x7FF) {
    buf[0] = 0xC0 + (u >> 6);
    buf[1] = 0x80 + (u & 0x3F);
    return 2;
} else {
    buf[0] = 0xE0 + (u >> 12);
    buf[1] = 0x80 + ((u >> 6) & 0x3F);
    buf[2] = 0x80 + (u & 0x3F);
    return 3;
}
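A quick sanity check of the completed function: U+20AC (the euro sign) must encode as the three bytes E2 82 AC.

unsigned char buf[4];
int n = encode_utf8(0x20AC, buf);
assert(n == 3 && buf[0] == 0xE2 && buf[1] == 0x82 && buf[2] == 0xAC);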

9. The end (5 points)

QUESTION 9A. Is there any TF or fellow student you’d like to shout out? Any answer but no answer will receive full credit.

5pt Everyone is awesome