2025 CS 61 Test 2


1. Lonely isolated kernels (15 points)

QUESTION 1A. How do timer interrupts help an operating system kernel enforce process isolation? List all that apply.

  1. Timer interrupts stop processes from writing to kernel-only memory.
  2. Timer interrupts ensure that processes exit if they run for too long.
  3. Timer interrupts ensure that processes yield if they run for too long.
  4. Timer interrupts ensure that the kernel gets many chances per second to make scheduling decisions.
  5. None of the above.

5 points 3 and 4.

Timer interrupts interrupt the processor periodically, based on a timer. They have no impact on memory isolation (so #1 is false). The interrupt gives the kernel control, thereby yielding the process (so #3 is true). The kernel decides what happens next. It might exit a process, but it doesn’t have to—it’s up to kernel policy (so #2 is false). Regardless, it gets a chance to make scheduling decisions (so #4 is true).

A student might interpret “processes yield” as meaning “processes call sys_yield”, and timer interrupts cannot force a system call—the process does yield (stop running), but it doesn’t sys_yield. So “4 only” is also a valid answer.

QUESTION 1B. How do page tables help an operating system kernel enforce process isolation? List all that apply.

  1. Page tables give each process its own view of primary memory.
  2. Page tables ensure that processes cannot access kernel-only memory.
  3. Every page table corresponds to a process, and the pages mapped in that page table are all owned by that process.
  4. Page tables ensure that a process exits if it accesses an address for which the process has no permission.
  5. Page tables ensure that the kernel gets control if a process accesses an address for which the process has no permission.
  6. None of the above.

5 points 1, 2, and 5.

3 is false because kernel_pagetable does not correspond to a process, and because every process page table has the console and the kernel mapped (those pages are not owned by any process). 4 is not as good an answer as 5: often an invalid address should cause a process to exit, but whether or not the process exits is the kernel’s decision. (In fact, once Unit 5 explained the different kinds of process termination, we could argue that an invalid address access leads not to exit—which, pedantically, refers only to normal termination—but to abnormal termination via the segmentation fault signal, SIGSEGV.)

QUESTION 1C. How does protected control transfer help an operating system kernel enforce process isolation? List all that apply.

  1. Protected control transfer lets a kernel control the environment which is active when the machine is running with full privilege.
  2. Protected control transfer lets a process communicate with the kernel by causing the kernel to start executing at an arbitrary location.
  3. Protected control transfer lets a process communicate with the kernel by causing the kernel to start executing at a preconfigured location.
  4. Protected control transfer ensures that a process can be restarted exactly where it left off.
  5. None of the above.

5 points 1 and 3, and optionally 4.

It would violate kernel isolation if a process could start the kernel at an arbitrary location (so #2 is false). #4 is true, but one can argue whether that feature is designed to “help an operating system kernel enforce process isolation.” On the pro side, timer interrupts (for example) are critical for enforcing process isolation; timer interrupts involve an involuntary protected control transfer; and some aspects of involuntary protected control transfer (like the way process registers are saved) are critical for restarting processes where they left off. On the con side, protected control transfer doesn’t on its own ensure that “a process can be restarted exactly where it left off”—a lot of other kernel machinery is involved.

2. Water through a sieve (35 points)

Memory leaks are the bane of WeensyOS. In this problem, you are asked to find and describe leaks in fragments of a WeensyOS kernel written by Sherron Watkins. There are also some more general questions about kernels.

QUESTION 2A. First consider Watkins’s implementation of syscall_page_alloc.

int syscall_page_alloc(uintptr_t addr) {
    if (addr < PROC_START_ADDR || addr >= MEMSIZE_VIRTUAL || (addr % PAGESIZE) != 0) {
        return -1;
    }
    void* ptr = kalloc(PAGESIZE);
    if (!ptr || vmiter(current, addr).try_map(ptr, PTE_PWU) < 0) {
        return -1;
    }
    memset(ptr, 0, PAGESIZE);
    return 0;
}

This function is defined in kernel.cc and called (from the kernel’s syscall function) as return syscall_page_alloc(current->regs.reg_rdi).

What statements are true about syscall_page_alloc and its addr parameter? List all that apply.

  1. addr was originally supplied by a WeensyOS process as a system call argument.
  2. addr was passed in the %rdi register, as required by WeensyOS’s system call calling convention.
  3. System calls like page_alloc involve voluntary protected control transfer from processes into the kernel.
  4. The WeensyOS kernel’s system call handler stores process registers like %rdi into the relevant struct proc.
  5. None of the above.

4 points All of 1–4 are true!

QUESTION 2B. Which page table is active on the processor during line 9 of syscall_page_alloc, when the contents of the newly-allocated page are initialized to 0?

2 points The kernel page table, kernel_pagetable. WeensyOS switches to kernel_pagetable whenever a system call or exception enters the kernel, so the memset runs with the kernel’s identity mappings installed.

QUESTION 2C. List the statements about ptr that are true whenever syscall_page_alloc returns 0.

  1. reinterpret_cast<uintptr_t>(ptr) is page-aligned (a multiple of PAGESIZE).
  2. reinterpret_cast<uintptr_t>(ptr) == addr.
  3. reinterpret_cast<uintptr_t>(ptr) != addr.
  4. reinterpret_cast<uintptr_t>(ptr) >= PROC_START_ADDR.
  5. None of the above.

3 points 1 only. ptr and addr are independent; though they’re usually unequal, they might be equal. And ptr may be taken from any part of physical memory unused for the kernel or other reserved purposes, which includes many pages with physical addresses < PROC_START_ADDR.

QUESTION 2D. Describe a situation that might cause Watkins’s syscall_page_alloc to leak at least one page of memory.

4 points Here are two such situations; one possible fix is sketched after the list:

  1. If vmiter::try_map fails, then Watkins’s implementation leaks ptr.
  2. If a process allocates a new page over an already-existing page, Watkins’s implementation leaks the already-existing page.
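
As an illustration, here is one way to plug both leaks. This is a sketch rather than the official problem set solution, and it assumes the basic WeensyOS setup in which pages at or above PROC_START_ADDR are never shared between processes:

int syscall_page_alloc(uintptr_t addr) {
    if (addr < PROC_START_ADDR || addr >= MEMSIZE_VIRTUAL || (addr % PAGESIZE) != 0) {
        return -1;
    }
    vmiter it(current, addr);
    // remember any page already mapped at addr so it can be freed (situation 2)
    void* old_ptr = it.present() && it.user() ? it.kptr() : nullptr;
    void* ptr = kalloc(PAGESIZE);
    if (!ptr) {
        return -1;
    }
    if (it.try_map(ptr, PTE_PWU) < 0) {
        kfree(ptr);         // don't leak the new page if mapping fails (situation 1)
        return -1;
    }
    if (old_ptr) {
        kfree(old_ptr);     // the replaced page is no longer mapped anywhere
    }
    memset(ptr, 0, PAGESIZE);
    return 0;
}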

QUESTION 2E. Now consider Watkins’s implementation of syscall_exit. (Her implementation takes the process to exit as an argument.)

void syscall_exit(proc* p) {
    for (uintptr_t addr = PROC_START_ADDR; addr < MEMSIZE_VIRTUAL; addr += PAGESIZE) {
        vmiter it(p, addr);
        if (it.present()) {
            kfree(it.kptr());
        }
    }
    kfree(reinterpret_cast<void*>(p->pagetable));
    p->state = P_FREE;
}

Watkins’s syscall_exit has no special case for CONSOLE_ADDR. Does it need one? Explain briefly.

3 points It doesn’t need one. Watkins only frees pages that are mapped at or above PROC_START_ADDR. This is reasonable, since user process memory is all located at or above PROC_START_ADDR. But CONSOLE_ADDR < PROC_START_ADDR.

QUESTION 2F. What kind of memory does Watkins’s syscall_exit leak, and how would you fix this leak? Explain briefly.

4 points Watkins’s syscall_exit leaks level-1, level-2, and level-3 page table pages. A ptiter loop is required to find and free these pages.
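
Concretely (a sketch assuming the standard WeensyOS ptiter interface), the body of syscall_exit could free memory in three steps:

// free the process's own pages, as in Watkins's code
for (uintptr_t addr = PROC_START_ADDR; addr < MEMSIZE_VIRTUAL; addr += PAGESIZE) {
    vmiter it(p, addr);
    if (it.present()) {
        kfree(it.kptr());
    }
}
// free the level-1, level-2, and level-3 page table pages
for (ptiter it(p); it.active(); it.next()) {
    kfree(it.kptr());
}
// free the level-4 page table itself, as in Watkins's code
kfree(reinterpret_cast<void*>(p->pagetable));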

QUESTION 2G. If you remove one line of code from Watkins’s syscall_exit, it will start to leak process descriptors, eventually causing syscall_fork to fail regardless of memory availability. Which line?

3 points Line 9, p->state = P_FREE;.

QUESTION 2H. Now consider Watkins’s implementation of syscall_fork.

int syscall_fork() {
    pid_t pid = free_pid(); // finds a free ptable slot; returns -1 if none
    if (pid < 0 || pid >= MAXNPROC) {
        return -1;
    }

    ptable[pid].pagetable = kalloc_pagetable();
    if (!ptable[pid].pagetable) {
        return -1;
    }

    for (uintptr_t addr = 0; addr < PROC_START_ADDR; addr += PAGESIZE) {
        vmiter parit(current, addr);     // parent iterator
        vmiter chit(&ptable[pid], addr); // child iterator
        if (chit.try_map(parit.pa(), parit.perm()) < 0) {
            return -1;
        }
    }

    for (uintptr_t addr = PROC_START_ADDR; addr < MEMSIZE_VIRTUAL; addr += PAGESIZE) {
        vmiter parit(current, addr);
        if (!parit.present()) {
            continue;
        }
        void* ptr = kalloc(PAGESIZE);
        if (!ptr) {
            return -1;
        }
        // ... rest of function elided ...

What kinds of memory might be leaked if Watkins’s syscall_fork returns on line 4 (the return -1 when there is no free ptable slot)? List all that apply.

  1. Page table pages
  2. The console
  3. Kernel memory
  4. Memory containing a process code segment
  5. Memory containing a process data segment
  6. Memory containing process heap pages
  7. Memory containing a process stack
  8. None of the above

3 points 8 (none of the above). No memory is leaked at that return point.

QUESTION 2I. What kinds of memory listed in Question 2H might be leaked if Watkins’s syscall_fork returns on line 9 (the return -1 when kalloc_pagetable fails)? List all that apply.

3 points 8 (none of the above); again no memory is leaked.

QUESTION 2J. What kinds of memory listed in Question 2H might be leaked if Watkins’s syscall_fork returns on line 16 (the return -1 when try_map fails in the first loop)? List all that apply.

3 points 1. The page table is leaked. No process memory is leaked because no process memory has been allocated at this point. Kernel memory and the console are not allocated by processes and cannot be leaked.

QUESTION 2K. What kinds of memory listed in Question 2H might be leaked if Watkins’s syscall_fork returns on line 27 (the return -1 when kalloc fails in the second loop)? List all that apply.

3 points 1, 4, 5, 6, 7. The page table is leaked, and any memory previously allocated for the process might be leaked too.
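
One way to avoid all of these leaks (a sketch, not Watkins’s code; the helper name fork_fail is ours) is to funnel every failure path that occurs after the child’s page table has been allocated through a single cleanup routine, which can simply reuse the exit path, assuming syscall_exit has been fixed to free page table pages as in Question 2F:

// Hypothetical cleanup for syscall_fork failure paths. Assumes the child's
// page table exists, and that syscall_exit frees the child's process pages,
// its page table pages, and its level-4 page table, and marks the slot free.
static int fork_fail(pid_t pid) {
    syscall_exit(&ptable[pid]);
    return -1;
}

With this helper, the return -1 statements on lines 16 and 27 would become return fork_fail(pid);.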

3. A garland of caches (20 points)

QUESTION 3A. Match each type of cache (1–4) with a read request that cache might use if an access causes a cache miss (A–F). You will use each read request at most once.

    1. Processor cache
    2. Translation lookaside buffer
    3. Buffer cache
    4. Stdio cache
    A. System call
    B. Drive READ command
    C. Instruction fetch
    D. Page table walk
    E. Cache line load
    F. None of the above

4 points 1–E, 2–D, 3–B, 4–A

QUESTION 3B. Write a brief narrative explaining how executing a single line of C++ code in a process, such as int ch = fgetc(stdin), might activate all of these caches (i.e., (1) processor cache, (2) translation lookaside buffer, (3) buffer cache, and (4) stdio cache).

8 points The line fgetc(stdin) gets compiled into instructions, which are located in primary memory and addressed by virtual addresses. Executing the instruction requires first translating the virtual address into a physical address, and the translation lookaside buffer (2) caches those translations. Once the translation succeeds, the processor needs to access the corresponding data, which involves the processor cache (1). fgetc is a stdio library call, so the compiled function will check the stdio cache (4) to see if a character is available. If there isn’t a character, stdio will call out to the kernel via a read system call, and the kernel may respond by accessing the buffer cache (3).
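
As a concrete (and much simplified) illustration of the last two steps, here is a hypothetical getc-style function written in the style of the problem set 4 single-slot cache. The struct and function names are ours, not stdio’s; the point is to show where the in-process cache and the read system call (and therefore the kernel’s buffer cache) come into play:

#include <unistd.h>
#include <cstdio>

struct cached_file {
    int fd;
    static constexpr ssize_t bufsize = 4096;
    unsigned char cbuf[bufsize];
    ssize_t pos = 0;    // next unread byte in cbuf
    ssize_t len = 0;    // number of valid bytes in cbuf
};

int cached_getc(cached_file* f) {
    if (f->pos == f->len) {                            // user-level cache is empty
        ssize_t n = read(f->fd, f->cbuf, f->bufsize);  // system call; the kernel may satisfy it from its buffer cache
        if (n <= 0) {
            return EOF;                                // error or end of file
        }
        f->pos = 0;
        f->len = n;
    }
    return f->cbuf[f->pos++];                          // common case: served from the user-level cache
}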

QUESTION 3C. You’re powering through pset 4 with the aid of a secret snack drawer that can hold up to 3 types of snack at a time. This snack drawer operates as a 3-slot snack cache for the dining hall. You hunger for these snacks in this order:

  1. Apple slice
  2. Blueberry
  3. Chip
  4. Apple slice
  5. Dosa bite
  6. Blueberry
  7. Egg (hard-boiled)
  8. Apple slice
  9. Blueberry
  10. Chip

(So the reference string is ABCADBEABC.)

If you use the optimal eviction policy to manage your snack drawer, which snacks will be evicted from your drawer (i.e., thrown out and replaced with a different snack)? List the evicted snacks in order by time; you may use letters instead of names.

8 points total for 3C–3D. Give at least 5 points if either 3C or 3D is correct.

CDE: Chip (replaced with Dosa bite), Dosa bite (replaced with Egg), Egg (replaced with Chip). The evicted snacks are the snacks that precede the circled misses in this cache diagram.

    A  B  C  A  D  B  E  A  B  C
1   Ⓐ  A  A  A  A  A  A  A  A  A
2      Ⓑ  B  B  B  B  B  B  B  B
3         Ⓒ  C  Ⓓ  D  Ⓔ  E  E  Ⓒ

QUESTION 3D. If you eat the same snacks in the same order but manage your snack drawer with the least-recently-used (LRU) policy, which snacks will be evicted from your drawer? List the evicted snacks in order by time; you may use letters instead of names.

BCADE: Blueberry (replaced with Dosa bite), Chip (replaced with Blueberry), Apple slice (replaced with Egg), Dosa bite (replaced with Apple slice), Egg (replaced with Chip)

    A  B  C  A  D  B  E  A  B  C
1   Ⓐ  A  A  A  A  A  Ⓔ  E  E  Ⓒ
2      Ⓑ  B  B  Ⓓ  D  D  Ⓐ  A  A
3         Ⓒ  C  C  Ⓑ  B  B  B  B
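
If you’d like to check an eviction sequence mechanically, a small simulator suffices. The following is an illustrative helper, not part of the exam; it simulates a fully-associative LRU cache and prints each evicted item (BCADE for this reference string):

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

std::vector<char> lru_evictions(const std::string& refs, size_t nslots) {
    std::vector<char> slots;                     // front = least recently used
    std::vector<char> evicted;
    for (char c : refs) {
        auto it = std::find(slots.begin(), slots.end(), c);
        if (it != slots.end()) {
            slots.erase(it);                     // hit: reinsert at most-recently-used position
        } else if (slots.size() == nslots) {
            evicted.push_back(slots.front());    // miss with a full cache: evict the LRU slot
            slots.erase(slots.begin());
        }
        slots.push_back(c);
    }
    return evicted;
}

int main() {
    for (char c : lru_evictions("ABCADBEABC", 3)) {
        std::cout << c;                          // prints BCADE
    }
    std::cout << "\n";
}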

4. Wrestling with stdio (30 points)

Linda McMahon is working on problem set 4, specifically on the combination of single-character writes and seeks. Here’s the relevant part of her code, which is based on the single-slot cache we presented in section:

struct io61_file {
    int fd;
    static constexpr off_t bufsize = 4096;
    unsigned char cbuf[bufsize];
    off_t tag;
    off_t end_tag;
    off_t pos_tag;
};

int io61_writec(io61_file* f, int ch) {
    // flush to make room if character doesn’t fit
    if ((f->pos_tag < f->tag || f->pos_tag >= f->tag + f->bufsize)
        && io61_flush(f) < 0) {
        return -1;
    }
    f->cbuf[f->pos_tag - f->tag] = ch;
    ++f->pos_tag;
    if (f->end_tag < f->pos_tag) {
        f->end_tag = f->pos_tag;
    }
    return 0;
}

int io61_flush(io61_file* f) {
    if (f->tag < f->end_tag) { // flush cache buffer
        ssize_t nw = write(f->fd, f->cbuf, f->end_tag - f->tag);
        if (nw != f->end_tag - f->tag) {
            return -1;
        }
        f->tag = f->end_tag;
    }
    if (f->tag != f->pos_tag) { // move to requested file position
        off_t off = lseek(f->fd, f->pos_tag, SEEK_SET);
        if (off == -1) {
            return -1;
        }
        f->tag = f->end_tag = f->pos_tag;
    }
    return 0;
}

int io61_seek(io61_file* f, off_t off) {
    f->pos_tag = off; // delay actual seek to next flush or write
    return 0;
}

(Recall that io61_writec writes the character to the cache, returning -1 on error; and io61_flush writes all cached data to the underlying file descriptor, returning -1 if not all cached data could be written.)

QUESTION 4A. Parts A through C ask you what system calls McMahon’s library would make for different test programs. In each case, f->fd == 1 and f->tag, f->end_tag, and f->pos_tag are all initially 0. You may assume that every system call returns the value expected by the library (so io61_writec and io61_flush never return -1). As an example, this test program:

io61_writec(f, 'A');
io61_writec(f, 'B');
io61_flush(f);

would make one system call, write(1, "AB", 2).

What system calls would McMahon’s library make for this test program?

for (int i = 0; i != 5096; ++i) {
    io61_writec(f, '6');
}
io61_flush(f);

3 points

write(1, "666..." (4096 characters), 4096);
write(1, "666..." (1000 characters), 1000);

QUESTION 4B. Under the same assumptions, what system calls would McMahon’s library make for this test program?

io61_writec(f, 'H');
io61_flush(f);
io61_seek(f, 1);
io61_writec(f, 'i');
io61_flush(f);

3 points

write(1, "H", 1);
write(1, "i", 1);

QUESTION 4C. Under the same assumptions, what system calls would McMahon’s library make for this test program?

for (int i = 0; i != 3; ++i) {
    io61_writec(f, 'X');
}
io61_seek(f, 5000);
for (int i = 0; i != 3; ++i) {
    io61_writec(f, 'Y');
}
io61_flush(f);

3 points

write(1, "XXX", 3);
lseek(1, 5000, SEEK_SET);
write(1, "YYY", 3);

QUESTION 4D. Does McMahon’s library handle short writes correctly? Explain briefly.

4 points It does not. A short write happens when a write system call returns less than the number of characters requested; in McMahon’s code, a short write in io61_flush will cause that function to return an inappropriate error. Correctly handling short writes requires a loop, such as this:

if (f->tag < f->end_tag) {
    off_t wtag = f->tag;
    while (wtag < f->end_tag) {
        ssize_t nw = write(f->fd, &f->cbuf[wtag - f->tag], f->end_tag - wtag);
        if (nw > 0) {
            wtag += nw;              // partial progress: write the rest on the next iteration
        } else if (nw == 0 || errno != EINTR) {
            return -1;               // real error; retry only if interrupted by a signal
        }
    }
    f->tag = f->end_tag;
}

QUESTION 4E. True or false: McMahon’s cache is correct for sequential writes (with the possible exception of short writes). Explain briefly if unsure.

4 points True. If all writes are sequential, then all io61_writec calls happen when pos_tag == end_tag, and McMahon’s cache will behave exactly like the cache from the section material.

QUESTION 4F. True or false: McMahon’s cache is efficient for sequential writes (with the possible exception of short writes). Explain briefly if unsure.

4 points True. It will make the same batched write system calls as stdio.

QUESTION 4G. Which of these non-sequential access patterns will McMahon’s cache handle correctly (assuming no short writes)? List all that apply. All block writes are executed with repeated io61_writec calls.

  1. Repeatedly writing the same byte (writec(ch); seek(0); writec(ch); etc.)
  2. Strided access with block size 1 and stride 5000
  3. Strided access with block size 1 and stride 1000
  4. Strided access with block size 4096 and stride 1000000
  5. Reverse sequential access
  6. Random access
  7. None of the above

3 points 1, 2, 4, and 5. McMahon’s cache can make mistakes for non-sequential access patterns. Specifically, it will write bogus data if a seek jumps within 4095 bytes of tag, but outside the range of already-written characters [tag, end_tag].

QUESTION 4H. Which of the non-sequential access patterns from the previous question offer good opportunities for write coalescing (whether or not McMahon’s cache implements that optimization)? List all that apply.

3 points Especially 1, but also 5.

QUESTION 4I. Change one line of code so that McMahon’s cache is correct for all listed non-sequential access patterns (with the possible exception of short writes).

3 points Change line 12 (the flush condition at the top of io61_writec) to

if ((f->pos_tag < f->tag || f->pos_tag > f->end_tag || f->pos_tag == f->tag + f->bufsize)
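
For context, here is io61_writec with that single change applied; everything else is unchanged from McMahon’s code:

int io61_writec(io61_file* f, int ch) {
    // flush if the write position lies outside the cached range [tag, end_tag]
    // or the buffer is completely full
    if ((f->pos_tag < f->tag || f->pos_tag > f->end_tag || f->pos_tag == f->tag + f->bufsize)
        && io61_flush(f) < 0) {
        return -1;
    }
    f->cbuf[f->pos_tag - f->tag] = ch;
    ++f->pos_tag;
    if (f->end_tag < f->pos_tag) {
        f->end_tag = f->pos_tag;
    }
    return 0;
}

Now a seek that lands anywhere outside the already-written range [tag, end_tag] forces a flush (and an lseek) before the next character is cached, so the cache never pads the output with bytes the caller did not write.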

5. The end (5 points)

QUESTION 5A. How should this class adapt to AI coding assistants? Any answer but no answer will receive full credit.

5 points We saw an impressive variety of answers. Some popular categories were:

  1. Release a tool like cs50.ai. Probably the most popular answer—and a good one! We’ll look into doing this for next year.
  2. Hold a section or discussion on how best to use AI for CS 61. I was surprised by this one and we’ll update the class next time around. One step: if you use ChatGPT, turn on study mode.
  3. Have more pset checkins.
  4. Make students explain their code to staff in person.
  5. Ban AIs entirely.
  6. Allow AIs without restriction.

Here are some common suggestions with which I fundamentally disagree.

  1. Refocus the problem sets on “conceptual understanding” or “theory”, because LLMs will eventually take care of all the details.—This framing goes exactly counter to my experience. I think it’s easy to convince oneself that one has “conceptual understanding” whether or not one can actually solve any concrete problem. You get better at understanding concepts by solving concrete problems.
  2. Require students to write explanations of their code.—I find LLMs even better at generating superficially plausible explanations than at generating code. The explanations may sometimes be bogus, but LLMs are text-generating machines.