Section 6: Virtual memory

The pset for this unit concerns our tiny WeensyOS operating system. Our section concerns x86-64 virtual memory and WeensyOS’s internal interfaces for dealing with virtual memory. It will be easiest to follow if you are familiar with the problem set specification.

The coding exercises in this section use the cs61-sections/kernels1 subdirectory. This version of WeensyOS uses identity mappings; all processes share the same page table (the kernel page table), but the kernel is isolated. This resembles the state of your pset 3 after step 1.

What is virtual memory?

Virtual memory is the processor feature that supports process isolation for primary memory (the part of the computer that stores data and has addresses). It allows different processes to have completely different views of memory. For instance, the data stored at address 0x100000 might be totally different in two different processes, even though the address is the same.

Virtual memory is a very powerful technique with many possible uses. When it was originally developed, isolation was not its primary purpose, but isolation is its most important use today.

Part A: Page alignment

x86-64 virtual memory is paged, meaning memory is managed in units of pages: aligned groups of 2¹² = 4096 bytes. The mapping data structure, which WeensyOS's vmiter lets you inspect and modify, is therefore called a page table. A virtual page is a page-aligned sequence of 2¹² contiguous virtual addresses, and a physical page is a page-aligned sequence of 2¹² contiguous physical addresses.

Paged virtual memory has two features:

Each virtual page maps arbitrarily to any physical page (or to a fault).
But within each page, each address maps to the physical address at the same offset within the corresponding physical page.

Mathematically, a valid x86-64 virtual memory mapping function \mathscr{P} : \textit{VA} \to \textit{PA} from virtual addresses to physical addresses must be piecewise linear on aligned pages (units of 2¹² bytes): within each page, it shifts every address by the same amount. For every virtual page number p, it will meet these requirements:

\mathscr{P}(p \ll 12) = x \ll 12 for some integer x, and
\mathscr{P}((p \ll 12) + o) = \mathscr{P}(p \ll 12) + o whenever 0\leq o <2^{12}.
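The two requirements can be checked mechanically. The standalone sketch below uses hypothetical helper names (it is not part of WeensyOS): given a few observed (va, pa) pairs, it reports whether some valid paged mapping could have produced them. It checks that each pair preserves the low 12 offset bits and that no virtual page is observed landing in two different physical pages.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

constexpr int PAGE_SHIFT = 12;                                  // pages are 2^12 bytes
constexpr uint64_t PAGE_OFFSET_MASK = (uint64_t(1) << PAGE_SHIFT) - 1;

uint64_t page_number(uint64_t addr) { return addr >> PAGE_SHIFT; }
uint64_t page_offset(uint64_t addr) { return addr & PAGE_OFFSET_MASK; }

// Hypothetical checker: could these observed (va, pa) pairs come from a
// valid x86-64 mapping function P?
bool consistent_with_paging(const std::vector<std::pair<uint64_t, uint64_t>>& obs) {
    std::map<uint64_t, uint64_t> page_map;                      // virtual page -> physical page
    for (const auto& [va, pa] : obs) {
        // P(p << 12) is page-aligned (req. 1) and adding o < 2^12 keeps o
        // as the low 12 bits (req. 2), so offsets must match.
        if (page_offset(va) != page_offset(pa)) {
            return false;
        }
        // By req. 2, every address in one virtual page lands in the same
        // physical page.
        auto [it, inserted] = page_map.emplace(page_number(va), page_number(pa));
        if (!inserted && it->second != page_number(pa)) {
            return false;
        }
    }
    return true;
}
```

For example, the pairs from A1 pass, while the A2 pair (4095, 8193) fails the offset check. The checker deliberately does not require different virtual pages to use different physical pages.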

EXERCISE A1. Assume \mathscr{P}_1 is a virtual memory mapping function having:

\mathscr{P}_1(0) = 0

\mathscr{P}_1(10) = 10

\mathscr{P}_1(\text{0x3009}) = \text{0x19009}

Can \mathscr{P}_1 be a valid x86-64 virtual memory mapping function?

Yes; it meets both requirements.

EXERCISE A2. Assume \mathscr{P}_2 is a virtual memory mapping function having:

\mathscr{P}_2(4095) = 8193

\mathscr{P}_2(4096) = 0

Can \mathscr{P}_2 be a valid x86-64 virtual memory mapping function?

No! 4095 can also be written 0xFFF, so \mathscr{P}_2(\text{0xFFF}) = \mathscr{P}_2((0 \ll 12) + \text{0xFFF}) must equal (x \ll 12) + \text{0xFFF} for some x. But it doesn’t, because 8193 = 0x2001 has lower 12 bits 0x001, not 0xFFF.

EXERCISE A3. Given this valid x86-64 virtual memory mapping function:

\mathscr{P}_3(0) = \text{0x1F000}

\mathscr{P}_3(\text{0x4019}) = \text{0x19}

\mathscr{P}_3(\text{0x1000FF}) = \text{0x500FF}

write a series of vmiter calls that add the corresponding mappings with permissions PTE_P|PTE_W to a page table pt3. (Remember that vmiter::map requires page-aligned addresses.)
vmiter(pt3, 0).map(0x1F000, PTE_P|PTE_W);
vmiter(pt3, 0x4000).map(0, PTE_P|PTE_W);
vmiter(pt3, 0x100000).map(0x50000, PTE_P|PTE_W);
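Each call above is just an observed pair with both addresses rounded down to a page boundary. Here is a small standalone sketch of that arithmetic. The helper names are hypothetical, not WeensyOS functions.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t PAGESIZE = 4096;

// Round an address down to the start of its page (clear the low 12 bits).
constexpr uint64_t round_down_page(uint64_t addr) {
    return addr & ~(PAGESIZE - 1);
}

// Hypothetical helper: given one observed translation P(va) = pa, return the
// page-aligned addresses a call like vmiter(pt, va).map(pa, perm) needs.
struct map_args {
    uint64_t va;
    uint64_t pa;
};

map_args aligned_map_args(uint64_t va, uint64_t pa) {
    // The in-page offsets must agree, or no valid mapping yields this pair.
    assert((va & (PAGESIZE - 1)) == (pa & (PAGESIZE - 1)));
    return {round_down_page(va), round_down_page(pa)};
}
```

For instance, aligned_map_args(0x4019, 0x19) yields va 0x4000 and pa 0, matching the second vmiter call.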

EXERCISE A4. Assume \mathscr{P}_4 is a virtual memory mapping function having:

\mathscr{P}_4(0) = \text{0x1F000}

\mathscr{P}_4(\text{0x1000}) = \text{0x1F000}

\mathscr{P}_4(\text{0x2000}) = \text{0x1F000}

Can \mathscr{P}_4 be a valid x86-64 virtual memory mapping function?

Sure!! There’s no requirement that virtual pages must map to different physical pages.

EXERCISE A5. Assume that x is a page-aligned object of size sz > 0. How many virtual pages does that object overlap?

\lceil \texttt{sz} / 2^{12} \rceil

EXERCISE A6. Assume that x is a page-aligned object of size sz > 0. How many physical pages does that object overlap (ignoring faults)?

This is more complicated! Given a virtual memory mapping like that in exercise A4, the answer might be 1 regardless of sz. The truest answer is some p having 1 \leq p \leq \lceil \texttt{sz} / 2^{12} \rceil.

EXERCISE A7. Assume that x is a possibly non-page-aligned object of size sz > 0. How many virtual pages does that object overlap? You may assume that x’s virtual address is va.

Either \lceil \texttt{sz} / 2^{12} \rceil or \lceil \texttt{sz} / 2^{12} \rceil + 1. Precisely, it is:

\left\lceil (\texttt{va} + \texttt{sz}) / 2^{12} \right\rceil - \left\lfloor \texttt{va} / 2^{12} \right\rfloor.
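In code, the A7 formula becomes integer arithmetic, with the ceiling written as (n + PAGESIZE - 1) / PAGESIZE. This is a standalone sketch with a hypothetical function name. For a page-aligned va it reduces to the A5 answer.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t PAGESIZE = 4096;

// Number of virtual pages overlapped by the sz-byte object at va (sz > 0):
// ceil((va + sz) / PAGESIZE) - floor(va / PAGESIZE).
uint64_t pages_overlapped(uint64_t va, uint64_t sz) {
    assert(sz > 0);
    uint64_t first_page = va / PAGESIZE;                        // floor
    uint64_t end_page = (va + sz + PAGESIZE - 1) / PAGESIZE;    // ceiling
    return end_page - first_page;
}
```

For example, a 2-byte object at va 0xFFF straddles a page boundary and overlaps 2 pages, even though \lceil 2 / 2^{12} \rceil = 1.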

Part B: Examining mappings and permissions

Update your cs61-sections repository and change into the kernels1 subdirectory. Run make run-hello to see the exciting result of the p-hello.cc program!

vmiter and ptiter are iterators in WeensyOS for processing page tables (the in-memory structures used by the x86-64 processor to implement virtual memory). vmiter answers the question “In the context of page table pt, what does the virtual address va map to in terms of physical address and permissions?” It also lets you modify mappings, thereby changing a process’s view of memory. ptiter answers the question “Which physical pages are used to represent this page table?” This lets you free a page table relatively easily.
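To make "in-memory structures" concrete: with x86-64 4-level paging, a page table is a tree of 4096-byte pages. Each page holds 512 eight-byte entries, and a virtual address is split into four 9-bit indices (one per level) plus a 12-bit page offset. vmiter walks this tree for you. The standalone sketch below only shows how an address is split. The function names and the level numbering (4 = top) are conventions chosen for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// With 4-level paging, each page-table page holds 512 eight-byte entries
// (512 * 8 = 4096 bytes), so each level consumes 9 bits of the address.

// Index into the level-`level` table for va (4 = top-level table,
// 1 = bottom-level table whose entries point at data pages).
constexpr unsigned pt_index(uint64_t va, int level) {
    return static_cast<unsigned>((va >> (12 + 9 * (level - 1))) & 0x1FF);
}

// Offset of va within its 4096-byte page: the low 12 bits.
constexpr unsigned pg_offset(uint64_t va) {
    return static_cast<unsigned>(va & 0xFFF);
}
```

For example, 0x100000 uses index 0 at levels 4, 3, and 2, index 0x100 at level 1, and offset 0.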

EXERCISE B1. Let’s use vmiter and log_printf to log information about kernel_pagetable’s mappings for the following virtual addresses:

The syscall_entry function;

The kernel_pagetable;

The p-hello process’s entry point, process_main.

First log the physical addresses mapped for these virtual addresses immediately before the call to process_setup. What are they?
We know we can use vmiter to find a physical address corresponding to a virtual address. But how can we find the virtual addresses for these entities? One way is just to ask GDB, which knows the addresses for things.
cs61-user@9d076324193b:~/cs61-sections/kernels1$ make run
* Run `gdb -x build/weensyos.gdb` to connect gdb to qemu.
...
kohler@elsewhere$ gdb -x build/weensyos.gdb
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
...blah blah blah...
(gdb) p syscall_entry
$1 = {<text variable, no debug info>} 0x40ad6 <syscall_entry()>
(gdb) p process_main
$2 = {void (void)} 0x100000 <process_main()>
(gdb) p kernel_pagetable
$3 = 0x59000 <kernel_pagetable>
(gdb) 
(You may see different addresses.) Another way is to take advantage of the symbol and assembly files created in the obj/ directory. If you’re not sure what file to check, use grep to check them all:
cs61-user@9d076324193b:~/cs61-sections/kernels1$ grep syscall_entry obj/*
Binary file obj/kernel matches
obj/kernel.asm:0000000000040ad6 <syscall_entry()>:
obj/kernel.asm:    wrmsr(MSR_IA32_LSTAR, reinterpret_cast<uint64_t>(syscall_entry));
Binary file obj/kernel.full matches
obj/kernel.sym:0000000000040ad6 T _Z13syscall_entryv
Binary file obj/k-exception.ko matches
Binary file obj/k-hardware.ko matches
However you get these addresses, you discover their mappings using vmiter::pa:
log_printf("syscall_entry pa: %p\n", vmiter(kernel_pagetable, 0x40ad6).pa());
log_printf("kernel_pagetable pa: %p\n", vmiter(kernel_pagetable, 0x59000).pa());
log_printf("process_main pa: %p\n", vmiter(kernel_pagetable, 0x100000).pa());
Unfortunately, adding log_printf calls and recompiling can move the symbols around! After recompiling, make sure you use obj/kernel.sym, or gdb, to check that the addresses haven’t changed. If they do change, one more round of editing should stabilize them.

We see (less log.txt after make run-hello):
syscall_entry pa: 0x40ad6
kernel_pagetable pa: 0x59000
process_main pa: 0x100000
As advertised, the kernel page table is identity mapped. (You may see different results for syscall_entry and kernel_pagetable, but your process_main should match exactly.)

You can avoid hard-coding constants into the kernel by referring to the actual functions and objects. The tricky bit is process_main; we extract that address by reading it from p-hello’s program image.
log_printf("syscall_entry pa: %p\n", vmiter(kernel_pagetable, (uintptr_t) syscall_entry).pa());
log_printf("kernel_pagetable pa: %p\n", vmiter(kernel_pagetable, (uintptr_t) kernel_pagetable).pa());
program_image pgm("hello"); // "hello" = name of process
log_printf("process_main pa: %p\n", vmiter(kernel_pagetable, pgm.entry()).pa());

EXERCISE B2. Now log the physical addresses immediately after the call to process_setup. Do the values change? Why or why not? Refer to specific properties of process_setup.

They don’t change. In this simple version of WeensyOS, all process pages are identity mapped. The process_setup function changes the permissions associated with specific mappings, but not the physical addresses to which they are mapped.

EXERCISE B3. Log the permissions associated with those virtual addresses immediately before the call to process_setup. What are they? What permissions do they correspond to? (Look for the PTE_ constants in x86-64.h to understand the meaning of each permission bit.)

They are all 0x3, which equals PTE_P | PTE_W: present and writable, but user-inaccessible, because PTE_U is not set.

EXERCISE B4. What about these permissions indicate that the kernel appears to implement kernel isolation (in terms of virtual memory access)?

The kernel is keeping its code and data isolated from process code by removing the PTE_U bit from most memory mappings.

EXERCISE B5. Log the permissions associated with those virtual addresses immediately after the call to process_setup. Did they change?

Indeed they did! The process_main permission has PTE_U now. Also, oddly, some more bits appear to have been spontaneously added to the permissions for syscall_entry and process_main. These bits, PTE_A and PTE_D, are set automatically by the hardware to indicate that a page of memory has been Accessed (read, written, or executed) and Dirtied (written). Some virtual memory features make use of this information.
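A small decoder makes logged values like 0x3 easier to read. This standalone sketch hard-codes the x86-64 bit positions of the five bits discussed here. Check x86-64.h for WeensyOS's actual definitions. describe_perm is a hypothetical name, meant for reading logged numbers outside the kernel.

```cpp
#include <cassert>
#include <string>

// x86-64 page table entry bits (architectural positions; WeensyOS's
// x86-64.h uses the same PTE_ names).
constexpr int PTE_P = 0x01;   // present
constexpr int PTE_W = 0x02;   // writable
constexpr int PTE_U = 0x04;   // user-accessible
constexpr int PTE_A = 0x20;   // accessed (set by hardware on any access)
constexpr int PTE_D = 0x40;   // dirty (set by hardware on a write)

// Turn a permission value, such as a logged vmiter::perm() result, into letters.
std::string describe_perm(int perm) {
    std::string s;
    if (perm & PTE_P) s += 'P';
    if (perm & PTE_W) s += 'W';
    if (perm & PTE_U) s += 'U';
    if (perm & PTE_A) s += 'A';
    if (perm & PTE_D) s += 'D';
    return s.empty() ? "-" : s;
}
```

So 0x3 decodes to "PW", and a value like 0x67 decodes to "PWUAD".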

Part C: Warped virtual memory

Run make run-bigdata. Boo!

EXERCISE C1. Add one line of virtual memory map manipulation code to process_setup so that the bigdata process prints CS 61 Is Amazing. Don’t change p-bigdata.cc or the contents of physical memory; just change memory mappings.
p-bigdata initially prints CS 61 Is Awful. But notice that the big_data buffer is page-aligned, and the string is placed exactly 4086 bytes into that buffer: the first part of the string and the last part are on different virtual pages. Marking the page boundary with |:
CS 61 Is A|wful
What if we change the process’s virtual memory mappings so that both virtual pages of the buffer map to the same physical memory? For instance, if we map virtual address 0x102000 to physical address 0x103000, then memory will look like this after running the two calls to strcpy:
(va 0x102000/pa 0x103000): mazing
...
(va 0x102ff6/pa 0x103ff6): CS 61 Is A
(va 0x103000/pa 0x103000): mazing
This looks good! When we print the bytes at virtual address 0x102ff6, we will see CS 61 Is Amazing. Remember, virtual address 0x103000 is still mapped to physical address 0x103000 because we didn’t modify that mapping!

The following line of code added to process_setup performs this mapping:
vmiter(p->pagetable, 0x102000).map(0x103000, PTE_P|PTE_W|PTE_U);
If we didn't want to rely on the fact that the page table is an identity mapping, we could look up the physical address of virtual address 0x103000 and use that for the mapping:
vmiter(p->pagetable, 0x102000).map(vmiter(p->pagetable, 0x103000).pa(), PTE_PWU);
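The following standalone sketch replays this in a toy model so you can check the byte-level reasoning. Everything in it is hypothetical (toy_vm is not a WeensyOS type). It assumes, consistent with the memory picture above, that p-bigdata first copies CS 61 Is Awful to offset 4086 and then copies mazing to the start of the buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

constexpr uintptr_t PAGESIZE = 4096;

// Hypothetical toy: a page table maps page-aligned virtual addresses to
// page-aligned physical addresses; physical memory is a flat byte array.
struct toy_vm {
    std::map<uintptr_t, uintptr_t> pagetable;
    std::vector<char> physmem = std::vector<char>(0x104000, 0);

    void map(uintptr_t va, uintptr_t pa) {
        assert(va % PAGESIZE == 0 && pa % PAGESIZE == 0);
        pagetable[va] = pa;
    }
    // Look up va's page and keep its offset (map::at throws if unmapped).
    uintptr_t translate(uintptr_t va) const {
        uintptr_t off = va % PAGESIZE;
        return pagetable.at(va - off) + off;
    }
    // Copy s, including its terminator, byte by byte through the mapping.
    void strcpy_va(uintptr_t va, const char* s) {
        do {
            physmem.at(translate(va++)) = *s;
        } while (*s++ != '\0');
    }
    std::string read_va(uintptr_t va) const {
        std::string out;
        while (char c = physmem.at(translate(va++))) {
            out += c;
        }
        return out;
    }
};

// Assumed order of p-bigdata's writes: long string first, then "mazing".
std::string run_bigdata(bool remap) {
    toy_vm vm;
    vm.map(0x102000, 0x102000);                    // identity mappings
    vm.map(0x103000, 0x103000);
    if (remap) {
        vm.map(0x102000, vm.translate(0x103000));  // the one-line change
    }
    vm.strcpy_va(0x102000 + 4086, "CS 61 Is Awful");
    vm.strcpy_va(0x102000, "mazing");
    return vm.read_va(0x102000 + 4086);
}
```

With the remapping, the second strcpy overwrites the physical bytes that held wful, so run_bigdata(true) returns CS 61 Is Amazing, while run_bigdata(false) returns CS 61 Is Awful.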

Make sure you remove your VM manipulation code before moving on to another problem.

Part D: Maze

Virtual memory can feel like you’re in a maze of twisty little passages, all alike. Here’s a game that’s a (very weak) metaphor for paged virtual memory. The point of the game is to show how different starting points—different “page tables”!—can provide very different views of the world.

Virtual memory map maze

You’ve been teleported into the weird dungeon above and you may only move according to the rules. You get to pick a starting room (a number from 1 to 9) and a path, which is a bitstring of a given length (for instance, 01). Then you walk through the maze starting from your room by taking the exits indicated by your bits in sequence:

+-----+
|  X  |  <- exit to take for bit 0
|  Y  |  <- exit to take for bit 1
+-----+

For instance, if you start in room #1 with bitstring 00, you will end up in room #8. (The 0 exit from #1 leads to #3. The 0 exit from #3 leads to #8.) There are death pits hanging off #3 and #9.

EXERCISE D1. Give a starting room from which you can reach all blue and red rooms (#6–9) in exactly two steps.

Only room #2 works. 00 leads to #6, 01 to #8, 10 to #7, and 11 to #9.

EXERCISE D2. Give a starting room from which you can reach only blue rooms (#6 and #8) and death pits in exactly two steps.

Only room #1 works. 00 leads to #8, 01 to a pit, 10 to #6, and 11 to #8.

EXERCISE D3. Give the most boring starting room, which is the room from which you can reach the minimum number of other rooms in exactly two steps.

#7: you can reach only room #7 from there.

EXERCISE D4. Which rooms cannot be reached in exactly two steps, no matter where you start?

#1, #2, #3, and #5. If you start in #4, you can reach #4 with string 01.

EXERCISE D5. Now you can change the maze by sending out drones. A drone first travels two steps according to a bitstring. When it arrives, it re-routes one of that room’s exits to point to another arbitrary room (or a death pit). (This is vaguely like modifying a page table.)

Assume you start in room #9. How many drones would it take for a drone to reach room #1 in exactly two steps? What are those drones’ instructions?

Three drones:

Traverse 01 and modify exit 0 to point to room #1. (This traverses to #7 and sets its first exit.)

Traverse 11 and modify exit 0 to point to room #7. (This traverses to #9—the starting room—and sets its first exit.)

Traverse 00. The first step reaches #7, the second step reaches #1.

Your drones might have different instructions, but it always takes at least three drones.

Part E: Using iterators

In the problem set, you will implement fork and exit system calls. At the center of these system calls are operations on virtual memory that you’ll use vmiter and ptiter to implement. In section we investigate related problems for practice.

EXERCISE E1. Use vmiter to implement the following function.

void copy_mappings(x86_64_pagetable* dst, x86_64_pagetable* src) {
    // Copy all virtual memory mappings from `src` into `dst`
    // for addresses in the range [0, MEMSIZE_VIRTUAL).
    // You may assume that `dst` starts out empty (has no mappings),
    // and that all calls to `vmiter::map` succeed.

    // After this function completes, for any va with
    // 0 <= va < MEMSIZE_VIRTUAL, dst and src should map that va
    // to the same pa with the same permissions.

    ... 
}

The key here is to use two vmiter objects with synchronized addresses, and to copy the physical addresses and permissions from one vmiter into the other.

void copy_mappings(x86_64_pagetable* dst, x86_64_pagetable* src) {
    vmiter srcit(src, 0);
    vmiter dstit(dst, 0);
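    for (; srcit.va() < MEMSIZE_VIRTUAL; srcit += PAGESIZE, dstit += PAGESIZE) {
        // Copy only present mappings; `dst` starts empty, so every other
        // address is already unmapped there, just as in `src`.
        if (srcit.present()) {
            dstit.map(srcit.pa(), srcit.perm());
        }
    }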
}

The fork system call starts up a new process, called the child process, that is essentially a copy of the parent process (the process that called fork). The child process has a copy of the parent process’s memory, as well as its registers and other state. Changes to the child process’s memory should not be visible in the parent’s memory and vice versa.

EXERCISE E2. Does copy_mappings suffice to implement the memory copying required by fork? Why or why not?

It isn’t suitable on its own, because it copies mappings, not memory. When copy_mappings completes, the destination page table dst maps the same physical memory as the source page table src, rather than a copy of the source’s visible memory. A full fork solution will have some features of copy_mappings as well as other code to handle copying memory data.
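The difference between copying mappings and copying memory shows up in a tiny standalone model. This sketch is hypothetical and not WeensyOS code: each "physical page" holds a single int, and permissions are left out. After a mapping-only copy, a write through one table is visible through the other. After a fork-style copy that allocates fresh pages, it is not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical toy page table: virtual page number -> physical page index.
using toy_pagetable = std::map<uintptr_t, size_t>;

struct toy_memory {
    std::vector<int> pages;   // one int of data per physical page
    // Allocate a new physical page holding `value`; return its index.
    size_t alloc(int value) {
        pages.push_back(value);
        return pages.size() - 1;
    }
};

// Like copy_mappings: dst ends up sharing src's physical pages.
void toy_copy_mappings(toy_pagetable& dst, const toy_pagetable& src) {
    for (const auto& [vpn, ppn] : src) {
        dst[vpn] = ppn;
    }
}

// Fork-style copy: give dst a fresh physical page per mapping and copy the
// data, so later writes through one table are invisible to the other.
void toy_copy_memory(toy_pagetable& dst, const toy_pagetable& src, toy_memory& mem) {
    for (const auto& [vpn, ppn] : src) {
        dst[vpn] = mem.alloc(mem.pages[ppn]);
    }
}
```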

EXERCISE E3. Use vmiter and ptiter to implement the following function.

void free_everything(x86_64_pagetable* pt) {
    // Free the following pages by passing their kernel pointers
    // to `kfree(void*)`:
    // 1. All memory accessible via unprivileged mappings in `pt` from virtual
    //    addresses in [0,MEMSIZE_VIRTUAL).
    // 2. All page table pages that are part of `pt`.
    ...
}

void free_everything(x86_64_pagetable* pt) {
    for (vmiter it(pt, 0); it.va() < MEMSIZE_VIRTUAL; it += PAGESIZE) {
        if (it.user()) {
            kfree(it.kptr());
        }
    }
    for (ptiter it(pt); !it.done(); it.next()) {
        kfree(it.kptr());
    }
    kfree(pt);
}

The exit system call allows a process to stop executing. All memory belonging to or representing that process must be returned to the kernel for reuse. This will include some memory that is accessible only to the kernel, such as memory storing the process’s page table.

EXERCISE E4. Does free_everything suffice to implement the memory freeing required by exit? Why or why not?

Yes, with one exception: It’s most likely a mistake to free the user-visible page at 0xB8000, because that is the console—the shared memory representing the screen—and it was never allocated.
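The E4 exception, together with the observation from A4 that several virtual pages may share one physical page, suggests filtering user pages before calling kfree. This hypothetical standalone sketch works on a plain list of mappings rather than a real page table. TOY_CONSOLE_PA stands in for the console address 0xB8000. The duplicate check is an extra precaution against double frees, not something the exercise requires.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical toy: one entry per present mapping in a process's page table.
struct toy_mapping {
    uintptr_t va;
    uintptr_t pa;
    bool user;   // like vmiter::user(): the mapping is user-accessible
};

constexpr uintptr_t TOY_CONSOLE_PA = 0xB8000;   // shared console page, never allocated

// Physical pages an exit-time cleanup would pass to kfree: every
// user-accessible page except the console, each listed at most once.
std::vector<uintptr_t> pages_to_free(const std::vector<toy_mapping>& mappings) {
    std::set<uintptr_t> seen;
    std::vector<uintptr_t> result;
    for (const auto& m : mappings) {
        if (m.user && m.pa != TOY_CONSOLE_PA && seen.insert(m.pa).second) {
            result.push_back(m.pa);
        }
    }
    return result;
}
```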

Part F: Spawn

The remaining three parts won’t be covered in section by default. They are optional practice to prepare for the problem set or exams. We suggest groups of students get together and work through them on their own.

In the first part, we start a new process in WeensyOS. Starting a new process requires a couple of things: choosing a struct proc member of the ptable array, initializing memory, initializing registers, and marking the process as runnable. The fork system call in pset 3 step 5 starts a process by copying an already-running process, but other interfaces are possible. The spawn interface, for example, starts a new process from scratch.

EXERCISE F1. The p-spawn.cc process tries to start another process—a copy of the p-alice program—by calling sys_spawn("alice"). Use make run-spawn (or make run-console-spawn, etc.) to run this program.

The system call is not working. How can you tell?

Complete the system call implementation so that p-spawn.cc successfully starts a new process.

Modify p-spawn.cc so that it tries to start two copies of alice. What happens and why? How could you fix this? Refer to the steps of pset 3.

Does your implementation of syscall_spawn check its arguments for validity? If not, how would you change it to do this?

Part G: Confused deputy

A confused deputy attack occurs when a low-privilege attacker convinces a privileged “deputy” to complete an attack on its behalf. In the context of operating systems, a process is unprivileged, while the kernel has full privilege. However, a system call asks the kernel to perform an operation on behalf of a process, making the kernel act as a privileged deputy. A confused deputy attack occurs if the process, by invoking system calls, can somehow convince the kernel to perform a function that violates process isolation.

EXERCISE G1. p-eve.cc and p-alice.cc may be familiar from class. Use make run-friends to run Alice and Eve together.

It is possible to change one argument of one system call in p-eve.cc to execute a successful confused deputy attack that breaks the kernel or otherwise prevents Alice from running. What is the system call? What is the confused deputy attack?

Complete the attack.

Modify the kernel’s system call checking to prevent the attack. The kernel might kill the Eve process upon detecting the attack, or (better perhaps) return -1 from the relevant system call.
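If the attack passes the kernel an address the process should not control, a common shape for the fix is for the kernel to confirm that the whole user-supplied address range lies in memory the process may access before acting on it. The sketch below is a standalone, hypothetical model of such a check: a std::map stands in for the page table, and the function name is ours. In WeensyOS, the equivalent lookup would use vmiter on the process's page table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

constexpr uintptr_t PAGESIZE = 4096;
constexpr int PTE_P = 0x1, PTE_W = 0x2, PTE_U = 0x4;   // x86-64 PTE bits

// Hypothetical toy page table: page-aligned virtual address -> permissions.
using toy_perms = std::map<uintptr_t, int>;

// Returns true if every byte of [va, va + len) lies on a page mapped with
// all of the `required` permission bits.
bool user_range_ok(const toy_perms& pt, uintptr_t va, size_t len, int required) {
    if (len == 0) {
        return true;
    }
    uintptr_t last_byte = va + (len - 1);
    if (last_byte < va) {
        return false;                   // range wraps around the address space
    }
    for (uintptr_t pn = va / PAGESIZE; pn <= last_byte / PAGESIZE; ++pn) {
        auto it = pt.find(pn * PAGESIZE);
        if (it == pt.end() || (it->second & required) != required) {
            return false;               // unmapped, or missing a permission
        }
    }
    return true;
}
```

A system call would typically pass PTE_P | PTE_U, plus PTE_W if the kernel will write into the range, and return -1 when the check fails.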

Part H: Efficient communication

The central theme of Unit 3 is isolation and security, but kernels also can offer design features that transform overall system performance.

The p-pipewriter.cc and p-pipereader.cc programs implement a simple form of communication, mediated by the kernel. The pipewriter program picks a random message and writes it to a shared “pipe,” which is just a memory buffer located in the kernel. The pipereader program reads this message by reading from the pipe.

EXERCISE H1. Use make run-pipe to run the pipewriter and pipereader together.

Read the explanations for the sys_piperead and sys_pipewrite system calls in u-lib.hh. What do these system calls remind you of? How do their parameters differ from the parameters to analogous functions in Unix/Linux?

Find the implementations of SYSCALL_PIPEWRITE and SYSCALL_PIPEREAD in kernel.cc. What is the maximum number of bytes these system calls can read or write at a time? How might this be a performance problem?

How many system calls do the writer and reader make per message? Is there a discrepancy? Look at their code; can you explain the discrepancy?

EXERCISE H2. Change the kernel, and only the kernel, to reduce the number of system calls required to transfer a message to the minimum, which is fewer than five. Here are some techniques you can use:

Scheduling. When process P is unlikely to be doing useful work, it often makes sense to run a process other than P. The handout kernel does not follow this rule of thumb.

Batching/buffering. It is more efficient to transfer more than one byte at a time.

Blocking. When process P cannot make progress until some state changes, it can be more efficient overall to put process P to sleep. This is called blocking process P. For instance, perhaps the sys_piperead system call might block the calling process until there is at least one byte available to read.
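Batching and buffering are easiest to see in a standalone model. The sketch below is a hypothetical ring buffer, not the handout kernel's pipe. Each write or read call moves as many bytes as currently fit or are available, so a whole short message can cross in one call each way. The 16-byte capacity is an arbitrary choice for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical toy pipe: a fixed-size ring buffer with multi-byte transfers.
class toy_pipe {
public:
    static constexpr size_t capacity = 16;

    // Copies up to n bytes from src; returns how many were written.
    size_t write(const char* src, size_t n) {
        size_t count = std::min(n, capacity - len_);
        for (size_t i = 0; i != count; ++i) {
            buf_[(head_ + len_ + i) % capacity] = src[i];
        }
        len_ += count;
        return count;
    }

    // Copies up to n bytes into dst; returns how many were read.
    size_t read(char* dst, size_t n) {
        size_t count = std::min(n, len_);
        for (size_t i = 0; i != count; ++i) {
            dst[i] = buf_[(head_ + i) % capacity];
        }
        head_ = (head_ + count) % capacity;
        len_ -= count;
        return count;
    }

private:
    char buf_[capacity] = {};
    size_t head_ = 0;   // index of the oldest unread byte
    size_t len_ = 0;    // number of unread bytes
};
```

A partial return (fewer bytes than requested) is where blocking would come in: rather than returning 0 on an empty pipe, a kernel could put the reader to sleep until a writer adds data.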