Problem set 3

In this assignment, you will add new features to a tiny operating system! In particular, you will implement process memory isolation, virtual memory, and some system calls.

Due Thu 10/17 11:59pm EDT (one day later for DCE)
Please mark your final submission commit with “Grade this commit” on the grading server.

You may want to read Chapter 9 of the textbook. Specifically, the 64-bit x86 virtual memory architecture is described in Section 9.7; the PTE_P, PTE_W, and PTE_U bits are shown in Figure 9.23 and discussed in Section 9.7.1.

Get the code

Get our code with

$ git pull; git pull handout main

or, alternately

$ git pull; git pull https://github.com/cs61/cs61-f24-psets.git main

This will merge our Problem Set 3 code with your previous work. If you have any “conflicts” from prior problem sets, resolve them before continuing further. Run git push to save your work back to your personal repository.

You may also create a new cs61-psets repository for this assignment. Don’t forget to enter your repository URL on the grading server.

Running WeensyOS

To launch WeensyOS, open a Docker container with ./cs61-run-docker, then cd pset3; make run. You should see something like this, which shows four related p-allocator processes running in parallel:

Initial WeensyOS state

The image above loops forever; in an actual run on your real computer, the bars will move to the right and stay there. Don’t worry if your image has different numbers of K’s or otherwise has different details. (If your bars run painfully slowly, edit the p-allocator.cc file and reduce the ALLOC_SLOWDOWN constant.)

The WeensyOS documentation has more information on how to build WeensyOS and how to run it in different modes, including modes that output more debugging information. Some highlights:

It is much easier to work on this pset in Docker. The pset will not build on native Mac OS X or Windows.
We do not recommend storing your pset files in a directory that is automatically backed up to the cloud (Microsoft OneDrive, Apple iCloud). Automatic backups are likely to disrupt the functioning of the QEMU emulator that runs WeensyOS.
Only one QEMU will run at a time. If make run reports an error like Failed to get "write" lock, this means that some other tab is running the QEMU emulator in the same directory. Try make stop to kill that other tab and then make run again.

Initial state

Let’s take a look at the code in p-allocator.cc and understand what it does, and why it results in the above display.

WeensyOS displays the current state of physical and virtual memory. Each character represents 4 KiB of memory (i.e., a single x86-64 page). There are 2 MiB of physical memory in total. (How many pages is this?)
WeensyOS runs four processes, 1 through 4. Each process runs a different program. The four programs are compiled from the same source code (p-allocator.cc), but for each program the compiler is told to use a different region of memory for its text and data segments.
Each process asks the kernel for more heap space, one page at a time, until it runs out of room. Each process’s heap begins just above its code and global data, and ends just below its stack. The processes allocate space at different rates: compared to Process 1, Process 2 allocates space twice as quickly, Process 3 goes three times faster, and Process 4 goes four times faster. (A random number generator is used, so the exact rates may vary.) The marching rows of numbers show how quickly the heap spaces for processes 1, 2, 3, and 4 are allocated.

Here are two labeled memory diagrams, showing what the characters mean and how memory is arranged.

Physical memory map 1

Physical memory map 2

The virtual memory display is similar.

The virtual memory display cycles between the four processes' address spaces. However, all the address spaces are the same for now. In other words, in the handout version of WeensyOS, all processes live in the same address space-there is no memory isolation between processes! Furthermore, the kernel shares that address space too, meaning that there is no memory isolation between the kernel and user-mode processes!
Blank spaces in the virtual memory display correspond to unmapped addresses. If a process (or the kernel) tries to access such an address, the processor will page fault.
The character shown at address X in the virtual memory display identifies the owner of the corresponding physical page.
In the virtual memory display, a character is reverse video (i.e., black foreground and colored background) if an application process is allowed to access the corresponding address. Initially, any process can modify all of physical memory, including the kernel. This means that memory is not properly isolated!

Goal

You will implement complete and correct memory isolation for WeensyOS processes. Then you'll implement full virtual memory, which will improve utilization of physical RAM. You'll also implement fork—creating new processes at runtime—and exit—destroying processes at runtime.

We need to provide a lot of support code for this assignment, but the code you write will all go in kernel.cc.

There is no make check functionality for this pset. Instead, you should run your instance of WeensyOS and visually compare it to the images in the pset description.

Notes

Memory system layout

WeensyOS memory system layout is described by several constants.


`KERNEL_START_ADDR`	Start of kernel code.
`KERNEL_STACK_TOP`	Top of kernel stack. The kernel stack is one page long.
`CONSOLE_ADDR`	CGA console memory.
`PROC_START_ADDR`	Start of application code. Applications should not be able to access memory below `PROC_START_ADDR`, except for the single page at `console`.
`MEMSIZE_PHYSICAL`	Size of physical memory in bytes. WeensyOS does not support physical addresses ≥ `MEMSIZE_PHYSICAL`. Equals 0x200000 (2MiB).
`MEMSIZE_VIRTUAL`	Size of virtual memory. WeensyOS does not support virtual addresses ≥ `MEMSIZE_VIRTUAL`. Equals 0x300000 (3MiB).
`PAGESIZE`	Size of a memory page. Equals 4096 (or, equivalently, `1 << 12`).
`PAGEOFFMASK`	Mask for the offset portion of an address. Equals 4095 (`PAGESIZE - 1`). If `(a & PAGEOFFMASK) == 0`, then `a` is page-aligned.

Kernel and process address spaces

WeensyOS begins with the kernel and all processes sharing a single address space. This is defined by the kernel_pagetable page table. kernel_pagetable is initialized to the identity mapping: virtual address X maps to physical address X.

As you work through the pset, you will shift processes to use independent address spaces, where each process can access only a subset of physical memory.

The kernel, though, still needs the ability to access all of physical memory. Therefore, all kernel functions run using the kernel_pagetable page table. Thus, in kernel functions, each virtual address maps to the physical address with the same number. The exception_entry and syscall_entry assembly codes explicitly install kernel_pagetable when they begin, and exception_return and the syscall return path install the process’s page table as they exit.

Each process page table must contain kernel mappings for the kernel stack and for the exception_entry and syscall_entry code paths. [Why? What would the CPU do if the kernel stack, the exception_entry code, and the syscall_entry code were unmapped when a system call or exception occurred?]

Debugging WeensyOS

There are several ways to debug WeensyOS. We recommend:

Add log_printf statements to your code to print log messages to log.txt. However, log_printf will dramatically slow down your operating system. Delete log_printfs you don’t need to keep things snappy.
Use assertions to catch problems early (for instance, call check_page_table to test a page table for obvious issues, or add your own).
Printouts such as assertions and fault reports include the virtual address of the faulting instruction, but they do not always include symbol information, and they never contain line number information. Use files obj/kernel.asm (for the kernel) and obj/p-PROCESSNAME.asm (for processes) to map instruction addresses to instructions.
For example, here’s a backtrace for a kernel bug:
```
PANIC: Kernel page fault on 0x256000 (write missing page, rip=0x413ad)!
  #0  0x413ad  <_Z7syscallP8regstate>
  #1  0x40b28  <_Z13syscall_entryv>
```
This backtrace says that the fault happened at %rip 0x413ad, which is an instruction in the syscall() function. It also says that syscall() was called by syscall_entry(). To locate the specific instruction at issue, we check obj/kernel.asm for 413ad, and find:
```
        current[10000].regs.reg_rax = -1;
   413ad: 48 c7 80 10 bd 1f 00  movq   $0xffffffffffffffff,0x1fbd10(%rax)
```
Oops, I guess current[10000] is out of range!

A bug in process code might produce a message like this in log.txt; unfortunately, process backtraces do not have symbols:
```
Process 1 page fault on 0xefed937b23 (write missing page, rip=0x100015)!
  #0  0x100015
  #1  0x10011c
```
We can check the .asm file for the relevant process (here, obj/p-allocator.asm) to decode the instructions and their containing functions, and maybe get insight into the bug:
```
0000000000100000 <cause_trouble()>:
...
void cause_trouble() {
  100000: f3 0f 1e fa           endbr64
    heap_top[1030481984291] = 61;
  100004: 48 b8 23 3b 83 ed ef  movabs $0xefed833b23,%rax
  10000b: 00 00 00 
  10000e: 48 03 05 f3 1f 00 00  add    0x1ff3(%rip),%rax        # 102008 <heap_top>
  100015: c6 00 3d              movb   $0x3d,(%rax)

0000000000100019 <process_main()>:
...
            cause_trouble();
  100117: e8 e4 fe ff ff        call   100000 <cause_trouble()>
  10011c: eb 90                 jmp    1000ae <process_main()+0x95>
```
Oops, I guess heap_top[1030481984291] is out of range!
Sometimes a mistake will cause the OS to crash hard and reboot. Use make D=1 run to get additional, painfully-verbose debugging output. Search through qemu.log for check_exception lines to see where the exceptions occur.

A powerful, yet simple, technique for debugging extreme crashes is to narrow down where they occur using infinite loops. Add an infinite loop (while (true) {}) to your kernel. If the resulting kernel crashes, then the infinite loop is after the crash point; if it infinite loops, then the infinite loop is before the crash point. When a kernel infinite-loops on Docker, you must open another terminal to the same Docker instance and make stop to kill it.

Understanding memory errors

You may want to skip this section until you have completed the first few parts of the pset.

The WeensyOS memory viewer, which is defined in k-memviewer.cc, checks your memory management data structures and reports any problems it sees. The memusage::symbol_at() function chooses which symbol to display for each page and detects some errors. Here’s what some of those symbols and error messages mean.

L page: An L page has been leaked. This means that the page has been marked as used (it has a reference count greater than 0), but is not referenced by any process page table. This usually means that you’re missing a try_map call, you called kalloc without freeing or mapping the result, or you didn’t free your pages properly in phase 7.
PAGE TABLE ERROR: nullptr physical page mapped for user: A process pagetable has mapped the physical page at address 0. This page is reserved and mapping it is illegal. Not checking the return value of kalloc is a common cause of this error.
PAGE TABLE ERROR: reserved physical page mapped for user: Some other physical pages are reserved for special purposes, such as internal machine software and device memory. Processes aren’t allowed to map these pages.
PAGE TABLE ERROR: kernel data page mapped for user: Processes also aren’t allowed to map physical pages that control kernel instructions or data (that would violate kernel isolation).
PAGE TABLE ERROR: freed page mapped for user: A process has an active mapping for a physical page that’s currently marked as free (has refcount 0). This is invalid and dangerous, since the mapping grants access to memory that the kernel could use again at any moment.
PAGE TABLE ERROR: invalid reference count: A physical page has a refcount that’s unexpectedly large. In WeensyOS, page reference counts are typically limited to the range [0, PID_MAX], where PID_MAX is the number of processes in the system; but if your code decrements page refcounts too little or too much, the counter can easily get out of whack and overflow.
Unwanted S page: In phase 8, you should start seeing S pages, but only for pages that can be safely shared—namely, program instructions and read-only data. If you see lots of S pages, this indicates that multiple processes are sharing the same writable memory, which is unsafe. (Copy-on-write extra-credit will show a lot of S pages.)

The WeensyOS exception handler will also print messages on page faults, which indicate that process memory accesses went wrong.

PAGE FAULT on <PTR> (pid N, read protection problem, rip=<RIP>: A process is trying to read a virtual address that has a kernel-only mapping (not PTE_P|PTE_U). This likely means you haven’t mapped a page with the correct PTE_PWU permissions. Other messages include:
- write protection problem: The process tried to write a page that is present, but unwritable (not PTE_P|PTE_W|PTE_U).
- read missing page: The process tried to read a page that is not present.
- write missing page: The process tried to write a page that is not present.
<PTR> gives the address whose access failed, and <RIP> is the address of the instruction that tried to access that address.

Phase 1: Kernel isolation

WeensyOS processes could stomp all over the kernel’s memory if they wanted to. Better stop that! Change kernel_start, the kernel initialization function, so that kernel memory is inaccessible to applications—except for the memory holding the CGA console (the single page at CONSOLE_ADDR == 0xB8000).

When you are done, WeensyOS should look like this. In the virtual map, kernel memory is no longer reverse-video, since the user can’t access it. Note the lonely CGA console memory block.

Kernel isolation

Use vmiter to create memory mappings. Start from the vmiter loop in the kernel_start function.

About virtual memory iterators (`vmiter`)

The vmiter class examines and modifies x86-64 page tables, especially their virtual-to-physical address mappings.

Examining page tables

vmiter(pt, va) creates a vmiter object examining page table pt and with current virtual address va.

The va() method returns the iterator’s current virtual address.

The pa() method returns the physical address mapped at the current virtual address:

x86_64_pagetable* pt = ...;
uintptr_t pa = vmiter(pt, va).pa();    // returns uintptr_t(-1) if unmapped

The kptr() method returns a kernel pointer corresponding to the physical address. Use this method if you need to examine the physical memory located at va() in pt, or you need to pass that memory to kfree. kptr() returns nullptr if the virtual address is unmapped. Since the kernel’s page table is identity-mapped, the address value of kptr() is the same as the numeric physical address pa().

The perm() method returns the permissions of the current mapping.

if ((vmiter(pt, va).perm() & PTE_W) != 0) {
    // then `va` is present and writable in `pt`
}

perm() returns a bitwise-or of flags, possibly including PTE_P (numeric value 1), PTE_W (numeric value 2), and PTE_U (numeric value 4). PTE_P marks Present pages (pages that are mapped). PTE_W marks Writable pages. PTE_U marks User-accessible pages—pages accessible to unprivileged processes. Kernel memory should be mapped with permissions PTE_P|PTE_W, which allows the kernel to read or write the memory, but prevents all access by processes.

(Note that perm()’s return value may include other bits than PTE_P, PTE_W, and PTE_U, because the processor automatically modifies some bits as it runs. There are convenient shorthands—present(), writable(), and user()—for PTE_P, PTE_W, and PTE_U.)

Traversing page tables

The find(va) method changes the iterator’s current virtual address to va. You can also say it += DELTA, which increments the current virtual address by DELTA.

It is common to use vmiter in loops. This loop prints all present mappings in the lower 64KiB of memory, moving one page at a time:

for (vmiter it(pt, 0); it.va() < 0x10000; it += PAGESIZE) {
    if (it.present()) {
        log_printf("%p maps to %p with permissions %x\n",
                   it.va(), it.pa(), it.perm());
    }
}

The range_size() method returns the number of bytes left in the current page starting at va(). This can be useful when copying data into or out of a user buffer, because all bytes up to range_size() can be accessed without another kptr() lookup or vmiter permission check. Typically range_size() is greater than 0 and less than or equal to PAGESIZE.

Modifying page tables

The vmiter::try_map function modifies mappings in a page table. This code maps physical page 0x3000 at virtual address 0x2000, with permissions P, W, and U:

int r = vmiter(pt, 0x2000).try_map(0x3000, PTE_P | PTE_W | PTE_U);
if (r < 0) {
    // there was an error; mappings remain unchanged
}

vmiter::try_map returns -1 if it cannot add a mapping (this happens if it fails to allocate memory for a page table page).

You can also use try_map to change a mapping’s permissions; this adds PTE_W to the permissions for virtual address va:

vmiter it(pt, va);
assert(it.present());
int r = it.try_map(it.pa(), it.perm() | PTE_W);

Interface summary

// `vmiter` walks over virtual address mappings.
// `pa()` and `perm()` read current addresses and permissions;
// `map()` and `try_map()` modify mappings.

class vmiter {
  public:
    // Initialize a `vmiter` for `pt`, with initial virtual address `va`
    inline vmiter(x86_64_pagetable* pt, uintptr_t va);
    inline vmiter(const proc* p, uintptr_t va);

    // Return page table
    inline x86_64_pagetable* pagetable() const;

    // ADDRESS QUERIES
    // Return current virtual address
    inline uintptr_t va() const;
    // Return one past last virtual address in this mapping range
    inline uintptr_t last_va() const;
    // Return the number of bytes left in this mapping range
    // (If present() and N < range_size(), (*this + N).pa() == this.pa() + N.)
    inline uintptr_t range_size() const;
    // Return true iff iteration has completed (reached last va)
    inline bool done() const;
    // Return physical address mapped at `this->va()`,
    // or `(uintptr_t) -1` if `this->va()` is unmapped
    inline uint64_t pa() const;
    // Return a kernel-accessible pointer corresponding to `this->pa()`,
    // or `nullptr` if `this->va()` is unmapped
    template <typename T = void*>
    inline T kptr() const;

    // PERMISSIONS
    // Return permissions at `this->va()` (or 0 if `PTE_P` is not set)
    inline uint64_t perm() const;
    // Return true iff `this->va()` is present (`PTE_P`)
    inline bool present() const;
    // Return true iff `this->va()` is present and writable (`PTE_P|PTE_W`)
    inline bool writable() const;
    // Return true iff `this->va()` is present and unprivileged (`PTE_P|PTE_U`)
    inline bool user() const;

    // ADVANCED PERMISSIONS
    // Return true iff `(this->perm() & desired_perm) == desired_perm`
    inline bool perm(uint64_t desired_perm) const;
    // Return intersection of permissions in [this->va(), this->va() + sz)
    // (or uint64_t(-1) if `sz == 0`)
    uint64_t range_perm(size_t sz) const;
    // Return true iff `(range_perm(sz) & desired_perm) == desired_perm`
    inline bool range_perm(size_t sz, uint64_t desired_perm) const;

    // TRAVERSAL
    // Move to virtual address `va`; return `*this`
    inline vmiter& find(uintptr_t va);
    // Advance to virtual address `va() + delta`; return `*this`
    inline vmiter& operator+=(intptr_t delta);
    // Advance to virtual address `va() + 1`; return `*this`
    inline vmiter& operator++();
    inline void operator++(int);
    // Advance to virtual address `va() - delta`; return `*this`
    inline vmiter& operator-=(intptr_t delta);
    // Advance to virtual address `va() - 1`; return `*this`
    inline vmiter& operator--();
    inline void operator--(int);
    // Return new vmiter pointing at a nearby address
    inline vmiter operator+(intptr_t delta) const;
    inline vmiter operator-(intptr_t delta) const;
    // Move to next larger page-aligned virtual address, skipping large
    // non-present regions
    void next();
    // Move to `last_va()`
    void next_range();

    // MAPPING MODIFICATION
    // Change the mapping in `this->pagetable()` for `this->va()` (the
    // current virtual address) to `pa` with permissions `perm`.
    // `this->va()` must be page-aligned. Can call `kalloc` to allocate
    // page table pages. Panics if `kalloc` fails (returns `nullptr`).
    inline void map(uintptr_t pa, int perm);
    // Same, but map a kernel pointer
    inline void map(void* kptr, int perm);
    inline void map(volatile void* kptr, int perm);

    // Change the mapping in `this->pagetable()` for `this->va()` (the
    // current virtual address) to `pa` with permissions `perm`.
    // `this->va()` must be page-aligned. Can call `kalloc` to allocate
    // page table pages. On success, changes the mapping and returns 0.
    // If `kalloc` fails, returns a negative error code without modifying
    // any mappings.
    [[gnu::warn_unused_result]] int try_map(uintptr_t pa, int perm);
    [[gnu::warn_unused_result]] inline int try_map(void* kptr, int perm);
    [[gnu::warn_unused_result]] inline int try_map(volatile void* kptr, int perm);
};

In addition, make sure that your sys_page_alloc system call preserves kernel isolation: Applications shouldn’t be able to use sys_page_alloc to screw up the kernel. This requires changes to the SYSCALL_PAGE_ALLOC case in syscall. Read the description of sys_page_alloc in u-lib.hh to get a feeling for the possible errors.

Phase 2: Isolated address spaces

Implement process isolation by giving each process its own independent page table. Your OS should look like this:

Per-process isolation

Each process only has permission to access its own pages, which you can tell because only its own pages are shown in reverse video.

How to implement per-process page tables in process_setup:

Allocate a new, initially-empty page table for the process by calling kalloc_pagetable.
Copy the mappings from kernel_pagetable into this new page table using vmiter::try_map. This ensures that the required kernel mappings are present in the new page table. All user page tables should match the kernel_pagetable up to PROC_START_ADDR, when they start differing from kernel_pagetable and from each other. You can do this using a loop with two vmiters, or you can set the mappings yourself (they are identity mappings).

Note: vmiter::try_map will allocate page table pages as needed. All calls to try_map in process_setup are guaranteed to succeed. (That won’t be true for later parts of the pset.) In process_setup, feel free to assert that the return value from try_map is 0 (or call vmiter::map instead).
Then you will need to make sure that any page that belongs to the process is mapped as user-accessible. These are the pages the process needs to access, including its code, data, stack, and heap segments. There are several places you’ll need to change.

Note the diagram now has four pages for each process in the kernel area, starting at 0x1000. These are the four-level page tables for each process. (The colored background indicates that these pages contain kernel-private page table data, even though the pages “belong” to the process.) The first page was allocated explicitly in process_setup; the other pages were allocated by vmiter::try_map as the page table was initialized.

One common solution, shown above, leaves addresses above PROC_START_ADDR totally unmapped by default, but other designs work too. As long as a virtual address mapping has no PTE_U bit, its process isolation properties are unchanged. For instance, this solution, in which all mappings are present but accessible only to the kernel, also implements process isolation correctly:

Per-process isolation, alternate

If you create an incorrect page table, WeensyOS might crazily reboot. Don’t panic; see the debugging hints above.

About program images and segments (pgm and seg)
A program image, or executable, is the data file generated by the compiler and linker that defines a process’s initial state. It’s essentially a recipe for constructing a process. It describes the process’s initial virtual address space, defines the instruction and data bytes that initialize that address space, and specifies which instruction is the first instruction to execute. Operating systems use program images to initialize new processes.

A segment is a region of process memory. You’ve heard this term before: Unix processes have text, data, heap, and stack segments. Most program image data consists of information about the segments that must be loaded into processes’ initial memory state. (The operating system decides on the size and location of the stack segment itself, and the heap segment starts out empty.) You can print about a program image’s segment definitions using objdump -p.

The WeensyOS kernel includes, as part of its kernel data, program images for all the processes it knows how to run—p-allocator, p-allocator2, p-fork, etc. The program_image and program_image_segment data types allow you to query these images. Their interfaces are as follows:
struct program_image { ...
    // Return an iterator to the beginning loadable segment in the image.
    program_image_segment begin() const;
    // Return an iterator to the end loadable segment in the image.
    program_image_segment end() const;

    // Return the user virtual address of the entry point instruction.
    uintptr_t entry() const;
};

struct program_image_segment { ...
    // Return the user virtual address where this segment should be loaded.
    uintptr_t va() const;
    // Return the size of the segment, including zero-initialized space.
    size_t size() const;
    // Return a pointer to the kernel’s copy of the initial segment data.
    const char* data() const;
    // Return the number of bytes of initial segment data.
    size_t data_size() const;
    // It is always true that `data_size() <= size()`. If `data_size() < size()`,
    // the remaining bytes must be initialized to zero.

    // Return true iff the segment is writable.
    bool writable() const;
};
For instance, this loop prints information about a program image.
void log_program_info(const char* program_name) {
    program_image pgm(program_name);
    log_printf("program %s: entry point %p\n", program_name, pgm.entry());
    size_t n = 0;
    for (auto seg = pgm.begin(); seg != pgm.end(); ++seg, ++n) {
        log_printf("  segment %zu: addr %p, size %lu, data_size %lu, %s\n",
                   n, seg.va(), seg.size(), seg.data_size(),
                   seg.writable() ? "writable" : "not writable");
        if (seg.data_size() > 0) {
            log_printf("    first data byte: %u\n", (unsigned char) *seg.data());
        }
    }
}
The relationship between seg.va() and seg.data() is worth emphasizing. seg.va() is a user virtual address. It is the process virtual address where the process code expects the segment to be loaded. seg.data(), on the other hand, is a kernel pointer. It points into the program image data in the kernel. WeensyOS seg.va() values are all >= PROC_START_ADDR (0x100000), and are typically page-aligned, whereas seg.data() values are between 0x40000 and 0x80000 and are not aligned.

Phase 3: General process address spaces

So far, WeensyOS processes use identity mappings for process memory: a process code, data, stack, or heap page with virtual address X is stored in the physical page with physical address X. This is inflexible and limits utilization. Processes don’t have access to the address mapping (only the kernel does), so it should be fine for a process’s virtual address X to map to a different physical page—the process won’t be able to tell the difference. This will also enable new functionality, like running different processes with similar virtual address spaces.

Change your operating system to allocate all process data, including its code, globals, stack, and heap, using kalloc instead of direct access to the physpages array. This will turn the process page tables from subsets of an identity mapping to a more general mapping, where lots of pages have different virtual and physical addresses.

Here’s how your OS should look after this phase.

Virtual page allocation

This will complicate the code that initializes process code in process_setup. You’ll need to figure out why (hint: which page table is being used in process_setup?) and find a way around it (hint: vmiter or set_pagetable).

Phase 4: Nonsequential physical page allocation

In the handout code, kalloc always chooses the available physical page that has the lowest address. This means consecutive kalloc calls typically return consecutive physical pages. But your code should not depend on this behavior.

Change kalloc to allocate pages nonsequentially. This can be as simple as setting page_increment to 3 (or any larger odd number). The virtual address spaces should work as before, though the physical memory map will look different:

Virtual page allocation

But if you see a panic—like, maybe,

PAGE FAULT on 0xcccccccccccccccc (pid 3, read missing page, rip=0xcccccccccccccccc)!

you have a problem. Go back and think again about how to copy instructions and data from program segments into process memory.

Once you’re confident in your phase 4 solution, you can go back to setting page_increment = 1, but your code should work with any page allocation strategy.

Phase 5: Overlapping address spaces

Now the processes are isolated, which is awesome, but they’re still not taking full advantage of virtual memory. Since they are isolated, they can use the same address ranges without conflicting on the underlying data.

In this phase, change each process’s stack to grow down from address 0x300000 == MEMSIZE_VIRTUAL. Now the processes have enough heap room to use up all of physical memory!

Overlapping address spaces

If there’s no physical memory available, sys_page_alloc should return an error code to the calling process, such as -1. Do not kill the calling process! Lack of memory is a potentially recoverable problem.

Phase 6: Fork

The fork system call starts a new process as a copy of an existing process. The process that calls fork is called the parent process, and the newly created copy is called the child process. The system call appears to return twice: it returns 0 to the child, and returns the child’s process ID to the parent.

Run WeensyOS with make run-fork. Alternately, at any time, press the ‘f’ key; this will soft-reboot WeensyOS and ask it to run a single p-fork process, rather than the gang of allocators. You should see something like this:

Initial fork state

This is because you haven’t implemented fork yet.

Implement fork.

Look at other WeensyOS system calls as models for implementing your own system call. Also take a look at the fork man page to understand what this system call does.
When a process calls fork, look for a free process slot in the ptable[] array for the child. Don’t use slot 0. The ptable[] array is made up of proc process descriptor objects, which are defined in kernel.hh.
If a free slot is found, copy the parent’s page table for the child. For addresses below PROC_START_ADDR (0x100000), the parent and child page tables will have identical mappings (same physical addresses, same permissions). But for addresses at or above PROC_START_ADDR, the child’s page table must map different pages that contain copies of any user-accessible, writable memory. This ensures process isolation: two processes should not share any writable memory except the console.

So fork must examine every virtual address in the parent page table. Whenever the parent process has an application-writable, non-console page at virtual address V, then fork must allocate a new physical page P; copy the data from the parent’s page into P, using memcpy; and finally map page P at address V in the child page table, using the permissions from the parent page table.
The child process’s registers are initialized as a copy of the parent process’s registers, except for reg_rax, which is initialized to 0.
The child process should be runnable.
If fork fails in any way—if there is no free process slot, or there’s not enough memory to create a complete copy of the parent process—then fork must clean up after itself and return -1. The cleanup procedure must reset the child process slot to the free state. Eventually, after phase 7, it will also free any memory that was allocated during the fork attempt.

When you’re done, you should see something like this after make run-fork (or pressing ‘f’).

Fork works

An image like this means you forgot to copy the data for some pages, so the processes are actually sharing stack and/or data pages:

Incorrect fork

Phase 7: Freeing memory

So far, none of your test programs have ever freed memory or exited. Memory allocation’s pretty easy when there’s no free! So let’s add free by allowing applications to exit.

In this phase, you’ll implement the sys_exit system call, which exits the current process. This is a capstone since freeing memory will tend to expose weaknesses and problems in your other code.

To test your work, use make run-exit. Alternately, you can press ‘e’ at any time, which his reboots WeensyOS to run the p-exit program. (Initially it’ll crash because sys_exit() isn’t implemented yet.) p-exit processes alternate among forking new children, allocating memory, and exiting. The result is that once your code is correct, p-exit makes crazy patterns forever, like this:

Fork with exit

A fully correct OS can run p-exit indefinitely. In an OS with a memory leak, the physical memory map will collect L pages (L for leak), persistent blank spots, or both, and if run long enough, these pages will take over the screen. Here’s an example; note the Ls accumulating at the end of the physical memory map:

Fork with exit and slow leak

Reducing ALLOC_SLOWDOWN in p-exit may encourage errors to manifest, but you may need to be patient.

Here’s your task.

Complete kfree so that kfree frees memory. Make sure that kalloc can re-use freed memory.
sys_exit should mark a process as free and free all of its memory. This includes the process’s code, data, heap, and stack pages, as well as the pages used for its page directory and page table pages. The memory should become available again for future allocations.

Use vmiter and ptiter to enumerate the relevant pages. But be careful not to free the console.
You may also need to change fork and perhaps other places. In p-exit, unlike in previous parts of the pset, sys_fork and sys_page_alloc might be called when there’s not enough memory to create a new process or allocate or map a page. Your code should handle this cleanly, without leaking memory in any situation.

If a sys_page_alloc or sys_fork system call runs out of memory during its operation:
- The system call must clean up after itself. Any kernel memory or other resources (e.g., process slots) allocated during the system call must be freed.
- The system call must return -1 to the caller.
Note that out-of-memory events in system calls must not panic the kernel, and they must not cause the calling process to exit. It is not an error when a system call fails! Instead the error return value is passed to the process to handle as it prefers.

There should be no memory leaks!

About physical memory iterators (ptiter)
The ptiter type iterates through the physical memory used to represent a page table. (x86-64 page tables are hierarchical structures that can comprise multiple pages of memory.) ptiter is useful mostly when freeing page table structures.
class ptiter {
public:
    // initialize a `ptiter` for `pt`
    inline ptiter(x86_64_pagetable* pt);
    inline ptiter(const proc* p);

    // Return kernel-accessible pointer to current page table page.
    inline x86_64_pagetable* kptr() const;
    // Return physical address of current page table page.
    inline uintptr_t pa() const;
    // Return first virtual address mapped by this page table page.
    inline uintptr_t va() const;
    // Return one past the last virtual address mapped by this page table page.
    inline uintptr_t last_va() const;

    // Move to next page table page in depth-first order.
    inline void next();

    // ...
};
ptiter visits the individual page table pages in a multi-level page table, in depth-first order (so all level-1 page tables under a level-2 page table are visited before the level-2 is visited). A ptiter loop can easily visit all the page table pages owned by a process, which is usually at least 4 page tables in x86-64 (one per level):
for (ptiter it(pt); it.va() < MEMSIZE_VIRTUAL; it.next()) {
    log_printf("[%p, %p): level-%d ptp at pa %p\n",
               it.va(), it.last_va(), it.level() + 1, it.kptr());
}
A WeensyOS process might print the following:
[0x0, 0x200000): level-1 ptp at pa 0x58000
[0x200000, 0x400000): level-1 ptp at pa 0x59000
[0x0, 0x40000000): level-2 ptp at pa 0x57000
[0x0, 0x8000000000): level-3 ptp at pa 0x56000
Note the depth-first order: the level-1 page table pages are visited first, then level-2, then level-3. This makes it safe to use a ptiter to free the pages in a page table. ptiter never visits the topmost page table page, so that must be freed separately. Note also that ptiter::level is one less than you might expect (it returns a number between 0 and 3, rather than between 1 and 4).

Phase 8: Shared read-only memory

It’s wasteful for fork() to copy all of a process’s memory. For example, most processes, including p-fork, never change their code. So what if we shared the memory containing the code? That’d be fine for process isolation, as long as neither process could write the code.

Change process_setup to map read-only program segments using read-only memory. Use the seg.writable() function to detect whether a program segment can use read-only memory. Then, in fork, share read-only pages between processes rather than copying them; and, in exit, ensure that shared pages are correctly accounted for when a process exits.

Use vmiter::writable() to detect whether a page is read-only. Do not use an equality test like it.perm() == (PTE_P | PTE_U)! vmiter::perm() values often include bits like PTE_A and PTE_D that are automatically set by processor hardware, so equality tests on it.perm() will usually fail.

When you’re done, running p-exit should look like this:

Shared read-only memory

Each process’s virtual address space begins with an “S”, indicating that the corresponding physical page is Shared by multiple processes.

Extra credit

If you are finished and can't wait to do more of this type of work, try:

Copy-on-write page allocation!
Write more system calls and test programs! Some ideas:
- sys_page_alloc is like a restricted mmap system call: the user always defines where the page lives (so addr == nullptr is not supported), the system call allocates exactly one page (so sz == PAGESIZE always), and the map is copied when a process forks (so the flags are like MAP_ANON | MAP_PRIVATE and the protection is PROT_READ | PROT_WRITE | PROT_EXEC). Eliminate some of these restrictions by passing more arguments to the kernel and extending the SYSCALL_PAGE_ALLOC implementation.
- Implement sys_page_free/munmap.
- Implement a system call that puts the calling process to sleep until a given amount of time has elapsed, as measured by ticks (which counts timer interrupts).
- Implement a kill system call that lets one process kill others!
You will need to write a new test program to test this functionality.
How to write a new test program
1. Choose a name for your test program. We’ll assume testprogram for this example.
2. Create a C++ file p-testprogram.cc for your test program. You will base this file off one of the existing programs (p-allocator.cc, p-fork.cc, or p-exit.cc).
3. Now you can make clean; make run-testprogram to run your test program.
4. If you’d like, you can teach check_keyboard in k-hardware.cc about your program. Pick a keystroke that should correspond to your program and edit the “soft reboot process” accordingly. For instance:
```
if (c == 'a' || c == 'f' || c == 'e' || c == 't') { ...
    if (c == 'a') {
        argument = "allocators";
    } else if (c == 'e') {
        argument = "exit";
    } else if (c == 't') {
        argument = "testprogram";
    }
```
Now you should be able to run your test program with make run and typing t.

Turnin

You will turn in your code by pushing your git repository to github.com/cs61/YOUR-PSET_REPO.git and updating the grading server.

Problem set 3: WeensyOS

Get the code

Running WeensyOS

Initial state

Goal

Notes

Memory system layout

Kernel and process address spaces

Debugging WeensyOS

Understanding memory errors

Phase 1: Kernel isolation

About virtual memory iterators (`vmiter`)

Examining page tables

Traversing page tables

Modifying page tables

Interface summary

Phase 2: Isolated address spaces

About program images and segments (`pgm` and `seg`)

Phase 3: General process address spaces

Phase 4: Nonsequential physical page allocation

Phase 5: Overlapping address spaces

Phase 6: Fork

Phase 7: Freeing memory

About physical memory iterators (`ptiter`)

Phase 8: Shared read-only memory

Extra credit

How to write a new test program

Turnin

Problem set 3: WeensyOS

Get the code

Running WeensyOS

Initial state

Goal

Notes

Memory system layout

Kernel and process address spaces

Debugging WeensyOS

Understanding memory errors

Phase 1: Kernel isolation

About virtual memory iterators (vmiter)

Examining page tables

Traversing page tables

Modifying page tables

Interface summary

Phase 2: Isolated address spaces

About program images and segments (pgm and seg)

Phase 3: General process address spaces

Phase 4: Nonsequential physical page allocation

Phase 5: Overlapping address spaces

Phase 6: Fork

Phase 7: Freeing memory

About physical memory iterators (ptiter)

Phase 8: Shared read-only memory

Extra credit

How to write a new test program

Turnin

About virtual memory iterators (`vmiter`)

About program images and segments (`pgm` and `seg`)

About physical memory iterators (`ptiter`)