Problem set 4 – CS 61 2018

In this assignment, you implement process memory isolation, virtual memory, and some system calls in a tiny operating system. This will introduce you to virtual memory and operating system design.

Due Wednesday 11/7 11:59:59pm (extension 1 day later)
You must run this assignment on a Linux machine, such as a virtual machine. Running on Mac or a Windows box without a VM is not likely to work.

You may want to ponder Chapter 9 of the text. The 64-bit x86 virtual memory architecture is described in Section 9.7. The PTE_P, PTE_W, and PTE_U bits are shown in Figure 9.23 and discussed in Section 9.7.1.

Get the code

Start with the cs61-psets repository you used for Problem Set 3.

First, ensure that your repository has a handout remote. Type

git remote show handout

If this reports an error, run

git remote add handout git://github.com/cs61/cs61-f18-psets.git

Then run git pull handout master. This will merge our Assignment 4 code with your previous work. If you have any “conflicts” from Problem Set 3, resolve them before continuing further. Run git push to save your work back to your personal repository.

You may also create a new cs61-psets repository for this assignment.

Don’t forget to enter your repository URL on the grading server.

Initial state

For this assignment, there is no handy make check functionality. Instead, you should run your instance of WeensyOS and visually compare it to the images you see below in the assignment.

Run make run in your pset4 directory. You should see something like this, which shows four versions of the p-allocator process running in parallel:

This image loops forever; in an actual run, the bars will move to the right and stay there. Don’t worry if your image has different numbers of K’s or otherwise has different details.

If your bars run painfully slowly, edit the p-allocator.c file and reduce the ALLOC_SLOWDOWN constant.

Stop now to read and understand p-allocator.c. Here’s what’s going on in the physical memory display.

WeensyOS displays the current state of physical and virtual memory. Each character represents 4 KB of memory: a single page. There are 2 MB of physical memory in total. (How many pages is this?)
WeensyOS runs four processes, 1 through 4. Each process is compiled from the same source code (p-allocator.c), but linked to use a different region of memory.
Each process asks the kernel for more heap space, one page at a time, until it runs out of room. As usual, each process's heap begins just above its code and global data, and ends just below its stack. The processes allocate space at different rates: compared to Process 1, Process 2 allocates space twice as quickly, Process 3 goes three times faster, and Process 4 goes four times faster. (A random number generator is used, so the exact rates may vary.) The marching rows of numbers show how quickly the heap spaces for processes 1, 2, 3, and 4 are allocated.

Here are two labeled memory diagrams, showing what the characters mean and how memory is arranged.

The virtual memory display is similar.

The virtual memory display cycles between the four processes' address spaces. However, all the address spaces are the same for now.
Blank spaces in the virtual memory display correspond to unmapped addresses. If a process (or the kernel) tries to access such an address, the processor will page fault.
The character shown at address X in the virtual memory display identifies the owner of the corresponding physical page.
In the virtual memory display, a character is reverse video if an application process is allowed to access the corresponding address. Initially, any process can modify all of physical memory, including the kernel. Memory is not properly isolated.

Goal

You will implement complete and correct memory isolation for WeensyOS processes. Then you'll implement full virtual memory, which will improve utilization. You'll implement fork—creating new processes at runtime—and exit—destroying processes at runtime.

We need to provide a lot of support code for this assignment, but the code you write will be limited. All your code goes in kernel.cc.

Notes

Running WeensyOS

If QEMU’s default display causes accessibility problems, you will want to run make run-console. Please read the ABOUT.md file for tips on how to kill an out-of-control QEMU. You may also want to read about the QEMU Monitor.

There are several ways to debug WeensyOS. We recommend:

Add log_printf statements to your code (the output of log_printf is written to the file log.txt).
Use assertions to catch problems early (for instance, call check_page_table to test a page table for obvious issues, or add your own).
Sometimes a mistake will cause the OS to crash hard and reboot. Use make D=1 run 2>debuglog.txt to get additional verbose debugging output. Search through debuglog.txt for check_exception lines to see where the exceptions occur.
Use make run-gdb to use GDB for debugging, or use the QEMU Monitor to examine registers and memory.
Printouts such as assertions and fault reports include the virtual address of the faulting instruction, but they do not always include symbol information. Use files obj/kernel.asm (for the kernel) and obj/p-PROCESSNAME.asm (for processes) to map instruction addresses to instructions.
You can track down the worst problems using infinite loops. Say that you’re not sure whether a reboot-causing problem occurs before or after line 100 in your fork function. So add an infinite loop at line 100! If the problem does occur, then the problem is located before line 100; if the OS hangs instead, then the problem is located after line 100.

Memory system layout

WeensyOS memory system layout is described by several constants.


`KERNEL_START_ADDR`	Start of kernel code.
`KERNEL_STACK_TOP`	Top of kernel stack. The kernel stack is one page long.
`console`	CGA console memory.
`PROC_START_ADDR`	Start of application code. Applications should not be able to access memory below `PROC_START_ADDR`, except for the single page at `console`.
`MEMSIZE_PHYSICAL`	Size of physical memory in bytes. WeensyOS does not support physical addresses ≥ `MEMSIZE_PHYSICAL`. Equals 0x200000 (2MB).
`MEMSIZE_VIRTUAL`	Size of virtual memory. WeensyOS does not support virtual addresses ≥ `MEMSIZE_VIRTUAL`. Equals 0x300000 (3MB).
`PAGESIZE`	Size of a memory page. Equals 4096 (or, equivalently, `1 << 12`).

Kernel and process address spaces

WeensyOS begins with the kernel and all processes sharing a single address space. This is defined by the kernel_pagetable page table. kernel_pagetable is initialized to the identity mapping: virtual address X maps to physical address X.

As you work through the pset, you will shift processes to use independent address spaces, where each process can access only a subset of physical memory.

The kernel, though, still needs the ability to access all of physical memory. Therefore, all kernel functions run using the kernel_pagetable page table. Thus, in kernel functions, each virtual address maps to the physical address with the same number. The exception_entry and syscall_entry assembly codes explicitly install kernel_pagetable when they begin, and exception_return and the syscall return path install the process’s page table as they exit.

Each process page table must contain kernel mappings for the kernel stack and for the exception_entry and syscall_entry code paths.

Step 1: Kernel isolation

WeensyOS processes could stomp all over the kernel’s memory if they wanted. Better stop that. Change kernel, the kernel initialization function, so that kernel memory is inaccessible to applications—except for the memory holding the CGA console (the single page at (uintptr_t) console == 0xB8000).

When you are done, WeensyOS should look like this. In the virtual map, kernel memory is no longer reverse-video, since the user can’t access it. Note the lonely CGA console memory block.

Hints:

Use vmiter, which was discussed in Section 4. (The handout code uses vmiter as well, in k-hardware.cc; look for a loop in init_cpu_state, for example. We do not recommend trying to understand how vmiter is implemented, but k-vmiter.hh does document its methods.)
The perm argument to vmiter::map is a bitwise-or of zero or more PTE flags, PTE_P, PTE_W, and PTE_U. PTE_P marks Present pages (pages that are mapped). PTE_W marks Writable pages. PTE_U marks User-accessible pages—pages accessible to applications. You want kernel memory to be mapped with permissions PTE_P|PTE_W, which will prevent applications from reading or writing the memory, while allowing the kernel to both read and write.
Make sure that your sys_page_alloc system call preserves kernel isolation: Applications shouldn’t be able to use sys_page_alloc to screw up the kernel. This requires changes to the SYSCALL_PAGE_ALLOC case in syscall.

Step 2: Isolated address spaces

Implement process isolation by giving each process its own independent page table. Your OS should look like this:

Thus, each process only has permission to access its own pages. You can tell this because only its own pages are shown in reverse video.

How to implement per-process page tables:

process_setup should allocate a new, initially-empty page table for each process by calling kalloc.
Copy the mappings from kernel_pagetable into this new page table. The purpose of this step is to ensure that the required kernel mappings are present in the new page table. You can do this using a loop with two vmiters; or you can set the mappings yourself (they are identity mappings); or, if you’re feeling crazy, you can set up the new page table yourself without using vmiter (that will show you really understand page tables!).
Then you will need to make sure that any page that belongs to the process is mapped as user-accessible. These are the pages the process needs to access, including its code, data, stack, and heap segments. There are several places you’ll need to change.

Note the diagram now has four pages for each process in the kernel area, starting at 0x1000. These are the four-level page tables for each process. (The dark blue background indicates that these pages contain kernel-private page table data, even though the pages “belong” to the process.) The first page was allocated explicitly in process_setup; the other pages were allocated by vmiter::map as needed to fill out the page table.

One common solution, shown above, leaves addresses above PROC_START_ADDR totally unmapped by default, but that’s not the only valid choice. As long as a virtual address mapping has no PTE_U bit, its process isolation properties are unchanged. For instance, this solution, in which all mappings are present but accessible only to the kernel, also implements process isolation correctly:

If you create an incorrect page table, WeensyOS might crazily reboot. Don’t panic; see the debugging hints above.

Step 3: Virtual page allocation

So far, WeensyOS processes use physical page allocation for process memory: Process code, data, stack, and heap pages with virtual address X always use the physical pages with physical address X. This is inflexible and limits utilization.

Change your operating system to call kalloc instead of kalloc_physical_page for process data. No calls to kalloc_physical_page should remain (and you can remove it from your source code).

Here’s how our OS looks after this step.

Virtual page allocation will complicate the code that initializes process code in process_setup. You’ll need to figure out why (hint: which page table is in force in process_setup?) and find a way around it.

Step 4: Overlapping address spaces

Now the processes are isolated, which is awesome, but they’re still not taking full advantage of virtual memory. Isolated address spaces can use the same virtual addresses for different physical memory. There’s no need to keep the four process address spaces disjoint.

In this step, change each process’s stack to start from address 0x300000 == MEMSIZE_VIRTUAL. Now the processes have enough heap room to use up all of physical memory!

If there’s no physical memory available, sys_page_alloc should return an error to the caller (by returning -1). (If you want, you can also print “Out of physical memory!” or something to the console; you don’t need to.)

Step 5: Fork

The fork system call is one of Unix’s great ideas. It starts a new process as a copy of an existing process. The fork system call appears to return twice, once to each process. To the child process, it returns 0. To the parent process, it returns the child’s process ID.

Run WeensyOS with make run or make run-console. At any time, press the ‘f’ key. This will soft-reboot WeensyOS and ask it to run a single p-fork process, rather than the gang of allocators. You should see something like this:

This is because you haven’t implemented fork yet.

Implement fork.

When a process calls fork, look for a free process slot in the ptable[] array. Don’t use slot 0. If no slot exists, return -1 to the caller.
If a free slot is found, make a copy of current->pagetable, the forking process’s page table, for the child.
But you must also copy the process data in every application page shared by the two processes. The processes should not share any writable memory except the console (otherwise they wouldn’t be isolated). So fork must examine every virtual address in the old page table. Whenever the parent process has an application-writable page at virtual address V, then fork must allocate a new physical page P; copy the data from the parent’s page into P, using memcpy; and finally map page P at address V in the child process’s page table.
The child process’s registers are initialized as a copy of the parent process’s registers, except for reg_rax.

When you’re done, you should see something like this after pressing ‘f’.

An image like this means you forgot to copy the data for some pages, so the processes are actually sharing stack and/or data pages:

Step 6: Shared read-only memory

It’s wasteful for fork() to copy all of a process’s memory. For example, most processes, including p-fork, never change their code. So what if we shared the memory containing the code? That’d be fine for process isolation, as long as neither process could write the code.

Change the process loader to map read-only program segments using read-only memory. Use the program_loader::writable() function (loader.writable() in context) to detect whether a program segment can use read-only memory. Then, in fork, share read-only pages between processes rather than copying them.

When you’re done, running p-fork should look like this:

Each process’s virtual address space begins with a dark-colored “S”, indicating that the corresponding physical page is Shared by multiple processes. (The color difference is only visible on graphical QEMU; the console version doesn’t distinguish between light reverse-video and dark reverse-video.)

Step 7: Freeing memory

So far none of your test programs have ever freed memory or exited. Memory allocation’s pretty easy until you add free! So let’s do that, by allowing applications to exit. In this exercise you’ll implement the sys_exit system call, which exits the current process.

This exercise is a capstone since freeing memory will tend to expose weaknesses and problems in your other code.

To test your work, use make run and then type ‘e’. This reboots WeensyOS to run the p-forkexit program. (Initially it’ll crash because sys_exit() isn’t implemented yet.) p-forkexit combines two types of behavior:

Process 1 forks children indefinitely.
The child processes, #2 and up, are memory allocators, as in the previous parts of the pset. But with small probability at each step, each child process either exits or attempts to fork a new child.

The result is that once your code is correct, p-forkexit makes crazy patterns forever. An example:

A fully correct OS can run p-forkexit indefinitely. If you reduce ALLOC_SLOWDOWN in p-forkexit, even rare errors should turn up within a minute or two.

Here’s your task.

sys_exit should mark a process as free and free all of its memory. This includes the process’s code, data, heap, and stack pages, as well as the pages used for its page directory and page table pages. The memory should become available again for future allocations.

Use vmiter and ptiter to enumerate the relevant pages. But be careful not to free the console. Also, don’t free any shared pages until they are totally unused (the last sharing process exits)!

You will need to change the pageinfo structure (the type of pages entries), kalloc, fork, and perhaps other places to support this fully. Think about all the places that allocate or share memory: what information must they maintain?
In p-forkexit, unlike in previous parts of the pset, sys_fork and sys_page_alloc can run when there isn’t quite enough memory to create a new process or allocate a page. Your code should handle this cleanly, without leaking memory. If there isn’t enough free memory to allocate a process, fork() should clean up after itself (i.e., free any memory that was allocated for the new process before memory ran out), and then return -1 to the caller.

There should be no memory leaks! (A memory leak will show up as a blank spot on the memory map that never gets used.) Remember that vmiter::map can fail if memory is low.

Extra credit

If you are finished and can't wait to do more of this type of work, try:

Copy-on-write page allocation!
Write more system calls and test programs! Some ideas:
- sys_page_alloc is like a restricted mmap system call: the user always defines where the page lives (so addr == nullptr is not supported), the system call allocates exactly one page (so sz == PAGESIZE always), and the map is copied when a process forks (so the flags are like MAP_ANON | MAP_PRIVATE and the protection is PROT_READ | PROT_WRITE | PROT_EXEC). Eliminate some of these restrictions by passing more arguments to the kernel and extending the SYSCALL_PAGE_ALLOC implementation.
- Implement sys_page_free/munmap.
- Implement a system call that puts the calling process to sleep until a given amount of time has elapsed, as measured by ticks (which counts timer interrupts).
- Implement a kill system call that lets one process kill others!
You will need to write a new test program to test this functionality.
How to write a new test program
1. Run git pull handout master or equivalent to ensure you’re using our latest code (you need commit 2badb947).
2. Choose a name for your test program. We’ll assume testprogram for this example.
3. Create a C++ file p-testprogram.cc for your test program. You will base this file off one of the existing programs (p-allocator.cc, p-fork.cc, or p-forkexit.cc).
4. Modify GNUmakefile to build your test program: add $(OBJ)/p-testprogram to the PROCESS_BINARIES definition, and $(OBJ)/p-testprogram.o to the PROCESS_OBJS definition.
5. Teach the program_loader about your test program. Look for program_loader code in k-hardware.cc. Then add declarations for _binary_obj_p_testprogram_start and _binary_obj_p_testprogram_end, and add the appropriate entry to the ramimages array.
6. Teach check_keyboard in k-hardware.cc about your program. Pick a keystroke that should correspond to your program and edit the “soft reboot process” accordingly. For instance:
```
  if (c == 'a') {
      argument = "allocator";
  } else if (c == 'e') {
      argument = "forkexit";
  } else if (c == 't') {
      argument = "testprogram";
  }
```
7. Teach the kernel() function in kernel.cc about your program. Replace the current if (command...) statements with this:
```
  if (command && program_loader::program_number(command) > 0) {
      process_setup(1, program_loader::program_number(command));
  } else {
      // ... set up allocator processes ...
  }
```
  (This only needs to be done once, for the first test program you add.)
Now you should be able to run your test program by typing t.

Turnin

You will turn in your code by pushing your git repository to github.com/cs61/YOUR-PSET_REPO.git and updating the grading server.

Problem set 4: WeensyOS