Problem Set 4: WeensyOS
In this assignment, you implement process memory isolation, virtual memory, and some system calls in a tiny operating system. This will introduce you to virtual memory and operating system design.
- Due Fri 11/3 11:59:59pm (extension 1 day later)
- This assignment may be completed in pairs.
- You must run this assignment on a Linux machine, such as a virtual machine. Running on Mac or a Windows box without a VM will not likely work.
You may want to ponder Chapter 9 of the text. The 64-bit x86 virtual
memory architecture is described in Section 9.7. The PTE_P
, PTE_W
,
and PTE_U
bits are shown in Figure 9.23 and discussed in Section
9.7.1.
Get the code
Start with the cs61-psets
repository you used for Assignment
3.
First, ensure that your repository has a handout
remote. Type
git remote show handout
If this reports an error, run
git remote add handout
git://github.com/cs61/cs61-psets-f17.git
Then run git pull handout master
. This will
merge our Assignment 4 code with your
previous work. If you have any “conflicts” from Assignment 3, resolve
them before continuing further. Run
git push
to save your work back to your personal repository.
You may also create a new cs61-psets
repository for this assignment.
Don’t forget to enter your repository URL on the grading server.
Additionally, due to the complexity of this pset, and the possibility of breaking the functionality of the kernel if you code in some errors, make sure to commit and push your code often! It's very important that your commits have working versions of the code, so if something goes wrong, you can always go back to a previous commit and get back a working copy! At the very least, for this pset, you should be committing once per step, so you can go back to the last step if necessary.
Initial state
For this assignment, there is no handy make check
functionality.
Instead, you should run your instance of Weensy OS and visually compare
it to the images you see below in the assignment.
Run make run
in your pset4
directory. You should see something like
this, which shows four versions of the p-allocator
process running in
parallel:
This image loops forever; in an actual run, the bars will move to the right and stay there. Don't worry if your image has different numbers of K's or otherwise has different details.
If your bars run painfully slowly, edit the p-allocator.c
file and
reduce the ALLOC_SLOWDOWN
constant.
Stop now to read and understand p-allocator.c
. Here’s what’s going
on in the physical memory display.
- WeensyOS displays the current state of physical and virtual memory. Each character represents 4 KB of memory: a single page. There are 2 MB of physical memory in total. (How many pages is this?)
- WeensyOS runs four processes, 1 through 4. Each process is compiled
from the same source code (
p-allocator.c
), but linked to use a different region of memory. - Each process asks the kernel for more heap space, one page at a time, until it runs out of room. As usual, each process's heap begins just above its code and global data, and ends just below its stack. The processes allocate space at different rates: compared to Process 1, Process 2 allocates space twice as quickly, Process 3 goes three times faster, and Process 4 goes four times faster. (A random number generator is used, so the exact rates may vary.) The marching rows of numbers show how quickly the heap spaces for processes 1, 2, 3, and 4 are allocated.
Here are two labeled memory diagrams, showing what the characters mean and how memory is arranged.
The virtual memory display is similar.
- The virtual memory display cycles between the four processes' address spaces. However, all the address spaces are the same for now.
- Blank spaces in the virtual memory display correspond to unmapped addresses. If a process (or the kernel) tries to access such an address, the processor will page fault.
- The character shown at address X in the virtual memory display identifies the owner of the corresponding physical page.
- In the virtual memory display, a character is reverse video if an application process is allowed to access the corresponding address. Initially, any process can modify all of physical memory, including the kernel. Memory is not properly isolated.
Goal
You will implement complete and correct memory isolation for WeensyOS processes. Then you'll implement full virtual memory, which will improve utilization. You'll implement fork—creating new processes at runtime—and exit—destroying processes at runtime.
We need to provide a lot of support code for this assignment, but the
code you write will be limited. Our solutions contain less than 200
lines. All your code goes in kernel.c
(except for part of step 6).
Notes
Running WeensyOS
Read the README-OS.md
file for information on how to run WeensyOS. If
QEMU’s default display causes accessibility problems, you will want to
run make run-console
. To make run-console
the default, run
export QEMUCONSOLE=1
in your shell.
There are several ways to debug WeensyOS. We recommend adding
log_printf
statements to your code (the output of log_printf
is
written to the file log.txt
), and use assertions to catch problems
early (for instance, call check_page_table_mappings
and
check_page_table_ownership
to test a page table for obvious issues).
Memory system layout
WeensyOS memory system layout is described by several constants.
KERNEL_START_ADDR |
Start of kernel code. |
KERNEL_STACK_TOP |
Top of kernel stack. The kernel stack is one page long. |
console |
CGA console memory. |
PROC_START_ADDR |
Start of application code. Applications should not be able to access memory below PROC_START_ADDR , except for the single page at console . |
MEMSIZE_PHYSICAL |
Size of physical memory in bytes. WeensyOS does not support physical addresses ≥ MEMSIZE_PHYSICAL . Equals 0x200000 (2MB). |
MEMSIZE_VIRTUAL |
Size of virtual memory. WeensyOS does not support virtual addresses ≥ MEMSIZE_VIRTUAL . Equals 0x300000 (3MB). |
Address composition
WeensyOS uses several macros to handle addresses. They are defined at
the top of x86-64.h
. The most important include:
PAGESIZE |
Size of a memory page. Equals 4096 (or, equivalently, 1 << 12 ). |
PAGENUMBER(addr) |
The page number for the page containing addr . Expands to something like addr / PAGESIZE . |
PAGEADDRESS(pn) |
The initial address in page number pn . Expands to something like pn * PAGESIZE . |
PAGEINDEX(addr, level) |
The level th page table index for addr . level must be between 0 and 3; 0 returns the level-1 page table index (address bits 39–47), 1 returns the level-2 index (bits 30–38), 2 returns the level-3 index (bits 21–29), and 3 returns the level-4 index (bits 12–20). |
PTE_ADDR(pe) |
The physical address contained in a page table entry pe . Obtained by masking off the flag bits (setting the low-order 12 bits to zero). |
You should both understand what these macros represent and be able to derive values if you were given a different pagesize.
Kernel and process address spaces
WeensyOS begins with the kernel and all processes sharing a single
address space. This is defined by the kernel_pagetable
page table.
kernel_pagetable
is initialized to the identity mapping: virtual
address X maps to physical address X.
As you work through the pset, you will shift processes to using their own independent address spaces, where each process can access only a subset of physical memory.
The kernel, though, still needs the ability to access any location in
physical memory. Therefore, all kernel functions run using the
kernel_pagetable
page table. Thus, in kernel functions, each virtual
address maps to the physical address with the same number. The
exception
function explicitly installs kernel_pagetable
when it
begins.
WeensyOS system calls are more expensive than they need to be, since
every system call switches address spaces twice (once to
kernel_pagetable
and once back to the process’s page table). Real
operating systems avoid this overhead. In real operating systems,
kernels access memory using process page tables, rather than a
kernel-specific kernel_pagetable
. This makes the kernel code more
complicated, since kernels can’t always access all of physical memory
directly.
Step 1: Kernel isolation
WeensyOS processes could stomp all over the kernel’s memory if they
wanted. Better stop that. Change kernel
, the kernel initialization
function, so that kernel memory is inaccessible to applications—except
for the memory holding the CGA console (the single page at
(uintptr_t) console == 0xB8000
).
When you are done, WeensyOS should look like this. In the virtual map, kernel memory is no longer reverse-video, since the user can’t access it. Note the lonely CGA console memory block.
Hints:
- Use
virtual_memory_map
. A description of this function is inkernel.h
. You will benefit from reading all the function descriptions inkernel.h
. You can supplyNULL
for theallocator
argument for now. - If you really want to look at the code for
virtual_memory_map
, it is ink-hardware.c
, along with many other hardware functions. - The
perm
argument tovirtual_memory_map
is a bitwise-or of zero or morePTE
flags,PTE_P
,PTE_W
, andPTE_U
.PTE_P
marks Present pages (pages that are mapped).PTE_W
marks Writable pages.PTE_U
marks User-accessible pages—pages accessible to applications. You want kernel memory to be mapped with permissionsPTE_P|PTE_W
, which will prevent applications from reading or writing the memory, while allowing the kernel to both read and write. - Make sure that your
sys_page_alloc
system call preserves kernel isolation: Applications shouldn’t be able to usesys_page_alloc
to screw up the kernel.
When you're done with this step, make sure to commit and push your code!
Step 2: Isolated address spaces
Implement process isolation by giving each process its own independent page table. Your OS should look like this:
Thus, each process only has permission to access its own pages. You can tell this because only its own pages are shown in reverse video.
What goes in per-process page tables:
- The initial mappings for addresses less than
PROC_START_ADDR
should be copied from those inkernel_pagetable
. You can use a loop withvirtual_memory_lookup
andvirtual_memory_map
to copy them. Alternately, you can copy the mappings from the kernel’s page table into the new page tables; this is faster, but make sure you copy the right data! - The initial mappings for the user area—addresses greater than or equal
to
PROC_START
—should be inaccessible to user processes (noPTE_U
). In our solution (shown above), these addresses are totally inaccessible (so they show as blank), but you can also change this so that the mappings are still there, but accessible only to the kernel, as in this diagram:
The reverse video shows that this OS also implements process isolation correctly.
How to implement per-process page tables:
- Change
process_setup
to create per-process page tables. - We suggest you write a
copy_pagetable(x86_64_pagetable* pagetable, int8_t owner)
function that allocates and returns a new page table, initialized as a full copy ofpagetable
(including all mappings frompagetable
). This function will be useful in Step 5. Inprocess_setup
you can modify the page table returned bycopy_pagetable
according to the requirements above. Your function can usepageinfo
to find free pages to use for page tables. Read aboutpageinfo
at the top ofkernel.c
. - Remember that the x86-64 architecture uses four-level page tables.
- The easiest way to copy page tables involves an allocator function
suitable for passing to
virtual_memory_map
. - You’ll need at least to allocate a level-1 page table and initialize
it to zero. You can also set up the whole four-level page table
skeleton (for addresses
0…MEMSIZE_VIRTUAL - 1
) yourself; then you don’t need an allocator function. - A physical page is free if
pageinfo[PAGENUMBER].refcount == 0
. Look at the other code inkernel.c
for some hints on how to examine thepageinfo
array. - All of process
P
’s page table pages must havepageinfo[...].owner == P
or our checking functions will fail. This will affect your allocator function. (Hint: Don’t forget that global variables are allowed in this class!)
If you create an incorrect page table, WeensyOS might crazily reboot.
Don’t panic! Add log_printf
statements. Another useful technique:
add infinite loops to your kernel to track down exactly where a
fault occurs. (If the OS stops, then the crash occurs after the infinite
loop.)
Again, once finished with step 2, commit and push!
Step 3: Virtual page allocation
So far, WeensyOS processes use physical page allocation: the page with
physical address X is used to satisfy the sys_page_alloc(X)
allocation request for virtual address X. This is inflexible and
limits utilization. Change the implementation of the
INT_SYS_PAGE_ALLOC
system call so that it can use any free physical
page to satisfy a sys_page_alloc(X)
request.
Your new INT_SYS_PAGE_ALLOC
code must perform the following tasks.
- Find a free physical page using the
pageinfo
array. Return-1
to the application if you can’t find one. Use any algorithm you like to find a free physical page; we just return the first one we find. - Record the physical page’s allocation in
pageinfo
. - Map that physical page at the requested virtual address.
Don’t modify the physical_page_alloc
helper function, which is also
used by the program loader. You can write a new function if you want.
Here’s how our OS looks after this step.
Now commit and push your code before moving on to step 4!
Step 4: Overlapping address spaces
Now the processes are isolated, which is awesome. But they’re still not taking full advantage of virtual memory. Isolated address spaces can use the same virtual addresses for different physical memory. There’s no need to keep the four process address spaces disjoint.
In this step, change each process’s stack to start from address
0x300000 == MEMSIZE_VIRTUAL
. Now the processes have enough heap room
to use up all of physical memory!
If there’s no physical memory available, sys_page_alloc
should return
an error to the caller (by returning -1). (Our solution additionally
prints “Out of physical memory!
” to the console when this happens; you
don’t need to.)
As always, make sure to commit and push after finishing this step!
Step 5: Fork
The fork
system call is one of Unix’s great ideas. It starts a new
process as a copy of an existing process. The fork
system call
appears to return twice, once to each process. To the child process, it
returns 0. To the parent process, it returns the child’s process ID.
Run WeensyOS with make run
or make run-console
. At any time, press
the ‘f
’ key. This will soft-reboot WeensyOS and ask it to run a single
p-fork
process, rather than the gang of allocator
s. You should see
something like this:
This is because you haven’t implemented fork
yet.
Implement fork
.
- When a process calls
fork
, look for a free process slot in theprocesses[]
array. Don’t use slot 0. If no slot exists, return-1
to the caller. - If a free slot is found, make a copy of
current->p_pagetable
, the forking process’s page table, using your function from earlier. - But you must also copy the process data in every application page
shared by the two processes. The processes should not share any
writable memory except the console (otherwise they wouldn’t be
isolated). So
fork
must examine every virtual address in the old page table. Whenever the parent process has an application-writable page at virtual addressV
, thenfork
must allocate a new physical pageP
; copy the data from the parent’s page intoP
, usingmemcpy
; and finally map pageP
at addressV
in the child process’s page table. - The child process’s registers are initialized as a copy of the parent
process’s registers, except for
reg_rax
. - Use
virtual_memory_lookup
to query the mapping between virtual and physical addresses in a page table.
When you’re done, you should see something like this after pressing
‘f
’.
An image like this means you forgot to copy the data for some pages, so the processes are actually sharing stack and/or data pages:
Don't forget to commit and push after finishing fork!
Step 6: Shared read-only memory
It’s wasteful for fork()
to copy all of a process’s memory. For
example, most processes, including p-fork
, never change their code. So
what if we shared the memory containing the code? That’d be fine for
process isolation, as long as neither process could write the code.
Change the process loader in k-loader.c
to detect read-only program
segments and map them as read-only for applications (PTE_P|PTE_U
). A
program segment ph
is read-only iff
(ph->p_flags & ELF_PFLAG_WRITE) == 0
.
Your fork()
code shouldn’t copy shareable pages, but it should keep
track of the number of active references to each user page.
Specifically, if pageinfo[pn].refcount > 0
and
pageinfo[pn].owner > 0
, then pageinfo[pn].refcount
should equal the
number of times pn
is mapped in process page tables.
When you’re done, running p-fork
should look like this:
Each process’s virtual address space begins with a darker-colored “1”.
The dark color indicates that the corresponding physical page has
reference count (refcount
) greater than 1. (The color difference is
only visible on graphical QEMU; the console version doesn’t distinguish
between light reverse-video and dark reverse-video.)
Hint:
- Mark a program segment read-only after the
memcpy
andmemset
operations that add data to the segment. Otherwise you’ll get a fault.
Again, commit and push!
Step 7: Freeing memory
So far none of your test programs have ever freed memory or exited.
Memory allocation’s pretty easy until you add free! So let’s do that, by
allowing applications to exit. In this exercise you’ll implement the
sys_exit
system call, which exits the current process.
This exercise is worth fewer points than others, but it is important: freeing memory will tend to expose weaknesses and problems in your other code.
To test your work, use make run
and then type ‘e
’. This reboots
WeensyOS to run the p-forkexit
program. (Initially it’ll crash because
sys_exit()
isn’t implemented yet.) p-forkexit
combines two types of
behavior:
- Process 1 forks children indefinitely.
- The child processes, #2 and up, are memory allocators, as in the previous parts of the pset. But with small probability at each step, each child process either exits or attempts to fork a new child.
The result is that once your code is correct, p-forkexit
makes crazy
patterns forever. An example:
Your picture might look a little different; for example, thanks to Step 6, your processes should share a code page, which would appear as a darker-colored “1”.
Here’s your task.
sys_exit
should mark a process as free and free all of its memory. This includes the process’s code, data, heap, and stack pages, as well as the pages used for its page directory and page table pages.- In
p-forkexit
, unlike in previous parts of the pset,sys_fork
can run when there isn’t quite enough memory to create a new process. Your code should handle this case. If there isn’t enough free memory to allocate a process,fork()
should clean up after itself (i.e., free any memory that was allocated for the new process before memory ran out), and then return-1
to the caller. There should be no memory leaks.
The virtual_memory_check
function, which runs periodically, should
help catch some errors. Feel free to add checks of your own.
Extra credit
If you are finished and can't wait to do more of this type of work, try:
- Copy-on-write page allocation!
- Faster system calls, for instance using the
syscall
andsysexit
instructions!
Turnin
You will turn in your code by pushing your git repository to github.com/cs61/YOUR-PSET_REPO.git and updating the grading server.