Pagetables and Process Isolation
Virtual Memory Map
virtual_memory_map(x86_64_pagetable *pt, uintptr_t va, uintptr_t pa, size_t sz, int perm, x86_64_pagetable*(*allocator)(void))
Your problem set has a virtual_memory_map function whose job is to
create mappings from virtual to physical memory. A traditional x86-64
pagetable has four levels, with the L4 pagetable pointing to true
physical memory.
However, it is not always possible for virtual_memory_map to perform
the mapping. Without an allocator, virtual_memory_map will fail if you
pass in a pagetable that has not been linked to corresponding L2, L3,
and L4 pagetables. The last argument, the allocator function, exists for
this case. It makes working with multi-level pagetables easier because
you do not have to think about mallocing these extra pages yourself: the
allocator function performs the mallocs that are necessary to create the
multi-level pagetable. To tell the allocator function who should be the
owner of a newly created page, we can use global variables.
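The role of the allocator argument can be sketched with a toy, user-space model of a four-level pagetable. This is not the problem set's implementation: toy_table, toy_index, toy_allocator, toy_map, and toy_lookup are hypothetical names, the toy keeps child pointers and permissions in separate arrays instead of packed 64-bit entries, and it maps exactly one page per call. It only illustrates why a mapping fails when a lower-level table is missing and no allocator is available.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_ENTRIES 512                  // entries per table, as on x86-64
#define TOY_UNMAPPED ((uint64_t) -1)     // hypothetical "no mapping" result

// Toy table (assumption): real hardware packs pointer and flags together.
typedef struct toy_table {
    struct toy_table* child[TOY_ENTRIES]; // levels 1-3: next table or NULL
    uint64_t pa[TOY_ENTRIES];             // level 4: mapped physical address
    int perm[TOY_ENTRIES];                // level 4: permissions, 0 = unmapped
} toy_table;

// Index used at `level` (1 = top, 4 = last): 9 address bits per level,
// sitting above a 12-bit page offset.
static unsigned toy_index(uint64_t va, int level) {
    return (unsigned) ((va >> (12 + 9 * (4 - level))) & 0x1FF);
}

// Hypothetical allocator: returns one zeroed table, or NULL on failure.
toy_table* toy_allocator(void) {
    return calloc(1, sizeof(toy_table));
}

// Map the page containing `va` to `pa`. Returns 0 on success, -1 if a
// lower-level table is missing and the allocator cannot supply one.
int toy_map(toy_table* top, uint64_t va, uint64_t pa, int perm,
            toy_table* (*allocator)(void)) {
    toy_table* t = top;
    for (int level = 1; level < 4; ++level) {
        unsigned i = toy_index(va, level);
        if (t->child[i] == NULL) {
            toy_table* fresh = allocator ? allocator() : NULL;
            if (fresh == NULL) {
                return -1;
            }
            t->child[i] = fresh;
        }
        t = t->child[i];
    }
    unsigned i = toy_index(va, 4);
    t->pa[i] = pa;
    t->perm[i] = perm;
    return 0;
}

// Return the physical address mapped for `va`, or TOY_UNMAPPED.
uint64_t toy_lookup(const toy_table* top, uint64_t va) {
    const toy_table* t = top;
    for (int level = 1; level < 4; ++level) {
        t = t->child[toy_index(va, level)];
        if (t == NULL) {
            return TOY_UNMAPPED;
        }
    }
    unsigned i = toy_index(va, 4);
    return t->perm[i] ? t->pa[i] : TOY_UNMAPPED;
}
```

In this sketch, mapping 0x100000 into an empty top-level table fails when the allocator argument is NULL and succeeds when toy_allocator is passed. Once the lower-level tables exist, mapping the neighboring page 0x101000 succeeds even without an allocator.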
Process Isolation
Two Processes Sharing One Pagetable
The following is Eve's code.
#include "process.h"
#include "lib.h"
void process_main(void) {
    unsigned i = 0;
    while (1) {
        ++i;
        if (i % 1024 == 0) {
            app_printf(0, "Hi, I'm Eve! #%x\n", i);
        }
        if (i % 4096 == 0) {
            app_printf(0, "EVE REKT\n");
            // try to overwrite kernel code with "\xeb\xfe" (a jump-to-self loop)
            uint8_t* code_ptr = (uint8_t*) 0x40042;
            memcpy(code_ptr, "\xeb\xfe", 2);
            (void) sys_getpid();
        }
        sys_yield();
    }
}
This process is isolated from the kernel, so it cannot directly modify
kernel code. However, both Eve and Alice share the same pagetable, so
all of user memory is accessible to both of them. This lets Eve destroy
Alice by writing zeros to Alice's code segment. We can look at Alice's
symbol table to see where process_main is located in memory.
0x100000 T process_main
0x100040 t sys_yield
0x100050 T memcpy
0x100080 T memmove
.
.
.
Alice's process_main code segment starts at 0x100000. Eve can directly
write zeros to Alice's code segment by changing her code in the
following way:
#include "process.h"
#include "lib.h"
void process_main(void) {
    unsigned i = 0;
    while (1) {
        ++i;
        if (i % 1024 == 0) {
            app_printf(0, "Hi, I'm Eve! #%x\n", i);
        }
        if (i % 4096 == 0) {
            app_printf(0, "EVE REKT\n");
            uint8_t* code_ptr = (uint8_t*) 0x100000; // overwriting Alice's code segment now
            memset(code_ptr, 0, PAGESIZE); // setting entire page to zero
            (void) sys_getpid();
        }
        sys_yield();
    }
}
Running this version of the code will result in an error message that
reads Process 2 pagefault at 0x230 (rip 0x100042)!. Alice is trying to
execute code pages that have been zeroed by Eve. This demonstrates the
danger of allowing two processes to share the same pagetable. Process
isolation can be achieved by giving Alice and Eve separate pagetables.
In the problem set, we solve this by copying the pagetable of another
process to create a new one. In this class exercise, we will create an
entirely new pagetable.
Working with Pagetables
Creating Pagetables
Processes are launched using a call to program_load. This function
loads the program into the address mappings specified by the process
pagetable. Before program_load is called, the appropriate pagetables
should be set up for the process in question. A pagetable needs:
- Kernel Memory
  - Every process will execute system calls. In x86, the pagetable does not change when a process executes a system call, so every process's pagetable must contain a copy of the kernel's mappings.
- Own Code and Data
- Own Stack
- Write Access to Console
Based on these specifications, we can create a pagetable using calls to
virtual_memory_map.
x86_64_pagetable* pt = allocator();  // create a new pagetable
memset(pt, 0, PAGESIZE);             // clear memory

// map the kernel code; after this call succeeds, subsequent calls to
// virtual_memory_map do not need the allocator
virtual_memory_map(pt, 0, 0, PROC_START_ADDR,
                   PTE_P | PTE_W, allocator);

// give the process write access to the console
virtual_memory_map(pt, (uintptr_t) console, (uintptr_t) console,
                   PAGESIZE, PTE_P | PTE_W | PTE_U, allocator);

// map a freshly allocated page for the process stack
virtual_memory_map(pt,
                   PROC_START_ADDR + PROC_SIZE * pid - PAGESIZE,
                   (uintptr_t) allocator(), PAGESIZE,
                   PTE_P | PTE_U | PTE_W, allocator);

// locate the beginning of the process code and data
uintptr_t loadaddr = program_get_load_address(program_number);
for (int i = 0; i < 3; ++i) {
    virtual_memory_map(pt, loadaddr + i * PAGESIZE,
                       (uintptr_t) allocator(), PAGESIZE,
                       PTE_P | PTE_U | PTE_W, allocator);
}
Creating a pagetable simply consists of allocating the pagetable and
calling virtual_memory_map several times to install new address
mappings. For simplicity, this example assumes that the process has only
three pages of code and data. Notice that the allocator function is used
in two different ways:
- As the last argument to virtual_memory_map
- As the physical address argument to virtual_memory_map
When the allocator function is passed as the physical address argument
to virtual_memory_map, it is written with parentheses: allocator(). This
means that the allocator function has already been called, and its
return value is what is passed as the third argument to
virtual_memory_map. This works because the allocator function returns
the address of a free physical page. When the allocator function is
passed as the last argument to virtual_memory_map, it is written with no
parentheses. This passes the function itself, which allows
virtual_memory_map to call it only when it needs a fresh page for a
missing lower-level pagetable.
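The difference between passing allocator() and passing allocator can be seen in a small stand-alone sketch. The names demo_allocator, demo_map, and demo_pages_used are hypothetical and only mimic the shape of the real calls: one parameter receives the result of a call that has already happened, the other receives the function so that it can be called later, and only if needed.

```c
#include <stddef.h>

// Hypothetical page type and a tiny fixed pool of pages.
typedef struct demo_page {
    char bytes[4096];
} demo_page;

static demo_page demo_pool[4];
static int demo_next = 0;

// Hands out the next free page from the pool, or NULL when it is empty.
demo_page* demo_allocator(void) {
    return demo_next < 4 ? &demo_pool[demo_next++] : NULL;
}

// Number of pages handed out so far.
int demo_pages_used(void) {
    return demo_next;
}

// `target` is a value: the caller already ran the allocator to get it.
// `allocator` is a function: it runs here only if an extra table is needed.
int demo_map(demo_page* target, int need_extra_table,
             demo_page* (*allocator)(void)) {
    if (target == NULL) {
        return -1;
    }
    if (need_extra_table) {
        if (allocator == NULL || allocator() == NULL) {
            return -1;
        }
    }
    return 0;
}
```

Calling demo_map(demo_allocator(), 0, demo_allocator) consumes exactly one page, the one allocated while evaluating the first argument. With need_extra_table set to 1, the same call consumes two pages, because demo_map also invokes the function it was handed.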
Copying Pagetables
If, instead of creating a new pagetable, we wanted to copy an existing
one, we could do so using a loop with virtual_memory_lookup and
virtual_memory_map. The copied pagetable will then have mappings to the
same physical pages as the original pagetable. In other words, the L4
pagetables will have identical contents. Copying a pagetable is useful
when forking. Recall that when a parent process calls fork, the child
inherits the address space of the parent. After the child process begins
to execute independently, any modifications to the child pagetable
affect only the child's pagetable. Once the child also has its own
copies of the writable pages, rather than mappings to the parent's
physical pages, a write by the child to a variable x is visible to the
child while the parent still sees the old value.
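The difference between copying only the mappings and giving the child private pages can be sketched with a toy model. Everything here is an assumption made for illustration: toy_pt is a flat, single-level table rather than a four-level x86-64 pagetable, toy_phys stands in for physical memory, and toy_pt_copy and toy_pt_fork are hypothetical helpers, not the problem set's virtual_memory_lookup and virtual_memory_map loop.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NPAGES 8
#define TOY_PAGESIZE 64
#define TOY_UNMAPPED (-1)

static unsigned char toy_phys[TOY_NPAGES][TOY_PAGESIZE]; // toy physical memory
static int toy_next_free = 0;

// Flat toy pagetable: virtual page number -> physical page number.
typedef struct toy_pt {
    int pa[TOY_NPAGES];
} toy_pt;

void toy_pt_init(toy_pt* pt) {
    for (int vp = 0; vp < TOY_NPAGES; ++vp) {
        pt->pa[vp] = TOY_UNMAPPED;
    }
}

// Returns the next free physical page number, or TOY_UNMAPPED if none is left.
int toy_alloc_page(void) {
    return toy_next_free < TOY_NPAGES ? toy_next_free++ : TOY_UNMAPPED;
}

// Plain copy: the new table points at the same physical pages.
void toy_pt_copy(toy_pt* dst, const toy_pt* src) {
    for (int vp = 0; vp < TOY_NPAGES; ++vp) {
        dst->pa[vp] = src->pa[vp];
    }
}

// fork-style copy: every mapped page gets a private duplicate.
int toy_pt_fork(toy_pt* dst, const toy_pt* src) {
    for (int vp = 0; vp < TOY_NPAGES; ++vp) {
        dst->pa[vp] = TOY_UNMAPPED;
        if (src->pa[vp] == TOY_UNMAPPED) {
            continue;
        }
        int fresh = toy_alloc_page();
        if (fresh == TOY_UNMAPPED) {
            return -1;
        }
        memcpy(toy_phys[fresh], toy_phys[src->pa[vp]], TOY_PAGESIZE);
        dst->pa[vp] = fresh;
    }
    return 0;
}

// Address of one byte as seen through `pt`, or NULL if the page is unmapped.
unsigned char* toy_byte(const toy_pt* pt, int vp, int offset) {
    return pt->pa[vp] == TOY_UNMAPPED ? NULL : &toy_phys[pt->pa[vp]][offset];
}
```

In this model, a write made through a table produced by toy_pt_copy is visible through the original table, because both name the same physical page. A write made through a table produced by toy_pt_fork is not, which matches the parent and child seeing different values of x.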
This is a key distinction between threads and processes. Each process has its own pagetable, whereas multiple threads in a process share the same pagetable. This makes threads vulnerable to race conditions, because multiple threads can modify the same shared resources. Just as Eve could change Alice's memory when the two shared a pagetable, one thread is free to change the memory state of another thread. Careful synchronization is needed when working with threads.
Linking Processes to Pagetables
After a pagetable has been created, the struct proc structure should be
modified so that every process has an x86_64_pagetable* p_pagetable
field. This associates each process with its own unique pagetable. After
creating a new pagetable, you can assign it to a given process by
storing it in that process's p_pagetable field.
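A minimal sketch of this association follows. The field name p_pagetable comes from the text above. Everything else (toy_pagetable, toy_proc, p_pid, the fixed arrays, and toy_process_setup) is hypothetical and stands in for the kernel's real process table.

```c
#include <stddef.h>

#define TOY_NPROC 4

// Hypothetical stand-in for a top-level pagetable.
typedef struct toy_pagetable {
    unsigned long entry[512];
} toy_pagetable;

// Hypothetical process descriptor with a per-process pagetable pointer.
typedef struct toy_proc {
    int p_pid;
    toy_pagetable* p_pagetable;  // this process's own pagetable
} toy_proc;

static toy_proc toy_processes[TOY_NPROC];
static toy_pagetable toy_tables[TOY_NPROC];

// Give process `pid` its own pagetable; returns NULL for an invalid pid.
toy_proc* toy_process_setup(int pid) {
    if (pid < 0 || pid >= TOY_NPROC) {
        return NULL;
    }
    toy_proc* p = &toy_processes[pid];
    p->p_pid = pid;
    p->p_pagetable = &toy_tables[pid];
    return p;
}
```

After toy_process_setup(1) and toy_process_setup(2), the two descriptors hold different p_pagetable pointers, so mappings installed for one process do not appear in the other's address space.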
Writing an Allocator Function
Each call to virtual_memory_map above referred to an allocator
function. In this section we see how to write an effective allocator
function.
The kernel has a stack and some code and data. Between the code and data and the stack there are several free pages, which the allocator function returns one page at a time. Because nothing stops the allocator at the stack, it can overwrite the kernel's stack if it is called enough times.
x86_64_pagetable* allocator(void) {
    static uintptr_t next_free_page;
    if (next_free_page == 0) {
        // the 'end' variable marks the end of the kernel's code and data
        next_free_page = ROUNDUP((uintptr_t) end, PAGESIZE);
    }
    x86_64_pagetable* pt = (x86_64_pagetable*) next_free_page;
    next_free_page += PAGESIZE;
    return pt;
}
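One way to keep such an allocator from running into the kernel's stack is to give it an upper bound. The sketch below is an assumption-laden variant, not the course's allocator: toy_allocator_init, toy_bounded_allocator, and the region bounds are hypothetical, and in a real kernel the bounds would come from the end of the kernel's data and the bottom of its stack.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGESIZE 4096
#define TOY_ROUNDUP(a, n) ((((a) + (n) - 1) / (n)) * (n))

static uintptr_t toy_next_free_page;  // next page to hand out
static uintptr_t toy_limit;           // first address that must not be used

// Set the free region [region_start, region_end); both bounds are assumptions.
void toy_allocator_init(uintptr_t region_start, uintptr_t region_end) {
    toy_next_free_page = TOY_ROUNDUP(region_start, TOY_PAGESIZE);
    toy_limit = region_end;
}

// Bump allocator that refuses to cross the limit instead of overwriting
// whatever lies beyond it. Returns NULL when no whole page is left.
void* toy_bounded_allocator(void) {
    if (toy_next_free_page == 0
        || toy_next_free_page + TOY_PAGESIZE > toy_limit) {
        return NULL;
    }
    void* page = (void*) toy_next_free_page;
    toy_next_free_page += TOY_PAGESIZE;
    return page;
}
```

With a region from 0x10001 to 0x13000, this sketch rounds the start up to 0x11000, hands out the pages at 0x11000 and 0x12000, and then returns NULL. A caller such as virtual_memory_map would need to treat that NULL as a failed allocation.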
With these components in place, process isolation can succeed. Going back to Alice and Eve, Eve will no longer be able to overwrite Alice's code segment, because Alice's memory is no longer accessible to Eve. Eve will be killed for trying to access invalid memory.