2017/Kernel4

From CS61

Pagetables and Process Isolation

Virtual Memory Map

 int virtual_memory_map(x86_64_pagetable *pt, uintptr_t va, uintptr_t pa, size_t sz, int perm, x86_64_pagetable*(*allocator)(void));

Your problem set has a virtual_memory_map function whose job is to create mappings from virtual to physical memory. An x86-64 pagetable has four levels. The L1 pagetable is the top level, and the entries in the bottom-level L4 pagetables point to actual physical pages.

However, virtual_memory_map cannot always perform the mapping on its own. To map a virtual address, it must walk from the L1 pagetable down through an L2, an L3, and an L4 pagetable. If any of those pagetables does not exist yet, a new page must be allocated to hold it. Without a way to get that page, virtual_memory_map fails. The last argument, the allocator function, makes multi-level pagetables easier to work with. virtual_memory_map calls it whenever it needs a fresh page for an intermediate pagetable, so you do not have to allocate those pages yourself. To tell the allocator which process should own a newly allocated page, we can use global variables.
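The page walk described above can be simulated in user space. This is a minimal sketch, not the WeensyOS source. The names pagetable, heap_allocator, map_page and lookup_pa are hypothetical, and "physical" addresses are simply heap addresses. The sketch gives intermediate entries PTE_P|PTE_W|PTE_U so that the final L4 entry alone decides access. It shows that the allocator runs only when a level is missing.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGESIZE 4096
#define PTE_P 0x1   // present
#define PTE_W 0x2   // writable
#define PTE_U 0x4   // user-accessible
#define PTE_ADDR(pte) ((pte) & ~(uintptr_t) 0xFFF)
// index of va within the table at level lvl (0 = L1, the top ... 3 = L4, the bottom)
#define PAGEINDEX(va, lvl) (((va) >> (12 + 9 * (3 - (lvl)))) & 0x1FF)

typedef struct { uintptr_t entry[512]; } pagetable;   // hypothetical stand-in for x86_64_pagetable

int pages_allocated = 0;   // counts every page the allocator hands out

// hypothetical allocator: returns a zeroed, page-aligned page from the C heap
pagetable* heap_allocator(void) {
    pagetable* pt = aligned_alloc(PAGESIZE, PAGESIZE);
    if (pt != NULL) {
        memset(pt, 0, PAGESIZE);
        ++pages_allocated;
    }
    return pt;
}

// Map one page va -> pa. Returns 0 on success, -1 if a missing L2/L3/L4
// table is needed and no allocator (or no memory) is available.
int map_page(pagetable* l1, uintptr_t va, uintptr_t pa, int perm,
             pagetable* (*allocator)(void)) {
    pagetable* pt = l1;
    for (int lvl = 0; lvl < 3; ++lvl) {
        uintptr_t* pte = &pt->entry[PAGEINDEX(va, lvl)];
        if (!(*pte & PTE_P)) {
            // the next-level table does not exist yet: only now is the allocator called
            pagetable* next = allocator != NULL ? allocator() : NULL;
            if (next == NULL) {
                return -1;
            }
            *pte = (uintptr_t) next | PTE_P | PTE_W | PTE_U;
        }
        pt = (pagetable*) PTE_ADDR(*pte);
    }
    pt->entry[PAGEINDEX(va, 3)] = pa | perm;   // the L4 entry points at the physical page
    return 0;
}

// Return the physical address va maps to, or 0 if it is unmapped.
uintptr_t lookup_pa(pagetable* l1, uintptr_t va) {
    pagetable* pt = l1;
    for (int lvl = 0; lvl < 3; ++lvl) {
        uintptr_t pte = pt->entry[PAGEINDEX(va, lvl)];
        if (!(pte & PTE_P)) {
            return 0;
        }
        pt = (pagetable*) PTE_ADDR(pte);
    }
    uintptr_t pte = pt->entry[PAGEINDEX(va, 3)];
    return (pte & PTE_P) ? PTE_ADDR(pte) | (va & 0xFFF) : 0;
}
```

Mapping 0x100000 into an empty L1 table fails when the allocator is NULL. With heap_allocator it succeeds after creating three tables (L2, L3, L4). Mapping 0x101000 afterward allocates nothing, because it lies in the same L4 table.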

Process Isolation

Two Processes Sharing One Pagetable

The following is Eve's code.

 #include "process.h"
 #include "lib.h"
 void process_main(void) {
     unsigned i = 0;
     while (1) {
         ++i;
         if (i % 1024 == 0) {
             app_printf(0, "Hi, I'm Eve! #%x\n", i);
         }
         if (i % 4096 == 0) {
             app_printf(0, "EVE REKT\n");
              // try to overwrite kernel code at 0x40042 with 0xEB 0xFE, an infinite loop (jmp .)
              uint8_t* code_ptr = (uint8_t*) 0x40042;
              memcpy(code_ptr, "\xeb\xfe", 2);
             (void) sys_getpid();
         }
         sys_yield();
     }
 }

This process is isolated from the kernel, so it cannot directly modify kernel code. However, Eve and Alice share the same pagetable, so all of user memory is accessible to both of them. This lets Eve destroy Alice by writing zeros over Alice's code segment. Alice's symbol table shows where process_main is located in memory:

 0x100000 T process_main 
 0x100040 t sys_yield 
 0x100050 T memcpy 
 0x100080 T memmove 
 ...

Alice's process_main code segment starts at 0x100000. Eve can directly write zeros to Alice's code segment by changing her code in the following way:

 #include "process.h"
 #include "lib.h"
 void process_main(void) {
     unsigned i = 0;
     while (1) {
         ++i;
         if (i % 1024 == 0) {
             app_printf(0, "Hi, I'm Eve! #%x\n", i);
         }
         if (i % 4096 == 0) {
             app_printf(0, "EVE REKT\n");
              uint8_t* code_ptr = (uint8_t*) 0x100000;   // overwriting Alice's code segment now
              memset(code_ptr, 0, PAGESIZE);             // setting entire page to zero
             (void) sys_getpid();
         }
         sys_yield();
     }
 }

Running this version of the code results in the error message "Process 2 page fault at 0x230 (rip 0x100042)!". Alice is trying to execute code pages that Eve has zeroed. Two zero bytes decode as the x86 instruction add %al,(%rax), so executing zeroed code makes Alice write to whatever address is in %rax. Here that address is 0x230, which she is not allowed to access. This demonstrates the danger of allowing two processes to share the same pagetable.

Process isolation can be achieved by giving Alice and Eve separate pagetables. In the problem set, we do this by copying another process's pagetable to create a new one. In this class exercise, we will create an entirely new pagetable.

Working with Pagetables

Creating Pagetables

Processes are launched with a call to program_load. This function loads the program into memory using the address mappings in the process's pagetable, so that pagetable must already be set up before program_load is called. A pagetable needs:

  • Kernel Memory
    • Every process will execute system calls. On x86, the hardware does not switch pagetables when a process executes a system call, so every process's pagetable must also contain the kernel's mappings.
  • Own Code and Data
  • Own Stack
  • Write Access to Console

Based on these specifications, we can create a pagetable using calls to virtual_memory_map.

 x86_64_pagetable* pt = allocator();                  // allocate the new top-level (L1) pagetable
 memset(pt, 0, PAGESIZE);                             // start with no mappings
 // kernel memory: identity-map [0, PROC_START_ADDR), kernel-only (no PTE_U).
 // This call also allocates the L2-L4 pagetables for low memory; later calls
 // in the same range reuse them instead of allocating again.
 virtual_memory_map(pt, 0, 0, PROC_START_ADDR,
                    PTE_P | PTE_W, allocator);
 // console: same physical page, now also user-accessible
 virtual_memory_map(pt, (uintptr_t) console, (uintptr_t) console,
                    PAGESIZE, PTE_P | PTE_W | PTE_U, allocator);
 // stack: one fresh physical page at the top of this process's region
 virtual_memory_map(pt, PROC_START_ADDR + PROC_SIZE * pid - PAGESIZE,
                    (uintptr_t) allocator(), PAGESIZE,
                    PTE_P | PTE_W | PTE_U, allocator);
 // code and data: three fresh physical pages starting at the load address
 uintptr_t loadaddr = program_get_load_address(program_number);
 for (int i = 0; i < 3; ++i) {
     virtual_memory_map(pt, loadaddr + i * PAGESIZE,
                        (uintptr_t) allocator(), PAGESIZE,
                        PTE_P | PTE_W | PTE_U, allocator);
 }

Creating a pagetable simply consists of allocating the pagetable and calling virtual_memory_map several times to install new address mappings. For simplicity, this example assumes that the process has only three pages of code and data. Notice that the allocator function is used in two different ways:

  • As the last argument to virtual_memory_map
  • As the physical address argument to virtual_memory_map

When the allocator is passed as the physical address argument, it is written with parentheses: allocator(). This means the allocator function is called before virtual_memory_map runs. Its return value, the address of a free physical page, is passed as the third argument. When the allocator is passed as the last argument, it is written without parentheses. This passes a pointer to the function itself, so virtual_memory_map can call it only if and when it needs a new page for an L2, L3, or L4 pagetable.
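The difference between allocator() and allocator can be seen by counting calls. In this hedged sketch, fake_map is a hypothetical stand-in for virtual_memory_map and pool_allocator is a hypothetical allocator. The first argument arrives as an already-computed number. The last argument arrives as a function pointer that fake_map calls only when it decides it needs a new table.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t entry[512]; } x86_64_pagetable;   // stand-in type

int allocator_calls = 0;   // how many times the allocator has actually run

// hypothetical allocator handing out pages from a small static pool
x86_64_pagetable* pool_allocator(void) {
    static x86_64_pagetable pool[8];
    if (allocator_calls >= 8) {
        return NULL;
    }
    return &pool[allocator_calls++];
}

// hypothetical stand-in for virtual_memory_map: `pa` arrives as a value that
// was computed before the call, while `allocator` arrives as a function that
// fake_map decides for itself whether to call
int fake_map(uintptr_t pa, int need_new_table, x86_64_pagetable* (*allocator)(void)) {
    (void) pa;   // a real implementation would store pa in an L4 entry
    if (need_new_table) {
        if (allocator == NULL || allocator() == NULL) {
            return -1;   // a table is needed but cannot be obtained
        }
    }
    return 0;
}
```

Call fake_map((uintptr_t) pool_allocator(), 0, pool_allocator) and the allocator runs exactly once, while the arguments are evaluated. Pass need_new_table = 1 and it runs a second time, inside fake_map, through the pointer.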

Copying Pagetables

If instead of creating a new pagetable we wanted to copy an existing one, we could loop over the address space. Each iteration calls virtual_memory_lookup on the original pagetable and virtual_memory_map on the copy. The copy gets its own pagetable pages, but its mappings point to the same physical pages as the original. In other words, the entries in the L4 pagetables will be identical.

Copying a pagetable is useful when forking. Recall that when a parent process calls fork, the child inherits the address space of the parent. After the child begins to execute independently, any modification to the child's pagetable affects only the child's pagetable. Memory contents are a different matter. If the copied mappings still point to the parent's physical pages, a write by either process is visible to the other. So fork must also give the child its own copies of the writable pages, or share them copy-on-write. Then, if the child modifies a variable x, the child will see the new value but the parent will still see the old value.
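The difference between copying only the mappings and also copying the pages can be seen in a small simulation. This is a simplified model under loud assumptions: each process has one flat table instead of four levels, and "physical memory" is a static array. pt_lookup and pt_map are hypothetical stand-ins loosely modeled on virtual_memory_lookup and virtual_memory_map, not their real signatures.

```c
#include <stdint.h>
#include <string.h>

#define PAGESIZE 4096
#define NPAGES   16   // pages of simulated physical memory
#define NVPAGES  16   // pages of simulated virtual address space
#define PTE_P 0x1
#define PTE_W 0x2
#define PTE_U 0x4

// Simplified model: one flat table per process instead of four levels.
typedef struct { uintptr_t pte[NVPAGES]; } pagetable;
typedef struct { uintptr_t pa; int perm; } mapping;   // pa 0, perm 0 if unmapped

uint8_t physmem[NPAGES][PAGESIZE];   // "physical" page n lives at pa n * PAGESIZE
int next_phys = 1;                   // page 0 is reserved so pa 0 means "none"

uintptr_t alloc_phys(void) {
    return next_phys < NPAGES ? (uintptr_t) (next_phys++) * PAGESIZE : 0;
}

// va must be below NVPAGES * PAGESIZE in this model
mapping pt_lookup(pagetable* pt, uintptr_t va) {
    uintptr_t e = pt->pte[va / PAGESIZE];
    mapping m = { e & ~(uintptr_t) 0xFFF, (int) (e & 0xFFF) };
    return m;
}

void pt_map(pagetable* pt, uintptr_t va, uintptr_t pa, int perm) {
    pt->pte[va / PAGESIZE] = pa | perm;
}

// access memory through a pagetable (no permission checks in this model)
uint8_t* user_ptr(pagetable* pt, uintptr_t va) {
    mapping m = pt_lookup(pt, va);
    return &physmem[m.pa / PAGESIZE][va % PAGESIZE];
}

// copy mappings only: both tables now point at the same physical pages
void copy_mappings(pagetable* dst, pagetable* src) {
    for (uintptr_t va = 0; va < NVPAGES * PAGESIZE; va += PAGESIZE) {
        mapping m = pt_lookup(src, va);
        pt_map(dst, va, m.pa, m.perm);
    }
}

// fork-style copy: writable user pages get private physical copies
int fork_copy(pagetable* dst, pagetable* src) {
    const int rwu = PTE_P | PTE_W | PTE_U;
    for (uintptr_t va = 0; va < NVPAGES * PAGESIZE; va += PAGESIZE) {
        mapping m = pt_lookup(src, va);
        if ((m.perm & rwu) == rwu) {
            uintptr_t pa = alloc_phys();
            if (pa == 0) {
                return -1;   // out of simulated physical memory
            }
            memcpy(physmem[pa / PAGESIZE], physmem[m.pa / PAGESIZE], PAGESIZE);
            pt_map(dst, va, pa, m.perm);
        } else {
            pt_map(dst, va, m.pa, m.perm);   // unmapped, read-only, or kernel: share
        }
    }
    return 0;
}
```

After copy_mappings, a write through the copy changes what the original sees. After fork_copy, the child's write to x leaves the parent's x unchanged.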

This is a key distinction between threads and processes. Each process has its own pagetable, while all the threads in a process share the same pagetable. This makes threads vulnerable to race conditions, because multiple threads can modify the same memory. As we saw in the Alice and Eve example, anything that shares a pagetable is free to change memory that another depends on. Careful synchronization is needed when working with threads.
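The thread/process distinction can be observed directly with POSIX pthread_create and fork. In this sketch, a helper thread and a child process each set a global x to 42, and the caller reports the value it sees afterward. The helper function names are illustrative. Compile with -pthread.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int x = 0;   // a global: one copy per address space

void* thread_body(void* arg) {
    (void) arg;
    x = 42;   // same pagetable as the creating thread: the write is shared
    return NULL;
}

// value of x the caller sees after a *thread* sets it to 42
int x_after_thread_write(void) {
    x = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, thread_body, NULL) != 0) {
        return -1;
    }
    pthread_join(t, NULL);
    return x;
}

// value of x the parent sees after a *child process* sets it to 42
int x_after_child_write(void) {
    x = 0;
    pid_t p = fork();
    if (p < 0) {
        return -1;
    }
    if (p == 0) {   // child: runs in its own copy of the address space
        x = 42;
        _exit(0);
    }
    waitpid(p, NULL, 0);
    return x;
}
```

x_after_thread_write() returns 42, because the thread wrote to the caller's memory. x_after_child_write() returns 0, because the child's write went to the child's own copy of the page.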

Linking Processes to Pagetables

To associate each process with its own unique pagetable, struct proc needs an x86_64_pagetable* p_pagetable field. After creating a new pagetable, assign it to the p_pagetable field of the process it belongs to.
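The field and the assignment are sketched below with stand-in types. The trimmed-down struct proc, assign_pagetable, switch_to and active_pagetable are all hypothetical. active_pagetable simulates the x86 %cr3 register, which holds the physical address of the pagetable the processor is currently using. The sketch shows why each process needs its own field: switching processes means making that process's table the active one.

```c
#include <stdint.h>

#define NPROC 16   // hypothetical process-table size

typedef struct { uintptr_t entry[512]; } x86_64_pagetable;   // stand-in type

// hypothetical, trimmed-down process descriptor
typedef struct proc {
    int p_pid;
    x86_64_pagetable* p_pagetable;   // this process's own top-level pagetable
} proc;

proc processes[NPROC];
x86_64_pagetable* active_pagetable;   // simulated %cr3: the table in use

// link a freshly built pagetable to process `pid`
void assign_pagetable(int pid, x86_64_pagetable* pt) {
    processes[pid].p_pid = pid;
    processes[pid].p_pagetable = pt;
}

// on a switch to `pid`, that process's pagetable becomes the active one
void switch_to(int pid) {
    active_pagetable = processes[pid].p_pagetable;
}
```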

Writing an Allocator Function

Each call to virtual_memory_map above referred to an allocator function. In this section we see how to write an effective allocator function.

The kernel has a stack and some code and data. Between the end of the code and data and the stack lie several free pages, which the allocator function returns one page at a time. The simple allocator below never checks how far it has gone, so if it is called enough times it will hand out pages that overlap, and let callers overwrite, the kernel's stack.

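 x86_64_pagetable* allocator(void) {
     static uintptr_t next_free_page = 0;
     if (next_free_page == 0) {
         // first call: start at the first page boundary after the kernel's code and data
         next_free_page = ROUNDUP((uintptr_t) end, PAGESIZE);   // 'end' marks the end of the kernel's code and data
     }
     x86_64_pagetable* pt = (x86_64_pagetable*) next_free_page;
     next_free_page += PAGESIZE;
     memset(pt, 0, PAGESIZE);   // hand out zeroed pages, safe to use as new pagetables
     // no bounds check: enough calls will run into the kernel's stack
     return pt;
 }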
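The allocator above can be tried out in user space with two additions. The first is a limit, so it returns NULL instead of running into the stack. The second is a global allocator_owner, which records which process each page is handed out to (the "global variables" idea from earlier). This is a hedged sketch. The static arena stands in for the free region between 'end' and the kernel stack, and NARENA, arena, allocator_owner and page_owner are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGESIZE 4096
#define NARENA   8   // hypothetical: number of free pages before the kernel stack
#define ROUNDUP(a, n) (((a) + (n) - 1) / (n) * (n))

typedef struct { uintptr_t entry[512]; } x86_64_pagetable;   // stand-in type

// stands in for the free pages between the kernel's data ('end') and its stack
static _Alignas(PAGESIZE) uint8_t arena[NARENA * PAGESIZE];

int allocator_owner = -1;   // global: which process the next page belongs to
int page_owner[NARENA];     // owner recorded for each page handed out

x86_64_pagetable* allocator(void) {
    static uintptr_t next_free_page = 0;
    uintptr_t limit = (uintptr_t) arena + sizeof(arena);   // stands in for the stack bottom
    if (next_free_page == 0) {
        next_free_page = ROUNDUP((uintptr_t) arena, PAGESIZE);   // stands in for ROUNDUP(end)
    }
    if (next_free_page + PAGESIZE > limit) {
        return NULL;   // out of free pages: refuse rather than overwrite the stack
    }
    x86_64_pagetable* pt = (x86_64_pagetable*) next_free_page;
    page_owner[(next_free_page - (uintptr_t) arena) / PAGESIZE] = allocator_owner;
    next_free_page += PAGESIZE;
    memset(pt, 0, PAGESIZE);   // new pagetables must start with no mappings
    return pt;
}
```

A caller sets allocator_owner to a process ID before calling virtual_memory_map, so every page allocated during that call is recorded as belonging to that process. Returning NULL when the region is exhausted lets virtual_memory_map report failure instead of corrupting the stack.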

With these components in place, process isolation works. Going back to Alice and Eve, Eve will no longer be able to overwrite Alice's code segment, because Alice's memory is no longer accessible to Eve. Instead, Eve will be killed for trying to access invalid memory.
