Kernel4 Solutions – CS 61 2020

P1. Examining memory

The first question is how to find the addresses corresponding to these symbols. Well, here’s one way: ask GDB, which knows the addresses for things.

cs61-user@9d076324193b:~/cs61-lectures/kernel4$ make run
* Run `gdb -x build/weensyos.gdb` to connect gdb to qemu.
...
kohler@elsewhere$ gdb -x build/demoos.gdb
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
...blah blah blah...
(gdb) p syscall_entry
$1 = {<text variable, no debug info>} 0x40ad6 <syscall_entry>
(gdb) p process_main
$2 = {void (void)} 0x100000 <process_main()>
(gdb) p kernel_pagetable
$3 = 0x4e000 <kernel_pagetable>
(gdb)

Then you can pass those addresses to vmiter.

Another way is to take advantage of the symbol and assembly files created in the obj/ directory. If you’re not sure what file to check, use grep to check them all:

cs61-user@9d076324193b:~/cs61-lectures/kernel4$ grep syscall_entry obj/*
Binary file obj/kernel matches
obj/kernel.asm:0000000000040ad6 <syscall_entry>:
obj/kernel.asm:    wrmsr(MSR_IA32_LSTAR, reinterpret_cast<uint64_t>(syscall_entry));
Binary file obj/kernel.full matches
obj/kernel.sym:0000000000040ad6 T syscall_entry
Binary file obj/k-exception.ko matches
Binary file obj/k-hardware.ko matches
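
If we wanted to pull an address out of obj/kernel.sym programmatically rather than by eye, a small parser would do. The sketch below is not part of WeensyOS: `symbol_entry` and `parse_symbol_line` are hypothetical names, and it assumes each line has the three-field form shown above (hex address, type letter, symbol name) with grep's `obj/kernel.sym:` prefix already stripped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>

// One line of a symbol file, e.g. "0000000000040ad6 T syscall_entry".
struct symbol_entry {
    uint64_t address;
    char type;
    std::string name;
};

// Hypothetical helper: parse one line into `out`.
// Returns false if the line does not have the expected three fields
// or if the first field is not a hexadecimal number.
bool parse_symbol_line(const std::string& line, symbol_entry& out) {
    std::istringstream in(line);
    std::string addr_text;
    if (!(in >> addr_text >> out.type >> out.name)) {
        return false;
    }
    try {
        std::size_t used = 0;
        out.address = std::stoull(addr_text, &used, 16);
        return used == addr_text.size();
    } catch (const std::exception&) {
        return false;
    }
}
```

Parsing the syscall_entry line from the grep output this way should give the same address GDB reported.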

Or you can get addresses from the compiler—though this is not as easy as it might first appear.

`syscall_entry`

The syscall_entry function is defined in k-exception.S as an assembly function, so we cannot access it directly from inside kernel.cc: the C++ compiler will not know what the symbol refers to. To access it from kernel.cc, we can declare syscall_entry as extern, meaning the implementation is external and will be provided during linking.

Placing this line of code above the definition of kernel_start will allow us to access the function now!

extern "C" { extern void syscall_entry(); }

We mark the function with extern "C" because the name of the function in assembly code is just syscall_entry and we want to prevent the C++ compiler from automatically doing name mangling of function names.

Now we can look up the address of this function in our kernel_pagetable and print it (this code goes in kernel_start right before the call to process_setup):

auto se = vmiter(kernel_pagetable, (uintptr_t) &syscall_entry);
log_printf("syscall_entry pa: %p\n", (uintptr_t) se.pa());

Kernel pagetable

The kernel_pagetable variable has type x86_64_pagetable*. This means it is a pointer to the actual page table memory. We can just convert that virtual address to a physical address and print it:

auto kp = vmiter(kernel_pagetable, (uintptr_t) kernel_pagetable);
log_printf("kernel_pagetable pa: %p\n", kp.pa());

Process main

The process_main function is not linked into the kernel, so it is not accessible using the same trick we used to get the syscall_entry function. However, we can look in process_setup for some ideas! The address of process_main will be the first address that gets executed by that process, so that process’s initial value of %rip will be the address of process_main. We see the line

p->regs.reg_rip = pgm.entry();

in process_setup, so we can use the same functionality to print the address of process_main in kernel_start:

program_image pgm(WEENSYOS_FIRST_PROCESS);
auto pm = vmiter(kernel_pagetable, (uintptr_t) pgm.entry());
log_printf("process_main pa: %p\n", pm.pa());

Do the physical addresses change?

If we run the same print statements after the call to process_setup, the addresses do not appear to change. This makes sense, because process_setup does not modify any of the virtual addresses, and does not perform any mappings that are not identity mappings.

P2. Examining permissions

We can use the same vmiter objects as before, but print the permission bits instead:

log_printf("syscall_entry perm: 0x%x\n", se.perm());
log_printf("kernel_pagetable perm: 0x%x\n", kp.perm());
log_printf("process_main perm: 0x%x\n", pm.perm());

Do the permissions change?

After printing the permissions before and after process_setup it looks like the permissions for the process_main address have changed. This makes sense because that page needs to be user-accessible so that the user process p-hello can access its own code/data to execute. In particular, the page for process_main is now marked with the PTE_U bit, and it wasn’t before calling process_setup.

Some other bits also change. In particular, the permissions before process_setup equal 0x3 (PTE_W|PTE_P), but afterward you might see 0x27 (39 in decimal). x86-64.h says that these bits are PTE_U|PTE_W|PTE_P|PTE_A. The PTE_A bit is set automatically by the processor hardware to indicate that the relevant address range has been Accessed.
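
To double-check a value like 0x27 by hand, it helps to split it into named bits. Here is a small standalone sketch (not kernel code). The flag values are the standard x86-64 ones and are redefined locally so the snippet compiles on its own; `describe_perm` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// x86-64 page table entry flag bits, redefined here for a standalone build.
constexpr uint64_t PTE_P = 0x1;   // present
constexpr uint64_t PTE_W = 0x2;   // writable
constexpr uint64_t PTE_U = 0x4;   // user-accessible
constexpr uint64_t PTE_A = 0x20;  // accessed (set by the processor)

// Hypothetical helper: name the flag bits that are set in `perm`.
// Bits other than the four above are ignored.
std::string describe_perm(uint64_t perm) {
    struct { uint64_t bit; const char* name; } flags[] = {
        {PTE_U, "PTE_U"}, {PTE_W, "PTE_W"}, {PTE_P, "PTE_P"}, {PTE_A, "PTE_A"},
    };
    std::string out;
    for (const auto& f : flags) {
        if (perm & f.bit) {
            if (!out.empty()) {
                out += "|";
            }
            out += f.name;
        }
    }
    return out.empty() ? std::string("0") : out;
}
```

With this, 0x3 decodes to PTE_W|PTE_P and 0x27 decodes to PTE_U|PTE_W|PTE_P|PTE_A, matching the before and after values described above.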

P3. Examining page table structures

Let’s use a ptiter to print the page table pages:

for (ptiter it(kernel_pagetable); it.va() < MEMSIZE_VIRTUAL; it.next()) {
    log_printf("[%p, %p): level-%d ptp at pa %p\n",
               it.va(), it.last_va(), it.level() + 1, it.kptr());
}

I see

[0x0, 0x200000): level-1 ptp at pa 0x51000
[0x200000, 0x400000): level-1 ptp at pa 0x52000
[0x0, 0x40000000): level-2 ptp at pa 0x50000
[0x0, 0x8000000000): level-3 ptp at pa 0x4f000

Note. If you ran this code on the original handout, you would have seen only one page table page, the first. This is because the handout OS’s MEMSIZE_VIRTUAL was originally too small.
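
The address ranges in that output follow from the x86-64 page table layout: each page table page holds 512 entries, so each level covers 512 times as much virtual address space as the level below it. A quick standalone check, using the same 1-based level numbers as the output (`ptp_span` and `ptp_first_va` are hypothetical helper names, not part of ptiter):

```cpp
#include <cassert>
#include <cstdint>

// Bytes of virtual address space covered by one page table page at
// `level` (1 = lowest level, whose entries map 4096-byte pages).
constexpr uint64_t ptp_span(int level) {
    return uint64_t(1) << (12 + 9 * level);
}

// First virtual address covered by the level-`level` page table page
// that contains `va`.
constexpr uint64_t ptp_first_va(uint64_t va, int level) {
    return va & ~(ptp_span(level) - 1);
}
```

Here ptp_span(1), ptp_span(2), and ptp_span(3) come out to 0x200000, 0x40000000, and 0x8000000000, the sizes of the ranges printed for levels 1, 2, and 3.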

We can modify our code slightly to print the contents of each page table page.

for (ptiter it(kernel_pagetable); it.va() < MEMSIZE_VIRTUAL; it.next()) {
    uint64_t* bytes = (uint64_t*) it.kptr();
    for (unsigned i = 0; i < PAGESIZE / sizeof(uint64_t); i++) {
        log_printf("0x%lx\n", bytes[i]);
    }
}

We see a lot of numbers like

0x0
0x1003
0x2003
0x3003
0x4003
0x5003
0x6003
0x7003
0x8003
0x9003
0xa003
0xb003
0xc003
0xd003
...

Specifically, almost all entries end with 3 (which we know from P2 is PTE_P|PTE_W), but some end with 7 (which is PTE_P|PTE_W|PTE_U). The part above the low three hex digits, which is the physical page number, increases by 1 from each entry to the next. And in the later page table pages, we see some numbers that look like addresses of earlier page table pages. Fascinating!
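
To make the pattern concrete, here is a standalone sketch that splits one entry into its physical address and its low flag bits. The mask names and `decode_entry` are hypothetical. The mask values follow the x86-64 entry layout (flags in the low 12 bits, physical address in bits 12 through 51), and flag bits stored in the high bits of an entry are ignored here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; values follow the x86-64 page table entry layout.
constexpr uint64_t ENTRY_FLAG_MASK = 0xFFFULL;               // low 12 bits
constexpr uint64_t ENTRY_ADDR_MASK = 0x000FFFFFFFFFF000ULL;  // bits 12..51

struct decoded_entry {
    uint64_t pa;     // physical address the entry points to
    uint64_t flags;  // low flag bits (PTE_P, PTE_W, PTE_U, ...)
};

// Split a raw 64-bit page table entry into address and flags.
inline decoded_entry decode_entry(uint64_t entry) {
    decoded_entry d;
    d.pa = entry & ENTRY_ADDR_MASK;
    d.flags = entry & ENTRY_FLAG_MASK;
    return d;
}
```

For example, 0x1003 splits into physical address 0x1000 with flags 0x3, and 0xd003 into physical address 0xd000 with flags 0x3.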

P4. Warped virtual memory

Without any modification, p-bigdata prints CS 61 Is Awful :(. To fix this, we can notice that there is a page boundary at &big_data[4096], which happens to correspond to the point 10 bytes into the string. The | marks the page boundary in the string below:

CS 61 Is A|wful

Let’s try printing some of the addresses in the process. We can add another print statement below the initial one:

console_printf(0x3000, "%p %p %p\n", &big_data[0], &big_data[4086], &big_data[4096]);

This prints

0x102000 0x102ff6 0x103000
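
We can confirm the page arithmetic with a few lines of standalone code. This is only a sketch: PAGESIZE is redefined locally as 4096, and the three helper functions are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr uintptr_t PAGESIZE = 4096;

// Address of the first byte of the page containing `va`.
constexpr uintptr_t page_base(uintptr_t va) {
    return va & ~(PAGESIZE - 1);
}

// Offset of `va` within its page.
constexpr uintptr_t page_offset(uintptr_t va) {
    return va & (PAGESIZE - 1);
}

// Number of bytes from `va` up to the start of the next page.
constexpr uintptr_t bytes_to_next_page(uintptr_t va) {
    return PAGESIZE - page_offset(va);
}
```

For 0x102ff6 the page base is 0x102000, the offset is 4086, and 10 bytes remain before the next page, so exactly the first 10 characters of the string (CS 61 Is A) land on the first page.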

So what if we map virtual address 0x102000 to physical address 0x103000? Then the memory space will look like this after running the two calls to strcpy:

(va: 0x102000/pa: 0x103000): mazing
...
(va: 0x102ff6/pa: 0x103ff6): CS 61 Is A
(va: 0x103000/pa: 0x103000): mazing

This looks good! When we print the bytes at virtual address 0x102ff6, we will see CS 61 Is Amazing. Remember, virtual address 0x103000 is still mapped to physical address 0x103000 because we didn’t modify that mapping!
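
If the picture above is hard to follow, a toy simulation may help. The sketch below is not WeensyOS code: `toy_memory` and `run_bigdata` are made-up names, the model covers just the two pages at 0x102000 and 0x103000, and it assumes the two strcpy calls in p-bigdata write "CS 61 Is Awful :(" at &big_data[4086] and then "mazing" at &big_data[0], which is what the layout above suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

constexpr uintptr_t PAGESIZE = 4096;

// Toy model: physical bytes starting at `base`, plus a table of remapped
// virtual pages. Virtual pages not listed in `remap` are identity-mapped.
struct toy_memory {
    uintptr_t base;
    std::vector<char> phys;
    std::map<uintptr_t, uintptr_t> remap;  // va page -> pa page

    uintptr_t translate(uintptr_t va) const {
        uintptr_t va_page = va & ~(PAGESIZE - 1);
        auto it = remap.find(va_page);
        uintptr_t pa_page = (it == remap.end()) ? va_page : it->second;
        return pa_page + (va & (PAGESIZE - 1));
    }
    // Like strcpy, but every byte goes through translate().
    void store_string(uintptr_t va, const char* s) {
        for (std::size_t i = 0; i <= std::strlen(s); ++i) {
            phys.at(translate(va + i) - base) = s[i];
        }
    }
    // Read a NUL-terminated string starting at virtual address `va`.
    std::string load_string(uintptr_t va) const {
        std::string out;
        for (char c; (c = phys.at(translate(va) - base)) != '\0'; ++va) {
            out += c;
        }
        return out;
    }
};

// Run the assumed p-bigdata sequence, with or without the warped mapping.
std::string run_bigdata(bool warp) {
    toy_memory m{0x102000, std::vector<char>(2 * PAGESIZE, '\0'), {}};
    if (warp) {
        m.remap[0x102000] = 0x103000;  // va 0x102000 -> pa 0x103000
    }
    m.store_string(0x102ff6, "CS 61 Is Awful :(");
    m.store_string(0x102000, "mazing");
    return m.load_string(0x102ff6);
}
```

In this model, run_bigdata(false) returns the original string and run_bigdata(true) returns CS 61 Is Amazing, because the second write lands on the same physical page that holds the tail of the first string.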

So if we add the following line of code to process_setup, we can perform the mapping described above:

vmiter(p->pagetable, 0x102000).map(0x103000, PTE_PWU);

(It doesn’t really matter where this code goes, but I put it after the assignment to reg_rip.)

If we didn’t want to rely on the fact that the pagetable is an identity mapping, we could look up the physical address of virtual address 0x103000 and use that for the mapping:

vmiter(p->pagetable, 0x102000).map(vmiter(p->pagetable, 0x103000).pa(), PTE_PWU);

P5. Examining faults

1.

Here is the assembly for most of the recursive f function in p-recurse (the important bits):

__noinline unsigned f(unsigned i) {
  100000:	f3 0f 1e fa          	endbr64
  100004:	55                   	push   %rbp
  100005:	48 89 e5             	mov    %rsp,%rbp
  100008:	53                   	push   %rbx
  100009:	48 83 ec 08          	sub    $0x8,%rsp
  10000d:	89 fb                	mov    %edi,%ebx
  10000f:	85 ff                	test   %edi,%edi
  100011:	75 09                	jne    10001c <f(unsigned int)+0x1c>
  100013:	89 d8                	mov    %ebx,%eax
  100015:	48 83 c4 08          	add    $0x8,%rsp
  100019:	5b                   	pop    %rbx
  10001a:	5d                   	pop    %rbp
  10001b:	c3                   	retq
  ...

The only parts we care about for determining the size of the stack frame are the function entry and exit. At the entry we see:

push   %rbp
push   %rbx
sub    $0x8,%rsp

This uses 24 bytes of space on the stack. Combined with the 64-bit return address that the call instruction pushes onto the stack for this function, it appears the stack frame for this function is 32 bytes. (Similarly, at the exit we see the corresponding instructions that undo the sub, push, and call instructions.)

2.

Experimenting with the for loop maximum (by default it’s 10), I found that 127 is the largest number for which it still succeeds.

3.

For each process, WeensyOS allocates a 1-page stack. This means the stack is 4096 bytes. If there are 127 (recursive) calls to f, and each one uses 32 bytes of stack space, we use 4064 bytes of stack space, so we are fine (remember process_main also uses some stack space). If there are 128 calls, we would need 4096 bytes of stack space for just the recursive calls, and since process_main is using some stack space too, there is not enough space!
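
The arithmetic can be written out as a standalone sketch. The frame size comes from the assembly in part 1. The amount of stack that process_main itself uses is not measured here, so `other_bytes` is a hypothetical placeholder, and `recursion_bytes` and `fits` are made-up helper names.

```cpp
#include <cassert>

constexpr unsigned STACK_BYTES = 4096;  // one page of stack per process

// Per-call stack use of f: return address + saved %rbp + saved %rbx
// + the 8 bytes reserved by `sub $0x8,%rsp`.
constexpr unsigned FRAME_BYTES = 8 + 8 + 8 + 8;

// Stack bytes consumed by `depth` nested calls to f.
constexpr unsigned recursion_bytes(unsigned depth) {
    return depth * FRAME_BYTES;
}

// True if `depth` nested calls plus `other_bytes` of unrelated stack use
// (such as process_main's own frame) fit in the stack.
constexpr bool fits(unsigned depth, unsigned other_bytes,
                    unsigned stack_bytes = STACK_BYTES) {
    return recursion_bytes(depth) + other_bytes <= stack_bytes;
}
```

Any `other_bytes` value from 1 to 32 gives the observed behavior: 127 calls fit and 128 do not.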

4.

Let’s allocate two stack pages for every process instead of just 1.

// First stack page: the topmost page below last_addr.
uintptr_t stack_addr = last_addr - PAGESIZE;
pages[stack_addr / PAGESIZE].refcount = 1;
vmiter(p->pagetable, stack_addr).map(stack_addr, PTE_P | PTE_W | PTE_U);

// Second stack page, directly below the first; the stack grows down into it.
uintptr_t stack_addr_extra = last_addr - PAGESIZE - PAGESIZE;
pages[stack_addr_extra / PAGESIZE].refcount = 1;
vmiter(p->pagetable, stack_addr_extra).map(stack_addr_extra, PTE_P | PTE_W | PTE_U);

// The stack pointer still starts at the top of the first page.
p->regs.reg_rsp = stack_addr + PAGESIZE;

Now we can handle more depth without faulting. With bigger numbers, the stack would still overflow though… As food for thought, is it possible to dynamically grow the stack if a process needs more memory? How could this be implemented?