Kernel 2: Protected control transfer, virtual memory

Overview

In this lecture, we discuss protected control transfers and virtual memory.

Full lecture notes on kernel
Textbook readings

Eve attacks

        if (n % 1024 == 0) {
            console_printf(CS_YELLOW "Hi, I'm Eve! #%u\n", n);
            while (true) {}
        }
obj/p-eve.asm
  140046: 89 de                 mov    %ebx,%esi
  140048: bf 71 11 14 00        mov    $0x141171,%edi
  14004d: b8 00 00 00 00        mov    $0x0,%eax
  140052: e8 10 10 00 00        callq  141067 <console_printf(char const*, ...)>
  140057: 90                    nop
  140058: eb fe                 jmp    140058 <process_main()+0x58>    ; ****
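
The marked instruction, `eb fe`, is a two-byte short jump. Its second byte is a signed 8-bit displacement relative to the address of the next instruction. As a minimal sketch (the helper name `short_jmp_target` is hypothetical and not part of the course code), the target can be computed like this:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: compute the target of an x86 short jump (`eb XX`).
// `addr` is the address of the `eb` opcode byte; `disp` is the second byte.
inline uint64_t short_jmp_target(uint64_t addr, uint8_t disp) {
    // The displacement is a signed 8-bit value added to the address of the
    // *next* instruction, which starts 2 bytes after `addr`.
    int64_t signed_disp = static_cast<int8_t>(disp);
    return addr + 2 + static_cast<uint64_t>(signed_disp);
}
```

With `addr = 0x140058` and `disp = 0xfe` (that is, -2), the target is `0x140058` again: the instruction jumps to itself, which is the compiled form of `while (true) {}`.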

Defending against processor time attack

Implementing timer interrupts

void kernel_start(const char* command) {
    // initialize hardware
    init_hardware();
    init_timer(100);    // 100 Hz: timer interrupt fires 100 times per second
    ...
}

void exception(regstate* regs) {
    ...
    switch (regs->reg_intno) {
    case INT_IRQ + IRQ_TIMER:
        // handle timer interrupt
        lapicstate::get().ack();    // reset timer
        schedule();                 // run a different process
    ...
    }
}

Interrupts and CPU starvation
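
To see why a timer interrupt prevents starvation, it may help to look at a toy model. The sketch below is not kernel code: `toy_proc` and `simulate` are hypothetical names, time is measured in whole timer periods, and a cooperative process is assumed to yield once per period.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a process (hypothetical, for illustration only).
struct toy_proc {
    bool yields;      // true: gives up the CPU voluntarily; false: spins like Eve
    unsigned ticks;   // timer periods this process has spent on the CPU
};

// Run `total_ticks` timer periods over `procs`, switching round-robin.
inline void simulate(std::vector<toy_proc>& procs, unsigned total_ticks,
                     bool timer_enabled) {
    std::size_t cur = 0;
    for (unsigned t = 0; t < total_ticks; ++t) {
        ++procs[cur].ticks;
        // A cooperative process hands over the CPU by itself; a spinning
        // process loses it only if a timer interrupt forces a switch.
        if (procs[cur].yields || timer_enabled) {
            cur = (cur + 1) % procs.size();
        }
    }
}
```

In this model, with one cooperative process and one spinning process over 100 periods, disabling the timer leaves the cooperative process with a single period, while enabling it splits the CPU evenly.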

Calling convention for interrupts

Differences between functions and interrupts

Protected control transfer

Inside an interrupt

Voluntary process → kernel control transfer

Inside a system call

Yielding in depth

Protected control transfer

Yielding in depth 1

1. p-alice calls sys_yield

void process_main() {
    unsigned n = 0;
    while (true) {
        ++n;
        if (n % 1024 == 0) {
            console_printf(CS_NORMAL "Hi, I'm Alice! #%u\n", n);
        }
        sys_yield();    // <- ********
    }
}

Yielding in depth 2

2. sys_yield prepares registers, executes syscall instruction

int sys_yield() {
    return make_syscall(SYSCALL_YIELD);
}
uintptr_t make_syscall(int syscallno) {
    register uintptr_t rax asm("rax") = syscallno;
    asm volatile ("syscall"
            : "+a" (rax)
            : /* all input registers are also output registers */
            : "cc", "memory", "rcx", "rdx", "rsi", "rdi", "r8", "r9",
              "r10", "r11");
    return rax;
}
obj/p-alice.asm
0000000000100ba0 <sys_yield()>:
  100ba0: f3 0f 1e fa           endbr64 
  100ba4: b8 02 00 00 00        mov    $0x2,%eax       ; `SYSCALL_YIELD` defined in `lib.hh`
  100ba9: 0f 05                 syscall 
  100bab: c3                    retq   

Yielding in depth 3

3. Processor performs protected control transfer

Why does syscall work this way?

Yielding in depth 4

4. Kernel entry point saves processor state, changes stack pointer to kernel stack

_Z13syscall_entryv:
        movq %rsp, KERNEL_STACK_TOP - 16 // save entry %rsp to kernel stack
        movq $KERNEL_STACK_TOP, %rsp     // change to kernel stack

        // structure used by `iret`:
        pushq $(SEGSEL_APP_DATA + 3)   // %ss
        subq $8, %rsp                  // skip saved %rsp
        pushq %r11                     // %rflags
        ...

        // call syscall()
        movq %rsp, %rdi
        call _Z7syscallP8regstate
        ...
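
The first instruction stores the entry `%rsp` at `KERNEL_STACK_TOP - 16` because that is exactly where the second slot of the `iret` structure ends up once `%ss` has been pushed. The following sketch replays the stack arithmetic. The names `entry_layout` and `syscall_entry_layout` are hypothetical, and the stack top is passed in as a parameter rather than taken from the kernel's headers.

```cpp
#include <cassert>
#include <cstdint>

// Addresses of the first three slots built by the entry code (illustrative).
struct entry_layout {
    uint64_t ss_addr;         // where %ss is stored
    uint64_t saved_rsp_addr;  // where the entry %rsp is stored
    uint64_t rflags_addr;     // where %rflags (held in %r11) is stored
};

inline entry_layout syscall_entry_layout(uint64_t stack_top) {
    entry_layout l{};
    l.saved_rsp_addr = stack_top - 16;  // movq %rsp, KERNEL_STACK_TOP - 16
    uint64_t rsp = stack_top;           // movq $KERNEL_STACK_TOP, %rsp
    rsp -= 8;                           // pushq $(SEGSEL_APP_DATA + 3)
    l.ss_addr = rsp;
    rsp -= 8;                           // subq $8, %rsp (slot already filled)
    assert(rsp == l.saved_rsp_addr);    // the skipped slot is the saved %rsp
    rsp -= 8;                           // pushq %r11
    l.rflags_addr = rsp;
    return l;
}
```

For an assumed stack top of `0x80000`, the three slots land at `0x7fff8`, `0x7fff0`, and `0x7ffe8`.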

Why switch to kernel stack?

Yielding in depth 5

5. syscall function in kernel runs; its argument, regs, contains a copy of all processor registers at the time of the system call

uintptr_t syscall(regstate* regs) {
    // Copy the saved registers into the `current` process descriptor.
    current->regs = *regs;
    regs = &current->regs;
    ...
    switch (regs->reg_rax) {
    case SYSCALL_YIELD:
        current->regs.reg_rax = 0;    // return value the process sees when it resumes
        schedule();                   // run a different process
    ...
    }
}

Returning from a protected control transfer

Process state

struct proc {
    x86_64_pagetable* pagetable;        // process's page table
    pid_t pid;                          // process ID
    int state;                          // process state (see above)
    regstate regs;                      // process's current registers
    // The first 4 members of `proc` must not change, but you can add more.
};

extern proc ptable[16];
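
Because every process's saved registers live in its `ptable` entry, choosing what to run next comes down to a search over that table. The sketch below shows one plausible round-robin policy. It is not the kernel's actual `schedule()`: the types, state constants, and function name are all hypothetical.

```cpp
#include <cassert>

constexpr int SKETCH_NPROC = 16;                    // same size as `ptable[16]`
enum sketch_state { SKETCH_FREE = 0, SKETCH_RUNNABLE = 1 };  // assumed states

struct sketch_proc {
    int pid;
    int state;
};

// Return the pid of the next runnable process after `current_pid`,
// wrapping around the table; return -1 if nothing is runnable.
inline int next_runnable(const sketch_proc (&table)[SKETCH_NPROC],
                         int current_pid) {
    for (int i = 1; i <= SKETCH_NPROC; ++i) {
        int pid = (current_pid + i) % SKETCH_NPROC;
        if (table[pid].state == SKETCH_RUNNABLE) {
            return pid;   // the last iteration (i == SKETCH_NPROC) rechecks
        }                 // the current process itself
    }
    return -1;
}
```

Starting the search at `current_pid + 1` is what makes the policy fair: the process that just yielded is considered last.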

Kernel state note

Memory protection

Eve attacks kernel memory

uint8_t* ip = (uint8_t*) 0x40ed0;   // address of `_Z7syscall...` from `obj/kernel.sym`
ip[0] = 0xeb;
ip[1] = 0xfe;
(void) sys_getpid();

What happened?

Invisibility cloak

Exercise: Tradeoffs

Virtual memory

Invisibility cloak via virtual memory

Virtual memory performance

Paged virtual memory: Look up once per block

vmiter

vmiter mappings

Using vmiter to isolate the kernel

    for (; it.va() < MEMSIZE_PHYSICAL; it += PAGESIZE) {
        uintptr_t addr = it.va();
        int perm = PTE_P | PTE_W | PTE_U;
        if (addr == 0) {
            // nullptr is inaccessible even to the kernel
            perm = 0;
        } else if (addr < PROC_START_ADDR && addr != CONSOLE_ADDR) {
            perm = PTE_P | PTE_W;
        }
        // install identity mapping
        int r = it.try_map(addr, perm);
        assert(r == 0);
    }
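
The permission choice inside the loop can be pulled out into a pure function, which makes it easy to check individual addresses. This is a sketch under stated assumptions: the flag values are the standard x86-64 page table bits, `PROC_START_ADDR = 0x100000` and `CONSOLE_ADDR = 0xB8000` are assumed values for this kernel, and `identity_perm` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// x86-64 page table entry flags: present, writable, user-accessible.
constexpr int PTE_P = 1, PTE_W = 2, PTE_U = 4;
// Assumed addresses (not confirmed by the text above).
constexpr uintptr_t PROC_START_ADDR = 0x100000;
constexpr uintptr_t CONSOLE_ADDR = 0xB8000;

// Permissions for the identity mapping of the page at `addr`.
inline int identity_perm(uintptr_t addr) {
    if (addr == 0) {
        return 0;                     // nullptr page: inaccessible to everyone
    } else if (addr < PROC_START_ADDR && addr != CONSOLE_ADDR) {
        return PTE_P | PTE_W;         // kernel memory: no PTE_U, so no user access
    }
    return PTE_P | PTE_W | PTE_U;     // console and process memory: user-accessible
}
```

Under these assumptions, the page at `0x40000`, which contains the `0x40ed0` address Eve wrote to, maps without `PTE_U`, so the same write from a process would fault instead of modifying kernel code.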

Tries and x86-64 page tables
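
An x86-64 page table with 4 KiB pages is a four-level trie: the low 12 bits of a virtual address are the offset within the page, and each level consumes the next 9 bits as an index into a 512-entry table. A minimal sketch of that decomposition follows. The function names are hypothetical, and numbering the levels from 0 (lowest) to 3 (top) is a convention assumed here.

```cpp
#include <cassert>
#include <cstdint>

// Offset of `va` within its 4 KiB page (low 12 bits).
inline unsigned page_offset(uint64_t va) {
    return static_cast<unsigned>(va & 0xFFF);
}

// Index used at `level` of the page table trie (0 = lowest, 3 = top).
// Each level selects one of 512 entries using 9 bits of the address.
inline unsigned pagetable_index(uint64_t va, int level) {
    return static_cast<unsigned>((va >> (12 + 9 * level)) & 0x1FF);
}
```

For example, the address `0x140058` has offset `0x058` and level-0 index `0x140`; its indexes at levels 1 through 3 are all 0 because the address is below 2 MiB.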