Kernel 2: Kernel isolation, protected control transfer

Overview

In lecture, we investigate our own kernel, discuss kernel isolation, and walk through a protected control transfer.

Full lecture notes on kernel
Textbook readings

Our kernel: WeensyOS

WeensyOS commands

Emulation

Booting: How a computer starts up

  1. Computer turns on
  2. Built-in hardware initializes the system
  3. Built-in hardware loads a small program called the boot loader from a fixed location on attached storage (Flash memory, disk)
    • The boot loader operates in a very constrained environment
  4. Boot loader initializes the processor and loads the kernel

Watching the boot sequence

Kernel isolation

Alice and Eve

Why protected control transfer?

System call in depth

  1. User calls wrapper function for system call
  2. Wrapper function prepares registers, executes syscall instruction
  3. Processor performs protected control transfer to kernel
    • Switches privilege, starts executing kernel code at pre-configured address
  4. Kernel entry point saves processor registers so process can be restarted
  5. Kernel syscall function handles system call

Yielding in depth

Protected control transfer

Yielding in depth 1

1. p-alice calls sys_yield

void process_main() {
    unsigned n = 0;
    while (true) {
        ++n;
        if (n % 1024 == 0) {
            console_printf(0x0F00, "Hi, I'm Alice! #%u\n", n);
        }
        sys_yield();    // <- ********
    }
}

Yielding in depth 2

2. sys_yield prepares registers, executes syscall instruction

__noinline int sys_yield() {
    return make_syscall(SYSCALL_YIELD);
}
__always_inline uintptr_t make_syscall(int syscallno) {
    register uintptr_t rax asm("rax") = syscallno;
    asm volatile ("syscall"
            : "+a" (rax)
            : /* all input registers are also output registers */
            : "cc", "memory", "rcx", "rdx", "rsi", "rdi", "r8", "r9",
              "r10", "r11");
    return rax;
}

obj/p-alice.asm
0000000000100ba0 <sys_yield()>:
  100ba0: f3 0f 1e fa           endbr64 
  100ba4: b8 02 00 00 00        mov    $0x2,%eax       ; `SYSCALL_YIELD` defined in `lib.hh`
  100ba9: 0f 05                 syscall 
  100bab: c3                    retq   

Yielding in depth 3

3. Processor performs protected control transfer

Why does syscall work this way?

Yielding in depth 4

4. Kernel entry point saves processor state, changes stack pointer to kernel stack

_Z13syscall_entryv:
        movq %rsp, KERNEL_STACK_TOP - 16 // save entry %rsp to kernel stack
        movq $KERNEL_STACK_TOP, %rsp     // change to kernel stack

        // structure used by `iret`:
        pushq $(SEGSEL_APP_DATA + 3)   // %ss
        subq $8, %rsp                  // skip saved %rsp
        pushq %r11                     // %rflags
        ...

        // call syscall()
        movq %rsp, %rdi
        call _Z7syscallP8regstate
        ...

Why switch to kernel stack?

Yielding in depth 5

5. syscall function in kernel runs; its argument, regs, contains a copy of all processor registers at the time of the system call

uintptr_t syscall(regstate* regs) {
    // Copy the saved registers into the `current` process descriptor.
    current->regs = *regs;
    regs = &current->regs;
    ...
    switch (regs->reg_rax) {
    case SYSCALL_YIELD:
        current->regs.reg_rax = 0;
        schedule();
    ...
    }
}
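The reason `reg_rax` is set to 0 before `schedule()` can be seen in a small model. The sketch below uses hypothetical `sim_` types rather than the real `regstate` and `proc`, and assumes only what the excerpt shows: the kernel copies the entry registers into the process descriptor, then overwrites the saved `rax` with the value the process should see when it eventually resumes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily reduced stand-ins for `regstate` and `proc`.
struct sim_regstate {
    uintptr_t reg_rax;   // system call number on entry, return value on resume
    uintptr_t reg_rip;   // instruction the process resumes at
};

struct sim_yield_proc {
    sim_regstate regs;   // the process's saved registers
};

// Model of the SYSCALL_YIELD case, up to (but not including) `schedule()`.
void sim_handle_yield(sim_yield_proc& current, const sim_regstate& entry_regs) {
    current.regs = entry_regs;   // remember exactly where the process was
    current.regs.reg_rax = 0;    // sys_yield() will appear to return 0
}
```

After `sim_handle_yield` runs, the saved `reg_rip` still points just past the `syscall` instruction, so when some later scheduling decision resumes this process it continues in `sys_yield` with 0 in `rax`.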

Returning from a protected control transfer

Process state

struct proc {
    x86_64_pagetable* pagetable;        // process's page table
    pid_t pid;                          // process ID
    int state;                          // process state (see above)
    regstate regs;                      // process's current registers
    // The first 4 members of `proc` must not change, but you can add more.
};

extern proc ptable[16];
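One plausible way a scheduler could use a fixed-size table like `ptable` is round-robin: starting just after the current slot, pick the first runnable process, wrapping around. The sketch below is an assumption for illustration, not the WeensyOS `schedule()` function; the `sim_` names and the two state values are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical state values; the real ones are defined elsewhere in the kernel.
enum sim_state { SIM_FREE = 0, SIM_RUNNABLE = 1 };

struct sim_sched_proc {
    int pid;     // process ID
    int state;   // SIM_FREE or SIM_RUNNABLE
};

constexpr std::size_t SIM_NPROC = 16;   // same size as `ptable[16]`

// Return the index of the next runnable process after `current`,
// wrapping around the table. Returns -1 if no process is runnable.
// The current process is considered last, so it runs again only if
// nothing else can.
int sim_pick_next(const sim_sched_proc (&table)[SIM_NPROC], std::size_t current) {
    for (std::size_t step = 1; step <= SIM_NPROC; ++step) {
        std::size_t i = (current + step) % SIM_NPROC;
        if (table[i].state == SIM_RUNNABLE) {
            return (int) i;
        }
    }
    return -1;
}
```

With slots 1 and 2 runnable, `sim_pick_next(table, 1)` gives 2 and `sim_pick_next(table, 2)` gives 1, so two yielding processes such as Alice and Eve would alternate.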

Kernel state note

Eve attacks

        if (n % 1024 == 0) {
            console_printf(0x0E00, "Hi, I'm Eve! #%u\n", n);
            while (true) {}
        }

obj/p-eve.asm
  14004e: be 6d 0c 14 00        mov    $0x140c6d,%esi
  140053: bf 00 0e 00 00        mov    $0xe00,%edi
  140058: b8 00 00 00 00        mov    $0x0,%eax
  14005d: e8 d1 0a 00 00        callq  140b33 <console_printf(int, char const*, ...)>
  140062: eb fe                 jmp    140062 <process_main()+0x62>    ; ****

Defending against processor time attack

Voluntary vs. involuntary protected control transfer

Setting up timer interrupts in kernel

void kernel_start(const char* command) {
    // initialize hardware
    init_hardware();
    init_timer(100);    // 100 Hz ***
    ...
}

void exception(regstate* regs) {
    ...
    switch (regs->reg_intno) {
    case INT_IRQ + IRQ_TIMER:
        // handle timer interrupt
        lapicstate::get().ack();    // reset timer
        schedule();                 // run a different process
    ...
    }
}
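The effect of the timer interrupt can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not kernel code: time is counted in abstract steps, Alice yields after every step, Eve never yields (a harsher version of her infinite loop), and `sim_run` and `sim_result` are hypothetical names. A `timer_period` of 0 means no timer interrupts at all.

```cpp
#include <cassert>

struct sim_result {
    unsigned alice_steps;   // steps of processor time Alice received
    unsigned eve_steps;     // steps of processor time Eve received
};

// Simulate `total_steps` steps of processor time shared by Alice and Eve.
// If `timer_period` is nonzero, a simulated timer interrupt fires every
// `timer_period` steps and forces a switch to the other process.
sim_result sim_run(unsigned total_steps, unsigned timer_period) {
    sim_result r = {0, 0};
    bool eve_running = false;   // Alice runs first
    unsigned since_tick = 0;    // steps since the last timer interrupt
    for (unsigned t = 0; t < total_steps; ++t) {
        bool switch_now = false;
        if (eve_running) {
            ++r.eve_steps;          // Eve spins and never yields
        } else {
            ++r.alice_steps;
            switch_now = true;      // voluntary transfer: Alice calls sys_yield
        }
        ++since_tick;
        if (timer_period != 0 && since_tick == timer_period) {
            since_tick = 0;
            switch_now = true;      // involuntary transfer: timer interrupt
        }
        if (switch_now) {
            eve_running = !eve_running;
        }
    }
    return r;
}
```

Without a timer, `sim_run(1000, 0)` gives Alice a single step and Eve the remaining 999. With `sim_run(1000, 10)`, Alice gets 100 steps: Eve still takes most of the time, but she can no longer take all of it.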

Eve attacks kernel memory

uint8_t* ip = (uint8_t*) 0x4103c;   // address of `syscall` from `obj/kernel.sym`
ip[0] = 0xeb;
ip[1] = 0xfe;
(void) sys_getpid();
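The two bytes Eve writes, `0xeb 0xfe`, are the same bytes that appear at `140062` in `obj/p-eve.asm`: an x86 short jump whose target is itself. The helper below, with the hypothetical name `short_jmp_target`, works through the arithmetic. A short jump is opcode `0xEB` followed by a signed 8-bit displacement measured from the address of the next instruction, which is 2 bytes later.

```cpp
#include <cassert>
#include <cstdint>

// Compute the target of an x86 short jump (opcode 0xEB, signed 8-bit
// displacement) located at address `addr`. The displacement is relative
// to the address of the following instruction, `addr + 2`.
uint64_t short_jmp_target(uint64_t addr, uint8_t opcode, uint8_t rel8) {
    assert(opcode == 0xEB);   // only the short-jump encoding is modeled
    return addr + 2 + (int64_t) (int8_t) rel8;
}
```

Since `0xfe` is -2 as a signed byte, `short_jmp_target(0x140062, 0xEB, 0xFE)` is `0x140062`, matching the disassembly. The same holds at `0x4103c`: if Eve's write succeeds, the first instruction of the kernel's `syscall` function becomes an infinite loop, and the next system call (here `sys_getpid()`) never returns.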