Kernel 1: Processes, isolation, robustness

Overview

In this first lecture of the kernel unit, we introduce the goals of operating systems and examine our tiny operating system, WeensyOS, in depth.

Full lecture notes on kernel

Textbook readings

Calling convention and red zone

Typical stack frame layout

Stack frame with base pointer

Other uses of condition flags

System instructions

What happens behind syscall?

syscall invokes the kernel

Goal: Process isolation

Goal: Kernel isolation

Process isolation’s consequences for hardware design

Exception: Some experimental systems have fully trusted source code chains, where, for example, the compiler has been proven correct, and all code running on the machine passes through the trusted compiler. In these systems, it’s theoretically possible to implement process isolation without hardware support.

Some processor features we’ll investigate

Why learn about kernels?

Alice and Eve in WeensyOS

WeensyOS commands

Emulation

Protected control transfer

System call in depth

  1. User calls wrapper function for system call
  2. Wrapper function prepares registers, executes syscall instruction
  3. Processor performs protected control transfer to kernel
    • Switches privilege, starts executing kernel code at pre-configured address
  4. Kernel entry point saves processor registers so process can be restarted
  5. Kernel syscall() function handles system call
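The five steps above can be sketched as an ordinary user-space C++ simulation. This is only an illustration under stated assumptions: every name here (`sim_cpu`, `sim_regs`, `sim_make_syscall`, and so on) is hypothetical rather than WeensyOS code, the system call numbers are made up for the example, and a plain function call stands in for the hardware's protected control transfer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the system call path; not WeensyOS code.
enum sim_privilege { SIM_KERNEL = 0, SIM_USER = 3 };

struct sim_regs {
    uint64_t rax = 0;   // system call number on entry, return value on exit
};

struct sim_cpu {
    sim_regs regs;                  // live processor registers
    sim_regs saved;                 // kernel's saved copy of the process's registers
    sim_privilege cpl = SIM_USER;   // current privilege level
};

// Assumed system call numbers, chosen for this sketch only.
constexpr uint64_t SIM_SYSCALL_GETPID = 1;
constexpr uint64_t SIM_SYSCALL_YIELD = 2;

// Step 5: the kernel's handler dispatches on the saved system call number.
uint64_t sim_kernel_syscall(sim_cpu& cpu, uint64_t current_pid) {
    assert(cpu.cpl == SIM_KERNEL);  // handler must only run privileged
    switch (cpu.saved.rax) {
    case SIM_SYSCALL_GETPID:
        return current_pid;
    case SIM_SYSCALL_YIELD:
        return 0;
    default:
        return static_cast<uint64_t>(-1);   // unknown system call
    }
}

// Steps 3-4: stands in for the `syscall` instruction and the kernel entry point.
void sim_syscall_instruction(sim_cpu& cpu, uint64_t current_pid) {
    cpu.cpl = SIM_KERNEL;           // step 3: switch privilege, enter kernel
    cpu.saved = cpu.regs;           // step 4: save registers for later restart
    cpu.saved.rax = sim_kernel_syscall(cpu, current_pid);  // step 5
    cpu.regs = cpu.saved;           // resume: restore the saved registers
    cpu.cpl = SIM_USER;             // drop back to unprivileged mode
}

// Steps 1-2: the wrapper loads the system call number and "executes syscall".
uint64_t sim_make_syscall(sim_cpu& cpu, uint64_t syscallno, uint64_t current_pid) {
    cpu.regs.rax = syscallno;
    sim_syscall_instruction(cpu, current_pid);
    return cpu.regs.rax;
}
```

In this model the process only ever observes the return value placed in its saved `rax`; it never runs with `SIM_KERNEL` privilege itself.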

Yielding in depth

Protected control transfer

Yielding in depth 1

1. p-alice calls sys_yield

void process_main() {
    unsigned n = 0;
    while (true) {
        ++n;
        if (n % 1024 == 0) {
            console_printf(0x0F00, "Hi, I'm Alice! #%u\n", n);
        }
        sys_yield();    // <- ********
    }
}

Yielding in depth 2

2. sys_yield prepares registers, executes syscall instruction

// Issue the system call numbered `syscallno` and return the kernel's result.
uintptr_t make_syscall(int syscallno) {
    // `rax` carries the system call number in and the return value out.
    register uintptr_t rax asm("rax") = syscallno;
    asm volatile ("syscall"
            : "+a" (rax)
            : /* all input registers are also output registers */
            : "cc", "memory", "rcx", "rdx", "rsi", "rdi", "r8", "r9",
              "r10", "r11");
    return rax;
}

int sys_yield() {
    return make_syscall(SYSCALL_YIELD);
}

obj/p-alice.asm
0000000000100ba0 <sys_yield()>:
  100ba0: f3 0f 1e fa           endbr64 
  100ba4: b8 02 00 00 00        mov    $0x2,%eax       ; `SYSCALL_YIELD` defined in `lib.hh`
  100ba9: 0f 05                 syscall 
  100bab: c3                    retq   

Yielding in depth 3

3. Processor performs protected control transfer

Why does syscall work this way?

Yielding in depth 4

4. Kernel entry point saves processor state, changes stack pointer to kernel stack

_Z13syscall_entryv:
        movq %rsp, KERNEL_STACK_TOP - 16 // save entry %rsp to kernel stack
        movq $KERNEL_STACK_TOP, %rsp     // change to kernel stack

        // structure used by `iret`:
        pushq $(SEGSEL_APP_DATA + 3)   // %ss
        subq $8, %rsp                  // skip saved %rsp
        pushq %r11                     // %rflags
        ...

        // call syscall()
        movq %rsp, %rdi
        call _Z7syscallP8regstate
        ...
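The "structure used by `iret`" can be sketched as a C++ struct. In 64-bit mode, `iretq` pops five 8-byte words: `%rip`, `%cs`, `%rflags`, `%rsp`, and `%ss`, with `%rip` at the lowest address. The `syscall` instruction leaves the process's return `%rip` in `%rcx` and its `%rflags` in `%r11`, which is why the entry code pushes `%r11` into the `%rflags` slot. The names `sketch_iret_frame` and `sketch_build_frame` are hypothetical and exist only for this illustration; the real WeensyOS `regstate` holds more than these five words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct mirroring the frame `iretq` pops, lowest address first.
struct sketch_iret_frame {
    uint64_t rip;       // where the process resumes
    uint64_t cs;        // code segment selector (encodes privilege level)
    uint64_t rflags;    // saved flags
    uint64_t rsp;       // process stack pointer
    uint64_t ss;        // stack segment selector
};

// Fill the frame from the values available at system call entry.
// `rcx` and `r11` are the registers where `syscall` stashed %rip and %rflags.
// The selector arguments are placeholders supplied by the caller.
sketch_iret_frame sketch_build_frame(uint64_t user_rsp, uint64_t rcx,
                                     uint64_t r11, uint64_t cs_sel,
                                     uint64_t ss_sel) {
    sketch_iret_frame f;
    f.ss = ss_sel;          // pushed first, so it sits at the highest address
    f.rsp = user_rsp;       // the stack pointer saved on entry
    f.rflags = r11;
    f.cs = cs_sel;
    f.rip = rcx;            // pushed last, so it sits at the lowest address
    return f;
}
```

Because pushes grow the stack downward, the entry code pushes these fields in the reverse of the struct's declaration order.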

Why switch to kernel stack?

Yielding in depth 5

5. syscall function in kernel runs; its argument, regs, contains a copy of all processor registers at the time of the system call

uintptr_t syscall(regstate* regs) {
    // Copy the saved registers into the `current` process descriptor.
    current->regs = *regs;
    regs = &current->regs;
    ...
    switch (regs->reg_rax) {
    case SYSCALL_YIELD:
        current->regs.reg_rax = 0;
        schedule();

Returning from a protected control transfer

Process state

struct proc {
    x86_64_pagetable* pagetable;        // process's page table
    pid_t pid;                          // process ID
    int state;                          // process state (see above)
    regstate regs;                      // process's current registers
    // The first 4 members of `proc` must not change, but you can add more.
};

extern proc ptable[16];
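The notes call `schedule()` without showing it. As a minimal sketch of one way a round-robin scheduler could choose the next process from a 16-entry table like `ptable`, consider the following. The names `sketch_proc`, `sketch_pick_next`, and the `SKETCH_*` state constants are hypothetical, and the real WeensyOS scheduler may work differently.

```cpp
#include <cassert>

// Hypothetical stand-ins for the process table; not the WeensyOS definitions.
constexpr int SKETCH_NPROC = 16;
enum sketch_state { SKETCH_FREE = 0, SKETCH_RUNNABLE = 1, SKETCH_BLOCKED = 2 };

struct sketch_proc {
    int pid = 0;
    int state = SKETCH_FREE;
};

// Return the index of the next runnable process after `current_pid`,
// scanning the table circularly. The current process is considered last,
// so it runs again only if nothing else is runnable. Returns -1 if no
// process is runnable at all.
int sketch_pick_next(const sketch_proc (&table)[SKETCH_NPROC], int current_pid) {
    assert(current_pid >= 0 && current_pid < SKETCH_NPROC);
    for (int i = 1; i <= SKETCH_NPROC; ++i) {
        int pid = (current_pid + i) % SKETCH_NPROC;
        if (table[pid].state == SKETCH_RUNNABLE) {
            return pid;
        }
    }
    return -1;
}
```

With two runnable processes at indexes 1 and 2, repeated calls alternate between them, which matches the alternating Alice and Eve output one would expect from cooperative yielding.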

Kernel state note

Eve attacks

        if (n % 1024 == 0) {
            console_printf(0x0E00, "Hi, I'm Eve! #%u\n", n);
            while (true) {}
        }

obj/p-eve.asm
  14004e: be 6d 0c 14 00        mov    $0x140c6d,%esi
  140053: bf 00 0e 00 00        mov    $0xe00,%edi
  140058: b8 00 00 00 00        mov    $0x0,%eax
  14005d: e8 d1 0a 00 00        callq  140b33 <console_printf(int, char const*, ...)>
  140062: eb fe                 jmp    140062 <process_main()+0x62>    ; ****

Defending against processor time attack

Voluntary vs. involuntary protected control transfer

Implementing timer interrupts

void kernel_start(const char* command) {
    // initialize hardware
    init_hardware();
    init_timer(100);    // 100 Hz ***
    ...
}

void exception(regstate* regs) {
    ...
    switch (regs->reg_intno) {
    case INT_IRQ + IRQ_TIMER:
        // handle timer interrupt
        lapicstate::get().ack();    // reset timer
        schedule();                 // run a different process
    ...
    }
}
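To see why a timer interrupt defeats a process that never yields, here is a small simulation, offered as a sketch under simplifying assumptions. Each loop iteration represents one timer period of CPU time (10 ms at 100 Hz), a process either yields after each period or never yields, and the scheduler is plain round-robin. The names `sim_process` and `sim_run` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a process for this simulation only.
struct sim_process {
    bool yields;        // true: yields voluntarily after each unit of work
    unsigned work;      // units of CPU time this process has received
};

// Run `ticks` units of CPU time, starting with process 0.
// Control passes to the next process when the running process yields
// (voluntary transfer) or, if `timer_enabled`, when the timer fires at
// the end of the unit (involuntary transfer).
void sim_run(std::vector<sim_process>& procs, unsigned ticks, bool timer_enabled) {
    assert(!procs.empty());
    std::size_t current = 0;
    for (unsigned t = 0; t < ticks; ++t) {
        ++procs[current].work;
        if (procs[current].yields || timer_enabled) {
            current = (current + 1) % procs.size();
        }
    }
}
```

With a yielding process at index 0 and a spinning process at index 1, running 100 units without the timer gives the spinner 99 of them. With the timer enabled, each process gets 50.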

Booting: How a computer starts up

  1. Computer turns on
  2. Built-in hardware initializes the system
  3. Built-in hardware loads a small, extremely constrained program called the boot loader from a fixed location on attached storage (Flash memory, disk)
  4. Boot loader initializes the processor and loads the kernel