Kernel 1: Robustness and safety

What is the kernel?

The kernel is the operating system software that runs with full privilege over the machine's operations. Its jobs include:

User land is the virtual environment provided by the kernel where user programs run: a happy fun-time village where the sun is shining and everything looks perfect.

Kernel land: scary.

Evil program

Let's look at the following simple program, which in the early days could bring down an entire computer system:

int main() {
    while (1) {
    }
}

This simple program compiles down to only 3 instructions:

5fa:    push    %rbp
5fb:    mov     %rsp,%rbp
5fe:    jmp     5fe

When we run this program on a modern OS, such as Ubuntu Linux, macOS, or Windows, it can't really do any damage. Even though the program is stuck in an infinite loop, the computer stays responsive, and other tasks running on the same OS continue just fine. We can even kill the problematic program by hitting Ctrl+C in the terminal. You may take these behaviors for granted, but a huge amount of effort went into the design and engineering of both OS software and processor hardware to make them possible.

Protected control transfer

Recall how the stdio library functions invoke system calls like read() and write(). We saw in assembly that a syscall instruction was executed, and that it only returns after the system call has finished. This syscall instruction is a key interface through which user processes interact with the kernel. It implements a form of protected control transfer: it transfers control of the processor to the kernel in a safe and limited way.

Protected control transfer is safe because a process can only enter the kernel at well-specified entry points. The process can't just jump to arbitrary code residing within the kernel.

Every process's address space contains a portion (usually the higher half of the address space) reserved for the kernel, and much of the kernel's code resides there. One can write a program that attempts to jump to one of these instructions in the kernel:

int main() {
    unsigned long kernel_insn = 0xffffffff80000100;
    void (*f)() = (void (*)())kernel_insn;
    f();
}

This program will crash with a segmentation fault, because a user process is not allowed to access anything reserved for the kernel directly. A user process can only invoke the kernel at specific entry points by using the syscall instruction.

Q: Why must we only allow control transfer to the kernel at specific points?

Answer: The kernel code executes in an environment that's privileged and unprotected, which means it has total and complete control over the machine. Preserving the integrity of the kernel's control flow is therefore extremely important, since we don't want any process on a computer to be able to execute arbitrary kernel code in privileged mode. A breach of this limitation can result in losing control over the machine to a malicious or misbehaving program.

The limitations and restrictions guaranteed by protected control transfer are implemented by both the OS software and the hardware. This hardened interface between user land and kernel land is the cornerstone of security in modern computer systems.

Memory isolation

The hardened interface between user land and kernel land does not only introduce restrictions. It also makes several useful features possible, like presenting different "views" of memory to different programs. Let's take a look at the program in storage4/r16-mmapbyte.cc as an example.

The "guts" of the program are shown below. The loop counts the bytes in an mmapped file according to their values modulo 16:

size_t n = 0;
while (n < size) {
    memcpy(buf, &file_data[n], 1);
    n += 1;
    if (n % PRINT_FREQUENCY == 0) {
        report(n, tstamp() - start);
    }
    histogram[(unsigned char) buf[0] % 16] += 1;
}

If we run this program, we will see output like

0: 51199999
6:        1

This program accesses the file using mmap, a form of memory-mapped I/O. We can examine the region of memory used for I/O on the file by printing out the memory addresses of the buffer (file_data) returned by mmap.

We notice that the file_data buffer is located in the heap, and that its address changes every time the program runs.

It was shown in lecture that during a particular run of the program, the file_data buffer was located at address 0x7fae4adc3000. This looks like a stack address, but an actual stack address, as shown by printing out the address of the file_data pointer variable itself, is around 0x7ffdfebf07a8, which is about 319 GB above the location of the file_data buffer. So the buffer was actually in the heap.

We can suspend the histogram computation program and launch another program that writes to the same file using memory-mapped I/O. We observe that the memory-mapped file buffer in the second program is located at a different address from the file buffer in the suspended program. However, the memory appears truly shared, because all data written by one program is reflected in the other program's histogram computation once we resume it.

How can two programs seemingly access the exact same piece of memory at different memory addresses?

This is an effect of another important feature of the hardened interface between the kernel and the user: memory isolation. Memory isolation provides:

Without explicit sharing, all user programs' memories are completely isolated from each other.

Process vs. program

We've talked about programs and processes as if they were equivalent. In precise terms, a program is just any piece of code that can be executed. It can be a file that sits on a disk, or it can be the image of an application that's currently being executed. A process refers specifically to a program that is currently being executed on a system. In general, in the operating system context, when we say programs or processes we refer to user-level software running under the protection and isolation of the kernel, although technically the kernel is also a program.