Overview
In lecture, we discuss virtual memory.
Last time: Eve’s infinite loop attack
- “The worst attack in the world”
if (n % 1024 == 0) {
    console_printf(0x0E00, "Hi, I'm Eve! #%u\n", n);
    while (true) {}
}
Kernel solution: Timer interrupts
- Hardware (an alarm-clock-like device attached to the processor) interrupts the processor every 1/100 sec
- Interrupt gives the kernel control no matter what
- Even if Eve is infinite-looping!
- This protected control transfer saves Eve’s processor state (registers)
- Eve can be restarted transparently
- From Eve’s perspective, instructions execute one at a time without interruption
Implementing timer interrupts
void kernel_start(const char* command) {
    // initialize hardware
    init_hardware();
    init_timer(100); // 100 Hz ***
    ...
}

void exception(regstate* regs) {
    ...
    switch (regs->reg_intno) {
    case INT_IRQ + IRQ_TIMER:
        // handle timer interrupt
        lapicstate::get().ack(); // reset timer
        schedule(); // run a different process
    ...
    }
}
Timer interrupts and CPU starvation
- Alice still runs very slowly; why?
- Alice uses very little of each “timeslice” they are given, but Eve uses all of each timeslice
- Solution?
- Shorter timeslices
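To see why shorter timeslices help, here is a rough back-of-the-envelope model. It is only a sketch under made-up assumptions: two processes in strict round-robin, Alice yields after a fixed small amount of work, and Eve always runs for a full timeslice. The function name and the numbers are hypothetical, not taken from the kernel code.

```cpp
#include <cassert>

// Fraction of processor time Alice receives in a hypothetical two-process
// round-robin schedule. Assumptions (not from the lecture code): Alice runs
// for `alice_us` microseconds and then yields; Eve never yields and runs
// for a full `timeslice_us` before the timer interrupt preempts her.
double alice_share(double alice_us, double timeslice_us) {
    assert(alice_us > 0 && timeslice_us > 0);
    // Alice is also preempted if she runs longer than one timeslice.
    double alice_run = alice_us < timeslice_us ? alice_us : timeslice_us;
    double eve_run = timeslice_us;
    return alice_run / (alice_run + eve_run);
}
```

Under these assumptions, `alice_share(100, 10000)` (a 100 Hz timer, Alice working 100 microseconds per turn) is just under 1%, while `alice_share(100, 100)` is exactly 50%.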
Eve attacks kernel memory
uint8_t* ip = (uint8_t*) 0x40ec1; // address of `syscall` from `obj/kernel.sym`
ip[0] = 0xeb;
ip[1] = 0xfe;
(void) sys_getpid();
What happened?
- Eve has written an infinite loop into kernel memory
- Specifically, Eve overwrote the instructions for `syscall` with an infinite loop
- We must protect kernel memory from unprivileged access
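Why are the bytes 0xeb 0xfe an infinite loop? On x86-64, opcode 0xeb is a short jump whose target is the address of the next instruction plus a signed 8-bit displacement. A displacement of 0xfe is −2, so the two-byte instruction jumps back to itself. The arithmetic can be sketched as follows (the helper names are made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Target of a two-byte x86 short jump (`eb rel8`) located at address `ip`.
// The displacement is relative to the end of the instruction (ip + 2).
uint64_t short_jmp_target(uint64_t ip, uint8_t rel8) {
    return ip + 2 + static_cast<int64_t>(static_cast<int8_t>(rel8));
}

// Check using the address from Eve's code above: the jump targets itself.
void check_eve_loop() {
    assert(short_jmp_target(0x40ec1, 0xfe) == 0x40ec1);
}
```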
Hardware support for isolation
- Process isolation needs every computer resource to be protected
- Protect some resources by hiding them
- Unprivileged processes cannot access them directly
- The kernel provides abstractions of these resources
- For example, only the kernel gets direct access to input/output devices like disk drives and keyboards; processes get access to a file abstraction instead
- Access to the resource requires system calls
- This works best for resources that are slow or rarely accessed
- Protect some resources by adding privilege checks to hardware
- Unprivileged processes directly access the processor (CPU instructions) and primary memory
- These resources are fast and constantly accessed
- Requiring explicit kernel intervention would be soooo slow
Implementing hardware privilege checks
- Depends on the resource
- Processor time: Process must not monopolize processor indefinitely
- Kernel configures timer interrupt
- Primary memory: Process must not access memory outside its isolated domain
- How to give different processes different views of memory?
- How to define a domain?
- How to enforce a domain?
Plato’s cave
Pokémon grandpa
What is virtual memory?
- The processor is Pokémon grandpa
CPU and memory
Memory blocks are called pages
The CPU’s view of memory can change
A mapping controls the relationship between virtual and physical memory
- The kernel controls page tables
Kernel can see all of physical memory
No protection if Eve has same rights as kernel
So give Eve a different view of memory!
- Use different page table, or permissions within a page table
Protected control transfer changes view of memory
Managing page tables with vmiter
Virtual memory, abstractly
- Processor accesses memory through a layer of indirection called virtual memory
- Virtual memory mapping function \mathscr{P} : \textit{VA} \mapsto \textit{PA}
- The addresses used by instructions are virtual addresses in VA
- The contents of memory chips are addressed by physical addresses in PA
- When an instruction accesses virtual address x, the processor accesses physical address \mathscr{P}(x)
Faults
- Address accesses can cause a fault (a processor exception)
- Really \mathscr{P} : \textit{VA} \mapsto \textit{PA} + 💩
- Fault causes processor to protected control transfer to the kernel
- Uses the same path as timer interrupts
- Kernel can kill process, attempt to patch fault, etc.
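The mapping function and its fault outcome can be modeled in a few lines. This is a toy sketch, not how hardware works: `toy_mapping` is a hypothetical type that stores the mapping in a hash table, and a fault is modeled as an empty `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Toy model of a mapping function P : VA -> PA + fault.
// `toy_mapping` is a hypothetical type for illustration; real processors
// use page tables, not a hash map.
struct toy_mapping {
    std::unordered_map<uint64_t, uint64_t> table;   // VA -> PA

    // Returns the physical address for `va`, or std::nullopt to model a fault.
    std::optional<uint64_t> translate(uint64_t va) const {
        auto it = table.find(va);
        if (it == table.end()) {
            return std::nullopt;    // fault: control would transfer to the kernel
        }
        return it->second;
    }
};

// Usage sketch: one mapped address, one unmapped address.
void toy_mapping_example() {
    toy_mapping p;
    p.table[0x1000] = 0x8000;
    assert(p.translate(0x1000) == std::optional<uint64_t>(0x8000));
    assert(!p.translate(0x2000));
}
```

Giving each process its own `toy_mapping` object is the toy analogue of the kernel configuring a different mapping function per process.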
Virtual memory for kernel isolation
- How can virtual memory protect kernel data from processes?
- The kernel configures the mapping function processes use
- Process mapping functions only grant access to the physical memory containing process data
- Only the kernel can change mapping functions
Virtual memory for process isolation
- How can virtual memory protect one process’s data from another process?
- The kernel configures different mapping functions for each process
Virtual memory and kernel execution
- The kernel needs access to all memory
- It also must query and modify the memory mapping seen by processes (for example, to perform system calls)
- On WeensyOS, the kernel runs with an identity mapping function \mathscr{P}_\text{id}(x) = x that grants access to all physical memory
- `vmiter` helps the kernel query and modify memory as seen by a process
- Different operating systems and processor architectures make different choices
x86-64 virtual memory: Addresses
- Virtual addresses VA range over [0, 2^{64})
- Well, almost (actually [0, 2^{47}) ∪ [2^{64}−2^{47}, 2^{64}))
- Physical addresses PA range over [0, NUMBER OF BYTES OF MEMORY YOU BOUGHT)
- On my laptop, [0, 2^{34})
- Many many more virtual addresses than physical addresses!
- Potentially complicated to define a mapping function?
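The "well, almost" range can be written as a small predicate. This is a sketch; the function name is invented here. Addresses in these two ranges are commonly called canonical addresses.

```cpp
#include <cassert>
#include <cstdint>

// True if `va` lies in [0, 2^47) or [2^64 - 2^47, 2^64), the two ranges of
// usable x86-64 virtual addresses listed above.
bool is_usable_va(uint64_t va) {
    const uint64_t low_end = uint64_t(1) << 47;               // 2^47
    const uint64_t high_start = ~uint64_t(0) - low_end + 1;   // 2^64 - 2^47
    return va < low_end || va >= high_start;
}

// Boundary cases for the two ranges.
void usable_va_examples() {
    assert(is_usable_va(0x00007FFFFFFFFFFFULL));    // last low address
    assert(!is_usable_va(0x0000800000000000ULL));   // first address in the gap
    assert(is_usable_va(0xFFFF800000000000ULL));    // first high address
}
```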
x86-64 virtual memory: Pages
- x86-64 memory is organized in aligned contiguous ranges called pages
- Mapping functions on x86-64 are implemented by a page table
- An x86-64 page is 2^{12} = 4096 contiguous bytes of memory
- Starting at an address that is a multiple of 2^{12}
- Any virtual page can map onto any physical page
- But within a page, addresses map in fixed, piecewise-linear fashion
- All addresses in one virtual page map to the same physical page
- If any address in a virtual page faults, all addresses in the page fault
- Think of pages like fixed bricks—can’t change the mappings within a brick
- Mathematically
- \mathscr{P}(p \ll 12) = x \ll 12 for some integer x
- \mathscr{P}((p \ll 12) + o) = \mathscr{P}(p \ll 12) + o whenever 0\leq o <2^{12}
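In code, the two equations say that translation only ever changes the page number and copies the 12-bit offset unchanged. A minimal sketch, with constant and function names chosen for this example rather than taken from the course code:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t PAGE_BITS = 12;
constexpr uint64_t PAGE_SIZE = uint64_t(1) << PAGE_BITS;   // 4096

// Page number p and offset o such that va == (p << 12) + o.
uint64_t page_number(uint64_t va) { return va >> PAGE_BITS; }
uint64_t page_offset(uint64_t va) { return va & (PAGE_SIZE - 1); }

// Translate `va`, given the physical page base that its virtual page maps to.
uint64_t translate_in_page(uint64_t va, uint64_t physical_page_base) {
    // P(p << 12) = x << 12: the physical base must itself be page-aligned.
    assert(page_offset(physical_page_base) == 0);
    // P((p << 12) + o) = P(p << 12) + o: the offset is carried over as is.
    return physical_page_base + page_offset(va);
}
```

For example, if the virtual page containing 0x40ec1 (page number 0x40, offset 0xec1) were mapped to physical page base 0x7000, the access would go to physical address 0x7ec1.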
Why pages?
- Simplify and speed up hardware
- “Nearby” addresses are translated the same way
- Simplify specification of memory mapping functions
- A factor of 2^{12} fewer mappings to define
- Consequence: Cannot protect memory at finer granularity
x86-64 virtual memory: Permissions and modes
- x86-64 virtual memory mappings are sensitive to type of access and access privilege
- These are specified with permission flags
- `PTE_P`: The mapping is Present
- `PTE_W`: The mapping allows Writing (if absent, all accesses must be reads)
- `PTE_U`: The mapping allows Unprivileged/User access (if absent, only kernel can access)
- Every process virtual memory mapping function must contain some kernel mappings
- But those mappings should lack `PTE_U`
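The three flags combine into a simple access check. The sketch below follows only the rules as stated in the list above. It is a simplified model with an invented function name, and real hardware has further details (for instance, flags are checked at every level of the page table).

```cpp
#include <cassert>
#include <cstdint>

// x86-64 page table entry flag bits (the low three bits of an entry).
constexpr uint64_t PTE_P = 1;   // present
constexpr uint64_t PTE_W = 2;   // writable
constexpr uint64_t PTE_U = 4;   // accessible to unprivileged (user) code

// Simplified model: would an access with these properties succeed?
bool access_allowed(uint64_t perm, bool is_write, bool is_user) {
    if (!(perm & PTE_P)) return false;               // not present: fault
    if (is_write && !(perm & PTE_W)) return false;   // write to read-only mapping
    if (is_user && !(perm & PTE_U)) return false;    // user access to kernel-only mapping
    return true;
}

// A kernel mapping without PTE_U rejects Eve's write but allows the kernel's.
void kernel_mapping_example() {
    uint64_t kernel_perm = PTE_P | PTE_W;
    assert(!access_allowed(kernel_perm, true, true));
    assert(access_allowed(kernel_perm, true, false));
}
```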
Eve attacks kernel memory
uint8_t* ip = (uint8_t*) 0x4103c;
// address of `syscall` from `obj/kernel.sym`
ip[0] = 0xeb;
ip[1] = 0xfe;
(void) sys_getpid();
Kernel fights back!
- Changes memory mapping function
- Handles the fault
x86-64 page table details
- x86-64 implements virtual memory mappings using a four-level page table
- Data structure: 512-ary tree indexed by parts of the virtual address
- Special `%cr3` register holds address of current page table
- Changing `%cr3` requires privilege
- `%cr3` must be a physical address
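"Indexed by parts of the virtual address" can be made concrete. A 512-ary tree needs 9 bits per level, and four levels of 9 bits plus the 12-bit page offset account for 48 address bits. The sketch below extracts the index for each level. The level numbering (3 for the root table, 0 for the last level) is an assumption made for this example, as is the function name.

```cpp
#include <cassert>
#include <cstdint>

// Index into the page table page at `level` for virtual address `va`.
// Assumption for this sketch: level 3 is the root table (the one %cr3
// points at) and level 0 is the last level, whose entries refer to
// 4096-byte pages.
unsigned pagetable_index(uint64_t va, int level) {
    assert(level >= 0 && level <= 3);
    // Skip the 12 offset bits, then 9 bits per lower level; keep 9 bits.
    return (va >> (12 + 9 * level)) & 0x1FF;
}
```

For the address 0x40ec1 from Eve's attack, the level 3, 2, and 1 indexes are all 0 and the level 0 index is 0x40.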