- 1 Introduction
- 2 Large Code Bases
- 3 Processes
- 4 Exercise: Weensy Memory-Mapped I/O
- Provides tips and examples for working with large code bases
- Explains how a process is created
- Describes what a process looks like in memory
Large Code Bases
This problem set might be the largest code base you've ever dealt with. As such, we're offering tips and tools for working with sizable repositories. Ideally, you'll apply these tips and tools to both your problem set and your future programming work!
Your best goal in working with a large code base is conservation of your human memory. There’s so much code; you need to remember some aspects of the code, but remembering everything about the code is impossible and useless.
You don’t need to completely understand a code base to make progress, and working programmers never try to understand a whole code base at once. The problem set structure guides you to increasing understanding of WeensyOS: aim to understand only what’s needed to complete each stage in turn.
To get even that gradual understanding, do what working programmers do: develop skills, and use tools, for navigating a code base you understand only partially.
The biggest tip we can offer is to follow interesting threads through the code base. For example, you might ask, “What kinds of processes can be run?” To follow that thread, you’d start from a function that runs a process. A little bit of browsing turns up a function called
run that takes a
proc* argument: sounds interesting! Wonder what it does? Well, its header comment tells you:
// run(p) // Run process
p. This means reloading all the registers // from
iretinstructions. // // As a side effect, sets
current = p.
Interesting, now we know about the
current variable. We can also check the first lines of the
run() implementation to see if there’s anything meaningful. There is:
assert(p->p_state == P_RUNNABLE);
It looks like all processes that can be run have
p_state member set to
When you want to find a function in code, what do you do? Here are a couple very useful tools.
grep and git grep
grep command searches files you name for a given string or regular expression. For instance,
grep run *.c prints out all lines that match the pattern
run in files named
*.c in the current directory.
man grep for more.
git-managed code base, such as WeesnyOS, there is an even better tool:
git grep searches just the files that are checked in to your git repository. For instance,
git grep run prints out all lines that match the pattern
run in files checked in to
man git-grep for more.
- grep by default is case sensitive, so you can use the flags
--ignore-caseto ignore case.
- Another useful flag for both grep and git grep is the
-nflag, which shows line numbers next to matches
Ctags creates an index of the functions, variables, macros, etc. of a project, making it easy to jump to a definition when you know a name.
- On Ubuntu, run
sudo apt-get install ctags
- Once installed, you can run
ctags -R --tag-relative .to generate a tags file recursively, but you may want to hide it away by using a command like this
ctags -R --tag-relative -f ./.git/tags .
- You can also use this magical process which update the tags file every time you use git to change the working directory
- In an editor like vim, you can use commands like
:tag function namewhich is described briefly here,
- Further, you can use
Ctrl+]to jump to the a function definition, and
Ctrl+tto jump back
- However the CtrlP plugin is also very nice
- Further, you can use
For those of you who use Sublime, it has its own indexing engine with a lot of handy tools.
- If you don't already have Sublime installed on your VM, you can follow the instructions at https://www.sublimetext.com/docs/3/linux_repositories.html#apt to do so. Select the "Stable" version, not the "Dev".
- To take advantage of Sublime's indexing features, we need to be inside a folder or "project", where a project is essentially a collection of folders.
- Under the
Viewmenu, make sure "Side Bar" is selected.
- Open a folder. If you want to make this a project, go to
Project -> Save Project Asand choose a location.
- With the folder/project opened, you'll be able to use the handy project tools and see the directory structure in the side bar.
- To find the definition of a function, highlight the function's name and navigate to
Goto -> Goto Definition...
Let's try using some of these tools together in the
s09ramdiskos directory. This directory contains a small operating system called WeensyOS that we'll be using for today's exercises.
git grepto find which files include "process.h" in them.
ctagsrecursively in the
s09ramdiskosdirectory, and then use it to find the definition of
- Use Sublime's indexing engine to find where
sys_panicis defined in
A process is a program in execution. More specifically, a process consists of an address space and at least one thread of execution.
As a reminder, the address space consists of several logical areas, including the text segment, data segment, heap segment, and stack segment. A page in the address space is 4096 contiguous bytes of memory on x86_64.
A process's address space is designed to be isolated from other processes -- this is why we have virtual memory. Thus, the page tables that translate virtual addresses to physical addresses are process specific. This allows us to give virtual address X in Process A a different physical meaning from virtual address X in Process B.
To keep track of all these different processes and their associated data, the kernel maintains a data structure that contains the metadata of all running processes. One can imagine that a process structure would contain information such as its address space, page tables, and other implementation-specific information.
The second thing that a process has is at least one thread of execution (we'll talk about multithreading more later on in the semester). A thread is a container for the execution state of a logical flow of control, where execution state can include register values and other data.
Importantly, a thread includes the value of the instruction pointer (%rip on x86-64). When a thread is stopped, all of the current registers are saved into memory. When a thread resumes execution, the registers are repopulated with the values that were stored in memory. This is what allows a OS to run hundreds or thousands of "concurrent" threads/processes with only 1 or a couple CPU cores. In reality, threads and processes are simply started and stopped to give the impression that every process is running at the same time.
As a concrete example of thread behavior, let's use our tools for large code bases to find out how and where registers are stored and restored in WeensyOS!
Process Creation (Fork/Exec)
On UNIX-like systems such as Linux and MacOS, the OS creates an initial process from which all other processes can be spawned. The creation of this initial process is beyond the scope of this course, but we would like to talk about where all other processes come from.
The only way new processes are created is when one process "forks" another process. By this we mean that an existing process asks the OS to make an identical copy of itself in a new address space, and with a new PID that the OS assigns it.
fork() is a system call that copies a process's entire address space and execution state. When the system call is completed, both the parent process (the one which called fork()) and the child process (the one that fork() created) are almost identical. They even continue execution at the same point! Let's look at
fork_2.c together to see an example of the behavior of fork.
If we want the child process to run a different program than the parent, the exec() system call is used. exec() asks the OS to replace the current address space and execution state with an entirely new process. Of course, the exact details are implementation specific, but the combination of fork() + exec() is what happens when you start a program from the shell! The shell forks itself and immediately execs the program you asked it to run in the child process. Let's look at
fork_and_exec.c together to see an example of the behavior of exec.
Exercise: Weensy Memory-Mapped I/O
Your problem set requires you to provide memory isolation among processes. We know that processes share certain resources, such as the kernel. What happens when a process makes a system call? Specifically, how is isolation maintained? Let's find out!
Check out the
cs61-sections/s09ramdiskos directory (you’ll need to pull first). This directory contains a WeensyOS built around a ramdisk, which is a model of a normal operating system’s buffer cache. The ramdisk is stored in kernel memory in the
ramdisk character array. The kernel first fills the initial portion of the ramdisk with some data. Then the
p-reader process repeatedly reads from the ramdisk by calling the
sys_read system call and prints what it finds.
Look at the code, particularly
p-reader.c, and understand how it works.
Note: This weensy OS has kernel isolation (the kernel’s memory is protected from processes by the page table), but not full process isolation (all processes share the same page table,
Part 1: Write system call
Uncomment the body of
p-writer.c. Now your kernel will fail to compile, because the
sys_write system call is missing. So add it!
- Add a system call number for write to
process.h, basing it off
- Add a handler for your write system call to
When you’re done, the output of
p-reader should eventually become all “61”.
Part 2: System call checking
Our implementation of
sys_read doesn’t check its arguments. As a result, a process can kill the kernel by calling
sys_read (and likely
First, verify this: change either
p-writer to kill the kernel (the OS should crash in some way).
Second, fix it: change
sys_write to validate their arguments to provide kernel isolation. A full check will involve some arithmetic and some calls to
Part 3: Memory mapping
Uncomment the body of
p-mapreader.c. Now your kernel will fail to compile, because the
sys_mmap system call is missing. So add it!
- Add a system call number for mmap, and a definition for
process.h. The argument order is described in
p-mapreader.c. You don’t need to handle an
NULL(the user must always provide a valid
- Add a handler for your mmap system call to
kernel.c. (You don’t need to validate arguments at first, but once you’ve got it working, do go back and validate arguments.)
When you’re done, your output should alternate between
mapread lines, each printing the same thing (and both gradually becoming all “61”).
Part 4: Memory mapping for writes
Uncomment the body of
p-mapwriter.c, and re-comment the body of
p-writer.c. Then add support for read/write memory mapping!
- Add a definition for
- Update your mmap system call so that the mapping is read/write only if
When you’re done, the
mapread lines should gradually become all “61”, because
p-mapwriter is writing into the ramdisk.