Kernel Section 3: Pipes

College students: Attendance at section is required and recorded for all college students. All college students should fill out this attendance form when attending section. For more on the attendance policy and what to do if you miss section refer to the syllabus here and here.

Extension School students: Extension School students are required to self-report weekly section participation using this participation form. There are two ways to participate in section: (1) attend section live (for most students, this is one of the zoom sections, but local students are welcome to attend in-person sections) OR (2) watch a section recording (Canvas > Zoom > Published Recordings). Both participation methods are equally valid towards your participation grade.

This section bridges the kernel unit with our next unit, on data storage, by investigating the implementation of one of the most important and powerful abstractions provided by modern operating system kernels: the pipe.

Update your section code repository before beginning.

Part A: Inter-process communication

Processes in WeensyOS, like processes in any real operating system, are primarily isolated—one process can’t modify another’s memory, for example. But processes totally isolated from one another would be useless, and abstractions that allow processes to communicate are an important aspect of OS design.

Real OS inter-process communication mechanisms include:

The file system: One process can write a file that another process can read.
Emergency signals: One process can interrupt the operation of another, causing the interrupted process to exit prematurely.
Wait notification: One process can receive notification when another process exits.

We’ll focus on a particularly important communication channel, the byte stream.

Byte stream communication works like this.

A byte stream connects a writer process to a reader process. The writer sends bytes to the stream, and the reader receives bytes from the stream.
When the writer sends a possibly-infinite sequence of bytes b_0, b_1, ..., the reader receives those bytes in the same order, with no duplicates or gaps.
Bytes are transferred only by explicit request. (The writer may perform tasks other than sending bytes, and the reader may perform tasks other than receiving bytes.)
When a reader stops requesting bytes, the writer’s attempts to send bytes will eventually fail or cause the writer to block (meaning the writer temporarily stops running). Similarly, when a writer stops sending bytes, the reader’s attempts to receive bytes will either fail or cause the reader to block.
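
The properties above can be captured in a small user-space model. This is only a sketch for reasoning about the semantics, not WeensyOS code: the `byte_stream` type, its `send` and `receive` members, and the fixed `capacity` are all names and assumptions introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Toy in-memory byte stream (hypothetical; not the WeensyOS implementation).
// `capacity` bounds the number of in-transit bytes.
struct byte_stream {
    std::deque<char> in_transit;
    size_t capacity;

    explicit byte_stream(size_t cap) : capacity(cap) {}

    // Writer side: accept as many bytes as fit and return the count accepted.
    // A return of 0 for a nonempty request models "fail or block".
    size_t send(const std::string& bytes) {
        size_t n = 0;
        while (n != bytes.size() && in_transit.size() != capacity) {
            in_transit.push_back(bytes[n]);
            ++n;
        }
        return n;
    }

    // Reader side: remove and return up to `max` bytes, oldest first.
    // An empty result for a nonzero `max` models "fail or block".
    std::string receive(size_t max) {
        std::string out;
        while (out.size() != max && !in_transit.empty()) {
            out.push_back(in_transit.front());
            in_transit.pop_front();
        }
        return out;
    }
};
```

Because bytes leave the deque in the order they entered, the reader sees them in order with no duplicates or gaps; a writer that gets a short count must resend the remaining bytes itself.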

EXERCISE A1. Take 5 minutes or so to design a system call interface for byte stream communication in WeensyOS. Think about questions like:

(Very important for prototyping!) How simple can you make the interface?

How shall a new byte stream be created?

Which processes should have access to the byte stream?

System calls are inherently expensive, since they involve protected control transfers. Can you minimize the number of system calls it takes to transfer a message from a writer process to a reader process?

How should bytes be stored when they are in transit (between when the writer sends them and when the reader receives them)?

Part B: Byte streams in WeensyOS

The WeensyOS we handed out in cs61-sections/kernels1 supports byte streams already! Run make run-pipe to see it in action.

The writer process is defined in p-pipewriter.cc, the reader process in p-pipereader.cc. The writer repeatedly (1) picks a random message to send (each message is terminated by a newline '\n'), (2) writes that message to the system pipe, (3) prints a success note to the screen, and then (4) waits for 1–3 seconds. The reader repeatedly (1) reads a message up to a newline and then (2) prints a success note to the screen.

These processes use system calls sys_pipewrite and sys_piperead, which access a single system-wide byte stream called the system pipe. Their specifications are:

// sys_pipewrite(buf, sz)
//    Copy up to `sz` bytes of data from `buf` into the system pipe.
//    Returns number of bytes written or a negative error code (if, for
//    instance, the system pipe’s data transfer buffer is full).
ssize_t sys_pipewrite(const void* buf, size_t sz);

// sys_piperead(buf, sz)
//    Read up to `sz` bytes of data from the system pipe into `buf`.
//    Bytes read are removed from the system pipe.
//    Returns number of bytes read or a negative error code (if, for
//    instance, the system pipe’s data transfer buffer is empty).
ssize_t sys_piperead(void* buf, size_t sz);

EXERCISE B1. How does this interface differ from the interface you designed? Is it simpler or more complicated? How does the interface stack up against the questions in Exercise A1?

This interface is simpler than many student choices have been in the past, making it easier to prototype. The main simplifications concern access control and byte stream creation: There is exactly one system-wide byte stream; it is created implicitly when the OS boots; and any process can write to it or read from it. These would be bad choices for a production operating system, since in production isolation becomes super important. However, the interface does allow a process to write or read multiple bytes at once, which lets processes minimize the number of system calls required to transfer a multi-byte message. Since we’re focusing on performance in this section, it’s a good tradeoff to minimize the interface’s isolation-oriented features and build out its performance-oriented features.
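
To see how a process might use a multi-byte interface like this, here is a hedged sketch of a "write everything" loop. It is not taken from p-pipewriter.cc: `write_all` is a hypothetical helper, the `pipewrite` parameter stands in for `sys_pipewrite`, and the numeric value of `E_AGAIN` below is a placeholder for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/types.h>   // ssize_t

// Placeholder value for this sketch; the real constant comes from the
// WeensyOS headers.
constexpr ssize_t E_AGAIN = -11;

// Write all `sz` bytes of `buf` using `pipewrite`, a callable with the same
// shape as `sys_pipewrite`. Retries when the call reports E_AGAIN and gives up
// on any other error. `ncalls` receives the number of calls made.
// Returns the number of bytes written or the negative error code.
template <typename PipeWrite>
ssize_t write_all(PipeWrite pipewrite, const char* buf, size_t sz,
                  size_t& ncalls) {
    size_t pos = 0;
    ncalls = 0;
    while (pos != sz) {
        ssize_t r = pipewrite(buf + pos, sz - pos);
        ++ncalls;
        if (r >= 0) {
            pos += static_cast<size_t>(r);   // partial writes are fine
        } else if (r != E_AGAIN) {
            return r;                        // unexpected error
        }
        // on E_AGAIN, fall through and try again (this is polling)
    }
    return static_cast<ssize_t>(pos);
}
```

With a pipe that accepts a whole message at once, the loop finishes in one call; with a pipe that accepts one byte at a time and is often full, the call count grows with both message length and the number of failed attempts.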

The kernel implementations of these system calls live in kernel.cc; here they are.

char pipebuf[1];
size_t pipebuf_len = 0;

ssize_t syscall_pipewrite(const char* buf, size_t sz) {
    // See `sys_pipewrite` in `u-lib.cc` for specification.
    if (sz == 0) {
        // nothing to write
        return 0;
    } else if (pipebuf_len == 1) {
        // kernel buffer full, process should try again
        return E_AGAIN;
    } else {
        // write one byte
        pipebuf[0] = buf[0];
        pipebuf_len = 1;
        return 1;
    }
}

ssize_t syscall_piperead(char* buf, size_t sz) {
    // See `sys_piperead` in `u-lib.cc` for specification.
    if (sz == 0) {
        // no room to read
        return 0;
    } else if (pipebuf_len == 0) {
        // kernel buffer empty, process should try again
        return E_AGAIN;
    } else {
        // read one byte
        buf[0] = pipebuf[0];
        pipebuf_len = 0;
        return 1;
    }
}

EXERCISE B2. Where are bytes stored when they are in transit (i.e., after the writer sends them, but before the reader receives them)?

In pipebuf.

EXERCISE B3. What’s the maximum number of in-transit bytes that can be stored simultaneously?

One.

EXERCISE B4. What’s the minimum number of sys_pipewrite system calls required to write an N-byte message?

N.

EXERCISE B5. What’s the minimum number of sys_piperead system calls required to read an N-byte message?

N.

EXERCISE B6. When will sys_pipewrite return zero?

Only if asked to write zero bytes.

EXERCISE B7. Run make run-pipe and check out the number of system calls actually required to write and read N-byte messages. What do you observe?
The actual number of system calls required is vastly larger than the minimum! For instance, here’s the first portion of one run:
20813 sys_pipewrite calls for 16B: wrote I am a plumber.
20806 sys_piperead calls for 16B: read I am a plumber.
38486 sys_pipewrite calls for 28B: wrote I'm talking through a pipe.
401709 sys_piperead calls for 28B: read I'm talking through a pipe.
20037 sys_pipewrite calls for 15B: wrote Hello, friend!
185395 sys_piperead calls for 15B: read Hello, friend!
38577 sys_pipewrite calls for 28B: wrote I'm talking through a pipe.
430689 sys_piperead calls for 28B: read I'm talking through a pipe.
The first message ("I am a plumber.\n"—16 bytes) took around 1300 system calls per byte to write and the same number to read. The next message took around 1370 system calls per byte to write, but more than 14000 system calls per byte to read! The other messages are similar.
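
The per-byte figures quoted here are integer quotients of the logged call counts and message sizes. A small check, with the helper name `calls_per_byte` introduced only for illustration:

```cpp
#include <cassert>
#include <cstddef>

// System calls per byte, rounded down.
constexpr unsigned long calls_per_byte(unsigned long ncalls,
                                       unsigned long nbytes) {
    return ncalls / nbytes;
}

// Counts taken from the first four lines of the run shown above.
static_assert(calls_per_byte(20813, 16) == 1300, "first write");
static_assert(calls_per_byte(20806, 16) == 1300, "first read");
static_assert(calls_per_byte(38486, 28) == 1374, "second write");
static_assert(calls_per_byte(401709, 28) == 14346, "second read");
```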

Part C: Scheduling

In the rest of section, you’ll improve the kernel’s implementation of sys_pipewrite and sys_piperead to reduce the number of system calls per message as much as possible. There are three main techniques you’ll use: changing scheduling (the order processes run), increasing the transfer buffer size, and implementing blocking.

EXERCISE C1. Why does the pipe writer take so many system calls to write a message?

Hint: A good first step is to observe what’s happening in more detail. You may want to use log_printf to record all system calls made in order. For example, uncomment this line in syscall() in kernel.cc:
// log_printf("p%d: %s\n", current->pid, syscall_name(regs->reg_rax));
log_printf slows down WeensyOS a lot, so the system call counts will decrease, but the effects will still be visible.
We see a log like this:
Starting WeensyOS
p1: pipewrite
p1: pipewrite
p1: pipewrite
p1: pipewrite
p1: pipewrite
p1: pipewrite
p1: pipewrite
p2: piperead
p2: piperead
p2: piperead
p2: piperead
p2: piperead
p2: piperead
p2: piperead
p2: piperead
p2: piperead
p1: pipewrite
p1: pipewrite
...
So many pipewrite calls before the first piperead call, and similarly, many piperead calls before the next pipewrite call. This is definitely inefficient. Consider pipewrite: when the pipe buffer is full (which happens after the first pipewrite), the pipewrite system call returns the E_AGAIN error code—after which the pipewriter just immediately tries again, and fails again, and again and again, until the pipe reader gets a chance to run.

EXERCISE C2. Change the sys_pipewrite system call implementation to reduce this problem.
The intuition here is that when a system call like sys_pipewrite returns the E_AGAIN error code, the kernel should run a different process. This seems likely to improve performance because E_AGAIN means the calling process cannot do useful work right now.

We can tell the kernel to run another process, rather than the current process, by calling the schedule() kernel function. But remember that the schedule() function wipes out the kernel’s memory of how it’s processing the sys_pipewrite system call. When the current process runs again, it will pick up as if the sys_pipewrite system call had just returned. So before calling schedule(), we must set up the calling process’s regs.reg_rax to record the E_AGAIN return value.
ssize_t syscall_pipewrite(const char* buf, size_t sz) {
    ...
    } else if (pipebuf_len == 1) {
        // kernel buffer full, process should try again
        current->regs.reg_rax = E_AGAIN;
        schedule();
    ...
When we run make run-pipe with this change, the number of system calls per written byte goes way down, to roughly two per byte.
29 sys_pipewrite calls for 15B: wrote Hello, friend!
18599 sys_piperead calls for 15B: read Hello, friend!
31 sys_pipewrite calls for 16B: wrote I am a plumber.
272628 sys_piperead calls for 16B: read I am a plumber.

EXERCISE C3. Change the sys_piperead system call implementation to reduce this problem.

The intuition is the same, and the code is also almost the same, but the effects are dramatic only for the first message.

ssize_t syscall_piperead(char* buf, size_t sz) {
    ...
    } else if (pipebuf_len == 0) {
        // kernel buffer empty, process should try again
        current->regs.reg_rax = E_AGAIN;
        schedule();
    ...

29 sys_pipewrite calls for 15B: wrote Hello, friend!
29 sys_piperead calls for 15B: read Hello, friend!
31 sys_pipewrite calls for 16B: wrote I am a plumber.
92675 sys_piperead calls for 16B: read I am a plumber.
31 sys_pipewrite calls for 16B: wrote I am a plumber.
109870 sys_piperead calls for 16B: read I am a plumber.

Can you figure out why this is?
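
The small counts in Exercises C2 and C3 (29 calls for 15 bytes, 31 for 16) can be reproduced with a toy model of two processes sharing a one-byte buffer. This is a sketch under simplifying assumptions, not a model of the real WeensyOS scheduler: it assumes strict round-robin scheduling where a process gets a fixed `slice` of system calls per turn, it ignores the writer's pauses between messages, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct call_counts {
    size_t writes = 0;   // sys_pipewrite-like calls made by the writer
    size_t reads = 0;    // sys_piperead-like calls made by the reader
};

// Transfer `nbytes` through a one-byte buffer. Each process may make up to
// `slice` calls per turn. If `writer_yields` (or `reader_yields`) is true,
// a failed call ends that process's turn at once, like calling `schedule()`
// on E_AGAIN.
call_counts simulate(size_t nbytes, size_t slice,
                     bool writer_yields, bool reader_yields) {
    assert(slice > 0);
    bool full = false;              // is the one-byte buffer occupied?
    size_t written = 0, nread = 0;
    call_counts c;
    bool writer_turn = true;
    while (nread != nbytes) {
        for (size_t i = 0; i != slice && nread != nbytes; ++i) {
            bool ok, yields;
            if (writer_turn) {
                if (written == nbytes) {
                    break;          // writer has nothing left to send
                }
                ++c.writes;
                ok = !full;
                if (ok) {
                    full = true;
                    ++written;
                }
                yields = writer_yields;
            } else {
                ++c.reads;
                ok = full;
                if (ok) {
                    full = false;
                    ++nread;
                }
                yields = reader_yields;
            }
            if (!ok && yields) {
                break;              // give up the rest of this turn
            }
        }
        writer_turn = !writer_turn;
    }
    return c;
}
```

In this model a yielding process makes one successful call and one failed call per byte, except for the last byte, giving 2N-1 calls for an N-byte message. A process that does not yield burns the rest of its slice on failed calls after each success. The large read counts on later messages are not captured here, since the model has no writer pauses.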

Part D: Buffer size

Even with these changes, it takes a minimum of one system call per byte to transfer a message. This is because the WeensyOS transfer buffer, which stores bytes in transit, is only one byte big.

EXERCISE D1. Change the size of the pipe buffer to fit more than one byte, and change the sys_pipewrite and sys_piperead handlers to use this bigger pipe buffer. Can you get the pipe writer down to 1 or 2 system calls per message?

Hint: Don’t be overly concerned with the performance of the system call implementation—feel free to copy data within the buffer as convenient, or to copy one byte at a time in an in-kernel loop.

Here are our solutions for sys_pipewrite and sys_piperead.

char pipebuf[100];
size_t pipebuf_len = 0;

ssize_t syscall_pipewrite(const char* buf, size_t sz) {
    // See `sys_pipewrite` in `u-lib.cc` for specification.
    size_t nw = 0;
    // write bytes until end of message or no more room in buffer
    while (nw != sz && pipebuf_len != sizeof(pipebuf)) {
        pipebuf[pipebuf_len] = buf[nw];
        ++nw;
        ++pipebuf_len;
    }
    if (nw == 0 && sz != 0) {
        // kernel buffer full, process should try again
        current->regs.reg_rax = E_AGAIN;
        schedule();
    }
    return nw;
}

ssize_t syscall_piperead(char* buf, size_t sz) {
    // See `sys_piperead` in `u-lib.cc` for specification.
    size_t nr = 0;
    // read bytes until out of space or no more bytes in buffer
    while (nr != sz && nr != pipebuf_len) {
        buf[nr] = pipebuf[nr];
        ++nr;
    }
    if (nr == 0 && sz != 0) {
        // kernel buffer empty, process should try again
        current->regs.reg_rax = E_AGAIN;
        schedule();
    }
    // might have bytes leftover in the buffer
    memmove(&pipebuf[0], &pipebuf[nr], pipebuf_len - nr);
    pipebuf_len -= nr;
    return nr;
}

Note that we only return E_AGAIN if no bytes were read or written even though the caller asked for a nonzero number of bytes, and that in syscall_piperead we are careful to shift any leftover bytes to the front of the buffer if the user read fewer bytes than the kernel buffer contained.
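
One way to convince yourself that the partial-read handling is right is to exercise the same logic outside the kernel. The sketch below copies the loop structure of the solution into a user-space struct so it can be tested directly; `pipe_model`, the 8-byte capacity, and the `E_AGAIN` value are assumptions made for this sketch, and the `schedule()` call is replaced by simply returning the error code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/types.h>   // ssize_t

// Placeholder value for this sketch; the real constant comes from the
// WeensyOS headers.
constexpr ssize_t E_AGAIN = -11;

// User-space stand-in for `pipebuf` and `pipebuf_len` (hypothetical).
struct pipe_model {
    char buf[8];
    size_t len = 0;

    ssize_t write(const char* src, size_t sz) {
        size_t nw = 0;
        // copy bytes until end of message or no more room in buffer
        while (nw != sz && len != sizeof(buf)) {
            buf[len] = src[nw];
            ++nw;
            ++len;
        }
        if (nw == 0 && sz != 0) {
            return E_AGAIN;          // buffer full
        }
        return static_cast<ssize_t>(nw);
    }

    ssize_t read(char* dst, size_t sz) {
        size_t nr = 0;
        // copy bytes until out of space or no more bytes in buffer
        while (nr != sz && nr != len) {
            dst[nr] = buf[nr];
            ++nr;
        }
        if (nr == 0 && sz != 0) {
            return E_AGAIN;          // buffer empty
        }
        // shift leftover bytes to the front of the buffer
        memmove(&buf[0], &buf[nr], len - nr);
        len -= nr;
        return static_cast<ssize_t>(nr);
    }
};
```

A short read followed by another write and a long read should return the bytes in their original order, which is the case that goes wrong if the leftover bytes are not moved.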

After the pipewrite change, the pipe writer makes it down to 1 system call per message:

1 sys_pipewrite calls for 15B: wrote Hello, friend!
15 sys_piperead calls for 15B: read Hello, friend!
1 sys_pipewrite calls for 16B: wrote I am a plumber.
92527 sys_piperead calls for 16B: read I am a plumber.
1 sys_pipewrite calls for 16B: wrote I am a plumber.
109069 sys_piperead calls for 16B: read I am a plumber.

After both changes, the pipe reader can read the first message in 1 system call, but still takes a bunch for all subsequent messages:

1 sys_pipewrite calls for 15B: wrote Hello, friend!
1 sys_piperead calls for 15B: read Hello, friend!
1 sys_pipewrite calls for 16B: wrote I am a plumber.
89318 sys_piperead calls for 16B: read I am a plumber.
1 sys_pipewrite calls for 16B: wrote I am a plumber.
104689 sys_piperead calls for 16B: read I am a plumber.

EXERCISE D2. Why is the pipe reader still making so many system calls on the second and subsequent messages?

The pipe writer pauses for 1–3 seconds after each message. During that time, the kernel’s transfer buffer stands empty. But the pipe reader pounds the kernel with sys_piperead system calls, all of which return E_AGAIN.

Part E: Blocking

In the final part of the section, change the WeensyOS kernel so that the relevant system calls block instead of poll.

Blocking and polling are different techniques that processes can use while waiting for an event. In a polling implementation, a process remains runnable and constantly checks for whether an event has occurred. (It’s like a kid in a car ride: “Are we there yet? Are we there yet? Are we there yet? Are we there yet?”) Polling is often relatively easy to implement and can detect quickly when an event occurs, but while waiting for the event, polling wastes a lot of CPU time on active checks. In a blocking implementation, a process becomes non-runnable until the event occurs, when it is explicitly woken up. A blocking implementation is generally more complex than a polling implementation, but more efficient.

WeensyOS represents process runnability with the proc::state member. A runnable process has ptable[PID].state == P_RUNNABLE. To block a process, set its state to P_BLOCKED.
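
The effect of the state field on scheduling can be illustrated with a toy process table. This is a simplified sketch, not the WeensyOS `schedule()` function: `toy_proc`, `pick_next`, `wake_blocked`, and `REASON_PIPEREAD` are hypothetical names, and the table is indexed from 0 for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum toy_state { P_RUNNABLE, P_BLOCKED };

// Hypothetical tag recording why a process is blocked.
constexpr int REASON_PIPEREAD = 1;

struct toy_proc {
    toy_state state = P_RUNNABLE;
    int block_reason = 0;
};

// Round-robin choice: return the index of the next runnable process after
// `cur` (wrapping around, and considering `cur` itself last), or -1 if no
// process is runnable. Blocked processes are skipped.
int pick_next(const std::vector<toy_proc>& pt, int cur) {
    int n = static_cast<int>(pt.size());
    for (int step = 1; step <= n; ++step) {
        int p = (cur + step) % n;
        if (pt[p].state == P_RUNNABLE) {
            return p;
        }
    }
    return -1;
}

// Make every process blocked for `reason` runnable again.
// Returns the number of processes woken.
int wake_blocked(std::vector<toy_proc>& pt, int reason) {
    int nwoken = 0;
    for (toy_proc& p : pt) {
        if (p.state == P_BLOCKED && p.block_reason == reason) {
            p.state = P_RUNNABLE;
            p.block_reason = 0;
            ++nwoken;
        }
    }
    return nwoken;
}
```

A blocked process consumes no turns at all until something wakes it, which is what distinguishes blocking from yielding on every failed attempt.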

EXERCISE E1. Update the pipe reader and/or pipe writer to avoid the behavior you analyzed in Exercise D2.

The solution here involves blocking the pipe reader when it has no work to do. That is, when a sys_piperead system call would return E_AGAIN because the transfer buffer is empty, the kernel should instead mark the current process as non-runnable. When the transfer buffer becomes non-empty again, the kernel should unblock any process that’s waiting for it.

Blocking introduces many interesting design questions for operating systems. (For instance, should a process block forever? Should there be a way to wake up a blocked process in an emergency?) But a simplified implementation isn’t too bad. In this code, readers are blocked when the buffer is empty, and the writer wakes up any blocked readers after writing.

// in kernel.hh:

// Process descriptor type
struct proc { ...
    int block_reason = 0;               // system call that caused block ***** add this
};


// in kernel.cc:

ssize_t syscall_pipewrite(const char* buf, size_t sz) {
    ...
        ++pipebuf_len;
        if (pipebuf_len == 1) {
            // unblock readers
            for (pid_t p = 1; p < PID_MAX; ++p) {
                if (ptable[p].state == P_BLOCKED
                    && ptable[p].block_reason == SYSCALL_PIPEREAD) {
                    ptable[p].state = P_RUNNABLE;
                    ptable[p].block_reason = 0;
                    // adjust %rip so it does the syscall again
                    ptable[p].regs.reg_rip -= 2; // length of `syscall` instruction in bytes
                }
            }
        }
    ...
}

ssize_t syscall_piperead(char* buf, size_t sz) {
    ...
    if (nr == 0 && sz != 0) {
        // kernel buffer empty: block until a writer adds data
        current->block_reason = SYSCALL_PIPEREAD;
        current->state = P_BLOCKED;
        schedule();
    }
    ...
}

This results in the minimum number of system calls per message for both writers and readers!

1 sys_pipewrite calls for 15B: wrote Hello, friend!
1 sys_piperead calls for 15B: read Hello, friend!
1 sys_pipewrite calls for 16B: wrote I am a plumber.
1 sys_piperead calls for 16B: read I am a plumber.
1 sys_pipewrite calls for 16B: wrote I am a plumber.
1 sys_piperead calls for 16B: read I am a plumber.

Optional advanced work

This exercise touches on several big ideas in operating systems design, with respect to both performance (scheduling, blocking vs. polling) and convenience (the pipe abstraction). Here’s how to go further.

Make the writer block if the pipe buffer fills up. It should unblock when there is space in the buffer. You’ll need to change the process code to check your work.
The easiest way to write a pipe buffer involves memmove calls to shift memory around when data is read. This can be expensive. Change your pipe buffer so no memmoves are involved. You’ll need to add at least one global variable.
Implement multiple pipes!
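
For the memmove-free buffer, one common approach is a circular buffer that tracks the position of the oldest byte in addition to the length. The following is a user-space sketch of that idea under stated assumptions, not a drop-in kernel solution: `ring_pipe`, its members, and the 8-byte capacity are hypothetical, and E_AGAIN handling and blocking are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Circular transfer buffer (hypothetical sketch). `pos` plays the role of
// the extra variable: it is the index of the oldest in-transit byte.
struct ring_pipe {
    char buf[8];
    size_t pos = 0;   // index of oldest in-transit byte
    size_t len = 0;   // number of in-transit bytes

    // Append up to `sz` bytes; returns the number accepted.
    size_t write(const char* src, size_t sz) {
        size_t nw = 0;
        while (nw != sz && len != sizeof(buf)) {
            buf[(pos + len) % sizeof(buf)] = src[nw];   // wrap around the end
            ++nw;
            ++len;
        }
        return nw;
    }

    // Remove up to `sz` bytes, oldest first; returns the number removed.
    // Nothing is shifted: reading just advances `pos`.
    size_t read(char* dst, size_t sz) {
        size_t nr = 0;
        while (nr != sz && len != 0) {
            dst[nr] = buf[pos];
            pos = (pos + 1) % sizeof(buf);
            ++nr;
            --len;
        }
        return nr;
    }
};
```

The tradeoff is slightly more index arithmetic per byte in exchange for never moving bytes that are already in the buffer.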