Learning Objectives

Develop intuition about how caching of various types affects program performance
Explain how caching is used at different layers of the I/O path to improve performance
Read and understand assignment 3

Getting Started

Today's code is in the cs61-exercises repository under l09. Pull from our repository and use make to build today's programs.

Different ways to write data

The three programs w01_sync.c, w02_syscall.c, and w03_stdio.c all do essentially the same thing: they write data to a stdio (and if stdio is your terminal, they write to a file called data). However, they do so in rather different ways with rather different results.

w01_sync.c

Run this program. When I run it on the appliance, it takes approximately half a second and reports a couple of thousand characters per second. When I run it on my laptop, it runs a bit faster and reports close to 6000 characters per second. But wait a minute -- both of these seem painfully slow! Even a disk ought to be able to write more than 6000 bytes per second, right? Let's examine the source code to see what this program is doing.

Use the code to answer the following three questions: 1. How is the program writing the data to the file? (That is, what call is it making?) 2. How many characters are we writing per call? 3. What do each of the flags that we are passing to open do? (Hint, if man open does not give you the information you need, try man 2 open. Section 2 of the man pages refer to system calls.)

This program runs slowly, because it is sending bytes to the disk one at a time. Disks do not work well this way -- disks are structured in blocks, typically of 4096 bytes. As disks are fundamentally slow to begin with, using them this way is taking a bad situation and making it even worse. Copy w01_sync.c into w01_syncblock.c and then modify this new version to write more than one character at a time to the disk. Try a range of sizes. What happens to the performance as you increase the number of bytes you issue per write? (Note: once you get it performing well, you might change the number of bytes we write back to the initial 5120000 that we wanted to write.) Create a small table (or google doc) with the performance you get as you change the number of bytes you write.

We call the number of bytes you write at one time a block and refer to its block size.

Takeaway 1: Even if you are writing small things, by gathering them up in your program and writing them in much larger batches, you can make your program run much more quickly. When you do this, you are creating a write cache.

w02_syscall

Next, compare w01_sync with w02_syscal (you might find the diff program helpful here).

What has changed?

How do you suppose that change will affect the performance you see?

Run the program and compare the performance you get with what you got when you first ran w01_sync. We're guessing that you see a pretty dramatic difference. Let's start using some diagnostic tools to help us figure out what's happening. strace is a handy tool for watching how a program uses the operating system. It has a gazillion bells and whistles, which you may find useful while completing assignment 2, but for now, let's just have it tell us what system calls each program is making. Try running:

jharvard@appliance: strace -c ./w01_sync
jharvard@appliance: strace -c ./w02_syscall

Note: Your programs will run quite a bit more slowly under strace; don't let that throw you -- ignore the bytes/sec number and focus on the final printout that strace produces. (you should also run w01_syncblock under dtrace as well. Make sure you understand the results that you see?)

What do you notice about the number of calls to write in w01_sync and w01_syscall?

Disks can only process one request at a time and even if you have a solid state disk, it can typically only process a few (~4) requests at a time. Develop a hypothesis about why or how w02_syscall can run so much more quickly than w01_sync. Check out your hypothesis by talking with another group at your table or by talking with one of the teaching staff.

Now that you have a theory about why w02_syscall ran so much more quickly than w01_sync did, predict what will happen when you modify w02_syscall to write more bytes at a time as you did with w01_sync. Then, copy w02_syscall.c to w02_syscallblock.c and make the necessary changes and see what happens. Can you explain why you get the results you do? Complete the next Takeaway.

Takeaway #2: Even if you continue to write one byte at a time, removing the O_SYNC flag improves performance because .

w03_stdio

Now compare w02_syscall.c to w03_stdio.c. What is the difference? While write is a system call (a call directly to the operating system), fwrite is not. Where is fwrite implemented?

Now run w03_stdio. How does its performance compare with the other programs you ran? Can you explain why?

Run strace on the program to A) determine you if your explanation is correct or B) help you formulate an explanation. Can you explain your results? After you have explained your results, change the value of size from 51,200 to 5,120,000.

Do you think you can make w03_stdio faster? Give it a try (copying the source to w03_stdioblock.c) and making the same kinds of changes you made to the two previous benchmarks).

What results did you get? Can you explain them?

Stepping Back a Bit

When we talk about the cost of performing I/O, it is useful to separate the cost into two parts: the fixed part and the variable part. The fixed part is the cost required that is independent of how many bytes you write; the variable part is the part that varies depending on how many bytes you write.

Let's call the fixed cost F and the variable cost V (per byte). If we are writing N bytes, our cost will be F + NV.

If F >> V, should you batch your data into large batches or would small batches probably be OK?

If V >> F, should you batch your data into large batches or would small batches probably be OK?

Based on the numbers you get for the various programs with which we've experimented, what would you guess is the relationship between:

The fixed cost of system calls versus library calls.
The fixed cost of synchronous disk I/O (i.e, actually writing data to persistent memory) versus the cost of not synchronously writing.

Before you leave

Please take a minute and fill out this survey.