Learning Objectives
- Develop intuition about how caching of various types affects program performance
- Explain how caching is used at different layers of the I/O path to improve performance
- Read and understand assignment 3
Getting Started
Today's code is in the cs61-exercises repository under l09. Pull from
our repository and use make
to build today's programs.
Different ways to write data
The three programs w01_sync.c, w02_syscall.c, and w03_stdio.c
all do
essentially the same thing: they write data to a stdio (and if stdio is
your terminal, they write to a file called data). However, they do so in
rather different ways with rather different results.
w01_sync.c
Run this program. When I run it on the appliance, it takes approximately half a second and reports a couple of thousand characters per second. When I run it on my laptop, it runs a bit faster and reports close to 6000 characters per second. But wait a minute -- both of these seem painfully slow! Even a disk ought to be able to write more than 6000 bytes per second, right? Let's examine the source code to see what this program is doing.
Use the code to answer the following three questions: 1. How is the
program writing the data to the file? (That is, what call is it
making?)
2. How many characters are we writing per call?
3. What do each of the flags that we are passing to open
do? (Hint, if
man open
does not give you the information you need, try man 2 open
.
Section 2 of the man pages refer to system calls.)
This program runs slowly, because it is sending bytes to the disk one at
a time. Disks do not work well this way -- disks are structured in
blocks, typically of 4096 bytes. As disks are fundamentally slow to
begin with, using them this way is taking a bad situation and making it
even worse. Copy w01_sync.c
into w01_syncblock.c
and then modify
this new version to write more than one character at a time to the disk.
Try a range of sizes. What happens to the performance as you increase
the number of bytes you issue per write? (Note: once you get it
performing well, you might change the number of bytes we write back to
the initial 5120000 that we wanted to write.) Create a small table (or
google doc) with the performance you get as you change the number of
bytes you write.
We call the number of bytes you write at one time a block and refer to its block size.
Takeaway 1: Even if you are writing small things, by gathering them up in your program and writing them in much larger batches, you can make your program run much more quickly. When you do this, you are creating a write cache.
w02_syscall
Next, compare w01_sync
with w02_syscal
(you might find the diff
program helpful here).
What has changed?
How do you suppose that change will affect the performance you see?
Run the program and compare the performance you get with what you got
when you first ran w01_sync
. We're guessing that you see a pretty
dramatic difference. Let's start using some diagnostic tools to help us
figure out what's happening. strace
is a handy tool for watching how a
program uses the operating system. It has a gazillion bells and
whistles, which you may find useful while completing assignment 2, but
for now, let's just have it tell us what system calls each program is
making. Try running:
jharvard@appliance: strace -c ./w01_sync
jharvard@appliance: strace -c ./w02_syscall
Note: Your programs will run quite a bit more slowly under strace
;
don't let that throw you -- ignore the bytes/sec number and focus on the
final printout that strace
produces. (you should also run
w01_syncblock
under dtrace as well. Make sure you understand the
results that you see?)
What do you notice about the number of calls to write
in w01_sync
and w01_syscall
?
Disks can only process one request at a time and even if you have a
solid state disk, it can typically only process a few (~4) requests at a
time. Develop a hypothesis about why or how w02_syscall
can run so
much more quickly than w01_sync
. Check out your hypothesis by
talking with another group at your table or by talking with one of the
teaching staff.
Now that you have a theory about why w02_syscall
ran so much more
quickly than w01_sync
did, predict what will happen when you modify
w02_syscall
to write more bytes at a time as you did with w01_sync
.
Then, copy w02_syscall.c
to w02_syscallblock.c
and make the
necessary changes and see what happens. Can you explain why you get the
results you do? Complete the next Takeaway.
Takeaway #2: Even if you continue to write one byte at a time,
removing the O_SYNC
flag improves performance because
w03_stdio
Now compare w02_syscall.c
to w03_stdio.c
. What is the difference?
While write
is a system call (a call directly to the operating
system), fwrite
is not. Where is fwrite
implemented?
Now run w03_stdio
. How does its performance compare with the other
programs you ran? Can you explain why?
Run strace
on the program to A) determine you if your explanation is
correct or B) help you formulate an explanation. Can you explain your
results? After you have explained your results, change the value of
size
from 51,200 to 5,120,000.
Do you think you can make w03_stdio
faster? Give it a try (copying the
source to w03_stdioblock.c
) and making the same kinds of changes you
made to the two previous benchmarks).
What results did you get? Can you explain them?
Stepping Back a Bit
When we talk about the cost of performing I/O, it is useful to separate the cost into two parts: the fixed part and the variable part. The fixed part is the cost required that is independent of how many bytes you write; the variable part is the part that varies depending on how many bytes you write.
Let's call the fixed cost F and the variable cost V (per byte). If we are writing N bytes, our cost will be F + NV.
If F >> V, should you batch your data into large batches or would small batches probably be OK?
If V >> F, should you batch your data into large batches or would small batches probably be OK?
Based on the numbers you get for the various programs with which we've experimented, what would you guess is the relationship between:
- The fixed cost of system calls versus library calls.
- The fixed cost of synchronous disk I/O (i.e, actually writing data to persistent memory) versus the cost of not synchronously writing.
Before you leave
Please take a minute and fill out this survey.