Building a Read Cache
Learning objectives
- Explain how read performance changes in the presence of a cache
- Observe the effects of the different caches available in libraries and the operating system
- Build a simple read cache
- Comfortably tackle Assignment 3
On Thursday you observed the effects that caching has when we write files. Today, we will first observe the effects of caching on reads, and then we'll build our own simple cache. The techniques and even some of the code you write today should help you get started with assignment 3.
Getting Started
Today's code is in cs61-exercises/l10. When you first type make, you should see all the r0N benchmarks build, but you'll get complaints that r01-cache and randcache don't build. That's fine.
You will find a collection of read benchmarks named r01-r09. Each one will read the file data and print out performance numbers. (Ignore randread for now.) First, use diff to determine how the benchmarks are different. (You will undoubtedly want to compare byte benchmarks to byte benchmarks and block benchmarks to block benchmarks.) Then predict how the performance of each benchmark will compare to the others given the differences you've observed -- force yourself to make comparisons between different variations of byte benchmarks, between byte and block benchmarks, and between any others where you might reasonably be able to form a decent hypothesis. Write down your predictions in a file named analysis.txt. Finally, run each benchmark, recording the final performance. IMPORTANT: You should run sudo ./dropcaches before each test. This program causes the operating system to clear out its file cache, so that each benchmark starts in the same state. What do you suppose would happen if you didn't do this? (Put that answer in your analysis.txt file too.)
Based on the results you obtained above, formulate hypotheses about how the stdio and file system buffer caches work. (You need to write only a couple of sentences, but indicate which benchmark results you used to make your hypotheses.)
Building a Cache
Next, we're going to see if you can build a cache that will make r01-byte.c perform as well as r05-stdiobyte.c.
Hacker Edition
If you are feeling confident about all this material and want a real challenge, stop reading here and go to town building a cache. The only change you may make to r01-byte.c is to change the read(fd, buf, 1) and open("data", O_RDONLY) calls to call your routines. OK, that's a bit of an overstatement -- you may also change the error message and add include statements. Your read equivalent must return the next byte in the file.
You should build your cache so that it can effectively handle the workload in randread.
Standard Edition
Don't worry if you don't feel prepared to tackle this without any guidance -- that's what we are here for. You will be able to do this -- just follow our advice below. Pay attention to the methodology we use walking through this implementation. At each step, we strive to have a working system that you can test to make sure it's doing what you think. We encourage you to approach your problem sets this way. All too often, it is tempting to write a large pile of code, so that only after you've written more than you can possibly keep in your brain do you build it and see whether it works. Instead, try making small incremental changes that you can test as you go.
You will notice that we've created two files for you: cache.c and cache.h. In these files, we've declared two functions, cache_file_open and cache_file_read, that replace the open and read system calls. Start by:
- Recording the performance you get before modifying r01-byte.c.
- Copying r01-byte.c to r01-cache.c.
- Modifying r01-cache.c to call the caching routines instead of the system calls.
- Building r01-cache and making sure it runs (since the cache routines currently do nothing, you'll get 0 bytes read -- that's fine).
Let's start by making our caching version behave identically to the original program. How would you do that? Implement that and make sure you get nearly the same performance from r01-byte and r01-cache. (Don't forget to run dropcaches before running any benchmark.)
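One way to make the caching version behave identically is to make the cache routines pure pass-throughs. Here is a minimal sketch of that idea; the function names come from cache.h as described above, but everything else is our assumption:

```c
// Pass-through versions of the two cache routines: each simply forwards
// to the underlying system call, so r01-cache should behave (and perform)
// essentially the same as r01-byte.
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

int cache_file_open(const char *path, int flags) {
    return open(path, flags);          // no caching state kept yet
}

ssize_t cache_file_read(int fd, void *buf, size_t sz) {
    return read(fd, buf, sz);          // still one system call per request
}
```

Once this builds and matches r01-byte's performance, you have a known-good baseline to improve on.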
Now, what would we really like to happen when a program calls our cache routines? We'll begin by trying to get r01-cache to achieve the same performance we got from r03-block. It would seem that achieving that performance is going to require that we make fewer calls to read. Making fewer calls to read requires that we read more bytes each time, and reading more bytes each time suggests that we need a place to store bytes that we've read but not yet returned to the caller of cache_file_read.
OK, this seems to suggest that we're going to need a data structure that we associate with an open file, and we're going to need a way to associate that data structure with the file. Recall that file descriptors are small integers. You may assume that your cache will never be called with a file descriptor greater than 10 (but you should still code defensively against bad inputs -- check out the manual page for assert).
TODO: Design a data structure that you'll use to keep track of data that you've read but not yet returned to the caller. Place your data structure declaration and definition in cache.c. Be sure to design a way to find the data structure corresponding to a particular file when cache_file_read is called. (Hint: recall that we said you may assume that you'll never get a file descriptor larger than 10.)
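One possible shape for this data structure, exploiting the hint: since file descriptors never exceed 10, an array indexed by descriptor works. The slot layout, field names, and buffer size below are all our assumptions, not the handout's required design:

```c
#include <assert.h>
#include <stddef.h>

#define MAXFD 10      // we're promised fds never exceed 10
#define BUFSZ 4096    // cache buffer size (an assumption; tune as needed)

// One slot per possible file descriptor. `used` counts the valid bytes
// in buf; `pos` is the index of the next byte to hand back to the caller.
typedef struct {
    char buf[BUFSZ];
    size_t used;      // bytes of buf filled by the last read()
    size_t pos;       // next byte of buf to return
} cache_slot;

static cache_slot slots[MAXFD + 1];

// Find the slot for a descriptor, coding defensively with assert.
static cache_slot *slot_for(int fd) {
    assert(fd >= 0 && fd <= MAXFD);
    return &slots[fd];
}
```

With this layout, lookup is a constant-time array index, and an out-of-range descriptor fails loudly rather than corrupting memory.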
Make sure that your declarations are correct by confirming that r01-cache still builds. Next you will need to create and/or initialize your data structures during open. The precise code you write will vary depending on your system design, but you should now be ready to write the complete implementation of cache_file_open.
TODO: You guessed it, write cache_file_open. Make sure your program r01-cache still builds and runs.
You're almost there! Now you have to do the actual caching. This means that inside of cache_file_read, you'll want to:
- Check whether you have cached data you can immediately return.
- If you do not have data to immediately return, you'll need to perform a read to get some data.
- You'll want to update your data structures.
- You'll want to return the data to the caller.
TODO: Go for it! See if you can write cache_file_read.
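For reference, here is one way the four steps above can fit together in a single-slot cache. This is a sketch under our own assumptions (buffer size, slot layout, field names), not the only correct design:

```c
// A single-slot read cache. cache_file_open forwards to open(2) and
// clears the slot; cache_file_read refills the slot with one large
// read(2) when it is drained and otherwise copies bytes out of it.
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAXFD 10
#define BUFSZ 4096    // assumed buffer size; tune it against r03-block

static struct {
    char buf[BUFSZ];
    size_t used;      // valid bytes in buf
    size_t pos;       // next byte of buf to hand out
} slots[MAXFD + 1];

int cache_file_open(const char *path, int flags) {
    int fd = open(path, flags);
    if (fd >= 0) {
        assert(fd <= MAXFD);
        slots[fd].used = slots[fd].pos = 0;   // start with an empty slot
    }
    return fd;
}

ssize_t cache_file_read(int fd, void *buf, size_t sz) {
    assert(fd >= 0 && fd <= MAXFD);
    // Steps 1-2: if the slot is drained, refill it with one large read().
    if (slots[fd].pos == slots[fd].used) {
        ssize_t n = read(fd, slots[fd].buf, BUFSZ);
        if (n <= 0)
            return n;                         // pass through EOF or error
        slots[fd].used = (size_t)n;
        slots[fd].pos = 0;
    }
    // Steps 3-4: copy out what we have, update the slot, return the count.
    size_t avail = slots[fd].used - slots[fd].pos;
    if (sz > avail)
        sz = avail;
    memcpy(buf, slots[fd].buf + slots[fd].pos, sz);
    slots[fd].pos += sz;
    return (ssize_t)sz;
}
```

The payoff: a caller asking for one byte at a time now costs a memcpy of one byte on most calls, with a system call only once per BUFSZ bytes.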
When you have something you think is right, compare its performance to that of r03-block and r05-stdiobyte. How did you do? Can you explain any differences you observed? You might find the UNIX time command useful in interpreting your data. (You might want to consult the man page to understand the different numbers that time reports.)
What you've probably just built is a single slot cache. That is, you have only a single cache entry (per file, perhaps). Even so, you should find that it improves performance significantly.
Congratulations! When you begin assignment 3, think back to this exercise -- you may reuse any code or ideas you used here; just be sure to document any collaboration that occurred.
A Multi-slot Cache
There are many access patterns that your single slot cache may not be able to handle well. For example, the program randread randomly reads a file in blocksize units, but you should find that its performance is not as good as that of r03-block. Although this program does not read its blocks in sequential order, it generates the block numbers to read according to a skewed distribution, which means that some blocks are much more likely than others. As the workload becomes more skewed, a cache should become more effective (because you increase the likelihood that you access a block you accessed before).
TODO: Your job is to implement a new function cache_read_block that improves the performance of randread. Your cache should perform better than the current randread for all workloads. Copy randread.c into a new file randcache.c and make your modifications there; that way you'll be able to compare performance easily.
You can change various aspects of the workload on the command line: -b blocksize -f file -o ops -p skew. If you change the blocksize, you may need to change your cache block size as well. file indicates the input file from which to read, ops is the number of blocks to read, and skew is a number indicating how skewed the distribution of block numbers is. The default value for p is 75.0. Increasing p makes the distribution more skewed; decreasing it makes it less skewed.
Your cache may be no larger than 100 KB. See if you can get performance better than r06-stdioblock.
You will need to augment your current data structures to:
- Keep more than one block in cache
- Identify each block that you do cache
- Remember the location of the file pointer (the location of the next byte to read), in case one intersperses calls to cache_file_read with calls to cache_read_block
You will need to modify your existing code to:
- Use the current location of the file pointer.
You will need to implement cache_read_block.
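One way to augment the data structures for multiple blocks is a small direct-mapped cache: an array of slots, each tagged with the file offset of the block it holds. The slot count, block size, and placement policy below are our assumptions; any design within the 100 KB budget is fair game:

```c
#include <assert.h>
#include <sys/types.h>

#define BLOCKSIZE 4096    // assumed cache block size (a power of 2)
#define NSLOTS 16         // 16 * 4 KB = 64 KB, under the 100 KB budget

// One cached block: `tag` is the file offset of the block's first byte,
// and `valid` says whether the slot holds real data yet.
typedef struct {
    off_t tag;
    int valid;
    char data[BLOCKSIZE];
} cache_block;

static cache_block cache[NSLOTS];

// Direct-mapped placement: block number modulo the slot count. On a
// skewed workload the hot blocks tend to stay resident in their slots.
static int slot_index(off_t tag) {
    return (int)((tag / BLOCKSIZE) % NSLOTS);
}
```

On a lookup, cache_read_block would compute the tag for the requested block, check whether cache[slot_index(tag)] is valid with a matching tag (a hit), and otherwise read the block from the file into that slot (a miss that evicts whatever was there).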
Implementation Notes/Hints
The data that you keep to identify a block in the cache is called a cache tag. We encourage you to make your cache tag the address of the first byte of a block. Although your cache_read_block function is defined in terms of blocks, and therefore you'll always generate an address that matches a tag, this will not always be the case. If a program continues to use the cache_file_read function, you may need to locate a position in the file other than the first one in a cache block. There are ways to make this simple and efficient.
We're going to go out on a limb here and assume that you'll cache things that are a power of 2 in size. Imagine that your block size is 16 bytes (this is probably not a great cache block size). Write, in binary, the starting offsets of the first four blocks of a file. What do you notice?
When we have cache blocks that are a power of 2 in size and we want to locate a given offset, we can view the offset as having two parts: a block number and an offset within a block. Continuing our 16-byte example, let's say that we want to find the block containing offset 52. First we write 52 in binary: 00110100. We can then consider the low 4 bits the offset into a cache block and the high 4 bits a block number. Therefore we would expect to find offset 52 in block number 3 at offset 4.
You will undoubtedly find it handy to define some helper functions or macros (#define statements) that let you convert from file offsets to block offsets and from file offsets to cache tags. To convert from a file offset to a block offset, we want to consider only the low order bits. You might notice that subtracting 1 from the blocksize produces a number with ones in exactly those low-order bits (we sometimes call this value a mask, because it lets us mask out certain bits). What logical operation could you use to produce just the low order bits from a file position? What other logical operation(s) might you use to convert from a file offset to a cache tag (a value with 0's in all the low order bits)?
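Putting the two questions above together, here is one possible pair of helper macros (the names are our own; any equivalent helpers will do). Bitwise AND with the mask keeps the low-order bits; AND with the complement of the mask clears them:

```c
#define BLOCKSIZE 16  // the 16-byte example block size; must be a power of 2

// Offset of a file position within its cache block: keep only the low bits.
#define BLOCK_OFFSET(off) ((off) & (BLOCKSIZE - 1))

// Cache tag for a file position: clear the low bits, leaving the offset
// of the first byte of the containing block.
#define BLOCK_TAG(off)    ((off) & ~(size_t)(BLOCKSIZE - 1))
```

Checking against the worked example: for offset 52, BLOCK_OFFSET gives 4 and BLOCK_TAG gives 48, which is the start of block 3.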
Finishing Up
Once you've got your cache working and your timing results look good for the caching version of randread, modify your cache to collect statistics. Provide a function cache_stats that prints out the cache hit rate; call this function at the end of randcache.c.
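A sketch of the statistics piece: two counters bumped inside your cache, plus the cache_stats function the handout asks for. The counter names and the cache_hit_rate helper are our own additions, not part of the handout's interface:

```c
#include <stdio.h>

// Counters for the cache to maintain: bump cache_lookups on every
// cache_read_block call, and cache_hits when the block is already resident.
static long cache_hits, cache_lookups;

// Hypothetical helper: the hit rate as a fraction, guarding against
// division by zero when no lookups have happened yet.
static double cache_hit_rate(void) {
    return cache_lookups ? (double)cache_hits / (double)cache_lookups : 0.0;
}

// cache_stats, as requested: print the hit rate at the end of the run.
void cache_stats(void) {
    printf("cache: %ld hits / %ld lookups (%.1f%% hit rate)\n",
           cache_hits, cache_lookups, 100.0 * cache_hit_rate());
}
```

As the skew parameter p increases, you should see this hit rate climb.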
Please fill out this survey.