Building a Read Cache

Learning objectives

Explain how read performance changes in the presence of a cache
Observe the effects of the differt caches available in libraries and the operating system
Build a simple read cache.
Comfortably Tackle Assignment 3

On Thursday you observed the effects that caching makes when we write files. Today, we will first observe the effects of caching on reads, and then we'll build our own simple cache. The techniques and even some of the code you write today should help you get started with assignment 3.

Getting Started

Today's code is in cs61-exercises/l10. When you first type make you should see all the r0N benchmarks build, but you'll get complaints that r01-cache and randcache don't build. That's fine.

You will find a collection of read benchmarks named r01-r09. Each one will read the file data and print out performance numbers. (Ignore randread for now.) First, use diff to determine how the benchmarks are different. (You will undoubtedly want to compare byte benchmarks to byte benchmarks and block benchmarks to block benchmarks.) Then predict how the performance of each benchmark will compare to the others given the differences you've observed -- force yourself to make comparisons between diferent variations of byte benchmarks, between byte and block benchmarks, and between any others where you might reasonably be able to form a decent hypothesis. Write down your predictions in a file named analysis.txt. Finally, run each benchmark, recording the final performance. IMPORTANT: You should run sudo ./dropcaches before each test. This program causes the operating system to clear out its file cache, so that each benchmark starts in the same state. What do you suppose would happen if you didn't do this? (Put that answer in your analysis.txt file too.)

Based on the results you obtained above, formulate hypotheses about how the stdio and file system buffer caches work. (You need write only a couple of sentences, but indicate which benchmark results you used to make your hypotheses.)

Building a Cache

Next, we're going to see if you can build a cache that will make r01-byte.c perform as well as r05-stdiobyte.c.

Hacker Edition

If you are feeling confident about all this material and want a real challenge; stop reading here and go to town building a cache. The only change you may make to r01-byte.c is to change the read(fd, buf, 1) and open("data", O_RDONLY) calls to call your routines. OK, that's a bit of an overstatement -- you may also change the error message and add include statements. Your read equivalent must return the next byte in the file.

You should build your cache so that it can effectively handle the workload in randread.

Standard Edition

Don't worry if you don't feel prepared to tackle this without any guidance -- that's what we are here for. You will be able to do this -- just follow our advice below. Pay attention to the methodology we use walking through this implementation. At each step, we strive to have a working system that you can test to make sure it's doing what you think. We encourage you to approach your problem sets this way. All too often, it is tempting to write a large pile of code and only when you've written more code than you can possibly keep in your brain are you ready to build it and see if it works. Instead, try making small incremental changes that you can test as you go.

You will notice that we've created two files for you: cache.c and cache.h. In these files, we've declared two functions, cache_file_open and cache_file_read, that replace the open and read system calls. Start by:

Recording the performance you get before modifying r01-byte.c.
Copying r01-byte.c to r01-cache.c.
Modifying r01-cache.c to call the caching routines instead of the system calls.
Building r01-cache and making sure it runs (since the cache routines currently do nothing, you'll get 0 bytes read -- that's fine).

Let's start by making our caching version behave identically to the original program. How would you do that? Implement that and make sure you get nearly the same performance from r01-byte and r01-cache (Don't forget to run dropcaches before running any benchmark.)

Now, what would we really like to happen when a program calls our cache routines? We'll begin by trying to get get r01-cache to get the same performance we got from r03-block. It would seem that achieving that performance is going to require that we make fewer calls to read. Making fewer calls to read requires that we read more bytes each time, and reading more bytes each time suggests that we need a place to store bytes that we've read but not yet returned to the caller of cache_file_read.

OK, this seems to suggest that we're going to need a data structure that we associate with an open file, and we're going to need a way to associate that data structure with the file. Recall that file descriptors are small integers. You may assume that your cache will never be called with a file descriptor greater than 10 (but you should still code defensively against bad inputs -- check out the manual page for assert).

TODO: Design a data structure that you'll use to keep track of data that you've read but not yet returned to the caller. Place your data structure declaration and definition in cache.c. Be sure to design a way to find the data structure corresponding to a particular file when cache_file_read is called. (Hint: recall that we said you may assume that you'll never get a file descriptor larger than 10.)

Make sure that your declarations are correct by confirming that r01-cache still builds. Next you will need to create and/or initialize your data structures during open. The precise code you write will vary depending on your system design, but you should now be ready to write the complete implementation of cache_file_open.

TODO: You guessed it, write file_cache_open. Make sure your program r01-cache still builds and runs.

You're almost there! Now you have to do the actual caching. This means that inside of file_cache_read, you'll want to:

Check whether you have cached data you can immediately return.
If you do not have data to immediately return, you'll need to perform a read to get some data.
You'll want to update your data structures.
You'll want to return the data to the caller.

TODO: Go for it! See if you can write file_cache_read.

When you have something you think is right, compare its performance to that of r03-block and r05-stdiobyte. How did you do? Can you explain any differences you observed? You might find the UNIX time command useful in interpreting your data. (You might want to consult the man page to understand the different numbers that time reports.)

What you've probably just built is a single slot cache. That is, you have only a single cache entry (per file, perhaps). Even so, you should find that it improves performance significantly.

Congratulations! When you begin assignment 3, think back to this exercise -- you may reuse any code or ideas you used here, just be sure to document any collaboration that occurred.

A Multi-slot cache

There are many access patterns that your single slot cache may not be able to handle well. For example, the program randread randomly reads a file in blocksize units, but you should find that its performance is not as good as that of r03-block. Although this program does not read its blocks in sequential order, it generates the block numbers to read according to a skewed distribution, which means that some blocks are much more likely than other blocks. As the workload becomes more skewed, a cache should become more effective (because you increase the likelihood that you access a block you accessed before).

TODO: Your job is to implement a new function cache_read_block that improves the performance of randread. Your cache should perform better than the current randread for all workloads. Copy randread.c into a new file randcache and make your modifications there; that way you'll be able to compare performance easily.

You can change various aspects of the workload on the command line: -b blocksize -f file -o ops -p skew If you change the blocksize, you may need to change your cache block size as well. file indicates the input file from which to read, ops is the number of blocks to read, and skew is a number indicating how skewed the distribution of block numbers is. The default value for p is 75.0. Increasing p makes the distribution more skewed; decreasing it makes it less skewed.

Your cache my be no larger than 100 KB. See if you can get performance better than r06-stdioblock.

You will need to augment your current data structures to:

Keep more than one block in cache
Identify each block that you do cache
Remember the location of the file pointer (the location of the

next byte to read), in case one intersperses calls to cache_file_read with calls to cache_read_block.

You will need to modify your existing code to

Use the current location of the file pointer.

You will need to implement cache_read_block.

Implementation Notes/Hints

The data that you keep to identify a block in the cache is called a cache tag. We encourage you to make your cache tag the address of the first byte of a block. Although your cache_read_block function is defined in terms of blocks and therefore you'll always generate the address that matches a tag, this will not always be the case. If a program continues to use the cache_read function, you may need to locate a position in the file other than the first one in a cache block. There are ways to make this simple and efficient.

We're going to go out on a limb here and assume that you'll cache things that are a power of 2 in size. Imagine that your block size is 16 bytes (this is probably not a great cache block size). Write, in binary, the starting offsets of the first four blocks of a file. What do you notice?

When we have cache blocks that are a power of 2 size and we want to locate a given offset, we can view the offset as having two parts: a block number and an offset within a block. Continuing our 16-byte example, let's say that we want to find the block containing the offset 52. First we write 52 in binary: 00110100. We can then consider the low 4 bits the offset into a cache block and the high 4 bits a block number. Therefore we would expect to find offset 52 in block number 3 at offset 4.

You will undoubtedly find it handy to define some helper functions or macros (#define statements) that let you convert from file offsets to block offsets and from file offsets to cache tags. To convert from a file offset to a block offset, we want to consider only the low order bits. You might notice that subtracting 1 from the blocksize produces a number with ones in exactly those low-order bits (we sometimes call this value a mask, because it lets us mask out certain bits). What logical operation could you use to produce just the low order bits from a file position? What other logical operation(s) might you use to convert from a file offset to a cache tag (a value with 0's in all the low order bits)?

Finishing Up

Once you've got your cache working and your timing results look good for the caching version of randread, modify your cache to collect statistics. Provide a function cache_stats that prints out the cache hit rate; call this function at the end of randcache.c

Please fill out this survey.