Systems Programming and Machine Organization
This is the 2012 version of the course. Main site
Assignment 4: Caching and File I/O
In this assignment, you’ll gain experience with caching by writing your own buffered I/O library.
- Assigned Fri 11/2
- Due Wed 11/14 at 11:59pm for college students (1 day later for extension)
- This assignment may be completed in pairs.
Get the code
Our code is released in the
cs61-psets repository. To update it, run
git remote show handout
If this reports an error, run
git remote add handout git://code.seas.harvard.edu/cs61/cs61-psets.git
git pull handout master. This will merge our Assignment 4 code with your previous work. If you have any “conflicts” from Assignment 3, resolve them before continuing further. Run
git push to save your work back to code.seas.
Please use an explicit merge to create your repository. If you copy code by hand, our automated scripts will have trouble analyzing your code, and it’ll be harder for you to incorporate our updates.
You may also create a new
cs61-psets repository for this assignment.
We’ve released a very simple IO61 I/O library, in
io61.c. Our version of IO61 uses character-at-a-time system call I/O and is thus quite slow. Your goal is simple: speed it up using caching.
We also distribute several programs that use IO61.
cat61reads a file sequentially one character at a time and writes it to standard output.
cat61, but reads the file sequentially one block at a time. Its
-b BLOCKSIZEargument sets the block size.
blockcat61, but it chooses a random block size for each read and write. It still reads and writes sequentially (only the block sizes are random). The
-bargument is the maximum possible block size. This will test your buffering package.
stride61reads its input file using a strided access pattern, and outputs a transformed file. The
-sarguments set the block size and stride length, respectively.
reordercat61reads its input file one block at a time in random order and writes each block to the output file at the correct position. The result is a copy of the input file, but created non-sequentially.
reverse61reads a file a character at a time in reverse and writes the result to standard output.
Introduce buffers (that is, caches) to the
io61_file abstraction and use them to speed up IO61 operations.
We’re giving you tons of freedom to implement the file caches as you like. You may even use memory-mapped I/O, prefetching system calls like
posix_fadvise, or multiple threads or processes (although none of these are required). But you may not use another buffered I/O library or caching I/O library.
For full credit your code should:
- On sequential reads and writes for any files, including very large files (such as files that don’t fit in memory) and non-seekable files, perform at least as well as stdio.
- Generate correct results for all access patterns and file types, including very large files, non-seekable files, and random accesses.
- Improve the performance of some complex access patterns, such as simple strides or reverse access, relative to stdio.
We will grade your work based on how much you speed up sample executions of the handout programs—possibly among others—with points deducted for incorrect results.
All your code should fit in
We will evaluate you based on your code’s performance relative to a version of IO61 that uses stdio. That version is provided for you for testing in
stdio-io61.c (and the makefile builds
stdio-blockcat61, and so forth).
make check to see how your current implementation is doing on a battery of tests. This will also print summary statistics at the end. Run
make check-TESTNUMBERS (e.g.,
make check-9 or
make check-5-10 or
make check-1,2,3) to run selected tests. For full credit you need not beat stdio on all tests, but you certainly should beat stdio on more than half the tests.
We will also measure your code’s performance against that of other students in class. :)
We may update the tests during the pset release period.
- Read our file descriptors notes.
- Write code to handle sequential I/O first. This isn’t so hard and covers many important cases.
- You will likely need to change every function in
- To handle other cases, such as reordered I/O and particularly stride I/O, you will need to be clever and implement more complex cache algorithms. For instance, might stride I/O benefit from an associative cache?
- Some of our tests, particularly on shorter files, are noisy, and can produce different timing results when run multiple times.
- Stdio is a well-written package, so don’t feel bad if you can’t always beat it.
Extra credit: Deadlock
We also release a program,
pipeexchange61, that demonstrates a problem with conventional blocking I/O. The
pipeexchange61 program forks two copies of itself. One copy, the requester, sends requests to the other copy, the responder. The requester can send many requests back-to-back and only then wait for responses (a form of batching). When you run
pipeexchange61, the program appears to get stuck! Why? Can you construct an I/O library that unsticks
pipeexchange61, without modifying
You will turn in your code by pushing your git repository to code.seas.harvard.edu. Inform us ASAP if you have changed partner or repository from pset 3.
Don't forget to fill out
This pset was created for CS61, Fall 2012.