Assignment 2: Caching and File I/O
In this assignment, you’ll gain experience with caching by writing your own buffered I/O library.
- Assigned Fri 9/27
- Due Fri 10/11 at 11:59pm for college students (1 day later for extension)
- This assignment may be completed in pairs
Get the code
Our code is released in the cs61-psets
repository. To update
it, run
git remote show handout
If this reports an error, run
git remote add handout
git://code.seas.harvard.edu/cs61/cs61-psets.git
Then run git pull handout master
. This will
merge our Assignment 2 code with your
previous work. If you have any “conflicts” from Assignment 1, resolve
them before continuing further. Run
git push
to save this merge back to code.seas.
Please use an explicit merge to create your repository. If you copy code by hand, our automated scripts will have trouble analyzing your code, and it’ll be harder for you to incorporate our updates.
You may also create a new cs61-psets
repository for this assignment.
Make sure you update the grading
server.
Goal
We’ve released a very simple IO61 I/O library, in io61.c
. Our
version of IO61 uses character-at-a-time system call I/O and is thus
quite slow. Your goal is simple: speed it up using caching.
We also distribute several programs that use IO61.
cat61
reads a file sequentially one character at a time and writes it to standard output.blockcat61
is likecat61
, but reads the file sequentially one block at a time. Its-b BLOCKSIZE
argument sets the block size.randomcat61
is likeblockcat61
, but it chooses a random block size for each read and write. It still reads and writes sequentially (only the block sizes are random). The-b
argument is the maximum possible block size. This will test your buffering package.stride61
reads its input file using a strided access pattern, and outputs a transformed file. The-b
and-s
arguments set the block size and stride length, respectively.reordercat61
reads its input file one block at a time in random order and writes each block to the output file at the correct position. The result is a copy of the input file, but created non-sequentially.reverse61
reads a file a character at a time in reverse and writes the result to standard output.
Introduce buffers (that is, caches) to the io61_file
abstraction and
use them to speed up IO61 operations.
We’re giving you tons of freedom to implement the file caches as you
like. You may even use memory-mapped I/O, prefetching system
calls like madvise
or
posix_fadvise
, or multiple threads or processes (although none of
these are required). But you may not use another buffered I/O
library or caching I/O library.
For full credit your library should:
- Return correct results for all access patterns and file types,
including very large files, non-seekable files, mixtures of calls (for
instance, alternating
io61_read
andio61_readc
), and random accesses. - Perform as well as stdio on sequential reads and writes for any files, including very large files (such as files that don’t fit in memory) and non-seekable files.
- Improve on stdio’s performance for some non-sequential access patterns, such as simple strides or reverse access.
We will grade your work based on how much you speed up sample executions of the handout programs—and other secret tests—with points deducted for incorrect results.
All your code should fit in io61.c
.
Evaluation
We will evaluate you based on your code’s performance relative to a
version of IO61 that uses stdio. That version is provided for you for
testing in stdio-io61.c
(and the makefile builds stdio-cat61
,
stdio-blockcat61
, and so forth).
Run make check
to see how your current implementation is doing on a
battery of tests. This will also print summary statistics at the end.
Run make check-TESTNUMBERS
(e.g., make check-9
or make check-5-10
or make check-1,2,3
) to run selected tests.
We will also measure your code’s performance against that of other students in class. :)
We may update the tests during the pset release period.
Hints
- Read our file descriptors notes.
- Write code to handle sequential I/O first. This isn’t so hard and covers many important cases.
- You will likely need to change every function in
io61.c
(exceptio61_open_check
andio61_filesize
). - To handle other cases, such as reordered I/O and particularly stride I/O, you will need to be clever and implement more complex cache algorithms. For instance, might stride I/O benefit from an associative cache?
- Some of our tests, particularly on shorter files, are noisy, and can produce different timing results when run multiple times.
- Stdio is a well-written package! It is OK if you can’t always beat it.
- Write your own tests! This will help you shake out bugs in your
code, particularly correctness bugs. What diabolical things can you
think of to try? You can add new tests pretty easily; just edit
GNUmakefile
andcheck.pl
. - You can run the tests on their own, too.
make check
creates some useful input files for testing and puts them infiles/
.
Notes
The build process for this pset may be different than you’re used to.
We’ve changed the Makefile to generate less noise, so errors will stand
out more. We’ve also changed the default compiler to clang. If you want
to see the actual compilation commands, run make V=1
. If you want to
compile with gcc, run make CC=gcc
. If you want to compile without
optimization (which might help you debug), run make O=0
or edit the
GNUmakefile to set O = 0
by default.
Extra credit: Deadlock
We also release a program, pipeexchange61
, that demonstrates a problem
with conventional blocking I/O. The pipeexchange61
program forks two
copies of itself. One copy, the requester, sends requests to the other
copy, the responder. The requester can send many requests back-to-back
and only then wait for responses (a form of batching). When you run
pipeexchange61
, the program appears to get stuck! Why? Can you
construct an I/O library that unsticks pipeexchange61
, without
modifying pipeexchange61.c
?
Turnin
You will turn in your code by pushing your git repository to code.seas.harvard.edu. Inform us on the grading server if you have changed partner or repository from pset 1.
Don't forget to fill out README.txt
.
This pset was originally created for CS61.