In this problem set, you’ll gain experience with caching by writing your own buffered I/O library.
- Due Thursday 10/31 11:59pm EDT, the witching hour (one day later for DCE).
- Reminder: You may not use code or solutions from previous years.
Get the code
Get our code with
$ git pull; git pull handout main
or, alternately
$ git pull; git pull https://github.com/cs61/cs61-f24-psets.git main
This will merge our Problem Set 4 code with your previous
work. If you have any “conflicts” from prior problem sets, resolve
them before continuing further. Run git push
to save
your work back to your personal repository.
You may also create a new cs61-psets
repository for this assignment. Don’t
forget to enter your repository URL on the grading server.
Goal
Our simple IO61 library performs I/O on files. You will find the code in the
pset directory in the file io61.cc
. Our version of IO61 is correct for all
inputs, but otherwise pretty stupid: it uses byte-at-a-time system call I/O
and is thus quite slow.
You will introduce caching to the io61_file
abstraction and use your
cache to speed up IO61 operations. We’re giving you tons of freedom to
implement the cache as you like. You may even use memory-mapped
I/O, prefetching system calls like
madvise
or posix_fadvise
, or multiple threads or processes (although
none of these are required). But you may not use another buffered I/O
library or caching I/O library.
Your library:
-
Should return correct results for all access patterns and input files, including multiple files, very large files, non-seekable files, files that cannot be memory mapped, files that return short reads or accept short writes, mixtures of calls (for instance, alternating
io61_read
andio61_readc
), and random accesses.Note that your library should not assume anything about the content of its input files.
-
Should roughly equal stdio’s performance on sequential reads and writes for any files, including very large files (such as files that don’t fit in memory) and non-seekable files.
-
Should better stdio’s performance for some non-sequential access patterns, such as simple strides or reverse access.
For full credit, make check
on performance tests must report blue or green
results on all tests (which means “at least as good as stdio”). Full credit
on mastery points will require very high performance on some
non-sequential tests.
All your code (excepting extra credit) should fit in io61.cc
.
Evaluation
We will evaluate you based on your code’s correctness and on its performance. For performance, we compare your code with a version of IO61 that uses stdio, as well as to a high-performance staff solution.
We provide a battery of test programs that use IO61. We test correctness by comparing the output of these programs to the stdio version, and performance by comparing the time it takes the programs to complete with the stdio version. The test programs include:
cat61
reads a file sequentially one byte at a time and writes it to standard output.blockcat61
is likecat61
, but reads the file sequentially one block at a time. Its-b BLOCKSIZE
argument sets the block size.randblockcat61
is likeblockcat61
, but it chooses a random block size for each read and write. It still reads and writes sequentially (only the block sizes are random). The-b
argument is the maximum possible block size. This may very well stress your caching solution.reverse61
reads a file a byte at a time in reverse and writes the result to standard output.stride61
reads its input file using a strided access pattern, writing the data it reads sequentially. By strided, we mean that it begins at offset 0, reads BLOCKSIZE bytes, seeks to position STRIDE, reads another block, seeks to position (2 * STRIDE), etc. When it gets to the end of the file, it jumps back to the first bytes not yet read and repeats the pattern. The-b
and-s
arguments set the block size and stride length, respectively.reordercat61
reads its input file one block at a time in random order and writes each block to the output file at the correct position. The result is a copy of the input file, but created non-sequentially.scattergather61
can take two or more input files and two or more output files. It reads blocks (or lines) from the input files in round-robin order, and writes them to the output files in round-robin order. This will test your ability to handle multiple input and output files.
And there are others. The grading server has some secret extra programs, and it has tests that use our handout programs in different ways.
Although this pset is about performance, correctness matters most. A program that runs quickly but incorrectly is worse than a program that runs slowly but correctly.
-
78 pts are given based on correctness and your performance relative to stdio. This means no errors, no crashes, and performance tests showing all blue or green results.
-
12 points are “mastery” points, which are given based on your performance relative to the staff solution. Full credit on this rubric will require doing much better than stdio on at least one non-sequential test, or somewhat better than stdio on most non-sequential tests.
-
10 points are for style.
Running tests
Run make check
to check your current implementation on a battery of tests
and print summary statistics at the end.
The test battery divides into categories as follows:
-
The sequential correctness tests (tests C1, C2, etc.) stress-test your code for common errors on sequential I/O. Performance on these tests is not important.
-
The non-sequential correctness tests (tests CN1, CN2, etc.) stress-test your code for common errors on non-sequential I/O, such as strided I/O or reverse sequential I/O. Performance on these tests is not important.
-
The medium-file sequential performance tests (tests MP*) perform sequential I/O on 5MiB files.
-
The medium-file non-sequential performance tests (tests MPN*) perform non-sequential I/O (e.g., reverse, random, or stride) on 5MiB files.
-
The large-file sequential performance tests (tests LP*) perform sequential I/O on 64MiB files.
-
The large-file non-sequential performance tests (tests LPN*) perform non-sequential I/O on 64MiB files.
-
The secret tests (tests SECRET*) are only on the grading server.
Run make check-TESTS
(e.g., make check-mp
or make check-mp1-3
)
to run selected tests.
That version of IO61 that uses stdio is provided for you in stdio-io61.cc
.
(You can build a stdio test with make stdio-cat61
and so forth.)
Roadmap
It’s easy in this problem set to design something too complex and get yourself stuck. Avoid that problem by tackling the simple cases first.
Here’s a recommended roadmap. You should commit your code after each phase.
Phase 1
First improve the performance of io61_read
and io61_write
by ensuring that
these library calls actually read and write blocks of data, rather than bytes.
The blockcat61
test program reads and writes data in blocks—but do its
system calls use blocks or bytes? Run strace -o strace.out ./blockcat61 blockcat61.cc
. (That command line tells strace
to write its output to
strace.out
, so examine strace.out
with a command like less strace.out
or
through VS Code. If you're having trouble running strace
, make sure that you're in
Docker and that you've previously run the command make
.)
When you fix this problem, your code should still pass the correctness tests
(make check-c,cn
). Your performance should have improved on block I/O tests
(such as make check-mp1
) relative to the handout code, though it will still
be much slower than stdio. (We go from speed 0.01x stdio—that is, almost
infinitely slower—to 1.41x stdio. Your later changes may reduce this advantage
on test MP1, but in exchange for better performance on other tests.)
Phase 2
Implement a single-slot cache buffer for sequential reads. This will hold bytes [N, N+B) of the file, where B is some largish number (try different numbers). Any read request that lies within that range of bytes can be satisfied without making a system call.
This will improve your performance on some additional byte I/O tests (e.g.,
make check-mp3
). Your code should still pass the sequential correctness
tests (make check-c
). (Test C12, which combines io61_readc
with
io61_read
calls, might be a special challenge—think about why!)
The interaction between cache buffers and the
io61_seek
call, which is used in non-sequential access patterns, is fiddly to get right. You may want to temporarily ignore non-sequential access patterns in Phases 2 and 3. This will mean your code will temporarily produce incorrect results for non-sequential tests (CN*, MPN*, LPN*).
Phase 3
Extend your single-slot cache buffer to also support sequential writes.
This will improve your performance on byte I/O tests. If you do this well, you
should now match or beat stdio’s performance on all sequential I/O tests
(make check-mp,lp
)!
This step may break our sequential test that evaluates short writes and short reads, “short-piped, sequential correctness”. Passing the short-write test requires that you correctly handle restartable errors in your
io61_flush
function.
Phase 4
Fix your code to handle seeks correctly (but not necessarily in a high-performance way).
This should make your code produce correct results for all tests (make check-cn,mpn,lpn
), but it may underperform stdio on some non-sequential
tests.
Phase 5
Change your code to continue handling seeks correctly, but with good
performance for reverse61
.
For ideas, try running strace
on the stdio-reverse61
variant. What does it
look like stdio is doing?
A correct fix for this phase should make your code both roughly equal stdio’s performance and produce correct results on all tests.
Phase 6
Finally, it’s time to outperform stdio on some tests!
Perhaps a bigger cache buffer will outperform stdio; perhaps memory-mapped files will help. If you have worked through the previous phases, you will have ideas of your own.
Make sure you commit regularly as you experiment with different potential optimizations. Real I/O performance is never perfectly consistent. You may find that some optimization ideas don’t work as well as simpler code—that’s part of what we want you to learn! If you commit regularly, you will always be able to go back to a previous working version, and to use commands like
git branch
orgit bisect
to run performance comparisons.
Hints and troubleshooting
Read our file descriptors notes if you are confused about file descriptor system calls.
You will likely need to change most of the functions in io61.cc
.
Stdio is a well-written package! It is OK if you can’t always beat it, especially for sequential I/O.
Write your own tests! This will help you shake out bugs in your code,
particularly correctness bugs. What diabolical things can you think of to try?
You can add new tests pretty easily; just edit GNUmakefile
and check.pl
.
If a program using your library produces incorrect results, it can be hard to
figure out whether the problem is with reads, writes, or both. For instance,
perhaps your read cache is wrong, so the program is reading incorrect data; or
perhaps the write cache is wrong, so the the program is trying to write
correct data, but the data passed through to the kernel via system calls is
wrong. To distinguish these scenarios, you can use strace
to check things
like file positions. You can also use the read61
, write61
, blockread61
,
and blockwrite61
programs. These behave like cat61
and blockcat61
, but
they use stdio for half their tasks. The read
variants use IO61 for reading
and stdio for writing, whereas the write
variants use IO61 for writing and
stdio for reading. If cat61
and write61
produce bad results but read61
on the same workload produces correct results, you can be pretty sure the
problem is with writes.
You may make a couple assumptions:
- You may assume that any file descriptor passed to
io61_fdopen
has its file pointer set to 0 (the beginning of the file). - You don’t need to set the file pointer after every call to
io61_seek
.
CPU performance. Add assertions in your code, since they can help you
check correctness. However, assertions do take time to check, which could
leave your code at a disadvantage relative to stdio. We will check the
performance of your code with assertions disabled by running make NDEBUG=1 check-TEST
.
If your code is slow, consider (1) the system calls it makes and (2) its CPU
performance. For system calls, use strace
to check: is a mistake in your
code accidentally causing expensive system calls, such as many read
calls
that read one byte each? For CPU performance, consider whether there are
functions that can avoid some extra work. It might pay off to have specialized
io61_readc
and io61_writec
calls, for example.
Make targets
Your code must work correctly with sanitizers turned on (though it will
not be very fast). Run make SAN=1 check
to check for errors.
To get an strace log for your code on a specific test, use make STRACE=1 check-TEST
. The strace output will be stored in strace.out
.
To run without running the stdio tests, use make NOSTDIO=1 check
.
To run with one trial rather than multiple trials, use make TRIALS=1 check
.
To compile without optimization (which might help you debug), run make O=0
.
To disable assertions, use make NDEBUG=1 check
.
Extra Credit
-
More tests. Write your own tests that cover access patterns or scenarios that are different from our tests.
-
Associative cache. If you have additional time and feel ambitious, you can implement an associative cache that holds multiple blocks and works for (a) non-mappable files or (b) very large memory-mapped files. Something like this will get the highest performance. It is difficult and rewarding to implement an associative cache. You’ll learn a lot! But design carefully so you don't make your other code more difficult to read and maintain. In fact, you might try working on a separate branch initially.
Turnin
You will turn in your code by pushing your git repository to github.com and updating the grading server with your repository.
Please flag the commit on the grading server that you'd like to be graded.
Don’t forget to fill out README.md
and AUTHORS.md
.
This pset was originally created for CS61.