In this problem set, you’ll gain experience with caching by writing your own buffered I/O library.
- Due Friday 11/5 at 11:59pm for college students (1 day later for DCE).
- Reminder: You may not use code or solutions from previous years.
Get the code
Get our code with
$ git pull; git pull handout main
$ git pull; git pull git://github.com/cs61/cs61-f21-psets.git main
This will merge our Problem Set 4 code with your previous
work. If you have any “conflicts” from prior problem sets, resolve
them before continuing further. Run
git push to save
your work back to your personal repository.
You may also create a new
cs61-psets repository for this assignment. Don’t
forget to enter your repository URL on the grading server.
Our simple IO61 library performs I/O on files.
You will find the code in the pest directory in the file
version of IO61 is pretty stupid—it uses character-at-a-time system call
I/O and is thus quite slow. Your goal is simple: speed it up using
We also distribute several programs that use IO61. (The grading server has some secret extra programs too!)
cat61reads a file sequentially one character at a time and writes it to standard output.
cat61, but reads the file sequentially one block at a time. Its
-b BLOCKSIZEargument sets the block size.
blockcat61, but it chooses a random block size for each read and write. It still reads and writes sequentially (only the block sizes are random). The
-bargument is the maximum possible block size. This may very well stress your caching solution.
reverse61reads a file a character at a time in reverse and writes the result to standard output.
stride61reads its input file using a strided access pattern, writing the data it reads sequentially. By strided, we mean that it begins at offset 0, reads BLOCKSIZE bytes, seeks to position STRIDE, reads another block, seeks to position (2 * STRIDE), etc. When it gets to the end of the file, it jumps back to the first bytes not yet read and repeats the pattern. The
-sarguments set the block size and stride length, respectively.
reordercat61reads its input file one block at a time in random order and writes each block to the output file at the correct position. The result is a copy of the input file, but created non-sequentially.
scattergather61can take two or more input files and two or more output files. It reads blocks (or lines) from the input files in round-robin order, and writes them to the output files in round-robin order. This will test your ability to handle multiple input and output files.
You will introduce caching to the
io61_file abstraction and use your cache
to speed up IO61 operations.
We’re giving you tons of freedom to implement the cache as you
like. You may even use memory-mapped
I/O, prefetching system
posix_fadvise, or multiple threads or processes (although none of
these are required). But you may not use another buffered I/O
library or caching I/O library.
Your library should:
- Return correct results for all access patterns and file types,
including multiple files, very large files, non-seekable files,
files that return short reads or accept short writes, mixtures of
calls (for instance, alternating
io61_readc), and random accesses.
- Perform as well as stdio on sequential reads and writes for any files, including very large files (such as files that don’t fit in memory) and non-seekable files.
- Improve on stdio’s performance for some non-sequential access patterns, such as simple strides or reverse access.
Code will be graded based on performance on the grading server.
All your code should fit in
We will evaluate you based on your code’s performance relative to a
version of IO61 that uses stdio. That version is provided for you for
stdio-io61.cc (and the makefile builds
stdio-blockcat61, and so forth).
make check to check your current implementation on a battery of tests
and print summary statistics at the end. Run
make check-TESTNUMBERS (e.g.,
make check-9 or
make check-5-10 or
make check-1,2,3) to run selected
tests. For more, see Make targets.
We may update the tests during the pset release period. We may also run additional tests during grading. In all instances, correctness matters most. A program that runs quickly but incorrectly is worse than a program that runs slowly but correctly.
It’s easy in this problem set to design something too complex and get yourself stuck. Avoid that problem by tackling the simple cases first. Here’s a possible roadmap.
You can improve the way
io61_write work without
introducing a cache. Use
strace to investigate the operation of
blockcat61. Look at the
blockcat61 code; what system calls do you
think it should make? Then run
strace -o strace.out ./blockcat61 blockcat61.cc. (That command line tells
strace to write its output
strace.out, so examine
strace.out with a command like
less strace.out.) What system calls does
blockcat61 actually make?
This will improve your performance on the block I/O tests (e.g.,
Implement a single-slot cache buffer for reading sequential
files. This will hold bytes [N, N+B) of the file, where B is some
largish number (try different numbers). Any read request that lies
within that range of bytes can be satisfied without making a system
call. Check your work by examining
strace output. Don’t worry about
seeks yet. (This means your code will stop working on the “reverse
order,” “stride order,” and “random seek order” tests.)
This will improve your performance on character I/O tests.
Step 3. Implement a single-slot cache buffer for writing sequential files. This is like the cache buffer for reading, but it absorbs writes instead of prefetching reads.
This will improve your performance on character I/O tests. If you do this well, you should now match or beat stdio’s performance on all sequential I/O tests!
Step 4. Fix your code to handle seeks correctly (but not necessarily in a high-performance way).
This should make your code produce correct results for all tests.
Change your code to continue handling seeks correctly, but with good
reverse61. For ideas, try running
strace on the
stdio-reverse61 variant. What does it look like stdio is doing?
This should make your code roughly equal stdio’s performance on all tests.
If you feel ambitious, you might try changing your code to use memory-mapped I/O when this is possible. Memory-mapped I/O is not always possible, though, so your old block cache should be used for pipe files and other non-mappable files. This should make your code beat stdio on some non-sequential I/O tests!
If you have additional time and feel even more ambitious, you can implement an associative cache that holds multiple blocks and works for (a) non-mappable files or (b) very large memory-mapped files. Something like this will get the highest performance.
It is difficult and rewarding to implement an associative cache. You’ll learn a lot! But design carefully so you don't make your other code more difficult to read and maintain. In fact, you might try working on a separate branch initially.
- Read our file descriptors notes.
- Some of our tests, particularly on shorter files, are noisy, and can produce a range of timing results. Run them multiple times before becoming concerned with a result.
- You will likely need to change most of the functions in
- Stdio is a well-written package! It is OK if you can’t always beat it, especially for sequential I/O.
- Leave useful asserts in your code. They can help you achieve
correctness. When you finally want to check your code’s performance
against ours, then you can comment or
- Write your own tests! This will help you shake out bugs in your
code, particularly correctness bugs. What diabolical things can you
think of to try? You can add new tests pretty easily; just edit
- You can run the tests on their own, too.
make checkcreates some useful input files for testing and puts them in
You are allowed to make a couple assumptions.
- You may assume that any file descriptor passed to
io61_fdopenhas its file pointer set to 0 (the beginning of the file).
- You don’t need to set the file pointer after every call to
Your code must work correctly with sanitizers turned on (though it will
not be very fast). Run
make SAN=1 check to check for errors.
To get an strace log for your code on a specific test, use
make STRACE=1 check-TESTNUM. The strace output will be stored in
To run without running the stdio tests, use
make NOSTDIO=1 check.
To run with one trial rather than multiple trials, use
make TRIALS=1 check.
To compile without optimization (which might help you debug), run
Final extra credit step: Deadlock
We also release a program,
pipeexchange61, that demonstrates a problem
with conventional blocking I/O. The
pipeexchange61 program forks two
copies of itself. One copy, the requester, sends requests to the other
copy, the responder. The requester can send many requests back-to-back
and only then wait for responses (a form of batching). When you run
pipeexchange61, the program appears to get stuck! Why? Can you
construct an I/O library that unsticks
You will turn in your code by pushing your git repository to github.com and updating the grading server with your repository.
Don’t forget to fill out
This pset was originally created for CS61.