Problem set 4: Stdio

In this problem set, you’ll gain experience with caching by writing your own buffered I/O library.

Get the code

Our code is released in the cs61-psets repository. To update it, run

git remote show handout

If this reports an error, run

git remote add handout git://github.com/cs61/cs61-f19-psets.git

Then run

git pull handout master

This will merge our handout code with your previous work. If you have any “conflicts” from earlier problem sets, resolve them before continuing further. Run git push to save this merge back to your private repository.

Use an explicit merge. If you copy code by hand, our automated scripts will have trouble analyzing your code, and it’ll be harder for you to incorporate our updates.

You may also create a new cs61-psets repository for this problem set. However, you may need our help to do so. Write cs61-staff@g.harvard.edu and tell us (1) your GitHub username and (2) the name of the repo you want to create. Once the repo is ready, make sure you update the grading server.

Goal

Our simple IO61 library performs I/O on files. You will find the code in the pest directory in the file io61.cc. Our version of IO61 is pretty stupid—it uses character-at-a-time system call I/O and is thus quite slow. Your goal is simple: speed it up using caching.

We also distribute several programs that use IO61. (The grading server has some secret extra programs too!)

You will introduce caching to the io61_file abstraction and use your cache to speed up IO61 operations. We’re giving you tons of freedom to implement the cache as you like. You may even use memory-mapped I/O, prefetching system calls like madvise or posix_fadvise, or multiple threads or processes (although none of these are required). But you may not use another buffered I/O library or caching I/O library.

Your library should:

All your code should fit in io61.cc.

Evaluation

We will evaluate you based on your code’s performance relative to a version of IO61 that uses stdio. That version is provided for you for testing in stdio-io61.cc (and the makefile builds stdio-cat61, stdio-blockcat61, and so forth).

Run make check to check your current implementation on a battery of tests and print summary statistics at the end. Run make check-TESTNUMBERS (e.g., make check-9 or make check-5-10 or make check-1,2,3) to run selected tests. For more, see Make targets.

We may update the tests during the pset release period. We may also run additional tests during grading. In all instances, correctness is primary. A program that runs quickly but incorrectly is worse than a program that runs slowly but correctly.

Roadmap

It’s easy in this problem set to design something too complex and get yourself stuck. Avoid that problem by tackling the simple cases first. Here’s a possible roadmap.

Step 1. You can improve the way io61_read and io61_write work without introducing a cache. Use strace to investigate the operation of blockcat61. Look at the blockcat61 code; what system calls do you think it should make? Then run strace -o strace.out ./blockcat61 blockcat61.cc. (That command line tells strace to write its output to strace.out, so examine strace.out with a command like less strace.out.) What system calls does blockcat61 actually make?

This will improve your performance on the block I/O tests (e.g., make check-4-6).

Step 2. Implement a single-slot cache buffer for reading sequential files. This will hold bytes [N, N+B) of the file, where B is some largish number (try different numbers). Any read request that lies within that range of bytes can be satisfied without making a system call. Check your work by examining strace output. Don’t worry about seeks yet. (This means your code will stop working on the “reverse order,” “stride order,” and “random seek order” tests.)

This will improve your performance on character I/O tests.

Step 3. Implement a single-slot cache buffer for writing sequential files. This is like the cache buffer for reading, but it absorbs writes instead of prefetching reads.

This will improve your performance on character I/O tests. If you do this well, you should now match or beat stdio’s performance on all sequential I/O tests!

Step 4. Fix your code to handle seeks correctly (but not necessarily in a high-performance way).

This should make your code produce correct results for all tests.

Step 5. Change your code to continue handling seeks correctly, but with good performance for reverse61. For ideas, try running strace on the stdio-reverse61 variant. What does it look like stdio is doing?

This should make your code roughly equal stdio’s performance on all tests.

Extra Credit 1. If you feel ambitious, you might try changing your code to use memory-mapped I/O when this is possible. Note: Memory-mapped I/O is not always possible, though, so keep your old block cache around for pipe files and other non-mappable files.

This should make your code beat stdio on some non-sequential I/O tests!

Extra Credit 2. If you have additional time and feel even more ambitious, implement an associative cache that holds multiple blocks and works for (a) non-mappable files or (b) very large memory-mapped files.

Something like this will get the highest performance. It is difficult and rewarding to implement an associative cache. You’ll learn a lot! But design carefully so you don't make your other code more difficult to read and maintain. In fact, you might try working on a separate branch initially.

Hints

You are allowed to make a couple assumptions.

Make targets

Your code must work correctly with sanitizers turned on (though it will not be very fast). Run make SAN=1 check to check for errors.

To get an strace log for your code on a specific test, use make STRACE=1 check-TESTNUM. The strace output will be stored in strace.out.

To run without running the stdio tests, use make NOSTDIO=1 check.

To run with one trial rather than multiple trials, use make TRIALS=1 check.

To compile without optimization (which might help you debug), run make O=0.

Final extra credit step: Deadlock

We also release a program, pipeexchange61, that demonstrates a problem with conventional blocking I/O. The pipeexchange61 program forks two copies of itself. One copy, the requester, sends requests to the other copy, the responder. The requester can send many requests back-to-back and only then wait for responses (a form of batching). When you run pipeexchange61, the program appears to get stuck! Why? Can you construct an I/O library that unsticks pipeexchange61, without modifying pipeexchange61.cc?

Turnin

You will turn in your code by pushing your git repository to github.com and updating the grading server with your repository.

Don’t forget to fill out README.md and AUTHORS.md.


This pset was originally created for CS61.