Exercise IO
Can you write io61_read for a single-slot cache with the following properties?
- It works for any
sz
. - The code calls
memcpy
in exactly one place. - The code calls
read
in exactly one place. - It only needs to work for sequential I/O.
Hint:
struct io61_file {
int fd;
unsigned char cbuf[BUFSIZ];
size_t cpos; // position of next character to read in cbuf
size_t csz; // number of characters in cbuf that are valid (that contain data)
};
And recall:
// io61_read(f, buf, sz)
` // Read up to `sz` characters from `f` into `buf`. Returns the number of `
` // characters read on success; normally this is `sz`. Returns a short `
` // count if the file ended before `sz` characters could be read. Returns `
// -1 an error occurred before any characters were read.
ssize_t io61_read(io61_file* f, char* buf, size_t sz);
Q1. How should we initialize io61_file
’s cpos
and csz
members?
Q2. What invariant relates cpos
and csz
? (Recall that
an invariant is a property that is always true at a given point in the
code, unless there is a bug. We express invariants with assertions.)
Q3. When is the single-slot cache in io61_file
invalid or empty,
meaning that its contents cannot be useful for the next sequential read?
Q4. Do we need an additional member, int cvalid
, that says whether
the cache is valid?
Say we have an io61_file* f
containing a very small cache: BUFSIZ
==
8. The file being read contains the alphabet in order.
f->cbuf = [ a b c d e f g h ]
f->cpos = 0
f->csz = 8
Q5. What happens if we read one character with
io61_read(f, buf, 1)
? What character(s) are read into buf
? How does
f
change? What is the return value?
Q6. What happens if we then call io61_read(f, buf, 3)
?
Q7. What happens if we then call io61_read(f, buf, 6)
?
Sequential read
Given how that works, you might imagine that you could write io61_read
with (A) a memcpy
in case the cache is useful, then (B) a read to
refill the cache, and then (C) another memcpy
to copy remaining data.
But this isn't good enough for two reasons. First, that involves two
places in the code where memcpy
is called, and we only want one. And
second, we might need to repeat (B) and (C)! What should you do? Clearly
a loop is called for!
Here's the skeleton of an io61_read implementation. Can you fill it in?
ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
size_t pos = 0; // number of characters read so far
while (pos != sz) {
// ??? contains one memcpy call and one read call
}
return pos;
}
We recommend you try to work this through on your own, but here are some questions and hints to guide you.
Q8. Say the cache contains valid data. Which function should be
called first, memcpy
or read
?
Q9. Write an if statement in the loop that tests whether the cache is valid.
Q10. Write a C expression for a pointer to the next character into which data should be copied.
Q11. Write a C expression for a pointer to the next cached character from which data should be copied (assuming the cache is valid).
Q12. Inside the while loop, how many characters can be copied out of the cache?
Q13. Complete the code for the if branch when the cache is valid.
Q14. Say the cache is invalid/empty. What should we do next?
Q15. The read
system call can return -1, 0, or a positive number.
What do these three cases represent, and how should io61_read
respond?
Q16. Complete the code for the if branch when the cache is invalid.
This will complete your io61_read
for sequential I/O.
Seeks
To make this code correct for seeks, we need at least one more member in
struct io61_file
—the tag, which represents the file offset of the
first character in the cache.
struct io61_file {
int fd;
unsigned char cbuf[BUFSIZ];
size_t cpos; // position of next character to read in cbuf
size_t csz; // number of characters in cbuf that are valid (that contain data)
off_t tag; // file offset of first character in cache
};
Here off_t
is the type that represents file positions (“offsets”).
Q17. Write an assertion that relates f->csz
, f->cpos
, and/or
f->tag
to the file pointer’s position in f->fd
.
Or you can change all the offset-type members to be file offsets, which is in some ways simpler.
struct io61_file {
int fd;
unsigned char cbuf[BUFSIZ];
off_t tag;
` off_t end_tag; // file offset of last valid char in cache; end_tag - tag == old `csz` `
` off_t pos_tag; // file offset of next char to read in cache; pos_tag - tag == old `cpos` `
};
Q18. Write assertions that relate these member variables with each other and with the file pointer’s position.
Q19. Change your code for io61_read
to update tag
appropriately.
Use the cpos/csz
representation. You should need to add just one line
of code.
Q20. Write code for io61_read
that uses the pos_tag/end_tag
representation.
Q21. Consider a call to io61_seek(io61_file* f, off_t off)
. Assume
that the seek offset, off
, is contained within the cache (i.e., the
cache contains a valid character for file position off
). What members
in f
need to change in the cpos/csz
representation?
Q22. How about the pos_tag/end_tag
representation?
Q23. Implement io61_seek
. Your code should preserve the cache if
the new seek offset is contained within it. Otherwise, your code should
mark the cache as invalid/empty. This implementation should work
correctly for seeks without changing io61_read
(beyond the change
you made in Q19). Use the cpos/csz
representation.
Q24. Now redo that for the pos_tag/end_tag
representation.
Q25. Does this code speed up reverse sequential I/O?
Q26. Change your io61_read
and io61_seek
to work correctly and
speed up reverse sequential I/O. Use the pos_tag/end_tag
representation. You may need to change one of the invariants you wrote
in Q18; if so, give the new invariant assertion.