# 2016/Storage3X

Computer Science 61 and E61
Systems Programming and Machine Organization
Fall 2016

# Storage 3 Exercise: Single-Slot Cache

Today’s exercise is a coding exercise that you should complete using pen and paper. This long exercise is great preparation for the kind of thinking and arithmetic skills you’ll need to complete Assignment 3. We expect you to get through Part 1 and some of Part 2 in class, but complete all parts on your own as pset and midterm preparation.

The goal: Write io61_read for a single-slot cache, with the following properties:

• It works for any `sz`.
• The code calls `memcpy` in exactly one place.
• The code calls `read` in exactly one place.

Recall:

```// io61_read(f, buf, sz)
//    Read up to `sz` characters from `f` into `buf`. Returns the number of
//    characters read on success; normally this is `sz`. Returns a short
//    count, which might be zero, if the file ended before `sz` characters
//    could be read. Returns -1 if an error occurred before any characters
ssize_t io61_read(io61_file* f, char* buf, size_t sz);
```

## Part 1: Sequential I/O skeleton

In Part 1, your cache only needs to work for sequential I/O.

Here’s a skeleton for `io61_file`:

```struct io61_file {
int fd;
unsigned char cbuf[BUFSIZ];
size_t cpos;   // position of next character to read in cbuf
size_t csz;    // number of characters in cbuf that are valid (that contain data)
};
```

Answer all the following questions in order, then check your work before continuing.

Q1. How should we initialize `io61_file`’s `cpos` and `csz` members?

Q2. What invariant relates `cpos` and `csz`? (Recall that an invariant is a property that is always true at a given point in the code, unless there is a bug. We express invariants with assertions.)

Q3. When is the single-slot cache in `io61_file` invalid or empty, meaning that its contents cannot be useful for the next sequential read?

Q4. We don’t need an additional member, `int cvalid`, that says whether the cache is valid. Why not?

Say we have an `io61_file* f` containing a very small cache: `BUFSIZ` == 8. The file being read contains the alphabet in order.

```f->cbuf = [ a b c d e f g h ]
f->cpos = 0
f->csz = 8
```

Q5. What happens if we read one character with `io61_read(f, buf, 1)`? What character(s) are read into `buf`? How does `f` change? What is the return value?

Q6. What happens if we then call `io61_read(f, buf, 3)`?

Q7. What happens if we then call `io61_read(f, buf, 6)`?

Now, you might imagine that you could write `io61_read` with (A) a `memcpy` in case the cache is useful, then (B) a read to refill the cache, and then (C) another `memcpy` to copy remaining data. But this isn't good enough for two reasons. First, that involves two places in the code where `memcpy` is called, and we only want one. And second, we might need to repeat (B) and (C)! A loop is called for.

Here’s the skeleton of an io61_read implementation.

```ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
size_t pos = 0;   // number of characters read so far
while (pos != sz) {
// ??? contains one memcpy call and one read call
}
return pos;
}
```

Can you complete the function?

Write the code out with pencil and paper. Then, answer the following questions about your code. Or answer the questions first; they may guide you to a better solution.

Q8. Say the cache contains valid data. Which function should be called first, `memcpy` or `read`?

Q9. Write an if statement in the loop that tests whether the cache is valid. You’ll fill in the “then” (cache-is-valid) and “else” (cache-is-invalid/empty) branches later.

Q10. Write a C expression for a pointer to the next character into which data should be copied.

Q11. Write a C expression for a pointer to the next cached character from which data should be copied (assuming the cache is valid).

Q12. Inside the while loop, how many characters can be copied out of the cache?

Q13. Complete the code for the if branch when the cache is valid.

Q14. Describe what should happen if the cache is invalid/empty.

Q15. The `read` system call can return -1, 0, or a positive number. What do these three cases represent, and how should `io61_read` respond?

Q16. Complete the code for the else branch (cache invalid/empty). This will complete your `io61_read` for sequential I/O.

## Part 3. Seeks

To make this code correct for seeks, we need at least one more member in `struct io61_file`—the tag, which represents the file offset of the first character in the cache.

```struct io61_file {
int fd;
unsigned char cbuf[BUFSIZ];
size_t cpos;   // position of next character to read in cbuf
size_t csz;    // number of characters in cbuf that are valid (that contain data)
off_t tag;     // file offset of first character in cache
};
```

Here `off_t` is the type that represents file positions (“offsets”).

Q17. Write an assertion that relates `f->csz`, `f->cpos`, and/or `f->tag` to the file pointer’s position in `f->fd`. Assume the variable `fd_file_pointer` holds `f->fd`’s file pointer position.

But it’s better in many way to change all the offset-type members to be file offsets, which is in some ways simpler.

```struct io61_file {
int fd;
unsigned char cbuf[BUFSIZ];
off_t tag;      // file offset of first character in cache (same as before)
off_t end_tag;  // file offset one past last valid char in cache; end_tag - tag == old `csz`
off_t pos_tag;  // file offset of next char to read in cache; pos_tag - tag == old `cpos`
};
```

Q18. Write assertions that relate these member variables with each other and with the file pointer’s position.

Q19. Change your code for `io61_read` to update `tag` appropriately. Use the `cpos/csz` representation. You should need to add just one line of code.

Q20. Write code for `io61_read` that uses the `pos_tag/end_tag` representation.

Q21. Consider a call to `io61_seek(io61_file* f, off_t off)` (specification below). Assume that the seek offset, `off`, is contained within the cache (i.e., the cache contains a valid character for file position `off`). What members in `f` need to change in the `cpos/csz` representation?

```// io61_seek(f, pos)
//    Change the file pointer for file `f` to `pos` bytes into the file.
//    Returns 0 on success and -1 on failure.
```

Q22. How about the `pos_tag/end_tag` representation?

Q23. Implement `io61_seek`. If the new seek offset is contained within the cache, your code should preserve the cached data (and it need not call the `lseek` system call). Otherwise, your code should call `lseek` and mark the cache as invalid/empty. This implementation should work correctly for seeks without changing `io61_read` (beyond the change you made in Q19). Use the `cpos/csz` representation.

Q24. Now redo that for the `pos_tag/end_tag` representation.

Q25. This code will not speed up reverse sequential I/O. Why not?

Q26. Change your `io61_read` and `io61_seek` to work correctly and speed up reverse sequential I/O. Use the `pos_tag/end_tag` representation. You may need to change one of the invariants you wrote in Q18; if so, give the new invariant assertion. Hint: Ensure that `f->tag` is always aligned on a multiple of `BUFSIZ`. This will take some effort!