Exercises: Storage – CS 61 2018

Many exercises that seem less appropriate this year, or which cover topics that we haven’t covered in class, are marked with ⚠️. However, we may have missed some.

IO-1. I/O caching

Mary Ruefle, a poet who lives in Vermont, is working on her caching I/O library for CS 61. She wants to implement a cache with N slots. Since searching those slots might slow down her library, she writes a function that maps addresses to slots. Here’s some of her code.

#define SLOTSIZ 4096
struct io61_slot {
    char buf[SLOTSIZ];
    off_t pos; // = (off_t) -1 for empty slots
    ssize_t sz;
};

#define NSLOTS 64
struct io61_file {
    int fd;
    off_t pos; // current file position
    io61_slot slots[NSLOTS];
};

static inline int find_slot(off_t off) {
    return off % NSLOTS;
}

int io61_readc(io61_file* f) {
    int slotindex = find_slot(f->pos);
    io61_slot* s = &f->slots[slotindex];

    if (f->pos < s->pos || f->pos >= s->pos + s->sz) {
        // slot contains wrong data, need to refill it
        off_t new_pos = lseek(f->fd, f->pos, SEEK_SET);
        assert(new_pos != (off_t) -1); // only handle seekable files for now
        ssize_t r = read(f->fd, s->buf, SLOTSIZ);
        if (r == -1 || r == 0) {
            return EOF;
        }
        s->pos = f->pos;
        s->sz = r;
    }

    int ch = (unsigned char) s->buf[f->pos - s->pos];
    ++f->pos;
    return ch;
}

Before she can run and debug this code, Mary is led “to an emergency of feeling that … results in a poem.” She’ll return to CS61 and fix her implementation soon, but in the meantime, let’s answer some questions about it.

QUESTION IO-1A. True or false: Mary’s cache is a direct-mapped cache.

QUESTION IO-1B. What changes to Mary’s code could change your answer to Part A? Circle all that apply.

New code for find_slot (keeping io61_readc the same)
New code in io61_readc (keeping find_slot the same)
New code in io61_readc and new code for find_slot
None of the above

QUESTION IO-1C. Which problems would occur when Mary’s code was used to sequentially read a seekable file of size 2MiB (2×2²⁰ = 2097152 bytes) one character at a time? Circle all that apply.

Excessive CPU usage (>10x stdio)
Many system calls to read data (>10x stdio)
Incorrect data (byte x read at a position where the file has byte y≠x)
Read too much data (more bytes read than file contains)
Read too little data (fewer bytes read than file contains)
Crash/undefined behavior
None of the above

QUESTION IO-1D. Which of these new implementations for find_slot would fix at least one of these problems with reading sequential files? Circle all that apply.

return (off * 2654435761) % NSLOTS; /* integer hash function from Stack Overflow */
return (off / SLOTSIZ) % NSLOTS;
return off & (NSLOTS - 1);
return 0;
return (off >> 12) & 0x3F;
None of the above

IO-2. Caches and reference strings

QUESTION IO-2A. True or false: A direct-mapped cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.

QUESTION IO-2B. True or false: A fully-associative cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.

Consider the following 5 reference strings.

Name	String
α	1
β	1, 2
γ	1, 2, 3, 4, 5
δ	2, 4
ε	5, 2, 4, 2

QUESTION IO-2C. Which of the strings might indicate a sequential access pattern? Circle all that apply.

α	β	γ	δ	ε	None of these

QUESTION IO-2D. Which of the strings might indicate a strided access pattern with stride >1? Circle all that apply.

α	β	γ	δ	ε	None of these

The remaining questions concern concatenated permutations of these five strings. For example, the permutation αβγδε refers to this reference string:

1, 1, 2, 1, 2, 3, 4, 5, 2, 4, 5, 2, 4, 2.

We pass such permutations through an initially-empty, fully-associative cache with 3 slots, and observe the numbers of hits.

QUESTION IO-2E. How many cold misses might a permutation observe? Circle all that apply.

0	1	2	3	4	5	Some other number

Under LRU eviction, the permutation αβεγδ observes 5 hits as follows. (We annotate each access with “h” for hit or “m” for miss.)

1m; 1h, 2m; 5m, 2h, 4m, 2h; 1m, 2h, 3m, 4m, 5m; 2m, 4h.
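
An annotated trace like this one can be reproduced mechanically. The following is a minimal sketch of an LRU simulator for a fully-associative cache; the name `lru_hits` and its interface are illustrative assumptions, not part of the course code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Count hits for the reference string `refs` in an initially-empty,
// fully-associative cache with `nslots` slots under LRU eviction.
// (Hypothetical helper; assumes nslots > 0.)
inline std::size_t lru_hits(const std::vector<int>& refs, std::size_t nslots) {
    assert(nslots > 0);
    std::vector<int> cache;   // front = least recently used, back = most recently used
    std::size_t hits = 0;
    for (int addr : refs) {
        auto it = std::find(cache.begin(), cache.end(), addr);
        if (it != cache.end()) {
            ++hits;
            cache.erase(it);              // re-appended below as most recent
        } else if (cache.size() == nslots) {
            cache.erase(cache.begin());   // evict the least recently used address
        }
        cache.push_back(addr);
    }
    return hits;
}
```

For example, `lru_hits({1, 1, 2, 5, 2, 4, 2, 1, 2, 3, 4, 5, 2, 4}, 3)` returns 5, matching the trace for αβεγδ.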

QUESTION IO-2F. How many hits does this permutation observe under FIFO eviction?

QUESTION IO-2G. Give a permutation that will observe 8 hits under LRU eviction, which is the maximum for any permutation. There are several possible answers. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 7 hits, etc.)

QUESTION IO-2H. Give a permutation that will observe 2 hits under LRU eviction, which is the minimum for any permutation. There is one unique answer. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 3 hits, etc.)

IO-3. Processor cache

The git version control system is based on commit hashes, which are 160-bit (20-byte) hash values used to identify commits. In this problem you’ll consider the processor cache behavior of several versions of a “grading server” that maps commits to grades. Here’s the first version:

struct commit_info {
    char hash[20];
    int grade[11];
};

commit_info* commits;
size_t N;

int get_grade1(const char* hash, int pset) {
    for (size_t i = 0; i != N; ++i) {
        if (memcmp(commits[i].hash, hash, 20) == 0) {
            return commits[i].grade[pset];
        }
    }
    return -1;
}

We will ask questions about the average number of cache lines accessed by variants of get_grade(hash, pset). You should make the following assumptions:

The hash argument is uniformly drawn from the set of known commits. That is, the probability that hash equals the ith commit’s hash is 1/N.
Only count cache lines accessible via commits. Don’t worry about cache lines used for local variables, for parameters, for global variables, or for instructions. For instance, do not count the hash argument or the global-data cache line that stores the commits variable itself.
The commits pointer is 64-byte aligned and cache lines are 64 bytes long.
Commit hashes are mathematically indistinguishable from random numbers. Thus, the probability that two different hashes have the same initial k bits equals 1/2^k.
We’ll ignore small errors; N/2 and (N+1)/2 will be considered equivalent.
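
Under these assumptions, counting cache lines comes down to asking which 64-byte lines a byte range overlaps. The sketch below shows one way to do that arithmetic; `lines_touched` and `CACHE_LINE` are hypothetical names, and the `static_assert` assumes a typical x86-64 ABI with a 4-byte `int`.

```cpp
#include <cstddef>

constexpr std::size_t CACHE_LINE = 64;   // line size stated in the assumptions

struct commit_info {
    char hash[20];
    int grade[11];
};

// Number of distinct cache lines overlapped by the byte range
// [offset, offset + size), with offset measured from a 64-byte-aligned base.
constexpr std::size_t lines_touched(std::size_t offset, std::size_t size) {
    return size == 0
        ? 0
        : (offset + size - 1) / CACHE_LINE - offset / CACHE_LINE + 1;
}

// Assumption: 4-byte int and no padding, so 20 + 11 * 4 = 64 bytes.
static_assert(sizeof(commit_info) == 64, "assumes 4-byte int and no padding");
```

For instance, `lines_touched(60, 8)` is 2 because that range straddles a line boundary, while `lines_touched(0, 64)` is 1.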

QUESTION IO-3A. What is the expected number of cache lines accessed by get_grade1, in terms of N?

The second version:

struct commit_info {
    char hash[20];
    int grade[11];
};

commit_info** commits;
size_t N;

int get_grade2(const char hash[20], int pset) {
    for (size_t i = 0; i != N; ++i) {
        if (memcmp(commits[i]->hash, hash, 20) == 0) {
            return commits[i]->grade[pset];
        }
    }
    return -1;
}

QUESTION IO-3B. What is the expected number of cache lines accessed by get_grade2, in terms of N?

The third version:

struct commit_info {
    char hash[20];
    int grade[11];
};

struct commit_hint {
    char hint[8];
    commit_info* commit;
};

commit_hint* commits;
size_t N;

int get_grade3(const char* hash, int pset) {
    for (size_t i = 0; i != N; ++i) {
        if (memcmp(commits[i].hint, hash, 8) == 0
            && memcmp(commits[i].commit->hash, hash, 20) == 0) {
            return commits[i].commit->grade[pset];
        }
    }
    return -1;
}

QUESTION IO-3C. What is the expected number of cache lines accessed by get_grade3, in terms of N? (You may assume that N≤2000.)

The fourth version is a hash table.

struct commit_info {
    char hash[20];
    int grade[11];
};

commit_info** commits;
size_t commits_hashsize;

int get_grade4(const char* hash, int pset) {
    // choose initial bucket
    size_t bucket;
    memcpy(&bucket, hash, sizeof(bucket));
    bucket = bucket % commits_hashsize;
    // search for the commit starting from that bucket
    while (commits[bucket] != nullptr) {
        if (memcmp(commits[bucket]->hash, hash, 20) == 0) {
            return commits[bucket]->grade[pset];
        }
        bucket = (bucket + 1) % commits_hashsize;
    }
    return -1;
}

QUESTION IO-3D. Assume that a call to get_grade4 encounters B - 1 expected hash collisions (i.e., examines B buckets total, including the bucket that actually contains hash). What is the expected number of cache lines accessed by get_grade4, in terms of N and B?

IO-4. IO caching and strace

Elif Batuman is investigating several program executables left behind by her ex-roommate Fyodor. She runs each executable under strace in the following way:

strace -o strace.txt ./EXECUTABLE files/text1meg.txt > files/out.txt

Help her figure out properties of these programs based on their system call traces.

QUESTION IO-4A. Program ./mysterya:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x8193000
brk(0x81b5000)                          = 0x81b5000
read(3, "A", 1)                         = 1
write(1, "A", 1)                        = 1
read(3, "\n", 1)                        = 1
write(1, "\n", 1)                       = 1
read(3, "A", 1)                         = 1
write(1, "A", 1)                        = 1
read(3, "'", 1)                         = 1
write(1, "'", 1)                        = 1
read(3, "s", 1)                         = 1
write(1, "s", 1)                        = 1
...

Circle at least one option in each column.

Sequential IO
Reverse sequential IO
Strided IO

No read cache
Unaligned read cache
Aligned read cache

No write cache
Write cache

Cache size 4096
Cache size 2048
Cache size 1024
Other

QUESTION IO-4B. Program ./mysteryb:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x96c5000
brk(0x96e6000)                          = 0x96e6000
read(3, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 2048) = 2048
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 2048) = 2048
read(3, "kad\nAkron\nAkron's\nAl\nAl's\nAla\nAl"..., 2048) = 2048
write(1, "kad\nAkron\nAkron's\nAl\nAl's\nAla\nAl"..., 2048) = 2048
...

Circle at least one option in each column.

Sequential IO
Reverse sequential IO
Strided IO

No read cache
Unaligned read cache
Aligned read cache

No write cache
Write cache

Cache size 4096
Cache size 2048
Cache size 1024
Other

QUESTION IO-4C. Program ./mysteryc:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9064000
brk(0x9085000)                          = 0x9085000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1046528, SEEK_SET)             = 1046528
read(3, "ingau\nRheingau's\nRhenish\nRhianno"..., 2048) = 2048
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1044480, SEEK_SET)             = 1044480
read(3, "Quinton\nQuinton's\nQuirinal\nQuisl"..., 2048) = 2048
write(1, "ehR\neehR\naehR\ns'hR\nhR\nsdlonyeR\ns"..., 2048) = 2048
lseek(3, 1042432, SEEK_SET)             = 1042432
read(3, "emyslid's\nPrensa\nPrensa's\nPrenti"..., 2048) = 2048
write(1, "\ns'nailitniuQ\nnailitniuQ\nnniuQ\ns"..., 2048) = 2048
lseek(3, 1040384, SEEK_SET)             = 1040384
read(3, "Pindar's\nPinkerton\nPinocchio\nPin"..., 2048) = 2048
write(1, "rP\ndilsymerP\ns'regnimerP\nregnime"..., 2048) = 2048
...

Circle at least one option in each column.

Sequential IO
Reverse sequential IO
Strided IO

No read cache
Unaligned read cache
Aligned read cache

No write cache
Write cache

Cache size 4096
Cache size 2048
Cache size 1024
Other

QUESTION IO-4D. Program ./mysteryd:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9a0e000
brk(0x9a2f000)                          = 0x9a2f000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1048575, SEEK_SET)             = 1048575
read(3, "o", 2048)                      = 1
lseek(3, 1048574, SEEK_SET)             = 1048574
read(3, "Ro", 2048)                     = 2
lseek(3, 1048573, SEEK_SET)             = 1048573
read(3, "\nRo", 2048)                   = 3
...
lseek(3, 1046528, SEEK_SET)             = 1046528
read(3, "ingau\nRheingau's\nRhenish\nRhianno"..., 2048) = 2048
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1046527, SEEK_SET)             = 1046527
read(3, "eingau\nRheingau's\nRhenish\nRhiann"..., 2048) = 2048
lseek(3, 1046526, SEEK_SET)             = 1046526
read(3, "heingau\nRheingau's\nRhenish\nRhian"..., 2048) = 2048
...

Circle at least one option in each column.

Sequential IO
Reverse sequential IO
Strided IO

No read cache
Unaligned read cache
Aligned read cache

No write cache
Write cache

Cache size 4096
Cache size 2048
Cache size 1024
Other

QUESTION IO-4E. Program ./mysterye:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x93e5000
brk(0x9407000)                          = 0x9407000
read(3, "A", 1)                         = 1
read(3, "\n", 1)                        = 1
read(3, "A", 1)                         = 1
...
read(3, "A", 1)                         = 1
read(3, "l", 1)                         = 1
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 1024) = 1024
read(3, "t", 1)                         = 1
read(3, "o", 1)                         = 1
read(3, "n", 1)                         = 1
...

Circle at least one option in each column.

Sequential IO
Reverse sequential IO
Strided IO

No read cache
Unaligned read cache
Aligned read cache

No write cache
Write cache

Cache size 4096
Cache size 2048
Cache size 1024
Other

QUESTION IO-4F. Program ./mysteryf:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9281000
brk(0x92a3000)                          = 0x92a3000
read(3, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 4096) = 4096
write(1, "A", 1)                        = 1
write(1, "\n", 1)                       = 1
write(1, "A", 1)                        = 1
...
write(1, "A", 1)                        = 1
write(1, "l", 1)                        = 1
read(3, "ton's\nAludra\nAludra's\nAlva\nAlvar"..., 4096) = 4096
write(1, "t", 1)                        = 1
write(1, "o", 1)                        = 1
write(1, "n", 1)                        = 1
...

Circle at least one option in each column.

Sequential IO
Reverse sequential IO
Strided IO

No read cache
Unaligned read cache
Aligned read cache

No write cache
Write cache

Cache size 4096
Cache size 2048
Cache size 1024
Other

IO-5. Processor cache

The following questions use this C definition for an N×M matrix (N rows and M columns).

struct matrix {
    unsigned N;
    unsigned M;
    double elt[0];
};

matrix* matrix_create(unsigned N, unsigned M) {
    matrix* m = (matrix*) malloc(sizeof(matrix) + N * M * sizeof(double));
    m->N = N;
    m->M = M;
    for (size_t i = 0; i < N * M; ++i) {
        m->elt[i] = 0.0;
    }
    return m;
}

Typically, matrix data is stored in row-major order: element m_ij (at row i and column j) is stored in m->elt[i*m->M + j]. We might write this in C using an inline function:

inline double* melt1(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i * m->M + j];
}
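
To make the row-major arithmetic concrete, here is a small sketch that computes the same offset as melt1 without the struct. The name `row_major_index` is an assumption made for illustration.

```cpp
#include <cstddef>

// Row-major offset of element (i, j) in a matrix with M columns
// (the same arithmetic as melt1, without the struct).
constexpr std::size_t row_major_index(std::size_t i, std::size_t j, std::size_t M) {
    return i * M + j;
}

// For a 2x3 matrix (N = 2, M = 3), the six elements occupy offsets 0..5:
//   row 0: (0,0)->0  (0,1)->1  (0,2)->2
//   row 1: (1,0)->3  (1,1)->4  (1,2)->5
static_assert(row_major_index(0, 2, 3) == 2, "end of row 0");
static_assert(row_major_index(1, 0, 3) == 3, "start of row 1");
```

Stepping j by one moves one element in memory; stepping i by one moves M elements.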

But that’s not the only possible method to store matrix data. Here are several more.

inline double* melt2(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i + j * m->N];
}

inline double* melt3(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i + ((m->N - i + j) % m->M) * m->N];
}

inline double* melt4(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i + ((i + j) % m->M) * m->N];
}

inline double* melt5(matrix* m, unsigned i, unsigned j) {
    assert(m->M % 8 == 0);
    unsigned k = (i/8) * (m->M/8) + (j/8);
    return &m->elt[k*64 + (i % 8) * 8 + j % 8];
}

QUESTION IO-5A. Which method (of melt1–melt5) will have the best processor cache behavior if most matrix accesses use loops like this?

for (unsigned j = 0; j < 100; ++j) {
    for (unsigned i = 0; i < 100; ++i) {
        f(*melt(m, i, j));
    }
}

QUESTION IO-5B. Which method will have the best processor cache behavior if most matrix accesses use loops like this?

for (unsigned i = 0; i < 100; ++i) {
    f(*melt(m, i, i));
}

QUESTION IO-5C. Which method will have the best processor cache behavior if most matrix accesses use loops like this?

for (unsigned i = 0; i < 100; ++i) {
    for (unsigned j = 0; j < 100; ++j) {
        f(*melt(m, i, j));
    }
}

QUESTION IO-5D. Which method will have the best processor cache behavior if most matrix accesses use loops like this?

for (int di = -3; di <= 3; ++di) {
    for (int dj = -3; dj <= 3; ++dj) {
        f(*melt(m, I + di, J + dj));
    }
}

QUESTION IO-5E. Here is a matrix-multiply function in ikj order.

matrix* matrix_multiply(matrix* a, matrix* b) {
    assert(a->M == b->N);
    matrix* c = matrix_create(a->N, b->M);
    for (unsigned i = 0; i != a->N; ++i) {
        for (unsigned k = 0; k != a->M; ++k) {
            for (unsigned j = 0; j != b->M; ++j) {
                *melt(c, i, j) += *melt(a, i, k) * *melt(b, k, j);
            }
        }
    }
    return c;
}

This loop order is cache-optimal when data is stored in melt1 order. What loop order is cache-optimal for melt2?

QUESTION IO-5F. You notice that accessing a matrix element using melt1 is very slow. After some debugging, it seems like the processor on which you are running code has a very slow multiply instruction. Briefly describe a change to struct matrix that would let you write a version of melt1 with no multiply instruction. You may add members, change sizes, or anything you like.

IO-6. Caching

Assume that we have a cache that holds four slots. Assume that each letter below indicates an access to a block. Answer the following questions as they pertain to the following sequence of accesses.

E D C B A E D A A A B C D E

QUESTION IO-6A. What is the hit rate assuming an LRU replacement policy?

QUESTION IO-6B. Which blocks will be in the cache at the end of the run?

QUESTION IO-6C. What is the best possible hit rate attainable if you could see into the future?

IO-7. Caching

Intel and CrossPoint have announced a new persistent memory technology with performance approaching that of DRAM. Your job is to calculate some performance metrics to help system architects decide how best to incorporate this new technology into their platform.

Let's say that it takes 64ns to access one (32-bit) word of main memory (DRAM) and 256ns to access one (32-bit) word of this new persistent memory, which we'll call NVM (non-volatile memory). The block size of the NVM is 256 bytes. The NVM designers are quite smart and although it takes a long time to access the first byte, when you are accessing NVM sequentially, the devices perform read ahead and stream data efficiently -- at 32 GB/second, which is identical to the bandwidth of DRAM.

QUESTION IO-7A. Let's say that we are performing random accesses of 32 bits (on a 32-bit processor). What fraction of the accesses must be to main memory (as opposed to NVM) to achieve performance within 10% of DRAM?

Let X be the fraction of accesses to DRAM. The average access time is 64X + 256(1 - X) ns, and we want it to be at most 1.1 × 64 = 70.4 ns (within 10% of DRAM). Solving the boundary case:
64X + 256(1 - X) = 70.4
64X + 256 - 256X = 70.4
256X - 64X = 256 - 70.4
192X = 185.6
X = 185.6/192 ≈ 0.967
So we need a hit rate in main memory of about 97%.
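
The same algebra can be packaged as a one-line function. This is only a sketch: `required_fast_fraction` is a hypothetical name, and it assumes the simple two-tier model used above, in which every access costs either the fast or the slow latency.

```cpp
#include <cassert>

// Smallest fraction x of accesses that must go to the fast tier so that
//   x * fast_ns + (1 - x) * slow_ns <= target_ns.
// Rearranging gives x >= (slow_ns - target_ns) / (slow_ns - fast_ns).
inline double required_fast_fraction(double fast_ns, double slow_ns, double target_ns) {
    assert(slow_ns > fast_ns);   // otherwise the rearrangement flips the inequality
    return (slow_ns - target_ns) / (slow_ns - fast_ns);
}
```

Calling `required_fast_fraction(64, 256, 1.1 * 64)` gives 185.6/192, or roughly 0.967.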

QUESTION IO-7B. Let's say that a program writes every byte of a 256-byte block in units of 32 bits. How much faster will a write-back cache perform relative to a write-through cache? (An approximate order of magnitude will be sufficient; showing work can earn partial credit.)

QUESTION IO-7C. Why might you not want to use a write-back cache?

IO-8. Reference strings

The following questions concern the FIFO (First In First Out), LRU (Least Recently Used), and LFU (Least Frequently Used) cache eviction policies.

Your answers should refer to seven-item reference strings made up of digits in the range 0–9. An example answer might be “1231231”. In each case, the reference string is processed by a 3-slot cache that’s initially empty.
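
Candidate strings can be checked by simulation rather than by hand. Below is a minimal sketch for the FIFO policy only; `fifo_hits` is a hypothetical helper, and it treats each character of the string as one block reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Count hits for a digit reference string in an initially-empty,
// fully-associative cache with `nslots` slots under FIFO eviction.
// (Hypothetical helper; assumes nslots > 0.)
inline std::size_t fifo_hits(const std::string& refs, std::size_t nslots) {
    assert(nslots > 0);
    std::deque<char> cache;   // front = block that was loaded earliest
    std::size_t hits = 0;
    for (char block : refs) {
        if (std::find(cache.begin(), cache.end(), block) != cache.end()) {
            ++hits;           // FIFO does not reorder blocks on a hit
            continue;
        }
        if (cache.size() == nslots) {
            cache.pop_front();   // evict the earliest-loaded block
        }
        cache.push_back(block);
    }
    return hits;
}
```

With the example string from the text, `fifo_hits("1231231", 3)` returns 4, a 4/7 hit rate.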

QUESTION IO-8A. Give a reference string that has a 1/7 hit rate in all three policies.

QUESTION IO-8B. Give a reference string that has a 6/7 hit rate in all three policies.

QUESTION IO-8C. Give a reference string that has different hit rates under LRU and LFU policies, and compute the hit rates.

String:

LRU hit rate:

LFU hit rate:

QUESTION IO-8D. Give a reference string that has different hit rates under FIFO and LRU policies, and compute the hit rates.

String:

FIFO hit rate:

LRU hit rate:

QUESTION IO-8E. Now let's assume that you know a reference string in advance. Given a 3-slot cache and the following reference string, what caching algorithm discussed in class and/or exercises would produce the best hit rate, and what would that hit rate be?

“12341425321521”

IO-9. Caching: Access times and hit rates

Recall that x86-64 instructions can access memory in units of 1, 2, 4, or 8 bytes at a time. Assume we are running on an x86-64-like machine with 1024-byte cache lines. Our machine takes 32ns to access a unit if the cache hits, regardless of unit size. If the cache misses, an additional 8160ns are required to load the cache, for a total of 8192ns.

QUESTION IO-9A. What is the average time per access when all the data in a cache line is accessed as an array of 256 integers, starting from an empty cache?

QUESTION IO-9B. What unit size (1, 2, 4, or 8) minimizes the total time to access all data in a cache line, starting from an empty cache?

QUESTION IO-9C. What unit size (1, 2, 4, or 8) maximizes the hit rate when accessing all data in a cache line, starting from an empty cache?

IO-10. Single-slot cache code

Donald Duck is working on a single-slot cache for reading. He’s using the pos_tag/end_tag representation, which is:

struct io61_file {
   int fd;
   unsigned char cbuf[BUFSIZ];
   off_t tag;      // file offset of first character in cache (same as before)
   off_t end_tag;  // file offset one past last valid char in cache; end_tag - tag == old `csz`
   off_t pos_tag;  // file offset of next char to read in cache; pos_tag - tag == old `cpos`
};
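
One way to keep this representation straight is to write down the relationships among the three tags as a checkable predicate. The sketch below reflects one reading of the comments in the struct, not an official invariant list; `tags_consistent` is a hypothetical helper, and the struct is repeated so the block stands alone.

```cpp
#include <cassert>      // for assert() at call sites
#include <cstdio>       // BUFSIZ
#include <sys/types.h>  // off_t (POSIX)

struct io61_file {
    int fd;
    unsigned char cbuf[BUFSIZ];
    off_t tag;      // file offset of first character in cache
    off_t end_tag;  // file offset one past last valid char in cache
    off_t pos_tag;  // file offset of next char to read in cache
};

// Returns true if the tags are mutually consistent:
//   tag <= pos_tag <= end_tag, and the cached range fits in cbuf.
inline bool tags_consistent(const io61_file* f) {
    return f->tag <= f->pos_tag
        && f->pos_tag <= f->end_tag
        && f->end_tag - f->tag <= (off_t) BUFSIZ;
}
```

A debugging build might call `assert(tags_consistent(f))` on entry to and exit from each cache function.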

Here’s our solution code; in case you want to scribble, the code is copied in the appendix.

 1.  ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
 2.      size_t pos = 0;
 3.      while (pos != sz) {
 4.          if (f->pos_tag < f->end_tag) {
 5.              ssize_t n = sz - pos;
 6.              if (n > f->end_tag - f->pos_tag)
 7.                  n = f->end_tag - f->pos_tag;
 8.              memcpy(&buf[pos], &f->cbuf[f->pos_tag - f->tag], n);
 9.              f->pos_tag += n;
10.              pos += n;
11.          } else {
12.              f->tag = f->end_tag;
13.              ssize_t n = read(f->fd, f->cbuf, BUFSIZ);
14.              if (n > 0)
15.                  f->end_tag += n;
16.              else
17.                  return pos ? pos : n;
18.          }
19.      }
20.      return pos;
21.  }

Donald has ideas for “simplifying” this code. Specifically, he wants to try each of the following independently:

Replacing line 4 with “if (f->pos_tag <= f->end_tag) {”.
Removing lines 6–7.
Removing line 9.
Removing lines 16–17.

QUESTION IO-10A. Which simplifications could lead to undefined behavior? List all that apply or say “none.”

QUESTION IO-10B. Which simplifications could cause io61_read to loop forever without causing undefined behavior? List all that apply or say “none.”

QUESTION IO-10C. Which simplifications could lead to io61_read returning incorrect data in buf, meaning that the data read by a series of io61_read calls won’t equal the data in the file? List all that apply or say “none.”

QUESTION IO-10D. Chastened, Donald decides to optimize the code for a specific situation, namely when io61_read is called with a sz that is larger than BUFSIZ. He wants to add code after line 11, like so, so that fewer read system calls will happen for large sz:

11.          } else if (sz - pos > BUFSIZ) {
                 // DONALD’S CODE HERE




11A.         } else {
12.              f->tag = f->end_tag;
                 ....

Finish Donald’s code. Your code should maintain the relevant invariants between tag, pos_tag, end_tag, and the file position, but you need not keep tag aligned.

IO-11. Caching

QUESTION IO-11A. If it takes 200ns to access main memory, which of the following two caches will produce a lower average access time?

A cache with a 10ns access time that produces a 90% hit rate
A cache with a 20ns access time that produces a 98% hit rate

Let's compute the average access time for each cache:
0.9 × 10 + 0.1 × 200 = 9 + 20 = 29 ns
0.98 × 20 + 0.02 × 200 = 19.6 + 4 = 23.6 ns
The 20ns cache produces the lower average access time.
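
The calculation generalizes to any hit rate and pair of latencies. This sketch follows the same model as the worked answer, in which a miss costs only the main-memory time; `average_access_ns` is a hypothetical name.

```cpp
#include <cassert>

// Average access time when a fraction `hit_rate` of accesses cost `cache_ns`
// and the remaining accesses cost `memory_ns`.
inline double average_access_ns(double hit_rate, double cache_ns, double memory_ns) {
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns;
}
```

`average_access_ns(0.90, 10, 200)` is about 29 and `average_access_ns(0.98, 20, 200)` is about 23.6, up to floating-point rounding.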

QUESTION IO-11B. Let’s say that you have a direct-mapped cache with four slots. A page with page number N must reside in the slot numbered N % 4. What is the best hit rate this could achieve given the following sequence of page accesses?

3 6 7 5 3 2 1 1 1 8

Since it's direct mapped, each item can go in only one slot, so if we list the slots for each access, we get:
3 2 3 1 3 2 1 1 1 0
The only hits are the second and third accesses to page 1, so the hit rate is 2/10, or 20%.
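
The slot-by-slot reasoning can also be checked with a short simulation. The following is a sketch under the stated mapping (page N lives in slot N % nslots); `direct_mapped_hits` is a hypothetical helper, not course code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count hits for `pages` in an initially-empty direct-mapped cache in which
// page N may only reside in slot N % nslots. (Assumes nslots > 0.)
inline std::size_t direct_mapped_hits(const std::vector<unsigned>& pages,
                                      std::size_t nslots) {
    assert(nslots > 0);
    std::vector<unsigned> slot(nslots);       // page currently held by each slot
    std::vector<bool> valid(nslots, false);   // false until a slot is first filled
    std::size_t hits = 0;
    for (unsigned page : pages) {
        std::size_t s = page % nslots;
        if (valid[s] && slot[s] == page) {
            ++hits;
        } else {
            slot[s] = page;                   // miss: replace whatever was there
            valid[s] = true;
        }
    }
    return hits;
}
```

`direct_mapped_hits({3, 6, 7, 5, 3, 2, 1, 1, 1, 8}, 4)` returns 2, in agreement with the count above.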

QUESTION IO-11C. What is the best hit rate a fully-associative four-slot cache could achieve for that sequence of page accesses? (A fully-associative cache may put any page in any slot. You may assume you know the full reference stream in advance.)

QUESTION IO-11D. What hit rate would the fully-associative four-slot cache achieve if it used the LRU eviction policy?

IO-12. I/O traces

QUESTION IO-12A. Which of the following programs cannot be distinguished by the output of the strace utility, not considering open calls? List all that apply; if multiple indistinguishable groups exist (e.g., A, B, & C can’t be distinguished, and D & E can’t be distinguished, but the groups can be distinguished from each other), list them all.

Sequential byte writes using stdio
Sequential byte writes using system calls
Sequential byte writes using system calls and O_SYNC
Sequential block writes using stdio and block size 2
Sequential block writes using system calls and block size 2
Sequential block writes using system calls and O_SYNC and block size 2
Sequential block writes using stdio and block size 4096
Sequential block writes using system calls and block size 4096
Sequential block writes using system calls and O_SYNC and block size 4096

QUESTION IO-12B. Which of the programs in Part A cannot be distinguished using blktrace output? List all that apply.

QUESTION IO-12C. The buffer cache is coherent. Which of the following operating system changes could make the buffer cache incoherent? List all that apply.

Application programs can obtain direct read access to the buffer cache
Application programs can obtain direct write access to the disk, bypassing the buffer cache
Other computers can communicate with the disk independently
The computer has an uninterruptible power supply (UPS), ensuring that the operating system can write the contents of the buffer cache to disk if main power is lost

QUESTION IO-12D. The stdio cache is incoherent. Which of the operating system changes from Part C could make the stdio cache coherent? List all that apply.

IO-13. Reference strings and eviction

QUESTION IO-13A. When demonstrating cache eviction in class, we modeled a completely reactive cache, meaning that the cache performed at most one load from slow storage per access. Name a class of reference string that will have a 0% hit rate on any cold reactive cache. For partial credit, give several examples of such reference strings.

QUESTION IO-13B. What cache optimization can be used to improve the hit rate for the class of reference string in Part A? One word is enough; put the best choice.

QUESTION IO-13C. Give a single reference string with the following properties:

There exists a cache size and eviction policy that gives a 70% hit rate for the string.
There exists a cache size and eviction policy that gives a 0% hit rate for the string.

QUESTION IO-13D. Put the following eviction algorithms in order of how much space they require for per-slot metadata, starting with the least space and ending with the most space. (Assume the slot order is fixed, so once a block is loaded into slot i, it stays in slot i until it is evicted.) For partial credit say what you think the metadata would be.

FIFO
LRU
Random

IO-14. Cache code

Several famous musicians have just started working on CS61 Problem Set

They share the following code for their read-only, sequential, single-slot cache:

struct io61_file {
    int fd;
    unsigned char buf[4096];
    size_t pos;    // position of next character to read in `buf`
    size_t sz;     // number of valid characters in `buf`
};

int io61_readc(io61_file* f) {
    if (f->pos >= f->sz) {
        f->pos = f->sz = 0;
        ssize_t nr = read(f->fd, f->buf, sizeof(f->buf));
        if (nr <= 0) {
            f->sz = 0;
            return -1;
        } else {
            f->sz = nr;
        }
    }
    int ch = f->buf[f->pos];
    ++f->pos;
    return ch;
}

But they have different io61_read implementations. Donald (Lambert)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    return read(f->fd, buf, sz);
}

Solange (Knowles)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    for (size_t pos = 0; pos < sz; ++pos, ++buf) {
        *buf = io61_readc(f);
    }
    return sz;
}

Caroline (Shaw)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    if (f->pos >= f->sz) {
        return read(f->fd, buf, sz);
    } else {
        int ch = io61_readc(f);
        if (ch < 0) {
            return 0;
        }
        *buf = ch;
        return io61_read(f, buf + 1, sz - 1) + 1;
    }
}

You are testing each of these musicians’ codes by executing a sequence of io61_readc and/or io61_read calls on an input file and printing the resulting characters to standard output. There are no seeks, and your test programs print until end of file, so your tests’ output should equal the input file’s contents.

You should assume for these questions that no read system call ever returns -1.

QUESTION IO-14A. Describe an access pattern—that is, a sequence of io61_readc and/or io61_read calls (with lengths)—for which Donald’s code can return incorrect data.

QUESTION IO-14B. Which of these musicians’ codes can generate an output file with incorrect length?

For the remaining parts, assume the problem in Part B has been corrected, so that all musicians’ codes generate output files with correct lengths.

QUESTION IO-14C. Give an access pattern for which Solange’s code will return correct data and outperform Donald’s, or vice versa, and say whose code will win.

QUESTION IO-14D. Suggest a small change (≤10 characters) to Caroline’s code that would, most likely, make it perform at least as well as both Solange’s and Donald’s codes on all access patterns. Explain briefly.

IO-15. Caches

Parts A–C concern different implementations of Pset 3’s stdio cache. Assume a program that reads a 32768-byte file a character at a time, like this:

while (io61_readc(inf) != EOF) {
}

This program will call io61_readc 32769 times. (32769 = 2¹⁵ + 1 = 8×2¹² + 1; the +1 accounts for the EOF return.) But the cache implementation might make many fewer system calls.

QUESTION IO-15A. How many read system calls are required assuming a single-slot, 4096-byte io61 cache?

QUESTION IO-15B. How many read system calls are required assuming an eight-slot, 4096-byte io61 cache?

QUESTION IO-15C. How many mmap system calls are required assuming an mmap-based io61 cache?

Parts D–F concern cache implementations and styles. We discussed many caches in class, including:

The buffer cache
The processor cache
Single-slot aligned stdio caches
Single-slot unaligned stdio caches
Circular bounded buffers

QUESTION IO-15D. Which of those caches are implemented entirely in hardware? List all that apply.

QUESTION IO-15E. Which of those software caches could help speed up reverse sequential access to a disk file? List all that apply.

QUESTION IO-15F. Which of those software caches could help speed up access to a pipe or network socket? List all that apply.

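IO-6A trace under LRU with four slots (rows are slots; circled letters are misses, plain letters are hits):
	E	D	C	B	A	E	D	A	A	A	B	C	D	E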
1	Ⓔ				Ⓐ			A	A	A				Ⓔ
2		Ⓓ				Ⓔ						Ⓒ
3			Ⓒ				Ⓓ						D
4				Ⓑ							B

IO-6C trace under the optimal policy, which can see future accesses (same notation):
	E	D	C	B	A	E	D	A	A	A	B	C	D	E
1	Ⓔ					E								E
2		Ⓓ					D						D
3			Ⓒ		Ⓐ			A	A	A		Ⓒ
4				Ⓑ							B
