Exercises not as directly relevant to this year’s class are marked with ⚠️.
IO-1. I/O caching
Mary Ruefle, a poet who lives in Vermont, is working on her caching I/O library for CS 61. She wants to implement a cache with N slots. Since searching those slots might slow down her library, she writes a function that maps addresses to slots. Here’s some of her code.
#define SLOTSIZ 4096
struct io61_slot {
    char buf[SLOTSIZ];
    off_t tag; // offset of first cached byte; (off_t) -1 for empty slots
    ssize_t sz;
};
#define NSLOTS 64
struct io61_file {
    int fd;
    off_t pos; // current file position
    io61_slot slots[NSLOTS];
};
static inline int find_slot(off_t off) {
    return off % NSLOTS;
}
int io61_readc(io61_file* f) {
    int slotindex = find_slot(f->pos);
    io61_slot* s = &f->slots[slotindex];
    if (f->pos < s->tag || f->pos >= s->tag + s->sz) {
        // slot contains wrong data, need to refill it
        off_t new_pos = lseek(f->fd, f->pos, SEEK_SET);
        assert(new_pos != (off_t) -1); // only handle seekable files for now
        ssize_t r = read(f->fd, s->buf, SLOTSIZ);
        if (r == -1 || r == 0) {
            return EOF;
        }
        s->tag = f->pos;
        s->sz = r;
    }
    int ch = (unsigned char) s->buf[f->pos - s->tag];
    ++f->pos;
    return ch;
}
Before she can run and debug this code, Mary is led “to an emergency of feeling that … results in a poem.” She’ll return to CS61 and fix her implementation soon, but in the meantime, let’s answer some questions about it.
QUESTION IO-1A. True or false: Mary’s cache is direct-mapped, meaning that each cacheable data chunk has exactly one slot that can cache it.
QUESTION IO-1B. What changes to Mary’s code could change your answer to Part A? List all that apply.
- New code for find_slot that is also deterministic (keeping io61_readc the same)
- Any new code for find_slot (keeping io61_readc the same)
- New code in io61_readc (keeping find_slot the same)
- New code in io61_readc and new code for find_slot
- None of the above
QUESTION IO-1C. Which problems would occur when Mary’s code was used to sequentially read a seekable file of size 2MiB (2×2^20 = 2097152 bytes) one character at a time? List all that apply.
- Excessive process CPU usage (>10x stdio)
- Many system calls to read data (>10x stdio)
- Incorrect data (byte x read at a position where the file has byte y≠x)
- Read too much data (more bytes read than file contains)
- Read too little data (fewer bytes read than file contains)
- Crash/undefined behavior
- None of the above
QUESTION IO-1D. Which of these new implementations for find_slot would fix at least one of these problems with reading sequential files? List all that apply.
- return (off * 2654435761) % NSLOTS; /* integer hash function from Stack Overflow */
- return (off / SLOTSIZ) % NSLOTS;
- return off & (NSLOTS - 1);
- return 0;
- return (off >> 12) & 0x3F;
- None of the above
IO-2. Caches and reference strings
QUESTION IO-2A. True or false: A direct-mapped cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.
QUESTION IO-2B. True or false: A fully-associative cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.
Consider the following 5 reference strings.
Name | String |
---|---|
α | 1 |
β | 1, 2 |
γ | 1, 2, 3, 4, 5 |
δ | 2, 4 |
ε | 5, 2, 4, 2 |
QUESTION IO-2C. Which of the strings might indicate a sequential access pattern? List all that apply.
α | β | γ | δ | ε | None of these |
QUESTION IO-2D. Which of the strings might indicate a strided access pattern with stride >1? List all that apply.
α | β | γ | δ | ε | None of these |
The remaining questions concern concatenated permutations of these five strings. For example, the permutation αβγδε refers to this reference string:
1, 1, 2, 1, 2, 3, 4, 5, 2, 4, 5, 2, 4, 2.
We pass such permutations through an initially-empty, fully-associative cache with 3 slots, and observe the numbers of hits.
QUESTION IO-2E. How many cold misses might a permutation observe? List all that apply.
0 | 1 | 2 | 3 | 4 | 5 | Some other number |
Under LRU eviction, the permutation αβεγδ observes 5 hits as follows. (We annotate each access with “h” for hit or “m” for miss.)
1m; 1h, 2m; 5m, 2h, 4m, 2h; 1m, 2h, 3m, 4m, 5m; 2m, 4h.
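Traces like the one above are easy to get wrong by hand, so a small simulator can help check them. The following is a minimal sketch, not part of the course code: `count_hits` and the `eviction` enum are hypothetical names, and the cache is modeled as a queue of addresses whose front is the next eviction victim.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

enum class eviction { lru, fifo };   // hypothetical names, for illustration only

// Count hits for `refs` on an initially-empty, fully-associative cache with
// `nslots` slots. `order` holds the cached addresses; the front is evicted next.
size_t count_hits(const std::vector<int>& refs, size_t nslots, eviction policy) {
    assert(nslots > 0);
    std::deque<int> order;
    size_t hits = 0;
    for (int addr : refs) {
        auto it = std::find(order.begin(), order.end(), addr);
        if (it != order.end()) {
            ++hits;
            if (policy == eviction::lru) {
                // Under LRU a hit makes the address most recently used;
                // under FIFO a hit does not change the eviction order.
                order.erase(it);
                order.push_back(addr);
            }
        } else {
            if (order.size() == nslots) {
                order.pop_front();       // evict the oldest entry
            }
            order.push_back(addr);       // load the missed address
        }
    }
    return hits;
}
```

Passing the reference string for αβεγδ with 3 slots and `eviction::lru` should reproduce the 5 hits annotated above; the same call with `eviction::fifo` can be used to check a hand-computed FIFO trace.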
QUESTION IO-2F. How many hits does this permutation observe under FIFO eviction?
QUESTION IO-2G. Give a permutation that will observe 8 hits under LRU eviction, which is the maximum for any permutation. There are several possible answers. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 7 hits, etc.)
QUESTION IO-2H. Give a permutation that will observe 2 hits under LRU eviction, which is the minimum for any permutation. There is one unique answer. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 3 hits, etc.)
IO-3. Processor cache
The git version control system is based on commit hashes, which are 160-bit (20-byte) hash values used to identify commits. In this problem you’ll consider the processor cache behavior of several versions of a “grading server” that maps commits to grades. Here’s the first version:
struct commit_info {
    char hash[20];
    int grade[11];
};
commit_info* commits;
size_t N;
int get_grade1(const char* hash, int pset) {
    for (size_t i = 0; i != N; ++i) {
        if (memcmp(commits[i].hash, hash, 20) == 0) {
            return commits[i].grade[pset];
        }
    }
    return -1;
}
We will ask questions about the average number of distinct 64-byte cache lines accessed by variants of get_grade(hash, pset). You should make the following assumptions:
- The hash argument is uniformly drawn from the set of known commits. That is, the probability that hash equals the ith commit’s hash is 1/N.
- Only count cache lines accessible via commits. Don’t worry about cache lines used for local variables, for parameters, for global variables, or for instructions. For instance, do not count the hash argument or the global-data cache line that stores the commits variable itself.
- The commits pointer is 64-byte aligned and cache lines are 64 bytes long.
- Commit hashes are mathematically indistinguishable from random numbers. Thus, the probability that two different hashes have the same initial k bits equals 1/2^k.
- We’ll ignore small errors; N/2 and (N+1)/2 will be considered equivalent.
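When counting cache lines it helps to know how a `commit_info` is laid out relative to 64-byte line boundaries. The sketch below assumes a 4-byte `int` (as on x86-64) and a 64-byte-aligned array; `cache_line_size`, `line_of`, `first_line`, and `last_line` are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

struct commit_info {
    char hash[20];
    int grade[11];
};

constexpr size_t cache_line_size = 64;   // from the assumptions above

// The struct should contain no padding: 20 bytes of hash, then 11 ints.
static_assert(sizeof(commit_info) == 20 + 11 * sizeof(int),
              "commit_info is expected to have no padding");

// Index of the cache line holding byte offset `off` from a 64-byte-aligned base.
constexpr size_t line_of(size_t off) {
    return off / cache_line_size;
}

// First and last cache lines touched by element `i` of a 64-byte-aligned
// array of commit_info.
constexpr size_t first_line(size_t i) {
    return line_of(i * sizeof(commit_info));
}
constexpr size_t last_line(size_t i) {
    return line_of((i + 1) * sizeof(commit_info) - 1);
}
```

Comparing `first_line(i)` with `last_line(i)` shows whether a single array element straddles a line boundary, which is the first thing to settle before estimating how many lines a linear scan touches.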
QUESTION IO-3A. What is the expected number of cache lines accessed by get_grade1, in terms of N?
The second version:
struct commit_info {
    char hash[20];
    int grade[11];
};
commit_info** commits;
size_t N;
int get_grade2(const char hash[20], int pset) {
    for (size_t i = 0; i != N; ++i) {
        if (memcmp(commits[i]->hash, hash, 20) == 0) {
            return commits[i]->grade[pset];
        }
    }
    return -1;
}
QUESTION IO-3B. What is the expected number of cache lines accessed by get_grade2, in terms of N?
The third version:
struct commit_info {
    char hash[20];
    int grade[11];
};
struct commit_hint {
    char hint[8];
    commit_info* commit;
};
commit_hint* commits;
size_t N;
int get_grade3(const char* hash, int pset) {
    for (size_t i = 0; i != N; ++i) {
        if (memcmp(commits[i].hint, hash, 8) == 0
            && memcmp(commits[i].commit->hash, hash, 20) == 0) {
            return commits[i].commit->grade[pset];
        }
    }
    return -1;
}
QUESTION IO-3C. What is the expected number of cache lines accessed by get_grade3, in terms of N? (You may assume that N ≤ 2000.)
The fourth version is a hash table.
struct commit_info {
    char hash[20];
    int grade[11];
};
commit_info** commits;
size_t commits_hashsize;
int get_grade4(const char* hash, int pset) {
    // choose initial bucket
    size_t bucket;
    memcpy(&bucket, hash, sizeof(bucket));
    bucket = bucket % commits_hashsize;
    // search for the commit starting from that bucket
    while (commits[bucket] != nullptr) {
        if (memcmp(commits[bucket]->hash, hash, 20) == 0) {
            return commits[bucket]->grade[pset];
        }
        bucket = (bucket + 1) % commits_hashsize;
    }
    return -1;
}
QUESTION IO-3D. Assume that a call to get_grade4 encounters B - 1 expected hash collisions (i.e., examines B buckets total, including the bucket that actually contains hash). What is the expected number of cache lines accessed by get_grade4, in terms of N and B?
IO-4. IO caching and strace
Elif Batuman is investigating several program executables left behind by her ex-roommate Fyodor. She runs each executable under strace in the following way:
strace -o strace.txt ./EXECUTABLE files/text1meg.txt > files/out.txt
Help her figure out properties of these programs based on their system call traces.
QUESTION IO-4A. Program ./mysterya:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x8193000
brk(0x81b5000) = 0x81b5000
read(3, "A", 1) = 1
write(1, "A", 1) = 1
read(3, "\n", 1) = 1
write(1, "\n", 1) = 1
read(3, "A", 1) = 1
write(1, "A", 1) = 1
read(3, "'", 1) = 1
write(1, "'", 1) = 1
read(3, "s", 1) = 1
write(1, "s", 1) = 1
...
List at least one option in each column.
QUESTION IO-4B. Program ./mysteryb:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x96c5000
brk(0x96e6000) = 0x96e6000
read(3, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 2048) = 2048
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 2048) = 2048
read(3, "kad\nAkron\nAkron's\nAl\nAl's\nAla\nAl"..., 2048) = 2048
write(1, "kad\nAkron\nAkron's\nAl\nAl's\nAla\nAl"..., 2048) = 2048
...
List at least one option in each column.
QUESTION IO-4C. Program ./mysteryc:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x9064000
brk(0x9085000) = 0x9085000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1046528, SEEK_SET) = 1046528
read(3, "ingau\nRheingau's\nRhenish\nRhianno"..., 2048) = 2048
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1044480, SEEK_SET) = 1044480
read(3, "Quinton\nQuinton's\nQuirinal\nQuisl"..., 2048) = 2048
write(1, "ehR\neehR\naehR\ns'hR\nhR\nsdlonyeR\ns"..., 2048) = 2048
lseek(3, 1042432, SEEK_SET) = 1042432
read(3, "emyslid's\nPrensa\nPrensa's\nPrenti"..., 2048) = 2048
write(1, "\ns'nailitniuQ\nnailitniuQ\nnniuQ\ns"..., 2048) = 2048
lseek(3, 1040384, SEEK_SET) = 1040384
read(3, "Pindar's\nPinkerton\nPinocchio\nPin"..., 2048) = 2048
write(1, "rP\ndilsymerP\ns'regnimerP\nregnime"..., 2048) = 2048
...
List at least one option in each column.
QUESTION IO-4D. Program ./mysteryd:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x9a0e000
brk(0x9a2f000) = 0x9a2f000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1048575, SEEK_SET) = 1048575
read(3, "o", 2048) = 1
lseek(3, 1048574, SEEK_SET) = 1048574
read(3, "Ro", 2048) = 2
lseek(3, 1048573, SEEK_SET) = 1048573
read(3, "\nRo", 2048) = 3
...
lseek(3, 1046528, SEEK_SET) = 1046528
read(3, "ingau\nRheingau's\nRhenish\nRhianno"..., 2048) = 2048
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1046527, SEEK_SET) = 1046527
read(3, "eingau\nRheingau's\nRhenish\nRhiann"..., 2048) = 2048
lseek(3, 1046526, SEEK_SET) = 1046526
read(3, "heingau\nRheingau's\nRhenish\nRhian"..., 2048) = 2048
...
List at least one option in each column.
QUESTION IO-4E. Program ./mysterye:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x93e5000
brk(0x9407000) = 0x9407000
read(3, "A", 1) = 1
read(3, "\n", 1) = 1
read(3, "A", 1) = 1
...
read(3, "A", 1) = 1
read(3, "l", 1) = 1
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 1024) = 1024
read(3, "t", 1) = 1
read(3, "o", 1) = 1
read(3, "n", 1) = 1
...
List at least one option in each column.
QUESTION IO-4F. Program ./mysteryf:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x9281000
brk(0x92a3000) = 0x92a3000
read(3, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 4096) = 4096
write(1, "A", 1) = 1
write(1, "\n", 1) = 1
write(1, "A", 1) = 1
...
write(1, "A", 1) = 1
write(1, "l", 1) = 1
read(3, "ton's\nAludra\nAludra's\nAlva\nAlvar"..., 4096) = 4096
write(1, "t", 1) = 1
write(1, "o", 1) = 1
write(1, "n", 1) = 1
...
List at least one option in each column.
IO-5. Processor cache
The following questions use the following C definition for an N×M matrix (the matrix has N rows and M columns).
struct matrix {
    unsigned N;
    unsigned M;
    double elt[0];
};
matrix* matrix_create(unsigned N, unsigned M) {
    matrix* m = (matrix*) malloc(sizeof(matrix) + N * M * sizeof(double));
    m->N = N;
    m->M = M;
    for (size_t i = 0; i < N * M; ++i) {
        m->elt[i] = 0.0;
    }
    return m;
}
Typically, matrix data is stored in row-major order: element mᵢⱼ (at row i and column j) is stored in m->elt[i*m->M + j]. We might write this in C using an inline function:
inline double* melt1(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i * m->M + j];
}
But that’s not the only possible method to store matrix data. Here are several more.
inline double* melt2(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i + j * m->N];
}
inline double* melt3(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i + ((m->N - i + j) % m->M) * m->N];
}
inline double* melt4(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i + ((i + j) % m->M) * m->N];
}
inline double* melt5(matrix* m, unsigned i, unsigned j) {
    assert(m->M % 8 == 0);
    unsigned k = (i/8) * (m->M/8) + (j/8);
    return &m->elt[k*64 + (i % 8) * 8 + j % 8];
}
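To compare layouts it can help to look at the flat `elt` index each method produces, separately from the pointer arithmetic. The sketch below mirrors only the index expressions of melt1 and melt2; `index1` and `index2` are hypothetical names, and the functions take the dimension they need rather than a `matrix*`.

```cpp
#include <cassert>
#include <cstddef>

// Flat index used by melt1 (row-major) for element (i, j) of a matrix
// with M columns.
constexpr size_t index1(unsigned M, unsigned i, unsigned j) {
    return size_t(i) * M + j;
}

// Flat index used by melt2 (column-major) for element (i, j) of a matrix
// with N rows.
constexpr size_t index2(unsigned N, unsigned i, unsigned j) {
    return i + size_t(j) * N;
}
```

Evaluating these for neighboring elements, for example `index1(M, i, j + 1) - index1(M, i, j)` versus `index2(N, i, j + 1) - index2(N, i, j)`, shows how far apart in memory two consecutive loop iterations land under each layout.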
QUESTION IO-5A. Which method (of melt1–melt5) will have the best processor cache behavior if most matrix accesses use loops like this?
for (unsigned j = 0; j < 100; ++j) {
    for (unsigned i = 0; i < 100; ++i) {
        f(*melt(m, i, j));
    }
}
QUESTION IO-5B. Which method will have the best processor cache behavior if most matrix accesses use loops like this?
for (unsigned i = 0; i < 100; ++i) {
    f(*melt(m, i, i));
}
QUESTION IO-5C. Which method will have the best processor cache behavior if most matrix accesses use loops like this?
for (unsigned i = 0; i < 100; ++i) {
    for (unsigned j = 0; j < 100; ++j) {
        f(*melt(m, i, j));
    }
}
QUESTION IO-5D. Which method will have the best processor cache behavior if most matrix accesses use loops like this?
for (int di = -3; di <= 3; ++di) {
    for (int dj = -3; dj <= 3; ++dj) {
        f(*melt(m, I + di, J + dj));
    }
}
QUESTION IO-5E. Here is a matrix-multiply function in ikj order.
matrix* matrix_multiply(matrix* a, matrix* b) {
    assert(a->M == b->N);
    matrix* c = matrix_create(a->N, b->M);
    for (unsigned i = 0; i != a->N; ++i) {
        for (unsigned k = 0; k != a->M; ++k) {
            for (unsigned j = 0; j != b->M; ++j) {
                *melt(c, i, j) += *melt(a, i, k) * *melt(b, k, j);
            }
        }
    }
    return c;
}
This loop order is cache-optimal when data is stored in melt1 order. What loop order is cache-optimal for melt2?
QUESTION IO-5F. You notice that accessing a matrix element using melt1 is very slow. After some debugging, it seems like the processor on which you are running code has a very slow integer multiply instruction. Briefly describe a change to struct matrix that would let you write a version of melt1 with no integer multiply instruction. You may add members, change sizes, or anything you like.
IO-6. Caching
Assume that we have a cache that holds four slots. Assume that each letter below indicates an access to a block. Answer the following questions as they pertain to the following sequence of accesses.
E D C B A E D A A A B C D E
QUESTION IO-6A. What is the hit rate assuming an LRU replacement policy?
QUESTION IO-6B. What pages will you have in the cache at the end of the run?
QUESTION IO-6C. What is the best possible hit rate attainable if you could see into the future?
IO-7. Caching
Intel and CrossPoint have announced a new persistent memory technology with performance approaching that of DRAM. Your job is to calculate some performance metrics to help system architects decide how to best incorporate this new technology into their platform.
Let's say that it takes 64ns to access one (32-bit) word of main memory (DRAM) and 256ns to access one (32-bit) word of this new persistent memory, which we'll call NVM (non-volatile memory). The block size of the NVM is 256 bytes. The NVM designers are quite smart and although it takes a long time to access the first byte, when you are accessing NVM sequentially, the devices perform read ahead and stream data efficiently -- at 32 GB/second, which is identical to the bandwidth of DRAM.
QUESTION IO-7A. Let's say that we are performing random accesses of 32 bits (on a 32-bit processor). What fraction of the accesses must be to main memory (as opposed to NVM) to achieve performance within 10% of DRAM?
QUESTION IO-7B. Let's say that they write every byte of a 256-byte block in units of 32 bits. How much faster will a write-back cache perform relative to a write-through cache? (An approximate order of magnitude will be sufficient; showing work can earn partial credit.)
QUESTION IO-7C. Why might you not want to use a write-back cache?
IO-8. Reference strings
The following questions concern the FIFO (First In First Out), LRU (Least Recently Used), and LFU (Least Frequently Used) cache eviction policies.
Your answers should refer to seven-item reference strings made up of digits in the range 0–9. An example answer might be “1231231”. In each case, the reference string is processed by a 3-slot cache that’s initially empty.
QUESTION IO-8A. Give a reference string that has a 1/7 hit rate in all three policies.
QUESTION IO-8B. Give a reference string that has a 6/7 hit rate in all three policies.
QUESTION IO-8C. Give a reference string that has different hit rates under LRU and LFU policies, and compute the hit rates.
String:
LRU hit rate:
LFU hit rate:
QUESTION IO-8D. Give a reference string that has different hit rates under FIFO and LRU policies, and compute the hit rates.
String:
FIFO hit rate:
LRU hit rate:
QUESTION IO-8E. Now let's assume that you know a reference string in advance. Given a 3-slot cache and the following reference string, what caching algorithm discussed in class and/or exercises would produce the best hit rate, and what would that hit rate be?
“12341425321521”
IO-9. Caching: Access times and hit rates
Recall that x86-64 instructions can access memory in units of 1, 2, 4, or 8 bytes at a time. Assume we are running on an x86-64-like machine with 1024-byte cache lines. Our machine takes 32ns to access a unit if the cache hits, regardless of unit size. If the cache misses, an additional 8160ns are required to load the cache, for a total of 8192ns.
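The timing model above reduces to a weighted average of hit and miss costs. The following sketch writes that arithmetic out; `average_access_ns` and `accesses_per_line` are hypothetical helpers, and the number of misses is something the caller must determine from the access pattern (the code does not model the cache itself).

```cpp
#include <cassert>

// Average time per access, given that `misses` of the `accesses` accesses
// miss (costing `miss_ns` each in total) and the rest hit (costing `hit_ns`).
constexpr double average_access_ns(unsigned accesses, unsigned misses,
                                   double hit_ns, double miss_ns) {
    return (misses * miss_ns + (accesses - misses) * hit_ns) / accesses;
}

// Number of accesses needed to touch every byte of a cache line when each
// access covers `unit_bytes` bytes.
constexpr unsigned accesses_per_line(unsigned line_bytes, unsigned unit_bytes) {
    return line_bytes / unit_bytes;
}
```

For the questions below, `hit_ns` would be 32 and `miss_ns` would be 8192; varying `unit_bytes` changes the access count and therefore both the average time and the hit rate.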
QUESTION IO-9A. What is the average access time per access to access all the data in a cache line as an array of 256 integers, starting from an empty cache?
QUESTION IO-9B. What unit size (1, 2, 4, or 8) minimizes the access time to access all data in a cache line, starting from an empty cache?
QUESTION IO-9C. What unit size (1, 2, 4, or 8) maximizes the hit rate to access all data in a cache line, starting from an empty cache?
IO-10. Single-slot cache code
Donald Duck is working on a single-slot cache for reading. He’s using the pos_tag/end_tag representation, which is:
struct io61_file {
    int fd;
    static constexpr off_t bufsize = 4096; // block size for this cache
    unsigned char cbuf[bufsize];
    off_t tag; // file offset of first character in cache (same as before)
    off_t pos_tag; // file offset of next char to read in cache
    off_t end_tag; // file offset one past last valid char in cache
};
Here’s our solution code.
 1:  ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
 2:      size_t pos = 0;
 3:      while (pos != sz) {
 4:          if (f->pos_tag < f->end_tag) {
 5:              ssize_t n = sz - pos;
 6:              if (n > f->end_tag - f->pos_tag) {
 7:                  n = f->end_tag - f->pos_tag;
 8:              }
 9:              memcpy(&buf[pos], &f->cbuf[f->pos_tag - f->tag], n);
10:              f->pos_tag += n;
11:              pos += n;
12:          } else {
13:              f->tag = f->end_tag;
14:              ssize_t n = read(f->fd, f->cbuf, BUFSIZ);
15:              if (n > 0) {
16:                  f->end_tag += n;
17:              } else {
18:                  return pos ? pos : n;
19:              }
20:          }
21:      }
22:      return pos;
23:  }
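The solution relies on a few relationships among tag, pos_tag, and end_tag holding between calls. One way to make them explicit is a checking helper such as the sketch below. `io61_tags_ok` is a hypothetical function, not part of the solution code, and it checks only the relationships implied by the field comments; the additional expectation that the descriptor's file position equals end_tag is not checked here because that would require a system call.

```cpp
#include <cassert>
#include <sys/types.h>

struct io61_file {
    int fd;
    static constexpr off_t bufsize = 4096; // block size for this cache
    unsigned char cbuf[bufsize];
    off_t tag;      // file offset of first character in cache
    off_t pos_tag;  // file offset of next char to read in cache
    off_t end_tag;  // file offset one past last valid char in cache
};

// Hypothetical helper: returns true if the tags are mutually consistent.
bool io61_tags_ok(const io61_file* f) {
    return f->tag <= f->pos_tag                           // next char is not before the cached data
        && f->pos_tag <= f->end_tag                       // ...and not past its end
        && f->end_tag - f->tag <= io61_file::bufsize;     // cached data fits in cbuf
}
```

Calling `assert(io61_tags_ok(f))` at the top and bottom of the read function is one way to see which of the proposed simplifications break an invariant.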
Donald has ideas for “simplifying” this code. Specifically, he wants to try each of the following independently:
- Replacing line 4 with “if (f->pos_tag <= f->end_tag) {”.
- Removing lines 6–8.
- Removing line 10.
- Removing lines 17–18.
QUESTION IO-10A. Which simplifications could lead to undefined behavior? List all that apply or say “none.”
QUESTION IO-10B. Which simplifications could cause io61_read to loop forever without causing undefined behavior? List all that apply or say “none.”
QUESTION IO-10C. Which simplifications could lead to io61_read returning incorrect data in buf, meaning that the data read by a series of io61_read calls won’t equal the data in the file? List all that apply or say “none.”
QUESTION IO-10D. Chastened, Donald decides to optimize the code for a specific situation, namely when io61_read is called with a sz that is larger than BUFSIZ. He wants to add code after line 11, like so, so that fewer read system calls will happen for large sz:
12:          } else if (sz - pos > BUFSIZ) {
                 // DONALD’S CODE HERE
12A:         } else {
13:              f->tag = f->end_tag;
                 ....
Finish Donald’s code. Your code should maintain the relevant invariants between tag, pos_tag, end_tag, and the file position, but you need not keep tag aligned.
IO-11. Caching
QUESTION IO-11A. If it takes 200ns to access main memory, which of the following two caches will produce a lower average access time?
- A cache with a 10ns access time that produces a 90% hit rate
- A cache with a 20ns access time that produces a 98% hit rate
QUESTION IO-11B. Let’s say that you have a direct-mapped cache with four slots. A block with address N must reside in the slot numbered N % 4. What is the best hit rate this could achieve given the following reference string?
3 6 7 5 3 2 1 1 1 8
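In a direct-mapped cache each block has exactly one candidate slot, so processing a reference string involves no eviction choice. A minimal simulator sketch follows; `direct_mapped_hits` is a hypothetical name, and the cache is assumed to start empty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count hits for `refs` on an initially-empty direct-mapped cache in which
// block N may only occupy slot N % nslots.
size_t direct_mapped_hits(const std::vector<unsigned>& refs, size_t nslots) {
    assert(nslots > 0);
    std::vector<long long> slots(nslots, -1);   // -1 marks an empty slot
    size_t hits = 0;
    for (unsigned block : refs) {
        long long& slot = slots[block % nslots];
        if (slot == static_cast<long long>(block)) {
            ++hits;                              // block already in its slot
        } else {
            slot = block;                        // load block, replacing the old occupant
        }
    }
    return hits;
}
```

Dividing the returned hit count by the length of the reference string gives the hit rate.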
QUESTION IO-11C. What is the best hit rate a fully-associative four-slot cache could achieve for that reference string? (A fully-associative cache may put any page in any slot. You may assume you know the full reference string in advance.)
QUESTION IO-11D. What hit rate would the fully-associative four-slot cache achieve if it used the LRU eviction policy?
IO-12. I/O traces
QUESTION IO-12A. Which of the following programs cannot be distinguished by the output of the strace utility, not considering open calls? List all that apply; if multiple indistinguishable groups exist (e.g., A, B, & C can’t be distinguished, and D & E can’t be distinguished, but the groups can be distinguished from each other), list them all.
- Sequential byte writes using stdio
- Sequential byte writes using system calls
- Sequential byte writes using system calls and O_SYNC
- Sequential block writes using stdio and block size 2
- Sequential block writes using system calls and block size 2
- Sequential block writes using system calls and O_SYNC and block size 2
- Sequential block writes using stdio and block size 4096
- Sequential block writes using system calls and block size 4096
- Sequential block writes using system calls and O_SYNC and block size 4096
QUESTION IO-12B. Which of the programs in Part A cannot be distinguished using blktrace output? List all that apply.
QUESTION IO-12C. The buffer cache is coherent. Which of the following operating system changes could make the buffer cache incoherent? List all that apply.
- Application programs can obtain direct read access to the buffer cache
- Application programs can obtain direct write access to the disk, bypassing the buffer cache
- Other computers can communicate with the disk independently
- The computer has a uninterruptible power supply (UPS), ensuring that the operating system can write the contents of the buffer cache to disk if main power is lost
QUESTION IO-12D. The stdio cache is incoherent. Which of the operating system changes from Part C could make the stdio cache coherent? List all that apply.
IO-13. Reference strings and eviction
QUESTION IO-13A. When demonstrating cache eviction in class, we modeled a completely reactive cache, meaning that the cache performed at most one load from slow storage per access. Name a class of reference string that will have a 0% hit rate on any cold reactive cache. For partial credit, give several examples of such reference strings.
QUESTION IO-13B. What cache optimization can be used to improve the hit rate for the class of reference string in Part A? One word is enough; put the best choice.
QUESTION IO-13C. Give a single reference string with the following properties:
- There exists a cache size and eviction policy that gives a 70% hit rate for the string.
- There exists a cache size and eviction policy that gives a 0% hit rate for the string.
QUESTION IO-13D. Put the following eviction algorithms in order of how much space they require for per-slot metadata, starting with the least space and ending with the most space. (Assume the slot order is fixed, so once a block is loaded into slot i, it stays in slot i until it is evicted.) For partial credit say what you think the metadata would be.
- FIFO
- LRU
- Random
IO-14. Cache code
Several famous musicians have just started working on CS61 Problem Set 3. They share the following code for their read-only, sequential, single-slot cache:
struct io61_file {
    int fd;
    unsigned char buf[4096];
    size_t pos; // position of next character to read in `buf`
    size_t sz; // number of valid characters in `buf`
};
int io61_readc(io61_file* f) {
    if (f->pos >= f->sz) {
        f->pos = f->sz = 0;
        ssize_t nr = read(f->fd, f->buf, sizeof(f->buf));
        if (nr <= 0) {
            f->sz = 0;
            return -1;
        } else {
            f->sz = nr;
        }
    }
    int ch = f->buf[f->pos];
    ++f->pos;
    return ch;
}
But they have different io61_read implementations. Donald (Lambert)’s is:
ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    return read(f->fd, buf, sz);
}
Solange (Knowles)’s is:
ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    for (size_t pos = 0; pos < sz; ++pos, ++buf) {
        *buf = io61_readc(f);
    }
    return sz;
}
Caroline (Shaw)’s is:
ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    if (f->pos >= f->sz) {
        return read(f->fd, buf, sz);
    } else {
        int ch = io61_readc(f);
        if (ch < 0) {
            return 0;
        }
        *buf = ch;
        return io61_read(f, buf + 1, sz - 1) + 1;
    }
}
You are testing each of these musicians’ codes by executing a sequence of io61_readc and/or io61_read calls on an input file and printing the resulting characters to standard output. There are no seeks, and your test programs print until end of file, so your tests’ output should equal the input file’s contents.
You should assume for these questions that no read system call ever returns -1.
QUESTION IO-14A. Describe an access pattern, that is, a sequence of io61_readc and/or io61_read calls (with lengths), for which Donald’s code can return incorrect data.
QUESTION IO-14B. Which of these musicians’ codes can generate an output file with incorrect length?
For the remaining parts, assume the problem or problems in Part B have been corrected, so that all musicians’ codes generate output files with correct lengths.
QUESTION IO-14C. Give an access pattern for which Solange’s code will return correct data and outperform Donald’s, or vice versa, and say whose code will win.
QUESTION IO-14D. Suggest a small change (≤10 characters) to Caroline’s code that would, most likely, make it perform at least as well as both Solange’s and Donald’s codes on all access patterns. Explain briefly.
IO-15. Caches
Parts A–C concern different implementations of Pset 3’s stdio cache. Assume a program that reads a 32768-byte file a character at a time, like this:
while (io61_readc(inf) != EOF) {
}
This program will call io61_readc 32769 times. (32769 = 2^15 + 1 = 8×2^12 + 1; the +1 accounts for the EOF return.) But the cache implementation might make many fewer system calls.
QUESTION IO-15A. How many read system calls are required assuming a single-slot io61 cache with 4096-byte blocks?
QUESTION IO-15B. How many read system calls are required assuming an eight-slot io61 cache with 4096-byte blocks?
QUESTION IO-15C. How many mmap system calls are required assuming an mmap-based io61 cache?
Parts D–F concern cache implementations and styles. We discussed many caches in class, including:
- The buffer cache
- The processor cache
- Single-slot aligned stdio caches
- Single-slot unaligned stdio caches
- Circular bounded buffers
QUESTION IO-15D. Which of those caches are implemented entirely in hardware? List all that apply.
QUESTION IO-15E. Which of those software caches could help speed up reverse sequential access to a disk file? List all that apply.
QUESTION IO-15F. Which of those software caches could help speed up access to a pipe or network socket? List all that apply.
IO-16. LRU
These questions concern the least recently used (LRU) and first-in first-out (FIFO) cache eviction policies.
QUESTION IO-16A. List all that apply.
- LRU is better than FIFO for a workload that consists of reading a file in sequential order.
- If two LRU caches process the same reference string starting from an empty state, then the cache with more slots always has a better hit rate.
- If two LRU caches process the same reference string starting from an empty state, then the cache with more slots never has a worse hit rate.
- LRU and FIFO should have the same hit rate on average for a workload that consists of reading a file in random order.
- None of the above.
For the next two questions, consider a cache with 5 slots that has just processed the reference string 12345. (Thus, its slots contain 1, 2, 3, 4, and 5.)
QUESTION IO-16B. Write a reference string that will observe a higher hit rate under LRU than under FIFO if executed on this cache.
QUESTION IO-16C. Write a reference string that will observe a higher hit rate under FIFO than under LRU if executed on this cache.
The remaining questions in this problem concern the operating system’s buffer cache. LRU requires detecting each use of a cache block (to track the time the block was most recently used). In the buffer cache, the “blocks” are physical memory pages, and blocks are “used” by reads, writes, and accesses to mmaped memory.
QUESTION IO-16D. Which of these changes would let a WeensyOS-like operating system reliably track when buffer-cache physical memory pages are used? List all that apply.
- Adding a system call track(uintptr_t physical_address) that a process should call when it accesses a physical page.
- Adding a member boottime_t lru_time to struct proc. (boottime_t is a type that measures the time since boot.)
- Adding a member boottime_t lru_time to struct pageinfo.
- Modifying kernel system call implementations to update the relevant lru_time members when buffer-cache pages are accessed.
- None of these changes will help.
QUESTION IO-16E. The mmap system call complicates LRU tracking for buffer-cache pages. Why? List all that apply.
- mmap maps buffer-cache pages directly into a process’s address space.
- Accessing memory in mmaped regions does not normally invoke the kernel.
- Accessing memory in mmaped regions does not use a page table.
- mmap starts with two of the letter m, causing LRU to become confused about which m was used least recently.
- None of the above.
IO-17. Reference strings and hit rates
QUESTION IO-17A. Write a purely-sequential reference string containing at least five accesses.
QUESTION IO-17B. What is the hit rate for this reference string? Tell us the eviction algorithm and number of slots you’ve chosen.
The next two questions concern this ten-element reference string:
We consider executing this reference string starting with different cache contents.
QUESTION IO-17C. A three-slot LRU cache processes this reference string and observes a 70% hit rate. What are the initial contents of the cache?
QUESTION IO-17D. A three-slot FIFO cache processes this reference string with initial contents 4 1 2 and observes a 60% hit rate. Which slot was next up for eviction when the reference string began?
The eviction algorithms we saw in class are entirely reactive: they only insert a block when that block is referenced. This limits how well the cache can perform. A read cache can also be proactive by inserting blocks before they’re needed, possibly speeding up later accesses. This is the essence of prefetching.
In a proactive caching model, the cache can evict and load two or more blocks per access in the reference string. A prefetching policy decides which additional, non-accessed blocks to load.
QUESTION IO-17E. Describe an access pattern for which the following prefetching policy would be effective.
When accessing block A, also load block A+1.
QUESTION IO-17F. Write a reference string and name an eviction policy for which this prefetching policy would be less effective (have a lower hit rate) than no prefetching at all.
IO-18. Coherence
QUESTION IO-18A. Which of the kinds of cache we discussed in class are typically coherent?
QUESTION IO-18B. Which of the kinds of cache we discussed in class are typically single-slot?
Stdio-like caches are not coherent. The remaining questions concern potential mechanisms to make them coherent with respect to disk files.
Pedantic note. Sometimes a read-from-cache operation will occur concurrently with (at the same time as) a write to stable storage. The read operation counts as coherent whether or not it reflects the concurrent write, because logically the read and write occurred “at the same time” (neither is older).
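To make the incoherence concrete, here is a minimal sketch of how a stale stdio read can arise. It assumes a POSIX system and a stdio implementation that fills its buffer with more than one byte on the first read, which is typical but not guaranteed; the function name stdio_read_is_stale and the scratch-file path are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

// Returns true if a stdio read returned stale data after the file was
// changed through a separate file descriptor. Hypothetical helper for
// illustration; the scratch path below is an assumption.
bool stdio_read_is_stale() {
    char path[] = "/tmp/coherence-demo-XXXXXX";
    int fd = mkstemp(path);              // create and open a scratch file
    assert(fd >= 0);
    ssize_t nw = write(fd, "AAAA", 4);   // file contents: AAAA
    assert(nw == 4);

    FILE* f = fopen(path, "r");
    assert(f);
    int first = fgetc(f);                // typically pulls the whole file into the stdio buffer
    assert(first == 'A');

    // Overwrite byte 1 in the file itself, bypassing the stdio cache.
    nw = pwrite(fd, "B", 1, 1);          // file contents: ABAA
    assert(nw == 1);

    int second = fgetc(f);               // typically served from the stdio buffer

    fclose(f);
    close(fd);
    unlink(path);
    return second == 'A';                // 'A' is stale; the file holds 'B' at offset 1
}
```

On typical implementations the second fgetc is satisfied from the buffer filled by the first, so the function returns true: the write is older than the read, yet the read does not reflect it.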
QUESTION IO-18C. First, the new bool changed() system call returns true if and only if a write was performed on some file in the last second. Describe briefly how changed could be used to make a stdio cache coherent, or explain why it could not.
QUESTION IO-18D. Second, the new int open_with_timestamp(const char* filename, unsigned long* timestamp, ...) system call is like open, except that every time a change is made to the underlying filename, the value in *timestamp is updated to the time, measured in milliseconds since last boot, of the last write operation on the file represented by file descriptor fd. Describe briefly how open_with_timestamp could be used to make a stdio cache coherent, or explain why it could not.
QUESTION IO-18E. Describe briefly how mmap could be used to make a stdio cache coherent, or explain why it could not.
IO-19. System calls
QUESTION IO-19A. A program makes these system calls:
int fd = open("f.txt", O_WRONLY | O_CREAT | O_TRUNC);
ssize_t nw = write(fd, "CS121 is awesome!", 17); // returned 17
What series of subsequent system calls would ensure that, after all system calls complete, the file f.txt contains the text “CS 61 is terrible” (without the quotation marks)? Minimize the number of bytes written.
QUESTION IO-19B. Which of the following file access patterns might have similar output from the strace utility? List all that apply or say “none.”
- Sequential byte writes using stdio
- Sequential byte writes using mmap
- Sequential byte writes using system calls
QUESTION IO-19C. Which of the following file access patterns might have similar output from the strace utility? List all that apply or say “none.”
- Sequential byte writes using stdio
- Sequential block writes using stdio
- Sequential byte writes using system calls
- Sequential block writes using system calls
QUESTION IO-19D. Which of the following file access patterns might have similar output from the strace utility? List all that apply or say “none.”
- Reverse-sequential byte writes using stdio
- Reverse-sequential block writes using stdio
- Reverse-sequential byte writes using system calls
- Reverse-sequential block writes using system calls
IO-20. Caches Big, Fast, and Cheap
QUESTION IO-20A. We discussed several kinds of computer storage in class, including:
- Disks and SSDs
- primary Memory
- processor Cache memory
- Registers
Put those storage technologies in order by latency, slowest first. (An answer might be “DMCR”.)
QUESTION IO-20B. Put those storage technologies in order by cost per byte, cheapest first.
QUESTION IO-20C. Put those storage technologies in order by capacity in bytes on a typical computer, smallest first.
QUESTION IO-20D. Which storage technology acts as the typical slow memory (the “underlying storage”) for each of the following caches? If unsure, explain briefly.
- Buffer cache
- Stdio cache
- Processor cache
QUESTION IO-20E. True or false? Given a cache with two or more blocks, implementing it as a fully associative cache would always produce as good or better hit rates than a direct-mapped implementation.
QUESTION IO-20F. True or false? Prefetching bytes from a file on disk into the buffer cache can cause the buffer cache to become incoherent.
IO-21. Refrigeration
Your minifridge can hold up to 4 items. Your picky roommate insists that Apple juice goes in the door, Berries go in the fruits and vegetables drawer, Cheese goes in the dairy drawer, and Dill pickles go on the top shelf.
QUESTION IO-21A. Think of the minifridge as a cache for food bought at your corner grocery store. What kind of cache has your roommate forced you to implement?
You decide that you’ve had enough of your roommate and move. The new place’s full-size fridge can hold all your favorite foods at once (not just 4 items), but the new counter can only hold 4 items at a time. When 4 items are already on the counter, you must bring an item back to the fridge before getting a new item.
QUESTION IO-21B. To cook lunch, you must use the following food items in the following order:
Apple juice, Berries, Cheese, Dill pickles, Eggs, Fish sticks, A, B, E, D, A, C, D, E
(So the full reference recipe is ABCDEFABEDACDE.) Assuming a FIFO (round-robin) eviction policy, what items are in the fridge after the last cooking step?
QUESTION IO-21C. What is the hit rate for FIFO on this reference recipe?
QUESTION IO-21D. What is the hit rate for LRU on this reference recipe?
NOT A QUESTION. (0 points) Could this meal possibly be delicious?
QUESTION IO-21E. True or false for this cache? List all that apply.
- All recipes that use each ingredient exactly once will have the same hit rate under FIFO and LRU.
- All recipes that use each ingredient exactly twice will have the same hit rate under FIFO and LRU.
- All recipes that use fewer than 5 ingredients will have the same hit rate under FIFO and LRU.
- None of the above.