IO and Caching
IO-2. Caches and reference strings
QUESTION IO-2A. True or false: A direct-mapped cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.
False. Direct-mapped caches can have conflict misses.
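Here is a minimal C sketch of a conflict miss. It assumes a toy direct-mapped cache in which address x may live only in slot x % nslots. That mapping is hypothetical and chosen for illustration; real caches index by address bits. The reference string 0, 4, 0, 4 has only two distinct addresses, yet with four slots every access misses, because both addresses compete for slot 0.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 64

/* Count hits for a reference string in a toy direct-mapped cache.
   Hypothetical mapping: address x may live only in slot x % nslots. */
int direct_mapped_hits(const int *refs, size_t n, int nslots) {
    int tag[MAX_SLOTS] = {0};
    int valid[MAX_SLOTS] = {0};
    int hits = 0;
    assert(nslots > 0 && nslots <= MAX_SLOTS);
    for (size_t i = 0; i < n; ++i) {
        int slot = refs[i] % nslots;   /* assumes non-negative addresses */
        if (valid[slot] && tag[slot] == refs[i]) {
            ++hits;
        } else {                       /* cold or conflict miss: replace */
            valid[slot] = 1;
            tag[slot] = refs[i];
        }
    }
    return hits;
}
```

A fully-associative cache with the same four slots would miss only on the first 0 and the first 4.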
QUESTION IO-2B. True or false: A fully-associative cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.
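True. A fully-associative cache can place any address in any slot. Once the ≤N distinct addresses have been loaded, nothing ever needs to be evicted.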
Consider the following 5 reference strings.
Name | String |
---|---|
α | 1 |
β | 1, 2 |
γ | 1, 2, 3, 4, 5 |
δ | 2, 4 |
ε | 5, 2, 4, 2 |
QUESTION IO-2C. Which of the strings might indicate a sequential access pattern? Circle all that apply.
α | β | γ | δ | ε | None of these |
(α), β, γ
QUESTION IO-2D. Which of the strings might indicate a strided access pattern with stride >1? Circle all that apply.
α | β | γ | δ | ε | None of these |
(α), δ
One very clever person pointed out that β and γ could also represent large strides. For example, consider a 10-byte file accessed with stride 11, where offsets wrap around modulo the file size: the accesses visit 1, 2, 3, … in order!
The remaining questions concern concatenated permutations of these five strings. For example, the permutation αβγδε refers to this reference string:
1, 1, 2, 1, 2, 3, 4, 5, 2, 4, 5, 2, 4, 2.
We pass such permutations through an initially-empty, fully-associative cache with 3 slots, and observe the numbers of hits.
QUESTION IO-2E. How many cold misses might a permutation observe? Circle all that apply.
0 | 1 | 2 | 3 | 4 | 5 | Some other number |
5. The first time a reference string address is encountered, it must cause a cold miss.
Under LRU eviction, the permutation αβεγδ observes 5 hits as follows. (We annotate each access with “h” for hit or “m” for miss.)
1m; 1h, 2m; 5m, 2h, 4m, 2h; 1m, 2h, 3m, 4m, 5m; 2m, 4h.
QUESTION IO-2F. How many hits does this permutation observe under FIFO eviction?
4 hits.
QUESTION IO-2G. Give a permutation that will observe 8 hits under LRU eviction, which is the maximum for any permutation. There are several possible answers. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 7 hits, etc.)
The following four permutations observe 8 hits under LRU: αβγδε, αβγεδ, βαγδε, βαγεδ. 28 permutations observe 7 hits; 25 observe 6 hits; and 38 observe 5 hits.
QUESTION IO-2H. Give a permutation that will observe 2 hits under LRU eviction, which is the minimum for any permutation. There is one unique answer. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 3 hits, etc.)
δαεγβ. 4 permutations observe 3 hits and 20 observe 4 hits.
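These hit counts can be checked mechanically. The sketch below concatenates the five strings in a given order and runs them through an initially empty, fully-associative 3-slot cache; the function and array names are hypothetical. Both policies use a per-slot timestamp. FIFO sets the stamp only when an address is loaded, while LRU also refreshes it on every hit. In either case, a miss with a full cache evicts the slot with the smallest stamp.

```c
#include <assert.h>
#include <stddef.h>

enum policy { FIFO, LRU };

/* The five strings from the table: alpha, beta, gamma, delta, epsilon. */
static const int STR[5][5] = {
    {1}, {1, 2}, {1, 2, 3, 4, 5}, {2, 4}, {5, 2, 4, 2}
};
static const int LEN[5] = {1, 2, 5, 2, 4};

/* Feed STR[perm[0]] STR[perm[1]] ... STR[perm[4]] through an initially
   empty, fully-associative 3-slot cache and return the number of hits. */
int permutation_hits(const int perm[5], enum policy pol) {
    int slot[3];        /* cached addresses */
    long stamp[3];      /* FIFO: load time; LRU: time of last use */
    int used = 0, hits = 0;
    long now = 0;
    for (int p = 0; p < 5; ++p) {
        assert(perm[p] >= 0 && perm[p] < 5);
        for (int k = 0; k < LEN[perm[p]]; ++k, ++now) {
            int addr = STR[perm[p]][k];
            int j = 0;
            while (j < used && slot[j] != addr)
                ++j;
            if (j < used) {               /* hit */
                ++hits;
                if (pol == LRU)
                    stamp[j] = now;       /* refresh recency */
                continue;
            }
            if (used < 3) {               /* cold miss into a free slot */
                j = used++;
            } else {                      /* evict the smallest stamp */
                j = 0;
                for (int s = 1; s < 3; ++s)
                    if (stamp[s] < stamp[j])
                        j = s;
            }
            slot[j] = addr;
            stamp[j] = now;
        }
    }
    return hits;
}
```

Permutations are given as indexes into the table, with α = 0 through ε = 4, so αβεγδ is {0, 1, 4, 2, 3}. Running the function over all 120 orderings is one way to double-check the distribution of hit counts quoted above.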
IO-4. IO caching and strace
Elif Batuman is investigating several program executables left behind by her ex-roommate Fyodor. She runs each executable under strace in the following way:
strace -o strace.txt ./EXECUTABLE files/text1meg.txt > files/out.txt
Help her figure out properties of these programs based on their system call traces.
QUESTION IO-4A. Program ./mysterya:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x8193000
brk(0x81b5000) = 0x81b5000
read(3, "A", 1) = 1
write(1, "A", 1) = 1
read(3, "\n", 1) = 1
write(1, "\n", 1) = 1
read(3, "A", 1) = 1
write(1, "A", 1) = 1
read(3, "'", 1) = 1
write(1, "'", 1) = 1
read(3, "s", 1) = 1
write(1, "s", 1) = 1
...
Circle at least one option in each column.
1, a, i, D
QUESTION IO-4B. Program ./mysteryb:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x96c5000
brk(0x96e6000) = 0x96e6000
read(3, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 2048) = 2048
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 2048) = 2048
read(3, "kad\nAkron\nAkron's\nAl\nAl's\nAla\nAl"..., 2048) = 2048
write(1, "kad\nAkron\nAkron's\nAl\nAl's\nAla\nAl"..., 2048) = 2048
...
Circle at least one option in each column.
1, b/c, ii, B
QUESTION IO-4C. Program ./mysteryc:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x9064000
brk(0x9085000) = 0x9085000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1046528, SEEK_SET) = 1046528
read(3, "ingau\nRheingau's\nRhenish\nRhianno"..., 2048) = 2048
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1044480, SEEK_SET) = 1044480
read(3, "Quinton\nQuinton's\nQuirinal\nQuisl"..., 2048) = 2048
write(1, "ehR\neehR\naehR\ns'hR\nhR\nsdlonyeR\ns"..., 2048) = 2048
lseek(3, 1042432, SEEK_SET) = 1042432
read(3, "emyslid's\nPrensa\nPrensa's\nPrenti"..., 2048) = 2048
write(1, "\ns'nailitniuQ\nnailitniuQ\nnniuQ\ns"..., 2048) = 2048
lseek(3, 1040384, SEEK_SET) = 1040384
read(3, "Pindar's\nPinkerton\nPinocchio\nPin"..., 2048) = 2048
write(1, "rP\ndilsymerP\ns'regnimerP\nregnime"..., 2048) = 2048
...
Circle at least one option in each column.
2, c, ii, B
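The ./mysteryc trace is consistent with a program that reverses the file one 2048-byte block at a time, working backward from the end. It would call fstat to learn the size, then repeat lseek, read, reverse in memory, and write. Below is a minimal POSIX sketch under that assumption; reverse_copy and its block parameter are illustrative names, not recovered from the binary.

```c
#define _POSIX_C_SOURCE 200809L   /* for fstat, lseek, fileno */
#include <assert.h>
#include <stdio.h>                /* fileno, tmpfile: handy for trying it out */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_BLOCK 4096

/* Copy infd to outfd with the bytes in reverse order, reading blocks of
   `block` bytes from the end of the file toward its beginning.
   Returns 0 on success, -1 on error. Short reads or writes are treated
   as errors to keep the sketch small. */
int reverse_copy(int infd, int outfd, size_t block) {
    char buf[MAX_BLOCK];
    struct stat st;
    assert(block > 0 && block <= MAX_BLOCK);
    if (fstat(infd, &st) != 0)
        return -1;
    off_t off = st.st_size;
    while (off > 0) {
        /* Only the block at the very start of the file can be short. */
        size_t chunk = (size_t) off < block ? (size_t) off : block;
        off -= (off_t) chunk;
        if (lseek(infd, off, SEEK_SET) != off)
            return -1;
        if (read(infd, buf, chunk) != (ssize_t) chunk)
            return -1;
        for (size_t i = 0; i < chunk / 2; ++i) {   /* reverse in place */
            char t = buf[i];
            buf[i] = buf[chunk - 1 - i];
            buf[chunk - 1 - i] = t;
        }
        if (write(outfd, buf, chunk) != (ssize_t) chunk)
            return -1;
    }
    return 0;
}
```

For example, if the input file holds "abcdefgh" and block is 3, the sketch reads "fgh", "cde", then "ab", and writes "hgfedcba". With a 1048576-byte file and a 2048-byte block, the first lseek goes to 1046528, as in the trace.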
QUESTION IO-4D. Program ./mysteryd:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x9a0e000
brk(0x9a2f000) = 0x9a2f000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1048575, SEEK_SET) = 1048575
read(3, "o", 2048) = 1
lseek(3, 1048574, SEEK_SET) = 1048574
read(3, "Ro", 2048) = 2
lseek(3, 1048573, SEEK_SET) = 1048573
read(3, "\nRo", 2048) = 3
...
lseek(3, 1046528, SEEK_SET) = 1046528
read(3, "ingau\nRheingau's\nRhenish\nRhianno"..., 2048) = 2048
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1046527, SEEK_SET) = 1046527
read(3, "eingau\nRheingau's\nRhenish\nRhiann"..., 2048) = 2048
lseek(3, 1046526, SEEK_SET) = 1046526
read(3, "heingau\nRheingau's\nRhenish\nRhian"..., 2048) = 2048
...
Circle at least one option in each column.
2, b, ii, B
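One reading of the ./mysteryd trace is that the program keeps a single 2048-byte read cache but always positions it to start at the byte it just asked for. Since it then asks for the byte before that, every access falls outside the cached range and forces another read. The toy model below counts refills for a backward scan; its names are hypothetical, and it ignores the short reads near end of file. The slot either starts at the requested offset or is aligned to a multiple of the block size, as ./mysteryc's offsets are.

```c
#include <assert.h>

/* Count slot refills when reading bytes n-1, n-2, ..., 0 through a
   one-slot cache holding `block` bytes. If aligned is 0, a miss at
   offset off loads [off, off + block); otherwise it loads the
   block-aligned range containing off. */
long backward_refills(long n, long block, int aligned) {
    long start = -1, end = -1;   /* cached range [start, end), initially empty */
    long refills = 0;
    assert(n >= 0 && block > 0);
    for (long off = n - 1; off >= 0; --off) {
        if (off >= start && off < end)
            continue;            /* hit */
        start = aligned ? off - off % block : off;
        end = start + block;
        ++refills;
    }
    return refills;
}
```

For the 1048576-byte input and 2048-byte blocks, the unaligned slot refills on every one of the 1048576 bytes. The aligned slot refills only 512 times.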
QUESTION IO-4E. Program ./mysterye:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x93e5000
brk(0x9407000) = 0x9407000
read(3, "A", 1) = 1
read(3, "\n", 1) = 1
read(3, "A", 1) = 1
...
read(3, "A", 1) = 1
read(3, "l", 1) = 1
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 1024) = 1024
read(3, "t", 1) = 1
read(3, "o", 1) = 1
read(3, "n", 1) = 1
...
Circle at least one option in each column.
1, a, ii, C
Some people will circle C and D, because there’s no read cache, so the read cache is “other.” That’s OK.
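The ./mysterye trace fits a program that reads one byte per system call but collects its output in a 1024-byte buffer, issuing one write per full buffer. Below is a minimal sketch of such a write cache, assuming POSIX write(). Here struct wbuf and its functions are hypothetical, and the nwrites counter exists only to make the saved system calls visible.

```c
#define _POSIX_C_SOURCE 200809L   /* for write, pipe */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define WBUF_SIZE 1024

struct wbuf {
    int fd;
    char buf[WBUF_SIZE];
    size_t len;             /* bytes currently buffered */
    unsigned long nwrites;  /* write() system calls issued so far */
};

/* Write out everything buffered. Returns 0 on success, -1 on error. */
int wbuf_flush(struct wbuf *w) {
    size_t done = 0;
    while (done < w->len) {
        ssize_t r = write(w->fd, w->buf + done, w->len - done);
        ++w->nwrites;
        if (r <= 0)
            return -1;
        done += (size_t) r;
    }
    w->len = 0;
    return 0;
}

/* Buffer one byte, flushing first if the buffer is already full. */
int wbuf_putc(struct wbuf *w, char c) {
    assert(w->len <= WBUF_SIZE);
    if (w->len == WBUF_SIZE && wbuf_flush(w) != 0)
        return -1;
    w->buf[w->len++] = c;
    return 0;
}
```

Writing 2500 bytes through this buffer costs two write calls for the full buffers plus one more at the final wbuf_flush, instead of 2500. A caller must remember to flush before exiting, or the last partial buffer is lost. stdio's output buffering follows the same idea.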
QUESTION IO-4F. Program ./mysteryf:
open("files/text1meg.txt", O_RDONLY) = 3
brk(0) = 0x9281000
brk(0x92a3000) = 0x92a3000
read(3, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 4096) = 4096
write(1, "A", 1) = 1
write(1, "\n", 1) = 1
write(1, "A", 1) = 1
...
write(1, "A", 1) = 1
write(1, "l", 1) = 1
read(3, "ton's\nAludra\nAludra's\nAlva\nAlvar"..., 4096) = 4096
write(1, "t", 1) = 1
write(1, "o", 1) = 1
write(1, "n", 1) = 1
...
Circle at least one option in each column.
1, b/c, i, A
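./mysteryf is the mirror image of ./mysterye: it reads 4096 bytes per system call and hands them out one at a time, while writing byte by byte. Below is a minimal sketch of such a read cache, assuming POSIX read(). The names struct rbuf and rbuf_getc are hypothetical, and nreads counts system calls for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for read, pipe */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define RBUF_SIZE 4096

struct rbuf {
    int fd;
    unsigned char buf[RBUF_SIZE];
    size_t pos, len;        /* next byte to return; bytes valid in buf */
    unsigned long nreads;   /* read() system calls issued so far */
};

/* Return the next byte as 0..255, or -1 at end of file or on error. */
int rbuf_getc(struct rbuf *r) {
    assert(r->pos <= r->len && r->len <= RBUF_SIZE);
    if (r->pos == r->len) {           /* cache exhausted: refill it */
        ssize_t n = read(r->fd, r->buf, RBUF_SIZE);
        ++r->nreads;
        if (n <= 0)
            return -1;
        r->pos = 0;
        r->len = (size_t) n;
    }
    return r->buf[r->pos++];
}
```

Consuming a 1048576-byte regular file this way needs only 256 full reads, plus one final read that returns 0 at end of file. By contrast, ./mysterya issues one read per byte.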
IO-6. Caching
Assume that we have a cache that holds four pages. Assume that each letter below indicates an access to a page. Answer the following questions as they pertain to the following sequence of accesses.
E D C B A E D A A A B C D E
QUESTION IO-6A. What is the hit rate assuming an LRU replacement policy?
A. Here is the cache at each stage, listed from least to most recently used; each 1 after a line marks a hit. (Answers need not list the cache contents in any particular order.)
E D C B
D C B A
C B A E
B A E D
B E D A 1 1 1 1 (hits on A, A, A, B; the hit on B changes the order to the next line)
E D A B
D A B C 1 (hit on D, reorders to next line)
A B C D
B C D E
So, the answer is 5/14
QUESTION IO-6B. What pages will you have in the cache at the end of the run?
B. What's left in the cache is: B C D E
QUESTION IO-6C. What is the best possible hit rate attainable if you could see into the future?
C. With Bélády's optimal algorithm, we get:
E D C B
A E D B 1 1 1 1 1 1
C E D B 1 1
So, our hit rate is 8/14 (or 4/7).
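Bélády's policy is easy to simulate once the whole reference string is known in advance. The minimal C sketch below treats each character of the string as a page; its names are hypothetical. On a miss with a full cache, it evicts the cached page whose next use lies farthest in the future. A page that is never used again counts as farthest of all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 16

/* Position of the next use of page c at or after `from`, or n if none. */
static size_t next_use(const char *ref, size_t n, size_t from, char c) {
    for (size_t k = from; k < n; ++k)
        if (ref[k] == c)
            return k;
    return n;
}

/* Hits for Belady's optimal policy on an initially empty cache. */
int belady_hits(const char *ref, int nslots) {
    char slot[MAX_SLOTS];
    int used = 0, hits = 0;
    size_t n = strlen(ref);
    assert(nslots > 0 && nslots <= MAX_SLOTS);
    for (size_t i = 0; i < n; ++i) {
        int j = 0;
        while (j < used && slot[j] != ref[i])
            ++j;
        if (j < used) {                   /* hit */
            ++hits;
            continue;
        }
        if (used < nslots) {              /* cold miss into a free slot */
            slot[used++] = ref[i];
            continue;
        }
        /* Full: evict the page needed farthest in the future. */
        int victim = 0;
        size_t farthest = next_use(ref, n, i + 1, slot[0]);
        for (int s = 1; s < used; ++s) {
            size_t nu = next_use(ref, n, i + 1, slot[s]);
            if (nu > farthest) {
                farthest = nu;
                victim = s;
            }
        }
        slot[victim] = ref[i];
    }
    return hits;
}
```

belady_hits("EDCBAEDAAABCDE", 4) reproduces the 8 hits above. The search for the next use makes this quadratic in the string length, which is fine for exam-sized strings.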
IO-8. Caching: Reference strings
The following questions concern the FIFO (First In First Out), LRU (Least Recently Used), and LFU (Least Frequently Used) cache eviction policies.
Your answers should refer to seven-item reference strings made up of digits in the range 0–9. An example answer might be “1231231”. In each case, the reference string is processed by a 3-slot cache that’s initially empty.
QUESTION IO-8A. Give a reference string that has a 1/7 hit rate in all three policies.
1123456
QUESTION IO-8B. Give a reference string that has a 6/7 hit rate in all three policies.
1111111
QUESTION IO-8C. Give a reference string that has different hit rates under LRU and LFU policies, and compute the hit rates.
String: 1123411
LRU hit rate: 2/7
LFU hit rate: 3/7
QUESTION IO-8D. Give a reference string that has different hit rates under FIFO and LRU policies, and compute the hit rates.
String: 1231411
FIFO hit rate: 2/7
LRU hit rate: 3/7
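The answers to IO-8A through IO-8D can be checked with one simulator that differs only in how it picks a victim. The sketch below records, for each slot, when its item was loaded, when it was last used, and how many times it has been used since loading; its names are hypothetical. LFU needs a tie-breaking rule that the question leaves open. This sketch assumes ties go to the item loaded earliest, and that use counts are forgotten on eviction.

```c
#include <assert.h>

enum policy { FIFO, LRU, LFU };

/* Hits for a string of single-character items passed through an
   initially empty cache with nslots slots (at most 8). On a miss with
   a full cache, the victim is:
     FIFO: earliest load time;  LRU: earliest last use;
     LFU:  fewest uses since load, ties broken by earliest load. */
int policy_hits(const char *ref, int nslots, enum policy pol) {
    char item[8];
    long loaded[8], last[8], uses[8];
    int used = 0, hits = 0;
    assert(nslots > 0 && nslots <= 8);
    for (long t = 0; ref[t] != '\0'; ++t) {
        int j = 0;
        while (j < used && item[j] != ref[t])
            ++j;
        if (j < used) {                   /* hit */
            ++hits;
            last[j] = t;
            ++uses[j];
            continue;
        }
        if (used < nslots) {
            j = used++;
        } else {
            j = 0;
            for (int s = 1; s < used; ++s) {
                int better;
                if (pol == FIFO)
                    better = loaded[s] < loaded[j];
                else if (pol == LRU)
                    better = last[s] < last[j];
                else
                    better = uses[s] < uses[j]
                             || (uses[s] == uses[j] && loaded[s] < loaded[j]);
                if (better)
                    j = s;
            }
        }
        item[j] = ref[t];
        loaded[j] = last[j] = t;
        uses[j] = 1;
    }
    return hits;
}
```

On 1123411, LFU keeps the twice-used 1 when 4 arrives and then hits on both trailing 1s. LRU instead evicts 1, because 2 and 3 were used more recently.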
QUESTION IO-8E. Now let's assume that you know a reference string in advance. Given a 3-slot cache and the following reference string, what caching algorithm discussed in class and/or exercises would produce the best hit rate, and what would that hit rate be?
“12341425321521”
Bélády’s optimal algorithm (ACCENTS REQUIRED FOR FULL CREDIT!)(!*#^‡°
1m 2m 3m 4m [124] 1h 4h 2h 5m [125] 3m [123] 2h 1h 5m [125] 2h 1h
7/14 = 1/2
IO-9. Caching: Access times and hit rates
Recall that x86-64 instructions can access memory in units of 1, 2, 4, or 8 bytes at a time. Assume we are running on an x86-64-like machine with 1024-byte cache lines. Our machine takes 32ns to access a unit if the cache hits, regardless of unit size. If the cache misses, an additional 8160ns are required to load the cache, for a total of 8192ns.
QUESTION IO-9A. What is the average time per access when accessing all the data in a cache line as an array of 256 integers, starting from an empty cache?
(8192ns * 1 + 32ns * 255)/256 = 63.875ns. The first of the 256 four-byte accesses misses and the other 255 hit.
QUESTION IO-9B. What unit size (1, 2, 4, or 8) minimizes the access time to access all data in a cache line, starting from an empty cache?
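8. Larger units mean fewer accesses: 128 eight-byte accesses cost 8192ns + 127 * 32ns = 12256ns in total, versus 40928ns for 1024 one-byte accesses.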
QUESTION IO-9C. What unit size (1, 2, 4, or 8) maximizes the hit rate to access all data in a cache line, starting from an empty cache?
1. With 1-byte units, 1023 of the 1024 accesses hit (every access but the first).
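The same arithmetic works for any unit size: a 1024-byte line read in u-byte units takes 1024/u accesses, exactly one of which misses. Below is a small sketch of that model; the names are hypothetical.

```c
#include <assert.h>

#define LINE_BYTES 1024
#define HIT_NS 32
#define MISS_NS 8192   /* 32ns access + 8160ns to load the line */

/* Number of accesses to touch every byte of one line in `unit`-byte units. */
long accesses(int unit) {
    assert(unit == 1 || unit == 2 || unit == 4 || unit == 8);
    return LINE_BYTES / unit;
}

/* Starting from an empty cache: one miss, then all hits. */
long total_ns(int unit) { return MISS_NS + (accesses(unit) - 1) * HIT_NS; }

double average_ns(int unit) { return (double) total_ns(unit) / accesses(unit); }

double hit_rate(int unit) {
    return (double) (accesses(unit) - 1) / accesses(unit);
}
```

Larger units shrink the total time because there are fewer accesses. Smaller units raise the hit rate because the single miss is spread over more accesses. That is why IO-9B and IO-9C have opposite answers.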
Data Representation
DATAREP-3. Hello binary
This problem locates 8-bit numbers horizontally and vertically in the following 16x16 image. Black pixels represent 1 bits and white pixels represent 0 bits. For horizontal arrangements, the most significant bit is on the left as usual. For vertical arrangements, the most significant bit is on top.
Examples: The 8-bit number 15 (hexadecimal 0x0F, binary 0b00001111) is located horizontally at 3,4, which means X=3, Y=4.
- The pixel at 3,4 is white, which has bit value 0.
- 4,4 is white, also 0.
- 5,4 is white, also 0.
- 6,4 is white, also 0.
- 7,4 is black, which has bit value 1.
- 8,4, 9,4, and 10,4 are black, giving three more 1s.
- Reading them all off, this is 0b00001111, or 15.
15 is also located horizontally at 7,6.
The 8-bit number 0 is located vertically at 0,0. It is also located horizontally at 0,0 and 1,0.
The 8-bit number 134 (hexadecimal 0x86, binary 0b10000110) is located vertically at 8,4.
QUESTION DATAREP-3A. Where is 3 located vertically? (All questions refer to 8-bit numbers.)
9,6
QUESTION DATAREP-3B. Where is 12 located horizontally?
5,5
QUESTION DATAREP-3C. Where is 255 located vertically?
14,3
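The reading rule in the examples amounts to shifting in one pixel at a time, most significant first. Below is a sketch in C, assuming a hypothetical 16×16 array of 0/1 pixels indexed img[y][x].

```c
#include <assert.h>

/* Read 8 pixels starting at (x, y) going right; the leftmost is the MSB.
   img[y][x] is 1 for a black pixel and 0 for a white one. */
int read_horizontal(unsigned char img[16][16], int x, int y) {
    assert(x >= 0 && x + 8 <= 16 && y >= 0 && y < 16);
    int v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 1) | img[y][x + i];
    return v;
}

/* Read 8 pixels starting at (x, y) going down; the topmost is the MSB. */
int read_vertical(unsigned char img[16][16], int x, int y) {
    assert(x >= 0 && x < 16 && y >= 0 && y + 8 <= 16);
    int v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 1) | img[y + i][x];
    return v;
}
```

Consider an otherwise white image in which row 4 has pixels 7 through 10 black, and column 8 additionally has pixels 9 and 10 black. On that image, read_horizontal(img, 3, 4) gives 15 and read_vertical(img, 8, 4) gives 134, matching the examples.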
DATAREP-4. Hello memory
Shintaro Tsuji wants to represent the image of Question DATAREP-3 in computer memory. He stores it in an array of 16-bit unsigned integers:
uint16_t cute[16];
Row Y of the image is stored in integer cute[Y].
QUESTION DATAREP-4A. What is sizeof(cute): 2, 16, 32, or 64?
32 (16 elements of 2 bytes each)
QUESTION DATAREP-4B. printf("%d\n", cute[0]); prints 16384. Is Shintaro’s machine big-endian or little-endian?
Little-endian
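Here is one way to see why, under an assumption the problem leaves implicit: each row occupies two bytes in memory, left half first, and the leftmost pixel of each byte is its most significant bit. The examples say pixels 1 through 8 of row 0 are white, so the bytes of cute[0] can only be 0x00 then 0x40 (pixel 9 black). Reading those bytes as 16384 (0x4000) makes the second byte the high-order one, which is little-endian. A big-endian machine would print 64. Below is a sketch with hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value of a uint16_t whose bytes, in memory order, are b0 then b1. */
uint16_t as_little_endian(uint8_t b0, uint8_t b1) {
    return (uint16_t) (b0 | b1 << 8);
}

uint16_t as_big_endian(uint8_t b0, uint8_t b1) {
    return (uint16_t) (b0 << 8 | b1);
}

/* Check this machine's byte order by looking at the first byte of 1. */
int host_is_little_endian(void) {
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```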
DATAREP-5. Hello program
Now that Shintaro has represented the image in memory as an array of uint16_t objects, he can manipulate the image using C. For example, here’s a function.
void swap(void) {
for (int i = 0; i < 16; ++i)
cute[i] = (cute[i] << 8) | (cute[i] >> 8);
}
Running swap produces the following image:
[Figure: the image after running swap]
Shintaro has written several other functions. Here are some images (A is the original):
[Figure: images A, B, C, D, E, F, G, H, I, J]
For each function, what image does that function create?
QUESTION DATAREP-5A.
void f0() {
for (int i = 0; i < 16; ++i)
cute[i] = ~cute[i];
}
H. The code flips all bits in the input.
QUESTION DATAREP-5B.
void f1() {
for (int i = 0; i < 16; ++i) {
cute[i] = ((cute[i] >> 1) & 0x5555) | ((cute[i] << 1) & 0xAAAA);
cute[i] = ((cute[i] >> 2) & 0x3333) | ((cute[i] << 2) & 0xCCCC);
cute[i] = ((cute[i] >> 4) & 0x0F0F) | ((cute[i] << 4) & 0xF0F0);
cute[i] = (cute[i] >> 8) | (cute[i] << 8);
}
}
D
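f1 reverses all 16 bits of each row by swapping ever-larger halves: adjacent bits, then pairs, then nibbles, then bytes. Reversal sends bit b to bit 15 - b. Under either byte order, that moves pixel x to pixel 15 - x, so each row is mirrored left to right. Below are the same steps as a standalone function on one value; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the 16 bits of x using f1's steps. Operands are promoted to
   int for the shifts; the casts truncate back to 16 bits. */
uint16_t reverse_bits16(uint16_t x) {
    x = (uint16_t) (((x >> 1) & 0x5555) | ((x << 1) & 0xAAAA));  /* bits */
    x = (uint16_t) (((x >> 2) & 0x3333) | ((x << 2) & 0xCCCC));  /* pairs */
    x = (uint16_t) (((x >> 4) & 0x0F0F) | ((x << 4) & 0xF0F0));  /* nibbles */
    x = (uint16_t) ((x >> 8) | (x << 8));                        /* bytes */
    return x;
}
```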
QUESTION DATAREP-5C.
void f2() {
char *x = (char *) cute;
for (int i = 0; i < 16; ++i)
x[2*i] = i;
}
J
For “fun”
The following programs generated the other images. Can you match them with their images?
f3: I; f4: B; f5: C; f6: F; f7: G; f8: A; f9: E
void f3() {
for (int i = 0; i < 16; ++i)
cute[i] &= ~(7 << i);
}
void f4() {
swap();
for (int i = 0; i < 16; ++i)
cute[i] <<= i/4;
swap();
}
void f5() {
for (int i = 0; i < 16; ++i)
cute[i] = -1 * !!(cute[i] & 64);
}
void f6() {
for (int i = 0; i < 8; ++i) {
int tmp = cute[15-i];
cute[15-i] = cute[i];
cute[i] = tmp;
}
}
void f7() {
for (int i = 0; i < 16; ++i)
cute[i] = cute[i] & -cute[i];
}
void f8() {
for (int i = 0; i < 16; ++i)
cute[i] ^= cute[i] ^ cute[i];
}
void f9() {
for (int i = 0; i < 16; ++i)
cute[i] = cute[i] ^ 4080;
}
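Three of these functions are easier to match once their effect on a single row value is written out. Below are per-row equivalents with hypothetical helper names. As in the originals, uint16_t operands are promoted to int before the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* f5: the whole row becomes 0xFFFF if bit 6 is set, otherwise 0. */
uint16_t row_f5(uint16_t row) { return (uint16_t) (-1 * !!(row & 64)); }

/* f7: keep only the lowest set bit. In two's complement, -row flips
   every bit above the lowest 1, so row & -row leaves just that bit. */
uint16_t row_f7(uint16_t row) { return (uint16_t) (row & -row); }

/* f9: 4080 is 0x0FF0, so the XOR inverts bits 4 through 11. */
uint16_t row_f9(uint16_t row) { return (uint16_t) (row ^ 4080); }
```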
DATAREP-10. Sizes and alignments
Assume a 64-bit x86-64 architecture unless explicitly told otherwise.
Write your assumptions if a problem seems unclear, and write down your reasoning for partial credit.
QUESTION DATAREP-10A. Use the following members to create a struct of size 16, using each member exactly once and putting char a first, or say “impossible” if this is impossible.
char a; (we’ve written this for you)
unsigned char b;
short c;
int d;
struct size_16 {
char a;
};
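Impossible! The struct's alignment is 4 and its members total 8 bytes; every ordering with a first comes out to either 8 or 12 bytes.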
QUESTION DATAREP-10B. Repeat Part A, but create a struct with size 12.
struct size_12 {
char a;
};
abdc, acbd, acdb, adbc, adcb (every ordering except abcd pads the struct to 12 bytes)
QUESTION DATAREP-10C. Repeat Part A, but create a struct with size 8.
struct size_8 {
char a;
};
abcd
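One way to confirm these answers is to write out every ordering and let a compiler lay them out. The sketch below assumes the usual x86-64 sizes and alignments: char and unsigned char 1 byte, short 2, int 4. The struct names are hypothetical and spell out the member order. The comments give each member's offset and the resulting size.

```c
#include <assert.h>
#include <stddef.h>

struct abcd { char a; unsigned char b; short c; int d; };  /* 0,1,2,4     -> 8  */
struct abdc { char a; unsigned char b; int d; short c; };  /* 0,1,4,8     -> 12 */
struct acbd { char a; short c; unsigned char b; int d; };  /* 0,2,4,8     -> 12 */
struct acdb { char a; short c; int d; unsigned char b; };  /* 0,2,4,8     -> 12 */
struct adbc { char a; int d; unsigned char b; short c; };  /* 0,4,8,10    -> 12 */
struct adcb { char a; int d; short c; unsigned char b; };  /* 0,4,8,10    -> 12 */
```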
QUESTION DATAREP-10D. Consider the following structs:
struct x {
T x1;
U x2;
};
struct y {
struct x y1;
V y2;
};
Give definitions for T, U, and V so that there is one byte of padding in struct x after x2, and two bytes of padding in struct y after y1.
Example: T = short[2]; U = char; V = int
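The example answer can be checked with offsetof. The sketch below assumes x86-64 sizes and alignments, with T, U, and V filled in as in the example.

```c
#include <assert.h>
#include <stddef.h>

struct x {
    short x1[2];   /* T: offsets 0-3, alignment 2 */
    char x2;       /* U: offset 4, then 1 byte of padding (size 6) */
};

struct y {
    struct x y1;   /* offsets 0-5, alignment 2 */
    int y2;        /* V: must start at a multiple of 4, so 2 bytes of padding first */
};
```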
Assembly
ASM-2. Assembly
Here is some x86 assembly code.
f:
movl a, %eax
movl b, %edx
andl $255, %edx
subl %edx, %eax
movl %eax, a
retq
QUESTION ASM-2A. Write valid C code that could have compiled into this assembly (i.e., write a C definition of function f), given the global variable declarations “extern unsigned a, b;”. Your C code should compile without warnings. REMINDER: You are not permitted to run a C compiler, except for the C compiler that is your brain.
Many answers:
void f(void) {
a -= b & 255;
}
void f(void) {
a += -(b % 256);
}
unsigned f(void) {
a = a - b % 0x100;
return a;
}
unsigned f(void) {
a -= (unsigned char) b; /* NB extra credit */
return a;
}
char* f(int x, int y, int z[1000]) {
a -= (unsigned char) b;
return (char*) (unsigned long) a; /* via unsigned long: no int-to-pointer size warning */
}
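Different bodies can only compile to the same assembly if they compute the same value for every input. The sketch below checks this for the unsigned-returning answers above, using hypothetical helper names. Each helper takes a and b as parameters instead of using the globals, so the versions can be compared side by side. It assumes 8-bit char and 32-bit unsigned, as on x86-64.

```c
#include <assert.h>

/* Each helper applies one candidate body and returns the new value of a. */
unsigned v_and(unsigned a, unsigned b)  { a -= b & 255; return a; }
unsigned v_mod(unsigned a, unsigned b)  { a += -(b % 256); return a; }
unsigned v_hex(unsigned a, unsigned b)  { a = a - b % 0x100; return a; }
unsigned v_cast(unsigned a, unsigned b) { a -= (unsigned char) b; return a; }

/* 1 if all four candidate bodies agree on this input. */
int all_agree(unsigned a, unsigned b) {
    unsigned r = v_and(a, b);
    return v_mod(a, b) == r && v_hex(a, b) == r && v_cast(a, b) == r;
}
```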
QUESTION ASM-2B. Write different valid, warning-free C code that could have compiled into that assembly. This version should contain different operators than your first version. (For extra credit, use only one operator.)
QUESTION ASM-2C. Again, write different valid, warning-free C code that could have compiled into that assembly. In this version, f should have a different type than in your first version.
ASM-4. Assembly language
The next four questions pertain to the following four code samples.
[Figure: code samples f1, f2, f3, and f4]
Now answer the following questions. Pick the most likely sample; you will use each sample exactly once.
QUESTION ASM-4A. Which sample contains a for loop?
f2
QUESTION ASM-4B. Which sample contains a switch statement?
f3
QUESTION ASM-4C. Which sample contains only an if/else construct?
f1
QUESTION ASM-4D. Which sample contains a while loop?
f4