CS61 Midterm Sample Answers
1. Sizes and alignment
QUESTION 1A. True or false: For any non-array type X, the size of X (sizeof(X)) is greater than or equal to the alignment of type X.
True.
This is also mostly true for arrays. The exception is zero-length arrays: sizeof(X[0]) == 0 for all X, but alignof(X[i]) == alignof(X) for all X and i.
QUESTION 1B. True or false: For any type X, the size of struct Y { X a; char newc; } is greater than the size of X.
True
QUESTION 1C. True or false: For any types A1...An (with n ≥ 1), the size of struct Y is greater than the size of struct X, given:
struct X {          struct Y {
    A1 a1;              A1 a1;
    ...                 ...
    An an;              An an;
};                      char newc;
                    };
False (example: A1 = int, A2 = char)
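The counterexample can be made concrete. The following is a minimal sketch, assuming a typical ABI where int is 4 bytes with 4-byte alignment; the names X2, Y2, and same_size are made up for illustration. The extra char lands in what was tail padding, so the struct does not grow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instance of the counterexample: A1 = int, A2 = char. */
struct X2 { int a1; char a2; };
struct Y2 { int a1; char a2; char newc; };

/* Returns 1 when adding newc did not grow the struct. On a typical ABI
   (4-byte int, 4-byte alignment) both structs occupy 8 bytes, because
   newc fits into the tail padding of struct X2. */
int same_size(void) {
    /* Adding a member can never make the struct smaller. */
    assert(sizeof(struct Y2) >= sizeof(struct X2));
    return sizeof(struct X2) == sizeof(struct Y2);
}
```

Calling same_size() from a test program should return 1 under the stated ABI assumption.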
QUESTION 1D. True or false: For any types A1...An (with n ≥ 1), the size of struct Y is greater than the size of union X, given:
union X {           struct Y {
    A1 a1;              A1 a1;
    ...                 ...
    An an;              An an;
};                  };
False (if n = 1)
Let alignof(T) equal the alignment of type T.
QUESTION 1E. Assume that structure struct Y { ... } contains K char members and M int members, with K≤M, and nothing else. Write an expression defining the maximum sizeof(struct Y).
4M + 4K
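The maximum is reached when every char is separated from its neighbors by an int, so each char drags in 3 bytes of padding. Here is a small sketch with K = 2 and M = 3, assuming 4-byte int with 4-byte alignment; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* K = 2 chars and M = 3 ints, ordered so that every char is
   immediately followed by an int. */
struct worst_case {
    char c1; int i1;   /* 1 byte + 3 bytes padding, then 4 bytes */
    char c2; int i2;   /* same again */
    int i3;
};

/* Same members with the chars grouped at the end: only tail padding. */
struct best_case {
    int i1; int i2; int i3;
    char c1; char c2;
};

size_t worst_size(void) { return sizeof(struct worst_case); }
size_t best_size(void)  { return sizeof(struct best_case); }

/* Reordering members can only help, never hurt, relative to the worst case. */
void check_order(void) {
    assert(worst_size() >= best_size());
}
```

Under the stated assumption, worst_size() is 4*3 + 4*2 = 20 bytes, matching 4M + 4K, while best_size() is 16.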
QUESTION 1F. You are given a structure struct Z { T1 a; T2 b; T3 c; } that contains no padding. What does (sizeof(T1) + sizeof(T2) + sizeof(T3)) % alignof(struct Z) equal?
0
QUESTION 1G. Arrange the following types in increasing order by size. Sample answer: “1 < 2 = 4 < 3” (choose this if #1 has smaller size than #2, which has equal size to #4, which has smaller size than #3).
1. char
2. struct minipoint { uint8_t x; uint8_t y; uint8_t z; }
3. int
4. unsigned short[1]
5. char**
6. double[0]
#6 < #1 < #4 < #2 < #3 = #5
sizeof(double[0]) is actually 0!
2. Expressions
QUESTION 2A. Here are eight expressions. Group the expressions into four pairs so that the two expressions in each pair have the same value, and each pair has a different value from every other pair. There is one unique answer that meets these constraints. m has the same type and value everywhere it appears (there’s one unique value for m that meets the problem’s constraints). Assume an x86 machine.
1. sizeof(&m)
2. -1
3. m & -m
4. m + ~m + 1
5. 16 >> 2
6. m & ~m
7. m
8. 1
1—5; 2—7; 3—8; 4—6
1—5 is easy. m + ~m + 1 == m + (-m) == 0, and m & ~m == 0, giving us 4—6. Now what about the others? m & -m is, as we know, either 0 or a power of 2. This eliminates -1, leaving either m or 1. If m & -m matched with m, then the remaining pair would be 1 and -1, which clearly doesn’t work. Thus m & -m matches with 1 (pair 3—8), and m == -1 (pair 2—7).
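The pairing can be checked mechanically once m is fixed at -1. The sketch below is only a sanity check under that assumption; the function names are made up for illustration, and the sizeof(&m) pair holds only where pointers are 4 bytes, as on 32-bit x86.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluates the three pairs that do not depend on pointer size and
   returns 1 if all of them match for the given m.
   (Do not pass INT_MIN: negating it overflows.) */
int pairs_match(int m) {
    int pair_2_7 = (-1 == m);                     /* -1 and m */
    int pair_3_8 = ((m & -m) == 1);               /* m & -m and 1 */
    int pair_4_6 = ((m + ~m + 1) == (m & ~m));    /* both are 0 */
    assert((m & ~m) == 0);                        /* holds for every m */
    return pair_2_7 && pair_3_8 && pair_4_6;
}

/* Expressions 1 and 5: equal only where pointers are 4 bytes. */
int pair_1_5(void) {
    int m = -1;
    return sizeof(&m) == (size_t)(16 >> 2);
}
```

pairs_match(-1) returns 1, while other values such as 0 or 1 fail at least one pair, which is consistent with m == -1 being the unique solution.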
3. Hello binary
This problem locates 8-bit numbers horizontally and vertically in the following 16x16 image. Black pixels represent 1 bits and white pixels represent 0 bits. For horizontal arrangements, the most significant bit is on the left as usual. For vertical arrangements, the most significant bit is on top.
Examples: The 8-bit number 15 (hexadecimal 0x0F, binary 0b00001111) is located horizontally at 3,4, which means X=3, Y=4.
- The pixel at 3,4 is white, which has bit value 0.
- 4,4 is white, also 0.
- 5,4 is white, also 0.
- 6,4 is white, also 0.
- 7,4 is black, which has bit value 1.
- 8,4, 9,4, and 10,4 are black, giving three more 1s.
- Reading them all off, this is 0b00001111, or 15.
15 is also located horizontally at 7,6.
The 8-bit number 0 is located vertically at 0,0. It is also located horizontally at 0,0 and 1,0.
The 8-bit number 134 (hexadecimal 0x86, binary 0b10000110) is located vertically at 8,4.
QUESTION 3A. Where is 3 located vertically? (All questions refer to 8-bit numbers.)
9,6
QUESTION 3B. Where is 12 located horizontally?
5,5
QUESTION 3C. Where is 255 located vertically?
14,3
4. Hello memory
Shintaro Tsuji wants to represent this image in computer memory. He stores it in an array of 16-bit unsigned integers:
uint16_t cute[16];
Row Y of the image is stored in integer cute[Y].
QUESTION 4A. What is sizeof(cute), 2, 16, 32, or 64?
32
QUESTION 4B. printf("%d\n", cute[0]); prints 16384. Is Shintaro’s machine big-endian or little-endian?
Little-endian
5. Hello program
Now that we have the image in memory, we can manipulate it using C. For example, here’s a function.
void swap(void) {
    for (int i = 0; i < 16; ++i)
        cute[i] = (cute[i] << 8) | (cute[i] >> 8);
}
Running swap produces the following image:
Shintaro has written several other functions. Here are some images (A is the original):
[Image grid: ten images labeled A through J (A, B, C, D, E in the first row; F, G, H, I, J in the second).]
For each function, what image does that function create?
QUESTION 5A.
void f0() {
    for (int i = 0; i < 16; ++i)
        cute[i] = ~cute[i];
}
H. The code flips all bits in the input.
QUESTION 5B.
void f1() {
    for (int i = 0; i < 16; ++i) {
        cute[i] = ((cute[i] >> 1) & 0x5555) | ((cute[i] << 1) & 0xAAAA);
        cute[i] = ((cute[i] >> 2) & 0x3333) | ((cute[i] << 2) & 0xCCCC);
        cute[i] = ((cute[i] >> 4) & 0x0F0F) | ((cute[i] << 4) & 0xF0F0);
        cute[i] = (cute[i] >> 8) | (cute[i] << 8);
    }
}
D
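The four steps in f1 follow the usual bit-reversal pattern: swap adjacent bits, then bit pairs, then nibbles, then bytes. Since the most significant bit is the leftmost pixel, that would mirror each row left to right. The sketch below applies the same steps to one value and compares against a slow bit-by-bit reversal; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same four steps as f1, applied to a single 16-bit value. */
uint16_t reverse16(uint16_t v) {
    v = (uint16_t)(((v >> 1) & 0x5555) | ((v << 1) & 0xAAAA)); /* swap adjacent bits */
    v = (uint16_t)(((v >> 2) & 0x3333) | ((v << 2) & 0xCCCC)); /* swap bit pairs */
    v = (uint16_t)(((v >> 4) & 0x0F0F) | ((v << 4) & 0xF0F0)); /* swap nibbles */
    v = (uint16_t)((v >> 8) | (v << 8));                       /* swap bytes */
    return v;
}

/* Reference version: move bit i to bit 15 - i, one bit at a time. */
uint16_t reverse16_slow(uint16_t v) {
    uint16_t r = 0;
    for (int i = 0; i < 16; ++i)
        if (v & (1u << i))
            r = (uint16_t)(r | (1u << (15 - i)));
    return r;
}

/* Returns 1 if both versions agree on all 65536 inputs. */
int reverse16_agrees(void) {
    for (uint32_t v = 0; v <= 0xFFFF; ++v)
        if (reverse16((uint16_t) v) != reverse16_slow((uint16_t) v))
            return 0;
    assert(reverse16(reverse16(0x1234)) == 0x1234); /* reversing twice is the identity */
    return 1;
}
```

For example, reverse16(0x0001) yields 0x8000: the rightmost pixel moves to the leftmost position.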
QUESTION 5C.
void f2() {
    char *x = (char *) cute;
    for (int i = 0; i < 16; ++i)
        x[2*i] = i;
}
J
For “fun”
The following programs generated the other images in “Hello program.” Can you match them with their images?
f3—I; f4—B; f5—C; f6—F; f7—G; f8—A; f9—E
void f3() {
    for (int i = 0; i < 16; ++i)
        cute[i] &= ~(7 << i);
}
void f4() {
    swap();
    for (int i = 0; i < 16; ++i)
        cute[i] <<= i/4;
    swap();
}
void f5() {
    for (int i = 0; i < 16; ++i)
        cute[i] = -1 * !!(cute[i] & 64);
}
void f6() {
    for (int i = 0; i < 16; ++i) {
        int tmp = cute[15-i];
        cute[15-i] = cute[i];
        cute[i] = tmp;
    }
}
void f7() {
    for (int i = 0; i < 16; ++i)
        cute[i] = cute[i] & -cute[i];
}
void f8() {
    for (int i = 0; i < 16; ++i)
        cute[i] ^= cute[i] ^ cute[i];
}
void f9() {
    for (int i = 0; i < 16; ++i)
        cute[i] = cute[i] ^ 4080;
}
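Three of these one-liners are easier to read when applied to a single row value. The following sketch isolates the expressions from f7, f8, and f9; the helper names are made up, and the sample inputs are arbitrary rather than rows of the actual image.

```c
#include <assert.h>
#include <stdint.h>

/* f7's expression: keeps only the lowest set bit (0 stays 0). */
uint16_t lowest_set_bit(uint16_t v) {
    uint16_t r = (uint16_t)(v & -v);
    assert((r & (r - 1)) == 0);   /* result is 0 or a power of two */
    return r;
}

/* f8's expression: v ^ v is 0, and v ^ 0 is v, so nothing changes. */
uint16_t xor_self(uint16_t v) {
    v ^= v ^ v;
    return v;
}

/* f9's expression: 4080 is 0x0FF0, so the middle eight bits flip. */
uint16_t flip_middle(uint16_t v) {
    return (uint16_t)(v ^ 4080);
}
```

For instance, lowest_set_bit(0x0F00) is 0x0100, xor_self leaves any value unchanged (consistent with f8 matching the original image A), and flip_middle(0x0000) is 0x0FF0.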
6. Memory regions
Consider the following program:
struct ptrs {
    int** x;
    int* y;
};
struct ptrs global;
void setup(struct ptrs* p) {
    int* a = malloc(sizeof(int));
    int* b = malloc(sizeof(int));
    int* c = malloc(sizeof(int));
    int* d = malloc(sizeof(int));
    int* e = malloc(sizeof(int) * 2);
    int** f = malloc(4 * sizeof(int *));
    int** g = malloc(sizeof(int *));
    *a = 0;
    *b = 0;
    *c = (int) a;
    *d = *b;
    e[0] = 29;
    e[1] = (int) &d[100000];
    f[0] = b;
    f[1] = c;
    f[2] = 0;
    f[3] = 0;
    *g = c;
    global.x = f;
    global.y = e;
    p->x = g;
    p->y = &e[1];
}
int main(int argc, char** argv) {
    stack_bottom = (char*) &argc;
    struct ptrs p;
    setup(&p);
    m61_collect();
    do_stuff(&p);
}
This program allocates objects a through g on the heap and then stores those pointers in some stack and global variables. (It then calls our conservative garbage collector from class, but that won’t matter until the next problem.) We recommend you draw a picture of the state setup creates.
QUESTION 6A. Assume that (uintptr_t) a == 0x8300000, and that malloc returns increasing addresses. Match each address to the most likely expression with that address value. The expressions are evaluated within the context of main. You will not reuse an expression.
 | Value | | Expression |
---|---|---|---|
1. | 0x8300040 | A. | &p |
2. | 0x8049894 | B. | (int *) *p.x[0] |
3. | 0x8361AF0 | C. | &global.y |
4. | 0x8300000 | D. | global.y |
5. | 0xBFAE0CD8 | E. | (int*) *p.y |
1—D; 2—C; 3—E; 4—B; 5—A
Since p has automatic storage duration, it is located on the stack, giving us 5—A. The global variable has static storage duration, and so does its component global.y; so the pointer &global.y has an address that is below all heap-allocated pointers. This gives us 2—C. The remaining expressions go like this:
- global.y == e
- p.y == &e[1], so *p.y == e[1] == (int) &d[100000], and (int *) *p.y == &d[100000]
- p.x == g, so p.x[0] == g[0] == *g == c, and *p.x[0] == *c == (int) a
Address #4 has value 0x8300000, which by assumption is a’s address; so 4—B. Address #3 is much larger than the other heap addresses, so 3—E. This leaves 1—D.
7. Garbage collection
Here is the top-level function for the conservative garbage collector we wrote in class (cs61-lectures/l07/m61-13.c).
void m61_collect(void) {
    char* stack_top = (char*) &stack_top;
    // The entire contents of the heap start out unmarked
    for (size_t i = 0; i != nmr; ++i)
        mr[i].marked = 0;
    // Mark all reachable objects, starting with the roots (the stack)
    m61_markaccessible(stack_top, stack_bottom - stack_top);
    // Free everything that wasn't marked
    for (size_t i = 0; i != nmr; ++i)
        if (mr[i].marked == 0) {
            m61_free(mr[i].ptr);
            --i; // m61_free moved different data into this
                 // slot, so we must recheck the slot
        }
}
This garbage collector is not correct because it doesn’t capture all memory roots.
Consider the program from the previous section, and assume that an object is reachable if do_stuff can access an address within the object via variable references and memory dereferences without casts or pointer arithmetic. Then:
QUESTION 7A. Which reachable objects will m61_collect() free? Circle all that apply.
a | b | c | d | e | f | g | None of these |
b, f.
The collector searches the stack for roots. This yields just the values in struct ptrs p (the only pointer-containing variable with automatic storage duration at the time m61_collect is called). The objects directly pointed to by p are g and e. The collector then recursively marks objects pointed to by these objects. From g, it finds c. From e, it finds nothing. Then it checks one more time. From c, it finds the value of a! Now, a is actually not a pointer here (the type of *c is int), so by the definition above, a is not actually reachable. But the collector doesn’t know this.
Putting it together, the collector marks a, c, e, and g. It won’t free these objects; it will free the others (b, d, and f). But b and f are reachable from global.
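The marking walk described above can be replayed on a toy graph. This is a sketch of the reasoning only, not the class collector: the enum names, the points_to table, and both functions are made up for illustration, and the edges are read off the setup function by hand.

```c
#include <assert.h>
#include <string.h>

enum { OBJ_A, OBJ_B, OBJ_C, OBJ_D, OBJ_E, OBJ_F, OBJ_G, NOBJ };

/* points_to[x][y] != 0 means object x stores a value that a conservative
   collector would treat as a pointer into object y. */
static const int points_to[NOBJ][NOBJ] = {
    [OBJ_C] = { [OBJ_A] = 1 },               /* *c = (int) a */
    [OBJ_F] = { [OBJ_B] = 1, [OBJ_C] = 1 },  /* f[0] = b; f[1] = c */
    [OBJ_G] = { [OBJ_C] = 1 },               /* *g = c */
    /* e[1] holds &d[100000], which lies outside every object */
};

/* Depth-first marking of everything reachable from obj. */
static void mark(int obj, int marked[NOBJ]) {
    assert(obj >= 0 && obj < NOBJ);
    if (marked[obj])
        return;
    marked[obj] = 1;
    for (int next = 0; next < NOBJ; ++next)
        if (points_to[obj][next])
            mark(next, marked);
}

/* Marks from the stack roots only (p.x == g, p.y == &e[1]), mirroring
   the fact that global.x and global.y are never scanned. */
void mark_from_stack(int marked[NOBJ]) {
    memset(marked, 0, NOBJ * sizeof(int));
    mark(OBJ_G, marked);
    mark(OBJ_E, marked);
}
```

After mark_from_stack, the marked set is a, c, e, g, so b, d, and f would be freed, which matches the explanation above.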
QUESTION 7B. Which unreachable objects will m61_collect() not free? Circle all that apply.
a | b | c | d | e | f | g | None of these |
a
QUESTION 7C. Conservative garbage collection in C is often slower than precise garbage collection in languages such as Java. Why? Circle all that apply.
1. C is generally slower than other languages.
2. Conservative garbage collectors must search all reachable memory for pointers. Precise garbage collectors can ignore values that do not contain pointers, such as large character buffers.
3. C programs generally use the heap more than programs in other languages.
4. None of the above.
#2
8. I/O caching
Mary Ruefle, a poet who lives in Vermont, is working on her caching I/O library for CS61’s problem set 2. She wants to implement a cache with N slots. Since searching those slots might slow down her library, she writes a function that maps addresses to slots. Here’s some of her code.
#define SLOTSIZ 4096
typedef struct io61_slot {
    char buf[SLOTSIZ];
    off_t pos; // = (off_t) -1 for empty slots
    ssize_t sz;
} io61_slot;
#define NSLOTS 64
struct io61_file {
    int fd;
    off_t pos; // current file position
    io61_slot slots[NSLOTS];
};
static inline int find_slot(off_t off) {
    return off % NSLOTS;
}
int io61_readc(io61_file* f) {
    int slotindex = find_slot(f->pos);
    io61_slot* s = &f->slots[slotindex];
    if (f->pos < s->pos || f->pos >= s->pos + s->sz) {
        // slot contains wrong data, need to refill it
        off_t new_pos = lseek(f->fd, f->pos, SEEK_SET);
        assert(new_pos != (off_t) -1); // only handle seekable files for now
        ssize_t r = read(f->fd, s->buf, SLOTSIZ);
        if (r == -1 || r == 0)
            return EOF;
        s->pos = f->pos;
        s->sz = r;
    }
    int ch = (unsigned char) s->buf[f->pos - s->pos];
    ++f->pos;
    return ch;
}
Before she can run and debug this code, Mary is led “to an emergency of feeling that ... results in a poem.” She’ll return to CS61 and fix her implementation soon, but in the meantime, let’s answer some questions about it.
QUESTION 8A. True or false: Mary’s cache is a direct-mapped cache.
True
QUESTION 8B. What changes to Mary’s code could change your answer to Question 8A? Circle all that apply.
1. New code for find_slot (keeping io61_readc the same)
2. New code in io61_readc (keeping find_slot the same)
3. New code in io61_readc and new code for find_slot
4. None of the above
#2 and #3. #1 is NOT a valid answer
QUESTION 8C. Which problems would occur when Mary’s code was used to sequentially read a seekable file of size 2MiB (2×2^20 = 2097152 bytes) one character at a time? Circle all that apply.
1. Excessive CPU usage (>10x stdio)
2. Many system calls to read data (>10x stdio)
3. Incorrect data (byte x read at a position where the file has byte y≠x)
4. Too much data
5. Too little data
6. Crash/undefined behavior
7. None of the above
#2 only
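To see why the number of reads explodes, the slot logic can be simulated without doing any real I/O. The sketch below follows the hit/miss test in io61_readc but only counts how many times read() would be called; sim_slot, count_reads, and the two find_slot variants are hypothetical names, and the second mapping is just one possible alternative.

```c
#include <assert.h>

#define SLOTSIZ 4096
#define NSLOTS 64

typedef struct sim_slot {
    long long pos;  /* file offset of the slot's first byte, -1 if empty */
    long long sz;   /* number of valid bytes in the slot */
} sim_slot;

typedef int (*find_slot_fn)(long long off);

/* Mary's mapping: consecutive offsets go to consecutive slots. */
int find_slot_mod(long long off) { return (int)(off % NSLOTS); }

/* Alternative: a whole aligned 4096-byte block maps to one slot. */
int find_slot_block(long long off) { return (int)((off / SLOTSIZ) % NSLOTS); }

/* Counts simulated read() calls made while reading filesize bytes
   sequentially, one character at a time. */
long count_reads(find_slot_fn find_slot, long long filesize) {
    sim_slot slots[NSLOTS];
    for (int i = 0; i < NSLOTS; ++i) {
        slots[i].pos = -1;
        slots[i].sz = 0;
    }
    long reads = 0;
    for (long long pos = 0; pos < filesize; ++pos) {
        int idx = find_slot(pos);
        assert(idx >= 0 && idx < NSLOTS);
        sim_slot* s = &slots[idx];
        if (pos < s->pos || pos >= s->pos + s->sz) {
            /* Miss: the real code would lseek and read up to SLOTSIZ bytes. */
            long long r = filesize - pos;
            if (r > SLOTSIZ)
                r = SLOTSIZ;
            s->pos = pos;
            s->sz = r;
            ++reads;
        }
    }
    return reads;
}
```

For the 2 MiB file, count_reads(find_slot_mod, 2097152) gives 32768 simulated reads, against 512 for find_slot_block (one per 4096-byte block). The data returned is still correct in both cases; only the system call count differs.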
QUESTION 8D. Which of these new implementations for find_slot would fix at least one of these problems with reading sequential files? Circle all that apply.
1. return (off * 2654435761) % NSLOTS; /* integer hash function from Stack Overflow */
2. return (off / SLOTSIZ) % NSLOTS;
3. return off & (NSLOTS - 1);
4. return 0;
5. return (off >> 12) & 0x3F;
6. None of the above
#2, #4, #5
9. Memory errors
The following function constructs and returns a lower-triangular matrix of size N. The elements are random 2-dimensional points in the unit square. The matrix is represented as an array of pointers to arrays.
typedef struct point2 {
    double d[2];
} point2;
typedef point2* point2_vector;
point2_vector* make_random_lt_matrix(size_t N) {
    point2_vector* m = (point2_vector *) malloc(sizeof(point2_vector) * N);
    for (size_t i = 0; i < N; ++i) {
        m[i] = (point2*) malloc(sizeof(point2) * (i + 1)); /* LINE A */
        for (size_t j = 0; j <= i; ++j)
            for (int d = 0; d < 2; ++d)
                m[i][j].d[d] = drand48(); /* LINE B */
    }
    return m;
}
This code is running on an x86 machine (size_t is 32 bits). You may assume that the machine has enough free physical memory and the process has enough available virtual address space to satisfy any malloc() call.
QUESTION 9A. Give a value of N so that, while make_random_lt_matrix(N) is running, no malloc() returns NULL, but a memory error (such as a null pointer dereference or an out-of-bounds dereference) happens on Line A. The memory error should happen specifically when i == 1.
(This problem is probably easier when you write your answer in hexadecimal.)
We are asked to produce a value of N so that no memory error happens on Line A when i == 0, but a memory error does happen when i == 1. So reason that through. What memory errors could happen on Line A if malloc() returns non-NULL? There’s only one memory operation, namely the dereference m[i]. Perhaps this dereference is out of bounds.
If no memory error happens when i == 0, then an m[0] dereference must not cause a memory error. So the m object must contain at least 4 bytes. But a memory error does happen on Line A when i == 1. So the m object must contain less than 8 bytes. How many bytes were allocated for m? sizeof(point2_vector) * N == sizeof(point2 *) * N == 4 * N. So we have:
- (4 * N) ≥ 4
- (4 * N) < 8
It seems like the only possible answer is N == 1. But no, this doesn’t cause a memory error, because the loop body would never be executed with i == 1!
The key insight is that the multiplications above use 32-bit unsigned computer arithmetic. Let’s write N as X + 1. Then these inequalities become:
- 4 ≤ (4 * (X + 1) = 4 * X + 4) < 8
- 0 ≤ (4 * X) < 4
(Multiplication distributes over addition in computer arithmetic.) What values of X satisfy this inequality? It might be easier to see if we remember that multiplication by powers of two is equivalent to shifting:
- 0 ≤ (X << 2) < 4
This shift eliminates the top two bits of X. There are exactly four values for X that work: 0, 0x40000000, 0x80000000, and 0xC0000000. For any of these, 4 * X == 0 in 32-bit computer arithmetic, because 4×X = 0 (mod 2^32) in normal arithmetic.
Plugging X back in to N, and dropping X == 0 (which gives the N == 1 case already ruled out), we see that N ∈ {0x40000001, 0x80000001, 0xC0000001}. These are the only values that work.
Partial credit was awarded for values that acknowledged the possibility of overflow.
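The wraparound is easy to confirm with explicit 32-bit arithmetic. This is a small sketch assuming sizeof(point2_vector) is 4, as on the 32-bit x86 machine in the problem; outer_alloc_size and check_9a are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Size passed to the first malloc, computed the way a 32-bit size_t
   would compute it: 4 * N, wrapping modulo 2^32. */
uint32_t outer_alloc_size(uint32_t n) {
    uint32_t elem = 4;   /* assumed sizeof(point2_vector) on 32-bit x86 */
    return (uint32_t)(elem * n);
}

/* Each of the three answers requests only 4 bytes, room for m[0] but
   not for m[1]. */
void check_9a(void) {
    assert(outer_alloc_size(0x40000001u) == 4);
    assert(outer_alloc_size(0x80000001u) == 4);
    assert(outer_alloc_size(0xC0000001u) == 4);
}
```

With any of these N, the loop still runs past i == 0, so the store to m[1] on Line A falls outside the 4-byte allocation.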
QUESTION 9B. Give a value of N so that no malloc() returns NULL, and no memory error happens on Line A, but a memory error does happen on Line B.
If no memory error happens on Line A, then N < 2^30 (otherwise overflow would happen as seen above). But a memory error does happen on Line B. Line B dereferences m[i][j], for 0 ≤ j ≤ i; so how big is m[i]? It was allocated on Line A with size sizeof(point2) * (i + 1) == 2 * sizeof(double) * (i + 1) == 16 * (i + 1). If i + 1 ≥ 2^32 / 16 = 2^28, this multiplication will overflow. Since i < N, we can finally reason that any N greater than or equal to 2^28 = 0x10000000 and less than 2^30 = 0x40000000 will cause the required memory error.
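The same kind of check works for the row allocation on Line A. This sketch assumes sizeof(point2) is 16 and models size_t as uint32_t; row_alloc_size and check_9b are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Size passed to malloc on Line A for row i, computed in 32-bit
   arithmetic: 16 * (i + 1), wrapping modulo 2^32. */
uint32_t row_alloc_size(uint32_t i) {
    uint32_t elem = 16;   /* sizeof(point2) == 2 * sizeof(double) */
    return (uint32_t)(elem * (i + 1));
}

/* The first row whose size wraps is i == 2^28 - 1: it needs 2^32 bytes,
   which becomes 0. */
void check_9b(void) {
    assert(row_alloc_size(0x0FFFFFFEu) == 0xFFFFFFF0u); /* last row that fits */
    assert(row_alloc_size(0x0FFFFFFFu) == 0);           /* wraps to 0 */
}
```

Row i == 0x0FFFFFFF only exists when N ≥ 2^28, which is where the lower bound on N comes from.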
10. Caches and reference strings
QUESTION 10A. True or false: A direct-mapped cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.
False. Direct-mapped caches can have conflict misses.
QUESTION 10B. True or false: A fully-associative cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.
True
Consider the following 5 reference strings.
Name | String |
---|---|
α | 1 |
β | 1, 2 |
γ | 1, 2, 3, 4, 5 |
δ | 2, 4 |
ε | 5, 2, 4, 2 |
QUESTION 10C. Which of the strings might indicate a sequential access pattern? Circle all that apply.
α | β | γ | δ | ε | None of these |
(α), β, γ
QUESTION 10D. Which of the strings might indicate a strided access pattern with stride >1? Circle all that apply.
α | β | γ | δ | ε | None of these |
(α), δ
One very clever person pointed out that β and γ could also represent large strides: for example, consider a file with 10 bytes accessed with stride 11!
The remaining questions concern concatenated permutations of these five strings. For example, the permutation αβγδε refers to this reference string:
1, 1, 2, 1, 2, 3, 4, 5, 2, 4, 5, 2, 4, 2.
We pass such permutations through an initially-empty, fully-associative cache with 3 slots, and observe the numbers of hits.
QUESTION 10E. How many cold misses might a permutation observe? Circle all that apply.
0 | 1 | 2 | 3 | 4 | 5 | Some other number |
5. The first time a reference string address is encountered, it must cause a cold miss.
Under LRU eviction, the permutation αβεγδ observes 5 hits as follows. (We annotate each access with “h” for hit or “m” for miss.)
1m; 1h, 2m; 5m, 2h, 4m, 2h; 1m, 2h, 3m, 4m, 5m; 2m, 4h.
QUESTION 10F. How many hits does this permutation observe under FIFO eviction?
4 hits.
QUESTION 10G. Give a permutation that will observe 8 hits under LRU eviction, which is the maximum for any permutation. There are several possible answers. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 7 hits, etc.)
The following four permutations observe 8 hits under LRU: αβγδε, αβγεδ, βαγδε, βαγεδ. 28 permutations observe 7 hits; 25 observe 6 hits; and 38 observe 5 hits.
QUESTION 10H. Give a permutation that will observe 2 hits under LRU eviction, which is the minimum for any permutation. There is one unique answer. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 3 hits, etc.)
δαεγβ. 4 permutations observe 3 hits and 20 observe 4 hits.
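The hit counts in Questions 10F through 10H can be reproduced with a short simulator. This is a minimal sketch of a 3-slot fully-associative cache with a timestamp per slot; count_hits and the array names are made up, and the reference strings are the concatenations written out by hand.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 3

/* Counts hits for a reference string in an initially empty 3-slot
   fully-associative cache. use_lru != 0 selects LRU eviction;
   use_lru == 0 selects FIFO. */
int count_hits(const int* refs, size_t n, int use_lru) {
    int addr[CACHE_SLOTS];
    long stamp[CACHE_SLOTS];   /* last use (LRU) or load time (FIFO) */
    int used = 0, hits = 0;
    for (size_t t = 0; t < n; ++t) {
        int found = -1;
        for (int s = 0; s < used; ++s)
            if (addr[s] == refs[t])
                found = s;
        if (found >= 0) {
            ++hits;
            if (use_lru)
                stamp[found] = (long) t;   /* FIFO ignores hits */
            continue;
        }
        int victim = used;
        if (used < CACHE_SLOTS) {
            ++used;                        /* cold miss into an empty slot */
        } else {
            victim = 0;                    /* evict the smallest stamp */
            for (int s = 1; s < CACHE_SLOTS; ++s)
                if (stamp[s] < stamp[victim])
                    victim = s;
        }
        assert(victim >= 0 && victim < CACHE_SLOTS);
        addr[victim] = refs[t];
        stamp[victim] = (long) t;
    }
    return hits;
}

/* The permutation αβεγδ. */
const int abegd[] = { 1, 1, 2, 5, 2, 4, 2, 1, 2, 3, 4, 5, 2, 4 };
/* The permutation αβγδε. */
const int abgde[] = { 1, 1, 2, 1, 2, 3, 4, 5, 2, 4, 5, 2, 4, 2 };
/* The permutation δαεγβ. */
const int daegb[] = { 2, 4, 1, 5, 2, 4, 2, 1, 2, 3, 4, 5, 1, 2 };
```

Under these definitions, αβεγδ gets 5 hits with LRU and 4 with FIFO, αβγδε gets 8 with LRU, and δαεγβ gets 2 with LRU, in line with the answers above.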
11. Git
Edward Snowden is working on a CS61 problem set and he has some git questions.
QUESTION 11A. The CS61 staff has released some new code. Which commands will help Edward get the code from code.seas.harvard.edu into his repository? Circle all that apply.
1. git commit
2. git add
3. git push
4. git pull
#4
5pts; 4pts for #4 plus others; 3pts for #3 (assume brainfart)
QUESTION 11B. Edward has made some changes to his code. He hasn’t run git since making the changes. He wants to upload his latest version to code.seas.harvard.edu. Put the following git commands in an order that will accomplish this goal. You won’t necessarily use every command. You may add flags to a command (but you don’t have to). If you add flags, tell us what they are.
1. git commit
2. git add
3. git push
4. git pull
#2, #1, #3
OR: #1 -a, #3
5pts; 4pts if they say pull instead of push (brainfart), or they commit without adding, or they reorder add & commit; 3pts for 2 of those mistakes at once
Edward Snowden’s partner, Edward Norton, has been working on the problem set also. They’ve been working independently.
At midnight on October 10, here’s how things stood. The git log for the partners’ shared code.seas.harvard.edu repository looked like this. The committer is listed in (parentheses).
52d44ee Pset release. (kohler)
The git log for Snowden’s local repository:
3246d07 Save Greenwald's phone number (snowden)
8633fbd Start work on a direct-mapped cache (snowden)
52d44ee Pset release. (kohler)
The git log for Norton’s local repository:
81f952e try mmap (norton)
52d44ee Pset release. (kohler)
At noon on October 11, their shared code.seas.harvard.edu repository has this log:
d446e60 Increase cache size (snowden)
b677e85 use mmap on mmappable files (norton)
b46cfda Merge branch 'master' of code.seas.harvard.edu:~TheTrueHOOHA/cs61/TheTrueHOOHAs-cs61-psets.git (norton)
81f952e try mmap (norton)
3246d07 Save Greenwald's phone number (snowden)
8633fbd Start work on a direct-mapped cache (snowden)
52d44ee Pset release. (kohler)
QUESTION 11C. Give an order for these commands that could have produced that log starting from the midnight October 10 state. You might not use every command, and you might use some commands more than once. Sample (incorrect) answer: “1 4 4 5 2.”
1. snowden: git commit -a
2. snowden: git push
3. snowden: git pull
4. norton: git commit -a
5. norton: git push
6. norton: git pull
- #2 (snowden push)
- [#5 (norton push—OPTIONAL; this push would fail)]
- #6 (norton pull) (We know that Snowden pushed first, and Norton pulled before pushing, because Norton committed the merge) [CIRCLE FOR 11D]
- [#4 (norton commit—OPTIONAL for the merge commit; the merge commit will happen automatically if there are no conflicts)] [ALLOW CIRCLE FOR 11D]
- #4 (norton commit for b677e85)
- #5 (norton push)
- #3 (snowden pull—snowden pulls before committing because there is no merge)
- #1 (snowden commit for d446e60)
- #2 (snowden push)
QUESTION 11D. In your answer to Question 11C, circle the step(s) where there might have been a merge conflict.
See above
12. Debugging
QUESTION 12A. Match each tool or technique with a debugging situation for which it is well suited. Produce the best overall match that uses each situation exactly once.
1. strace | A. Investigating segmentation faults |
2. gdb | B. Finding memory leaks |
3. valgrind --tool=memcheck | C. Checking your assumptions and verifying invariants |
4. printf statements | D. Discovering I/O patterns |
5. assert | E. Displaying program state |
1—D, 2—A, 3—B, 4—E, 5—C
13. Processor cache
The git version control system is based on commit hashes, which are 160-bit (20-byte) hash values used to identify commits. In this problem you’ll consider the processor cache behavior of several versions of a “grading server” that maps commits to grades. Here’s the first version:
typedef struct commit_info {
    char hash[20];
    int grade[11];
} commit_info;
commit_info* commits;
size_t N;
int get_grade1(const char* hash, int pset) {
    for (size_t i = 0; i != N; ++i)
        if (memcmp(commits[i].hash, hash, 20) == 0)
            return commits[i].grade[pset];
    return -1;
}
We will ask questions about the average number of cache lines accessed by variants of get_grade(hash, pset). You should make the following assumptions:
- The hash argument is uniformly drawn from the set of known commits. That is, the probability that hash equals the ith commit’s hash is 1/N.
- Only count cache lines accessible via commits. Don’t worry about cache lines used for local variables, for parameters, for global variables, or for instructions. For instance, do not count the hash argument or the global-data cache line that stores the commits variable itself.
- Every object is 64-byte aligned, and no two objects share the same cache line.
- Commit hashes are mathematically indistinguishable from random numbers. Thus, the probability that two different hashes have the same initial k bits equals 1/2^k.
- Fully correct answers would involve ceiling functions, but you don’t need to include them.
QUESTION 13A. What is the expected number of cache lines accessed by get_grade1, in terms of N?
Each commit_info object is on its own cache line, and we will examine 1/2 of the objects on average, so the answer is ⌈N/2⌉. (Reminder: ceilings not required)
The second version:
typedef struct commit_info {
    char hash[20];
    int grade[11];
} commit_info;
commit_info** commits;
size_t N;
int get_grade2(const char hash[20], int pset) {
    for (size_t i = 0; i != N; ++i)
        if (memcmp(commits[i]->hash, hash, 20) == 0)
            return commits[i]->grade[pset];
    return -1;
}
QUESTION 13B. What is the expected number of cache lines accessed by get_grade2, in terms of N?
This still examines N/2 commit_info objects. But in addition it examines cache lines to evaluate the POINTERS to those objects. There are 16 such pointers per cache line (16×4=64), and we examine N/2 pointers, for N/16/2 = N/32 additional cache lines. Thus ⌈N/2⌉+⌈N/32⌉ ≅ 17N/32.
The third version:
typedef struct commit_info {
    char hash[20];
    int grade[11];
} commit_info;
typedef struct commit_hint {
    char hint[4];
    commit_info* commit;
} commit_hint;
commit_hint* commits;
size_t N;
int get_grade3(const char* hash, int pset) {
    for (size_t i = 0; i != N; ++i)
        if (memcmp(commits[i].hint, hash, 4) == 0
            && memcmp(commits[i].commit->hash, hash, 20) == 0)
            return commits[i].commit->grade[pset];
    return -1;
}
QUESTION 13C. What is the expected number of cache lines accessed by get_grade3, in terms of N? (You may assume that N≤2000.)
The assumption that N≤2000 means we’re exceedingly unlikely to encounter a hint collision (i.e. a commit with the same hint, but different commit value). That means that we will examine N/2 commit_hint objects but ONLY ONE commit_info object. commit_hint objects are 8B big, so 8 hints/cache line: we examine N/8/2 = N/16 cache lines for hint objects, plus one for the info. ⌈N/16⌉ + 1.
The fourth version is actually a hash table.
typedef struct commit_info {
    char hash[20];
    int grade[11];
} commit_info;
commit_info** commits;
size_t commits_hashsize;
int get_grade4(const char* hash, int pset) {
    // choose initial bucket
    size_t bucket;
    memcpy(&bucket, hash, sizeof(bucket));
    bucket = bucket % commits_hashsize;
    // search for the commit starting from that bucket
    while (commits[bucket] != NULL) {
        if (memcmp(commits[bucket]->hash, hash, 20) == 0)
            return commits[bucket]->grade[pset];
        bucket = (bucket + 1) % commits_hashsize;
    }
    return -1;
}
QUESTION 13D. Assume that a call to get_grade4 encounters C expected hash collisions (i.e., examines C buckets before finding the bucket that actually contains hash). What is the expected number of cache lines accessed by get_grade4, in terms of N and C?
For commit_info objects, the lookup will access C cache lines, for the collisions, plus 1, for the successful lookup. But we must also consider the commits[bucket] pointers. We will examine at least 1 cache line for the successful bucket. The C collisions that happen before that will access C buckets. But those buckets might be divided among multiple cache lines; for instance, if C=1, 2 buckets are accessed, but if the first bucket=15, those buckets will be divided among 2 cache lines. The correct formula for buckets, including the final lookup, is 1 + C/16. Thus the total lookup will examine 2 + C + C/16 cache lines on average.
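The four answers can be collected into formulas for quick comparison. This is only a restatement of the results above, without ceilings, assuming 64-byte cache lines, 4-byte int, and 4-byte pointers; the lines_grade functions and check_lines are made-up names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct commit_info {
    char hash[20];
    int grade[11];
} commit_info;

/* Expected cache lines per lookup, as derived in 13A through 13D. */
double lines_grade1(double n) { return n / 2; }          /* N/2 info objects */
double lines_grade2(double n) { return n / 2 + n / 32; } /* plus 16 pointers per line */
double lines_grade3(double n) { return n / 16 + 1; }     /* 8 hints per line, one info */
double lines_grade4(double c) { return 2 + c + c / 16; } /* C + 1 infos, 1 + C/16 bucket lines */

/* With 4-byte int, one commit_info fills exactly one 64-byte line. */
void check_lines(void) {
    assert(sizeof(commit_info) == 64);
    assert(lines_grade2(32) == 17.0);   /* 17N/32 with N = 32 */
}
```

For N = 64, the first three versions come to 32, 34, and 5 lines respectively, which shows how much the 4-byte hint saves.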
14. Assembly
Here is some x86 assembly code.
f:
    movl a, %eax
    movl b, %edx
    andl $255, %edx
    subl %edx, %eax
    movl %eax, a
    ret
QUESTION 14A. Write valid C code that could have compiled into this assembly (i.e., write a C definition of function f), given the global variable declarations “extern unsigned a, b;.” Your C code should compile without warnings. REMINDER: You are not permitted to run a C compiler, except for the C compiler that is your brain.
Many answers:
void f(void) {
    a -= b & 255;
}
void f(void) {
    a += -(b % 256);
}
unsigned f(void) {
    a = a - b % 0x100;
    return a;
}
unsigned f(void) {
    a -= (unsigned char) b; /* NB extra credit */
    return a;
}
char* f(int x, int y, int z[1000]) {
    a -= (unsigned char) b;
    return (char*) a;
}
QUESTION 14B. Write different valid, warning-free C code that could have compiled into that assembly. This version should contain different operators than your first version. (For extra credit, use only one operator.)
QUESTION 14C. Again, write different valid, warning-free C code that could have compiled into that assembly. In this version, f should have a different type than in your first version.