# 2017 Midterm Solutions

## General overview

Assume a CS61 VM running the x86-64 architecture unless otherwise stated. If you get stuck, move on to the next question. If you’re confused, explain your thinking briefly for potential partial credit.

## 1. Sizes and alignments (10 points)

Here’s a test struct with n members. Assume an x86-64 machine, where each `Ti` either is a basic x86-64 type (e.g., `int`, `char`, `double`) or is a type derived from such types (e.g., arrays, structs, pointers, unions, possibly recursively), and assume that ai≤8 for all i.

```struct test {
T1 m1;      // sizeof(T1) == s1, __alignof__(T1) == a1
T2 m2;      // sizeof(T2) == s2, __alignof__(T2) == a2
...
Tn mn;      // sizeof(Tn) == sn, __alignof__(Tn) == an
};
```

In these questions, you will compare this struct with other structs that have the same members, but in other orders.

QUESTION 1A. True or false: The size of `struct test` is minimized when its members are sorted by size. In other words, if s1s2≤…≤sn, then `sizeof(struct test)` is less than or equal to the struct size for any other member order.

If true, briefly explain your answer; if false, give a counterexample (i.e., concrete types for `T1`, …, `Tn` that do not minimize `sizeof(struct test)`).

False, because some data types can have large sizes but small alignments. Consider T1 = `char`, T2 = `int`, T3 = `char[5]`. On x86-64 this has size 16 (T1 = 1 byte; 3 padding bytes; T2 = 4 bytes; T3 = 5 bytes; and finally 3 padding bytes so the result has size equal to a multiple of the alignment, which is 4), but the order T3, T1, T2 would have size 12.

QUESTION 1B. True or false: The size of `struct test` is minimized when its members are sorted by alignment. In other words, if a1a2≤…≤an, then `sizeof(struct test)` is less than or equal to the struct size for any other member order.

True. Padding is added only between objects with different alignments, and the amount of padding added is always smaller than the alignment of the second object. (As a special case, padding is added at the end of the struct to ensure the struct’s size is a multiple of its alignment, but this can be modeled in the same way.) Sorting by alignment therefore minimizes padding.

QUESTION 1C. True or false: The alignment of `struct test` is minimized when its members are sorted in increasing order by alignment. In other words, if a1a2≤…≤an, then `__alignof__(struct test)` is less than or equal to the struct alignment for any other member order.

True. It’s all the same; alignment is max alignment of every component, and is independent of order.

QUESTION 1D. What is the maximum number of bytes of padding that `struct test` could contain for a given n? The answer will be a pretty simple formula involving n. (Remember that ai≤8 for all i.)

Alternating `char` and `long` gives the most padding, which is 7*(n/2) when n is even and 7*(n+1)/2 otherwise.

QUESTION 1E. What is the minimum number of bytes of padding that `struct test` could contain for a given n?

0 :)

## 2. Undefined behavior (10 points)

QUESTION 2A. Sometimes a conforming C compiler can assume that `a + 1 > a`, and sometimes it can’t. For each type below, consider this expression:

```a + (int) 1 > a
```

and say whether the compiler:

• Must reject the expression as a type error.
• May assume that the expression is true (that `a + (int) 1 > a` for all `a`).
• Must not assume that the expression is true.
1. `int a`
2. `unsigned a`
3. `char* a`
4. `unsigned char a`
5. `struct {int m;} a`
1. May assume, because signed integer overflow (`INT_MAX + 1`) is undefined behavior.
2. Must not assume, because `UINT_MAX + 1 == 0` without causing undefined behavior.
3. May assume, because pointer arithmetic overflow is undefined behavior.
4. May assume, because the maximum `unsigned char` is 255, and `255 + (int) 1 == (int) 256`. This is called integer promotion.
5. Must reject

QUESTION 2B. The following code checks its arguments for sanity, but not well: each check can cause undefined behavior.

```void sanity_check(int* array, size_t array_size, int* ptr_into_array) {
if (array + array_size < array) {
fprintf(stderr, "`array` is so big that it wraps around!\n");
abort();
}
if (ptr_into_array < array || ptr_into_array > array + array_size) {
fprintf(stderr, "`ptr_into_array` doesn’t point into the array!\n");
abort();
}
...
```

Rewrite these checks to avoid all undefined behavior. You will likely add one or more casts to `uintptr_t`. For full credit, write each check as a single comparison (no `&&` or `||`, even though the current `ptr_into_array` check uses `||`).

 `array_size` check: `ptr_into_array` check:

`array_size`: `(uintptr_t) array + 4 * array_size < (uintptr_t) array`

`ptr_into_array`: `(uintptr_t) ptr_into_array - (uintptr_t) array > 4 * array_size`

where 4 is shorthand for `sizeof(int)`.

QUESTION 2C. In Lecture datarep4, we discussed several ways to tell if a signed integer `x` is negative. One of them was the following:

```int isnegative = (x & (1UL << (sizeof(x) * CHAR_BIT))) != 0;
```

But this is incorrect: it has undefined behavior. Correct it by adding two characters.

`(x & (1UL << (sizeof(x) * CHAR_BIT - 1))) != 0`

## 3. Memory errors and garbage collection (10 points)

Recall that a conservative garbage collector is a program that can automatically free dynamically-allocated memory by detecting when that memory is no longer referenced. Such a GC works by scanning memory for currently-referenced pointers, starting from stack and global memory, and recursing over each referenced object until all referenced memory has been scanned. We built a conservative garbage collector in lecture datarep6.

QUESTION 3A. An application program that uses conservative GC, and does not call `free` directly, will avoid certain errors and undefined behaviors. Which of the following errors are avoided? List all that apply.

1. Use-after-free
2. Double free
3. Signed integer overflow
4. Boundary write error
5. Unaligned access
1, 2. Only the errors that relate to freeing memory are solved.

QUESTION 3B. Write a C program that leaks unbounded memory without GC, but does not do so with GC. You should need less than 5 lines. (Leaking “unbounded” memory means the program will exhaust the memory capacity of any machine on which it runs.)

```while (1) {
(void) malloc(1);
}
```

QUESTION 3C. Not every valid C program works with a conservative GC, because the C abstract machine allows a program to manipulate pointers in strange ways. Which of the following pointer manipulations might cause the conservative GC from class to inappropriately free a memory allocation? List all that apply.

1. Storing the pointer in a `uintptr_t` variable
2. Writing the pointer to a disk file and reading it back later
3. Using the least-significant bit of the pointer to store a flag:
```int* set_ptrflag(int* x, int flagval) {
return (int*) ((uintptr_t) x | (flagval ? 1 : 0));
}
int get_ptrflag(int* x) {
return (uintptr_t) x & 1;
}
int deref_ptrflag(int* x) {
return *((int*) ((uintptr_t) x & ~1UL));
}```
4. Storing the pointer in textual form:
```void save_ptr(char buf[100], void* p) {
sprintf(buf, "%p", p);
}
void* restore_ptr(const char buf[100]) {
void* p;
sscanf(buf, "%p", &p);
return p;
}```
(…option 5 on next page…)
5. Splitting the pointer into two parts and storing the parts in an array:
```typedef union {
unsigned long ival;
unsigned arr[2];
} value;
value save_ptr(void* p) {
value v;
v.arr[0] = (uintptr_t) p & 0xFFFFFFFFUL;
v.arr[1] = ((uintptr_t) p / 4294967296UL) & 0xFFFFFFFFUL;
return v;
}```
2, 4 will obscure the pointer value from GC, thereby leading

to a possible inappropriate free.

1. 1 (storing the pointer in a `uintptr_t` variable) will not obscure the

pointer value because conservative GC is independent of type, and the in-memory representation of a `uintptr_t` is the same as the in-memory representation of the pointer.

1. 3 (the pointer flag) will not obscure the pointer value because our

conservative GC looks for pointers into objects, so if the flag is set, the object will still be found. Also, since malloced pointers are maximally aligned, we will never have a returned pointer from `malloc` with the lower bit set.

1. 5 (the split pointer) will not obscure the pointer value because x86-64 is

little-endian. The in-memory representation of `value` will be the same as

the in-memory representation of `p`.

## 4. Bitwise (6 points)

QUESTION 4A. Consider this C fragment:

```uintptr_t x = ...;
uintptr_t r = 0;
if (a < b) {
r = x;
}
```

Or, shorter:

```uintptr_t r = a < b ? x : 0;
```

Write a single expression that evaluates to the same value, but that does not use the conditional ?: operator. You will use the fact that `a < b` always equals 0 or 1. For full credit, do not use expensive operators (multiply, divide, modulus).

For partial credit, `(a < b) * x` works. For full credit:

`-(uintptr_t) (a < b) & x`

because `-(uintptr_t) (a < b)` equals `(uintptr_t) 0b111111…1` iff `a < b`.

QUESTION 4B. This function returns one more than the index of the least-significant 1 bit in its argument, or 0 if its argument is zero.

```int ffs(unsigned long x) {
for (int i = 0; i < sizeof(x) * CHAR_BIT; ++i) {
if (x & (1UL << i)) {
return i + 1;
}
}
return 0;
}
```

This function runs in O(B) time, where B is the number of bits in an `unsigned long`. Write a version of `ffs` that runs instead in O(log B) time.

The idea we were looking for was binary search over the bits of `x`. Here’s a written-out version:
```int ffs(unsigned long x) {
if (!x) {
return 0;
}
int ans = 1;
if (x & 0xFFFFFFFF00000000UL) {
ans += 32; x >>= 32;
}
if (x & 0xFFFF0000) {
ans += 16; x >>= 16;
}
if (x & 0xFF00) {
ans += 8; x >>= 8;
}
if (x & 0xF0) {
ans += 4; x >>= 4;
}
if (x & 0xC) {
ans += 2; x >>= 2;
}
return ans + (x & 0x2 ? 1 : 0);
}
```

## 5. Golden Baron (12 points)

A very rich person has designed a new x86-64-based computer, the Golden Baron Supercomputer 9000, and is paying you handsomely to write a C compiler for it. There’s just one problem. This person, like many very rich people, is dumb, and on their computer, odd-numbered memory addresses don’t work for data. When data is loaded into a general-purpose register from an odd-numbered address, the value read is zero. For example, consider the following instructions:

```movl \$0x01020304, a(%rip)
movl a(%rip), %eax
```

(where the address of `a` is even). Executed on true x86-64, `%rax` will hold the value 0x01020304; on Golden Baron, `%rax` will hold 0x00020004.

But it is still possible to write a correct C compiler for this ungodly hardware—you just have to work around the bad memory with code. You plan to use two bytes of Golden Baron memory for every one byte of normal x86-64 memory. For instance, an array `int a[2] = {1, 0x0a0b0c0d};` would be stored in 16 bytes of memory, like so:

```01 00 00 00 00 00 00 00 0d 00 0c 00 0b 00 0a 00
```

Pointer arithmetic, and moving multi-byte values to and from registers, must account for the zero bytes that alternate with meaningful bytes. So to read the correct value for `a[2]`, the compiler must arrange to read the bytes at addresses `A+8`, `A+10`, `A+12`, and `A+14`, where `A` is the address of the first byte in `a`.

QUESTION 5A. What should `printf("%zu\n", sizeof(char))` print on Golden Baron?

1. The answer to this question is determined by the C abstract machine, which requires that `sizeof(char) == (size_t) 1` on every machine.

QUESTION 5B. This function

```int f(signed char* c, size_t i) {
return c[i];
}
```

can compile to two instructions on x86-64, including `retq`. It can also compile to two instructions on Golden Baron. (We’re assuming that memory used for Golden Baron instructions works normally.) What are those instructions?

```movsbl (%rdi,%rsi,2), %eax
retq
```
There are other possibilities too, such as `movsbq`.

QUESTION 5C. This function

```int g(int* a, size_t i) {
return a[i];
}
```

can compile to two instructions on x86-64, but Golden Baron requires more work. Write the Golden Baron translation of this function in x86-64 assembly language. For partial credit, write C code that, executed on x86-64, would return the correct value from a Golden Baron-formatted array.

```movzbl (%rdi,%rsi,8), %eax
movzbl 2(%rdi,%rsi,8), %ecx
shll \$8, %ecx
orl %ecx, %eax
movzbl 4(%rdi,%rsi,8), %ecx
shlq \$16, %ecx
orl %ecx, %eax
movzbl 6(%rdi,%rsi,8), %ecx
shlq \$24, %ecx
orl %ecx, %eax
retq
```

Or:

```movq (%rdi,%rsi,8), %rcx
movq %rcx, %rax
andq \$255, %rax

shrq \$8, %rcx
movq %rcx, %rdx
andq \$0xff00, %rdx
orl %edx, %eax

shrq \$8, %rcx
movq %rcx, %rdx
andq \$0xff0000, %rdx
orl %edx, %eax

shrq \$8, %rcx
movq %rcx, %rdx
andq \$0xff000000, %rdx
orl %edx, %eax

retq
```

QUESTION 5D. The Golden Baron’s x86-64 processor actually supports a secret instruction, `swizzleq SRC, REG`, which rearranges the nybbles (the hexadecimal digits—the aligned 4-bit slices) of the destination register `REG` based on the source argument `SRC`. Here’s some examples. Assuming that `%rax` holds the value `0x0123456789ABCDEF`, the following `swizzleq` instructions leave the indicated results in `%rax`:

• `swizzleq \$0, %rax` -> `0xFFFFFFFFFFFFFFFF` (the contents of nybble 0 [bits 0-3, inclusive], are repeated into every nybble)
• `swizzleq \$0xFEDCBA9876543210, %rax` -> `0x0123456789ABCDEF` (each nybble is mapped to its current value: nybble 0 [bits 0-3] is placed in nybble 0 [bits 0-3], nybble 1 in nybble 1, and so forth)
• `swizzleq \$0x0123456701234567, %rax` -> `0xFEDCBA98FEDCBA98` (nybble 0 [bits 0-3] is placed in nybbles 7 and 15 [bits 28-31 and 60-63]; nybble 1 [bits 4-7] is placed in nybbles 6 and 14 [bits 24-27 and 56-59]; etc.)
• `swizzleq \$0xEFCDAB8967452301, %rax` -> `0x1032547698BADCFE` (the nybbles of each byte are exchanged)

Use `swizzleq` to shorten your answer for Part C.

```movzbq (%rdi,%rsi,8), %rax
swizzleq \$0x22222222dc985410, %rax
retq
```

## 6. Instruction behavior (10 points)

QUESTION 6A. Name three different x86-64 instructions that always modify the stack pointer, no matter their arguments (instruction names only; suffixes don’t count, so `movl` and `movq` are the same instruction name).

push, pop, call, ret

QUESTION 6B. Name three different x86-64 instructions that sometimes modify the stack pointer, depending on their arguments.

mov, add, sub, or, and, …

QUESTION 6C. Name three different x86-64 instructions that never modify the stack pointer, no matter their arguments.

jmp, jne, je, jWHATEVER, cmp, test, nop, many others

QUESTION 6D. List three different instructions, including arguments, that if placed immediately before a `retq` instruction that ends a function, will never change the function’s behavior. The instructions should have different names. No funny business: assume the function was compiled from valid C, that relative jumps are fixed up, and that, for example, it doesn’t access its own code.

Many examples:
1. ret :)
2. jmp [next instruction]
3. test (any register), (any register)
4. cmp (any register), (any register)
5. nop
6. movs & arithmetic instructions that involve caller-saved registers other than `%rax`
7. arithmetic instructions that do not modify `%rax`, such as `addq \$0, %rax`

## 7. Calling convention (10 points)

The following questions concern valid C functions compiled using the normal x86-64 calling convention. True or false?

QUESTION 7A. If the function’s instructions do not save and restore any registers, then the C function did not call any other function.

False for two reasons. (1) If this function doesn't use any callee-saved registers, it doesn't need to explicitly save & restore anything. (2) Tail call elimination.

QUESTION 7B. If the function’s instructions do not change the stack pointer, then the function’s instructions do not contain a `call` instruction.

True because of stack alignment.

QUESTION 7C. If the function’s instructions do not change the stack pointer, then the C function did not call any other function. Explain your answer briefly.

False because of tail call elimination.

QUESTION 7D. If the function’s instructions do not modify the `%rax` register, then the C function must return `void`.

False; the function could return the result of calling another function.

QUESTION 7E. If the function’s instructions store a local variable on the stack, then that variable’s address will be less than the function’s initial stack pointer.

True.

## 8. I/O traces (10 points)

QUESTION 8A. Which of the following programs cannot be distinguished by the output of the `strace` utility? List all that apply; if multiple indistinguishable groups exist (e.g., A, B, & C can’t be distinguished, and D & E can’t be distinguished, but the groups can be distinguished from each other), list them all.

1. Sequential byte writes using stdio
2. Sequential byte writes using system calls
3. Sequential byte writes using system calls and `O_SYNC`
4. Sequential block writes using stdio and block size 2
5. Sequential block writes using system calls and block size 2
6. Sequential block writes using system calls and `O_SYNC` and block size 2
7. Sequential block writes using stdio and block size 4096
8. Sequential block writes using system calls and block size 4096
9. Sequential block writes using system calls and `O_SYNC` and block size 4096

1, 4, 7, 8 can’t be distinguished.

If you don’t consider the `open` system call, which lets you distinguish between the `O_SYNC` cases and the others, 2&3 can’t be distinguished, 4&5 can’t be distinguished, and 1,4,7,8,9 can’t be distinguished.

QUESTION 8B. Which of the programs in Part A cannot be distinguished using `blktrace` output? List all that apply.

1, 2, 4, 5, 7, 8 can’t be distinguished. 9 was also considered correct.

QUESTION 8C. The buffer cache is coherent. Which of the following operating system changes could make the buffer cache incoherent? List all that apply.

2. Application programs can obtain direct write access to the disk, bypassing the buffer cache
3. Other computers can communicate with the disk independently
4. The computer has a uninterruptible power supply (UPS), ensuring that the operating system can write the contents of the buffer cache to disk if main power is lost
2, 3.

QUESTION 8D. The stdio cache is incoherent. Which of the operating system changes from Part C could make the stdio cache coherent? List all that apply.

1 only.

## 9. Reference strings and eviction (10 points)

QUESTION 9A. When demonstrating cache eviction in class, we modeled a completely reactive cache, meaning that the cache performed at most one load from slow storage per access. Name a class of reference string that will have a 0% hit rate on any cold reactive cache. For partial credit, give several examples of such reference strings.

Sequential access.

QUESTION 9B. What cache optimization can be used to improve the hit rate for the class of reference string in Part A? One word is enough; put the best choice.

Prefetching. Batching is an OK answer, but mostly because it involves prefetching when done for reads.

QUESTION 9C. Give a single reference string with the following properties:

• There exists a cache size and eviction policy that gives a 70% hit rate for the string.
• There exists a cache size and eviction policy that gives a 0% hit rate for the string.
Example: 1231231231. 70% on any 3-slot cache; 0% on a 1-slot cache.

QUESTION 9D. Put the following eviction algorithms in order of how much space they require for per-slot metadata, starting with the least space and ending with the most space. (Assume the slot order is fixed, so once a block is loaded into slot i, it stays in slot i until it is evicted.) For partial credit say what you think the metadata would be.

1. FIFO
2. LRU
3. Random
Random, FIFO, LRU. Random needs no additional metadata; FIFO can deal with a single integer for the whole cache, pointing to the next index to use; LRU nees a least-recently-used time.

## 10. Cache code (12 points)

Several famous musicians have just started working on CS61 Problem Set 3. They share the following code for their read-only, sequential, single-slot cache:

```struct io61_file {
int fd;
unsigned char buf[4096];
size_t pos;    // position of next character to read in `buf`
size_t sz;     // number of valid characters in `buf`
};

if (f->pos >= f->sz) {
f->pos = f->sz = 0;
ssize_t nr = read(f->fd, f->buf, sizeof(f->buf));
if (nr <= 0) {
f->sz = 0;
return -1;
} else {
f->sz = nr;
}
}
int ch = f->buf[f->pos];
++f->pos;
return ch;
}
```

But they have different `io61_read` implementations. Donald (Lambert)’s is:

```ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
}
```

Solange (Knowles)’s is:

```ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
for (size_t pos = 0; pos < sz; ++pos, ++buf) {
}
return sz;
}
```

Caroline (Shaw)’s is:

```ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
if (f->pos >= f->sz) {
} else {
if (ch < 0) {
return 0;
}
*buf = ch;
return io61_read(f, buf + 1, sz - 1) + 1;
}
}
```

You are testing each of these musicians’ codes by executing a sequence of `io61_readc` and/or `io61_read` calls on an input file and printing the resulting characters to standard output. There are no seeks, and your test programs print until end of file, so your tests’ output should equal the input file’s contents.

You should assume for these questions that no `read` system call ever returns -1.

QUESTION 10A. Describe an access pattern—that is, a sequence of `io61_readc` and/or `io61_read` calls (with lengths)—for which Donald’s code can return incorrect data.

`io61_readc`, `io61_read(1)`, …. Any alternation between `readc`

and `read` is a disaster, because data cached by `io61_readc` should be returned by `io61_read`,

but will not be.

QUESTION 10B. Which of these musicians’ codes can generate an output file with incorrect length?

Solange’s code never returns EOF from `io61_read`, so it will read too many bytes.

If a program using Donald’s code calls `io61_readc` once and then switches to `io61_read`, then they will read too few bytes (bytes in the `readc` buffer won’t be returned).

Caroline’s code is correct!

For the remaining parts, assume the problem in Part B has been corrected, so that all musicians’ codes generate output files with correct lengths.

QUESTION 10C. Give an access pattern for which Solange’s code will return correct data and outperform Donald’s, or vice versa, and say whose code will win.

Solange’s code will outperform Donald’s when `io61_read` is called with small sizes. Donald’s code will outperform Solange’s when `io61_read` is called with large sizes.

QUESTION 10D. Suggest a small change (≤10 characters) to Caroline’s code that would, most likely, make it perform at least as well as both Solange’s and Donald’s codes on all access patterns. Explain briefly.

I would change Caroline’s test from `if (f->pos >= f->sz)` to `if (f->pos >= f->sz && sz >= 1024)` (or 4096, or something similar). This uses the cache when `read` is called with small sizes, but avoids the extra copy when `read` is called with large sizes.

## 11. Feedback (2 points)

QUESTION 11A. What aspects of the course are working well for you? Be brief; any answer but no answer will receive credit.

QUESTION 11B. What aspects of the course are not working well for you? Be brief; any answer but no answer will receive credit.