2017 Midterm Solutions

General overview

Assume a CS61 VM running the x86-64 architecture unless otherwise stated. If you get stuck, move on to the next question. If you’re confused, explain your thinking briefly for potential partial credit.

1. Sizes and alignments (10 points)

Here’s a test struct with n members. Assume an x86-64 machine, where each Ti either is a basic x86-64 type (e.g., int, char, double) or is a type derived from such types (e.g., arrays, structs, pointers, unions, possibly recursively), and assume that ai≤8 for all i.

struct test {
    T1 m1;      // sizeof(T1) == s1, __alignof__(T1) == a1
    T2 m2;      // sizeof(T2) == s2, __alignof__(T2) == a2
    ...
    Tn mn;      // sizeof(Tn) == sn, __alignof__(Tn) == an
};

In these questions, you will compare this struct with other structs that have the same members, but in other orders.

QUESTION 1A. True or false: The size of struct test is minimized when its members are sorted by size. In other words, if s1≤s2≤…≤sn, then sizeof(struct test) is less than or equal to the struct size for any other member order.

If true, briefly explain your answer; if false, give a counterexample (i.e., concrete types for T1, …, Tn that do not minimize sizeof(struct test)).

False, because some data types can have large sizes but small alignments. Consider T1 = char, T2 = int, T3 = char[5]. On x86-64 this has size 16 (T1 = 1 byte; 3 padding bytes; T2 = 4 bytes; T3 = 5 bytes; and finally 3 padding bytes so the result has size equal to a multiple of the alignment, which is 4), but the order T3, T1, T2 would have size 12.

QUESTION 1B. True or false: The size of struct test is minimized when its members are sorted by alignment. In other words, if a1≤a2≤…≤an, then sizeof(struct test) is less than or equal to the struct size for any other member order.

If true, briefly explain your answer; if false, give a counterexample.

True. Padding is added only between objects with different alignments, and the amount of padding added is always smaller than the alignment of the second object. (As a special case, padding is added at the end of the struct to ensure the struct’s size is a multiple of its alignment, but this can be modeled in the same way.) Sorting by alignment therefore minimizes padding.

QUESTION 1C. True or false: The alignment of struct test is minimized when its members are sorted in increasing order by alignment. In other words, if a1≤a2≤…≤an, then __alignof__(struct test) is less than or equal to the struct alignment for any other member order.

If true, briefly explain your answer; if false, give a counterexample.

True. It’s all the same; alignment is max alignment of every component, and is independent of order.

QUESTION 1D. What is the maximum number of bytes of padding that struct test could contain for a given n? The answer will be a pretty simple formula involving n. (Remember that ai≤8 for all i.)

Alternating char and long gives the most padding, which is 7*(n/2) when n is even and 7*(n+1)/2 otherwise.

QUESTION 1E. What is the minimum number of bytes of padding that struct test could contain for a given n?

0 :)

2. Undefined behavior (10 points)

QUESTION 2A. Sometimes a conforming C compiler can assume that a + 1 \> a, and sometimes it can’t. For each type below, consider this expression:

a + (int) 1 > a

and say whether the compiler:

Must reject the expression as a type error.
May assume that the expression is true (that a + (int) 1 \> a for all a).
Must not assume that the expression is true.

int a
unsigned a
char\* a
unsigned char a
struct {int m;} a

May assume, because signed integer overflow (INT_MAX + 1) is undefined behavior.
Must not assume, because UINT_MAX + 1 == 0 without causing undefined behavior.
May assume, because pointer arithmetic overflow is undefined behavior.
May assume, because the maximum unsigned char is 255, and 255 + (int) 1 == (int) 256. This is called integer promotion.
Must reject

QUESTION 2B. The following code checks its arguments for sanity, but not well: each check can cause undefined behavior.

void sanity_check(int* array, size_t array_size, int* ptr_into_array) {
    if (array + array_size < array) {
`         fprintf(stderr, "`array` is so big that it wraps around!\n"); `
        abort();
    }
    if (ptr_into_array < array || ptr_into_array > array + array_size) {
`         fprintf(stderr, "`ptr_into_array` doesn’t point into the array!\n"); `
        abort();
    }
    ...

Rewrite these checks to avoid all undefined behavior. You will likely add one or more casts to uintptr_t. For full credit, write each check as a single comparison (no && or ||, even though the current ptr_into_array check uses ||).


`array_size` check:
`ptr_into_array` check:

array_size`: `(uintptr_t) array + 4 * array_size < (uintptr_t) array

ptr_into_array:

(uintptr_t) ptr_into_array - (uintptr_t) array > 4 * array_size

where 4 is shorthand for sizeof(int).

QUESTION 2C. In Lecture datarep4, we discussed several ways to tell if a signed integer x is negative. One of them was the following:

int isnegative = (x & (1UL << (sizeof(x) * CHAR_BIT))) != 0;

But this is incorrect: it has undefined behavior. Correct it by adding two characters.

(x & (1UL << (sizeof(x) * CHAR_BIT `<b>`- 1`</b>`))) != 0

3. Memory errors and garbage collection (10 points)

Recall that a conservative garbage collector is a program that can automatically free dynamically-allocated memory by detecting when that memory is no longer referenced. Such a GC works by scanning memory for currently-referenced pointers, starting from stack and global memory, and recursing over each referenced object until all referenced memory has been scanned. We built a conservative garbage collector in lecture datarep6.

QUESTION 3A. An application program that uses conservative GC, and does not call free directly, will avoid certain errors and undefined behaviors. Which of the following errors are avoided? List all that apply.

Use-after-free
Double free
Signed integer overflow
Boundary write error
Unaligned access

1, 2. Only the errors that relate to freeing memory are solved.

QUESTION 3B. Write a C program that leaks unbounded memory without GC, but does not do so with GC. You should need less than 5 lines. (Leaking “unbounded” memory means the program will exhaust the memory capacity of any machine on which it runs.)

while (1) {
    (void) malloc(1);
}

QUESTION 3C. Not every valid C program works with a conservative GC, because the C abstract machine allows a program to manipulate pointers in strange ways. Which of the following pointer manipulations might cause the conservative GC from class to inappropriately free a memory allocation? List all that apply.

Storing the pointer in a uintptr_t variable
Writing the pointer to a disk file and reading it back later
Using the least-significant bit of the pointer to store a flag: int* set_ptrflag(int* x, int flagval) { return (int*) ((uintptr_t) x | (flagval ? 1 : 0)); } int get_ptrflag(int* x) { return (uintptr_t) x & 1; } int deref_ptrflag(int* x) { return ((int) ((uintptr_t) x & ~1UL)); }
Storing the pointer in textual form: void save_ptr(char buf[100], void* p) { sprintf(buf, "%p", p); } void* restore_ptr(const char buf[100]) { void* p; sscanf(buf, "%p", &p); return p; }

(…option 5 on next page…)
Splitting the pointer into two parts and storing the parts in an array: typedef union { unsigned long ival; unsigned arr[2]; } value; value save_ptr(void* p) { value v; v.arr[0] = (uintptr_t) p & 0xFFFFFFFFUL; v.arr[1] = ((uintptr_t) p / 4294967296UL) & 0xFFFFFFFFUL; return v; }

2, 4 will obscure the pointer value from GC, thereby leading to a possible inappropriate free.

1 (storing the pointer in a uintptr_t variable) will not obscure the

pointer value because conservative GC is independent of type, and the in-memory representation of a uintptr_t is the same as the in-memory representation of the pointer.

3 (the pointer flag) will not obscure the pointer value because our

conservative GC looks for pointers into objects, so if the flag is set, the object will still be found. Also, since malloced pointers are maximally aligned, we will never have a returned pointer from malloc with the lower bit set.

5 (the split pointer) will not obscure the pointer value because x86-64 is

little-endian. The in-memory representation of value will be the same as the in-memory representation of p.

4. Bitwise (6 points)

QUESTION 4A. Consider this C fragment:

uintptr_t x = ...;
uintptr_t r = 0;
if (a < b) {
    r = x;
}

Or, shorter:

uintptr_t r = a < b ? x : 0;

Write a single expression that evaluates to the same value, but that does not use the conditional ?: operator. You will use the fact that a \< b always equals 0 or 1. For full credit, do not use expensive operators (multiply, divide, modulus).

For partial credit, (a \< b) \* x works. For full credit:

-(uintptr_t) (a < b) & x

because -(uintptr_t) (a \< b) equals (uintptr_t) 0b111111…1 iff a \< b.

QUESTION 4B. This function returns one more than the index of the least-significant 1 bit in its argument, or 0 if its argument is zero.

int ffs(unsigned long x) {
    for (int i = 0; i < sizeof(x) * CHAR_BIT; ++i) {
        if (x & (1UL << i)) {
            return i + 1;
        }
    }
    return 0;
}

This function runs in O(B) time, where B is the number of bits in an unsigned long. Write a version of ffs that runs instead in O(log B) time.

The idea we were looking for was binary search over the bits of x. Here’s a written-out version:

int ffs(unsigned long x) {
    if (!x) {
        return 0;
    }
    int ans = 1;
    if (x & 0xFFFFFFFF00000000UL) {
        ans += 32; x >>= 32;
    }
    if (x & 0xFFFF0000) {
        ans += 16; x >>= 16;
    }
    if (x & 0xFF00) {
        ans += 8; x >>= 8;
    }
    if (x & 0xF0) {
        ans += 4; x >>= 4;
    }
    if (x & 0xC) {
        ans += 2; x >>= 2;
    }
    return ans + (x & 0x2 ? 1 : 0);
}

We saw many similar answers.

5. Golden Baron (12 points)

A very rich person has designed a new x86-64-based computer, the Golden Baron Supercomputer 9000, and is paying you handsomely to write a C compiler for it. There’s just one problem. This person, like many very rich people, is dumb, and on their computer, odd-numbered memory addresses don’t work for data. When data is loaded into a general-purpose register from an odd-numbered address, the value read is zero. For example, consider the following instructions:

movl $0x01020304, a(%rip)
movl a(%rip), %eax

(where the address of a is even). Executed on true x86-64, %rax will hold the value 0x01020304; on Golden Baron, %rax will hold 0x00020004.

But it is still possible to write a correct C compiler for this ungodly hardware—you just have to work around the bad memory with code. You plan to use two bytes of Golden Baron memory for every one byte of normal x86-64 memory. For instance, an array int a[2] = {1, 0x0a0b0c0d}; would be stored in 16 bytes of memory, like so:

01 00 00 00 00 00 00 00 0d 00 0c 00 0b 00 0a 00

Pointer arithmetic, and moving multi-byte values to and from registers, must account for the zero bytes that alternate with meaningful bytes. So to read the correct value for a[2], the compiler must arrange to read the bytes at addresses A+8, A+10, A+12, and A+14, where A is the address of the first byte in a.

QUESTION 5A. What should printf("%zu\n", sizeof(char)) print on Golden Baron?

1. The answer to this question is determined by the C abstract machine, which requires that sizeof(char) == (size_t) 1 on every machine.

QUESTION 5B. This function

int f(signed char* c, size_t i) {
    return c[i];
}

can compile to two instructions on x86-64, including retq. It can also compile to two instructions on Golden Baron. (We’re assuming that memory used for Golden Baron instructions works normally.) What are those instructions?

movsbl (%rdi,%rsi,2), %eax
retq

There are other possibilities too, such as movsbq.

QUESTION 5C. This function

int g(int* a, size_t i) {
    return a[i];
}

can compile to two instructions on x86-64, but Golden Baron requires more work. Write the Golden Baron translation of this function in x86-64 assembly language. For partial credit, write C code that, executed on x86-64, would return the correct value from a Golden Baron-formatted array.

movzbl (%rdi,%rsi,8), %eax
movzbl 2(%rdi,%rsi,8), %ecx
shll $8, %ecx
orl %ecx, %eax
movzbl 4(%rdi,%rsi,8), %ecx
shlq $16, %ecx
orl %ecx, %eax
movzbl 6(%rdi,%rsi,8), %ecx
shlq $24, %ecx
orl %ecx, %eax
retq

Or:

movq (%rdi,%rsi,8), %rcx
movq %rcx, %rax
andq $255, %rax

shrq $8, %rcx
movq %rcx, %rdx
andq $0xff00, %rdx
orl %edx, %eax

shrq $8, %rcx
movq %rcx, %rdx
andq $0xff0000, %rdx
orl %edx, %eax

shrq $8, %rcx
movq %rcx, %rdx
andq $0xff000000, %rdx
orl %edx, %eax

retq

QUESTION 5D. The Golden Baron’s x86-64 processor actually supports a secret instruction, swizzleq SRC, REG, which rearranges the nybbles (the hexadecimal digits—the aligned 4-bit slices) of the destination register REG based on the source argument SRC. Here’s some examples. Assuming that %rax holds the value 0x0123456789ABCDEF, the following swizzleq instructions leave the indicated results in %rax:

swizzleq \$0, %rax -> 0xFFFFFFFFFFFFFFFF (the contents of nybble 0 [bits 0-3, inclusive], are repeated into every nybble)
swizzleq \$0xFEDCBA9876543210, %rax -> 0x0123456789ABCDEF (each nybble is mapped to its current value: nybble 0 [bits 0-3] is placed in nybble 0 [bits 0-3], nybble 1 in nybble 1, and so forth)
swizzleq \$0x0123456701234567, %rax -> 0xFEDCBA98FEDCBA98 (nybble 0 [bits 0-3] is placed in nybbles 7 and 15 [bits 28-31 and 60-63]; nybble 1 [bits 4-7] is placed in nybbles 6 and 14 [bits 24-27 and 56-59]; etc.)
swizzleq \$0xEFCDAB8967452301, %rax -> 0x1032547698BADCFE (the nybbles of each byte are exchanged)

Use swizzleq to shorten your answer for Part C.

movzbq (%rdi,%rsi,8), %rax
swizzleq $0x22222222dc985410, %rax
retq

6. Instruction behavior (10 points)

QUESTION 6A. Name three different x86-64 instructions that always modify the stack pointer, no matter their arguments (instruction names only; suffixes don’t count, so movl and movq are the same instruction name).

push, pop, call, ret

QUESTION 6B. Name three different x86-64 instructions that sometimes modify the stack pointer, depending on their arguments.

mov, add, sub, or, and, …

QUESTION 6C. Name three different x86-64 instructions that never modify the stack pointer, no matter their arguments.

jmp, jne, je, jWHATEVER, cmp, test, nop, many others

QUESTION 6D. List three different instructions, including arguments, that if placed immediately before a retq instruction that ends a function, will never change the function’s behavior. The instructions should have different names. No funny business: assume the function was compiled from valid C, that relative jumps are fixed up, and that, for example, it doesn’t access its own code.

Many examples:

ret :)
jmp [next instruction]
test (any register), (any register)
cmp (any register), (any register)
nop
movs & arithmetic instructions that involve caller-saved registers other than %rax
arithmetic instructions that do not modify %rax, such as addq $0, %rax

7. Calling convention (10 points)

The following questions concern valid C functions compiled using the normal x86-64 calling convention. True or false?

QUESTION 7A. If the function’s instructions do not save and restore any registers, then the C function did not call any other function.

False for two reasons. (1) If this function doesn't use any callee-saved registers, it doesn't need to explicitly save & restore anything. (2) Tail call elimination.

QUESTION 7B. If the function’s instructions do not change the stack pointer, then the function’s instructions do not contain a call instruction.

True because of stack alignment.

QUESTION 7C. If the function’s instructions do not change the stack pointer, then the C function did not call any other function. Explain your answer briefly.

False because of tail call elimination.

QUESTION 7D. If the function’s instructions do not modify the %rax register, then the C function must return void.

False; the function could return the result of calling another function.

QUESTION 7E. If the function’s instructions store a local variable on the stack, then that variable’s address will be less than the function’s initial stack pointer.

True.

8. I/O traces (10 points)

QUESTION 8A. Which of the following programs cannot be distinguished by the output of the strace utility? List all that apply; if multiple indistinguishable groups exist (e.g., A, B, & C can’t be distinguished, and D & E can’t be distinguished, but the groups can be distinguished from each other), list them all.

Sequential byte writes using stdio
Sequential byte writes using system calls
Sequential byte writes using system calls and O_SYNC
Sequential block writes using stdio and block size 2
Sequential block writes using system calls and block size 2
Sequential block writes using system calls and O_SYNC and block size 2
Sequential block writes using stdio and block size 4096
Sequential block writes using system calls and block size 4096
Sequential block writes using system calls and O_SYNC and block size 4096

1, 4, 7, 8 can’t be distinguished.

If you don’t consider the open system call, which lets you distinguish between the O_SYNC cases and the others, 2&3 can’t be distinguished, 4&5 can’t be distinguished, and 1,4,7,8,9 can’t be distinguished.

QUESTION 8B. Which of the programs in Part A cannot be distinguished using blktrace output? List all that apply.

1, 2, 4, 5, 7, 8 can’t be distinguished. 9 was also considered correct.

QUESTION 8C. The buffer cache is coherent. Which of the following operating system changes could make the buffer cache incoherent? List all that apply.

Application programs can obtain direct read access to the buffer cache
Application programs can obtain direct write access to the disk, bypassing the buffer cache
Other computers can communicate with the disk independently
The computer has a uninterruptible power supply (UPS), ensuring that the operating system can write the contents of the buffer cache to disk if main power is lost

2, 3.

QUESTION 8D. The stdio cache is incoherent. Which of the operating system changes from Part C could make the stdio cache coherent? List all that apply.

1 only.

9. Reference strings and eviction (10 points)

QUESTION 9A. When demonstrating cache eviction in class, we modeled a completely reactive cache, meaning that the cache performed at most one load from slow storage per access. Name a class of reference string that will have a 0% hit rate on any cold reactive cache. For partial credit, give several examples of such reference strings.

Sequential access.

QUESTION 9B. What cache optimization can be used to improve the hit rate for the class of reference string in Part A? One word is enough; put the best choice.

Prefetching. Batching is an OK answer, but mostly because it involves prefetching when done for reads.

QUESTION 9C. Give a single reference string with the following properties:

There exists a cache size and eviction policy that gives a 70% hit rate for the string.
There exists a cache size and eviction policy that gives a 0% hit rate for the string.

Example: 1231231231. 70% on any 3-slot cache; 0% on a 1-slot cache.

QUESTION 9D. Put the following eviction algorithms in order of how much space they require for per-slot metadata, starting with the least space and ending with the most space. (Assume the slot order is fixed, so once a block is loaded into slot i, it stays in slot i until it is evicted.) For partial credit say what you think the metadata would be.

FIFO
LRU
Random

Random, FIFO, LRU. Random needs no additional metadata; FIFO can deal with a single integer for the whole cache, pointing to the next index to use; LRU nees a least-recently-used time.

10. Cache code (12 points)

Several famous musicians have just started working on CS61 Problem Set 3. They share the following code for their read-only, sequential, single-slot cache:

struct io61_file {
    int fd;
    unsigned char buf[4096];
`     size_t pos;    // position of next character to read in `buf` `
`     size_t sz;     // number of valid characters in `buf` `
};

int io61_readc(io61_file* f) {
    if (f->pos >= f->sz) {
        f->pos = f->sz = 0;
        ssize_t nr = read(f->fd, f->buf, sizeof(f->buf));
        if (nr <= 0) {
            f->sz = 0;
            return -1;
        } else {
            f->sz = nr;
        }
    }
    int ch = f->buf[f->pos];
    ++f->pos;
    return ch;
}

But they have different io61_read implementations. Donald (Lambert)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    return read(f->fd, buf, sz);
}

Solange (Knowles)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    for (size_t pos = 0; pos < sz; ++pos, ++buf) {
        *buf = io61_readc(f);
    }
    return sz;
}

Caroline (Shaw)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    if (f->pos >= f->sz) {
        return read(f->fd, buf, sz);
    } else {
        int ch = io61_readc(f);
        if (ch < 0) {
            return 0;
        }
        *buf = ch;
        return io61_read(f, buf + 1, sz - 1) + 1;
    }
}

You are testing each of these musicians’ codes by executing a sequence of io61_readc and/or io61_read calls on an input file and printing the resulting characters to standard output. There are no seeks, and your test programs print until end of file, so your tests’ output should equal the input file’s contents.

You should assume for these questions that no read system call ever returns -1.

QUESTION 10A. Describe an access pattern—that is, a sequence of io61_readc and/or io61_read calls (with lengths)—for which Donald’s code can return incorrect data.

io61_readc`, `io61_read(1)`, …. Any alternation between `readc

and read is a disaster, because data cached by io61_readc should be returned by io61_read, but will not be.

QUESTION 10B. Which of these musicians’ codes can generate an output file with incorrect length?

Solange’s code never returns EOF from io61_read, so it will read too many bytes.

If a program using Donald’s code calls io61_readc once and then switches to io61_read, then they will read too few bytes (bytes in the readc buffer won’t be returned).

Caroline’s code is correct!

For the remaining parts, assume the problem in Part B has been corrected, so that all musicians’ codes generate output files with correct lengths.

QUESTION 10C. Give an access pattern for which Solange’s code will return correct data and outperform Donald’s, or vice versa, and say whose code will win.

Solange’s code will outperform Donald’s when io61_read is called with small sizes. Donald’s code will outperform Solange’s when io61_read is called with large sizes.

QUESTION 10D. Suggest a small change (≤10 characters) to Caroline’s code that would, most likely, make it perform at least as well as both Solange’s and Donald’s codes on all access patterns. Explain briefly.

I would change Caroline’s test from if (f-\>pos \>= f-\>sz) to if (f-\>pos \>= f-\>sz && sz \>= 1024) (or 4096, or something similar). This uses the cache when read is called with small sizes, but avoids the extra copy when read is called with large sizes.

11. Feedback (2 points)

QUESTION 11A. What aspects of the course are working well for you? Be brief; any answer but no answer will receive credit.

QUESTION 11B. What aspects of the course are not working well for you? Be brief; any answer but no answer will receive credit.