2017 Midterm

General overview

Assume a CS61 VM running the x86-64 architecture unless otherwise stated. If you get stuck, move on to the next question. If you’re confused, explain your thinking briefly for potential partial credit.

1. Sizes and alignments (10 points)

Here’s a test struct with n members. Assume an x86-64 machine, where each Ti either is a basic x86-64 type (e.g., int, char, double) or is a type derived from such types (e.g., arrays, structs, pointers, unions, possibly recursively), and assume that ai≤8 for all i.

struct test {
    T1 m1;      // sizeof(T1) == s1, __alignof__(T1) == a1
    T2 m2;      // sizeof(T2) == s2, __alignof__(T2) == a2
    ...
    Tn mn;      // sizeof(Tn) == sn, __alignof__(Tn) == an
};

In these questions, you will compare this struct with other structs that have the same members, but in other orders.

QUESTION 1A. True or false: The size of struct test is minimized when its members are sorted by size. In other words, if s1≤s2≤…≤sn, then sizeof(struct test) is less than or equal to the struct size for any other member order.

If true, briefly explain your answer; if false, give a counterexample (i.e., concrete types for T1, …, Tn that do not minimize sizeof(struct test)).

QUESTION 1B. True or false: The size of struct test is minimized when its members are sorted by alignment. In other words, if a1≤a2≤…≤an, then sizeof(struct test) is less than or equal to the struct size for any other member order.

If true, briefly explain your answer; if false, give a counterexample.

QUESTION 1C. True or false: The alignment of struct test is minimized when its members are sorted in increasing order by alignment. In other words, if a1≤a2≤…≤an, then __alignof__(struct test) is less than or equal to the struct alignment for any other member order.

If true, briefly explain your answer; if false, give a counterexample.

QUESTION 1D. What is the maximum number of bytes of padding that struct test could contain for a given n? The answer will be a pretty simple formula involving n. (Remember that ai≤8 for all i.)

QUESTION 1E. What is the minimum number of bytes of padding that struct test could contain for a given n?

2. Undefined behavior (10 points)

QUESTION 2A. Sometimes a conforming C compiler can assume that a + 1 \> a, and sometimes it can’t. For each type below, consider this expression:

a + (int) 1 > a

and say whether the compiler:

Must reject the expression as a type error.
May assume that the expression is true (that a + (int) 1 \> a for all a).
Must not assume that the expression is true.

int a
unsigned a
char\* a
unsigned char a
struct {int m;} a

QUESTION 2B. The following code checks its arguments for sanity, but not well: each check can cause undefined behavior.

void sanity_check(int* array, size_t array_size, int* ptr_into_array) {
    if (array + array_size < array) {
`         fprintf(stderr, "`array` is so big that it wraps around!\n"); `
        abort();
    }
    if (ptr_into_array < array || ptr_into_array > array + array_size) {
`         fprintf(stderr, "`ptr_into_array` doesn’t point into the array!\n"); `
        abort();
    }
    ...

Rewrite these checks to avoid all undefined behavior. You will likely add one or more casts to uintptr_t. For full credit, write each check as a single comparison (no && or ||, even though the current ptr_into_array check uses ||).


`array_size` check:
`ptr_into_array` check:

QUESTION 2C. In Lecture datarep4, we discussed several ways to tell if a signed integer x is negative. One of them was the following:

int isnegative = (x & (1UL << (sizeof(x) * CHAR_BIT))) != 0;

But this is incorrect: it has undefined behavior. Correct it by adding two characters.

3. Memory errors and garbage collection (10 points)

Recall that a conservative garbage collector is a program that can automatically free dynamically-allocated memory by detecting when that memory is no longer referenced. Such a GC works by scanning memory for currently-referenced pointers, starting from stack and global memory, and recursing over each referenced object until all referenced memory has been scanned. We built a conservative garbage collector in lecture datarep6.

QUESTION 3A. An application program that uses conservative GC, and does not call free directly, will avoid certain errors and undefined behaviors. Which of the following errors are avoided? List all that apply.

Use-after-free
Double free
Signed integer overflow
Boundary write error
Unaligned access

QUESTION 3B. Write a C program that leaks unbounded memory without GC, but does not do so with GC. You should need less than 5 lines. (Leaking “unbounded” memory means the program will exhaust the memory capacity of any machine on which it runs.)

QUESTION 3C. Not every valid C program works with a conservative GC, because the C abstract machine allows a program to manipulate pointers in strange ways. Which of the following pointer manipulations might cause the conservative GC from class to inappropriately free a memory allocation? List all that apply.

Storing the pointer in a uintptr_t variable
Writing the pointer to a disk file and reading it back later
Using the least-significant bit of the pointer to store a flag: int* set_ptrflag(int* x, int flagval) { return (int*) ((uintptr_t) x | (flagval ? 1 : 0)); } int get_ptrflag(int* x) { return (uintptr_t) x & 1; } int deref_ptrflag(int* x) { return ((int) ((uintptr_t) x & ~1UL)); }
Storing the pointer in textual form: void save_ptr(char buf[100], void* p) { sprintf(buf, "%p", p); } void* restore_ptr(const char buf[100]) { void* p; sscanf(buf, "%p", &p); return p; }

(…option 5 on next page…)
Splitting the pointer into two parts and storing the parts in an array: typedef union { unsigned long ival; unsigned arr[2]; } value; value save_ptr(void* p) { value v; v.arr[0] = (uintptr_t) p & 0xFFFFFFFFUL; v.arr[1] = ((uintptr_t) p / 4294967296UL) & 0xFFFFFFFFUL; return v; }

4. Bitwise (6 points)

QUESTION 4A. Consider this C fragment:

uintptr_t x = ...;
uintptr_t r = 0;
if (a < b) {
    r = x;
}

Or, shorter:

uintptr_t r = a < b ? x : 0;

Write a single expression that evaluates to the same value, but that does not use the conditional ?: operator. You will use the fact that a \< b always equals 0 or 1. For full credit, do not use expensive operators (multiply, divide, modulus).

QUESTION 4B. This function returns one more than the index of the least-significant 1 bit in its argument, or 0 if its argument is zero.

int ffs(unsigned long x) {
    for (int i = 0; i < sizeof(x) * CHAR_BIT; ++i) {
        if (x & (1UL << i)) {
            return i + 1;
        }
    }
    return 0;
}

This function runs in O(B) time, where B is the number of bits in an unsigned long. Write a version of ffs that runs instead in O(log B) time.

5. Golden Baron (12 points)

A very rich person has designed a new x86-64-based computer, the Golden Baron Supercomputer 9000, and is paying you handsomely to write a C compiler for it. There’s just one problem. This person, like many very rich people, is dumb, and on their computer, odd-numbered memory addresses don’t work for data. When data is loaded into a general-purpose register from an odd-numbered address, the value read is zero. For example, consider the following instructions:

movl $0x01020304, a(%rip)
movl a(%rip), %eax

(where the address of a is even). Executed on true x86-64, %rax will hold the value 0x01020304; on Golden Baron, %rax will hold 0x00020004.

But it is still possible to write a correct C compiler for this ungodly hardware—you just have to work around the bad memory with code. You plan to use two bytes of Golden Baron memory for every one byte of normal x86-64 memory. For instance, an array int a[2] = {1, 0x0a0b0c0d}; would be stored in 16 bytes of memory, like so:

01 00 00 00 00 00 00 00 0d 00 0c 00 0b 00 0a 00

Pointer arithmetic, and moving multi-byte values to and from registers, must account for the zero bytes that alternate with meaningful bytes. So to read the correct value for a[2], the compiler must arrange to read the bytes at addresses A+8, A+10, A+12, and A+14, where A is the address of the first byte in a.

QUESTION 5A. What should printf("%zu\n", sizeof(char)) print on Golden Baron?

QUESTION 5B. This function

int f(signed char* c, size_t i) {
    return c[i];
}

can compile to two instructions on x86-64, including retq. It can also compile to two instructions on Golden Baron. (We’re assuming that memory used for Golden Baron instructions works normally.) What are those instructions?

QUESTION 5C. This function

int g(int* a, size_t i) {
    return a[i];
}

can compile to two instructions on x86-64, but Golden Baron requires more work. Write the Golden Baron translation of this function in x86-64 assembly language. For partial credit, write C code that, executed on x86-64, would return the correct value from a Golden Baron-formatted array.

QUESTION 5D. The Golden Baron’s x86-64 processor actually supports a secret instruction, swizzleq SRC, REG, which rearranges the nybbles (the hexadecimal digits—the aligned 4-bit slices) of the destination register REG based on the source argument SRC. Here’s some examples. Assuming that %rax holds the value 0x0123456789ABCDEF, the following swizzleq instructions leave the indicated results in %rax:

swizzleq \$0, %rax -> 0xFFFFFFFFFFFFFFFF (the contents of nybble 0 [bits 0-3, inclusive], are repeated into every nybble)
swizzleq \$0xFEDCBA9876543210, %rax -> 0x0123456789ABCDEF (each nybble is mapped to its current value: nybble 0 [bits 0-3] is placed in nybble 0 [bits 0-3], nybble 1 in nybble 1, and so forth)
swizzleq \$0x0123456701234567, %rax -> 0xFEDCBA98FEDCBA98 (nybble 0 [bits 0-3] is placed in nybbles 7 and 15 [bits 28-31 and 60-63]; nybble 1 [bits 4-7] is placed in nybbles 6 and 14 [bits 24-27 and 56-59]; etc.)
swizzleq \$0xEFCDAB8967452301, %rax -> 0x1032547698BADCFE (the nybbles of each byte are exchanged)

Use swizzleq to shorten your answer for Part C.

6. Instruction behavior (10 points)

QUESTION 6A. Name three different x86-64 instructions that always modify the stack pointer, no matter their arguments (instruction names only; suffixes don’t count, so movl and movq are the same instruction name).

QUESTION 6B. Name three different x86-64 instructions that sometimes modify the stack pointer, depending on their arguments.

QUESTION 6C. Name three different x86-64 instructions that never modify the stack pointer, no matter their arguments.

QUESTION 6D. List three different instructions, including arguments, that if placed immediately before a retq instruction that ends a function, will never change the function’s behavior. The instructions should have different names. No funny business: assume the function was compiled from valid C, that relative jumps are fixed up, and that, for example, it doesn’t access its own code.

7. Calling convention (10 points)

The following questions concern valid C functions compiled using the normal x86-64 calling convention. True or false?

QUESTION 7A. If the function’s instructions do not save and restore any registers, then the C function did not call any other function.

QUESTION 7B. If the function’s instructions do not change the stack pointer, then the function’s instructions do not contain a call instruction.

QUESTION 7C. If the function’s instructions do not change the stack pointer, then the C function did not call any other function. Explain your answer briefly.

QUESTION 7D. If the function’s instructions do not modify the %rax register, then the C function must return void.

QUESTION 7E. If the function’s instructions store a local variable on the stack, then that variable’s address will be less than the function’s initial stack pointer.

8. I/O traces (10 points)

QUESTION 8A. Which of the following programs cannot be distinguished by the output of the strace utility? List all that apply; if multiple indistinguishable groups exist (e.g., A, B, & C can’t be distinguished, and D & E can’t be distinguished, but the groups can be distinguished from each other), list them all.

Sequential byte writes using stdio
Sequential byte writes using system calls
Sequential byte writes using system calls and O_SYNC
Sequential block writes using stdio and block size 2
Sequential block writes using system calls and block size 2
Sequential block writes using system calls and O_SYNC and block size 2
Sequential block writes using stdio and block size 4096
Sequential block writes using system calls and block size 4096
Sequential block writes using system calls and O_SYNC and block size 4096

QUESTION 8B. Which of the programs in Part A cannot be distinguished using blktrace output? List all that apply.

QUESTION 8C. The buffer cache is coherent. Which of the following operating system changes could make the buffer cache incoherent? List all that apply.

Application programs can obtain direct read access to the buffer cache
Application programs can obtain direct write access to the disk, bypassing the buffer cache
Other computers can communicate with the disk independently
The computer has a uninterruptible power supply (UPS), ensuring that the operating system can write the contents of the buffer cache to disk if main power is lost

QUESTION 8D. The stdio cache is incoherent. Which of the operating system changes from Part C could make the stdio cache coherent? List all that apply.

9. Reference strings and eviction (10 points)

QUESTION 9A. When demonstrating cache eviction in class, we modeled a completely reactive cache, meaning that the cache performed at most one load from slow storage per access. Name a class of reference string that will have a 0% hit rate on any cold reactive cache. For partial credit, give several examples of such reference strings.

QUESTION 9B. What cache optimization can be used to improve the hit rate for the class of reference string in Part A? One word is enough; put the best choice.

QUESTION 9C. Give a single reference string with the following properties:

There exists a cache size and eviction policy that gives a 70% hit rate for the string.
There exists a cache size and eviction policy that gives a 0% hit rate for the string.

QUESTION 9D. Put the following eviction algorithms in order of how much space they require for per-slot metadata, starting with the least space and ending with the most space. (Assume the slot order is fixed, so once a block is loaded into slot i, it stays in slot i until it is evicted.) For partial credit say what you think the metadata would be.

FIFO
LRU
Random

10. Cache code (12 points)

Several famous musicians have just started working on CS61 Problem Set 3. They share the following code for their read-only, sequential, single-slot cache:

struct io61_file {
    int fd;
    unsigned char buf[4096];
`     size_t pos;    // position of next character to read in `buf` `
`     size_t sz;     // number of valid characters in `buf` `
};

int io61_readc(io61_file* f) {
    if (f->pos >= f->sz) {
        f->pos = f->sz = 0;
        ssize_t nr = read(f->fd, f->buf, sizeof(f->buf));
        if (nr <= 0) {
            f->sz = 0;
            return -1;
        } else {
            f->sz = nr;
        }
    }
    int ch = f->buf[f->pos];
    ++f->pos;
    return ch;
}

But they have different io61_read implementations. Donald (Lambert)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    return read(f->fd, buf, sz);
}

Solange (Knowles)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    for (size_t pos = 0; pos < sz; ++pos, ++buf) {
        *buf = io61_readc(f);
    }
    return sz;
}

Caroline (Shaw)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    if (f->pos >= f->sz) {
        return read(f->fd, buf, sz);
    } else {
        int ch = io61_readc(f);
        if (ch < 0) {
            return 0;
        }
        *buf = ch;
        return io61_read(f, buf + 1, sz - 1) + 1;
    }
}

You are testing each of these musicians’ codes by executing a sequence of io61_readc and/or io61_read calls on an input file and printing the resulting characters to standard output. There are no seeks, and your test programs print until end of file, so your tests’ output should equal the input file’s contents.

You should assume for these questions that no read system call ever returns -1.

QUESTION 10A. Describe an access pattern—that is, a sequence of io61_readc and/or io61_read calls (with lengths)—for which Donald’s code can return incorrect data.

QUESTION 10B. Which of these musicians’ codes can generate an output file with incorrect length?

For the remaining parts, assume the problem in Part B has been corrected, so that all musicians’ codes generate output files with correct lengths.

QUESTION 10C. Give an access pattern for which Solange’s code will return correct data and outperform Donald’s, or vice versa, and say whose code will win.

QUESTION 10D. Suggest a small change (≤10 characters) to Caroline’s code that would, most likely, make it perform at least as well as both Solange’s and Donald’s codes on all access patterns. Explain briefly.

11. Feedback (2 points)

QUESTION 11A. What aspects of the course are working well for you? Be brief; any answer but no answer will receive credit.

QUESTION 11B. What aspects of the course are not working well for you? Be brief; any answer but no answer will receive credit.