NOTOC

2016 Midterm Solutions

These are the solutions to 2016’s midterm. Enjoy them!

General overview

Assume a CS61 Appliance running on the x86-64 architecture unless otherwise stated. The points allocated to each problem correspond roughly to how much time we think the problem should take. If you get stuck, move on to the next question. If you’re confused, explain your thinking briefly for potential partial credit.

1. Data representation: Allocation sizes (5 points)

typedef union {
    int f1[4];
    long f2[2];
} my_union;

int main() {
    char* p = malloc(sizeof(char*));
    my_union u;
    my_union* up = &u;
    ....
}

How much user-accessible space is allocated on the stack and/or the heap by each of the following statements? Assume x86-64.

QUESTION 1A. typedef union ... my_union;

0 (typedefs declare types, which are compile-time constructs; they don’t allocate space at runtime)

QUESTION 1B. char* p = malloc(sizeof(char*));

16: 8 on the heap plus 8 on the stack

QUESTION 1C. my_union u;

16 (on the stack)

QUESTION 1D. my_union* up = &u;

8 (on the stack) for the pointer

2. Data representation: ENIAC (10 points)

Professor Kohler has been busy! He'd developed Eddie's NIfty Awesome Computer (ENIAC). When he built the C compiler for ENIAC, he assigned the following sizes and alignments to C’s fundamental data types. (Assume that every other fundamental type has the same size and alignment as one of these.)

Type	`sizeof`	`alignof`
`char`	1	1
`char*`	16	16	Same for any pointer
`short`	4	4
`int`	8	8
`long`	16	16
`long long`	32	32
`float`	16	16
`double`	32	32

QUESTION 2A. This set of sizes is valid: it obeys all the requirements set by C's abstract machine. Give one different size assignment that would make the set as a whole invalid.

Some examples: sizeof(char) = 0; sizeof(char) = 2; sizeof(short) = 8 (i.e., longer than int); sizeof(int) = 1 (though not discussed in class, turns out that C requires ints are at least 2 bytes big; etc.

QUESTION 2B. What alignment must the ENIAC malloc guarantee?

32, the largest fundamental alignment

For the following two questions, assume the following struct on the ENIAC:

struct s {
    char f1[7];
    char *f2;
    short f3;
    int f4;
};

QUESTION 2C. What is sizeof(struct s)?

f1 is 7 bytes.

f2 is 16 bytes with 16-byte alignment, so add 9B padding.

f3 is 4 bytes (and is already aligned).

f4 is 8 bytes with 8-byte alignment, so add 4B padding.

That adds up to 7 + 9 + 16 + 4 + 4 + 8 = 16 + 16 + 16 = 48 bytes.

That’s a multiple of the structure’s alignment, which is 16, so no need for any end padding.

QUESTION 2D. What is alignof(struct s)?

The remaining questions refer to this structure definition:

// This include file defines a struct inner, but you do not know anything
// about that structure, just that it exists.
#include "inner.h"

struct outer {
    char f1[3];
    struct inner f2;
    short f3;
    int f4;
};

Indicate for each statement whether the statement is always true, possibly true, or never true on the ENIAC.

QUESTION 2E: sizeof(struct outer) > sizeof(struct inner) (Always / Possibly / Never)

Always

QUESTION 2F: sizeof(struct outer) is a multiple of sizeof(struct inner) (Always / Possibly / Never)

Possibly

QUESTION 2G: alignof(struct outer) > alignof(struct inner) (Always / Possibly / Never)

Possibly (depends on what struct inner contains)

QUESTION 2H: sizeof(struct outer) - sizeof(struct inner) < 4 (Always / Possibly / Never)

Never (struct outer contains at least 15 additional bytes)

QUESTION 2I: sizeof(struct outer) - sizeof(struct inner) > 32 (Always / Possibly / Never)

Possibly (depends on alignment)

QUESTION 2J: alignof(struct inner) == 2 (Always / Possibly / Never)

Never (no fundamental type has that alignment)

3. Undefined behavior (8 points)

Which of the following expressions, instruction sequences, and code behaviors cause undefined behavior? For each question, write Defined or Undefined. (Note that the INT_MAX and UINT_MAX constants have types int and unsigned, respectively.)

QUESTION 3A. INT_MAX + 1 (Defined / Undefined)

Undefined (signed integer addition is undefined if it wraps)

QUESTION 3B. UINT_MAX + 1 (Defined / Undefined)

Defined (unsigned integer addition is never undefined; it always acts like two’s complement)

QUESTION 3C.

movq $0x7FFFFFFFFFFFFFFF, %rax
addl $1, %rax

(Defined / Undefined)

Defined (only C abstract machine programs can be undefined)

QUESTION 3D. Failed memory allocation, i.e., malloc returns NULL (Defined / Undefined)

Defined

QUESTION 3E. Use-after-free (Defined / Undefined)

Undefined

QUESTION 3F. Here are two functions and a global variable:

const char string[128] = ".......";
int read_nth_char(int n) {
    return string[n];
}
int f(int i) {
    if (i & 0x40)
        return read_nth_char(i * 2);
    else
        return i * 2;
}

C’s undefined behavior rules would allow an aggressive optimizing compiler to simplify the code generated for f. Fill in the following function with the simplest C code you can, under the constraint that an aggressive optimizing compiler might generate the same object code for f and f_simplified.

int f_simplified(int i) {

    return i * 2;

The compiler can deduce that if i \< 0, read_nth_char will execute undefined behavior, so it may assume that i \>= 0. But then if (i & 0x40) != 0, i \* 2 will either wrap (which is undefined) or be greater than or equal to 128 (since ((i \* 2) & 0x80) != 0), causing undefined behavior. So the compiler can deduce that i \>= 0 && (i & 0x40) == 0.

4. Bit manipulation (8 points)

It’s common in systems code to need to switch data between big-endian and little-endian representations. This is because networks represent multi-byte integers using big-endian representation, whereas x86-family processors store multi-byte integers using little-endian representation.

QUESTION 4A. Complete this function, which translates an integer from big-endian representation to little-endian representation by swapping bytes. For instance, big_to_little(0x01020304) should return 0x04030201. Your return statement must refer to the u.c array, and must not refer to x.

unsigned big_to_little(unsigned x) {
    union {
        unsigned intval;
        unsigned char c[4];
    } u;
    u.intval = x;

    return (u.c[0] << 24) | (u.c[1] << 16) | (u.c[2] << 8) | u.c[3];

QUESTION 4B. Complete the function again, but this time write a single expression that refers to x (you may refer to x multiple times, of course).

unsigned big_to_little(unsigned x) {

    return ((x & 0xFF) << 24) | ((x & 0xFF00) << 8) | ((x & 0xFF0000) >> 8) | (x >> 24);

QUESTION 4C. Now write the function little_to_big, which will translate a little-endian integer into big-endian representation. You may introduce helper variables or even call big_to_little if that’s helpful.

unsigned little_to_big(unsigned x) {

    return big_to_little(x);

)

5. Assembly Language (8 points)

The next four questions pertain to the following four code samples.

f1:
       subq    $8, %rsp
       call    callfunc
       movl    %eax, %edx
       leal    1(%rax,%rax,2), %eax     
       testb   $1, %dl
       jne     .L3
       movl    %edx, %eax
       shrl    $31, %eax
       addl    %edx, %eax
       sarl    %eax
.L3:
       addq    $8, %rsp
       ret

f3:
       subq    $8, %rsp
       call    callfunc
       subl    $97, %eax
       cmpb    $4, %al
       ja      .L2 
       movzbl  %al, %eax
       jmp     *.L4(,%rax,8)
.L4:   
       .quad   .L3     
       .quad   .L9
       .quad   .L6
       .quad   .L7
       .quad   .L8
.L3:   
       movl    $42, %edx
       jmp     .L5
.L6:   
       movl    $4096, %edx
       jmp     .L5
.L7:   
       movl    $52, %edx
       jmp     .L5
.L8:   
       movl    $6440, %edx
       jmp     .L5
.L2:   
       movl    $0, %edx
       jmp     .L5
.L9:
       movl    $61, %edx
.L5:
       movl    $.LC0, %esi
       movl    $1, %edi
       movl    $0, %eax
       call    __printf_chk
       addq    $8, %rsp
       ret
.LC0:
       .string "Sum = %d\n"

f2:
       pushq   %rbx
       xorl    %ebx, %ebx
.L3:
       movl    %ebx, %edi
       addl    $1, %ebx
       call    callfunc
       cmpl    $10, %ebx
       jne     .L3
       popq    %rbx
       ret

f4:
       subq    $40, %rsp
       movl    $1, (%rsp)
       movl    $0, 16(%rsp)
.L2:   
       leaq    16(%rsp), %rsi    
       movq    %rsp, %rdi
       call    callfunc
       movl    16(%rsp), %eax    
       cmpl    %eax, (%rsp)
       jne     .L2
       addq    $40, %rsp
       ret

Now answer the following questions. Pick the most likely sample; you will use each sample exactly once.

QUESTION 5A. Which sample contains a for loop?

QUESTION 5B. Which sample contains a switch statement?

QUESTION 5C. Which sample contains only an if/else construct?

QUESTION 5D. Which sample contains a while loop?

6. Calling conventions: 6186 (15 points)

University Professor Helen Vendler is designing a poetic new processor, the 6186. Can you reverse-engineer some aspects of the 6186’s calling convention from its assembly?

Here’s a function:

int f(int* a, unsigned b) {
    extern int g(int x);
    int index = g(a[2*b + 1]);
    return a[index + b];
}

And here’s that function compiled into 6186 instructions.

f:
    sub $24, %rsp
    movq %ra, (%rsp)
    mov %rb, %rx
    shl $1, %rx
    add $1, %rx
    movl (%ra, %rx, 4), %ra
    call g
    add %rb, %rr
    movq (%rsp), %ra
    movl (%ra, %rr, 4), %ra
    mov %ra, %rr
    add $24, %rsp
    ret

6186 assembly syntax is based on x86-64 assembly, and like the x86-64, 6186 registers are 64 bits wide. However, the 6186 has a different set of registers. There are just five general-purpose registers, %ra, %rb, %rr, %rx, and %ry. (“[W]hen she tries to be deadly serious she is speaking under…constraint”.) The example also features the stack pointer register, %rsp.

Give brief explanations if unsure.

QUESTION 6A. Which register holds function return values?

%rr

After the call to g we use %rr (the add instruction behaves like %rr += %rb). Immediately before returning from f, we move a value into %rr.

QUESTION 6B. What is sizeof(int) on the 6186?

4 (the size in the movl instructions which access the a array)

QUESTION 6C. Which general-purpose register(s) must be callee-saved?

%rb

The function f never modifies %rb. Furthermore, it uses %rb after calling g. That would only be valid if %rb was the function return value—which it’s not—or %rb was callee-saved.

QUESTION 6D. Which general-purpose register(s) must be caller-saved?

%rr`, `%ra`, `%rx

The return register is always caller-saved. We can also observe f modifying %ra and %rx and not restoring their former values. Furthermore, we can observe f saving %ra on the stack before calling g, and restoring it after g returns—which would be unnecessary if %ra was callee-saved.

QUESTION 6E. Which general-purpose register(s) might be callee-saved or caller-saved (you can’t tell which)?

%ry

This register isn’t referenced, so who knows?

QUESTION 6F. Assuming the compiler makes function stack frames as small as possible given the calling convention, what is the alignment of stack frames?

32. The code adds 24 bytes of space on top of the 8 bytes required for the return address, 16 of which go unused. If the alignment were less than 32, the code wouldn’t waste so much space..

QUESTION 6G. Assuming that the 6186 supports the same addressing modes as the x86-64, write a single instruction that has the same effect on %ra as these three instructions:

shl $1, %rx
add $1, %rx
movl (%ra, %rx, 4), %ra

movl 4(%ra, %rx, 8), %ra

7. Caching: Reference strings (10 points)

The following questions concern the FIFO (First In First Out), LRU (Least Recently Used), and LFU (Least Frequently Used) cache eviction policies.

Your answers should refer to seven-item reference strings made up of digits in the range 0–9. An example answer might be “1231231”. In each case, the reference string is processed by a 3-slot cache that’s initially empty.

QUESTION 7A. Give a reference string that has a 1/7 hit rate in all three policies.

1123456 (for example)

QUESTION 7B. Give a reference string that has a 6/7 hit rate in all three policies.

1111111 (for example). Note that the first access is always a miss

QUESTION 7C. Give a reference string that has different hit rates under LRU and LFU policies, and compute the hit rates.

Example string: 1123411

LRU hit rate: 2/7

LFU hit rate: 3/7

QUESTION 7D. Give a reference string that has different hit rates under FIFO and LRU policies, and compute the hit rates.

Example string: 1231411

FIFO hit rate: 2/7

LRU hit rate: 3/7

QUESTION 7E. Now let's assume that you know a reference string in advance. Given a 3-slot cache and the following reference string, what caching algorithm discussed in class and/or exercises would produce the best hit rate, and would would that hit rate be?

“12341425321521”

Bélády’s optimal algorithm!

Cache analysis: 1m 2m 3m 4m [124] 1h 4h 2h 5m [125] 3m [123] 2h 1h 5m [125] 2h 1h

7 hits/14 accesses = 1/2 hit rate.

8. Caching: Access times and hit rates (5 points)

Recall that x86-64 instructions can access memory in units of 1, 2, 4, or 8 bytes at a time. Assume we are running on an x86-64-like machine with 1024-byte cache lines. Our machine takes 32ns to access a unit if the cache hits, regardless of unit size. If the cache misses, an additional 8160ns are required to load the cache, for a total of 8192ns.

QUESTION 8A. What is the average access time per access to access all the data in a cache line as an array of 256 integers, starting from an empty cache? (2 points total, 1 point for an attempt that shows some understanding)

(8192ns * 1 + 32ns * 255)/256 (= 63.875)

QUESTION 8B. What unit size (1, 2, 4, or 8) minimizes the access time to access all data in a cache line, starting from an empty cache? (3 points for 5B and 5C, 1 point for writing anything, 1 point for each correct answer)

QUESTION 8C. What unit size (1, 2, 4, or 8) maximizes the hit rate to access all data in a cache line, starting from an empty cache?

9. Single-slot cache code (14 points)

Donald Duck is working on a single-slot cache for reading. He’s using the pos_tag/end_tag representation, which is:

struct io61_file {
   int fd;
   unsigned char cbuf[BUFSIZ];
   off_t tag;      // file offset of first character in cache (same as before)
`    off_t end_tag;  // file offset one past last valid char in cache; end_tag - tag == old `csz` `
`    off_t pos_tag;  // file offset of next char to read in cache; pos_tag - tag == old `cpos` `
};

Here’s our solution code; in case you want to scribble, the code is copied in the appendix.

 1.  ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
 2.      size_t pos = 0;
 3.      while (pos != sz) {
 4.          if (f->pos_tag < f->end_tag) {
 5.              ssize_t n = sz - pos;
 6.              if (n > f->end_tag - f->pos_tag)
 7.                  n = f->end_tag - f->pos_tag;
 8.              memcpy(&buf[pos], &f->cbuf[f->pos_tag - f->tag], n);
 9.              f->pos_tag += n;
10.              pos += n;
11.          } else {
12.              f->tag = f->end_tag;
13.              ssize_t n = read(f->fd, f->cbuf, BUFSIZ);
14.              if (n > 0)
15.                  f->end_tag += n;
16.              else
17.                  return pos ? pos : n;
18.          }
19.      }
20.      return pos;
21.  }

Donald has ideas for “simplifying” this code. Specifically, he wants to try each of the following independently:

Replacing line 4 with “if (f->pos_tag <= f->end_tag) {”.
Removing lines 6–7.
Removing line 9.
Removing lines 16–17.

QUESTION 9A. Which simplifications could lead to undefined behavior? List all that apply or say “none.”

B (removing 6–7): you read beyond the cache buffer.

QUESTION 9B. Which simplifications could cause io61_read to loop forever without causing undefined behavior? List all that apply or say “none.”

A (replacing 4): you spin forever after exhausting the cache

D (removing 16–17): you spin forever if the file runs out of data or has a persistent error

QUESTION 9C. Which simplifications could lead to io61_read returning incorrect data in buf, meaning that the data read by a series of io61_read calls won’t equal the data in the file? List all that apply or say “none.”

B (removing 6–7): you read garbage beyond the cache buffer

C (removing 9): you read the same data over & over again.

QUESTION 9D. Chastened, Donald decides to optimize the code for a specific situation, namely when io61_read is called with a sz that is larger than BUFSIZ. He wants to add code after line 11, like so, so that fewer read system calls will happen for large sz:

11.          } else if (sz - pos > BUFSIZ) {
                 // DONALD’S CODE HERE

11A.         } else {
12.              f->tag = f->end_tag;
                 ....

Finish Donald’s code. Your code should maintain the relevant invariants between tag, pos_tag, end_tag, and the file position, but you need not keep tag aligned.

ssize_t n = read(f->fd, &buf[pos], sz - pos);
if (n > 0) {
    f->tag = f->pos_tag = f->end_tag = f->end_tag + n;
    pos += n;
} else
    return pos ? pos : n;