# Midterm Sample Questions

This is a bank of questions from prior midterms. The real midterm will not have this many questions. We cover slightly different topics from year to year; some of these questions might not be appropriate this year.

### Rules

Previous exams used the following rules. This year’s rules will be similar.

The exam is open book, open note, open computer. You may access the book, and your own notes in paper form. You may also use a computer or equivalent to access your own class materials and public class materials. However, you may not access other materials except as explicitly allowed below, and you may not write programs to answer questions. Specifically:

• You may access a browser and a PDF reader.
• You may access your own notes and problem set code electronically.
• You may access an Internet site on which your own notes and problem set code are stored.
• You may access the cs61.seas.harvard.edu/wiki/2017 site.
• You may access pages directly linked from the cs61.seas.harvard.edu/wiki/2017 site, including scribe notes and our lecture and section notes.

But:

• You may not access Google or Wikipedia or anything else except as directly linked from the cs61.seas.harvard.edu/wiki/2017 site.
• You may not access Piazza.
• You may not access course videos.
• You may not run a C compiler, assembler, on-line disassembler, calculator, or anything similar. Simple reading applications only.
• You absolutely may not contact other humans via IM or anything like it.
• You may not access solutions from any previous exam, by paper or computer.

Any violations of this policy, or the spirit of this policy, are breaches of academic honesty and will be treated accordingly. Please appreciate our flexibility and behave honestly and honorably.

## DATAREP-1. Sizes and alignments

QUESTION DATAREP-1A. True or false: For any non-array type X, the size of X (`sizeof(X)`) is greater than or equal to the alignment of type X.

QUESTION DATAREP-1B. True or false: For any type X, the size of `struct Y { X a; char newc; }` is greater than the size of X.

QUESTION DATAREP-1C. True or false: For any types `A1`...`An` (with `n` ≥ 1), the size of `struct Y` is greater than the size of `struct X`, given:

```  struct X {     struct Y {
A1 a1;         A1 a1;
...            ...
An an;         An an;
};                char newc;
};
```

QUESTION DATAREP-1D. True or false: For any types `A1`...`An` (with `n` ≥ 1), the size of `struct Y` is greater than the size of `union X`, given:

```  union X {      struct Y {
A1 a1;         A1 a1;
...            ...
An an;         An an;
};             };
```

Let `alignof(T)` equal the alignment of type T.

QUESTION DATAREP-1E. Assume that structure `struct Y { ... }` contains K `char` members and M `int` members, with KM, and nothing else. Write an expression defining the maximum `sizeof(struct Y)`.

QUESTION DATAREP-1F. You are given a structure `struct Z { T1 a; T2 b; T3 c; }` that contains no padding. What does `(sizeof(T1) + sizeof(T2) + sizeof(T3)) % alignof(struct Z)` equal?

QUESTION DATAREP-1G. Arrange the following types in increasing order by size. Sample answer: “1 < 2 = 4 < 3” (choose this if #1 has smaller size than #2, which has equal size to #4, which has smaller size than #3).

1. `char`
2. `struct minipoint { uint8_t x; uint8_t y; uint8_t z; }`
3. `int`
4. `unsigned short[1]`
5. `char**`
6. `double[0]`

## DATAREP-2. Expressions

QUESTION DATAREP-2A. Here are eight expressions. Group the expressions into four pairs so that the two expressions in each pair have the same value, and each pair has a different value from every other pair. There is one unique answer that meets these constraints. `m` has the same type and value everywhere it appears (there’s one unique value for `m` that meets the problem’s constraints). Assume an x86-32 machine: a 32-bit architecture in which pointers are 32 bits long.

1. `sizeof(&m)`
2. `-1`
3. `m & -m`
4. `m + ~m + 1`
5. `16 >> 2`
6. `m & ~m`
7. `m`
8. `1`

## DATAREP-3. Hello binary

This problem locates 8-bit numbers horizontally and vertically in the following 16x16 image. Black pixels represent 1 bits and white pixels represent 0 bits. For horizontal arrangements, the most significant bit is on the left as usual. For vertical arrangements, the most significant bit is on top.

Examples: The 8-bit number 15 (hexadecimal 0x0F, binary 0b00001111) is located horizontally at 3,4, which means X=3, Y=4.

• The pixel at 3,4 is white, which has bit value 0.
• 4,4 is white, also 0.
• 5,4 is white, also 0.
• 6,4 is white, also 0.
• 7,4 is black, which has bit value 1.
• 8,4, 9,4, and 10,4 are black, giving three more 1s.
• Reading them all off, this is 0b00001111, or 15.

15 is also located horizontally at 7,6.

The 8-bit number 0 is located vertically at 0,0. It is also located horizontally at 0,0 and 1,0.

The 8-bit number 134 (hexadecimal 0x86, binary 0b10000110) is located vertically at 8,4.

QUESTION DATAREP-3A. Where is 3 located vertically? (All questions refer to 8-bit numbers.)

QUESTION DATAREP-3B. Where is 12 located horizontally?

QUESTION DATAREP-3C. Where is 255 located vertically?

## DATAREP-4. Hello memory

Shintaro Tsuji wants to represent the image of Question DATAREP-3 in computer memory. He stores it in an array of 16-bit unsigned integers:

`uint16_t cute[16];`

Row Y of the image is stored in integer `cute[Y]`.

QUESTION DATAREP-4A. What is `sizeof(cute)`, 2, 16, 32, or 64?

QUESTION DATAREP-4B. `printf("%d\n", cute[0]);` prints `16384`. Is Shintaro’s machine big-endian or little-endian?

## DATAREP-5. Hello program

Now that Shintaro has represented the image in memory as an array of `uint16_t` objects, he can manipulate the image using C. For example, here’s a function.

```void swap(void) {
for (int i = 0; i < 16; ++i)
cute[i] = (cute[i] << 8) | (cute[i] >> 8);
}```

Running `swap` produces the following image:

Shintaro has written several other functions. Here are some images (A is the original):

For each function, what image does that function create?

QUESTION DATAREP-5A.

```void f0() {
for (int i = 0; i < 16; ++i)
cute[i] = ~cute[i];
}```

QUESTION DATAREP-5B.

```void f1() {
for (int i = 0; i < 16; ++i) {
cute[i] = ((cute[i] >> 1) & 0x5555) | ((cute[i] << 1) & 0xAAAA);
cute[i] = ((cute[i] >> 2) & 0x3333) | ((cute[i] << 2) & 0xCCCC);
cute[i] = ((cute[i] >> 4) & 0x0F0F) | ((cute[i] << 4) & 0xF0F0);
cute[i] =  (cute[i] >> 8)           |  (cute[i] << 8);
}
}```

QUESTION DATAREP-5C.

```void f2() {
char *x = (char *) cute;
for (int i = 0; i < 16; ++i)
x[2*i] = i;
}```

#### For “fun”

The following programs generated the other images. Can you match them with their images?

```void f3() {
for (int i = 0; i < 16; ++i)
cute[i] &= ~(7 << i);
}```
```void f4() {
swap();
for (int i = 0; i < 16; ++i)
cute[i] <<= i/4;
swap();
}```
```void f5() {
for (int i = 0; i < 16; ++i)
cute[i] = -1 * !!(cute[i] & 64);
}```
```void f6() {
for (int i = 0; i < 8; ++i) {
int tmp = cute[15-i];
cute[15-i] = cute[i];
cute[i] = tmp;
}
}```
```void f7() {
for (int i = 0; i < 16; ++i)
cute[i] = cute[i] & -cute[i];
}```
```void f8() {
for (int i = 0; i < 16; ++i)
cute[i] ^= cute[i] ^ cute[i];
}```
```void f9() {
for (int i = 0; i < 16; ++i)
cute[i] = cute[i] ^ 4080;
}```

## DATAREP-6. Memory regions

Consider the following program:

```struct ptrs {
int** x;
int* y;
};

struct ptrs global;

void setup(struct ptrs* p) {
int* a = malloc(sizeof(int));
int* b = malloc(sizeof(int));
int* c = malloc(sizeof(int));
int* d = malloc(sizeof(int));
int* e = malloc(sizeof(int) * 2);
int** f = malloc(4 * sizeof(int*));
int** g = malloc(sizeof(int*));

*a = 0;
*b = 0;
*c = (int) a;
*d = *b;
e[0] = 29;
e[1] = (int) &d[100000];

f[0] = b;
f[1] = c;
f[2] = 0;
f[3] = 0;

*g = c;

global.x = f;
global.y = e;

p->x = g;
p->y = &e[1];
}

int main(int argc, char** argv) {
stack_bottom = (char*) &argc;
struct ptrs p;
setup(&p);
m61_collect();
do_stuff(&p);
}```

This program allocates objects `a` through `g` on the heap and then stores those pointers in some stack and global variables. (It then calls our conservative garbage collector from class, but that won’t matter until the next problem.) We recommend you draw a picture of the state `setup` creates.

QUESTION DATAREP-6A. Assume that `(uintptr_t) a == 0x8300000`, and that `malloc` returns increasing addresses. Match each address to the most likely expression with that address value. The expressions are evaluated within the context of `main`. You will not reuse an expression.

Value   Expression
1.   0x8300040 A.   `&p`
2.   0x8049894 B.   `(int*) *p.x[0]`
3.   0x8361AF0 C.   `&global.y`
4.   0x8300000 D.   `global.y`
5.   0xBFAE0CD8 E.   `(int*) *p.y`

## DATAREP-7. Garbage collection

Here is the top-level function for the conservative garbage collector we wrote in class.

```void m61_collect(void) {
char* stack_top = (char*) &stack_top;

// The entire contents of the heap start out unmarked
for (size_t i = 0; i != nmr; ++i) {
mr[i].marked = 0;
}

// Mark all reachable objects, starting with the roots (the stack)
m61_markaccessible(stack_top, stack_bottom - stack_top);

// Free everything that wasn't marked
for (size_t i = 0; i != nmr; ++i) {
if (mr[i].marked == 0) {
m61_free(mr[i].ptr);
--i;                // m61_free moved different data into this
// slot, so we must recheck the slot
}
}
}```

This garbage collector is not correct because it doesn’t capture all memory roots.

Consider the program from the previous section, and assume that an object is reachable if `do_stuff` can access an address within the object via variable references and memory dereferences without casts or pointer arithmetic. Then:

QUESTION DATAREP-7A. Which reachable objects will `m61_collect()` free? Circle all that apply.

 `a` `b` `c` `d` `e` `f` `g` None of these

QUESTION DATAREP-7B. Which unreachable objects will `m61_collect()` not free? Circle all that apply.

 `a` `b` `c` `d` `e` `f` `g` None of these

QUESTION DATAREP-7C. Conservative garbage collection in C is often slower than precise garbage collection in languages such as Java. Why? Circle all that apply.

1. C is generally slower than other languages.
2. Conservative garbage collectors must search all reachable memory for pointers. Precise garbage collectors can ignore values that do not contain pointers, such as large character buffers.
3. C programs generally use the heap more than programs in other languages.
4. None of the above.

## DATAREP-8. Memory errors

The following function constructs and returns a lower-triangular matrix of size N. The elements are random 2-dimensional points in the unit square. The matrix is represented as an array of pointers to arrays.

``` typedef struct point2 {
double d[2];
} point2;
typedef point2* point2_vector;

point2_vector* make_random_lt_matrix(size_t N) {
point2_vector* m = (point2_vector*) malloc(sizeof(point2_vector) * N);
for (size_t i = 0; i < N; ++i) {
m[i] = (point2*) malloc(sizeof(point2) * (i + 1));        /* LINE A */
for (size_t j = 0; j <= i; ++j) {
for (int d = 0; d < 2; ++d) {
m[i][j].d[d] = drand48();                          /* LINE B */
}
}
}
return m;
}
```

This code is running on an x86-32 machine (`size_t` is 32 bits, not 64). You may assume that the machine has enough free physical memory and the process has enough available virtual address space to satisfy any `malloc()` call.

QUESTION DATAREP-8A. Give a value of N so that, while `make_random_lt_matrix(N)` is running, no `malloc()` returns `NULL`, but a memory error (such as a null pointer dereference or an out-of-bounds dereference) happens on Line A. The memory error should happen specifically when `i == 1`.

QUESTION DATAREP-8B. Give a value of N so that no `malloc()` returns `NULL`, and no memory error happens on Line A, but a memory error does happen on Line B.

## DATAREP-9. Data representation

Assume a 64-bit x86-64 architecture unless explicitly told otherwise.

Write your assumptions if a problem seems unclear, and write down your reasoning for partial credit.

QUESTION DATAREP-9A. Arrange the following values in increasing numeric order. Assume that `x` is an `int` with value 8192.

 `EOF` `x & ~x` `(signed char) 0x47F` `x | ~x` `1000` `(signed char) 65535` The size of the stdio cache `-0x80000000`

A possible answer might be “a < b < c = d < e < f < g < h.”

For each of the remaining questions, write one or more arguments that, when passed to the provided function, will cause it to return the integer 61 (which is 0x3d hexadecimal). Write the expected number of arguments of the expected types.

QUESTION DATAREP-9B.

```int f1(int n) {
return 0x11 ^ n;
}
```

QUESTION DATAREP-9C.

```int f2(const char* s) {
return strtol(s, NULL, 0);
}
```

```int f3(const char* s) {
return strtol(s, NULL, 0);
}
```

QUESTION DATAREP-9E. For this problem, you will also need to define a global variable. Give its type and value.

```f4:
andl \$5, %edi
leal (%rsi,%rdi,2), %eax
movzbl y(%rip), %ecx
subl %ecx, %eax
retq
```

## DATAREP-10. Sizes and alignments

Assume a 64-bit x86-64 architecture unless explicitly told otherwise.

Write your assumptions if a problem seems unclear, and write down your reasoning for partial credit.

QUESTION DATAREP-10A. Use the following members to create a struct of size 16, using each member exactly once, and putting `char a` first; or say “impossible” if this is impossible.

1. `char a;` (we’ve written this for you)
2. `unsigned char b;`
3. `short c;`
4. `int d;`
```struct size_16 {
char a;

};
```

QUESTION DATAREP-10B. Repeat Part A, but create a struct with size 12.

```struct size_12 {
char a;

};
```

QUESTION DATAREP-10C. Repeat Part A, but create a struct with size 8.

```struct size_8 {
char a;

};
```

QUESTION DATAREP-10D. Consider the following structs:

```struct x {
T x1;
U x2;
};
struct y {
struct x y1;
V y2;
};
```

Give definitions for T, U, and V so that there is one byte of padding in `struct x` after `x2`, and two bytes of padding in `struct y` after `y1`.

## DATAREP-11. Dynamic memory allocation

QUESTION DATAREP-11A. True or false?

1. `free(NULL)` is an error.
2. `malloc(0)` can never return `NULL`.

QUESTION DATAREP-11B. Give values for `sz` and `nmemb` so that `calloc(sz, nmemb)` will always return `NULL` (on a 32-bit x86 machine), but `malloc(sz * nmemb)` might or might not return null.

Consider the following 8 statements. (`p` and `q` have type `char*`.)

1. `free(p);`
2. `free(q);`
3. `p = q;`
4. `q = NULL;`
5. `p = (char*) malloc(12);`
6. `q = (char*) malloc(8);`
7. `p[8] = 0;`
8. `q[4] = 0;`

QUESTION DATAREP-11C. Put the statements in an order that would execute without error or evoking undefined behavior. Memory leaks count as errors. Use each statement exactly once. Sample answer: “abcdefgh.”

QUESTION DATAREP-11D. Put the statements in an order that would cause one double-free error, and no other error or undefined behavior (except possibly one memory leak). Use each statement exactly once.

QUESTION DATAREP-11E. Put the statements in an order that would cause one memory leak (one allocated piece of memory is not freed), and no other error or undefined behavior. Use each statement exactly once.

QUESTION DATAREP-11F. Put the statements in an order that would cause one boundary write error, and no other error or undefined behavior. Use each statement exactly once.

## DATAREP-12. Pointers and debugging allocators

You are debugging some students’ `m61` code from Problem Set 1. The codes use the following metadata:

```typedef struct meta { ...
struct meta* next;
struct meta* prev;
} meta;

```

Their linked-list manipulations in `m61_malloc` are similar.

```void* m61_malloc(size_t sz, const char* file, int line) {
...
meta* m = (meta*) ptr;
m->prev = NULL;
...
}
```

But their linked-list manipulations in `m61_free` differ.

 Alice’s code: ```void m61_free(void* ptr, ...) { ... meta* m = (meta*) ptr - 1; if (m->next != NULL) m->next->prev = m->prev; if (m->prev == NULL) mhead = NULL; else m->prev->next = m->next; ... } ``` Bob’s code: ```void m61_free(void* ptr, ...) { ... meta* m = (meta*) ptr - 1; if (m->next) m->next->prev = m->prev; if (m->prev) m->prev->next = m->next; ... } ``` Chris’s code: ```void m61_free(void* ptr, ...) { ... meta* m = (meta*) ptr - 1; m->next->prev = m->prev; m->prev->next = m->next; ... } ``` Donna’s code: ```void m61_free(void* ptr, ...) { ... meta* m = (meta*) ptr - 1; if (m->next) m->next->prev = m->prev; if (m->prev) m->prev->next = m->next; else mhead = m->next; ... } ```

You may assume that all code not shown is correct.

QUESTION DATAREP-12A. Whose code will segmentation fault on this input? List all students that apply.

```int main() {
void* ptr = malloc(1);
free(ptr);
}
```

QUESTION DATAREP-12B. Whose code might report something like “`invalid free of pointer [ptr1], not allocated`” on this input? (Because a list traversal starting from `mhead` fails to find `ptr1`.) List all students that apply. Don’t include students whose code would segfault before the report.

```int main() {
void* ptr1 = malloc(1);
void* ptr2 = malloc(1);
free(ptr2);
free(ptr1);   // <- message printed here
}
```

QUESTION DATAREP-12C. Whose code would improperly report something like “`LEAK CHECK: allocated object [ptr1] with size 1`” on this input? (Because the `mhead` list appears not empty, although it should be.) List all students that apply. Don’t include students whose code would segfault before the report.

```int main() {
void* ptr1 = malloc(1);
free(ptr1);
m61_printleakreport();
}
```

QUESTION DATAREP-12D. Whose linked-list code is correct for all inputs? List all that apply.

## DATAREP-13. Arena allocation

Chimamanda Ngozi Adichie is a writing a program that needs to allocate and free a lot of nodes, where a node is defined as follows:

```typedef struct node {
int key;
void* value;
struct node* left;
struct node* right;      // also used in free list
} node;
```

She uses a variant of the arena allocator we saw in Lectures 5 and 6. Here’s part of the code.

```typedef struct arena_group {
struct arena_group* next_group;
node nodes[1024];
} arena_group;

typedef struct arena {
node* frees;
arena_group* groups;
} arena;

node* node_alloc(arena* a) {
if (!a->frees) {
arena_group* g = (arena_group*) malloc(sizeof(arena_group));
// ... link `g` to `a->groups` ...
for (size_t i = 0; i != 1023; ++i)
g->nodes[i].right = &g->nodes[i + 1];
g->nodes[1023].right = NULL;
a->frees = &g->nodes[0];
}
node* n = a->frees;
a->frees = n->right;
return n;
}

void node_free(arena* a, node* n) {
n->right = a->frees;
a->frees = n;
}
```

QUESTION DATAREP-13A. True or false?

1. This allocator never has external fragmentation.
2. This allocator never has internal fragmentation.

QUESTION DATAREP-13B. Chimamanda’s frenemy Paul Auster notices that if many nodes are allocated right in a row, every 1024th allocation seems much more expensive than the others. The reason is that every 1024th allocation initializes a new group, which in turn adds 1024 nodes to the free list. Chimamanda decides instead to allow a single element of the free list to represent many contiguous free nodes. The average allocation might get a tiny bit slower, but no allocation will be much slower than average. Here’s the start of her idea:

```node* node_alloc(arena* a) {
if (!a->frees) {
arena_group* g = (arena_group*) malloc(sizeof(arena_group));
// ... link `g` to `a->groups` ...
g->nodes[0].key = 1024;   // g->nodes[0] is the 1st of 1024 contiguous free nodes
g->nodes[0].right = NULL;
a->frees = &g->nodes[0];
}
node* n = a->frees;
// ???
return n;
}
```

Complete this function by writing code to replace `// ???`.

QUESTION DATAREP-13C. Write a `node_free` function that works with the `node_alloc` function from the previous question.

```void node_free(arena* a, node* n) {
```
```}
```

QUESTION DATAREP-13D. Complete the following new function.

```// Return the arena_group containing node `n`. `n` must be a node returned by
// a previous call to `node_alloc(a)`.
arena_group* node_find_group(arena* a, node* n) {
for (arena_group* g = a->groups; g; g = g->next_group) {
}
return NULL;
}
```

QUESTION DATAREP-13E. Chimamanda doesn’t like that the `node_find_group` function from part D takes O(G) time, where G is the number of allocated arena_groups. She remembers a library function that might help, `posix_memalign`:

```int posix_memalign(void** memptr, size_t alignment, size_t size);

```

The function posix_memalign() allocates `size` bytes and places the address of the allocated memory in `*memptr`. The address of the allocated memory will be a multiple of `alignment`, which must be a power of two and a multiple of `sizeof(void *)`. ...

“Cool,” she says, “I can use this to speed up `node_find_group`!” She now allocates a new group with the following code:

```arena_group* g;
int r = posix_memalign(&g, 32768, sizeof(arena_group));
assert(r == 0); // posix_memalign succeeded
```

Given this allocation strategy, write a version of `node_find_group` that takes O(1) time.

```arena_group* node_find_group(arena* a, node* n) {
```
```}
```

## DATAREP-14. Data representation

Sort the following expressions in ascending order by value, using the operators <, =, >. For example, if we gave you:

```A. int i = 6;
B. int j = 0x6;
C. int k = 3;
```

you would write `k < i = j or k < j = i`.

```a. unsigned char u = 0x191; truncated to 0x91 = 9*16+1 = 145
b. char c = 0x293; truncated to 0x92, but signed, so -0x6D
c. unsigned long l = 0xFFFFFFFF; That's 2^32 - 1
d. int i = 0xFFFFFFFF; That's -1
e. int j = i + 3; That's 2
f. 4 GB That's 2^32
g. short *s; sizeof(*s); That's 2
h. long l = 256; That's 256
i. (binary) 100000000000000000000000000000000000 2^36
j. unsigned long Q = 0xACE - 0x101; 0x9CD=2509
```

## DATAREP-15. Memory

For the following questions, select the part(s) of memory from the list below that best describes where you will find the object.

1. heap
2. stack
3. between the heap and the stack
4. in a read-only data segment
5. in a text segment starting at address 0x08048000
6. in a read/write data segment
7. in a register

Assume the following code, compiled without optimization.

```#include <stdio.h>
#include <stdlib.h>
const long maxitems = 1000;
struct info {
char name[20];
unsigned int age;
short height;
} s = { "sushi", 1, 9 };
int main(int argc, char* argv[]) {
int i, num = maxitems + 1;
struct info *sp;
printf("What did you do? %lx?\n", u);
while (num > maxitems || num < 10) {
printf("How much of it did you eat? ");
scanf(" %d", &num);
}
sp = (struct info *)malloc(num * sizeof(*sp));
for (i = 0; i < num; i++) {
sp[i] = s;
}
}
```

QUESTION DATAREP-15A. The value 0xdeadbeef, when we are returning from main.

QUESTION DATAREP-15B. The variable maxitems

QUESTION DATAREP-15C. The structure s

QUESTION DATAREP-15D. The structure at sp[9]

QUESTION DATAREP-15E. The variable u

QUESTION DATAREP-15F. main

QUESTION DATAREP-15G. printf

QUESTION DATAREP-15H. argc

QUESTION DATAREP-15I. The number the user enters

QUESTION DATAREP-15J. The variable L

## DATAREP-16. Memory and pointers

If multiple processes are sharing data via mmap, they may have the file mapped at different virtual addresses. In this case, pointers to the same object will have different values in the different processes. One way to store pointers in mmapped memory so that multiple processes can access them consistently is using relative pointers. Rather than storing a regular pointer, you store the offset from the beginning of the mmapped region and add that to the address of the mapping to obtain a real pointer. An alternative representation is called selfrelative pointers. In this case, you store the difference in address between the current location (i.e., the location containing the pointer) and the location to which you want to point. Neither representation addresses pointers between the mmapped region and the rest of the address space; you may assume such pointers do not exist.

QUESTION DATAREP-16A. State one advantage that relative pointers have over self-relative pointers.

QUESTION DATAREP-16B. State one advantage that self-relative pointers have over relative pointers.

For the following questions, assume the following setup:

```char *region; /* Address of the beginning of the region. */
/*
* The following are sample structures you might find in
* a linked list that you are storing in an mmaped region.
*/
struct ll1 {
unsigned value;
TYPE1 r_next; /* Relative Pointer. */
};
struct ll2 {
unsigned value;
TYPE2 sr_next; /* Self-Relative Pointer. */
};
struct ll1 node1;
struct ll2 node2;
```

QUESTION DATAREP-16C. Propose a type for TYPE1 and give 1 sentence why you chose that type.

QUESTION DATAREP-16D. Write a C expression to generate a (properly typed) pointer to the element referenced by the r_next field of `ll1`.

QUESTION DATAREP-16E. Propose a type for TYPE2 and give 1 sentence why you chose that type.

QUESTION DATAREP-16F. Write a C expression to generate a (properly typed) pointer to the element referenced by the sr_next field of `ll2`.

## DATAREP-17. Data representation: Allocation sizes

```typedef union {
int f1[4];
long f2[2];
} my_union;

int main() {
char* p = malloc(sizeof(char*));
my_union u;
my_union* up = &u;
....
}
```

How much user-accessible space is allocated on the stack and/or the heap by each of the following statements? Assume x86-64.

QUESTION DATAREP-17A. `typedef union ... my_union;`

QUESTION DATAREP-17B. `char* p = malloc(sizeof(char*));`

QUESTION DATAREP-17C. `my_union u;`

QUESTION DATAREP-17D. `my_union* up = &u;`

## DATAREP-18. Data representation: ENIAC

Professor Kohler has been busy! He'd developed Eddie's NIfty Awesome Computer (ENIAC). When he built the C compiler for ENIAC, he assigned the following sizes and alignments to C’s fundamental data types. (Assume that every other fundamental type has the same size and alignment as one of these.)

Type `sizeof` `alignof`
`char` 1 1
`char*` 16 16 Same for any pointer
`short` 4 4
`int` 8 8
`long` 16 16
`long long` 32 32
`float` 16 16
`double` 32 32

QUESTION DATAREP-18A. This set of sizes is valid: it obeys all the requirements set by C's abstract machine. Give one different size assignment that would make the set as a whole invalid.

QUESTION DATAREP-18B. What alignment must the ENIAC malloc guarantee?

For the following two questions, assume the following struct on the ENIAC:

```struct s {
char f1[7];
char *f2;
short f3;
int f4;
};
```

QUESTION DATAREP-18C. What is `sizeof(struct s)`?

QUESTION DATAREP-18D. What is `alignof(struct s)`?

The remaining questions refer to this structure definition:

```// This include file defines a struct inner, but you do not know anything
// about that structure, just that it exists.
#include "inner.h"

struct outer {
char f1[3];
struct inner f2;
short f3;
int f4;
};
```

Indicate for each statement whether the statement is always true, possibly true, or never true on the ENIAC.

QUESTION DATAREP-18E: `sizeof(struct outer) > sizeof(struct inner)` (Always / Possibly / Never)

QUESTION DATAREP-18F: `sizeof(struct outer)` is a multiple of `sizeof(struct inner)` (Always / Possibly / Never)

QUESTION DATAREP-18G: `alignof(struct outer) > alignof(struct inner)` (Always / Possibly / Never)

QUESTION DATAREP-18H: `sizeof(struct outer) - sizeof(struct inner) < 4` (Always / Possibly / Never)

QUESTION DATAREP-18I: `sizeof(struct outer) - sizeof(struct inner) > 32` (Always / Possibly / Never)

QUESTION DATAREP-18J: `alignof(struct inner) == 2` (Always / Possibly / Never)

## DATAREP-19. Undefined behavior

Which of the following expressions, instruction sequences, and code behaviors cause undefined behavior? For each question, write Defined or Undefined. (Note that the `INT_MAX` and `UINT_MAX` constants have types `int` and `unsigned`, respectively.)

QUESTION DATAREP-19A. `INT_MAX + 1` (Defined / Undefined)

QUESTION DATAREP-19B. `UINT_MAX + 1` (Defined / Undefined)

QUESTION DATAREP-19C.

```movq \$0x7FFFFFFFFFFFFFFF, %rax
```

(Defined / Undefined)

QUESTION DATAREP-19D. Failed memory allocation, i.e., `malloc` returns `NULL` (Defined / Undefined)

QUESTION DATAREP-19E. Use-after-free (Defined / Undefined)

QUESTION DATAREP-19F. Here are two functions and a global variable:

```const char string[128] = ".......";
return string[n];
}
int f(int i) {
if (i & 0x40) {
} else {
return i * 2;
}
}
```

C’s undefined behavior rules would allow an aggressive optimizing compiler to simplify the code generated for `f`. Fill in the following function with the simplest C code you can, under the constraint that an aggressive optimizing compiler might generate the same object code for `f` and `f_simplified`.

```int f_simplified(int i) {

}
```

## DATAREP-20. Bit manipulation

It’s common in systems code to need to switch data between big-endian and little-endian representations. This is because networks represent multi-byte integers using big-endian representation, whereas x86-family processors store multi-byte integers using little-endian representation.

QUESTION DATAREP-20A. Complete this function, which translates an integer from big-endian representation to little-endian representation by swapping bytes. For instance, `big_to_little(0x01020304)` should return `0x04030201`. Your return statement must refer to the `u.c` array, and must not refer to `x`.

```unsigned big_to_little(unsigned x) {
union {
unsigned intval;
unsigned char c[4];
} u;
u.intval = x;

return ______________________________________;
}
```

QUESTION DATAREP-20B. Complete the function again, but this time write a single expression that refers to `x` (you may refer to `x` multiple times, of course).

```unsigned big_to_little(unsigned x) {

return ______________________________________;
}
```

QUESTION DATAREP-20C. Now write the function `little_to_big`, which will translate a little-endian integer into big-endian representation. You may introduce helper variables or even call `big_to_little` if that’s helpful.

```unsigned little_to_big(unsigned x) {

}
```

## DATAREP-21. Computer arithmetic

Bitwise operators and computer arithmetic can represent vectors of bits, which in turn are useful for representing sets. For example, say we have a function `bit` that maps elements to distinct bits; thus, `bit(X) == (1 << u)` for some `u`. Then a set {X0, X1, X2, …, Xn} can be represented as `bit(X0) | bit(X1) | bit(X2) | … | bit(Xn)`. Element Xi is in the set with representation `n` if and only if `(bit(Xi) & n) != 0`.

QUESTION DATAREP-21A. What is the maximum number of set elements that can be represented in a single `unsigned` variable on an x86 machine?

QUESTION DATAREP-21B. Match each set operation with the C operator(s) that could implement that operation. (Complement is a unary operation.)

 intersection `==` equality `~` complement `&` union `^` toggle membership(flip whether an element is in the set) `|`

QUESTION DATAREP-21C. Complete this function, which should return the set difference between the sets with representations `a` and `b`. This is the set containing exactly those elements of set `a` that are not in set `b`.

```unsigned set_difference(unsigned a, unsigned b) {

```

QUESTION DATAREP-21D. Below we’ve given a number of C expressions, some of their values, and some of their set representations for a set of elements. For example, the first row says that the integer value of expression `0` is just 0, which corresponds to an empty set. Fill in the blanks. This will require figuring out which bits correspond to the set elements `A`, `B`, `C`, and `D`, and the values for the 32-bit `int` variables `a`, `x`, and `s`. No arithmetic operation overflows; `abs(x)` returns the absolute value of `x` (that is, `x < 0 ? -x : x`).

Expression `e` Integer value Represented set
`0` 0 `{}`
`a == a` ______________ `{A}`
`(unsigned) ~a < (unsigned) a` ______________ `{A}`
`a < 0` ______________ ______________
`(1 << (s/2)) - 1` ______________ `{A,B,C,D}`
`a * a` ______________ `{C}`
`abs(a)` ______________ ______________
`x & (x - 1)` ______________ `{}`
`x - 1` ______________ `{A,D}`
`x` ______________ ______________
`s` ______________ ______________

## DATAREP-22. Bit Tac Toe

Brenda Bitdiddle is implementing tic-tac-toe using bitwise arithmetic. (If you’re unfamiliar with tic-tac-toe, see below.) Her implementation starts like this:

```typedef struct {
unsigned moves[2];
} tictactoe;
#define XS 0
#define OS 1

void tictactoe_init(tictactoe* b) {
b->moves[XS] = b->moves[OS] = 0;
}

static const unsigned ttt_values[3][3] = {
{ 0x001, 0x002, 0x004 },
{ 0x010, 0x020, 0x040 },
{ 0x100, 0x200, 0x400 }
};

// Mark a move by player `p` at row `row` and column `col`.
// Return 0 on success; return –1 if position `row,col` has already been used.
int tictactoe_move(tictactoe* b, int p, int row, int col) {
1.     assert(row >= 0 && row < 3 && col >= 0 && col < 3);
2.     assert(p == XS || p == OS);
3.     /* TODO: check for position reuse */
4.     b->moves[p] |= ttt_values[row][col];
5.     return 0;
}
```

Each position on the board is assigned a distinct bit.

Tic-tac-toe, also known as noughts and crosses, is a simple paper-and-pencil game for two players, X and O. The board is a 3x3 grid. The players take turns writing their symbol (X or O) in an empty square on the grid. The game is won when one player gets their symbol in all three squares in one of the rows, one of the columns, or one of the two diagonals. X goes first; played perfectly, the game always ends in a draw.

QUESTION DATAREP-22A. Brenda’s current code doesn’t check whether a move reuses a position. Write a snippet of C code that returns –1 if an attempted move is reusing a position. This snippet will replace line 3.

QUESTION DATAREP-22B. Complete the following function. You may use the following helper function:

`int popcount(unsigned n)`
Return the number of 1 bits in `n`. (Stands for “population count”; is implemented on recent x86 processors by a single instruction, `popcnt`.)

For full credit, your code should consist of a single “`return`” statement with a simple expression, but for substantial partial credit write any correct solution.

```// Return the number of moves that have happened so far.
int tictactoe_nmoves(const tictactoe* b) {

}
```

QUESTION DATAREP-22C. Write a simple expression that, if nonzero, indicates that player `XS` has a win on board `b` across the main diagonal (has marks in positions `0,0`, `1,1`, and `2,2`).

Lydia Davis notices Brenda’s code and has a brainstorm. “If you use different values,” she suggests, “it becomes easy to detect any win.” She suggests:

```static const unsigned ttt_values[3][3] = {
{ 0x01001001, 0x00010002, 0x10100004 },
{ 0x00002010, 0x22020020, 0x00200040 },
{ 0x40004100, 0x00040200, 0x04400400 }
};
```

QUESTION DATAREP-22D. Repeat part A for Lydia’s values: Write a snippet of C code that returns –1 if an attempted move is reusing a position. This snippet will replace line 3 in Brenda’s code.

QUESTION DATAREP-22E. Repeat part B for Lydia’s values: Use `popcount` to complete `tictactoe_nmoves`.

```int tictactoe_nmoves(const tictactoe* b) {

}
```

QUESTION DATAREP-22F. Complete the following function for Lydia’s values. For full credit, your code should consist of a single “`return`” statement containing exactly two constants, but for substantial partial credit write any correct solution.

```// Return nonzero if player `p` has won, 0 if `p` has not won.
int tictactoe_check_win(const tictactoe* b, int p) {
assert(p == XS || p == OS);

}
```

## DATAREP-23. Memory and Pointers

Two processes are mapping a file into their address space. The mapped file contains an unsorted linked list of integers. As the processes cannot ensure that the file will be mapped at the same virtual address, they use relative pointers to link elements in the list. A relative pointer holds not an address, but an offset that user code can use to calculate a true address. Our processes define the offset as relative to the start of the file.

Thus, each element in the linked list is represented by the following structure:

```struct ll_node {
int value;
size_t offset;
};
```

`offset == (size_t) -1` indicates the end of the list. Other `offset` values represent the position of the next item in the list, calculated relative to the start of the file.

QUESTION DATAREP-23A. Write a function to find an item in the list. The function's prototype is:

```struct ll_node* find_element(void* mapped_file, struct ll_node* list, int value);
```

The `mapped_file` parameter is the address of the mapped file data; the `list` parameter is a pointer to the first node in the list; and the `value` parameter is the value for which we are searching. The function should return a pointer to the linked list element if the value appears in the list or NULL if the value is not in the list.

## ASM-1. Disassemble

Here’s some assembly produced by compiling a C program.

```        .globl  f
.align  16, 0x90
.type   f,@function
f:
movl    \$1, %r8d
jmp     .LBB0_1
.LBB0_6:
incl    %r8d
.LBB0_1:
movl    %r8d, %ecx
imull   %ecx, %ecx
movl    \$1, %edx
.LBB0_2:
movl    %edx, %edi
imull   %edi, %edi
movl    \$1, %esi
.align  16, 0x90
.LBB0_3:
movl    %esi, %eax
imull   %eax, %eax
cmpl    %ecx, %eax
je      .LBB0_7
cmpl    %edx, %esi
leal    1(%rsi), %eax
movl    %eax, %esi
jl      .LBB0_3
cmpl    %r8d, %edx
leal    1(%rdx), %eax
movl    %eax, %edx
jl      .LBB0_2
jmp     .LBB0_6
.LBB0_7:
pushq   %rax
.Ltmp0:
movl    \$.L.str, %edi
xorl    %eax, %eax
callq   printf
movl    \$1, %eax
popq    %rcx
retq

.type   .L.str,@object
.L.str:
.asciz  "%d %d\n"
.size   .L.str, 7
```

QUESTION ASM-1A. How many arguments might this function have? Circle all that apply.

1. 0
2. 1
3. 2
4. 3 or more

QUESTION ASM-1B. What might this function return? Circle all that apply.

1. 0
2. 1
3. −1
4. Its first argument, whatever that argument is
5. A square number other than 0 or 1
6. None of the above

QUESTION ASM-1C. Which callee-saved registers does this function save and restore? Circle all that apply.

1.  %rax
2.  %rbx
3.  %rcx
4.  %rdx
5.  %rbp
6.  %rsi
7.  %rdi
8. None of the above

QUESTION ASM-1D. This function handles signed integers. If we changed the C source to use unsigned integers instead, which instructions would change? Circle all that apply.

1. `movl`
2. `imull`
3. `addl`
4. `cmpl`
5. `je`
6. `jl`
7. `popq`
8. None of the above

QUESTION ASM-1E. What might this function print? Circle all that apply.

1. `0 0`
2. `1 1`
3. `3 4`
4. `4 5`
5. `6 8`
6. None of the above

## ASM-2. Assembly

Here is some x86 assembly code.

``` f:
movl a, %eax
movl b, %edx
andl \$255, %edx
subl %edx, %eax
movl %eax, a
retq
```

QUESTION ASM-2A. Write valid C code that could have compiled into this assembly (i.e., write a C definition of function `f`), given the global variable declarations “`extern unsigned a, b;`.” Your C code should compile without warnings. REMINDER: You are not permitted to run a C compiler, except for the C compiler that is your brain.

QUESTION ASM-2B. Write different valid, warning-free C code that could have compiled into that assembly. This version should contain different operators than your first version. (For extra credit, use only one operator.)

QUESTION ASM-2C. Again, write different valid, warning-free C code that could have compiled into that assembly. In this version, `f` should have a different type than in your first version.

## ASM-3. Assembly and Data Structures

For each code sample below, indicate the most likely type of the data being accessed. (If multiple types are equally likely, just pick one.)

QUESTION ASM-3A. `movzbl %al, %eax`

QUESTION ASM-3B. `movl -28(%rbp), %edx`

QUESTION ASM-3C. `movsbl -32(%rbp), %eax`

QUESTION ASM-3D. `movzwl -30(%rbp), %eax`

For each code sample below, indicate the most likely data structure being accessed (assume that `g_var` is a global variable). Be as specific as possible.

QUESTION ASM-3E. `movzwl 6(%rdx, %rax, 8), %eax`

QUESTION ASM-3F. `movl (%rdx, %rax, 4), %eax`

QUESTION ASM-3G.

```movzbl 4(%rax), %eax
movsbl %al, %eax
```

For the remaining questions, indicate for what values of the register contents will the jump be taken. QUESTION ASM-3H. xorl %eax, %eax

```jge LABEL
```

QUESTION ASM-3I. testb \$1, %eax

```jne
```

QUESTION ASM-3J. cmpl %edx, %eax

```jl LABEL
```

## ASM-4. Assembly language

The next four questions pertain to the following four code samples.

 f1 ```f1: subq \$8, %rsp call callfunc movl  %eax, %edx leal 1(%rax,%rax,2), %eax      testb \$1, %dl jne .L3 movl  %edx, %eax shrl \$31, %eax addl  %edx, %eax sarl  %eax .L3: addq \$8, %rsp ret ``` f3 ```f3: subq \$8, %rsp call callfunc subl \$97, %eax cmpb \$4, %al ja .L2 movzbl  %al, %eax jmp *.L4(,%rax,8) .L4: .quad .L3 .quad .L9 .quad .L6 .quad .L7 .quad .L8 .L3: movl \$42, %edx jmp .L5 .L6: movl \$4096, %edx jmp .L5 .L7: movl \$52, %edx jmp .L5 .L8: movl \$6440, %edx jmp .L5 .L2: movl \$0, %edx jmp .L5 .L9: movl \$61, %edx .L5: movl \$.LC0, %esi movl \$1, %edi movl \$0, %eax call __printf_chk addq \$8, %rsp ret .LC0: .string "Sum = %d\n" ``` f2 ```f2: pushq  %rbx xorl  %ebx, %ebx .L3: movl  %ebx, %edi addl \$1, %ebx call callfunc cmpl \$10, %ebx jne .L3 popq  %rbx ret ``` f4 ```f4: subq \$40, %rsp movl \$1, (%rsp) movl \$0, 16(%rsp) .L2: leaq 16(%rsp), %rsi     movq  %rsp, %rdi call callfunc movl 16(%rsp), %eax     cmpl  %eax, (%rsp) jne .L2 addq \$40, %rsp ret ``` Now answer the following questions. Pick the most likely sample; you will use each sample exactly once. QUESTION ASM-4A. Which sample contains a for loop? QUESTION ASM-4B. Which sample contains a switch statement? QUESTION ASM-4C. Which sample contains only an if/else construct? QUESTION ASM-4D. Which sample contains a while loop?

## ASM-5. Calling conventions: 6186

University Professor Helen Vendler is designing a poetic new processor, the 6186. Can you reverse-engineer some aspects of the 6186’s calling convention from its assembly?

Here’s a function:

```int f(int* a, unsigned b) {
extern int g(int x);
int index = g(a[2*b + 1]);
return a[index + b];
}
```

And here’s that function compiled into 6186 instructions.

```f:
sub \$24, %rsp
movq %ra, (%rsp)
mov %rb, %rx
shl \$1, %rx
movl (%ra, %rx, 4), %ra
call g
movq (%rsp), %ra
movl (%ra, %rr, 4), %ra
mov %ra, %rr
ret
```

6186 assembly syntax is based on x86-64 assembly, and like the x86-64, 6186 registers are 64 bits wide. However, the 6186 has a different set of registers. There are just five general-purpose registers, `%ra`, `%rb`, `%rr`, `%rx`, and `%ry`. (“[W]hen she tries to be deadly serious she is speaking under…constraint”.) The example also features the stack pointer register, `%rsp`.

Give brief explanations if unsure.

QUESTION ASM-5A. Which register holds function return values?

QUESTION ASM-5B. What is `sizeof(int)` on the 6186?

QUESTION ASM-5C. Which general-purpose register(s) must be callee-saved?

QUESTION ASM-5D. Which general-purpose register(s) must be caller-saved?

QUESTION ASM-5E. Which general-purpose register(s) might be callee-saved or caller-saved (you can’t tell which)?

QUESTION ASM-5F. Assuming the compiler makes function stack frames as small as possible given the calling convention, what is the alignment of stack frames?

QUESTION ASM-5G. Assuming that the 6186 supports the same addressing modes as the x86-64, write a single instruction that has the same effect on `%ra` as these three instructions:

```shl \$1, %rx
movl (%ra, %rx, 4), %ra
```

## ASM-6. Data structure assembly

Here are four assembly functions, `f1` through `f4`.

```f1:
pushq   %rbp
movq    %rsp, %rbp
testl   %esi, %esi
jle     LBB0_3
incl    %esi
LBB0_2:
movq    8(%rdi), %rdi
decl    %esi
cmpl    \$1, %esi
jg      LBB0_2
LBB0_3:
movl    (%rdi), %eax
popq    %rbp
retq```
```f2:
pushq   %rbp
movq    %rsp, %rbp
movslq  %esi, %rax
movq    (%rdi,%rax,8), %rcx
movl    (%rcx,%rax,4), %eax
popq    %rbp
retq```
```f3:
testl   %esi, %esi
jle     LBB2_3
incl    %esi
LBB2_2:
movl    %edx, %eax
andl    \$1, %eax
movq    8(%rdi,%rax,8), %rdi
sarl    %edx
decl    %esi
cmpl    \$1, %esi
jg      LBB2_2
LBB2_3:
movl    (%rdi), %eax
retq```
```f4:
movslq  %esi, %rax
movl    (%rdi,%rax,4), %eax
retq```

QUESTION ASM-6A. Each function returns a value loaded from some data structure. Which function uses which data structure?

1. Array
2. Array of pointers to arrays
4. Binary tree

QUESTION ASM-6B. The array data structure is an array of type T. Considering the code for the function that manipulates the array, which of the following types are likely possibilities for T? Circle all that apply.

1. `char`
2. `int`
3. `unsigned long`
4. `unsigned long long`
5. `char*`
6. None of the above

## ASM-7. Where’s Waldo?

In the following questions, we give you C code and a portion of the assembly generated by some compiler for that code. (Sometimes we blank out a part of the assembly.) The C code contains a variable, constant, or function called `waldo`, and a point in the assembly is marked with asterisks `***`. Your job is to find Waldo: write an assembly expression or constant that holds the value of `waldo` at the marked point. We’ve done the first one for you.

NON-QUESTION: Where’s Waldo?

```int identity(int waldo) {
return waldo;
}
```
```00000000004007f6 <identity>:
4007f6:       55                      push   %rbp
4007f7:       48 89 e5                mov    %rsp,%rbp
4007fa:       89 7d fc                mov    %edi,-0x4(%rbp)
4007fd:       8b 45 fc                mov    -0x4(%rbp),%eax
***
400800:       5d                      pop    %rbp
400801:       c3                      retq
```

ANSWER: `%edi`, `-0x4(%rbp)`, `%eax`, and `%rax` all hold the value of `waldo` at the marked point, so any of them is a valid answer. If the asterisks came before the first instruction, only `%edi` would work.

QUESTION ASM-7A: Where’s Waldo?

```int f1(int a, int b, int waldo, int d) {
if (a > b)
return waldo;
else
return d;
}

0000000000400802 <f1>:
***
400802:       55                      push   %rbp
400803:       48 89 e5                mov    %rsp,%rbp
400806:       89 7d fc                mov    %edi,-0x4(%rbp)
400809:       89 75 f8                mov    %esi,-0x8(%rbp)
40080c:       89 55 f4                mov    %edx,-0xc(%rbp)
40080f:       89 4d f0                mov    %ecx,-0x10(%rbp)
400812:       8b 45 fc                mov    -0x4(%rbp),%eax
400815:       3b 45 f8                cmp    -0x8(%rbp),%eax
400818:       7e 05                   jle    40081f <f1+0x1d>
40081a:       8b 45 f4                mov    -0xc(%rbp),%eax
40081d:       eb 03                   jmp    400822 <f1+0x20>
40081f:       8b 45 f0                mov    -0x10(%rbp),%eax
400822:       5d                      pop    %rbp
400823:       c3                      retq
```

QUESTION ASM-7B: Where’s Waldo?

```int int_array_get(int* a, int waldo) {
int x = a[waldo];
return x;
}

00000000004007d9 <int_array_get>:
INSTRUCTIONS OMITTED
***
4007dc:       8b 04 b7                mov    (%rdi,%rsi,4),%eax
4007df:       c3                      retq
```

QUESTION ASM-7C: Where’s Waldo?

```int matrix_get(int** matrix, int row, int col) {
int* waldo = matrix[row];
return waldo[col];
}

00000000004007e0 <matrix_get>:
4007e0:       48 63 f6                movslq %esi,%rsi
4007e3:       48 63 d2                movslq %edx,%rdx
***
4007e6:       ?? ?? ?? ??             mov    ??,%rax
4007ea:       8b 04 90                mov    (%rax,%rdx,4),%eax
4007ed:       c3                      retq
```

QUESTION ASM-7D: Where’s Waldo?

```int f5(int x) {
extern int waldo(int);
return waldo(x * 45);
}

0000000000400be0 <f5>:
***
400be0:       6b ff 2d                imul   \$0x2d,%edi,%edi
400be3:       eb eb                   jmp    400bd0
```

QUESTION ASM-7E: Where’s Waldo?

```int factorial(int waldo) {
if (waldo < 2)
return 1;
else
return waldo * factorial(waldo - 1);
}

0000000000400910 <factorial>:
400910:       83 ff 01                cmp    \$0x1,%edi
400913:       b8 01 00 00 00          mov    \$0x1,%eax
400918:       7e 13                   jle    .L2 <factorial+0x1b>
40091a:       [6 bytes of padding (a no-op instruction)]
.L1:            ***
400920:       0f af c7                imul   %edi,%eax
400923:       83 ef 01                sub    \$0x1,%edi
400926:       83 ff 01                cmp    \$0x1,%edi
400929:       75 f5                   jne    .L1 <factorial+0x10>
.L2: 40092b:       f3 c3                   repz retq
```

QUESTION ASM-7F: Where’s Waldo?

Currently using 32-bit assembly
```int binary_search(const char* needle, const char** haystack, unsigned sz) {
unsigned waldo = 0, r = sz;
while (waldo < r) {
unsigned m = waldo + ((r - waldo) >> 1);
if (strcmp(needle, haystack[m]) < 0)
r = m;
else if (strcmp(needle, haystack[m]) == 0)
waldo = r = m;
else
waldo = m + 1;
}
return waldo;
}

80484ab <binary_search>:
INSTRUCTIONS OMITTED
.L1: 80484c3:       89 fe                   mov    %edi,%esi
80484c5:       29 de                   sub    %ebx,%esi
80484c7:       d1 ee                   shr    %esi
80484cb:       8b 44 b5 00             mov    0x0(%ebp,%esi,4),%eax
80484cf:       89 44 24 04             mov    %eax,0x4(%esp)
80484d3:       8b 44 24 30             mov    0x30(%esp),%eax
80484d7:       89 04 24                mov    %eax,(%esp)
80484da:       e8 11 fe ff ff          call   80482f0 <strcmp@plt>
80484df:       85 c0                   test   %eax,%eax
80484e1:       78 09                   js     .L2 <binary_search+0x41>
80484e3:       85 c0                   test   %eax,%eax
80484e5:       74 13                   je     80484fa <binary_search+0x4f>
***
80484e7:       8d 5e 01                lea    0x1(%esi),%ebx
80484ea:       eb 02                   jmp    .L3 <binary_search+0x43>
.L2: 80484ec:       89 f7                   mov    %esi,%edi
.L3: 80484ee:       39 df                   cmp    %ebx,%edi
80484f0:       77 d1                   ja     .L1 <binary_search+0x18>
INSTRUCTIONS OMITTED
```

In the remaining questions, you are given assembly compiled from one of the above functions by a different compiler, or at a different optimization level. Your goal is to figure out what C code corresponds to the given assembly.

QUESTION ASM-7G:

Currently using 32-bit assembly

```804851d <waldo>:
804851d:       55                      push   %ebp
804851e:       89 e5                   mov    %esp,%ebp
8048520:       83 ec 18                sub    \$0x18,%esp
8048523:       83 7d 08 01             cmpl   \$0x1,0x8(%ebp)
8048527:       7f 07                   jg     8048530
8048529:       b8 01 00 00 00          mov    \$0x1,%eax
804852e:       eb 10                   jmp    8048540
8048530:       8b 45 08                mov    0x8(%ebp),%eax
8048533:       48                      dec    %eax
8048534:       89 04 24                mov    %eax,(%esp)
8048537:       e8 e1 ff ff ff          call   804851d
804853c:       0f af 45 08             imul   0x8(%ebp),%eax
8048540:       c9                      leave
8048541:       c3                      ret
```

What’s Waldo? Circle one.

 `f1``f5` `matrix_get``permutation_compare` `factorial``binary_search`

QUESTION ASM-7H:

Currently using 32-bit assembly

```8048425 <waldo>:
8048425:       55                      push   %ebp
8048426:       89 e5                   mov    %esp,%ebp
8048428:       8b 45 08                mov    0x8(%ebp),%eax
804842b:       3b 45 0c                cmp    0xc(%ebp),%eax
804842e:       7e 05                   jle    8048435 <waldo+0x10>
8048430:       8b 45 10                mov    0x10(%ebp),%eax
8048433:       eb 03                   jmp    8048438 <waldo+0x13>
8048435:       8b 45 14                mov    0x14(%ebp),%eax
8048438:       5d                      pop    %ebp
8048439:       c3                      ret
```

What’s Waldo? Circle one.

 `f1``f5` `matrix_get``permutation_compare` `factorial``binary_search`

QUESTION ASM-7I:

```00000000004008b4 <waldo>:
4008b4:       55                      push   %rbp
4008b5:       48 89 e5                mov    %rsp,%rbp
4008b8:       48 83 ec 10             sub    \$0x10,%rsp
4008bc:       89 7d fc                mov    %edi,-0x4(%rbp)
4008bf:       8b 45 fc                mov    -0x4(%rbp),%eax
4008c2:       6b c0 2d                imul   \$0x2d,%eax,%eax
4008c5:       89 c7                   mov    %eax,%edi
4008c7:       e8 9e 05 00 00          callq  400e6a
4008cc:       c9                      leaveq
4008cd:       c3                      retq
80484a1 <waldo>:
80484a1:       55                      push   %ebp
80484a2:       89 e5                   mov    %esp,%ebp
80484a4:       83 ec 18                sub    \$0x18,%esp
80484a7:       8b 55 08                mov    0x8(%ebp),%edx
80484aa:       89 d0                   mov    %edx,%eax
80484ac:       c1 e0 02                shl    \$0x2,%eax
80484b5:       c1 e0 02                shl    \$0x2,%eax
80484ba:       89 04 24                mov    %eax,(%esp)
80484bd:       e8 2b 01 00 00          call   80485ed
80484c2:       c9                      leave
80484c3:       c3                      ret
```

What’s Waldo? Circle one.

 `f1``f5` `matrix_get``permutation_compare` `factorial``binary_search`

## ASM-8. Disassembly I

Here’s some assembly produced by compiling a C program with `gcc`.

```.LC1:
.string "%d %d\n"

.globl  f
.type   f, @function
f:
pushq   %rbp
movl    \$1, %ecx
.L7:
movl    %ecx, %r8d
movl    \$1, %edx
imull   %ecx, %r8d
.L2:
movl    %edx, %esi
leal    (%rdx,%rcx), %edi
movl    \$1, %eax
imull   %edx, %esi
.L6:
cmpl    %edi, %eax
jg      .L10
movl    %eax, %r9d
imull   %eax, %r9d
cmpl    %r9d, %esi
je      .L3
incl    %eax
jmp     .L6
.L10:
incl    %edx
cmpl    %edx, %ecx
jge     .L2
incl    %ecx
jmp     .L7
.L3:
pushq   %rax
movl    \$.LC0, %esi
movl    \$1, %edi
xorl    %eax, %eax
call    __printf_chk
movl    \$1, %eax
popq    %rdx
popq    %rbp
ret```

QUESTION ASM-8A. How many arguments might this function have? Circle all that apply.

1. 0
2. 1
3. 2
4. 3 or more

QUESTION ASM-8B. What might this function return? Circle all that apply.

1. 0
2. 1
3. −1
4. Its first argument, whatever that argument is
5. A square number other than 0 or 1
6. None of the above

QUESTION ASM-8C. Of these registers, which are callee-saved registers that the function saves and restores? Circle all that apply.

1.  %rbx
2.  %rcx
3.  %rdx
4.  %rbp
5.  %rsi
6.  %rdi
7.  %r12
8. None of the above

QUESTION ASM-8D. This function handles signed integers. If we changed the C source to use unsigned integers instead, which instructions would change? Circle all that apply.

1. `movl`
2. `imull`
3. `addl`
4. `cmpl`
5. `je`
6. `jge`
7. `popq`
8. None of the above

QUESTION ASM-8E. What might this function print? Circle all that apply.

1. `0 0`
2. `1 1`
3. `3 4`
4. `4 5`
5. `6 8`
6. None of the above

## ASM-9. Disassembly II

The questions in this section concern a function called `ensmallen`, which has the following assembly.

```    ensmallen:
1.         movzbl  (%rsi), %edx
2.         testb   %dl, %dl
3.         movb    %dl, (%rdi)
4.         jne     .L22
5.         jmp     .L23
6. .L18:
8. .L22:
9.         movzbl  (%rsi), %eax
10.         cmpb    %dl, %al
11.         je      .L18
13.         testb   %al, %al
14.         movb    %al, (%rdi)
15.         je      .L23
16.         movl    %eax, %edx
17.         jmp     .L22
18. .L23:
19.         ret
```

QUESTION ASM-9A. How many arguments is this function likely to take? Give line numbers that helped you determine an answer.

QUESTION ASM-9B. Are the argument(s) pointers? Give line numbers that helped you determine an answer.

QUESTION ASM-9C. What type(s) are the argument(s) likely to have? Give line numbers that helped you determine an answer.

QUESTION ASM-9D. Write a likely signature for the function. Use return type `void`.

QUESTION ASM-9E. Write an alternate likely signature for the function, different from your last answer. Again, use return type `void`.

QUESTION ASM-9F. Which callee-saved registers does this function use? Give line numbers that helped you determine an answer.

QUESTION ASM-9G. The function has an “input” and an “output”. Give an “input” that would cause the CPU to jump from line 5 to label `.L23`, and describe what is placed in the “output” for that “input”.

QUESTION ASM-9H. Give an “input” for which the corresponding “output” is not a copy of the “input”. Your answer must differ from the previous answer.

QUESTION ASM-9I. Write C code corresponding to this function. Make it as compact as you can.

## ASM-10. Machine programming

Intel really messed up this time. They’ve released a processor, the Fartium Core Trio, where every instruction is broken except the ones on this list.

 1 `cmpq %rdi, %rsi` 2 `decq %rsi` 3 `incq %rax` 4 `je L1` 5 `jl L2` 6 `jmp L3` 7 `movl (%rdi,%rax,4), %edi` 8 `retq` 9 `xchgq %rax, %rcx` 10 `xorq %rax, %rax`

(In case you forgot, `xchgq` swaps two values—here, the values in two registers—without modifying condition codes.)

“So what if it’s buggy,” says Andy Grove; “it can still run programs.” For instance, he argues convincingly that this function:

``` void do_nothing(void) {
}
```

is implemented correctly by this Fartium instruction sequence:

``` retq
```

Your job is to implement more complex functions using only Fartium instructions. Your implementations must have the same semantics as the C functions, but may perform much worse than one might expect. You may leave off arguments and write instruction numbers (#1–10) or instruction names. Indicate where labels `L1–L3` point (if you need them). Assume that the Fartium Core Trio uses the normal x86-64 calling convention.

QUESTION ASM-10A.

``` int return_zero(void) {
return 0;
}
```

QUESTION ASM-10B.

``` int identity(int a) {
return a;
}
```

QUESTION ASM-10C.

``` void infinite_loop(void) {
while (1) {
/* do nothing */
}
}
```

QUESTION ASM-10D.

``` typedef struct point {
int x;
int y;
int z;
} point;

int extract_z(point* p) {
return p->z;
}
```

So much for the easy ones. Now complete one out of the following parts, or more than one for extra credit.

QUESTION ASM-10E.

``` long add(long a, long b) {
return a + b;
}
```

QUESTION ASM-10F.

``` int array_dereference(int* a, long i) {
return a[i];
}
```

## ASM-11. Program Layout

For the following questions, select the part(s) of memory from the list below that best describes where you will find the object.

1. heap
2. stack
3. between the heap and the stack
4. in a read-only data segment
5. in a text segment starting at address 0x08048000
6. in a read/write data segment
7. in a register

Assume the following code, compiled without optimization.

```#include <errno.h>
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// The following is copied from stdio.h for your reference
#define EOF (-1)

1    unsigned long
2    fib (unsigned long n)
3    {
4        if (n < 2)
5            return (n);
6        return (fib(n - 1) + fib(n - 2));
7    }
8
9    int
10    main(int argc, char *argv[])
11    {
12        extern int optind;
13        char ch;
14        unsigned long f, n;
15
16        /* Command line processing. */
17        while ((ch = getopt(argc, argv, "h")) != EOF)
18            switch (ch) {
19            case 'h':
20            case '?':
21            default:
22                return (usage());
23            }
24
25        argc -= optind;
26        argv += optind;
27
28        if (argc != 1)
29            return (usage());
30
31        n = strtoul(strdup(argv[0]), NULL, 10);
32        if (n == 0 && errno == EINVAL)
33            return (usage());
34
35        /* Now call one of the fib routines. */
36        f = fib(n);
37        printf("fib(%lu) = %lu\n", n, f);
38
39        return (0);
40    }
```

QUESTION ASM-11A. The string `"fib(%lu) = %lu\n"` (line 37).

QUESTION ASM-11B. `optind` (line 25).

QUESTION ASM-11C. When executing at line 4, where you will find the address to which `fib` returns.

QUESTION ASM-11D. Where will you find the value of EOF that is compared to the return value of getopt in line 17.

QUESTION ASM-11E. `getopt` (line 17)

QUESTION ASM-11F. `fib` (lines 1-7)

QUESTION ASM-11G. the variable `f` (line 36)

QUESTION ASM-11H. the string being passed to `strtoul` (line 31)

QUESTION ASM-11I. `strdup` (line 31)

QUESTION ASM-11J. The value of the `fib` function when we return from `fib` (line 6).

## ASM-12. Assembly and Data Structures

Consider the following assembly function.

```func:
xorl    %eax, %eax
cmpb    \$0, (%rdi)
je      .L27
.L26:
cmpb    \$0, (%rdi)
jne     .L26
.L27:
retq
```

QUESTION ASM-12A. How many parameters does this function appear to have?

QUESTION ASM-12B. What do you suppose the type of that parameter is?

QUESTION ASM-12C. Write C code that corresponds to it.

## IO-1. I/O caching

Mary Ruefle, a poet who lives in Vermont, is working on her caching I/O library for CS61’s problem set 2. She wants to implement a cache with N slots. Since searching those slots might slow down her library, she writes a function that maps addresses to slots. Here’s some of her code.

``` #define SLOTSIZ 4096
typedef struct io61_slot {
char buf[SLOTSIZ];
off_t pos; // = (off_t) -1 for empty slots
ssize_t sz;
} io61_slot;

#define NSLOTS 64
struct io61_file {
int fd;
off_t pos; // current file position
io61_slot slots[NSLOTS];
};

static inline int find_slot(off_t off) {
return off % NSLOTS;
}

int slotindex = find_slot(f->pos);
io61_slot* s = &f->slots[slotindex];

if (f->pos < s->pos || f->pos >= s->pos + s->sz) {
// slot contains wrong data, need to refill it
off_t new_pos = lseek(f->fd, f->pos, SEEK_SET);
assert(new_pos != (off_t) -1); // only handle seekable files for now
ssize_t r = read(f->fd, s->buf, SLOTSIZ);
if (r == -1 || r == 0)
return EOF;
s->pos = f->pos;
s->sz = r;
}

int ch = (unsigned char) s->buf[f->pos - s->pos];
++f->pos;
return ch;
}
```

Before she can run and debug this code, Mary is led “to an emergency of feeling that ... results in a poem.” She’ll return to CS61 and fix her implementation soon, but in the meantime, let’s answer some questions about it.

QUESTION IO-1A. True or false: Mary’s cache is a direct-mapped cache.

QUESTION IO-1B. What changes to Mary’s code could change your answer to Part A? Circle all that apply.

1. New code for `find_slot` (keeping `io61_readc` the same)
2. New code in `io61_readc` (keeping `find_slot` the same)
3. New code in `io61_readc` and new code for `find_slot`
4. None of the above

QUESTION IO-1C. Which problems would occur when Mary’s code was used to sequentially read a seekable file of size 2MiB (2×220 = 2097152 bytes) one character at a time? Circle all that apply.

1. Excessive CPU usage (>10x stdio)
2. Many system calls to read data (>10x stdio)
3. Incorrect data (byte x read at a position where the file has byte yx)
6. Crash/undefined behavior
7. None of the above

QUESTION IO-1D. Which of these new implementations for `find_slot` would fix at least one of these problems with reading sequential files? Circle all that apply.

1. `return (off * 2654435761) % NSLOTS; /* integer hash function from Stack Overflow */`
2. `return (off / SLOTSIZ) % NSLOTS;`
3. `return off & (NSLOTS - 1);`
4. `return 0;`
5. `return (off >> 12) & 0x3F;`
6. None of the above

## IO-2. Caches and reference strings

QUESTION IO-2A. True or false: A direct-mapped cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.

QUESTION IO-2B. True or false: A fully-associative cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.

Consider the following 5 reference strings.

Name String
α 1
β 1, 2
γ 1, 2, 3, 4, 5
δ 2, 4
ε 5, 2, 4, 2

QUESTION IO-2C. Which of the strings might indicate a sequential access pattern? Circle all that apply.

 α β γ δ ε None of these

QUESTION IO-2D. Which of the strings might indicate a strided access pattern with stride >1? Circle all that apply.

 α β γ δ ε None of these

The remaining questions concern concatenated permutations of these five strings. For example, the permutation αβγδε refers to this reference string:

 1, 1, 2, 1, 2, 3, 4, 5, 2, 4, 5, 2, 4, 2.

We pass such permutations through an initially-empty, fully-associative cache with 3 slots, and observe the numbers of hits.

QUESTION IO-2E. How many cold misses might a permutation observe? Circle all that apply.

 0 1 2 3 4 5 Some other number

Under LRU eviction, the permutation αβεγδ observes 5 hits as follows. (We annotate each access with “h” for hit or “m” for miss.)

 1m; 1h, 2m; 5m, 2h, 4m, 2h; 1m, 2h, 3m, 4m, 5m; 2m, 4h.

QUESTION IO-2F. How many hits does this permutation observe under FIFO eviction?

QUESTION IO-2G. Give a permutation that will observe 8 hits under LRU eviction, which is the maximum for any permutation. There are several possible answers. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 7 hits, etc.)

QUESTION IO-2H. Give a permutation that will observe 2 hits under LRU eviction, which is the minimum for any permutation. There is one unique answer. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 3 hits, etc.)

## IO-3. Processor cache

The git version control system is based on commit hashes, which are 160-bit (20-byte) hash values used to identify commits. In this problem you’ll consider the processor cache behavior of several versions of a “grading server” that maps commits to grades. Here’s the first version:

``` typedef struct commit_info {
char hash[20];
} commit_info;

commit_info* commits;
size_t N;

int get_grade1(const char* hash, int pset) {
for (size_t i = 0; i != N; ++i)
if (memcmp(commits[i].hash, hash, 20) == 0)
return -1;
}
```

We will ask questions about the average number of cache lines accessed by variants of `get_grade(hash, pset)`. You should make the following assumptions:

• The `hash` argument is uniformly drawn from the set of known commits. That is, the probability that `hash` equals the ith commit’s hash is 1/`N`.
• Only count cache lines accessible via `commits`. Don’t worry about cache lines used for local variables, for parameters, for global variables, or for instructions. For instance, do not count the `hash` argument or the global-data cache line that stores the `commits` variable itself.
• Every object is 64-byte aligned, and no two objects share the same cache line.
• Commit hashes are mathematically indistinguishable from random numbers. Thus, the probability that two different hashes have the same initial k bits equals 1/2k.
• Fully correct answers would involve ceiling functions, but you don’t need to include them.

QUESTION IO-3A. What is the expected number of cache lines accessed by `get_grade1`, in terms of `N`?

The second version:

``` typedef struct commit_info {
char hash[20];
} commit_info;

commit_info** commits;
size_t N;

int get_grade2(const char hash[20], int pset) {
for (size_t i = 0; i != N; ++i)
if (memcmp(commits[i]->hash, hash, 20) == 0)
return -1;
}
```

QUESTION IO-3B. What is the expected number of cache lines accessed by `get_grade2`, in terms of `N`?

The third version:

``` typedef struct commit_info {
char hash[20];
} commit_info;

typedef struct commit_hint {
char hint[4];
commit_info* commit;
} commit_hint;

commit_hint* commits;
size_t N;

int get_grade3(const char* hash, int pset) {
for (size_t i = 0; i != N; ++i)
if (memcmp(commits[i].hint, hash, 4) == 0
&& memcmp(commits[i].commit->hash, hash, 20) == 0)
return -1;
}
```

QUESTION IO-3C. What is the expected number of cache lines accessed by `get_grade3`, in terms of `N`? (You may assume that `N`≤2000.)

The fourth version is actually a hash table.

``` typedef struct commit_info {
char hash[20];
} commit_info;

commit_info** commits;
size_t commits_hashsize;

int get_grade4(const char* hash, int pset) {
// choose initial bucket
size_t bucket;
memcpy(&bucket, hash, sizeof(bucket));
bucket = bucket % commits_hashsize;
// search for the commit starting from that bucket
while (commits[bucket] != NULL) {
if (memcmp(commits[bucket]->hash, hash, 20) == 0)
bucket = (bucket + 1) % commits_hashsize;
}
return -1;
}
```

QUESTION IO-3D. Assume that a call to `get_grade4` encounters `C` expected hash collisions (i.e., examines `C` buckets before finding the bucket that actually contains `hash`). What is the expected number of cache lines accessed by `get_grade4`, in terms of `N` and `C`?

## IO-4. IO caching and strace

Elif Batuman is investigating several program executables left behind by her ex-roommate Fyodor. She runs each executable under `strace` in the following way:

```strace -o strace.txt ./EXECUTABLE files/text1meg.txt > files/out.txt
```

Help her figure out properties of these programs based on their system call traces.

QUESTION IO-4A. Program `./mysterya`:

```open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x8193000
brk(0x81b5000)                          = 0x81b5000
write(1, "A", 1)                        = 1
write(1, "\n", 1)                       = 1
write(1, "A", 1)                        = 1
write(1, "'", 1)                        = 1
write(1, "s", 1)                        = 1
...
```

Circle at least one option in each column.

 Sequential IO Reverse sequential IO Strided IO No read cache Unaligned read cache Aligned read cache No write cache Write cache Cache size 4096 Cache size 2048 Cache size 1024 Other

QUESTION IO-4B. Program `./mysteryb`:

```open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x96c5000
brk(0x96e6000)                          = 0x96e6000
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 2048) = 2048
...
```

Circle at least one option in each column.

 Sequential IO Reverse sequential IO Strided IO No read cache Unaligned read cache Aligned read cache No write cache Write cache Cache size 4096 Cache size 2048 Cache size 1024 Other

QUESTION IO-4C. Program `./mysteryc`:

```open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9064000
brk(0x9085000)                          = 0x9085000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1046528, SEEK_SET)             = 1046528
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1044480, SEEK_SET)             = 1044480
write(1, "ehR\neehR\naehR\ns'hR\nhR\nsdlonyeR\ns"..., 2048) = 2048
lseek(3, 1042432, SEEK_SET)             = 1042432
write(1, "\ns'nailitniuQ\nnailitniuQ\nnniuQ\ns"..., 2048) = 2048
lseek(3, 1040384, SEEK_SET)             = 1040384
write(1, "rP\ndilsymerP\ns'regnimerP\nregnime"..., 2048) = 2048
...
```

Circle at least one option in each column.

 Sequential IO Reverse sequential IO Strided IO No read cache Unaligned read cache Aligned read cache No write cache Write cache Cache size 4096 Cache size 2048 Cache size 1024 Other

QUESTION IO-4D. Program `./mysteryd`:

```open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9a0e000
brk(0x9a2f000)                          = 0x9a2f000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1048575, SEEK_SET)             = 1048575
lseek(3, 1048574, SEEK_SET)             = 1048574
lseek(3, 1048573, SEEK_SET)             = 1048573
...
lseek(3, 1046528, SEEK_SET)             = 1046528
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1046527, SEEK_SET)             = 1046527
lseek(3, 1046526, SEEK_SET)             = 1046526
...
```

Circle at least one option in each column.

 Sequential IO Reverse sequential IO Strided IO No read cache Unaligned read cache Aligned read cache No write cache Write cache Cache size 4096 Cache size 2048 Cache size 1024 Other

QUESTION IO-4E. Program `./mysterye`:

```open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x93e5000
brk(0x9407000)                          = 0x9407000
...
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 1024) = 1024
...
```

Circle at least one option in each column.

 Sequential IO Reverse sequential IO Strided IO No read cache Unaligned read cache Aligned read cache No write cache Write cache Cache size 4096 Cache size 2048 Cache size 1024 Other

QUESTION IO-4F. Program `./mysteryf`:

```open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9281000
brk(0x92a3000)                          = 0x92a3000
write(1, "A", 1)                        = 1
write(1, "\n", 1)                       = 1
write(1, "A", 1)                        = 1
...
write(1, "A", 1)                        = 1
write(1, "l", 1)                        = 1
write(1, "t", 1)                        = 1
write(1, "o", 1)                        = 1
write(1, "n", 1)                        = 1
...
```

Circle at least one option in each column.

 Sequential IO Reverse sequential IO Strided IO No read cache Unaligned read cache Aligned read cache No write cache Write cache Cache size 4096 Cache size 2048 Cache size 1024 Other

## IO-5. Processor cache

The following questions use the following C definition for an `N`x`M` matrix (the matrix has `N` rows and `M` columns).

```typedef struct matrix {
unsigned N;
unsigned M;
double elt[0];
} matrix;

matrix* matrix_create(unsigned N, unsigned M) {
matrix* m = (matrix*) malloc(sizeof(matrix) + N * M * sizeof(double));
m->N = N;
m->M = M;
for (size_t i = 0; i < N * M; ++i) {
m->elt[i] = 0.0;
}
return m;
}
```

Typically, matrix data is stored in row-major order: element mij (at row i and column j) is stored in `m->elt[i*m->M + j]`. We might write this in C using an inline function:

```inline double* melt1(matrix* m, unsigned i, unsigned j) {
return &m->elt[i * m->M + j];
}
```

But that’s not the only possible method to store matrix data. Here are several more.

```inline double* melt2(matrix* m, unsigned i, unsigned j) {
return &m->elt[i + j * m->N];
}

inline double* melt3(matrix* m, unsigned i, unsigned j) {
return &m->elt[i + ((m->N - i + j) % m->M) * m->N];
}

inline double* melt4(matrix* m, unsigned i, unsigned j) {
return &m->elt[i + ((i + j) % m->M) * m->N];
}

inline double* melt5(matrix* m, unsigned i, unsigned j) {
assert(m->M % 8 == 0);
unsigned k = (i/8) * (m->M/8) + (j/8);
return &m->elt[k*64 + (i % 8) * 8 + j % 8];
}
```

QUESTION IO-5A. Which method (of `melt1``melt5`) will have the best processor cache behavior if most matrix accesses use loops like this?

```for (unsigned j = 0; j < 100; ++j) {
for (unsigned i = 0; i < 100; ++i) {
f(*melt(m, i, j));
}
}
```

QUESTION IO-5B. Which method will have the best processor cache behavior if most matrix accesses use loops like this?

```for (unsigned i = 0; i < 100; ++i) {
f(*melt(m, i, i));
}
```

QUESTION IO-5C. Which method will have the best processor cache behavior if most matrix accesses use loops like this?

```for (unsigned i = 0; i < 100; ++i) {
for (unsigned j = 0; j < 100; ++j) {
f(*melt(m, i, j));
}
}
```

QUESTION IO-5D. Which method will have the best processor cache behavior if most matrix accesses use loops like this?

```for (int di = -3; di <= 3; ++di) {
for (int dj = -3; dj <= 3; ++dj) {
f(*melt(m, I + di, J + dj));
}
}
```

QUESTION IO-5E. Here is a matrix-multiply function in ikj order.

```matrix* matrix_multiply(matrix* a, matrix* b) {
assert(a->M == b->N);
matrix* c = matrix_create(a->N, b->M);
for (unsigned i = 0; i != a->N; ++i) {
for (unsigned k = 0; k != a->M; ++k) {
for (unsigned j = 0; j != b->M; ++j) {
*melt(c, i, j) += *melt(a, i, k) * *melt(b, k, j);
}
}
}
}
```

This loop order is cache-optimal when data is stored in `melt1` order. What loop order is cache-optimal for `melt2`?

QUESTION IO-5F. You notice that accessing a matrix element using `melt1` is very slow. After some debugging, it seems like the processor on which you are running code has a very slow multiply instruction. Briefly describe a change to `struct matrix` that would let you write a version of `melt1` with no multiply instruction. You may add members, change sizes, or anything you like.

## IO-6. Caching

Assume that we have a cache that holds four pages. Assume that each letter below indicates an access to a page. Answer the following questions as they pertain to the following sequence of accesses.

```E D C B A E D A A A B C D E
```

QUESTION IO-6A. What is the hit rate assuming an LRU replacement policy?

QUESTION IO-6B. What pages will you have in the cache at the end of the run?

QUESTION IO-6C. What is the best possible hit rate attainable if you could see into the future?

## IO-7. Caching

Intel and CrossPoint have announced a new persistent memory technology with performance approaching that of DRAM. Your job is to calculate some performance metrics to help system architectects decide how to best incorporate this new technology into their platform.

Let's say that it takes 64ns to access one (32-bit) word of main memory (DRAM) and 256ns to access one (32-bit) word of this new persistent memory, which we'll call NVM (non-volatile memory). The block size of the NVM is 256 bytes. The NVM designers are quite smart and although it takes a long time to access the first byte, when you are accessing NVM sequentially, the devices perform read ahead and stream data efficiently -- at 32 GB/second, which is identical to the bandwidth of DRAM.

QUESTION IO-7A. Let's say that we are performing random accesses of 32 bits (on a 32-bit processor). What fraction of the accesses must be to main memory (as opposed to NVM) to achieve performance within 10% of DRAM?

QUESTION IO-7B. Let's say that they write every byte of a 256 block in units of 32 bits. How much faster will write-back cache perform relative to a write-through cache? (An approximate order of magnitude will be sufficient; showing work can earn partial credit.)

QUESTION IO-7C. Why might you not want to use a write-back cache?

## IO-8. Reference strings

The following questions concern the FIFO (First In First Out), LRU (Least Recently Used), and LFU (Least Frequently Used) cache eviction policies.

Your answers should refer to seven-item reference strings made up of digits in the range 0–9. An example answer might be “1231231”. In each case, the reference string is processed by a 3-slot cache that’s initially empty.

QUESTION IO-8A. Give a reference string that has a 1/7 hit rate in all three policies.

QUESTION IO-8B. Give a reference string that has a 6/7 hit rate in all three policies.

QUESTION IO-8C. Give a reference string that has different hit rates under LRU and LFU policies, and compute the hit rates.

String:

LRU hit rate:

LFU hit rate:

QUESTION IO-8D. Give a reference string that has different hit rates under FIFO and LRU policies, and compute the hit rates.

String:

FIFO hit rate:

LRU hit rate:

QUESTION IO-8E. Now let's assume that you know a reference string in advance. Given a 3-slot cache and the following reference string, what caching algorithm discussed in class and/or exercises would produce the best hit rate, and would would that hit rate be?

“12341425321521”

## IO-9. Caching: Access times and hit rates

Recall that x86-64 instructions can access memory in units of 1, 2, 4, or 8 bytes at a time. Assume we are running on an x86-64-like machine with 1024-byte cache lines. Our machine takes 32ns to access a unit if the cache hits, regardless of unit size. If the cache misses, an additional 8160ns are required to load the cache, for a total of 8192ns.

QUESTION IO-9A. What is the average access time per access to access all the data in a cache line as an array of 256 integers, starting from an empty cache?

QUESTION IO-9B. What unit size (1, 2, 4, or 8) minimizes the access time to access all data in a cache line, starting from an empty cache?

QUESTION IO-9C. What unit size (1, 2, 4, or 8) maximizes the hit rate to access all data in a cache line, starting from an empty cache?

## IO-10. Single-slot cache code

Donald Duck is working on a single-slot cache for reading. He’s using the `pos_tag`/`end_tag` representation, which is:

```struct io61_file {
int fd;
unsigned char cbuf[BUFSIZ];
off_t tag;      // file offset of first character in cache (same as before)
off_t end_tag;  // file offset one past last valid char in cache; end_tag - tag == old `csz`
off_t pos_tag;  // file offset of next char to read in cache; pos_tag - tag == old `cpos`
};
```

Here’s our solution code; in case you want to scribble, the code is copied in the appendix.

``` 1.  ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
2.      size_t pos = 0;
3.      while (pos != sz) {
4.          if (f->pos_tag < f->end_tag) {
5.              ssize_t n = sz - pos;
6.              if (n > f->end_tag - f->pos_tag)
7.                  n = f->end_tag - f->pos_tag;
8.              memcpy(&buf[pos], &f->cbuf[f->pos_tag - f->tag], n);
9.              f->pos_tag += n;
10.              pos += n;
11.          } else {
12.              f->tag = f->end_tag;
13.              ssize_t n = read(f->fd, f->cbuf, BUFSIZ);
14.              if (n > 0)
15.                  f->end_tag += n;
16.              else
17.                  return pos ? pos : n;
18.          }
19.      }
20.      return pos;
21.  }
```

Donald has ideas for “simplifying” this code. Specifically, he wants to try each of the following independently:

1. Replacing line 4 with “`if (f->pos_tag <= f->end_tag) {`”.
2. Removing lines 6–7.
3. Removing line 9.
4. Removing lines 16–17.

QUESTION IO-10A. Which simplifications could lead to undefined behavior? List all that apply or say “none.”

QUESTION IO-10B. Which simplifications could cause `io61_read` to loop forever without causing undefined behavior? List all that apply or say “none.”

QUESTION IO-10C. Which simplifications could lead to `io61_read` returning incorrect data in `buf`, meaning that the data read by a series of `io61_read` calls won’t equal the data in the file? List all that apply or say “none.”

QUESTION IO-10D. Chastened, Donald decides to optimize the code for a specific situation, namely when `io61_read` is called with a `sz` that is larger than `BUFSIZ`. He wants to add code after line 11, like so, so that fewer `read` system calls will happen for large `sz`:

```11.          } else if (sz - pos > BUFSIZ) {
// DONALD’S CODE HERE

11A.         } else {
12.              f->tag = f->end_tag;
....
```

Finish Donald’s code. Your code should maintain the relevant invariants between `tag`, `pos_tag`, `end_tag`, and the file position, but you need not keep `tag` aligned.

## MISC-1. Git

Edward Snowden is working on a CS61 problem set and he has some git questions.

QUESTION MISC-1A. The CS61 staff has released some new code. Which commands will help Edward get the code from code.seas.harvard.edu into his repository? Circle all that apply.

1. `git commit`
2. `git add`
3. `git push`
4. `git pull`

QUESTION MISC-1B. Edward has made some changes to his code. He hasn’t run git since making the changes. He wants to upload his latest version to code.seas.harvard.edu. Put the following git commands in an order that will accomplish this goal. You won’t necessarily use every command. You may add flags to a command (but you don’t have to). If you add flags, tell us what they are.

1. `git commit`
2. `git add`
3. `git push`
4. `git pull`

Edward Snowden’s partner, Edward Norton, has been working on the problem set also. They’ve been working independently.

At midnight on October 10, here’s how things stood. The `git log` for the partners’ shared code.seas.harvard.edu repository looked like this. The committer is listed in (parentheses).

``` 52d44ee Pset release. (kohler)
```

The `git log` for Snowden’s local repository:

``` 3246d07 Save Greenwald's phone number (snowden)
8633fbd Start work on a direct-mapped cache (snowden)
52d44ee Pset release. (kohler)
```

The `git log` for Norton’s local repository:

``` 81f952e try mmap (norton)
52d44ee Pset release. (kohler)
```

At noon on October 11, their shared code.seas.harvard.edu repository has this log:

``` d446e60 Increase cache size (snowden)
b677e85 use mmap on mmappable files (norton)
b46cfda Merge branch 'master' of code.seas.harvard.edu:~TheTrueHOOHA/cs61/TheTrueHOOHAs-cs61-psets.git \
(norton)
81f952e try mmap (norton)
3246d07 Save Greenwald's phone number (snowden)
8633fbd Start work on a direct-mapped cache (snowden)
52d44ee Pset release. (kohler)
```

QUESTION MISC-1C. Give an order for these commands that could have produced that log starting from the midnight October 10 state. You might not use every command, and you might use some commands more than once. Sample (incorrect) answer: “1 4 4 5 2.”

1. snowden: `git commit -a`
2. snowden: `git push`
3. snowden: `git pull`
4. norton: `git commit -a`
5. norton: `git push`
6. norton: `git pull`

QUESTION MISC-1D. In your answer to Part C, circle the step(s) where there might have been a merge conflict.

## MISC-2. Debugging

QUESTION MISC-2A. Match each tool or technique with a debugging situation for which it is well suited. Produce the best overall match that uses each situation exactly once.

 1. strace A. Investigating segmentation faults 2. gdb B. Finding memory leaks 3. valgrind --tool=memcheck C. Checking your assumptions and verifying invariants 4. printf statements D. Discovering I/O patterns 5. assert E. Displaying program state

## MISC-3. Pot Pourri

QUESTION MISC-3A. What does the following instruction place in %eax? `sarl \$31, %eax`

QUESTION MISC-3B. True/False: A direct-mapped cache with N slots can handle any reference string with < N distinct addresses with no misses except for compulsory misses.

QUESTION MISC-3C. What is 1 (binary) TB in hexadecimal?

QUESTION MISC-3D. Write the answer to the following in hexadecimal: `0xabcd + 12`

QUESTION MISC-3E. True/False: The garbage collector we discussed is conservative, because it only runs when we tell it to.

QUESTION MISC-3F. True/False: Given the definition int array[10] the following two expressions mean the same thing: `&array[4] and array + 4`.

QUESTION MISC-3G. Using the matrix multiply from lecture 12, in what order should you iterate over the indices i, j, and k to achieve the best performance.

QUESTION MISC-3H. True/False: fopen, fread, fwrite, and fclose are system calls.

QUESTION MISC-3I. Which do you expect to be faster (on the CS50 appliance): insertion sorting into a linked list of 1000 elements or into an array of 1000 elements?

QUESTION MISC-3J. What does the hardware do differently when adding signed versus unsigned numbers?