# Sample midterm solutions

### 1. Sizes and alignments

QUESTION 1A. True or false: For any type X, the size of X (`sizeof(X)`) is greater than or equal to the alignment of type X.

True

QUESTION 1B. True or false: For any type X, the size of `struct Y { X a; char newc; }` is greater than the size of X.

True

QUESTION 1C. True or false: For any types `A1`...`An` (with `n` ≥ 1), the size of `struct Y` is greater than the size of `struct X`, given:

```  struct X {     struct Y {
A1 a1;         A1 a1;
...            ...
An an;         An an;
};                char newc;
};
```

False (example: `A1 = int`, `A2 = char`)

QUESTION 1D. True or false: For any types `A1`...`An` (with `n` ≥ 1), the size of `struct Y` is greater than the size of `union X`, given:

```  union X {      struct Y {
A1 a1;         A1 a1;
...            ...
An an;         An an;
};             };
```

False (if `n = 1`)

### 3. Hello binary

This problem locates 8-bit numbers horizontally and vertically in the following 16x16 image. Black pixels represent 1 bits and white pixels represent 0 bits. For horizontal arrangements, the most significant bit is on the left as usual. For vertical arrangements, the most significant bit is on top.

Examples: The 8-bit number 15 (hexadecimal 0x0F, binary 0b00001111) is located horizontally at 3,4, which means X=3, Y=4.

• The pixel at 3,4 is white, which has bit value 0.
• 4,4 is white, also 0.
• 5,4 is white, also 0.
• 6,4 is white, also 0.
• 7,4 is black, which has bit value 1.
• 8,4, 9,4, and 10,4 are black, giving three more 1s.
• Reading them all off, this is 0b00001111, or 15.

15 is also located horizontally at 7,6.

The 8-bit number 0 is located vertically at 0,0. It is also located horizontally at 0,0 and 1,0.

The 8-bit number 134 (hexadecimal 0x86, binary 0b10000110) is located vertically at 8,4.

QUESTION 3A. Where is 3 located vertically? (All questions refer to 8-bit numbers.)

9,6

QUESTION 3B. Where is 12 located horizontally?

5,5

QUESTION 3C. Where is 255 located vertically?

14,3

### 4. Hello memory

Shintaro Tsuji wants to represent this image in computer memory. He stores it in an array of 16-bit unsigned integers:

`uint16_t cute[16];`

Row Y of the image is stored in integer `cute[Y]`.

QUESTION 4A. What is `sizeof(cute)`, 2, 16, 32, or 64?

32

QUESTION 4B. `printf("%d\n", cute[0]);` prints `16384`. Is Shintaro’s machine big-endian or little-endian?

Little-endian

### 6. Memory regions

Consider the following program:

```struct ptrs {
int **x;
int *y;
};

struct ptrs global;

void setup(struct ptrs *p) {
int *a = malloc(sizeof(int));
int *b = malloc(sizeof(int));
int *c = malloc(sizeof(int));
int *d = malloc(sizeof(int));
int *e = malloc(sizeof(int) * 2);
int **f = malloc(4 * sizeof(int *));
int **g = malloc(sizeof(int *));

*a = 0;
*b = 0;
*c = (int) a;
*d = *b;
e[0] = 29;
e[1] = (int) &d[100000];

f[0] = b;
f[1] = c;
f[2] = 0;
f[3] = 0;

*g = c;

global.x = f;
global.y = e;

p->x = g;
p->y = &e[1];
}

int main(int argc, char **argv) {
stack_bottom = (char *) &argc;
struct ptrs p;
setup(&p);
m61_collect();
do_stuff(&p);
}```

This program allocates objects `a` through `g` on the heap and then stores those pointers in some stack and global variables. (It then calls our conservative garbage collector from class, but that won’t matter until the next problem.) We recommend you draw a picture of the state `setup` creates.

QUESTION 6A. Assume that `(uintptr_t) a == 0x8300000`, and that `malloc` returns increasing addresses. Match each address to the most likely expression with that address value. The expressions are evaluated within the context of `main`. You will not reuse an expression.

Value   Expression
1.   0x8300040 A.   `&p`
2.   0x8049894 B.   `(int *) *p.x[0]`
3.   0x8361AF0 C.   `&global.y`
4.   0x8300000 D.   `global.y`
5.   0xBFAE0CD8 E.   `(int *) *p.y`

1—D; 2—C; 3—E; 4—B; 5—A

Since `p` has automatic storage duration, it is located on the stack, giving us 5—A. The `global` variable has static storage duration, and so does its component `global.y`; so the pointer `&global.y` has an address that is below all heap-allocated pointers. This gives us 2—C. The remaining expressions go like this:

 `global.y == e` `p.y == &e[1]`, so `*p.y == e[1] == (int) &d[100000]`, and `(int *) *p.y == &d[100000]` `p.x == g`, so `p.x[0] == g[0] == *g == c`, and `*p.x[0] == *c == (int) a`

Address #4 has value 0x8300000, which by assumption is `a`’s address; so 4—B. Address #3 is much larger than the other heap addresses, so 3—E. This leaves 1—D.

### 7. Garbage collection

Here is the top-level function for the conservative garbage collector we wrote in class (`cs61-lectures/l07/m61-13.c`).

```void m61_collect(void) {
char *stack_top = (char *) &stack_top;

// The entire contents of the heap start out unmarked
for (size_t i = 0; i != nmr; ++i)
mr[i].marked = 0;

// Mark all reachable objects, starting with the roots (the stack)
m61_markaccessible(stack_top, stack_bottom - stack_top);

// Free everything that wasn't marked
for (size_t i = 0; i != nmr; ++i)
if (mr[i].marked == 0) {
m61_free(mr[i].ptr);
--i;		// m61_free moved different data into this
// slot, so we must recheck the slot
}
}```

This garbage collector is not correct because it doesn’t capture all memory roots.

Consider the program from the previous section, and assume that an object is reachable if `do_stuff` can access an address within the object via variable references and memory dereferences without casts or pointer arithmetic. Then:

QUESTION 7A. Which reachable objects will `m61_collect()` free? Circle all that apply.

 `a` `b` `c` `d` `e` `f` `g` None of these

`b`, `f`.

The collector searches the stack for roots. This yields just the values in `struct ptrs p` (the only pointer-containing variable with automatic storage duration at the time `m61_collect` is called). The objects directly pointed to by `p` are `g` and `e`. The collector then recursively marks objects pointed to by these objects. From `g`, it finds `c`. From `e`, it finds nothing. Then it checks one more time. From `c`, it finds the value of `a`! Now, `a` is actually not a pointer here—the type of `*c` is `int`—so by the definition above, `a` is not actually reachable. But the collector doesn’t know this.

Putting it together, the collector marks `a`, `c`, `e`, and `g`. It won’t free these objects; it will free the others (`b`, `d`, and `f`). But `b` and `f` are reachable from `global`.

QUESTION 7B. Which unreachable objects will `m61_collect()` not free? Circle all that apply.

 `a` `b` `c` `d` `e` `f` `g` None of these

`a`

QUESTION 7C. Conservative garbage collection in C is often slower than precise garbage collection in languages such as Java. Why? Circle all that apply.

1. C is generally slower than other languages.
2. Conservative garbage collectors must search all reachable memory for pointers. Precise garbage collectors can ignore values that do not contain pointers, such as large character buffers.
3. C programs generally use the heap more than programs in other languages.
4. None of the above.

#2

### 9. Attack

QUESTION 9A. True or false: A successful stack smashing attack against a process can cause that process to execute a `cli` instruction.

True

QUESTION 9B. True or false: Buffer overflow attacks would be neutralized if the processor protected stack return addresses so that they couldn’t be overwritten.

False (there are heap overflow attacks)

QUESTION 9C. “Canary” values can help detect stack smashing attacks before the attacker’s code gains control. Before returning, a function compares one copy of the canary with a reserved copy; if they differ, the stack was smashed. Which properties are valuable in canary values? Circle all that apply.

2. One copy located close to the return address on the stack
3. Both copies located close to the return address on the stack
4. Inexpensive to compare
5. None of the above

#2 and #4

### 10. Memory errors

The following function constructs and returns a lower-triangular matrix of size N. The elements are random 2-dimensional points in the unit square. The matrix is represented as an array of pointers to arrays.

``` typedef struct point2 {
double d[2];
} point2;
typedef point2 *point2_vector;

point2_vector *make_random_lt_matrix(size_t N) {
point2_vector *m = (point2_vector *) malloc(sizeof(point2_vector) * N);
for (size_t i = 0; i < N; ++i) {
m[i] = (point2 *) malloc(sizeof(point2) * (i + 1));        /* LINE A */
for (size_t j = 0; j <= i; ++j)
for (int d = 0; d < 2; ++d)
m[i][j].d[d] = drand48();                          /* LINE B */
}
return m;
}
```

This code is running on an x86 machine (`size_t` is 32 bits). You may assume that the machine has enough free physical memory and the process has enough available virtual address space to satisfy any `malloc()` call.

QUESTION 10A. Give a value of N so that, while `make_random_lt_matrix(N)` is running, no `malloc()` returns `NULL`, but a memory error (such as a null pointer dereference or an out-of-bounds dereference) happens on Line A. The memory error should happen specifically when `i == 1`.

We are asked to produce a value of N so that no memory error happens on Line A when `i == 0`, but a memory error does happen when `i == 1`. So reason that through. What memory errors could happen on Line A if `malloc()` returns non-`NULL`? There’s only one memory operation, namely the dereference `m[i]`. Perhaps this dereference is out of bounds.

If no memory error happens when `i == 0`, then a `m[0]` dereference must not cause a memory error. So the `m` object must contain at least 4 bytes. But a memory error does happen on Line A when `i == 1`. So the `m` object must contain less than 8 bytes. How many bytes were allocated for `m`? `sizeof(point2_vector) * N == sizeof(point2 *) * N == 4 * N`. So we have:

• `(4 * N)` ≥ 4
• `(4 * N)` < 8

It seems like the only possible answer is `N == 1`. But no, this doesn’t cause a memory error, because the loop body would never be executed with `i == 1`!

The key insight is that the multiplications above use 32-bit unsigned computer arithmetic. Let’s write `N` as `X + 1`. Then these inequalities become:

• 4 ≤ (`4 * (X + 1)` = `4 * X + 4`) < 8
• 0 ≤ `(4 * X)` < 4

(Multiplication distributes over addition in computer arithmetic.) What values of `X` satisfy this inequality? It might be easier to see if we remember that multiplication by powers of two is equivalent to shifting:

• 0 ≤ `(X << 2)` < 4

The key insight is that this shift eliminates the top two bits of `X`. There are exactly four values for `X` that work: `0`, `0x40000000`, `0x80000000`, and `0xC0000000`. For any of these, `4 * X == 0` in 32-bit computer arithmetic, because 4×`X` = 0 (mod 232) in normal arithmetic.

Plugging `X` back in to `N`, we see that `N ∈ {0x40000001, 0x80000001, 0xC0000001}`. These are the only values that work.

Partial credit was awarded for values that acknowledged the possibility of overflow.

QUESTION 10B. Give a value of N so that no `malloc()` returns `NULL`, and no memory error happens on Line A, but a memory error does happen on Line B.

If no memory error happens on Line A, then `N` < 230 (otherwise overflow would happen as seen above). But a memory error does happen on Line B. Line B dereferences `m[i][j]`, for 0 ≤ `j``i`; so how big is `m[i]`? It was allocated on Line A with size `sizeof(point2) * (i + 1) == 2 * sizeof(double) * (i + 1) == 16 * (i + 1)`. If `i + 1` ≥ 232 / 16 = 228, this multiplication will overflow. Since `i < N`, we can finally reason that any `N` greater than or equal to 228 = `0x10000000` and less than 230 = `0x40000000` will cause the required memory error.

### 11. Caches and reference strings

QUESTION 11A. True or false: A direct-mapped cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.

False. Direct-mapped caches can have conflict misses.

QUESTION 11B. True or false: A fully-associative cache with N or more slots can handle any reference string containing ≤N distinct addresses with no misses except for cold misses.

True

Consider the following 5 reference strings.

Name String
α 1
β 1, 2
γ 1, 2, 3, 4, 5
δ 2, 4
ε 5, 2, 4, 2

QUESTION 11C. Which of the strings might indicate a sequential access pattern? Circle all that apply.

 α β γ δ ε None of these
(α), β, γ

QUESTION 11D. Which of the strings might indicate a strided access pattern with stride >1? Circle all that apply.

 α β γ δ ε None of these
(α), δ

One very clever person pointed out that β and γ could also represent large strides: for example, consider a file with 10 bytes accessed with stride 11!

The remaining questions concern concatenated permutations of these five strings. For example, the permutation αβγδε refers to this reference string:

 1, 1, 2, 1, 2, 3, 4, 5, 2, 4, 5, 2, 4, 2.

We pass such permutations through an initially-empty, fully-associative cache with 3 slots, and observe the numbers of hits.

QUESTION 11E. How many cold misses might a permutation observe? Circle all that apply.

 0 1 2 3 4 5 Some other number
5. The first time a reference string address is encountered, it must cause a cold miss.

Under LRU eviction, the permutation αβεγδ observes 5 hits as follows. (We annotate each access with “h” for hit or “m” for miss.)

 1m; 1h, 2m; 5m, 2h, 4m, 2h; 1m, 2h, 3m, 4m, 5m; 2m, 4h.

QUESTION 11F. How many hits does this permutation observe under FIFO eviction?

4 hits.

QUESTION 11G. Give a permutation that will observe 8 hits under LRU eviction, which is the maximum for any permutation. There are several possible answers. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 7 hits, etc.)

The following four permutations observe 8 hits under LRU: αβγδε, αβγεδ, βαγδε, βαγεδ. 28 permutations observe 7 hits; 25 observe 6 hits; and 38 observe 5 hits.

QUESTION 11H. Give a permutation that will observe 2 hits under LRU eviction, which is the minimum for any permutation. There is one unique answer. (Write your answer as a permutation of αβγδε. For partial credit, find a permutation that has 3 hits, etc.)

δαεγβ. 4 permutations observe 3 hits and 20 observe 4 hits.

### 12. Cache coherence

QUESTION 12A. True or false: The buffer cache is coherent with respect to disk data.

True. Though at times the buffer cache may contain newer data than the disk, that’s still OK.

QUESTION 12B. True or false: An x86 processor cache is coherent with respect to primary memory.

True. The reasoning is similar.

QUESTION 12C. True or false: The standard I/O package’s data cache is coherent with respect to file data.

False. Stdio cache data might be either newer or older than file data.