2017 Midterm Solutions
General overview
Assume a CS61 VM running the x86-64 architecture unless otherwise stated. If you get stuck, move on to the next question. If you’re confused, explain your thinking briefly for potential partial credit.
1. Sizes and alignments (10 points)
Here’s a test struct with n members. Assume an x86-64 machine, where
each Ti
either is a basic x86-64 type (e.g., int
, char
,
double
) or is a type derived from such types (e.g., arrays, structs,
pointers, unions, possibly recursively), and assume that ai≤8 for all
i.
struct test {
T1 m1; // sizeof(T1) == s1, __alignof__(T1) == a1
T2 m2; // sizeof(T2) == s2, __alignof__(T2) == a2
...
Tn mn; // sizeof(Tn) == sn, __alignof__(Tn) == an
};
In these questions, you will compare this struct with other structs that have the same members, but in other orders.
QUESTION 1A. True or false: The size of struct test
is minimized
when its members are sorted by size. In other words, if
s1≤s2≤…≤sn, then sizeof(struct test)
is less than or equal to
the struct size for any other member order.
If true, briefly explain your answer; if false, give a counterexample
(i.e., concrete types for T1
, …, Tn
that do not minimize
sizeof(struct test)
).
False, because some data types can have large sizes but small
alignments. Consider T1 = char
, T2 = int
, T3 = char[5]
. On
x86-64 this has size 16 (T1 = 1 byte; 3 padding bytes; T2 = 4 bytes; T3
= 5 bytes; and finally 3 padding bytes so the result has size equal to a
multiple of the alignment, which is 4), but the order T3, T1, T2 would
have size 12.
QUESTION 1B. True or false: The size of struct test
is minimized
when its members are sorted by alignment. In other words, if
a1≤a2≤…≤an, then sizeof(struct test)
is less than or equal to
the struct size for any other member order.
If true, briefly explain your answer; if false, give a counterexample.
True. Padding is added only between objects with different alignments, and the amount of padding added is always smaller than the alignment of the second object. (As a special case, padding is added at the end of the struct to ensure the struct’s size is a multiple of its alignment, but this can be modeled in the same way.) Sorting by alignment therefore minimizes padding.
QUESTION 1C. True or false: The alignment of struct test
is
minimized when its members are sorted in increasing order by alignment.
In other words, if a1≤a2≤…≤an, then
__alignof__(struct test)
is less than or equal to the struct
alignment for any other member order.
If true, briefly explain your answer; if false, give a counterexample.
True. It’s all the same; alignment is max alignment of every component, and is independent of order.
QUESTION 1D. What is the maximum number of bytes of padding that
struct test
could contain for a given n? The answer will be a pretty
simple formula involving n. (Remember that ai≤8 for all i.)
Alternating char
and long
gives the most padding, which is
7*(n/2) when n is even and 7*(n+1)/2 otherwise.
QUESTION 1E. What is the minimum number of bytes of padding that
struct test
could contain for a given n?
0 :)
2. Undefined behavior (10 points)
QUESTION 2A. Sometimes a conforming C compiler can assume that a + 1 \> a
, and sometimes it can’t. For each type below, consider this
expression:
a + (int) 1 > a
and say whether the compiler:
- Must reject the expression as a type error.
- May assume that the expression is true (that
a + (int) 1 \> a
for alla
). - Must not assume that the expression is true.
int a
unsigned a
char\* a
unsigned char a
struct {int m;} a
- May assume, because signed integer overflow (
INT_MAX + 1
) is undefined behavior. - Must not assume, because
UINT_MAX + 1 == 0
without causing undefined behavior. - May assume, because pointer arithmetic overflow is undefined behavior.
- May assume, because the maximum
unsigned char
is 255, and255 + (int) 1 == (int) 256
. This is called integer promotion. - Must reject
QUESTION 2B. The following code checks its arguments for sanity, but not well: each check can cause undefined behavior.
void sanity_check(int* array, size_t array_size, int* ptr_into_array) {
if (array + array_size < array) {
` fprintf(stderr, "`array` is so big that it wraps around!\n"); `
abort();
}
if (ptr_into_array < array || ptr_into_array > array + array_size) {
` fprintf(stderr, "`ptr_into_array` doesn’t point into the array!\n"); `
abort();
}
...
Rewrite these checks to avoid all undefined behavior. You will likely
add one or more casts to uintptr_t
. For full credit, write each
check as a single comparison (no &&
or ||
, even though the current
ptr_into_array
check uses ||
).
array_size check: |
ptr_into_array check: |
array_size`: `(uintptr_t) array + 4 * array_size < (uintptr_t) array
ptr_into_array
:
(uintptr_t) ptr_into_array - (uintptr_t) array > 4 * array_size
where 4 is shorthand for sizeof(int)
.
QUESTION 2C. In Lecture datarep4, we discussed several ways to tell
if a signed integer x
is negative. One of them was the following:
int isnegative = (x & (1UL << (sizeof(x) * CHAR_BIT))) != 0;
But this is incorrect: it has undefined behavior. Correct it by adding two characters.
(x & (1UL << (sizeof(x) * CHAR_BIT `<b>`- 1`</b>`))) != 0
3. Memory errors and garbage collection (10 points)
Recall that a conservative garbage collector is a program that can automatically free dynamically-allocated memory by detecting when that memory is no longer referenced. Such a GC works by scanning memory for currently-referenced pointers, starting from stack and global memory, and recursing over each referenced object until all referenced memory has been scanned. We built a conservative garbage collector in lecture datarep6.
QUESTION 3A. An application program that uses conservative GC, and
does not call free
directly, will avoid certain errors and undefined
behaviors. Which of the following errors are avoided? List all that
apply.
- Use-after-free
- Double free
- Signed integer overflow
- Boundary write error
- Unaligned access
1, 2. Only the errors that relate to freeing memory are solved.
QUESTION 3B. Write a C program that leaks unbounded memory without GC, but does not do so with GC. You should need less than 5 lines. (Leaking “unbounded” memory means the program will exhaust the memory capacity of any machine on which it runs.)
while (1) {
(void) malloc(1);
}
QUESTION 3C. Not every valid C program works with a conservative GC, because the C abstract machine allows a program to manipulate pointers in strange ways. Which of the following pointer manipulations might cause the conservative GC from class to inappropriately free a memory allocation? List all that apply.
-
Storing the pointer in a
uintptr_t
variable -
Writing the pointer to a disk file and reading it back later
-
Using the least-significant bit of the pointer to store a flag: int* set_ptrflag(int* x, int flagval) { return (int*) ((uintptr_t) x | (flagval ? 1 : 0)); } int get_ptrflag(int* x) { return (uintptr_t) x & 1; } int deref_ptrflag(int* x) { return ((int) ((uintptr_t) x & ~1UL)); }
-
Storing the pointer in textual form: void save_ptr(char buf[100], void* p) { sprintf(buf, "%p", p); } void* restore_ptr(const char buf[100]) { void* p; sscanf(buf, "%p", &p); return p; }
(…option 5 on next page…)
-
Splitting the pointer into two parts and storing the parts in an array: typedef union { unsigned long ival; unsigned arr[2]; } value; value save_ptr(void* p) { value v; v.arr[0] = (uintptr_t) p & 0xFFFFFFFFUL; v.arr[1] = ((uintptr_t) p / 4294967296UL) & 0xFFFFFFFFUL; return v; }
2, 4 will obscure the pointer value from GC, thereby leading to a possible inappropriate free.
- 1 (storing the pointer in a
uintptr_t
variable) will not obscure the
pointer value because conservative GC is independent of type, and the
in-memory representation of a uintptr_t
is the same as the in-memory
representation of the pointer.
- 3 (the pointer flag) will not obscure the pointer value because our
conservative GC looks for pointers into objects, so if the flag is
set, the object will still be found. Also, since malloced pointers are
maximally aligned, we will never have a returned pointer from malloc
with the lower bit set.
- 5 (the split pointer) will not obscure the pointer value because x86-64 is
little-endian. The in-memory representation of value
will be the
same as the in-memory representation of p
.
4. Bitwise (6 points)
QUESTION 4A. Consider this C fragment:
uintptr_t x = ...;
uintptr_t r = 0;
if (a < b) {
r = x;
}
Or, shorter:
uintptr_t r = a < b ? x : 0;
Write a single expression that evaluates to the same value, but that
does not use the conditional ?: operator. You will use the fact that
a \< b
always equals 0 or 1. For full credit, do not use expensive
operators (multiply, divide, modulus).
For partial credit, (a \< b) \* x
works. For full credit:
-(uintptr_t) (a < b) & x
because -(uintptr_t) (a \< b)
equals (uintptr_t) 0b111111…1
iff
a \< b
.
QUESTION 4B. This function returns one more than the index of the least-significant 1 bit in its argument, or 0 if its argument is zero.
int ffs(unsigned long x) {
for (int i = 0; i < sizeof(x) * CHAR_BIT; ++i) {
if (x & (1UL << i)) {
return i + 1;
}
}
return 0;
}
This function runs in O(B) time, where B is the number of bits in
an unsigned long
. Write a version of ffs
that runs instead in
O(log B) time.
The idea we were looking for was binary search over the bits of x
.
Here’s a written-out version:
int ffs(unsigned long x) {
if (!x) {
return 0;
}
int ans = 1;
if (x & 0xFFFFFFFF00000000UL) {
ans += 32; x >>= 32;
}
if (x & 0xFFFF0000) {
ans += 16; x >>= 16;
}
if (x & 0xFF00) {
ans += 8; x >>= 8;
}
if (x & 0xF0) {
ans += 4; x >>= 4;
}
if (x & 0xC) {
ans += 2; x >>= 2;
}
return ans + (x & 0x2 ? 1 : 0);
}
We saw many similar answers.
5. Golden Baron (12 points)
A very rich person has designed a new x86-64-based computer, the Golden Baron Supercomputer 9000, and is paying you handsomely to write a C compiler for it. There’s just one problem. This person, like many very rich people, is dumb, and on their computer, odd-numbered memory addresses don’t work for data. When data is loaded into a general-purpose register from an odd-numbered address, the value read is zero. For example, consider the following instructions:
movl $0x01020304, a(%rip)
movl a(%rip), %eax
(where the address of a
is even). Executed on true x86-64, %rax
will hold the value 0x01020304; on Golden Baron, %rax
will hold
0x00020004.
But it is still possible to write a correct C compiler for this ungodly
hardware—you just have to work around the bad memory with code. You plan
to use two bytes of Golden Baron memory for every one byte of normal
x86-64 memory. For instance, an array int a[2] = {1, 0x0a0b0c0d};
would be stored in 16 bytes of memory, like so:
01 00 00 00 00 00 00 00 0d 00 0c 00 0b 00 0a 00
Pointer arithmetic, and moving multi-byte values to and from registers,
must account for the zero bytes that alternate with meaningful bytes. So
to read the correct value for a[2]
, the compiler must arrange to
read the bytes at addresses A+8
, A+10
, A+12
, and A+14
,
where A
is the address of the first byte in a
.
QUESTION 5A. What should printf("%zu\n", sizeof(char))
print on
Golden Baron?
1. The answer to this question is determined by the C abstract machine,
which requires that sizeof(char) == (size_t) 1
on every machine.
QUESTION 5B. This function
int f(signed char* c, size_t i) {
return c[i];
}
can compile to two instructions on x86-64, including retq
. It can
also compile to two instructions on Golden Baron. (We’re assuming that
memory used for Golden Baron instructions works normally.) What are
those instructions?
movsbl (%rdi,%rsi,2), %eax
retq
There are other possibilities too, such as movsbq
.
QUESTION 5C. This function
int g(int* a, size_t i) {
return a[i];
}
can compile to two instructions on x86-64, but Golden Baron requires more work. Write the Golden Baron translation of this function in x86-64 assembly language. For partial credit, write C code that, executed on x86-64, would return the correct value from a Golden Baron-formatted array.
movzbl (%rdi,%rsi,8), %eax
movzbl 2(%rdi,%rsi,8), %ecx
shll $8, %ecx
orl %ecx, %eax
movzbl 4(%rdi,%rsi,8), %ecx
shlq $16, %ecx
orl %ecx, %eax
movzbl 6(%rdi,%rsi,8), %ecx
shlq $24, %ecx
orl %ecx, %eax
retq
Or:
movq (%rdi,%rsi,8), %rcx
movq %rcx, %rax
andq $255, %rax
shrq $8, %rcx
movq %rcx, %rdx
andq $0xff00, %rdx
orl %edx, %eax
shrq $8, %rcx
movq %rcx, %rdx
andq $0xff0000, %rdx
orl %edx, %eax
shrq $8, %rcx
movq %rcx, %rdx
andq $0xff000000, %rdx
orl %edx, %eax
retq
QUESTION 5D. The Golden Baron’s x86-64 processor actually supports a
secret instruction, swizzleq SRC, REG
, which rearranges the nybbles
(the hexadecimal digits—the aligned 4-bit slices) of the destination
register REG
based on the source argument SRC
. Here’s some
examples. Assuming that %rax
holds the value 0x0123456789ABCDEF
,
the following swizzleq
instructions leave the indicated results in
%rax
:
swizzleq \$0, %rax
->0xFFFFFFFFFFFFFFFF
(the contents of nybble 0 [bits 0-3, inclusive], are repeated into every nybble)swizzleq \$0xFEDCBA9876543210, %rax
->0x0123456789ABCDEF
(each nybble is mapped to its current value: nybble 0 [bits 0-3] is placed in nybble 0 [bits 0-3], nybble 1 in nybble 1, and so forth)swizzleq \$0x0123456701234567, %rax
->0xFEDCBA98FEDCBA98
(nybble 0 [bits 0-3] is placed in nybbles 7 and 15 [bits 28-31 and 60-63]; nybble 1 [bits 4-7] is placed in nybbles 6 and 14 [bits 24-27 and 56-59]; etc.)swizzleq \$0xEFCDAB8967452301, %rax
->0x1032547698BADCFE
(the nybbles of each byte are exchanged)
Use swizzleq
to shorten your answer for Part C.
movzbq (%rdi,%rsi,8), %rax
swizzleq $0x22222222dc985410, %rax
retq
6. Instruction behavior (10 points)
QUESTION 6A. Name three different x86-64 instructions that always
modify the stack pointer, no matter their arguments (instruction names
only; suffixes don’t count, so movl
and movq
are the same
instruction name).
push, pop, call, ret
QUESTION 6B. Name three different x86-64 instructions that sometimes modify the stack pointer, depending on their arguments.
mov, add, sub, or, and, …
QUESTION 6C. Name three different x86-64 instructions that never modify the stack pointer, no matter their arguments.
jmp, jne, je, jWHATEVER, cmp, test, nop, many others
QUESTION 6D. List three different instructions, including
arguments, that if placed immediately before a retq
instruction
that ends a function, will never change the function’s behavior. The
instructions should have different names. No funny business: assume the
function was compiled from valid C, that relative jumps are fixed up,
and that, for example, it doesn’t access its own code.
Many examples:
- ret :)
- jmp [next instruction]
- test (any register), (any register)
- cmp (any register), (any register)
- nop
- movs & arithmetic instructions that involve caller-saved registers
other than
%rax
- arithmetic instructions that do not modify
%rax
, such asaddq $0, %rax
7. Calling convention (10 points)
The following questions concern valid C functions compiled using the normal x86-64 calling convention. True or false?
QUESTION 7A. If the function’s instructions do not save and restore any registers, then the C function did not call any other function.
False for two reasons. (1) If this function doesn't use any callee-saved registers, it doesn't need to explicitly save & restore anything. (2) Tail call elimination.
QUESTION 7B. If the function’s instructions do not change the stack
pointer, then the function’s instructions do not contain a call
instruction.
True because of stack alignment.
QUESTION 7C. If the function’s instructions do not change the stack pointer, then the C function did not call any other function. Explain your answer briefly.
False because of tail call elimination.
QUESTION 7D. If the function’s instructions do not modify the %rax
register, then the C function must return void
.
False; the function could return the result of calling another function.
QUESTION 7E. If the function’s instructions store a local variable on the stack, then that variable’s address will be less than the function’s initial stack pointer.
True.
8. I/O traces (10 points)
QUESTION 8A. Which of the following programs cannot be
distinguished by the output of the strace
utility? List all that
apply; if multiple indistinguishable groups exist (e.g., A, B, & C can’t
be distinguished, and D & E can’t be distinguished, but the groups can
be distinguished from each other), list them all.
- Sequential byte writes using stdio
- Sequential byte writes using system calls
- Sequential byte writes using system calls and
O_SYNC
- Sequential block writes using stdio and block size 2
- Sequential block writes using system calls and block size 2
- Sequential block writes using system calls and
O_SYNC
and block size 2 - Sequential block writes using stdio and block size 4096
- Sequential block writes using system calls and block size 4096
- Sequential block writes using system calls and
O_SYNC
and block size 4096
1, 4, 7, 8 can’t be distinguished.
If you don’t consider the open
system call, which lets you distinguish
between the O_SYNC
cases and the others, 2&3 can’t be distinguished,
4&5 can’t be distinguished, and 1,4,7,8,9 can’t be distinguished.
QUESTION 8B. Which of the programs in Part A cannot be distinguished
using blktrace
output? List all that apply.
1, 2, 4, 5, 7, 8 can’t be distinguished. 9 was also considered correct.
QUESTION 8C. The buffer cache is coherent. Which of the following operating system changes could make the buffer cache incoherent? List all that apply.
- Application programs can obtain direct read access to the buffer cache
- Application programs can obtain direct write access to the disk, bypassing the buffer cache
- Other computers can communicate with the disk independently
- The computer has a uninterruptible power supply (UPS), ensuring that the operating system can write the contents of the buffer cache to disk if main power is lost
2, 3.
QUESTION 8D. The stdio cache is incoherent. Which of the operating system changes from Part C could make the stdio cache coherent? List all that apply.
1 only.
9. Reference strings and eviction (10 points)
QUESTION 9A. When demonstrating cache eviction in class, we modeled a completely reactive cache, meaning that the cache performed at most one load from slow storage per access. Name a class of reference string that will have a 0% hit rate on any cold reactive cache. For partial credit, give several examples of such reference strings.
Sequential access.
QUESTION 9B. What cache optimization can be used to improve the hit rate for the class of reference string in Part A? One word is enough; put the best choice.
Prefetching. Batching is an OK answer, but mostly because it involves prefetching when done for reads.
QUESTION 9C. Give a single reference string with the following properties:
- There exists a cache size and eviction policy that gives a 70% hit rate for the string.
- There exists a cache size and eviction policy that gives a 0% hit rate for the string.
Example: 1231231231. 70% on any 3-slot cache; 0% on a 1-slot cache.
QUESTION 9D. Put the following eviction algorithms in order of how much space they require for per-slot metadata, starting with the least space and ending with the most space. (Assume the slot order is fixed, so once a block is loaded into slot i, it stays in slot i until it is evicted.) For partial credit say what you think the metadata would be.
- FIFO
- LRU
- Random
Random, FIFO, LRU. Random needs no additional metadata; FIFO can deal with a single integer for the whole cache, pointing to the next index to use; LRU nees a least-recently-used time.
10. Cache code (12 points)
Several famous musicians have just started working on CS61 Problem Set 3. They share the following code for their read-only, sequential, single-slot cache:
struct io61_file {
int fd;
unsigned char buf[4096];
` size_t pos; // position of next character to read in `buf` `
` size_t sz; // number of valid characters in `buf` `
};
int io61_readc(io61_file* f) {
if (f->pos >= f->sz) {
f->pos = f->sz = 0;
ssize_t nr = read(f->fd, f->buf, sizeof(f->buf));
if (nr <= 0) {
f->sz = 0;
return -1;
} else {
f->sz = nr;
}
}
int ch = f->buf[f->pos];
++f->pos;
return ch;
}
But they have different io61_read
implementations. Donald
(Lambert)’s is:
ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
return read(f->fd, buf, sz);
}
Solange (Knowles)’s is:
ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
for (size_t pos = 0; pos < sz; ++pos, ++buf) {
*buf = io61_readc(f);
}
return sz;
}
Caroline (Shaw)’s is:
ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
if (f->pos >= f->sz) {
return read(f->fd, buf, sz);
} else {
int ch = io61_readc(f);
if (ch < 0) {
return 0;
}
*buf = ch;
return io61_read(f, buf + 1, sz - 1) + 1;
}
}
You are testing each of these musicians’ codes by executing a sequence
of io61_readc
and/or io61_read
calls on an input file and
printing the resulting characters to standard output. There are no
seeks, and your test programs print until end of file, so your tests’
output should equal the input file’s contents.
You should assume for these questions that no read
system call
ever returns -1.
QUESTION 10A. Describe an access pattern—that is, a sequence of
io61_readc
and/or io61_read
calls (with lengths)—for which
Donald’s code can return incorrect data.
io61_readc`, `io61_read(1)`, …. Any alternation between `readc
and read
is a disaster, because data cached by io61_readc
should
be returned by io61_read
, but will not be.
QUESTION 10B. Which of these musicians’ codes can generate an output file with incorrect length?
Solange’s code never returns EOF from io61_read
, so it will read too
many bytes.
If a program using Donald’s code calls io61_readc
once and then
switches to io61_read
, then they will read too few bytes (bytes in
the readc
buffer won’t be returned).
Caroline’s code is correct!
For the remaining parts, assume the problem in Part B has been corrected, so that all musicians’ codes generate output files with correct lengths.
QUESTION 10C. Give an access pattern for which Solange’s code will return correct data and outperform Donald’s, or vice versa, and say whose code will win.
Solange’s code will outperform Donald’s when io61_read
is called
with small sizes. Donald’s code will outperform Solange’s when
io61_read
is called with large sizes.
QUESTION 10D. Suggest a small change (≤10 characters) to Caroline’s code that would, most likely, make it perform at least as well as both Solange’s and Donald’s codes on all access patterns. Explain briefly.
I would change Caroline’s test from if (f-\>pos \>= f-\>sz)
to if (f-\>pos \>= f-\>sz && sz \>= 1024)
(or 4096, or something similar).
This uses the cache when read
is called with small sizes, but avoids
the extra copy when read
is called with large sizes.
11. Feedback (2 points)
QUESTION 11A. What aspects of the course are working well for you? Be brief; any answer but no answer will receive credit.
QUESTION 11B. What aspects of the course are not working well for you? Be brief; any answer but no answer will receive credit.