2017 Midterm
General overview
Assume a CS61 VM running the x86-64 architecture unless otherwise stated. If you get stuck, move on to the next question. If you’re confused, explain your thinking briefly for potential partial credit.
1. Sizes and alignments (10 points)
Here’s a test struct with n members. Assume an x86-64 machine, where
each Ti
either is a basic x86-64 type (e.g., int
, char
,
double
) or is a type derived from such types (e.g., arrays, structs,
pointers, unions, possibly recursively), and assume that ai≤8 for all
i.
struct test {
T1 m1; // sizeof(T1) == s1, __alignof__(T1) == a1
T2 m2; // sizeof(T2) == s2, __alignof__(T2) == a2
...
Tn mn; // sizeof(Tn) == sn, __alignof__(Tn) == an
};
In these questions, you will compare this struct with other structs that have the same members, but in other orders.
QUESTION 1A. True or false: The size of struct test
is minimized
when its members are sorted by size. In other words, if
s1≤s2≤…≤sn, then sizeof(struct test)
is less than or equal to
the struct size for any other member order.
If true, briefly explain your answer; if false, give a counterexample
(i.e., concrete types for T1
, …, Tn
that do not minimize
sizeof(struct test)
).
QUESTION 1B. True or false: The size of struct test
is minimized
when its members are sorted by alignment. In other words, if
a1≤a2≤…≤an, then sizeof(struct test)
is less than or equal to
the struct size for any other member order.
If true, briefly explain your answer; if false, give a counterexample.
QUESTION 1C. True or false: The alignment of struct test
is
minimized when its members are sorted in increasing order by alignment.
In other words, if a1≤a2≤…≤an, then
__alignof__(struct test)
is less than or equal to the struct
alignment for any other member order.
If true, briefly explain your answer; if false, give a counterexample.
QUESTION 1D. What is the maximum number of bytes of padding that
struct test
could contain for a given n? The answer will be a pretty
simple formula involving n. (Remember that ai≤8 for all i.)
QUESTION 1E. What is the minimum number of bytes of padding that
struct test
could contain for a given n?
2. Undefined behavior (10 points)
QUESTION 2A. Sometimes a conforming C compiler can assume that a + 1 \> a
, and sometimes it can’t. For each type below, consider this
expression:
a + (int) 1 > a
and say whether the compiler:
- Must reject the expression as a type error.
- May assume that the expression is true (that
a + (int) 1 \> a
for alla
). - Must not assume that the expression is true.
int a
unsigned a
char\* a
unsigned char a
struct {int m;} a
QUESTION 2B. The following code checks its arguments for sanity, but not well: each check can cause undefined behavior.
void sanity_check(int* array, size_t array_size, int* ptr_into_array) {
if (array + array_size < array) {
` fprintf(stderr, "`array` is so big that it wraps around!\n"); `
abort();
}
if (ptr_into_array < array || ptr_into_array > array + array_size) {
` fprintf(stderr, "`ptr_into_array` doesn’t point into the array!\n"); `
abort();
}
...
Rewrite these checks to avoid all undefined behavior. You will likely
add one or more casts to uintptr_t
. For full credit, write each
check as a single comparison (no &&
or ||
, even though the current
ptr_into_array
check uses ||
).
array_size check: |
ptr_into_array check: |
QUESTION 2C. In Lecture datarep4, we discussed several ways to tell
if a signed integer x
is negative. One of them was the following:
int isnegative = (x & (1UL << (sizeof(x) * CHAR_BIT))) != 0;
But this is incorrect: it has undefined behavior. Correct it by adding two characters.
3. Memory errors and garbage collection (10 points)
Recall that a conservative garbage collector is a program that can automatically free dynamically-allocated memory by detecting when that memory is no longer referenced. Such a GC works by scanning memory for currently-referenced pointers, starting from stack and global memory, and recursing over each referenced object until all referenced memory has been scanned. We built a conservative garbage collector in lecture datarep6.
QUESTION 3A. An application program that uses conservative GC, and
does not call free
directly, will avoid certain errors and undefined
behaviors. Which of the following errors are avoided? List all that
apply.
- Use-after-free
- Double free
- Signed integer overflow
- Boundary write error
- Unaligned access
QUESTION 3B. Write a C program that leaks unbounded memory without GC, but does not do so with GC. You should need less than 5 lines. (Leaking “unbounded” memory means the program will exhaust the memory capacity of any machine on which it runs.)
QUESTION 3C. Not every valid C program works with a conservative GC, because the C abstract machine allows a program to manipulate pointers in strange ways. Which of the following pointer manipulations might cause the conservative GC from class to inappropriately free a memory allocation? List all that apply.
-
Storing the pointer in a
uintptr_t
variable -
Writing the pointer to a disk file and reading it back later
-
Using the least-significant bit of the pointer to store a flag: int* set_ptrflag(int* x, int flagval) { return (int*) ((uintptr_t) x | (flagval ? 1 : 0)); } int get_ptrflag(int* x) { return (uintptr_t) x & 1; } int deref_ptrflag(int* x) { return ((int) ((uintptr_t) x & ~1UL)); }
-
Storing the pointer in textual form: void save_ptr(char buf[100], void* p) { sprintf(buf, "%p", p); } void* restore_ptr(const char buf[100]) { void* p; sscanf(buf, "%p", &p); return p; }
(…option 5 on next page…)
-
Splitting the pointer into two parts and storing the parts in an array: typedef union { unsigned long ival; unsigned arr[2]; } value; value save_ptr(void* p) { value v; v.arr[0] = (uintptr_t) p & 0xFFFFFFFFUL; v.arr[1] = ((uintptr_t) p / 4294967296UL) & 0xFFFFFFFFUL; return v; }
4. Bitwise (6 points)
QUESTION 4A. Consider this C fragment:
uintptr_t x = ...;
uintptr_t r = 0;
if (a < b) {
r = x;
}
Or, shorter:
uintptr_t r = a < b ? x : 0;
Write a single expression that evaluates to the same value, but that
does not use the conditional ?: operator. You will use the fact that
a \< b
always equals 0 or 1. For full credit, do not use expensive
operators (multiply, divide, modulus).
QUESTION 4B. This function returns one more than the index of the least-significant 1 bit in its argument, or 0 if its argument is zero.
int ffs(unsigned long x) {
for (int i = 0; i < sizeof(x) * CHAR_BIT; ++i) {
if (x & (1UL << i)) {
return i + 1;
}
}
return 0;
}
This function runs in O(B) time, where B is the number of bits in
an unsigned long
. Write a version of ffs
that runs instead in
O(log B) time.
5. Golden Baron (12 points)
A very rich person has designed a new x86-64-based computer, the Golden Baron Supercomputer 9000, and is paying you handsomely to write a C compiler for it. There’s just one problem. This person, like many very rich people, is dumb, and on their computer, odd-numbered memory addresses don’t work for data. When data is loaded into a general-purpose register from an odd-numbered address, the value read is zero. For example, consider the following instructions:
movl $0x01020304, a(%rip)
movl a(%rip), %eax
(where the address of a
is even). Executed on true x86-64, %rax
will hold the value 0x01020304; on Golden Baron, %rax
will hold
0x00020004.
But it is still possible to write a correct C compiler for this ungodly
hardware—you just have to work around the bad memory with code. You plan
to use two bytes of Golden Baron memory for every one byte of normal
x86-64 memory. For instance, an array int a[2] = {1, 0x0a0b0c0d};
would be stored in 16 bytes of memory, like so:
01 00 00 00 00 00 00 00 0d 00 0c 00 0b 00 0a 00
Pointer arithmetic, and moving multi-byte values to and from registers,
must account for the zero bytes that alternate with meaningful bytes. So
to read the correct value for a[2]
, the compiler must arrange to
read the bytes at addresses A+8
, A+10
, A+12
, and A+14
,
where A
is the address of the first byte in a
.
QUESTION 5A. What should printf("%zu\n", sizeof(char))
print on
Golden Baron?
QUESTION 5B. This function
int f(signed char* c, size_t i) {
return c[i];
}
can compile to two instructions on x86-64, including retq
. It can
also compile to two instructions on Golden Baron. (We’re assuming that
memory used for Golden Baron instructions works normally.) What are
those instructions?
QUESTION 5C. This function
int g(int* a, size_t i) {
return a[i];
}
can compile to two instructions on x86-64, but Golden Baron requires more work. Write the Golden Baron translation of this function in x86-64 assembly language. For partial credit, write C code that, executed on x86-64, would return the correct value from a Golden Baron-formatted array.
QUESTION 5D. The Golden Baron’s x86-64 processor actually supports a
secret instruction, swizzleq SRC, REG
, which rearranges the nybbles
(the hexadecimal digits—the aligned 4-bit slices) of the destination
register REG
based on the source argument SRC
. Here’s some
examples. Assuming that %rax
holds the value 0x0123456789ABCDEF
,
the following swizzleq
instructions leave the indicated results in
%rax
:
swizzleq \$0, %rax
->0xFFFFFFFFFFFFFFFF
(the contents of nybble 0 [bits 0-3, inclusive], are repeated into every nybble)swizzleq \$0xFEDCBA9876543210, %rax
->0x0123456789ABCDEF
(each nybble is mapped to its current value: nybble 0 [bits 0-3] is placed in nybble 0 [bits 0-3], nybble 1 in nybble 1, and so forth)swizzleq \$0x0123456701234567, %rax
->0xFEDCBA98FEDCBA98
(nybble 0 [bits 0-3] is placed in nybbles 7 and 15 [bits 28-31 and 60-63]; nybble 1 [bits 4-7] is placed in nybbles 6 and 14 [bits 24-27 and 56-59]; etc.)swizzleq \$0xEFCDAB8967452301, %rax
->0x1032547698BADCFE
(the nybbles of each byte are exchanged)
Use swizzleq
to shorten your answer for Part C.
6. Instruction behavior (10 points)
QUESTION 6A. Name three different x86-64 instructions that always
modify the stack pointer, no matter their arguments (instruction names
only; suffixes don’t count, so movl
and movq
are the same
instruction name).
QUESTION 6B. Name three different x86-64 instructions that sometimes modify the stack pointer, depending on their arguments.
QUESTION 6C. Name three different x86-64 instructions that never modify the stack pointer, no matter their arguments.
QUESTION 6D. List three different instructions, including
arguments, that if placed immediately before a retq
instruction
that ends a function, will never change the function’s behavior. The
instructions should have different names. No funny business: assume the
function was compiled from valid C, that relative jumps are fixed up,
and that, for example, it doesn’t access its own code.
7. Calling convention (10 points)
The following questions concern valid C functions compiled using the normal x86-64 calling convention. True or false?
QUESTION 7A. If the function’s instructions do not save and restore any registers, then the C function did not call any other function.
QUESTION 7B. If the function’s instructions do not change the stack
pointer, then the function’s instructions do not contain a call
instruction.
QUESTION 7C. If the function’s instructions do not change the stack pointer, then the C function did not call any other function. Explain your answer briefly.
QUESTION 7D. If the function’s instructions do not modify the %rax
register, then the C function must return void
.
QUESTION 7E. If the function’s instructions store a local variable on the stack, then that variable’s address will be less than the function’s initial stack pointer.
8. I/O traces (10 points)
QUESTION 8A. Which of the following programs cannot be
distinguished by the output of the strace
utility? List all that
apply; if multiple indistinguishable groups exist (e.g., A, B, & C can’t
be distinguished, and D & E can’t be distinguished, but the groups can
be distinguished from each other), list them all.
- Sequential byte writes using stdio
- Sequential byte writes using system calls
- Sequential byte writes using system calls and
O_SYNC
- Sequential block writes using stdio and block size 2
- Sequential block writes using system calls and block size 2
- Sequential block writes using system calls and
O_SYNC
and block size 2 - Sequential block writes using stdio and block size 4096
- Sequential block writes using system calls and block size 4096
- Sequential block writes using system calls and
O_SYNC
and block size 4096
QUESTION 8B. Which of the programs in Part A cannot be distinguished
using blktrace
output? List all that apply.
QUESTION 8C. The buffer cache is coherent. Which of the following operating system changes could make the buffer cache incoherent? List all that apply.
- Application programs can obtain direct read access to the buffer cache
- Application programs can obtain direct write access to the disk, bypassing the buffer cache
- Other computers can communicate with the disk independently
- The computer has a uninterruptible power supply (UPS), ensuring that the operating system can write the contents of the buffer cache to disk if main power is lost
QUESTION 8D. The stdio cache is incoherent. Which of the operating system changes from Part C could make the stdio cache coherent? List all that apply.
9. Reference strings and eviction (10 points)
QUESTION 9A. When demonstrating cache eviction in class, we modeled a completely reactive cache, meaning that the cache performed at most one load from slow storage per access. Name a class of reference string that will have a 0% hit rate on any cold reactive cache. For partial credit, give several examples of such reference strings.
QUESTION 9B. What cache optimization can be used to improve the hit rate for the class of reference string in Part A? One word is enough; put the best choice.
QUESTION 9C. Give a single reference string with the following properties:
- There exists a cache size and eviction policy that gives a 70% hit rate for the string.
- There exists a cache size and eviction policy that gives a 0% hit rate for the string.
QUESTION 9D. Put the following eviction algorithms in order of how much space they require for per-slot metadata, starting with the least space and ending with the most space. (Assume the slot order is fixed, so once a block is loaded into slot i, it stays in slot i until it is evicted.) For partial credit say what you think the metadata would be.
- FIFO
- LRU
- Random
10. Cache code (12 points)
Several famous musicians have just started working on CS61 Problem Set 3. They share the following code for their read-only, sequential, single-slot cache:
struct io61_file {
int fd;
unsigned char buf[4096];
` size_t pos; // position of next character to read in `buf` `
` size_t sz; // number of valid characters in `buf` `
};
int io61_readc(io61_file* f) {
if (f->pos >= f->sz) {
f->pos = f->sz = 0;
ssize_t nr = read(f->fd, f->buf, sizeof(f->buf));
if (nr <= 0) {
f->sz = 0;
return -1;
} else {
f->sz = nr;
}
}
int ch = f->buf[f->pos];
++f->pos;
return ch;
}
But they have different io61_read
implementations. Donald
(Lambert)’s is:
ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
return read(f->fd, buf, sz);
}
Solange (Knowles)’s is:
ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
for (size_t pos = 0; pos < sz; ++pos, ++buf) {
*buf = io61_readc(f);
}
return sz;
}
Caroline (Shaw)’s is:
ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
if (f->pos >= f->sz) {
return read(f->fd, buf, sz);
} else {
int ch = io61_readc(f);
if (ch < 0) {
return 0;
}
*buf = ch;
return io61_read(f, buf + 1, sz - 1) + 1;
}
}
You are testing each of these musicians’ codes by executing a sequence
of io61_readc
and/or io61_read
calls on an input file and
printing the resulting characters to standard output. There are no
seeks, and your test programs print until end of file, so your tests’
output should equal the input file’s contents.
You should assume for these questions that no read
system call
ever returns -1.
QUESTION 10A. Describe an access pattern—that is, a sequence of
io61_readc
and/or io61_read
calls (with lengths)—for which
Donald’s code can return incorrect data.
QUESTION 10B. Which of these musicians’ codes can generate an output file with incorrect length?
For the remaining parts, assume the problem in Part B has been corrected, so that all musicians’ codes generate output files with correct lengths.
QUESTION 10C. Give an access pattern for which Solange’s code will return correct data and outperform Donald’s, or vice versa, and say whose code will win.
QUESTION 10D. Suggest a small change (≤10 characters) to Caroline’s code that would, most likely, make it perform at least as well as both Solange’s and Donald’s codes on all access patterns. Explain briefly.
11. Feedback (2 points)
QUESTION 11A. What aspects of the course are working well for you? Be brief; any answer but no answer will receive credit.
QUESTION 11B. What aspects of the course are not working well for you? Be brief; any answer but no answer will receive credit.