Midterm Solutions

0. Ground rules

Assume a 32-bit x86 architecture unless explicitly told otherwise.

Write your assumptions if a problem seems unclear, and write down your reasoning for partial credit.

1. Git (15 points)

Edward Snowden is working on a CS61 problem set and he has some git questions.

QUESTION 1A. The CS61 staff has released some new code. Which commands will help Edward get the code from code.seas.harvard.edu into his repository? Circle all that apply.

git commit
git add
git push
git pull

5pts; 4pts for #4 plus others; 3pts for #3 (assume brainfart)

QUESTION 1B. Edward has made some changes to his code. He hasn’t run git since making the changes. He wants to upload his latest version to code.seas.harvard.edu. Put the following git commands in an order that will accomplish this goal. You won’t necessarily use every command. You may add flags to a command (but you don’t have to). If you add flags, tell us what they are.

git commit
git add
git push
git pull

#2, #1, #3

OR: #1 -a, #3

5pts; 4pts if they say pull instead of push (brainfart), or they commit without adding, or they reorder add & commit; 3pts for 2 of those mistakes at once

Edward Snowden’s partner, Edward Norton, has been working on the problem set also. They’ve been working independently.

At midnight on October 10, here’s how things stood. The git log for the partners’ shared code.seas.harvard.edu repository looked like this. The committer is listed in (parentheses).

 52d44ee Pset release. (kohler)

The git log for Snowden’s local repository:

 3246d07 Save Greenwald's phone number (snowden)
 8633fbd Start work on a direct-mapped cache (snowden)
 52d44ee Pset release. (kohler)

The git log for Norton’s local repository:

 81f952e try mmap (norton)
 52d44ee Pset release. (kohler)

At noon on October 11, their shared code.seas.harvard.edu repository has this log:

 d446e60 Increase cache size (snowden)
 b677e85 use mmap on mmappable files (norton)
 b46cfda Merge branch 'master' of code.seas.harvard.edu:~TheTrueHOOHA/cs61/TheTrueHOOHAs-cs61-psets.git 
         (norton)
 81f952e try mmap (norton)
 3246d07 Save Greenwald's phone number (snowden)
 8633fbd Start work on a direct-mapped cache (snowden)
 52d44ee Pset release. (kohler)

QUESTION 1C. Give an order for these commands that could have produced that log starting from the midnight October 10 state. You might not use every command, and you might use some commands more than once. Sample (incorrect) answer: “1 4 4 5 2.”

snowden: git commit -a
snowden: git push
snowden: git pull
norton: git commit -a
norton: git push
norton: git pull

#2 (snowden push)
[#5 (norton push—OPTIONAL; this push would fail)]
#6 (norton pull) (We know that Snowden pushed first, and Norton pulled before pushing, because Norton committed the merge) [CIRCLE FOR 1D]
[#4 (norton commit—OPTIONAL for the merge commit; the merge commit will happen automatically if there are no conflicts] [ALLOW CIRCLE FOR 1D]
#4 (norton commit for b677e85)
#5 (norton push)
#3 (snowden pull—snowden pulls before committing because there is no merge)
#1 (snowden commit for d446e60)
#2 (snowden push)

QUESTION 1D. In your answer to Question 1C, circle the step(s) where there might have been a merge conflict.

See above

2. Debugging (10 points)

QUESTION 2A. Match each tool or technique with a debugging situation for which it is well suited. Produce the best overall match that uses each situation exactly once.


1. strace	A. Investigating segmentation faults
2. gdb	B. Finding memory leaks
3. valgrind --tool=memcheck	C. Checking your assumptions and verifying invariants
4. printf statements	D. Discovering I/O patterns
5. assert	E. Displaying program state

1—D, 2—A, 3—B, 4—E, 5—C

3. Sizes and alignment (15 points)

Let alignof(T) equal the alignment of type T.

QUESTION 3A. Assume that structure struct Y { ... } contains K char members and M int members, with K≤M, and nothing else. Write an expression defining the maximum sizeof(struct Y).

4M + 4K

QUESTION 3B. You are given a structure struct Z { T1 a; T2 b; T3 c; } that contains no padding. What does (sizeof(T1) + sizeof(T2) + sizeof(T3)) % alignof(struct Z) equal?

QUESTION 3C. Arrange the following types in increasing order by size. Sample answer: “1 < 2 = 4 < 3” (choose this if #1 has smaller size than #2, which has equal size to #4, which has smaller size than #3).

char
struct minipoint { uint8_t x; uint8_t y; uint8_t z; }
int
unsigned short[1]
char**
double[0]

#6 < #1 < #4 < #2 < #3 = #5

sizeof(x[0]) is actually 0!

4. Memory layout and garbage collection (15 points)

The following program makes several memory allocations and then calls a version of our m61_gc conservative garbage collector from Lecture 8. We recommend that before going further you draw a quick picture of what points to what.

 int*** a;
 int* b;
 
 void f(void);
 
 int main(int argc, char** argv) {
     extern char* stack_bottom; stack_bottom = (char*) &argc;
     
     int** x = (int**) m61_malloc(sizeof(int*) * 4);   // Allocation 0
     for (int i = 0; i < 4; ++i) {
         x[i] = (int*) m61_malloc(sizeof(int));   // Allocations 1 through 4
         *x[i] = i;
     }
     a = &x;
 
     b = (int*) m61_malloc(sizeof(int));   // Allocation 5
     *b = 61;    
     x[0] = b;
     
     f();
 
     printf("%d\n", *x[0]);   // see Question 4A
 }
 
 typedef struct container {
     uint16_t y[4];
     int* z;
 } container;
 
 void f(void) {
     container c;
 
     unsigned v1 = (unsigned) (*a)[2];
     unsigned v2 = (unsigned) (*a)[3] + 3;
     c.y[0] = v1 / 0x10000;
     c.y[1] = v1 % 0x10000;
     c.y[2] = v2 % 0x10000;
     c.y[3] = v2 / 0x10000;
     v1 = v2 = 0;
 
     c.z = (int*) m61_malloc(sizeof(int));    // Allocation 6
  
     m61_gc();
 }

QUESTION 4A. The printf statement in main prints the integer value stored in Allocation 5. Write three syntactically different C expressions that would print the same value if substituted for *x[0]. Specifically, write:

One expression using a
One expression using x (but not “*x[0]”)
One expression using another variable

***a (or a[0][0][0], etc.)

**x (or (&*x[0])[0], etc.)

*b

QUESTION 4B. Which of the following expressions has a value that equals the address of some dynamically-allocated heap object? Circle all that apply.

a
b
x
stack_bottom
c.y[0]
&c.z[0]
None of the above

#2, #3, #6

QUESTION 4C. Which allocations are garbage (i.e., unreferenced) when m61_gc() runs? Identify them by allocation number.

Allocation 1 only

Recall that m61_gc() is a mark-sweep collector; its mark phase recursively marks reachable heap pointers, starting from roots in global variables and on the stack. m61_gc() looks like this:

 void m61_gc(void) {
     char* stack_top = (char*) &stack_top;  // current stack position
     extern char data_start[], _end[];   // boundaries of the data+bss segments (read/write globals)
 
     // The heap starts out unmarked
     for (size_t i = 0; i != nmr; ++i)
         mr[i].mark = 0;
 
     // Mark all objects reachable from globals
     m61_find_allocations(data_start, _end - data_start);
     // Mark all objects reachable from the stack
     m61_find_allocations(stack_top, stack_bottom - stack_top);
 
     // Free everything that wasn't marked
     for (size_t i = 0; i != nmr; ++i)
         if (!mr[i].mark) {
             m61_free(mr[i].ptr);
             --i;                // recheck slot (it has another object now)
         }
 }

This is just like the version from class, except it checks roots in global variables before checking roots in the stack.

The m61_find_allocations function is defined this way (exactly as we did in class):

 void m61_find_allocations(char* base, size_t sz) {
     if (sz < sizeof(void*))
         return;
     for (size_t i = 0; i <= sz - sizeof(void*); ++i) {
         char** aptr = (char**) &base[i];
         memregion* m = m61_mr_find(*aptr, 1);
           // m == NULL unless *aptr points into an active heap allocation
         if (m && !m->mark) {
             m->mark = 1;
             m61_find_allocations(m->ptr, m->sz);
         }
     }
 }

QUESTION 4D. Arrange the reachable allocations in the order they will be marked by m61_gc(). For instance, if Allocation 6 were marked first, then Allocation 5, and so forth, you would write “6 5 4 3 2 1 0.” Assume that the stack and globals contain no extraneous roots (in other words, they don’t contain any temporary variables or random garbage that look like pointers into the heap). Write down any assumptions you make, such as assumptions about the addresses of global variables.

First we look at global data. a is ignored since it points at an object on the stack. b is followed, leading to
- Allocation 5
Then the stack is followed. The stack is tracked top down, i.e., first f then main. In f there are 3 locals, c, v1, and v2. The 4 bytes at c.y[0]...c.y[1] equal a BIG-ENDIAN pointer, which won't be recognized. Other byte combinations will also likely not amount to much. Then c.y[2]...c.y[3] equal a LITTLE-ENDIAN version of a pointer into (*a)[3], that is, x[3], or
- Allocation 4
Next is c.z
- Allocation 6
Then comes main
- Allocation 0
- Allocation 2 (skip Allocation 5 because it's encountered already; skip Allocation 1 because it’s garbage)
- Allocation 3

5. I/O caching (20 points)

Mary Ruefle, a poet who lives in Vermont, is working on her caching I/O library for CS61’s problem set 2. She wants to implement a cache with N slots. Since searching those slots might slow down her library, she writes a function that maps addresses to slots. Here’s some of her code.

 #define SLOTSIZ 4096
 typedef struct io61_slot {
     char buf[SLOTSIZ];
     off_t pos; // = (off_t) -1 for empty slots
     ssize_t sz;
 } io61_slot;
 
 #define NSLOTS 64
 struct io61_file {
     int fd;
     off_t pos; // current file position
     io61_slot slots[NSLOTS];
 };
 
 static inline int find_slot(off_t off) {
     return off % NSLOTS;
 }
 
 int io61_readc(io61_file* f) {
     int slotindex = find_slot(f->pos);
     io61_slot* s = &f->slots[slotindex];
 
     if (f->pos < s->pos || f->pos >= s->pos + s->sz) {
         // slot contains wrong data, need to refill it
         off_t new_pos = lseek(f->fd, f->pos, SEEK_SET);
         assert(new_pos != (off_t) -1); // only handle seekable files for now
         ssize_t r = read(f->fd, s->buf, SLOTSIZ);
         if (r == -1 || r == 0)
             return EOF;
         s->pos = f->pos;
         s->sz = r;
     }
 
     int ch = (unsigned char) s->buf[f->pos - s->pos];
     ++f->pos;
     return ch;
 }

Before she can run and debug this code, Mary is led “to an emergency of feeling that ... results in a poem.” She’ll return to CS61 and fix her implementation soon, but in the meantime, let’s answer some questions about it.

QUESTION 5A. True or false: Mary’s cache is a direct-mapped cache.

True

QUESTION 5B. What changes to Mary’s code could change your answer to Question 5A? Circle all that apply.

New code for find_slot (keeping io61_readc the same)
New code in io61_readc (keeping find_slot the same)
New code in io61_readc and new code for find_slot
None of the above

#2 and #3. #1 is NOT a valid answer

QUESTION 5C. Which problems would occur when Mary’s code was used to sequentially read a seekable file of size 2MiB (2×2²⁰ = 2097152 bytes) one character at a time? Circle all that apply.

Excessive CPU usage (>10x stdio)
Many system calls to read data (>10x stdio)
Incorrect data (byte x read at a position where the file has byte y≠x)
Too much data
Too little data
Crash/undefined behavior
None of the above

#2 only

QUESTION 5D. Which of these new implementations for find_slot would fix at least one of these problems with reading sequential files? Circle all that apply.

return (off * 2654435761) % NSLOTS; /* integer hash function from Stack Overflow */
return (off / SLOTSIZ) % NSLOTS;
return off & (NSLOTS - 1);
return 0;
return (off >> 12) & 0x3F;
None of the above

#2, #4, #5

6. Processor cache (15 points)

The git version control system is based on commit hashes, which are 160-bit (20-byte) hash values used to identify commits. In this problem you’ll consider the processor cache behavior of several versions of a “grading server” that maps commits to grades. Here’s the first version:

 typedef struct commit_info {
     char hash[20];
     int grade[11];
 } commit_info;
 
 commit_info* commits;
 size_t N;
 
 int get_grade1(const char* hash, int pset) {
     for (size_t i = 0; i != N; ++i)
         if (memcmp(commits[i].hash, hash, 20) == 0)
             return commits[i].grade[pset];
     return -1;
 }

We will ask questions about the average number of cache lines accessed by variants of get_grade(hash, pset). You should make the following assumptions:

The hash argument is uniformly drawn from the set of known commits. That is, the probability that hash equals the ith commit’s hash is 1/N.
Only count cache lines accessible via commits. Don’t worry about cache lines used for local variables, for parameters, for global variables, or for instructions. For instance, do not count the hash argument or the global-data cache line that stores the commits variable itself.
Every object is 64-byte aligned, and no two objects share the same cache line.
Commit hashes are mathematically indistinguishable from random numbers. Thus, the probability that two different hashes have the same initial k bits equals 1/2^k.
Fully correct answers would involve ceiling functions, but you don’t need to include them.

QUESTION 6A. What is the expected number of cache lines accessed by get_grade1, in terms of N?

Each commit_info object is on its own cache line, and we will examine 1/2 of the objects on average, so the answer is ⌈N/2⌉. (Reminder: ceilings not required)

The second version:

 typedef struct commit_info {
     char hash[20];
     int grade[11];
 } commit_info;
 
 commit_info** commits;
 size_t N;
 
 int get_grade2(const char hash[20], int pset) {
     for (size_t i = 0; i != N; ++i)
         if (memcmp(commits[i]->hash, hash, 20) == 0)
             return commits[i]->grade[pset];
     return -1;
 }

QUESTION 6B. What is the expected number of cache lines accessed by get_grade2, in terms of N?

This still examines N/2 commit_info objects. But in addition it examines cache lines to evaluate the POINTERS to those objects. There are 16 such pointers per cache line (16×4=64), and we examine N/2 pointers, for N/16/2 = N/32 additional cache lines. Thus ⌈N/2⌉+⌈N/32⌉ ≅ 17N/32.

The third version:

 typedef struct commit_info {
     char hash[20];
     int grade[11];
 } commit_info;
 
 typedef struct commit_hint {
     char hint[4];
     commit_info* commit;
 } commit_hint;
 
 commit_hint* commits;
 size_t N;
 
 int get_grade3(const char* hash, int pset) {
     for (size_t i = 0; i != N; ++i)
         if (memcmp(commits[i].hint, hash, 4) == 0
             && memcmp(commits[i].commit->hash, hash, 20) == 0)
             return commits[i].commit->grade[pset];
     return -1;
 }

QUESTION 6C. What is the expected number of cache lines accessed by get_grade3, in terms of N? (You may assume that N≤2000.)

The assumption that N≤2000 means we’re exceedingly unlikely to encounter a hint collision (i.e. a commit with the same hint, but different commit value). That means that we will examine N/2 commit_hint objects but ONLY ONE commit_info object. commit_hint objects are 8B big, so 8 hints/cache line: we examine N/8/2 = N/16 cache lines for hint objects, plus one for the info. ⌈N/16⌉ + 1.

The fourth version is actually a hash table.

 typedef struct commit_info {
     char hash[20];
     int grade[11];
 } commit_info;
 
 commit_info** commits;
 size_t commits_hashsize;
 
 int get_grade4(const char* hash, int pset) {
     // choose initial bucket
     size_t bucket;
     memcpy(&bucket, hash, sizeof(bucket));
     bucket = bucket % commits_hashsize;
     // search for the commit starting from that bucket
     while (commits[bucket] != NULL) {
         if (memcmp(commits[bucket]->hash, hash, 20) == 0)
             return commits[bucket]->grade[pset];
         bucket = (bucket + 1) % commits_hashsize;
     }
     return -1;
 }

QUESTION 6D. Assume that a call to get_grade4 encounters C expected hash collisions (i.e., examines C buckets before finding the bucket that actually contains hash). What is the expected number of cache lines accessed by get_grade4, in terms of N and C?

For commit_info objects, the lookup will access C cache lines, for the collisions, plus 1, for the successful lookup. But we must also consider the commits[bucket] pointers. We will examine at least 1 cache line for the successful bucket. The C collisions that happen before that will access C buckets. But those buckets might be divided among multiple cache lines; for instance, if C=1, 2 buckets are accessed, but if the first bucket=15, those buckets will be divided among 2 cache lines. The correct formula for buckets, including the final lookup, is 1 + C/16. Thus the total lookup will examine 2 + C + C/16 cache lines on average.

7. Assembly (10 points)

Here is some x86 assembly code.

 f:
         movl a, %eax
         movl b, %edx
         andl $255, %edx
         subl %edx, %eax
         movl %eax, a
         ret

QUESTION 7A. Write valid C code that could have compiled into this assembly (i.e., write a C definition of function f), given the global variable declarations “extern unsigned a, b;.” Your C code should compile without warnings. REMINDER: You are not permitted to run a C compiler, except for the C compiler that is your brain.

Many answers:

 void f(void) {
     a -= b & 255;
 }
 void f(void) {
     a += -(b % 256);
 }
 unsigned f(void) {
     a = a - b % 0x100;
     return a;
 }
 unsigned f(void) {
     a -= (unsigned char) b; /* NB extra credit */
     return a;
 }
 char* f(int x, int y, int z[1000]) {
     a -= (unsigned char) b;
     return (char*) a;
 }

QUESTION 7B. Write different valid, warning-free C code that could have compiled into that assembly. This version should contain different operators than your first version. (For extra credit, use only one operator.)

QUESTION 7C. Again, write different valid, warning-free C code that could have compiled into that assembly. In this version, f should have a different type than in your first version.

NOTOC