CS 61
  • Info
    Concepts Course Description Course Staff Extension Resources Schedule Setup: GitHub Setup: Docker Setup: VM Style C and C++ Patterns Git Diff GDB Introduction GDB Commands File Descriptors HTTP Test Information Test 1 Test 2 Final Midterm 2023 Final 2023 Midterm 2022 Final 2022 Exercises: Miscellaneous
  • Problem sets
    Problem set 0 Problem set 1 Problem set 2 Problem set 3 Problem set 4 Problem set 5 Problem set 5 EC: Interruption Problem set 6 Problem set 6 EC Data representation exercises Assembly exercises Kernel exercises Storage exercises Process control exercises Synchronization exercises
  • Lectures
    Textbook Data representation Data representation 1: Introduction Data representation 2: Sizes and layout Data representation 3: Layout and dynamic allocation Data representation 4: Dynamic allocation, invariants Assembly Kernel Kernel 1: Processes, isolation, robustness Kernel 2: Virtual memory Kernel 3: Virtual memory and page tables Storage Storage 1: OS input/output, memory hierarchy Storage 2: Cache optimizations and coherence Storage 3: Eviction policies and coherence Process control Processes 1: Basics Processes 2: Inter-process communication Processes 3: Interruption and race conditions EthiCS: Harms and Unicode EthiCS: UTF-8 Synchronization Synchronization 1: Threads and atomicity Synchronization 2: Critical sections, serializability, lock granularity Synchronization 5: Databases Synchronization 6: Databases and Experiences
  • Sections
    Times & locations Datarep Section 1: C++ data structures Datarep Section 2: Memory bugs Assembly Section 1: Fun Datarep/Assembly Exercises Section Kernel Section 1: WeensyOS Kernel Section 2: Kernel exercises Kernel Section 3: Pipes Storage Section 1: Single-slot cache Process Section 1 and Storage Exercises Process Supplement: BNF grammars Process Section 2: Process control exercises Synchronization Section 1: Threads and synchronization objects Synchronization Section 2: Synchronization Exercises

2022 Midterm

Show all solutions Hide all solutions

Rules

The exam is open book, open note, open computer. You may access the book and your own notes. You may also use computers or electronic devices to access your own class materials and public class materials. Specifically:

  • You may access a browser and a PDF reader.
  • You may access your own notes and problem set code electronically.
  • You may access Internet sites on which your own notes and problem set code are stored.
  • You may access the course site.
  • You may access pages directly linked from the course site, including our lecture notes, exercises, and section notes.
  • You may access our lecture, problem set, and section code.
  • You may run a C or C++ compiler, including an assembler and linker.
  • You may access manual pages and common utilities, including calculators and a Python interpreter.

However:

  • You may not access class discussion forums, such as the Edboard, or other question forums, such as Stack Overflow.
  • You may not access an on-line disassembler or decompiler.
  • You may not access solutions from any previous exam, by paper or computer, except for those on the course site.
  • You may not broadly search the Internet for answers to questions or access other courses’ materials. Stick to our course site.
  • You absolutely may not contact other humans or AI assistants with questions about the exam—whether in person, via IM, via GitHub Copilot, or by posting questions to forums. The single exception is that you may contact course staff.

Any violations of this policy are breaches of academic honesty and will be treated accordingly. Please appreciate our flexibility and behave honestly and honorably.

Students are taking the exam at different times. Do not post publicly about the exam until given permission to do so by course staff. This includes general comments about exam difficulty.

Questions

If you have questions during the exam, write us email at cs61-staff@g.harvard.edu. Don’t use the Edboard. We may not notice Edboard posts, and you aren’t supposed to access the Edboard anyway.

Completing your exam

You have 3 hours to complete the exam starting from the time you press Start. Different students may have different exams. Enter your answers directly in this form. A green checkmark appears when an answer is saved. (Click outside a text box to save the answer.)

Students with extra time accommodations: If your extra time is not reflected next to the start button, contact course staff and wait for confirmation before pressing it.

You should not need to attach a drawing or other longer notes, but you may do so by adding them to a midterm subdirectory in your cs61-f24-psets repository. Make sure you push your changes to GitHub before time is up, and explain in this form where we should look at your notes.

There is no need to explicitly “submit” the exam. Your changes are automatically saved with timestamps. We’ll grade the latest version saved within your exam window.

Notes

Assume a Linux operating system running on the x86-64 architecture unless otherwise stated. If you get stuck, move on to the next question. If you’re confused, explain your thinking briefly for potential partial credit.

1. Basics (15 points)

QUESTION 1A. How many bits are in a long?

3pt 64

QUESTION 1B. What is the address of register %rax?

2pt It doesn’t have an address.

QUESTION 1C. Why might traversing a linked list be slower than traversing an array? List all that apply.

  1. A node in a linked list is larger than an element of an array, because the linked list node must also have pointers to its neighbors.
  2. Nodes in a linked list might be arranged in random order in memory, rather than sequentially, and sequential access is usually faster than random access.
  3. Traversing a linked list requires more instructions than traversing an array, and more instructions are always slower.
  4. Traversing a linked list just requires accessing memory, whereas traversing an array always require multiplication instructions to access elements by index, and multiplication is faster than memory access.
  5. None of the above.

2pt 2 is the most important answer; it is the effect we observed in class, and has the most significant impact in quantitative terms. 1 is also pretty significant. 3 is not correct—more instructions are not always slower: some instructions are slower than others—but that’s a bit of a quibble. 4 is just wrong.

QUESTION 1D. List the typical segments of memory in increasing order by address on x86-64 Linux.

3pt code (or text or read-only data), data, heap, stack

You can show this for yourself with a program like datarep3/segments.

QUESTION 1E. Give three different examples of types of undefined behavior.

3pt Signed integer overflow. Dereferencing a null pointer. Reading or writing outside the bounds of an object. Forming a pointer far beyond the bounds of an array. Dereferencing an element outside the bounds of an array. Using an invalid iterator. There are many, many others; here’s a pretty full list for C.

QUESTION 1F. Assuming that the following code does not execute undefined behavior, what is the minimum value for N (the number of elements in array a)?

int a[N];
int* x = a + 100;
x += 10;
int* y = &x[-5];
printf("%d value, %zd difference\n", *y, x - y);

2pt The array must have at least 110 elements. Note that the first version of the question asked for the “size”, which is ambiguous, so some the answer 440 = 110 * sizeof(int) is also valid.

2. Validity (10 points)

You are helping Emily St. John Mandel debug her implementation of pset1. Her code includes the following global variables:

// Key: pointer to start of allocation
// Value: padded size of allocation in bytes
std::map<void*, size_t> active_allocs;
std::map<void*, size_t> freed_allocs;

const int MAGIC_FOUR_BYTE_VALUE = 0xABBACAFE; // For wild write detection

QUESTION 2A. Emily’s code produces a segmentation fault in one of the tests. A sanitizer run crashes on line 19 of an m61_insert_and_coalesce helper function:

void m61_insert_and_coalesce(void* ptr, size_t sz) {
    auto it = freed_allocs.lower_bound(ptr);
    if (it != freed_allocs.begin()) {
        --it;
    }
    
    if (it != freed_allocs.end() && (char*) it->first + it->second == (char*) ptr) {
        // Coalesce down
        it->second += sz;
    } else {
        // Can’t coalesce down, so insert
        freed_allocs[ptr] = sz;
        it = freed_allocs.find(ptr);
    }
    
    // Try coalescing up.
    auto next = it;
    ++next;
    if ((char*) it->first + it->second == next->first) {
        it->second += next->second;
        freed_allocs.erase(next);
    }
}

What is the problem with Emily’s code? Describe how to fix it.

2pt Line 19 does not check whether next == freed_allocs.end(), but dereferences next. You could fix it by adding that comparison.

auto next = it;
++next;
if (next != freed_allocs.end() && (char*) it->first + it->second == next->first) {
    it->second += next->second;
    freed_allocs.erase(next);
}

Please note that an early version of this question was buggy (in that Emily’s code was correct); any student that was tested with the buggy version received full credit for this question.

QUESTION 2B. Having fixed that problem, she sees different sanitizer error messages, on line 11 of m61_malloc.

void* m61_malloc(size_t sz, const char* file, int line) {
    size_t padded_size = add_padding(sz);
    assert(padded_size >= sz + 4 && padded_size % alignof(std::max_align_t) == 0);
    void* ptr = m61_find_free_space(padded_size);
    if (!ptr) {
        return nullptr;
    }

    // Place magic_word after sz bytes to detect wild writes.
    char* magic_ptr = (char*) ptr + sz;
    *((int*) magic_ptr) = MAGIC_FOUR_BYTE_VALUE;

    active_allocs[ptr] = padded_size;
    return ptr;
}

What is the problem with Emily’s code? Describe how to fix it.

4pt To detect the simplest wild write, in which the user writes directly after the allocated block, MAGIC_FOUR_BYTE_VALUE must be placed at ptr + sz. (Placing it at ptr + padded_sz might be too far away, and would bleed into the next allocation.) But it’s not guaranteed that ptr + sz == magic_ptr is at an address properly aligned for integers. Without that guarantee, dereferencing magic_ptr as an int* is an alignment violation that a sanitizer is able to detect.

(A common incorrect response on this midterm was that the cast from char* to int* between lines 10 and 11 is illegal. The cast itself is valid, however; the dereference is what triggers the alignment violation.)

Potential fixes to Emily's code include:

  • Use a 1-byte magic word (although that’s less robust than a multi-byte magic word).
  • Set each byte of the magic word individually, because all addresses are aligned for chars.
  • memcpy((int*) magic_ptr, (int*) &MAGIC_FOUR_BYTE_VALUE, sp);

We did not accept as correct answers that would have caused compiler errors rather than sanitizer errors. For example, some students said that Emily’s code referred to the wrong magic value (because one version had a const int magic_word definition). But this problem, if it caused any error, would cause a compiler error: the code would not compile.

QUESTION 2C. After resolving that problem, Emily observes that sometimes a call to m61_free will corrupt a nearby allocation. For example:

size_t n = ...; /* some number > 0 */
void* a = m61_malloc(n);
memset(a, 'a', n);
void* b = m61_malloc(11);
memset(b, 'b', 11);
m61_free(a);
assert(memcmp(b, "bbbbbbbbbbb", 11) == 0); // Fails!

Here is her m61_free code. We’ve left out the code to detect invalid and double frees, which is not relevant here.

void m61_free(void* ptr) {
    // ... detect & handle `nullptr` and invalid and double frees ...
    auto sz = active_allocs[ptr];
    auto max_alignment = alignof(std::max_align_t);
    size_t sz_plus_magic = sz + sizeof(int);
    size_t padded_size = sz_plus_magic + max_alignment - (sz_plus_magic % max_alignment);
    // Scribble over data to discourage use-after-free
    memset(ptr, '!', padded_size);
    active_allocs.erase(ptr);
    freed_allocs.insert({ptr, padded_size});
}

Give a possible problem with Emily’s code, and describe how to fix it.

4pt m61_free(a) overwrites b. There’s only one line of code in m61_free that could overwrite a nearby allocation, namely line 8, the memset. So we must assume that padded_size is wrong; if padded_size were too big—larger than the allocation—then the memset could overwrite data belonging to the next allocation in memory. It’s a bad sign that padded_size is being computed a different way here than it was in m61_malloc, above! There are two separate potential problems.

  1. m61_malloc stores the padded size in the active_allocs array. Thus, when m61_free adds another sizeof(int) on line 5, it is double-counting the magic number.
  2. m61_malloc uses an add_padding function to compute its padded size. Lack of consistency is a big danger; why aren’t we re-using that function here? It’s possible that m61_free’s calculation is wrong. For instance, the calculation in m61_free will add extraneous padding to allocations whose sz_plus_magic is a multiple of 16.

We gave full credit to either answer.

3. Internal metadata (15 points)

The following internal metadata implementation places a header struct at the start of every memory allocation. The payload (the space requested by the user) immediately follows the header. MAGIC_ALLOC and MAGIC_FREE are 8-byte constants defined elsewhere.

struct header {
    size_t magic;        // MAGIC_ALLOC or MAGIC_FREE
    size_t size;         // Size of allocation, including padding and header
    size_t prev_size;    // Size of *previous* allocation; 0 if no previous allocation
    size_t payload_size; // Size of payload (user-requested size)
    char* file;
    int line;
};

QUESTION 3A. What is the alignment of the header struct?

5pt for A-C 8 bytes.

QUESTION 3B. What is the size of the header struct?

48 bytes.

QUESTION 3C. How many bytes of padding does the header struct contain, and where?

4, after line.

QUESTION 3D. Can the members of the header struct be rearranged at all to make the struct smaller?

1pt No. (Explanation not required of students:) The size_ts and char* are 8 bytes each. No matter where the 4-byte int goes, an additional 4 bytes of padding are required so that address of the next member is aligned.

QUESTION 3E. Assuming that the first header struct appears at the beginning of default_buffer.buffer, how many bytes of padding are required after a header struct to maintain correct alignment for user payloads?

1pt 0. The x86-64 maximum alignment is 16, and 48 is a multiple of 16.

QUESTION 3F. Complete this function, which returns the payload pointer corresponding to a given header pointer.

void* header_to_payload(header* hdr) {

}

3pt for F-G return hdr + 1;

QUESTION 3G. Complete this function, which returns the header pointer corresponding to a given payload pointer.

void* payload_to_header(void* payload) {

}

return (header*) payload - 1;

QUESTION 3H. Complete this function, which given a header pointer hdr, returns a pointer to the next allocation in default_buffer, or nullptr if hdr is the last allocation in the buffer.

header* next_header(header* hdr) {

}

3pt

char* next = (header*) ((char*) hdr + hdr->size);
if (next < default_buffer.buffer + default_buffer.size) {
    return (header*) next;
} else {
    return nullptr;
}

QUESTION 3I. Complete this function, which given a header pointer hdr, returns a pointer to the previous allocation in default_buffer, or nullptr if hdr is the first allocation in the buffer.

header* previous_header(header* hdr) {

}

2pt

if (hdr->prev_size == 0) {
    return nullptr;
} else {
    return (header*) ((char*) hdr - hdr->prev_size);
}

4. Invariants and allocators (20 points)

Louise Bogan’s problem set 1 implementation uses external metadata. She has also written several data representation invariant checkers. Here are her data structures and checker functions.

struct allocation {    
    size_t size;         // size requested by user
    unsigned padding;    // additional padding beyond `size`, e.g.,
                         // for boundary write detection and/or alignment
    bool free;           // true iff allocation is free
};

std::map<char*, allocation> allocs;   // address -> allocation


void check_addresses_in_buffer() {
    char* buf_first = default_buffer.buffer;
    char* buf_last = buf_first + default_buffer.size;
    for (auto& a : allocs) {
        char* ptr = a.first;
        assert(ptr != nullptr);
        assert(ptr >= buf_first);
        assert(ptr <= buf_last);
        assert(ptr + a.second.size + a.second.padding <= buf_last);
    }
}

void check_addresses_aligned() {
    for (auto& a : allocs) {
        uintptr_t addr = (uintptr_t) a.first;
        assert(addr % alignof(std::max_align_t) == 0);
    }
}

void check_sizes_valid() {
    for (auto& a : allocs) {
        assert(a.second.size + a.second.padding <= default_buffer.size);
    }
}

void check_non_overlapping() {
    char* last = default_buffer.buffer;
    for (auto& a : allocs) {
        char* ptr = a.first;
        assert(ptr >= last);
        last = ptr + a.second.size + a.second.padding;
    }
}

void check_contiguous() {
    char* last = default_buffer.buffer;
    for (auto& a : allocs) {
        char* ptr = (char*) a.first;
        assert(ptr == last);
        last = ptr + a.second.size + a.second.padding;
    }
}

QUESTION 4A. Some of these checker functions are redundant. (A checker F is redundant when another checker, G, checks everything that F checks and more.) List the redundant checker(s) and say which checkers they are redundant with.

3pt check_sizes_valid is redundant with check_addresses_in_buffer.
check_non_overlapping is redundant with check_contiguous.

QUESTION 4B. Add an assertion to the check_contiguous function that also checks that the allocs map is complete, meaning that the entire space of the default_buffer is included in allocs. Give the line number where the assertion should go.

3pt assert(last == default_buffer.buffer + default_buffer.size); after line 51

QUESTION 4C. Write an assertion that would hold if Louise implements boundary write detection, but would not necessarily hold if she did not. This assertion could go in check_sizes_valid. Give the line number where the assertion should go.

2pt assert(a.second.padding > 0); after line 32

QUESTION 4D. Louise’s m61_free, shown below, is meant to check for invalid or double frees. Write an expression that evaluates to true on line 83 for most invalid or double frees (and never evaluates to true for valid frees). The expression should use only the structures Louise already defined.

void m61_free(void* ptr) {
    if (ptr != nullptr) {
        auto it = allocs.find((char*) ptr);
        // ...Check for invalid or double free...
        it->second.free = true;
    }
}

2pt it == allocs.end() || it->second.free

QUESTION 4E. Louise is currently using this function to make an allocation.

static char* find_allocation(size_t sz, size_t required_padding) {
    assert((sz + required_padding) % alignof(std::max_align_t) == 0);
    check_all_invariants(); // calls all checker functions
    for (auto it = allocs.begin(); it != allocs.end(); ++it) {
        // Is this allocation free & does it fit the space we need?
        if (it->second.free && sz <= it->second.size) {
            // Find this allocation’s boundaries
            char* ptr = it->first;
            char* end_ptr = ptr + it->second.size + it->second.padding;
            // Split off the remaining space
            char* split_ptr = ptr + sz + required_padding;
            allocs.insert(it, {split_ptr, {end_ptr - split_ptr, 0, true}});
            // Adjust current allocation
            it->second.size = sz;
            it->second.padding = required_padding;
            it->second.free = false;
            // Check invariants and return
            check_all_invariants();
            return ptr;
        }
    }
    return nullptr;
}

She observes that when using this function, none of her invariants ever fail, but her allocator seems to grow slower and slower over time. For instance:

for (int i = 0; i != 1000000; ++i) {
    void* ptr = m61_malloc(10);
    m61_free(ptr);
}
void* ptr = m61_malloc(20); // This takes far longer than expected!

Why is this? Explain briefly, referring to specific lines in Louise’s code.

3pt Louise always splits off the remaining space, even when there is 0 remaining space. She is larding up her allocs map with zero-sized allocations. Lines 110–111 should not be executed if split_ptr == end_ptr.

QUESTION 4F. Suggest an invariant that would have caught this slowing-down problem and write a C++ assertion that would eventually fail with Louise’s problematic code. You can write a new assertion function or give a line number from the above assertions where the new assertion would go.

1pt You could assert that every allocation has nonzero padded size.

QUESTION 4G. Louise also observes that when using her find_allocation function, m61_malloc sometimes returns nullptr even when there should be enough space available. For instance:

void* ptr1 = m61_malloc(1);
void* ptr2 = m61_malloc(default_buffer.size - 16);
assert(ptr1 && ptr2);
// The whole buffer is occupied, so any allocation will fail.
void* ptr3 = m61_malloc(2);
assert(!ptr3);
// After we free an allocation, though, there should be space—but there’s not!
m61_free(ptr1);
ptr3 = m61_malloc(2);
assert(ptr3); // Assertion fails!

Why is this? Explain briefly, referring to specific lines in Louise’s code.

2pt Louise only checks sz against the size of the freed allocation on line 105, whereas she probably should compare sz <= it->second.size + it->second.padding.

QUESTION 4H. Louise’s find_allocation has a “time bomb” in it—a problem that doesn’t occur the way it is currently used, but that could occur if it were called with valid, but unusual, arguments. Specifically, find_allocation(1, 31) would cause a problem in some situations. Why is this? Explain briefly, referring to specific lines in Louise’s code.

2pt Line 105 should test that sz + required_padding <= it->second.size + it->second.padding. This isn't a problem for Louise normally because normally required_padding is always the minimum required.

QUESTION 4I. Write a checker function that checks that Louise’s allocations are completely coalesced, meaning that there are no adjacent free allocations.

2pt

void check_completely_coalesced() {
    bool last_free = false;
    for (auto& a : allocs) {
        assert(!(last_free && a.second.free));
        last_free = a.second.free;
    }
}

5. Assembly special cases (15 points)

All the questions in this section concern function assembly generated by C++ compilers for a conventional Linux x86-64 executable.

QUESTION 5A. List at least two different reasons that assembly for a function might not refer to the %rdi register.

1pt; full credit for at least one correct answer The function might not take arguments, or it might pass its first argument to another function. (In either case it must also not use %rdi as a temporary.)

Please note: a reference to %edi counts as a reference to %rdi because it is the same physical register; %edi just specifies bits 0–31 of %rdi.

QUESTION 5B. List at least two different reasons that assembly for a function that takes zero arguments might refer to the %rdi register.

1pt; same The function might pass an argument to another function, or it might use %rdi as a temporary.

QUESTION 5C. List at least two different reasons that assembly for a function might not include a ret instruction.

1pt; same The function might end with an infinite loop (and therefore never need to return), or it might jump to the start of another function (i.e., tail call elimination).

Please note: a void function does typically include a ret instruction. It just doesn't put anything meaningful in %rax before the call to ret.

QUESTION 5D. Any function that refers to the %rbx register always explicitly saves %rbx’s initial value and restores it to that value before returning. Why is this? List all that apply; a fully correct response will list at least two answers.

  1. The %rbx register must always have a fixed value when a function returns.
  2. The %rbx register is callee-saved.
  3. The %rbx register is caller-saved.
  4. The calling convention does not assign the %rbx register any meaning at function entry, so its initial value is meaningless in the context of the function.
  5. The %rbx register is the base pointer, and is used by debuggers to implement backtraces.
  6. Other (describe briefly).

2pt 2, 4

Recall that %rbx is general-purpose register, not a special-purpose register. It was commonly confused with %rbp on this midterm.

QUESTION 5E. Some functions that refer to the %rsp register do not explicitly save and restore its initial value. Why is this? List all that apply.

  1. Some functions’ local variables can fit entirely in registers.
  2. Some function arguments may be stored on the stack by the caller.
  3. Some functions never return.
  4. Some functions use a different calling convention.
  5. Some functions’ local variables can fit in the “red zone.”
  6. None of the above.

1pt 1, 2, 3, 5. Full credit for 3 or more

QUESTION 5F. How have the compilers we’ve seen taken advantage of undefined behavior? List the best answers; explain briefly if unsure.

  1. If a compiler detects undefined behavior, it can refuse to compile the function.
  2. If a compiler can prove that an argument value causes undefined behavior within a function, it can assume that the argument will never have that value.
  3. If a compiler believes that undefined behavior might occur, it can perform fewer optimizations.
  4. After a pointer is dereferenced, the compiler can assume that the pointer value is not nullptr.
  5. After a unsigned addition, the compiler can assume there was no integer overflow.
  6. None of the above.

2pt 2, 4. (1 and 3 are true in theory, but they are not good answers because no well-known compiler acts that way. 5 is not true.)

QUESTION 5G. In the remaining parts, we give function assembly generated by different buggy C++ compilers. Your job is to briefly explain how you can tell from the assembly that the relevant compiler is buggy.

How can you tell that this compiler is buggy?

_Z1fv:
    movq $1, %rdi
    callq _Z1gl
    ret

1pt It doesn’t align the stack before callq.

QUESTION 5H. How can you tell that this compiler is buggy?

_Z1fv:
    movq $1, %rdi
    pushq %rax
    callq _Z1gl
    cmp %rax, %rdi
    popq %rdi
    je .L2
    movq $1, %rax
    retq
.L2:
    xorl %eax, %eax
    retq

1pt It refers to %rdi after callq, but %rdi is caller-saved, so after the function _Z1gl returns %rdi’s value is meaningless.

A common incorrect answer was that if you push a register onto the stack, then you have to use the same register when you pop off of the stack. That is not true. The pushq %rax on line 3 and popq %rdi on line 6 are valid; the registers in these instructions are simply the source (for push) or destination (for pop) of values when modifying the stack segment. In this code, the values pushed and popped are not actually used—the push and pop are just there for stack alignment.

QUESTION 5I. How can you tell that this compiler is buggy?

_Z1fv:
    movq $1, %rdi
    pushq %rdi
    callq _Z1gl
    movq (%rsp), %rdi
    cmp %rax, %rdi
    je .L2
    movq $1, %rax
    retq
.L2:
    xorl %eax, %eax
    retq

1pt It pushes 8 bytes to align the stack pointer for the call to g, but does not pop to restore it before returning.

QUESTION 5J. The remaining three compilers are generating assembly for this source code:

int calc(int a, int b) {
    if (a > b) {
        return a + b;
    } else {
        return a + b + 1;
    }
}

How can you tell that this compiler is buggy?

_Z4calcii:
    cmpl %esi, %edi
    movl (%rdi,%rsi), %eax
    jg .L2
    addl $1, %eax
.L2:
    ret

4pts for J-L The mov instruction dereferences its source, but neither %rdi nor %rsi is a pointer.

QUESTION 5K. (Same source code as above) How can you tell that this compiler is buggy?

_Z4calcii:
    cmpl %esi, %edi
    leal (%rdi,%rsi), %eax
    ja .L2
    addl $1, %eax
.L2:
    ret

ja is unsigned comparison, but these are signed ints.

QUESTION 5L. (Same source code as above) How can you tell that this compiler is buggy?

_Z4calcii:
    movq %rdi, %rax
    cmpl %esi, %eax
    addl %esi, %eax
    jg .L2
    addl $1, %eax
.L2:
    ret

This performs an addl after the cmpl, so the jg will use the wrong flags.

6. Data representation inference (10 points)

For each question, write one to three assembly instructions that might imply that the containing function has the given property. Explain briefly if unsure.

QUESTION 6A. The first argument to the function has type int or unsigned int (but not long, unsigned long, or a pointer type).

5pt for A-E Example: movl %edi, %eax. We can tell that it’s not an 8-byte type because the mov is a 32-bit move (we can see that from the l suffix as well as the %e__ register names).

QUESTION 6B. The first argument to the function has type short* (not unsigned short*).

Examples: movswl (%rdi), %eax, movw (%rdi), %ax. Any instruction that dereferences %rdi, and moves a 16-bit value into or out of that location.

A common mistake on this midterm was to confuse the size of a pointer with the size of the type that the pointer refers to. A pointer of any type is 64 bits. While a short is 16 bits, a short* is 64 bits. Here we want to imply a pointer, so we refer to the full 64-bit register name. It is incorrect to use the 16-bit register name %di.

QUESTION 6C. The first argument to the function is a pointer to an array of point structures, where a point is defined as:

struct point {
    int x;
    int y;
};

Example: movl 4(%rdi,%rsi,8), %eax. We are looking for an index into an array of 8-byte objects (thus the shift component 8), but where 4 of those bytes might be worth loading independent of the other 4 bytes.

QUESTION 6D. The first argument to the function has type char* or unsigned char*.

movzbl (%rdi), %eax

QUESTION 6E. The first argument to the function has type int**.

movq (%rdi), %rcx; movl (%rcx), %rax At least two instructions are required to demonstrate the pointer-to-pointer effect.

QUESTION 6F. The first argument to the function has type long or unsigned long (but not a pointer type).

5pt for F-I Example: shlq $2, %rdi—it is extremely unlikely that left-shifting a pointer would produce a useful value. Other examples might use other mathematical operations unlikely to apply to pointers, such as multiplications or divides or comparisons against a numeric constant like 0 or 100.

QUESTION 6G. The first argument to the function has type unsigned long (but likely not long or pointer type).

cmpq $10, %rdi; ja .L2. The ja indicates unsigned comparison; so would jb, jae, jbe, seta, etc.

QUESTION 6H. The return value for the function has integral type (but likely not pointer type).

movq $10, %rax; ret, because 10 is almost certainly an invalid pointer value.

QUESTION 6I. The return value for the function has size greater than 8.

movq $10, %rax; movq $11, %rdx; ret. The move into %rdx is useless unless the function is returning a large object; %rdx is caller-saved, but used for return values 8–16 bytes big.

7. Addition is magic (15 points)

For each question, complete the given assembly so that it correctly implements this C++ function:

long add(long a, long b) {
    return a + b;
}

The completed assembly function must produce the correct answer according to the abstract machine. For full credit, you must write one instruction for each numbered blank. Your instructions must not include ret 😉, and you should not reuse any instruction line. (You can reuse an instruction opcode, like addl, more than once; just don’t repeat an instruction line, which includes both opcode and instruction arguments.)

QUESTION 7A.

_Z3addll:
    endbr64
    ____1____
    ret

3pt leaq (%rdi,%rsi), %rax

QUESTION 7B.

_Z3addll:
    endbr64
    movq %rdi, %rax
    xorl %edi, %edi
    ____1____
    ret

3pt addq %rsi, %rax

QUESTION 7C.

_Z3addll:
    endbr64
    testq %rsi, %rsi
    je .L2
.L3:
    incq %rdi
    ____1____
    ____2____
.L2:
    ____3____
    ret

3pt If %rsi is zero, the function jumps to .L2. It must return %rdi in that case, so slot 3 most hold movq %rdi, %rax.

The other instructions indicate a loop where %rdi is incremented once per iteration. So the corresponding C++ code might be (ignoring integer overflow):

long add(long a, long b) {
    while (____2____) {
         ++a;
         ____1____
    }
    return a;
}

The sensible answers are decrementing b in slot 1 and comparing b != 0 in slot 2.

Other tricky, less-good answers are possible; for instance, 1 and 2 could be leaq -1(%rdi,%rsi), %rax; ret.

QUESTION 7D.

_Z3addll:
    endbr64
    movq $1, %rdx
.L2:
    testq %rdx, %rsi
    ____1____
    addq %rdx, %rdi
.L3:
    ____2____
    jnc .L2
    movq %rdi, %rax
    ret

2pt jz .L3; shlq %rdx

A good first line to focus on is line 5, which, the first time through, computes testq $1, %rsi—a bitwise and of %rsi and 1. The result will be zero if and only if %rsi’s lowest-order bit is set. We don’t know what the jump does yet, but line 7, if executed, will perform %rdi += 1.

Now, lines 5 and 7 are part of some loop, as we can see at line 10. The loop exits on unsigned overflow, after which the final value of %rdi is returned.

The intuitive leap here is that this function is performing bitwise addition, adding bits one at a time to a based on whether they are set in b. That means the loop will need to cover all bits (the left-shift does this), and the addq should be skipped when a given bit is not on in b (the jz does this).

The C++ version of the above code might be:

long add(long a, long b) {
    long rdx = 1;
    while (rdx != 0) {
        if (b & rdx) {
            a += rdx;
        }
        rdx <<= 1;
    }
    return a;
}

QUESTION 7E.

_Z3addll:
    endbr64
    ____1____
    ____2____
    orq %rdi, %rsi
    ____3____
    movq %rsi, %rax
    ret

2pt movq %rdi, %rcx; andq %rsi, %rcx; addq %rcx, %rsi

This solution uses the identity a + b == (a | b) + (a & b).

QUESTION 7F.

_Z3addll:
    endbr64
    ____1____
    ____2____
    ____3____     # must be a branch instruction
    ret
.L9:
    pushq   %rax
    movq    %rsi, %rdx
    movq    %rdi, %rsi
    leaq    .Lubsan_data0(%rip), %rdi
    call    __ubsan_handle_add_overflow_abort@PLT

2pt movq %rsi, %rax; addq %rdi, %rax; jo .L9

Per the abstract machine, signed overflow is undefined behavior, so the assembly can choose to catch it or it can choose to do nothing. In this case the the assembly does handle it. Recall that the carry flag (CF) is set iff the result overflowed when considered as unsigned, while the overflow flag (OF) is set iff the result overflowed when considered as signed. Both jo (corresponding to OF) and jc (corresponding to CF) were accepted on this midterm, but because the inputs to the C++ function add are considered as signed, jo is the expected jump instruction.

8. The end (5 points)

QUESTION 8A. What improvements would you suggest to the class? Any answer but no answer will receive full credit.

5pts for A-B

QUESTION 8B. What topics do you wish we had spent more time on? Any answer but no answer will receive full credit.