CS 61
  • Info
    Course Description Course Staff Extension Schedule Style Textbook Setup: GitHub Setup: Docker Setup: VM C and C++ Patterns Diff File Descriptors GDB Commands GDB Introduction Git HTTP BNF grammars Midterm
  • Problem sets
    Problem set 0 Problem set 1 Data representation exercises Problem set 2 Assembly exercises Problem set 3 Kernel exercises Problem set 4 Storage exercises Problem set 5 Process control exercises Problem set 6 Synchronization exercises
  • Lectures
    Data representation 1: Introduction Data representation 2: Sizes and layout Data representation 3: Layout and allocators Data representation 4: Pointer and integer arithmetic Data representation 5: Undefined behavior and bits Data representation 6: Code representation Assembly Assembly 1: Data movement and arithmetic Assembly 2: Calling convention and control flow Assembly 3: Branching Assembly 4: Sanitizers, attacks, kernel interaction Kernel Kernel 1: Processes, isolation, robustness Kernel 2: Kernel isolation, protected control transfer Kernel 3: Virtual memory EthiCS: Unicode EthiCS: UTF-8 Storage Storage 1: Caches, memory hierarchy, buffer cache, system calls Storage 2: Costs and benefits, stdio Storage 3: Multi-slot caches and mmap Storage 4: Processor caches and forking Process control Processes 1: Basics Processes 2: Inter-process communication Processes 3: Interruption and race conditions Synchronization Synchronization 1: Atomicity Synchronization 2: Condition variables and lost wakeups Synchronization 3: Databases Synchronization 4: Synthesis
  • Sections
    Section 1: C++ data structures Section 2: Memory bugs Section 3: Data representation exercises Section 4: Fun Section 5: Assembly exercises Section 6: Virtual memory Section 7: Kernel exercises Section 8: Single-slot cache Section 9: Storage exercises Section 10: Command line representation Section 11: Process control exercises Section 12: Threads and lock granularity
  • Schedule

2021 Midterm

Show all solutions Hide all solutions

Rules

The exam is open book, open note, open computer. You may access the book and your own notes. You may also use computers or electronic devices to access your own class materials and public class materials. Specifically:

  • You may access a browser and a PDF reader.
  • You may access your own notes and problem set code electronically.
  • You may access Internet sites on which your own notes and problem set code are stored.
  • You may access the course site.
  • You may access pages directly linked from the course site, including our lecture notes, exercises, and section notes.
  • You may access our lecture, problem set, and section code.
  • You may run a C or C++ compiler, including an assembler and linker.
  • You may access manual pages and common utilities, including calculators and a Python interpreter.

However:

  • You may not access class discussion forums or other question forums such as Stack Overflow.
  • You may not access an on-line disassembler or compiler explorer.
  • You may not access solutions from any previous exam, by paper or computer, except for those on the course site.
  • You may not broadly search the Internet for answers to questions or access other courses’ materials. Stick to our course site.
  • You absolutely may not contact other humans with questions about the exam—whether in person, via IM, or by posting questions to forums—with the exception of course staff.

Any violations of this policy are breaches of academic honesty and will be treated accordingly. Please appreciate our flexibility and behave honestly and honorably.

Additionally, students are taking the exam at different times. Do not post publicly about the exam until given permission to do so by course staff. This includes general comments about exam difficulty.

Completing your exam

You have 3 hours to complete the exam starting from the time you press Start. Different students may have different exams. Enter your answers directly in this form; a green checkmark appears when an answer is saved.

Students with extra time accommodations: If your extra time is not reflected next to the start button, contact course staff and wait for confirmation before pressing it.

You should not need to attach a drawing or other longer notes, but you may do so by adding them to a midterm subdirectory in your cs61-f21-psets repository. Make sure you push your changes to GitHub before time is up, and explain in this form where we should look at your notes.

Notes

Assume a Linux operating system running on the x86-64 architecture unless otherwise stated. If you get stuck, move on to the next question. If you’re confused, explain your thinking briefly for potential partial credit.

1. Sizes and alignments (10 points)

QUESTION 1A. What are the size and alignment of x?

const char* x = "Hello";

1pt 8, 8. x is a pointer.

QUESTION 1B. What are the size and alignment of x?

const char x[6] = "Hello";

1pt 6, 1. Now x is an array of six characters.

QUESTION 1C. What are the size and alignment of x?

unsigned x = 0xFEED;

1pt 4, 4.

QUESTION 1D. What are the size and alignment of x?

struct {
    int a;
    char b;
} x;

2pt 8, 4. The struct ends with 3 bytes of padding after b to ensure that its size is a multiple of its alignment.

QUESTION 1E. In this question, we model the C++ struct layout rules in code.

struct struct_layout {
    size_t off = 0;           // current byte offset in struct
    size_t align = 0;         // alignment of struct, considering members seen so far
};

// add_member(sl, sz, align)
//    Model adding a member with size `sz` and alignment `align` to the struct.
//    Returns the offset of the added member and updates `*sl`.
size_t add_member(struct_layout* sl, size_t sz, size_t align) {
    if (sl->off % align != 0) {
        /* HOLE 1 */
    }
    size_t result = sl->off;
    /* HOLE 2 */
    return result;
}

Here is an example of how this function should work. In this example, we make calls to add_member that model struct x:

struct x {
    int a;
    char b[7];
    long c;
};

struct_layout sl;
size_t a_offset = add_member(&sl, sizeof(x::a), alignof(x::a)); // == add_member(&sl, 4, 4)
assert(a_offset == 0);
size_t b_offset = add_member(&sl, sizeof(x::b), alignof(x::b)); // == add_member(&sl, 7, 1)
assert(b_offset == 4);
size_t c_offset = add_member(&sl, sizeof(x::c), alignof(x::c)); // == add_member(&sl, 8, 8)
assert(c_offset == 16);
size_t struct_size = add_member(&sl, 0, sl.align); // padding at end of struct
assert(struct_size == 24);
assert(sl.align == 8);

Complete the function by saying what goes in each HOLE.

3pt The first hole must bring the current offset up to a multiple of the new element’s alignment, align. The second hole must update sl->off to account for the new element’s size, and update the structure’s overall alignment.

HOLE1:
   while (sl->off % align != 0) {
       ++sl->off;
   }
   /* OR */
   sl->off += align - sl->off % align;
   /* OR */
   sl->off = (sl->off / align + 1) * align;
HOLE2:
   sl->off += sz;
   if (sl->align < align) {
       sl->align = align;
   }

Note that hole 2 must not add padding. For instance, in a struct like int a; char b; char c;, the c member appears immediately after b, with no gap; if sl->off were prematurely updated to account for end-of-struct padding, there would be a gap.

QUESTION 1F. Assuming that add_member’s arguments must always represent valid members of x86-64 structs (or the padding at the end of a struct), which of these assertions must hold for those arguments? List all that apply.

  1. assert(sl != nullptr)
  2. assert(align != 0)
  3. assert(size % align == 0)
  4. assert(size >= align)
  5. assert((align & (align - 1)) == 0)
  6. assert(size / align > 0)
  7. None of the above

2pt 1, 2, 3, 5 is the best answer.

  1. If sl == nullptr the code would segmentation-fault.
  2. No datatype has alignment 0.
  3. Every datatype’s size is a multiple of its alignment.
  4. Almost every datatype’s size is at least as big as its alignment, but there is one exception: zero-element arrays. A zero-element array of T has size 0 and alignment alignof(T).
  5. Every alignment is a power of two, and (x & (x - 1)) == 0 iff x is a power of two.
  6. This is almost always true, except again for zero-element arrays.

We accepted #4 and #6 with no penalty.

2. Tell me without telling me, integer edition (10 points)

The answer to each of these questions is one or more C++ assert statements that meet the problem’s constraints. No assert statement may execute undefined behavior. Variables with names like iN are signed integers and variables with names like uN are unsigned integers.

SAMPLE QUESTION 1. Tell me that u0 is zero without using ==.
ANSWER. assert(!u0);

SAMPLE QUESTION 2. Tell me that u1 == 2 without using any integer constants.
ANSWER. assert(u1 == sizeof(short));

QUESTION 2A. Tell me that i1 is odd (i.e., not a multiple of two) without using / or %.

5pts for A-C: +5 for all 3, +4 for 2, +2 for 1 assert(i1 & 1); Odd integers’ representations have least significant bit 1.

QUESTION 2B. Tell me that u1 == 4 * u2 without using *.

  • assert(u1 == u2 << 2);
  • assert(u1 == u2 + u2 + u2 + u2);

QUESTION 2C. Tell me that u1 == ~u2 + 1 without using ~ or +.

assert(u1 == -u2);

QUESTION 2D. Tell me that i1 != u1 without using any comparison operator (==, !=, <, >, <=, >=).

1pt

  • assert(i1 - u1); (which implicitly checks that i1 - u1 != 0)
  • assert(i1 ^ u1);

QUESTION 2E. Tell me that (u1 + u2) != (u1 | u2) without using | or +.

1pt The statement holds iff u1 + u2 contains a carry, and that is true iff there is a bit position that is 1 in both u1 and u2.

  • assert(u1 & u2);
  • assert((u1 & ~u2) == u1);

QUESTION 2F. Tell me that the u1th bit of u2 is nonzero without using *. (Bits are numbered starting from 0, with 0 the least significant bit.)

2pts

  • assert(u2 & (1U << u1));
  • assert((u2 >> u1) & 1);

QUESTION 2G. Tell me that u1 would fit in an unsigned short variable without overflow without using a typecast operator.

1pt assert(u1 < 65536);

3. Things to know about undefined behavior (8 points)

QUESTION 3A. True or false? If a process exits with a segmentation fault, division-by-zero fault, or other related fault, then it has executed undefined behavior.

1pt True. Segmentation faults and related faults aren’t part of the C++ abstract machine; if they happen the program is not a valid C++ program and has executed undefined behavior.

QUESTION 3B. True or false? If a process executes undefined behavior, then it must exit with a segmentation fault, division-by-zero fault, or other related fault.

1pt False. The awful thing about UB is that it can have any effect.

QUESTION 3C. True or false? The result of any integer comparison (via any of <, >, ==, !=, <=, and >=) is always well defined.

1pt True.

QUESTION 3D. True or false? Compiler optimizers can make use of reasoning about undefined behavior to remove redundant code.

1pt True, as we saw in class.

QUESTION 3E. True or false? Sanitizers work by putting the CPU (the processor) into a mode where undefined behavior triggers an exception and passes control to the kernel.

1pt False. As we saw in class, sanitizers change the instructions generated by the compiler.

QUESTION 3F. Given int i, write two different expressions E1 and E2 so that neither i + E1 nor i + E2 can cause undefined behavior for any value of i.

2pt (1pt each)

  • 0 (or 0L, 0UL, (char) 0, etc.; allow any two different forms of 0)
  • ~i
  • (long) i (many long or long long values will work)

Note that -i will not work, because if i == INT_MIN == 0x8000'0000, then -i causes undefined behavior! But it's reasonable to give credit for it anyway.

QUESTION 3G. Assume you have a pointer T* p and an integer long i. True or false? If p + i causes undefined behavior, then (long) p + i will cause undefined behavior also.

1pt False. p + i causes undefined behavior if the result is out of range for the array pointed to by p. (long) p + i causes undefined behavior only on integer overflow.

4. Compressed integer representation (15 points)

Compression—reducing the space required to store information—can reduce cost and improve transmission speed. In this problem, you’ll consider a compressed data representation for integers. A processor would not use this representation natively, but a database might use this representation to compress records before storing them to stable storage or sending them over the network (two contexts where reducing space is important).

We call the method Ldata. It stores a signed integer of arbitrary size as a length byte followed by data bytes:

           ___________ `L` bytes total ___________
          /                                       \
+---------+---------+---------+---...---+---------+
|    L    |  Data0  |  Data1  |         |Data(L-1)|
+---------+---------+---------+---...---+---------+
     ^         ^         ^                   ^
     |         |         |                   |
     |         |         |    Most significant byte of integer (bits ((L-1)*8)–(L*8-1))
     |         |         |
     |         |         +--- Second least significant byte of integer (bits 8-15)
     |         +------------- Least significant byte of integer (bits 0-7)
     +----------------------- Number of data bytes stored; must be at least 1

The most significant bit of byte Data(L-1) is the sign bit: when Data(L-1) (considered as unsigned) is ≥128, the represented integer is negative.

Here are some examples.

  • 0x01 0x00: A representation of zero. The first byte, L=0x01, says the integer has one data byte. The second byte, 0x00, is the representation of zero.
  • 0x02 0x00 0x00: Another representation of zero.
  • 0x04 0x00 0x00 0x00 0x00: Another representation of zero :)
  • 0x01 0x01: A representation of 1.
  • 0x04 0xAA 0xBB 0xCC 0x11: A representation of 0x11CCBBAA.
  • 0x01 0xFF: A representation of -1. (The represented value is negative because the sign bit of 0xFF is on.)
  • 0x02 0xFF 0x00: A representation of +255.

Several students assumed that the data was encoded as hexadecimal numbers represented in ASCII. I’m not sure how this misunderstanding arose.

QUESTION 4A. Does Ldata format use little-endian or big-endian representation for integers?

2pt Little-endian: the least-significant bits of a number are stored in lower addresses.

QUESTION 4B. Give an example of an Ldata representation (a series of bytes) of an integer that C++’s built-in integer types cannot represent on x86-64.

1pt 0x09 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10 0x10. Any number comprising 9 or more data bytes will do, with two exceptions: if bytes Data8 through Data(L-1) are all 0, this is still representable on x86-64, and if bytes Data8 through Data(L-1) are all 0xFF and the most significant bit of Data7 is set, this is also representable on x86-64.

QUESTION 4C. Write this function, which represents a signed char (that is, a signed 8-bit integer, with values between -128 and +127) in Ldata format.

// Store `i` in Ldata format, starting at `buf`.
// Returns the number of bytes stored.
size_t Ldata_store_signed_char(signed char i, unsigned char* buf) {
    ...
}

2pt The most natural representation simply stores the character in the buffer.

buf[0] = 1; buf[1] = i; return 2;

Remember that in C, char is an integer type.

QUESTION 4D. The Ldata_store_int_noc function represents a signed integer in Ldata format, but does so without actually compressing the integer.

// Store `i` in Ldata format, starting at `buf`.
// Returns the number of bytes stored.
size_t Ldata_store_int_noc(int i, unsigned char* buf) {
    buf[0] = sizeof(int);
    memcpy(&buf[1], &i, sizeof(int));
    return 1 + sizeof(int);
}

List all that apply.

  1. The function returns a value independent of i.
  2. On x86-64, the function returns 2 for any i.
  3. On x86-64, the function returns 5 for any i.
  4. On x86-64, the function returns 9 for any i.
  5. On a different (non-x86-64) architecture, the function might return 3 for some i.
  6. None of the above

1pt 1, 3, 5.

  1. The return value, 1 + sizeof(int), clearly does not depend on i’s value.
  2. On x86-64, sizeof(int) == 4, so the function returns 5.
  3. Ditto.
  4. Ditto.
  5. Yes, another architecture might have sizeof(int) == 2.

QUESTION 4E. Write lines of C++ code that call Ldata_store_int_noc and cause it to execute undefined behavior.

1pt This will cause the function to overwrite the one-character buffer:

char buf[1];
Ldata_store_int_noc(0, buf);

Another example is simply Ldata_store_int_noc(0, nullptr);.

QUESTION 4F. The following version of Ldata_store_int_noc is incorrect. Why? Refer to specific line numbers.

size_t bad_Ldata_store_int_noc(int i, unsigned char* buf) {
1:    buf[0] = sizeof(int);
2:    int* ibuf = (int*) &buf[1];
3:    *ibuf = i;
4:    return 1 + sizeof(int);
}

1pt This function violates the alignment constraint applying to int. On x86-64, int has alignment 4, so every integer pointer should have a value that’s a multiple of 4; but line 3 may store an integer at an improperly aligned location.

QUESTION 4G. The following function compresses better, but incorrectly encodes some inputs (the encoded value, when read back, represents a different integer than the input integer i). Give an example of an incorrectly encoded input and correct the function by modifying one or two lines of code.

size_t bad_Ldata_store_int(int i, unsigned char* buf) {
1:    if (i >= 0 && i < 256) {
2:        buf[0] = sizeof(unsigned char);
3:        buf[1] = i;
4:        return 2;
5:    } else {
6:        buf[0] = sizeof(int);
7:        memcpy(buf + 1, &i, sizeof(int));
8:        return 1 + sizeof(int);
9:    }
}

1pt 128 is incorrectly encoded: the function produces a representation of -128. Specifically, 128 is encoded as 0x01 0x80, so the sign bit of the most-significant byte in the representation is 1, indicating a negative number. To correct the function, modify line 1 to i >= 0 && i < 128 or, even better, i >= -128 && i < 128.

QUESTION 4H. This function aims to store an integer in the minimum number of bytes. Finish the function by saying what goes in each HOLE.

size_t Ldata_store(long i, unsigned char* buf) {
    size_t n = 0;
    while (i != 0 && i != -1) {
        ++n;
        buf[n] = i;
        /* HOLE 1 */
    }
    if (n == 0 || /* HOLE 2 */) {
        ++n;
        buf[n] = i;
    }
    buf[0] = n;
    return 1 + n;
}

2pt The line immediately before Hole 1 stores the least-significant 8 bits of i in buf[n]. Hole 1 then must shift i to the right so the next loop iteration deals with the next slice of the integer. Note that the right shift operator on signed integers in C++ shifts in copies of the sign bit, so, for example, -1 >> 8 == -1.

Hole 2’s goal is to prevent the problem highlighted in Question 4G. If the if statement ran only when n == 0, then some numbers would be encoded with the wrong sign: +128 would be encoded as 0x01 0x80, which means -128; and -256 would be encoded as 0x01 0x00, which means 0. The encoder must add another data byte to the encoding when the most-significant encoded bit disagrees with the sign bit. Thus:

HOLE1: i >>= 8;
HOLE2: (buf[n] & 0x80) != (i & 0x80)

Equivalent solutions for hole 2:

  • ((buf[n] ^ i) & 0x80)
  • ((signed char) buf[n] < 0) != (i < 0)
  • ((buf[n] & 0x80) != 0) != (i < 0)

QUESTION 4I. Write this function, which loads a long integer from Ldata representation, returning the resulting long. (If the input represents a value that overflows long, the function should return the least significant 64 bits of that value.)

// Read the Ldata integer representation starting at `*buf`.
// Return the long integer equaling that representation.
long Ldata_load(const unsigned char* buf) {
    ...
}

3pt We assigned an extra point to answers that came close to handling negative representations correctly.

long Ldata_load(const unsigned char* buf) {
    unsigned long val = 0;
    for (size_t i = 0; i < buf[0] && i < 8; ++i) {
        val += (unsigned long) buf[i + 1] << (i * 8);
    }
    // handle negative representations
    if (buf[0] < 8 && (buf[buf[0]] & 0x80) != 0) {
        val += -1L << (buf[0] * 8);
    }
    return val;
}

This alternate design uses the fact that x86-64 longs are stored in little endian representation:

long Ldata_load(const unsigned char* buf) {
    unsigned long val = 0;
    memcpy(&val, buf + 1, std::min(buf[0], sizeof(val)));
    // handle negative representations
    if (buf[0] < 8 && (buf[buf[0]] & 0x80) != 0) {
        val += -1L << (buf[0] * 8);
    }
    return val;
}

We did not assign full credit to answers that could execute undefined behavior, for instance by performing a left-shift operation with a shift amount of 64 or more.

QUESTION 4J. Design a change to the Ldata representation so that some common values can be represented using even fewer bytes.

1pt; +1 extra for creativity For example, let the representation 0x00 represent zero!

5. Debugging (10 points)

The questions in this part concern the following buggy implementation of a debugging memory allocator from pset 1.

struct metadata {
    size_t trailer_offset;
};
struct trailer {
    uintptr_t magic;
};
static size_t active_size = 0;

void* m61_malloc(size_t sz) {
M1:     size_t trailer_offset = sz;
M2:     // ensure trailer is aligned
M3:     while (trailer_offset % alignof(trailer) != 0) {
M4:         ++trailer_offset;
M5:     }
M6:     // allocate space
M7:     void* ptr = base_malloc(sizeof(metadata) + trailer_offset + sizeof(trailer));
M8:     if (ptr) {
M9:         // prepare metadata and trailer
M10:        metadata* m = (metadata*) ptr;
M11:        void* payload = m + 1; 
M12:        trailer* t = (trailer*) ((char*) payload + trailer_offset);
M13:        m->trailer_offset = trailer_offset;
M14:        trailer->magic = (uintptr_t) m;
M15:        active_size += trailer_offset;
M16:        return payload;
M17:    } else {
M18:        return nullptr;
M19:    }
}

void m61_free(void* ptr) {
F1:     metadata* m = ((metadata*) ptr) - 1;
F2:     trailer* t = (trailer*) ((char*) ptr + m->trailer_offset);
F3:     assert(t->magic == (uintptr_t) m);
F4:     active_size -= m->trailer_offset;
F5:     base_free(ptr);
}

QUESTION 5A. Write a call to m61_free that should never cause undefined behavior, but always causes undefined behavior in this implementation.

1pt m61_free(nullptr). It’s undefined to access memory at or near a null pointer.

QUESTION 5B. On line F5, m61_free passes a invalid pointer to base_free (a pointer that was not returned by base_malloc). Fix this line.

2pt It should read base_free(m). The value returned by base_malloc (on line M7) points at metadata; the equivalent pointer in m61_free is m.

QUESTION 5C. Which of these best describes the value and function of trailer::magic?

  1. The size of the allocation; used to track memory statistics
  2. The size of the allocation; used to detect invalid frees
  3. The address of the payload; used to track memory statistics
  4. The address of the payload; used to detect invalid frees
  5. The address of the base allocation; used to track memory statistics
  6. The address of the base allocation; used to detect invalid frees

1pt #6. The magic number is the address of the base allocation, seen on lines M14 and F3. It is used to detect invalid frees on line F3.

QUESTION 5D. Will m61_free detect invalid frees that are not double frees? If yes, explain briefly (1 or 2 sentences); if no, explain how to change m61_free to detect invalid frees (refer to specific line numbers).

1pt Yes; it’s unlikely that random memory will have a proper trailer magic value.

QUESTION 5E. Will m61_free detect double frees? If yes, explain briefly (1 or 2 sentences); if no, explain how to change m61_free to detect double frees (refer to specific line numbers).

2pt No, it will not, because m61_free doesn’t modify metadata to indicate that the allocation has been freed. To improve this, you could, for example, add t->magic = 0 before F4.

QUESTION 5F. Give an example of a boundary write error that a good debugging memory allocator would detect, but that this allocator would not detect. Your example should contain a call to m61_malloc and a call to m61_free.

1pt

char* ptr = (char*) m61_malloc(1);
ptr[1] = 0; // boundary write
m61_free(ptr);

This is not detected because the trailer metadata is carefully aligned to a multiple of 8 by lines M3-5, so the one-byte-off boundary write error above will miss the trailer metadata entirely.

QUESTION 5G. Does m61_malloc return properly-aligned memory? If yes, explain briefly (1 or 2 sentences); if no, explain how to change the code so it does (refer to specific line numbers).

1pt No, it does not; we assume base_malloc returns a 16-byte-aligned pointer, but we then add just 8 bytes to that address to compute the payload pointer, producing an address with (addr % 16) == 8. To fix it, you could add size_t padding to struct metadata, making it a multiple of 16 bytes big.

QUESTION 5H. Does m61_malloc correctly handle unsigned integer overflow? If yes, explain briefly (1 or 2 sentences); if no, explain how to change the code so it does (refer to specific line numbers).

1pt No, it does not. Before M7, add:

if (trailer_offset < sz
    || sizeof(metadata) + trailer_offset + sizeof(trailer) < trailer_offset) {
    return nullptr;
}

6. A minimum-overhead allocator (8 points)

In these questions, we consider an arena allocator (a special-purpose object that can allocate and free memory) whose goal is to minimize per-allocation overhead. You might use this allocator for small strings. The allocator works as follows:

  1. It requests memory from the system one large block at a time, using the aligned_alloc function.

    // aligned_alloc(align, sz)
    //    Return a pointer to `sz` bytes of memory aligned on a multiple of `align`.
    //    `align` must be a power of two, and `sz` must be a multiple of `align`.
    //    Returns `nullptr` on failure. The returned memory can be freed by passing
    //    it to `free`.
    void* aligned_alloc(size_t align, size_t sz);
    

    It then parcels out the memory in the block according to user requests.

  2. Each block starts with 4 bytes of metadata: the number of active allocations stored in that block. The rest of the block contains allocations, crammed up next to each other, with no per-allocation metadata. For example, after this code runs:

    minoverhead_arena moa;
    char* p = moa.allocate(3);
    strcpy(p, "Hi");
    
    p = moa.allocate(4);
    int i = 4;
    memcpy(p, &i, 4);
    
    for (int i = 'A'; i < 'K'; ++i) {
        p = moa.allocate(1);
        p[0] = i;
    }
    

    there will be one block in the system laid out as follows:

    address is a multiple of 8192
    |
    v
    +---+---+---+---+---+---+---+---+---+---+---+---+---+
    |      12       |'H'|'i'|0x0|0x4|0x0|0x0|0x0|'A'|'B'| =>>
    +---+---+---+---+---+---+---+---+---+---+---+---+---+
    \___ nalloc ____/
    
          +---+---+---+---+---+---+---+---+--------- ... ---------+
      =>> |'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|         .....         |
          +---+---+---+---+---+---+---+---+--------- ... ---------+
                                           \__ 8171 free bytes __/
    
  3. A block of memory is freed only after all allocations in it have been passed to minoverhead_arena::deallocate.

Allocators like this are sometimes used when it’s more important to minimize per-allocation overhead than to free underlying memory quickly.

Here’s code for the allocator. The deallocate function is the subject of some of our questions. As you work through the questions, you should not need to change the definition of block or add new members to minoverhead_arena.

struct minoverhead_arena {
    struct block {
        unsigned nalloc;
        char buf[8188];
    };

    block* current_block = nullptr;
    size_t pos = 0;

    // Return an allocation of `sz` bytes or `nullptr` on failure.
    // This allocator is specialized for byte arrays, so the returned
    // pointer may have any alignment.
    char* allocate(size_t sz) {
A1:     assert(sizeof(block) == 8192);

A2:     if (this->current_block == nullptr
A3:         || this->pos + sz > sizeof(block::buf)) {
A4:         this->current_block = aligned_alloc(sizeof(block), sizeof(block));
A5:         assert(this->current_block != nullptr);
A6:         this->current_block->nalloc = 0;
A7:         this->pos = 0;
A8:     }

A9:     char* result = &this->current_block->buf[this->pos];
A10:    this->pos += sz;
A11:    ++this->current_block->nalloc;
A12:    return result;
    }

    // Free an allocation returned by `allocate()`.
    void deallocate(char* ptr) {
D1:     block* b;
D2:     /* HOLE 1: set `b` to equal the `block` containing `ptr` */
D3:     --b->nalloc;
D4:     if (b->nalloc == 0) {
D5:         /* HOLE 2: free `b`, which is now empty */
D6:     }
    }

    ~minoverhead_arena() {
        // Destructor verifies that all allocations so far have been freed.
        assert(this->current_block == nullptr);
    }
};

QUESTION 6A. What is the largest sz that is safe to pass to minoverhead_arena::allocate?

2pt 8188—the number of bytes in block::buf. If a larger sz is passed, the allocate function returns a pointer to 8188 contiguous free bytes of memory, which is deeply unsafe (because the user will not be able to access all the requested sz bytes).

QUESTION 6B. Fix minoverhead_arena::allocate so any sz is safe to pass. Refer to specific lines of code.

2pt Before A2, add if (sz > 8188) { return nullptr; }.

QUESTION 6C. What code should go in HOLE 1 in deallocate?

2pt The allocator works by storing metadata at the beginning of a block of data. Since blocks are aligned, we can recover the block’s address by masking out the lower 13 bits.

b = (block*) ((uintptr_t) ptr & ~8191UL);
/* or */
b = (block*) ((uintptr_t) ptr - (uintptr_t) ptr % sizeof(block));

QUESTION 6D. What code should go in HOLE 2 in deallocate?

2pt

free(b);
// might have just freed the currently-filling block!
if (this->current_block == b) {
    this->current_block = nullptr;
}

Or, alternately:

if (this->current_block == b) {
    this->pos = 0;
} else {
    free(b);
}

7. Things to know about assembly (8 points)

QUESTION 7A. How many registers may be used to pass function parameters in x86-64 and what are they?

1pt Six: %rdi, %rsi, %rdx, %rcx, %r8, %r9

QUESTION 7B. How many registers may be used to pass return values in x86-64?

1pt; %rax gets full credit Recall that 16-byte return values use two registers, %rax for the least-significant bits and %rdx for the most-significant bits. But %rax is the obvious answer.

QUESTION 7C. Are any of the registers used for function parameters or return values callee-saved?

2pt No—any function call can modify those registers, so they must not be callee-saved.

QUESTION 7D. Which of these best describes the purpose of the red zone?

  1. To store function arguments before calling a function
  2. To store local variables in non-leaf functions
  3. To store local variables in leaf functions
  4. To save caller-saved registers before calling another function
  5. To save callee-saved registers before calling another function
  6. To confuse students

1pt #3. The red zone is a bit of stack space (128 bytes), located at smaller addresses than the current %rsp register, that may be used to store local variables. It doesn’t make sense to use the red zone for caller-saved registers when calling another function (#4) because the called function might clobber that portion of stack.

QUESTION 7E. List all the true statements.

  1. Stacks grow up.
  2. The ret instruction modifies the stack pointer.
  3. The mov instruction may modify the stack pointer.
  4. None of the above.

2pt 2, 3. ret always modifies the stack pointer; mov can move values into the %rsp register.

QUESTION 7F. Describe one way the calling convention for system calls on x86-64 Linux (or WeensyOS) differs from the normal C++ calling convention.

1pt For example, %rax holds the system call number.

8. Library assembly (12 points)

The following five assembly functions implement well-known functions from the C++ library.

f1:
    endbr64
    movq    %rdi, %rax
    testq   %rdx, %rdx
    je  .L2
    movl    $0, %ecx
.L3:
    movzbl  (%rsi,%rcx), %r8d
    movb    %r8b, (%rax,%rcx)
    addq    $1, %rcx
    cmpq    %rdx, %rcx
    jne .L3
.L2:
    ret
f2:
    endbr64
    movzbl  (%rdi), %eax
    testb   %al, %al
    je  .L71
.L73:
    cmpb    %sil, %al
    je  .L74
    addq    $1, %rdi
    movzbl  (%rdi), %eax
    testb   %al, %al
    jne .L73
.L71:
    testb   %sil, %sil
    movl    $0, %eax
    cmove   %rdi, %rax
    ret
.L74:
    movq    %rdi, %rax
    ret
f3:
    endbr64
    testq   %rdx, %rdx
    je  .L24
    movl    $0, %eax
.L23:
    movzbl  (%rdi,%rax), %r8d
    movzbl  (%rsi,%rax), %ecx
    cmpb    %cl, %r8b
    jne .L26
    addq    $1, %rax
    cmpq    %rax, %rdx
    jne .L23
    movl    $0, %eax
    ret
.L26:
    movl    $1, %eax
    cmovbl  $-1, %eax
    ret
.L24:
    movl    $0, %eax
    ret
f4:
    endbr64
    movq    %rdi, %rax
    testq   %rdx, %rdx
    je  .L17
    leaq    (%rdi,%rdx), %rcx
    movq    %rdi, %rdx
.L18:
    movb    %sil, (%rdx)
    addq    $1, %rdx
    cmpq    %rdx, %rcx
    jne .L18
.L17:
    ret
f5:
    endbr64
    movq    %rdi, %rax
    movl    $0, %edx
.L43:
    movzbl  (%rsi,%rdx), %ecx
    movb    %cl, (%rax,%rdx)
    addq    $1, %rdx
    testb   %cl, %cl
    jne .L43
    ret

Remember to refer to the assembly notes if you are confused by an instruction (such as cmovbl). You may (or may not) find it faster to skip to the last question and then return to the others.

QUESTION 8A. What is the most likely number of parameters for each function?

2pt; +2 for 4-5 correct, +1 for 1-3 correct f1: 3, f2: 2, f3: 3, f4: 3, f5: 2

QUESTION 8B. Which of these functions take pointer arguments? List all that apply or say “none”.

2pt All take pointer arguments.

QUESTION 8C. Which of these functions take more than one pointer argument? List all that apply or say “none”.

1pt f1, f3, f5 take two pointer arguments.

QUESTION 8D. Which of these functions might modify primary memory? List all that apply or say “none”.

1pt f1, f4, f5 modify primary memory; we can see this by looking at lines like f5’s movb %cl, (%rax,%rdx), which moves data to primary memory.

QUESTION 8E. Which of these functions make control flow decisions (conditional branches) based on values read from primary memory? List all that apply or say “none”.

1pt f2, f3, f5. f1 and f4 also contain conditional branches, but the flags they test are set based only on values read from function argument registers.

QUESTION 8F. Which of these functions use a general-purpose callee-saved register? List all that apply or say “none”. (Note that %rsp and %rip do not count.)

1pt None do.

QUESTION 8G. Which of these functions use the red zone? List all that apply or say “none”.

1pt None do.

QUESTION 8H. Each of these functions implements a C++ library function specification—specifically, for one of the functions implemented in WeensyOS. What library function corresponds to each assembly function?

3pt; +3 for 4-5 correct, +2 for 2-3 correct, +1 for 1 correct The functions are:

  1. f1 is memcpy (void* memcpy(void* dst, const void* src, size_t n)).
  2. f2 is strchr (char* strchr(const char* s, int ch)).
  3. f3 is memcmp (int memcmp(const void* s1, const void* s2, size_t n)).
  4. f4 is memset (void* memset(void* s, int ch, size_t n)).
  5. f5 is strcpy (char* strcpy(char* dst, const char* src)).

9. Disassemble (12 points)

For each of these questions, your answer should be a C++ function, including a type signature and definitions of any associated structs or types, that could compile to the given assembly.

QUESTION 9A.

_Z2g1i:
        endbr64
        movl    %edi, %eax
        shrl    $31, %eax
        ret

3pts This function shifts out all bits of its single int-sized argument (%edi), except the sign bit. The shrl instruction is a logical shift, meaning it shifts in 0 bits, so %eax will be either 0 or 1.

bool g1(int i) {
    return i < 0;
}

QUESTION 9B.

_Z2g2Pmm:
        endbr64
        movq    %rsi, %rax
        salq    $4, %rax
        movq    (%rdi,%rax), %rax
        subq    (%rdi,%rsi,8), %rax
        ret

3pts This function computes 16*%rsi and uses that as an offset into an array of 8-byte values starting at %rdi. It then subtracts from that the 8-byte value at %rdi + 8*%rsi, and returns the result. Since sizeof(long) == 8, the value stored at %rdi + 16*%rsi could be an element in an array of unsigned longs, specifically the element at index 2*%rsi.

unsigned long g2(unsigned long* a, unsigned long i) {
    return a[2 * i] - a[i];
}

(The generated assembly would be the same for long arguments; the mangled function name, _Z2g2Pmm, indicates unsigned longs. We accepted any array of 8-byte elements.)

QUESTION 9C.

_Z2g3P2ll:
        endbr64
        xorl    %eax, %eax
        testq   %rdi, %rdi
        je      .L7
.L6:
        movq    (%rdi), %rdi
        addq    $1, %rax
        testq   %rdi, %rdi
        jne     .L6
        ret
.L7:
        ret

3pts This function returns the length of a linked list.

struct ll {
    ll* next;
};
size_t g3(ll* l) {
    size_t n;
    for (n = 0; l != nullptr; ++n, l = l->next) {
    }
    return n;
}

QUESTION 9D.

_Z2g4P4statj:
        endbr64
        movl    %esi, %eax
        imull   %esi, %esi
        addq    $1, (%rdi)
        addq    %rax, 8(%rdi)
        addq    %rsi, 16(%rdi)
        ret

3pts This function appears to maintain a set of statistics by updating a structure (passed by pointer, in the first argument) according to a sample (an integer passed as the second argument).

struct stat {
    unsigned long n;
    unsigned long sum;
    unsigned long sum_sq;
};
void g4(stat* s, unsigned i) {
    s->n += 1;
    s->sum += i;
    s->sum_sq += i * i;
}

10. Miscellany (7 points)

QUESTION 10A. Given an object of pointer type T* ptr, what is the best reason to be suspicious of an expression like ptr + N * sizeof(T)?

  1. Array elements should be located using array syntax, such as &ptr[N], rather than pointer arithmetic.
  2. It probably should read ptr + N * T.
  3. Pointers and arrays in C and C++ are unrelated.
  4. Pointer arithmetic automatically includes a factor related to the size of the element type, so it probably should read ptr + N.
  5. None of the above.

2pt #4 is correct.

QUESTION 10B. Put the following values in increasing order. An answer to this question will have the form “B < A < D < C < E < F”.

  1. Zero
  2. The smallest signed char
  3. The address in the stack of the current function’s return address
  4. The address of a local variable in main
  5. The current %rip register
  6. The address of a dynamically-allocated block of memory

1pt B < A < E < F < C < D. The values are:

  1. 0
  2. The smallest signed char has value -128.
  3. The address of the return address: The return address is stored inbetween the current stack frame and the caller’s stack frame, and therefore its address is larger than the addresses of the current function’s local variables, and smaller than the addresses of any callers’ local variables.
  4. See above.
  5. Valeus in %rip are code segment addresses, which are below stack and heap segment addresses.
  6. A heap segment address.

QUESTION 10C. You are working with a program that frequently accesses arrays of color objects. You have two choices for the color definition, namely:

struct color1 {
    unsigned short red, green, blue;
};

struct alignas(8) color2 {
    unsigned short red, green, blue;
};

What are the size and alignment of color1 and color2?

1pt color1 has size 6, alignment 2; color2 has size 8, alignment 8.

QUESTION 10D. Which statements about color1 and color2 are true? List all that apply.

  1. Using color2 might generate more compact instruction sequences, because its size is compatible with x86-64’s indirect addressing modes.
  2. Using color1 will require use of an expensive multiply instruction, because its size is incompatible with x86-64’s indirect addressing modes.
  3. Using color1 might generate faster runtimes, because it uses 25% less memory.
  4. Using color2 might generate slower runtimes, because it takes time to align data.
  5. None of the above.

2pt #1, #3; but no points off for #1, #2, #3.

#1 is true: color2 can use the indirect addressing mode with scale 8, but the sizeof color1, 6, is not a valid scale.

Nevertheless, #2 is false because, as we’ve seen in some compiler-generated assembly, some array arithmetic involving oddly-sized elements is performed using combinations of lea and shl. For instance, a function like this:

color1* f(color1* a, unsigned idx) {
    return &a[idx];
}

compiles to:

leaq (%rsi,%rsi,2), %rax    ; %rax := %rsi + %rsi * 2 == 3 * %rsi
leaq (%rdi,%rax,2), %rax    ; %rax := %rdi + %rax * 2 == %rdi + 6 * %rsi
ret

#3 is true.

#4 is not really true; the compiler solves alignment issues before the program starts running.

QUESTION 10E. Which statements about process isolation are true? List all that apply.

  1. Processes can only communicate with each other using shared memory regions.
  2. Processes cannot communicate with each other at all.
  3. System call wrappers cannot be written entirely within the C++ abstract machine, because the mechanisms for communicating with an operating system are operating system specific.
  4. A CPU executing instructions on behalf of a process will continue doing so until the process voluntarily gives up control via a system call.
  5. A CPU can support as many processes as it has sets of registers.
  6. None of the above.

1pt Only #3. #1 is not true because processes can communicate in other ways, for example using files. #2 is not true because processes can communicate. #3 is true. #4 is not true because of timer interrupts. #5 is not true because it can support more processes than it has sets of registers: the OS kernel can save one process’s registers to memory while executing another process.

11. The end (5 points)

QUESTION 11A. What improvements would you suggest to the class? Any answer but no answer will receive full credit.

5pts for A-B

QUESTION 11B. What topics do you wish we had spent more time on? Any answer but no answer will receive full credit.