Midterm – CS 61 2021

Show all solutions Hide all solutions

Rules

The exam is open book, open note, open computer. You may access the book and your own notes. You may also use computers or electronic devices to access your own class materials and public class materials. Specifically:

You may access a browser and a PDF reader.
You may access your own notes and problem set code electronically.
You may access Internet sites on which your own notes and problem set code are stored.
You may access the course site.
You may access pages directly linked from the course site, including our lecture notes, exercises, and section notes.
You may access our lecture, problem set, and section code.
You may run a C or C++ compiler, including an assembler and linker.
You may access manual pages and common utilities, including calculators and a Python interpreter.

However:

You may not access class discussion forums or other question forums such as Stack Overflow.
You may not access an on-line disassembler or compiler explorer.
You may not access solutions from any previous exam, by paper or computer, except for those on the course site.
You may not broadly search the Internet for answers to questions or access other courses’ materials. Stick to our course site.
You absolutely may not contact other humans with questions about the exam—whether in person, via IM, or by posting questions to forums—with the exception of course staff.

Any violations of this policy are breaches of academic honesty and will be treated accordingly. Please appreciate our flexibility and behave honestly and honorably.

Additionally, students are taking the exam at different times. Do not post publicly about the exam until given permission to do so by course staff. This includes general comments about exam difficulty.

Completing your exam

You have 3 hours to complete the exam starting from the time you press Start. Different students may have different exams. Enter your answers directly in this form; a green checkmark appears when an answer is saved.

Students with extra time accommodations: If your extra time is not reflected next to the start button, contact course staff and wait for confirmation before pressing it.

You should not need to attach a drawing or other longer notes, but you may do so by adding them to a midterm subdirectory in your cs61-f21-psets repository. Make sure you push your changes to GitHub before time is up, and explain in this form where we should look at your notes.

Notes

Assume a Linux operating system running on the x86-64 architecture unless otherwise stated. If you get stuck, move on to the next question. If you’re confused, explain your thinking briefly for potential partial credit.

1. Sizes and alignments (10 points)

QUESTION 1A. What are the size and alignment of x?

const char* x = "Hello";

QUESTION 1B. What are the size and alignment of x?

const char x[6] = "Hello";

QUESTION 1C. What are the size and alignment of x?

unsigned x = 0xFEED;

QUESTION 1D. What are the size and alignment of x?

struct {
    int a;
    char b;
} x;

QUESTION 1E. In this question, we model the C++ struct layout rules in code.

struct struct_layout {
    size_t off = 0;           // current byte offset in struct
    size_t align = 0;         // alignment of struct, considering members seen so far
};

// add_member(sl, sz, align)
//    Model adding a member with size `sz` and alignment `align` to the struct.
//    Returns the offset of the added member and updates `*sl`.
size_t add_member(struct_layout* sl, size_t sz, size_t align) {
    if (sl->off % align != 0) {
        /* HOLE 1 */
    }
    size_t result = sl->off;
    /* HOLE 2 */
    return result;
}

Here is an example of how this function should work. In this example, we make calls to add_member that model struct x:

struct x {
    int a;
    char b[7];
    long c;
};

struct_layout sl;
size_t a_offset = add_member(&sl, sizeof(x::a), alignof(x::a)); // == add_member(&sl, 4, 4)
assert(a_offset == 0);
size_t b_offset = add_member(&sl, sizeof(x::b), alignof(x::b)); // == add_member(&sl, 7, 1)
assert(b_offset == 4);
size_t c_offset = add_member(&sl, sizeof(x::c), alignof(x::c)); // == add_member(&sl, 8, 8)
assert(c_offset == 16);
size_t struct_size = add_member(&sl, 0, sl.align); // padding at end of struct
assert(struct_size == 24);
assert(sl.align == 8);

Complete the function by saying what goes in each HOLE.

3pt The first hole must bring the current offset up to a multiple of the new element’s alignment, align. The second hole must update sl->off to account for the new element’s size, and update the structure’s overall alignment.
HOLE1:
   while (sl->off % align != 0) {
       ++sl->off;
   }
   /* OR */
   sl->off += align - sl->off % align;
   /* OR */
   sl->off = (sl->off / align + 1) * align;
HOLE2:
   sl->off += sz;
   if (sl->align < align) {
       sl->align = align;
   }
Note that hole 2 must not add padding. For instance, in a struct like int a; char b; char c;, the c member appears immediately after b, with no gap; if sl->off were prematurely updated to account for end-of-struct padding, there would be a gap.

QUESTION 1F. Assuming that add_member’s arguments must always represent valid members of x86-64 structs (or the padding at the end of a struct), which of these assertions must hold for those arguments? List all that apply.

assert(sl != nullptr)
assert(align != 0)
assert(size % align == 0)
assert(size >= align)
assert((align & (align - 1)) == 0)
assert(size / align > 0)
None of the above

2. Tell me without telling me, integer edition (10 points)

The answer to each of these questions is one or more C++ assert statements that meet the problem’s constraints. No assert statement may execute undefined behavior. Variables with names like iN are signed integers and variables with names like uN are unsigned integers.

SAMPLE QUESTION 1. Tell me that u0 is zero without using ==.
ANSWER. assert(!u0);

SAMPLE QUESTION 2. Tell me that u1 == 2 without using any integer constants.
ANSWER. assert(u1 == sizeof(short));

QUESTION 2A. Tell me that i1 is odd (i.e., not a multiple of two) without using / or %.

QUESTION 2B. Tell me that u1 == 4 * u2 without using *.

QUESTION 2C. Tell me that u1 == ~u2 + 1 without using ~ or +.

QUESTION 2D. Tell me that i1 != u1 without using any comparison operator (==, !=, <, >, <=, >=).

QUESTION 2E. Tell me that (u1 + u2) != (u1 | u2) without using | or +.

QUESTION 2F. Tell me that the u1th bit of u2 is nonzero without using *. (Bits are numbered starting from 0, with 0 the least significant bit.)

QUESTION 2G. Tell me that u1 would fit in an unsigned short variable without overflow without using a typecast operator.

3. Things to know about undefined behavior (8 points)

QUESTION 3A. True or false? If a process exits with a segmentation fault, division-by-zero fault, or other related fault, then it has executed undefined behavior.

QUESTION 3B. True or false? If a process executes undefined behavior, then it must exit with a segmentation fault, division-by-zero fault, or other related fault.

QUESTION 3C. True or false? The result of any integer comparison (via any of <, >, ==, !=, <=, and >=) is always well defined.

QUESTION 3D. True or false? Compiler optimizers can make use of reasoning about undefined behavior to remove redundant code.

QUESTION 3E. True or false? Sanitizers work by putting the CPU (the processor) into a mode where undefined behavior triggers an exception and passes control to the kernel.

QUESTION 3F. Given int i, write two different expressions E1 and E2 so that neither i + E1 nor i + E2 can cause undefined behavior for any value of i.

QUESTION 3G. Assume you have a pointer T* p and an integer long i. True or false? If p + i causes undefined behavior, then (long) p + i will cause undefined behavior also.

4. Compressed integer representation (15 points)

Compression—reducing the space required to store information—can reduce cost and improve transmission speed. In this problem, you’ll consider a compressed data representation for integers. A processor would not use this representation natively, but a database might use this representation to compress records before storing them to stable storage or sending them over the network (two contexts where reducing space is important).

We call the method Ldata. It stores a signed integer of arbitrary size as a length byte followed by data bytes:

           ___________ `L` bytes total ___________
          /                                       \
+---------+---------+---------+---...---+---------+
|    L    |  Data0  |  Data1  |         |Data(L-1)|
+---------+---------+---------+---...---+---------+
     ^         ^         ^                   ^
     |         |         |                   |
     |         |         |    Most significant byte of integer (bits ((L-1)*8)–(L*8-1))
     |         |         |
     |         |         +--- Second least significant byte of integer (bits 8-15)
     |         +------------- Least significant byte of integer (bits 0-7)
     +----------------------- Number of data bytes stored; must be at least 1

The most significant bit of byte Data(L-1) is the sign bit: when Data(L-1) (considered as unsigned) is ≥128, the represented integer is negative.

Here are some examples.

0x01 0x00: A representation of zero. The first byte, L=0x01, says the integer has one data byte. The second byte, 0x00, is the representation of zero.
0x02 0x00 0x00: Another representation of zero.
0x04 0x00 0x00 0x00 0x00: Another representation of zero :)
0x01 0x01: A representation of 1.
0x04 0xAA 0xBB 0xCC 0x11: A representation of 0x11CCBBAA.
0x01 0xFF: A representation of -1. (The represented value is negative because the sign bit of 0xFF is on.)
0x02 0xFF 0x00: A representation of +255.

Several students assumed that the data was encoded as hexadecimal numbers represented in ASCII. I’m not sure how this misunderstanding arose.

QUESTION 4A. Does Ldata format use little-endian or big-endian representation for integers?

QUESTION 4B. Give an example of an Ldata representation (a series of bytes) of an integer that C++’s built-in integer types cannot represent on x86-64.

QUESTION 4C. Write this function, which represents a signed char (that is, a signed 8-bit integer, with values between -128 and +127) in Ldata format.

// Store `i` in Ldata format, starting at `buf`.
// Returns the number of bytes stored.
size_t Ldata_store_signed_char(signed char i, unsigned char* buf) {
    ...
}

QUESTION 4D. The Ldata_store_int_noc function represents a signed integer in Ldata format, but does so without actually compressing the integer.

// Store `i` in Ldata format, starting at `buf`.
// Returns the number of bytes stored.
size_t Ldata_store_int_noc(int i, unsigned char* buf) {
    buf[0] = sizeof(int);
    memcpy(&buf[1], &i, sizeof(int));
    return 1 + sizeof(int);
}

List all that apply.

The function returns a value independent of i.
On x86-64, the function returns 2 for any i.
On x86-64, the function returns 5 for any i.
On x86-64, the function returns 9 for any i.
On a different (non-x86-64) architecture, the function might return 3 for some i.
None of the above

QUESTION 4E. Write lines of C++ code that call Ldata_store_int_noc and cause it to execute undefined behavior.

QUESTION 4F. The following version of Ldata_store_int_noc is incorrect. Why? Refer to specific line numbers.

size_t bad_Ldata_store_int_noc(int i, unsigned char* buf) {
1:    buf[0] = sizeof(int);
2:    int* ibuf = (int*) &buf[1];
3:    *ibuf = i;
4:    return 1 + sizeof(int);
}

QUESTION 4G. The following function compresses better, but incorrectly encodes some inputs (the encoded value, when read back, represents a different integer than the input integer i). Give an example of an incorrectly encoded input and correct the function by modifying one or two lines of code.

size_t bad_Ldata_store_int(int i, unsigned char* buf) {
1:    if (i >= 0 && i < 256) {
2:        buf[0] = sizeof(unsigned char);
3:        buf[1] = i;
4:        return 2;
5:    } else {
6:        buf[0] = sizeof(int);
7:        memcpy(buf + 1, &i, sizeof(int));
8:        return 1 + sizeof(int);
9:    }
}

QUESTION 4H. This function aims to store an integer in the minimum number of bytes. Finish the function by saying what goes in each HOLE.

size_t Ldata_store(long i, unsigned char* buf) {
    size_t n = 0;
    while (i != 0 && i != -1) {
        ++n;
        buf[n] = i;
        /* HOLE 1 */
    }
    if (n == 0 || /* HOLE 2 */) {
        ++n;
        buf[n] = i;
    }
    buf[0] = n;
    return 1 + n;
}

2pt The line immediately before Hole 1 stores the least-significant 8 bits of i in buf[n]. Hole 1 then must shift i to the right so the next loop iteration deals with the next slice of the integer. Note that the right shift operator on signed integers in C++ shifts in copies of the sign bit, so, for example, -1 >> 8 == -1.

Hole 2’s goal is to prevent the problem highlighted in Question 4G. If the if statement ran only when n == 0, then some numbers would be encoded with the wrong sign: +128 would be encoded as 0x01 0x80, which means -128; and -256 would be encoded as 0x01 0x00, which means 0. The encoder must add another data byte to the encoding when the most-significant encoded bit disagrees with the sign bit. Thus:
HOLE1: i >>= 8;
HOLE2: (buf[n] & 0x80) != (i & 0x80)
Equivalent solutions for hole 2:

((buf[n] ^ i) & 0x80)

((signed char) buf[n] < 0) != (i < 0)

((buf[n] & 0x80) != 0) != (i < 0)

QUESTION 4I. Write this function, which loads a long integer from Ldata representation, returning the resulting long. (If the input represents a value that overflows long, the function should return the least significant 64 bits of that value.)

// Read the Ldata integer representation starting at `*buf`.
// Return the long integer equaling that representation.
long Ldata_load(const unsigned char* buf) {
    ...
}

3pt We assigned an extra point to answers that came close to handling negative representations correctly.

long Ldata_load(const unsigned char* buf) {
    unsigned long val = 0;
    for (size_t i = 0; i < buf[0] && i < 8; ++i) {
        val += (unsigned long) buf[i + 1] << (i * 8);
    }
    // handle negative representations
    if (buf[0] < 8 && (buf[buf[0]] & 0x80) != 0) {
        val += -1L << (buf[0] * 8);
    }
    return val;
}

This alternate design uses the fact that x86-64 longs are stored in little endian representation:

long Ldata_load(const unsigned char* buf) {
    unsigned long val = 0;
    memcpy(&val, buf + 1, std::min(buf[0], sizeof(val)));
    // handle negative representations
    if (buf[0] < 8 && (buf[buf[0]] & 0x80) != 0) {
        val += -1L << (buf[0] * 8);
    }
    return val;
}

We did not assign full credit to answers that could execute undefined behavior, for instance by performing a left-shift operation with a shift amount of 64 or more.

QUESTION 4J. Design a change to the Ldata representation so that some common values can be represented using even fewer bytes.

5. Debugging (10 points)

The questions in this part concern the following buggy implementation of a debugging memory allocator from pset 1.

struct metadata {
    size_t trailer_offset;
};
struct trailer {
    uintptr_t magic;
};
static size_t active_size = 0;

void* m61_malloc(size_t sz) {
M1:     size_t trailer_offset = sz;
M2:     // ensure trailer is aligned
M3:     while (trailer_offset % alignof(trailer) != 0) {
M4:         ++trailer_offset;
M5:     }
M6:     // allocate space
M7:     void* ptr = base_malloc(sizeof(metadata) + trailer_offset + sizeof(trailer));
M8:     if (ptr) {
M9:         // prepare metadata and trailer
M10:        metadata* m = (metadata*) ptr;
M11:        void* payload = m + 1; 
M12:        trailer* t = (trailer*) ((char*) payload + trailer_offset);
M13:        m->trailer_offset = trailer_offset;
M14:        trailer->magic = (uintptr_t) m;
M15:        active_size += trailer_offset;
M16:        return payload;
M17:    } else {
M18:        return nullptr;
M19:    }
}

void m61_free(void* ptr) {
F1:     metadata* m = ((metadata*) ptr) - 1;
F2:     trailer* t = (trailer*) ((char*) ptr + m->trailer_offset);
F3:     assert(t->magic == (uintptr_t) m);
F4:     active_size -= m->trailer_offset;
F5:     base_free(ptr);
}

QUESTION 5A. Write a call to m61_free that should never cause undefined behavior, but always causes undefined behavior in this implementation.

QUESTION 5B. On line F5, m61_free passes a invalid pointer to base_free (a pointer that was not returned by base_malloc). Fix this line.

QUESTION 5C. Which of these best describes the value and function of trailer::magic?

The size of the allocation; used to track memory statistics
The size of the allocation; used to detect invalid frees
The address of the payload; used to track memory statistics
The address of the payload; used to detect invalid frees
The address of the base allocation; used to track memory statistics
The address of the base allocation; used to detect invalid frees

QUESTION 5D. Will m61_free detect invalid frees that are not double frees? If yes, explain briefly (1 or 2 sentences); if no, explain how to change m61_free to detect invalid frees (refer to specific line numbers).

QUESTION 5E. Will m61_free detect double frees? If yes, explain briefly (1 or 2 sentences); if no, explain how to change m61_free to detect double frees (refer to specific line numbers).

QUESTION 5F. Give an example of a boundary write error that a good debugging memory allocator would detect, but that this allocator would not detect. Your example should contain a call to m61_malloc and a call to m61_free.

1pt
char* ptr = (char*) m61_malloc(1);
ptr[1] = 0; // boundary write
m61_free(ptr);
This is not detected because the trailer metadata is carefully aligned to a multiple of 8 by lines M3-5, so the one-byte-off boundary write error above will miss the trailer metadata entirely.

QUESTION 5G. Does m61_malloc return properly-aligned memory? If yes, explain briefly (1 or 2 sentences); if no, explain how to change the code so it does (refer to specific line numbers).

QUESTION 5H. Does m61_malloc correctly handle unsigned integer overflow? If yes, explain briefly (1 or 2 sentences); if no, explain how to change the code so it does (refer to specific line numbers).

6. A minimum-overhead allocator (8 points)

In these questions, we consider an arena allocator (a special-purpose object that can allocate and free memory) whose goal is to minimize per-allocation overhead. You might use this allocator for small strings. The allocator works as follows:

It requests memory from the system one large block at a time, using the aligned_alloc function.

// aligned_alloc(align, sz)
//    Return a pointer to `sz` bytes of memory aligned on a multiple of `align`.
//    `align` must be a power of two, and `sz` must be a multiple of `align`.
//    Returns `nullptr` on failure. The returned memory can be freed by passing
//    it to `free`.
void* aligned_alloc(size_t align, size_t sz);

It then parcels out the memory in the block according to user requests.

Each block starts with 4 bytes of metadata: the number of active allocations stored in that block. The rest of the block contains allocations, crammed up next to each other, with no per-allocation metadata. For example, after this code runs:

minoverhead_arena moa;
char* p = moa.allocate(3);
strcpy(p, "Hi");

p = moa.allocate(4);
int i = 4;
memcpy(p, &i, 4);

for (int i = 'A'; i < 'K'; ++i) {
    p = moa.allocate(1);
    p[0] = i;
}

there will be one block in the system laid out as follows:

address is a multiple of 8192
|
v
+---+---+---+---+---+---+---+---+---+---+---+---+---+
|      12       |'H'|'i'|0x0|0x4|0x0|0x0|0x0|'A'|'B'| =>>
+---+---+---+---+---+---+---+---+---+---+---+---+---+
\___ nalloc ____/

      +---+---+---+---+---+---+---+---+--------- ... ---------+
  =>> |'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|         .....         |
      +---+---+---+---+---+---+---+---+--------- ... ---------+
                                       \__ 8171 free bytes __/

A block of memory is freed only after all allocations in it have been passed to minoverhead_arena::deallocate.

Allocators like this are sometimes used when it’s more important to minimize per-allocation overhead than to free underlying memory quickly.

Here’s code for the allocator. The deallocate function is the subject of some of our questions. As you work through the questions, you should not need to change the definition of block or add new members to minoverhead_arena.

struct minoverhead_arena {
    struct block {
        unsigned nalloc;
        char buf[8188];
    };

    block* current_block = nullptr;
    size_t pos = 0;

    // Return an allocation of `sz` bytes or `nullptr` on failure.
    // This allocator is specialized for byte arrays, so the returned
    // pointer may have any alignment.
    char* allocate(size_t sz) {
A1:     assert(sizeof(block) == 8192);

A2:     if (this->current_block == nullptr
A3:         || this->pos + sz > sizeof(block::buf)) {
A4:         this->current_block = aligned_alloc(sizeof(block), sizeof(block));
A5:         assert(this->current_block != nullptr);
A6:         this->current_block->nalloc = 0;
A7:         this->pos = 0;
A8:     }

A9:     char* result = &this->current_block->buf[this->pos];
A10:    this->pos += sz;
A11:    ++this->current_block->nalloc;
A12:    return result;
    }

    // Free an allocation returned by `allocate()`.
    void deallocate(char* ptr) {
D1:     block* b;
D2:     /* HOLE 1: set `b` to equal the `block` containing `ptr` */
D3:     --b->nalloc;
D4:     if (b->nalloc == 0) {
D5:         /* HOLE 2: free `b`, which is now empty */
D6:     }
    }

    ~minoverhead_arena() {
        // Destructor verifies that all allocations so far have been freed.
        assert(this->current_block == nullptr);
    }
};

QUESTION 6A. What is the largest sz that is safe to pass to minoverhead_arena::allocate?

QUESTION 6B. Fix minoverhead_arena::allocate so any sz is safe to pass. Refer to specific lines of code.

QUESTION 6C. What code should go in HOLE 1 in deallocate?

2pt The allocator works by storing metadata at the beginning of a block of data. Since blocks are aligned, we can recover the block’s address by masking out the lower 13 bits.
b = (block*) ((uintptr_t) ptr & ~8191UL);
/* or */
b = (block*) ((uintptr_t) ptr - (uintptr_t) ptr % sizeof(block));

QUESTION 6D. What code should go in HOLE 2 in deallocate?

2pt

free(b);
// might have just freed the currently-filling block!
if (this->current_block == b) {
    this->current_block = nullptr;
}

Or, alternately:

if (this->current_block == b) {
    this->pos = 0;
} else {
    free(b);
}

7. Things to know about assembly (8 points)

QUESTION 7A. How many registers may be used to pass function parameters in x86-64 and what are they?

QUESTION 7B. How many registers may be used to pass return values in x86-64?

QUESTION 7C. Are any of the registers used for function parameters or return values callee-saved?

QUESTION 7D. Which of these best describes the purpose of the red zone?

To store function arguments before calling a function
To store local variables in non-leaf functions
To store local variables in leaf functions
To save caller-saved registers before calling another function
To save callee-saved registers before calling another function
To confuse students

QUESTION 7E. List all the true statements.

Stacks grow up.
The ret instruction modifies the stack pointer.
The mov instruction may modify the stack pointer.
None of the above.

QUESTION 7F. Describe one way the calling convention for system calls on x86-64 Linux (or WeensyOS) differs from the normal C++ calling convention.

8. Library assembly (12 points)

The following five assembly functions implement well-known functions from the C++ library.

f1:
    endbr64
    movq    %rdi, %rax
    testq   %rdx, %rdx
    je  .L2
    movl    $0, %ecx
.L3:
    movzbl  (%rsi,%rcx), %r8d
    movb    %r8b, (%rax,%rcx)
    addq    $1, %rcx
    cmpq    %rdx, %rcx
    jne .L3
.L2:
    ret

f2:
    endbr64
    movzbl  (%rdi), %eax
    testb   %al, %al
    je  .L71
.L73:
    cmpb    %sil, %al
    je  .L74
    addq    $1, %rdi
    movzbl  (%rdi), %eax
    testb   %al, %al
    jne .L73
.L71:
    testb   %sil, %sil
    movl    $0, %eax
    cmove   %rdi, %rax
    ret
.L74:
    movq    %rdi, %rax
    ret

f3:
    endbr64
    testq   %rdx, %rdx
    je  .L24
    movl    $0, %eax
.L23:
    movzbl  (%rdi,%rax), %r8d
    movzbl  (%rsi,%rax), %ecx
    cmpb    %cl, %r8b
    jne .L26
    addq    $1, %rax
    cmpq    %rax, %rdx
    jne .L23
    movl    $0, %eax
    ret
.L26:
    movl    $1, %eax
    cmovbl  $-1, %eax
    ret
.L24:
    movl    $0, %eax
    ret

f4:
    endbr64
    movq    %rdi, %rax
    testq   %rdx, %rdx
    je  .L17
    leaq    (%rdi,%rdx), %rcx
    movq    %rdi, %rdx
.L18:
    movb    %sil, (%rdx)
    addq    $1, %rdx
    cmpq    %rdx, %rcx
    jne .L18
.L17:
    ret

f5:
    endbr64
    movq    %rdi, %rax
    movl    $0, %edx
.L43:
    movzbl  (%rsi,%rdx), %ecx
    movb    %cl, (%rax,%rdx)
    addq    $1, %rdx
    testb   %cl, %cl
    jne .L43
    ret

Remember to refer to the assembly notes if you are confused by an instruction (such as cmovbl). You may (or may not) find it faster to skip to the last question and then return to the others.

QUESTION 8A. What is the most likely number of parameters for each function?

QUESTION 8B. Which of these functions take pointer arguments? List all that apply or say “none”.

QUESTION 8C. Which of these functions take more than one pointer argument? List all that apply or say “none”.

QUESTION 8D. Which of these functions might modify primary memory? List all that apply or say “none”.

QUESTION 8E. Which of these functions make control flow decisions (conditional branches) based on values read from primary memory? List all that apply or say “none”.

QUESTION 8F. Which of these functions use a general-purpose callee-saved register? List all that apply or say “none”. (Note that %rsp and %rip do not count.)

QUESTION 8G. Which of these functions use the red zone? List all that apply or say “none”.

QUESTION 8H. Each of these functions implements a C++ library function specification—specifically, for one of the functions implemented in WeensyOS. What library function corresponds to each assembly function?

9. Disassemble (12 points)

For each of these questions, your answer should be a C++ function, including a type signature and definitions of any associated structs or types, that could compile to the given assembly.

QUESTION 9A.

_Z2g1i:
        endbr64
        movl    %edi, %eax
        shrl    $31, %eax
        ret

3pts This function shifts out all bits of its single int-sized argument (%edi), except the sign bit. The shrl instruction is a logical shift, meaning it shifts in 0 bits, so %eax will be either 0 or 1.
bool g1(int i) {
    return i < 0;
}

QUESTION 9B.

_Z2g2Pmm:
        endbr64
        movq    %rsi, %rax
        salq    $4, %rax
        movq    (%rdi,%rax), %rax
        subq    (%rdi,%rsi,8), %rax
        ret

3pts This function computes 16*%rsi and uses that as an offset into an array of 8-byte values starting at %rdi. It then subtracts from that the 8-byte value at %rdi + 8*%rsi, and returns the result. Since sizeof(long) == 8, the value stored at %rdi + 16*%rsi could be an element in an array of unsigned longs, specifically the element at index 2*%rsi.
unsigned long g2(unsigned long* a, unsigned long i) {
    return a[2 * i] - a[i];
}
(The generated assembly would be the same for long arguments; the mangled function name, _Z2g2Pmm, indicates unsigned longs. We accepted any array of 8-byte elements.)

QUESTION 9C.

_Z2g3P2ll:
        endbr64
        xorl    %eax, %eax
        testq   %rdi, %rdi
        je      .L7
.L6:
        movq    (%rdi), %rdi
        addq    $1, %rax
        testq   %rdi, %rdi
        jne     .L6
        ret
.L7:
        ret

QUESTION 9D.

_Z2g4P4statj:
        endbr64
        movl    %esi, %eax
        imull   %esi, %esi
        addq    $1, (%rdi)
        addq    %rax, 8(%rdi)
        addq    %rsi, 16(%rdi)
        ret

3pts This function appears to maintain a set of statistics by updating a structure (passed by pointer, in the first argument) according to a sample (an integer passed as the second argument).
struct stat {
    unsigned long n;
    unsigned long sum;
    unsigned long sum_sq;
};
void g4(stat* s, unsigned i) {
    s->n += 1;
    s->sum += i;
    s->sum_sq += i * i;
}

10. Miscellany (7 points)

QUESTION 10A. Given an object of pointer type T* ptr, what is the best reason to be suspicious of an expression like ptr + N * sizeof(T)?

Array elements should be located using array syntax, such as &ptr[N], rather than pointer arithmetic.
It probably should read ptr + N * T.
Pointers and arrays in C and C++ are unrelated.
Pointer arithmetic automatically includes a factor related to the size of the element type, so it probably should read ptr + N.
None of the above.

QUESTION 10B. Put the following values in increasing order. An answer to this question will have the form “B < A < D < C < E < F”.

Zero
The smallest signed char
The address in the stack of the current function’s return address
The address of a local variable in main
The current %rip register
The address of a dynamically-allocated block of memory

QUESTION 10C. You are working with a program that frequently accesses arrays of color objects. You have two choices for the color definition, namely:

struct color1 {
    unsigned short red, green, blue;
};

struct alignas(8) color2 {
    unsigned short red, green, blue;
};

What are the size and alignment of color1 and color2?

QUESTION 10D. Which statements about color1 and color2 are true? List all that apply.

Using color2 might generate more compact instruction sequences, because its size is compatible with x86-64’s indirect addressing modes.
Using color1 will require use of an expensive multiply instruction, because its size is incompatible with x86-64’s indirect addressing modes.
Using color1 might generate faster runtimes, because it uses 25% less memory.
Using color2 might generate slower runtimes, because it takes time to align data.
None of the above.

2pt #1, #3; but no points off for #1, #2, #3.

#1 is true: color2 can use the indirect addressing mode with scale 8, but the sizeof color1, 6, is not a valid scale.

Nevertheless, #2 is false because, as we’ve seen in some compiler-generated assembly, some array arithmetic involving oddly-sized elements is performed using combinations of lea and shl. For instance, a function like this:
color1* f(color1* a, unsigned idx) {
    return &a[idx];
}
compiles to:
leaq (%rsi,%rsi,2), %rax    ; %rax := %rsi + %rsi * 2 == 3 * %rsi
leaq (%rdi,%rax,2), %rax    ; %rax := %rdi + %rax * 2 == %rdi + 6 * %rsi
ret
#3 is true.

#4 is not really true; the compiler solves alignment issues before the program starts running.

QUESTION 10E. Which statements about process isolation are true? List all that apply.

Processes can only communicate with each other using shared memory regions.
Processes cannot communicate with each other at all.
System call wrappers cannot be written entirely within the C++ abstract machine, because the mechanisms for communicating with an operating system are operating system specific.
A CPU executing instructions on behalf of a process will continue doing so until the process voluntarily gives up control via a system call.
A CPU can support as many processes as it has sets of registers.
None of the above.

11. The end (5 points)

QUESTION 11A. What improvements would you suggest to the class? Any answer but no answer will receive full credit.

QUESTION 11B. What topics do you wish we had spent more time on? Any answer but no answer will receive full credit.

2021 Midterm