CS 61
  • Info
    Concepts Course Description Course Staff Extension Resources Schedule Setup: GitHub Setup: Docker Setup: VM Style C and C++ Patterns Git Diff GDB Introduction GDB Commands File Descriptors HTTP Test Information Test 1 Test 2 Final Midterm 2023 Final 2023 Midterm 2022 Final 2022 Exercises: Miscellaneous
  • Problem sets
    Problem set 0 Problem set 1 Problem set 2 Problem set 3 Problem set 4 Problem set 5 Problem set 5 EC: Interruption Problem set 6 Problem set 6 EC Data representation exercises Assembly exercises Kernel exercises Storage exercises Process control exercises Synchronization exercises
  • Lectures
    Textbook Data representation Data representation 1: Introduction Data representation 2: Sizes and layout Data representation 3: Layout and dynamic allocation Data representation 4: Dynamic allocation, invariants Assembly Kernel Kernel 1: Processes, isolation, robustness Kernel 2: Virtual memory Kernel 3: Virtual memory and page tables Storage Storage 1: OS input/output, memory hierarchy Storage 2: Cache optimizations and coherence Storage 3: Eviction policies and coherence Process control Processes 1: Basics Processes 2: Inter-process communication Processes 3: Interruption and race conditions EthiCS: Harms and Unicode EthiCS: UTF-8 Synchronization Synchronization 1: Threads and atomicity Synchronization 2: Critical sections, serializability, lock granularity Synchronization 5: Databases Synchronization 6: Databases and Experiences
  • Sections
    Times & locations Datarep Section 1: C++ data structures Datarep Section 2: Memory bugs Assembly Section 1: Fun Datarep/Assembly Exercises Section Kernel Section 1: WeensyOS Kernel Section 2: Kernel exercises Kernel Section 3: Pipes Storage Section 1: Single-slot cache Process Section 1 and Storage Exercises Process Supplement: BNF grammars Process Section 2: Process control exercises Synchronization Section 1: Threads and synchronization objects Synchronization Section 2: Synchronization Exercises

2023 Midterm

Show all solutions Hide all solutions

Rules

The exam is open book, open note, open computer. You may access the book and your own notes. You may also use computers or electronic devices to access your own class materials and public class materials. Specifically:

  • You may access a browser and a PDF reader.
  • You may access your own notes and problem set code electronically.
  • You may access Internet sites on which your own notes and problem set code are stored.
  • You may access the course site.
  • You may access pages directly linked from the course site, including our lecture notes, exercises, and section notes.
  • You may access any web pages referenced by the "Notes" section of the lecture PowerPoint slides.
  • You may access our lecture, problem set, and section code.
  • You may run a C or C++ compiler, including an assembler and linker.
  • You may access manual pages and common utilities, including calculators and a Python interpreter.

However:

  • You may not access class discussion forums, such as the Edboard, or other question forums, such as Stack Overflow.
  • You may not access an on-line disassembler or decompiler.
  • You may not access solutions from any previous exam, by paper or computer, except for those on the course site.
  • You may not broadly search the Internet for answers to questions or access other courses’ materials. Stick to our course site and web pages referenced by the lecture PowerPoint slides.
  • You absolutely may not contact other humans or AI assistants with questions about the exam—whether in person, via IM, via GitHub Copilot, or by posting questions to forums. The single exception is that you may contact course staff.

Any violations of this policy are breaches of academic honesty and will be treated accordingly. Please appreciate our flexibility and behave honestly and honorably.

Students are taking the exam at different times. Do not post publicly about the exam until given permission to do so by course staff. This includes general comments about exam difficulty.

Questions

If you have questions during the exam, raise your hand if you are taking the exam in person. Otherwise, write us email at cs61-staff@g.harvard.edu. Don’t use the Edboard. We may not notice Edboard posts, and you aren’t supposed to access the Edboard anyway.

Completing your exam

You have 75 minutes to complete the exam starting from the time you press Start. Different students may have different exams. Enter your answers directly in this form. A green checkmark appears when an answer is saved. (Click outside a text box to save the answer.)

Students with extra time accommodations: If your extra time is not reflected next to the start button, contact course staff and wait for confirmation before pressing it.

There is no need to explicitly “submit” the exam. Your changes are automatically saved with timestamps. We’ll grade the latest version saved within your exam window.

Notes

Assume a Linux operating system running on the x86-64 architecture unless otherwise stated. If you get stuck, move on to the next question. If you’re confused, explain your thinking briefly for potential partial credit.

1. Functions (18 points)

QUESTION 1A. Alice says that the System V calling convention is enforced by the processor hardware. Erin says that it's enforced by compilers. Who is right? Why?

9pt Erin is right. A compiler is free to define arbitrary calling conventions, and the hardware won't complain.

QUESTION 1B. Suppose that, on a Linux x86-64 machine, a particular function argument needs to be passed by value. Further suppose that the argument is small enough to fit in a single register. Why might the compiler still choose to pass the argument via the stack?

9pt The function might have seven integers arguments, all of which could fit in registers. However, the System V calling convention says that only the first six may be be passed through registers, with the remaining having to be passed via the stack.

2. Variables and Values (37 points)

QUESTION 2A. On a 17-bit CPU, what is the value of the largest unsigned integer that could fit in a single register?

9pt (2^17) - 1

QUESTION 2B. Consider a global variable int* ptr. Is the lifetime of the variable ptr guaranteed to be greater than, greater than or equal to, equal to, less than or equal to, or less than the lifetime of any thing that ptr might point to?

10pt A global variable lives for the entire duration of a program. So, its lifetime will be greater than or equal to the lifetime of any possible thing that it points to: equal to if it points to another global variable, greater than otherwise.

QUESTION 2C. Suppose that the latest x86-64 chip does not provide native support for register slices; instead, the hardware only natively supports operations on full 64-bit registers. Explain how a compiler could use a single instruction to simulate the effects of this sequence from an earlier x86-64 chip that does support slices:

mov $0x0, %rax
mov $0xFFFF,  %ax
mov $0xCC, %ah

Explain your reasoning.

10pt The compiler would emit mov $0xCCFF, %rax. Recall that %ah refers to bits 8--15 of %rax, and %ax refers to bits 0--15. So, the three instruction sequence has the net effect of clearing all bits in %rax, except for bits 0--7 (set to 0xFF by the second instruction) and bits 8--15 (set to 0xCC by the third instruction, overwriting the 0xFF set by the second instruction).

QUESTION 2D. Consider the following C++ code that is executed on a Linux x86-64 machine:

unsigned int i = get_i_val();
unsigned short s = (unsigned short) i;

Give an example of how a sanitizer might rewrite this code to detect a loss of precision caused by the cast in the second line of code. Explain why your code works.

8pt The sanitizer would add code like this immediately before the casting line of code:

if (i > 0xFFFF /*max val for unsigned short*/) {
    report_and_crash();
}

3. Memory Layout and Accessing (45 points)

QUESTION 3A. Consider the following C++ code:

char* arr = new char[16];
memset(arr, 0, 16); //Set each char in `arr`
                    //to the byte `0`.
int* p = (int*) arr;
p[2] = 0xFFFFFFFF;

Imagine that the code runs on a machine in which sizeof(char) is 1 and sizeof(int) is 8. [Note that the latter is different than on a Linux x86-64 computer!] On this machine, how many bytes in the array arr will be set to a non-zero value? Explain your reasoning.

8pt No bytes in the array will be set to a non-zero value! The reference p[2] refers to the sizeof(int)==8 bytes that are immediately outside the end of the array. In other words, writing to p[2] is actually an out-of-bounds write!

QUESTION 3B. Consider the redzone-based address sanitization approach described in Lecture 5. Compared to the error detector for boundary writes that you implemented in pset1, is the Lecture 5 approach, configured with 16 byte redzones, guaranteed to catch more, fewer, or the same number of one-byte boundary write errors? Why? [A one-byte error is an error which updates the byte that lives just beyond the valid end of an allocation.]

12pt The Lecture 5 approach is guaranteed to catch more errors. The pset1 approach puts magic characters at the end of an allocation, and thus cannot detect overwrites that write the "correct" magic character (or that write an incorrect character and then write the correct one again). In contrast, the Lecture 5 approach does not rely on magic characters, but instead instruments every write to see if it's within the bounds (regardless of what the value being written is).

Another correct answer is that the Lecture 5 approach detects both overwrites and underwrites (i.e., writes that occur immediately before an object's valid memory region). In contrast, a pset-style approach only detects overwrites.

QUESTION 3C. On Linux x86-64, sizeof(long double) is 16. Why does this fact make the lea instruction less useful for sequentially iterating through the elements in an array of long doubles?

12pt The scale argument to a lea is restricted to being 1, 2, 4, or 8. This scale works well when an array contains elements of size 1, 2, 4, or 8, since you can just keep (say) the array base in %rax, the scale set to the appropriate sizeof value, and the index (which is incremented sequentially) in %rbx, handing all of these arguments repeatedly to lea. This approach doesn't work if the sizeof value for array elements can't be represented using the scale argument to lea.

QUESTION 3D. Suppose that, on a Linux x86-64 machine, a struct type s must hold three fields: a char[3] array, a char, and a long. Is there an ordering of those fields such that, with compiler-added padding, a buffer overwrite of the char[3] by 5 bytes would (i) still reside within the overall struct s (i.e., within the sizeof(s) bytes that the compiler allocated for the struct), and (ii) only overwrite padding data (not the data belonging to non-padding members of the struct? If so, what is that ordering? If not, why does no such ordering exist?

13pt Yes. Consider the order char[3] then long then char. The array will get placed at offset 0, and for alignment reasons, the long will get placed at offset 8. The char will then be placed at offset 16; at this point, the struct is 17 bytes long, which is not evenly divisble by the required alignment of 8 (i.e., the biggest alignment of any member of the struct); so, the compiler will pad the struct to be of size 24. Anyways, there will be a gap of 5 bytes between the array and the long, enough to absorb a buffer overwrite of 5 bytes.