Exercises not as directly relevant to this year’s class are marked with ⚠️.
MISC-1. Git
Edward Snowden is working on a CS61 problem set and he has some git questions.
QUESTION MISC-1A. The CS61 staff has released some new code. Which commands will help Edward get the code from code.seas.harvard.edu into his repository? Circle all that apply.
git commit
git add
git push
git pull
#4
QUESTION MISC-1B. Edward has made some changes to his code. He hasn’t run git since making the changes. He wants to upload his latest version to code.seas.harvard.edu. Put the following git commands in an order that will accomplish this goal. You won’t necessarily use every command. You may add flags to a command (but you don’t have to). If you add flags, tell us what they are.
git commit
git add
git push
git pull
#2, #1, #3; or “#1 with
-a
”, #3
Edward Snowden’s partner, Edward Norton, has been working on the problem set also. They’ve been working independently.
At midnight on October 10, here’s how things stood. The git log
for
the partners’ shared code.seas.harvard.edu repository looked like this.
The committer is listed in (parentheses).
52d44ee Pset release. (kohler)
The git log
for Snowden’s local repository:
3246d07 Save Greenwald's phone number (snowden)
8633fbd Start work on a direct-mapped cache (snowden)
52d44ee Pset release. (kohler)
The git log
for Norton’s local repository:
81f952e try mmap (norton)
52d44ee Pset release. (kohler)
At noon on October 11, their shared GitHub repository has this log:
d446e60 Increase cache size (snowden)
b677e85 use mmap on mmappable files (norton)
b46cfda Merge branch 'master' of code.seas.harvard.edu:~TheTrueHOOHA/cs61/TheTrueHOOHAs-cs61-psets.git
(norton)
81f952e try mmap (norton)
3246d07 Save Greenwald's phone number (snowden)
8633fbd Start work on a direct-mapped cache (snowden)
52d44ee Pset release. (kohler)
QUESTION MISC-1C. Give an order for these commands that could have produced that log starting from the midnight October 10 state. You might not use every command, and you might use some commands more than once. Sample (incorrect) answer: “1 4 4 5 2.”
- snowden:
git commit -a
- snowden:
git push
- snowden:
git pull
- norton:
git commit -a
- norton:
git push
- norton:
git pull
- #2 (snowden push)
- [#5 (norton push—OPTIONAL; this push would fail)]
- #6 (norton pull) (We know that Snowden pushed first, and Norton pulled before pushing, because Norton committed the merge) [CIRCLE FOR 1D]
- [#4 (norton commit—OPTIONAL for the merge commit; the merge commit will happen automatically if there are no conflicts] [ALLOW CIRCLE FOR 1D]
- #4 (norton commit for b677e85)
- #5 (norton push)
- #3 (snowden pull—snowden pulls before committing because there is no merge)
- #1 (snowden commit for d446e60)
- #2 (snowden push)
QUESTION MISC-1D. In your answer to Part C, circle the step(s) where there might have been a merge conflict.
(see above)
MISC-2. Debugging
QUESTION MISC-2A. Match each tool or technique with a debugging situation for which it is well suited. Produce the best overall match that uses each situation exactly once.
1. strace | A. Investigating segmentation faults |
2. gdb | B. Finding memory leaks |
3. valgrind --tool=memcheck | C. Checking your assumptions and verifying invariants |
4. printf statements | D. Discovering I/O patterns |
5. assert | E. Displaying program state |
1—D, 2—A, 3—B, 4—E, 5—C
MISC-3. Pot Pourri
QUESTION MISC-3A. What does the following instruction place in %eax?
sarl $31, %eax
It fills eax with the sign bit of eax (i.e., all 0's or all 1's)
QUESTION MISC-3B. True/False: A direct-mapped cache with N slots can handle any reference string with < N distinct addresses with no misses except for compulsory misses.
False
QUESTION MISC-3C. What is 1 (binary) TB in hexadecimal?
1 TB = 2^40 = 1 followed by 40 zeros: so those 0's turn into the 10 hex 0's preceded by a 1: 0x10000000000
QUESTION MISC-3D. Write the answer to the following in hexadecimal:
0xabcd + 12
12 = 0xC; 0xD + 0xC = (25 = 0x19), so the answer is 0xABD9
QUESTION MISC-3E. True/False: The garbage collector we discussed is conservative, because it only runs when we tell it to.
False (conservative because it never reclaims something it shouldn't, but might not reclaim things it could).
QUESTION MISC-3F. True/False: Given the definition int array[10] the following two expressions mean the same thing: `&array[4] and array
- 4`.
True
QUESTION MISC-3G. Using the matrix multiply from lecture 12, in what order should you iterate over the indices i, j, and k to achieve the best performance.
ikj
QUESTION MISC-3H. True/False: fopen, fread, fwrite, and fclose are system calls.
False; they are calls to the standard IO library.
QUESTION MISC-3I. Which do you expect to be faster on a modern Linux OS, insertion sorting into a linked list of 1000 elements or into an array of 1000 elements?
The array.
QUESTION MISC-3J. What does the hardware do differently when adding signed versus unsigned numbers?
Nothing
MISC-4. Debugging
In the following short-answer questions, you have access to five
debugging tools: top
, strace
, gdb
, sanitizers, and man
. You
can’t change program source code or use other tools. Answer the
questions briefly (a couple sentences at most).
QUESTION MISC-4A. You are given a program that appears to “get stuck” when run. How would you distinguish whether the program blocked forever (e.g., made a system call that never returned) or entered an infinite loop?
You can use
top
: does it report the process is using 100% of the CPU?You can use
strace
: is the last thing in the strace an incomplete system call?
QUESTION MISC-4B. You are given a program that uses more memory while running than you expect. How would you tell whether the program leaks memory?
Compile with a leak sanitizer and check if it reports any memory leaks when the program exits.
QUESTION MISC-4C. You are given a program that produces weird answers. How would you check if it invoked undefined behavior?
Sanitizers.
QUESTION MISC-4D. You are given a program that blocks forever. How would you tell where the program blocked (which function called the blocking system call)?
Run it under
gdb
. When it blocks, hit Ctrl-C and then enterbacktrace
/bt
to get a backtrace.Or use
strace
.
QUESTION MISC-4E. You are given a program that takes a long time to produce a result. How would you tell whether the program was using system calls unintelligently?
Run it under
strace
and look for stupidity, such as many system calls that report errors, many system calls that are redundant, lots ofread
s that return short counts, etc.
QUESTION MISC-4F. You are given a program that exits with a system call error, but doesn’t explain what happened in detail. How would you find what error condition occurred and understand the conditions that could cause that error?
Run it under
strace
to find the error condition: look for a system call that returned the error. Then useman
on that system call and read about the error (theerrno
description).
MISC-5. Miscellany
QUESTION MISC-5A. True or false in conventional Unix systems?
- File descriptors are often used to communicate among processes on the same machine.
- File descriptors are often used to communicate among processes on different machines.
- File descriptors are often used to communicate with persistent storage.
- File descriptors are often used to access primary memory.
- File descriptors are often used to create child processes.
1, 2, and 3 are true.
QUESTION MISC-5B. Match each numbered process isolation feature with the lettered hardware feature that helps enforce it. Use each hardware feature once (make the best match you can).
- Protected control transfer (processes can transfer control to the kernel only at defined entry points)
- Memory protection (one process cannot modify another process’s memory)
- Interrupt protection (only the kernel can disable interrupts)
- CPU protection (the kernel always regains control of the CPU eventually)
- Traps
- Privileged mode (dangerous instructions fault unless the CPU is in privileged mode)
- Timer interrupts
- Page tables
1—A, 2—D, 3—B, 4—C
The remaining questions refer to the following lines of code.
1.
close(fd);
2.
connect(fd, sockaddr, socklen);
3.
listen(fd);
4.
mmap(nullptr, 4096, PROT_READ, MAP_SHARED, fd, 0);
5.
read(fd, buf, 4096);
6.
write(fd, buf, 4096);
QUESTION MISC-5C. If a program executes the following line without error, which of those lines could be executed next without error? List all numbers that apply.
fd = open("/home/cs61user/cs61-psets/pset6/pong61.c", O_RDWR);
1, 4, 5, 6
QUESTION MISC-5D. If a program executes the following line without error, which lines could be executed next without error? List all numbers that apply.
fd = socket(AF_INET, SOCK_STREAM, 0);
1, 2, 3
QUESTION MISC-5E. If a program executes the following lines without error, which lines could be executed next without error? List all numbers that apply.
pipe(pipefd);
fd = pipefd[0];
1, 5
MISC-6. More Miscellany
QUESTION MISC-6A. True or false: Any C++ arithmetic operation has a well-defined result.
False: signed integer overflow is undefined.
QUESTION MISC-6B. True or false: Any x86-64 processor instruction has a well-defined result.
True
QUESTION MISC-6C. True or false: By executing a trap instruction, a process can force an operating system kernel to execute arbitrary code.
False (unless the kernel is really badly written!)
QUESTION MISC-6D. True or false: By manipulating process memory and registers, an operating system kernel can force a process to execute arbitrary instructions.
True: the OS kernel has full system privilege.
QUESTION MISC-6E. True or false: All signals are sent explicitly via
the kill()
system call.
False—some are sent for other reasons, such as the user hitting Control-C or a child process exiting.
QUESTION MISC-6F. True or false: An operating system’s buffer cache is generally fully associative.
True
QUESTION MISC-6G. True or false: The least-recently-used eviction policy is more useful for very large files that are read sequentially than it is for stacks.
False
QUESTION MISC-6H. True or false: Making a cache bigger can lower its hit rate for a given workload.
True—that’s Bélády’s anomaly.
QUESTION MISC-6I. True or false: x86-64 processor caches are coherent (i.e., always appear to contain the most up-to-date values).
True
QUESTION MISC-6J. True or false: A socket file descriptor supports either reading or writing, but not both.
False; it supports both
MISC-7. Pot Pourri
Parts A-D pertain to the data structures and hexdump output shown here.
struct x {
unsigned long ul;
unsigned short us;
unsigned char uc;
} *sp;
// Hexdump output of some program running on the appliance
08c1b008 e9 11 cf d0 0d d0 3f f3 63 61 74 00 0d f0 fe ca |......?.cat.....|
08c1b018 5e ea 15 0d de c0 ad de |^.......|
You are told that sp
= 0x08c1b008.
QUESTION MISC-7A. What is the value (in hex) of sp->ul
?
0xf33fd00dd0cf11e9
QUESTION MISC-7B. What is the value (in hex) of sp->uc
?
0x3f
QUESTION MISC-7C. At what address will you find the string "cat"?
0x08c1b010
QUESTION MISC-7D. If the bytes after the string "cat" comprise an array of 3 integers, what is the value (in hex) of the integer at index 1 of that array?
0x0d15ea5e
QUESTION MISC-7E. What is the following binary value expressed in hexadecimal: 01011010?
0x5a
QUESTION MISC-7F. What is the value of the hex number 0x7FF in decimal?
255 + 7*256 == 8*256 − 1 == 2*4*256 − 1 == 2*1024 − 1 == 2047
QUESTION MISC-7G. Is 0x98765432 a valid return from malloc?
No, because it isn’t aligned properly—malloc will always return a pointer whose alignment could work for any basic type, which on x86-64, means the last digit must be 0.
QUESTION MISC-7H. What is the minimum number of x86 instruction bytes you need to write an infinite loop?
Two bytes: 0xeb 0xfe
QUESTION MISC-7I. True or False: Every declaration in C++ code allocates space for an object.
False. Extern declarations, such as function declarations or
extern int x;
, don’t allocate space.
QUESTION MISC-7J. True or False: Processes cannot share physical memory in WeensyOS.
False; after step 5, child processes share read-only physical memory with their parents.
For parts K–O, assume we are running on the appliance and we initialize
ival
, p
, and q
as shown below. Write the value of the expression—you may
express the values in hex if that's simpler, just be sure to prefix them with
0x to make it clear that you are doing so. For True/False questions, there is
no need to correct or provide a counterexample for any statements that are
false.
int ival[4] = {0x12345678, 0x9ABCDEF0, 0x13579BDF, 0x2468ACE0};
int* p = &ival[0];
int* q = &ival[3];
int* x = p + 1;
char* cp = (char*) (q - 2);
QUESTION MISC-7K. q - p
3
QUESTION MISC-7L. ((char*) q - (char*) p)
12
QUESTION MISC-7M. x - p
1
QUESTION MISC-7N. *((short*) ((char*) x + 2))
0x9ABC
QUESTION MISC-7O. *cp
0xF0
QUESTION MISC-7P. What system call allows you to block on a collection of file descriptors?
select
(alsopoll
,pselect
,epoll
, …)
QUESTION MISC-7Q. What system call creates a communication channel that can only be used among related processes?
pipe
QUESTION MISC-7R. ⚠️ What system call can change the attributes of a file descriptor so you can poll on it rather than block?
fcntl
QUESTION MISC-7S. What system call produces a file descriptor on which a server can exchange messages with a client?
socket
(oraccept
)
QUESTION MISC-7T. True or False: A program and a process are the same thing.
False
MISC-8. CS61 in Real Life
QUESTION MISC-8A. The CS61 Staff have built a jet (the NightmareLiner) modeled on the Boeing Dreamliner. Unfortunately, they modeled it just a bit too closely on the Dreamliner, which needs to be rebooted periodically to avoid failure. In the case of the NightmareLiner, it needs to be rebooted approximately every 16 days. Your job is to use what you've learned in CS61 about data representation to hypothesize why.
Hint: There are 86,400,000 ms in a day. 86,400,000 is between 226 and 227.
OK, 16 is 24 and 27+4 is 31 -- it looks to me like the have a signed 32-bit number somewhere and if they don't reboot, it overflows.
Google recently discovered (and reported) a bug in the GNU libc
implementation of getaddrinfo
. This function can perform RPC calls,
which involve sending and receiving messages. In some cases,
getaddrinfo
failed to check that a received message could fit inside a
buffer variable located on the stack (2048 bytes).
QUESTION MISC-8B. True or false: This flaw means getaddrinfo
will
always execute undefined behavior.
False: it only executes undefined behavior if the received message exceeds the buffer size of 2048.
QUESTION MISC-8C. Give an example of a message that will cause
getaddrinfo
to exhibit undefined behavior.
A message larger than the buffer (2048 bytes).
QUESTION MISC-8D. Briefly describe the contents of a message that
would cause the getaddrinfo
function to return to address
0x400012988 rather than to its caller.
In the message, the value 0x400012988 should appear beyond the 2048-byte limit of the buffer so that it ends up overwriting the return value on the stack (for example, a message that is 4096 bytes long, and the second half of the message contains repeated instances of 0x400012988).
QUESTION MISC-8E. This code used to appear in the Linux kernel:
1. struct tun_struct *tun = ...; // This is a valid assignment;
2. struct sock *sk = tun->sk;
3. if (!tun)
4. return POLLERR; // This is an error return
The compiler removed lines 3 and 4. Why was that a valid thing for the compiler to do?
Dereferencing a null pointer is undefined behavior. Since tun is dereferenced in line 2, the compiler assumes that it cannot be null; therefore it is free to remove the check in line 3 and the accompanying code in line 4.
MISC-9. Miscellany
QUESTION MISC-9A. Name the property that implies a process cannot cause the kernel to execute code at an arbitrary address.
Kernel isolation. “Process isolation” is certainly acceptable too.
QUESTION MISC-9B. True or false: It’s safe to call any C library function from a signal handler.
False
QUESTION MISC-9C. True or false? Pointer arithmetic is only valid on pointers that point into arrays. Explain briefly.
2pts False. Can add 0 to any pointer; can point one-past-the-end of any object.
QUESTION MISC-9D. True or false? All x86-64 fundamental data types have alignments that are powers of 2. Explain briefly.
3pts True
QUESTION MISC-9E. True or false? Pointer arithmetic on char*
pointers behaves
the same way as address arithmetic (i.e., arithmetic on uintptr_t
unsigned
integers), including with respect to undefined behavior. Explain briefly.
2pts False. Wraparound on pointer arithmetic, or moving past the boundaries of an object or array, causes UB; address arithmetic does not.
QUESTION MISC-9F. True or false? System calls are just as fast as function calls. Explain briefly.
3pts False. System calls are slower because they transfer to kernel.
MISC-10. Listing All That Apply
Assume an underlying architecture of x86-64 for all questions.
QUESTION MISC-10A. Undefined behavior: list all that apply.
- Undefined behavior is a property of the C abstract machine.
- Undefined behavior is a property of the x86-64 architecture.
- Undefined behavior is dangerous.
- Unsigned overflow causes undefined behavior.
- None of the above.
1, 3
QUESTION MISC-10B. Pointer arithmetic: list all that apply. Assume int* x = new
int[10]
.
sizeof(x) == 40
.- The allocation pointed to by
x
contains at least 40 bytes. - The statement
int* y = x + 10
causes undefined behavior. - The statement
int* y = x + 11
causes undefined behavior. - None of the above.
2,4
QUESTION MISC-10C. Size and alignment: list all that apply.
- Pointers have size 8.
sizeof(int) == 4
.sizeof(T)
is a multiple ofalignof(T)
for allT
.- Given an array
T x[N]
,sizeof(x) == sizeof(T) * N
. - None of the above.
1,2,3,4
QUESTION MISC-10D. Registers: list all that apply.
- Every function must restore all processor registers to their original values on exit.
- Function return values are stored on the stack.
%rax
and%eax
refer to (parts of) the same register.%rbp
and%ebx
refer to (parts of) the same register.- None of the above.
3
QUESTION MISC-10E. Return address: list all that apply.
- A function’s return address is known at compile time.
- A function’s return address is stored on the stack.
- A function’s return address is protected from modification by the processor.
- A function’s return address is the address of the
call
instruction that invoked the function. - None of the above.
2
QUESTION MISC-10F. Stdio cache: list all that apply.
- The stdio cache has size 10000 bytes by default.
- The stdio cache behaves similarly for pipes, disk files, and the terminal.
- The stdio cache is stored in the buffer cache.
- The stdio cache is stored in registers.
- None of the above.
5
QUESTION MISC-10G. Protected control transfer: list all that apply.
- Protected control transfer refers to the transfer of control from an unprivileged process to the kernel.
- An unprivileged process can initiate a protected control transfer that executes any instruction in machine memory.
- Executing a protected control transfer is about as expensive as calling a function.
- System calls are typically initiated by faults.
- None of the above.
1
System calls are initiated by traps, not faults.
QUESTION MISC-10H. Virtual memory: list all that apply.
- Virtual memory requires processor support.
- Virtual memory involves page tables.
- Entries in page table data structures contain virtual addresses.
- Entries in page table data structures include flags that are not related to (i.e., part of) addresses.
- None of the above.
1,2,4
QUESTION MISC-10I. System calls: list all that apply.
waitpid
might or might not block depending on its arguments.pipe
might or might not block depending on its arguments.- The
kill
system call relates to signals. - The
dup2
system call can increase the number of open file descriptors in a process. - None of the above.
1,3,4
QUESTION MISC-10J. Pipes and file descriptors: list all that apply.
- All of a process’s file descriptors that refer to the same file share the same file position.
- All processes in a pipeline generally share one standard input.
- All processes in a pipeline generally share one standard error.
STDIN_FILENO == 0
.- None of the above.
3, 4
QUESTION MISC-10K. Threads and address spaces: list all that apply.
- Threads within the same process can share the same address space.
- Threads from different processes can share the same address space.
- Each thread has its own stack and set of registers.
- Accessing a thread’s stack from a different thread raises a segmentation fault.
- None of the above.
1, 3