Final Sample Question Solutions
The final will be cumulative, though it will be weighted more towards the second half of the class.
This bank of questions is taken from prior midterms and finals. The course does change from year to year, so some of the questions may refer to concepts we did not emphasize this year, and some concepts we did emphasize this year may not be represented here.
The final will be 3 hours long. It will be open-note, open-book, open-computer, and semi-open-network, using rules very similar to those on the midterm.
1. Computer arithmetic
Bitwise operators and computer arithmetic can represent vectors of bits, which in turn are useful for representing sets. For example, say we have a function `bit` that maps elements to distinct bits; thus, `bit(X) == (1 << u)` for some `u`. Then a set {X0, X1, X2, …, Xn} can be represented as `bit(X0) | bit(X1) | bit(X2) | … | bit(Xn)`. Element `Xi` is in the set with representation `n` if and only if `(bit(Xi) & n) != 0`.
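For concreteness, here's what this looks like in C. (The element names and bit assignments below are arbitrary choices for illustration, not part of the question.)

#define BIT_A (1U << 0)
#define BIT_B (1U << 1)
#define BIT_C (1U << 2)

unsigned s = BIT_A | BIT_C;     // the set {A, C}
int has_b = (BIT_B & s) != 0;   // 0: B is not a member
s |= BIT_B;                     // insert B: s now represents {A, B, C}
s &= ~BIT_A;                    // remove A: s now represents {B, C}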
QUESTION 1A. What is the maximum number of set elements that can be represented in a single `unsigned` variable on an x86 machine?
32
QUESTION 1B. Match each set operation with the C operator(s) that could implement that operation. (Complement is a unary operation.)

| Set operation | C operator |
|---|---|
| intersection | `&` |
| equality | `==` |
| complement | `~` |
| union | `\|` |
| toggle membership | `^` |
QUESTION 1C. Complete this function, which should return the set difference between the sets with representations `a` and `b`. This is the set containing exactly those elements of set `a` that are not in set `b`.
unsigned set_difference(unsigned a, unsigned b) {
    return a & ~b;
    // OR: return a - (a & b);
    // OR: return a & ~(a & b);
}
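A quick sanity check (the bit assignments here are arbitrary): if a = 0x7 represents {A, B, C} and b = 0xA represents {B, D}, then:

assert(set_difference(0x7, 0xA) == 0x5);   // {A, B, C} - {B, D} = {A, C}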
QUESTION 1D. Below we’ve given a number of C expressions, some of their values, and some of their set representations for a set of elements. For example, the first row says that the integer value of expression `0` is just 0, which corresponds to an empty set. Fill in the blanks. This will require figuring out which bits correspond to the set elements `A`, `B`, `C`, and `D`, and the values for the 32-bit `int` variables `a`, `x`, and `s`. No arithmetic operation overflows; `abs(x)` returns the absolute value of `x` (that is, `x < 0 ? -x : x`).
| Expression | Integer value | Represented set |
|---|---|---|
| `0` | 0 | (the empty set) |
| | 1 | |
| | 1 | |
| | 1 | |
| | 15 | |
| | 4 | |
| | 2 | |
| | 0 | |
| | 3 | |
| | 4 | |
| | 8 | |
2. Data structure assembly
Here are four assembly functions, `f1` through `f4`.
f1:
movl 4(%esp), %eax
movl 8(%esp), %ecx
testl %ecx, %ecx
jle .L2
xorl %edx, %edx
.L3:
movl 4(%eax), %eax
incl %edx
cmpl %ecx, %edx
jne .L3
.L2:
movl (%eax), %eax
ret
f2:
movl 8(%esp), %edx
leal 0(,%edx,4), %ecx
movl 4(%esp), %eax
movl (%eax,%ecx), %eax
addl %ecx, %eax
movl (%eax), %eax
ret
f3:
pushl %esi
pushl %ebx
movl 12(%esp), %ecx
movl 16(%esp), %esi
movl 20(%esp), %eax
testl %esi, %esi
jle .L9
xorl %edx, %edx
.L10:
movl %eax, %ebx
andl $1, %ebx
movl 4(%ecx,%ebx,4), %ecx
incl %edx
sarl %eax
cmpl %esi, %edx
jne .L10
.L9:
movl (%ecx), %eax
popl %ebx
popl %esi
ret
f4:
movl 8(%esp), %edx
movl 4(%esp), %eax
movl (%eax,%edx,4), %eax
ret
QUESTION 2A. Each function returns a value loaded from some data structure. Which function uses which data structure?
- Array
- Array of pointers to arrays
- Linked list
- Binary tree
Array—f4; Array of pointers to arrays—f2; Linked list—f1; Binary tree—f3
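Here are hypothetical C equivalents consistent with the assembly. (The type and field names are our own assumptions; only the layouts are dictated by the code.)

struct node { int value; struct node* next; };      // f1: value at offset 0, next at offset 4
int f1(struct node* n, int k) {
    for (int i = 0; i < k; ++i)
        n = n->next;
    return n->value;
}

int f2(int** a, int i) {                            // f2: array of pointers to arrays
    return a[i][i];
}

struct tree { int value; struct tree* child[2]; };  // f3: children at offsets 4 and 8
int f3(struct tree* t, int depth, int path) {       // follows one bit of path per level
    for (int i = 0; i < depth; ++i, path >>= 1)
        t = t->child[path & 1];
    return t->value;
}

int f4(int* a, int i) {                             // f4: plain array
    return a[i];
}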
QUESTION 2B. The array data structure is an array of type T. Considering the code for the function that manipulates the array, which of the following types are likely possibilities for T? Circle all that apply.
- `char`
- `int`
- `unsigned long`
- `unsigned long long`
- `char*`
- None of the above

`int`, `unsigned long`, `char*`
3. Disassembly I
Here’s some assembly produced by compiling a C program with `gcc`.
.LC1:
.string "%d %d\n"
.globl f
.type f, @function
f:
pushl %ebp
movl $1, %ecx
movl %esp, %ebp
pushl %edi
pushl %esi
pushl %ebx
subl $12, %esp
.L13:
movl $1, %eax
.L9:
movl %eax, %ebx
imull %eax, %ebx
movl %ecx, %esi
imull %ecx, %esi
movl $1, %edx
.L4:
movl %edx, %edi
imull %edx, %edi
addl %ebx, %edi
cmpl %esi, %edi
je .L11
incl %edx
cmpl %eax, %edx
jle .L4
incl %eax
cmpl %ecx, %eax
jle .L9
incl %ecx
jmp .L13
.L11:
pushl %ecx
pushl %eax
pushl %edx
pushl $.LC1
call printf
leal -12(%ebp), %esp
movl $1, %eax
popl %ebx
popl %esi
popl %edi
popl %ebp
ret
QUESTION 3A. How many arguments might this function have? Circle all that apply.
- 0
- 1
- 2
- 3 or more
All
QUESTION 3B. What might this function return? Circle all that apply.
- 0
- 1
- −1
- Its first argument, whatever that argument is
- A square number other than 0 or 1
- None of the above
Choice #2 (1)
QUESTION 3C. Which callee-saved registers does this function save and restore? Circle all that apply.
- %eax
- %ebx
- %ecx
- %edx
- %ebp
- %esi
- %edi
- None of the above
%ebx, %ebp, %esi, %edi
QUESTION 3D. This function handles signed integers. If we changed the C source to use unsigned integers instead, which instructions would change? Circle all that apply.
- `movl`
- `imull`
- `addl`
- `cmpl`
- `je`
- `jge`
- `popl`
- None of the above
`jge`

We accepted circled `imull` or not! Although x86 `imull` is signed, in fact as used in C it’s equivalent to `mull`, and gcc does use `imull` for unsigned multiplication here. From the Intel manuals:

“[These] forms [of `imul`] may also be used with unsigned operands because the lower half of the product is the same regardless if the operands are signed or unsigned. The CF and OF flags, however, cannot be used to determine if the upper half of the result is non-zero.”
QUESTION 3E. What might this function print? Circle all that apply.
- `0 0`
- `1 1`
- `3 4`
- `4 5`
- `6 8`
- None of the above
Choice #3 (`3 4`) only. The function searches for a solution to x² + y² == z², under the constraint that x ≤ y. When it finds one, it prints x and y and then returns. It always starts from `1 1` and increments x and y one at a time, so it can only print `3 4`.
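Here’s a plausible C reconstruction of the function. (The variable names are guesses; the register assignment is %ecx = z, %eax = y, %edx = x.)

#include <stdio.h>

int f(void) {
    for (int z = 1; ; ++z)
        for (int y = 1; y <= z; ++y)
            for (int x = 1; x <= y; ++x)
                if (x * x + y * y == z * z) {
                    printf("%d %d\n", x, y);
                    return 1;
                }
}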
4. Disassembly II
The questions in this section concern a function called `ensmallen`, which has the following assembly.
ensmallen:
 1.     pushl %ebx
 2.     movl 8(%esp), %ebx
 3.     movl 12(%esp), %eax
 4.     movb (%eax), %cl
 5.     movb %cl, (%ebx)
 6.     testb %cl, %cl
 7.     jne .L34
 8.     jmp .L26
 9.  .L29:
10.     incl %eax
11.  .L34:
12.     movb (%eax), %dl
13.     cmpb %dl, %cl
14.     je .L29
15.     incl %ebx
16.     movb %dl, %cl
17.     movb %cl, (%ebx)
18.     testb %cl, %cl
19.     jne .L34
20.  .L26:
21.     popl %ebx
22.     ret
QUESTION 4A. How many arguments is this function likely to take? Give line numbers that helped you determine an answer.
2. Lines 2 & 3
QUESTION 4B. Are the argument(s) pointers? Give line numbers that helped you determine an answer.
Yes. Lines 4, 5, 12, 17
QUESTION 4C. What type(s) are the argument(s) likely to have? Give line numbers that helped you determine an answer.
`[unsigned] char*`. Lines 4, 5, 12, 17: `movb`
QUESTION 4D. Write a likely signature for the function. Use return type `void`.
void ensmallen(char* a, char* b)
QUESTION 4E. Write an alternate likely signature for the function, different from your last answer. Again, use return type `void`.
void ensmallen(char* a, const char* b)
void ensmallen(unsigned char* a, unsigned char* b)
void ensmallen(void* dst, const void* src)
etc., etc.
QUESTION 4F. Which callee-saved registers does this function use? Give line numbers that helped you determine an answer.
%ebx; lines 1, 21. (%esp also counts as callee-saved.)
QUESTION 4G. The function has an “input” and an “output”. Give an “input” that would cause the CPU to jump from line 8 to label `.L26`, and describe what is placed in the “output” for that “input”.

The input is an empty string (`""`), and the function puts an empty string in the output.
You might think the function’s output was the value of its %eax register when it returned. But remember that functions without return values can also use %eax, and we told you above that this function’s return type is `void`! `ensmallen`’s “output” is most likely the string pointed to by its first parameter. In that sense `ensmallen` is sort of like `strcpy` or `memcpy`.
QUESTION 4H. Give an “input” for which the corresponding “output” is not a copy of the “input”. Your answer must differ from the previous answer.
"aaaa"
(output is "a"
); any string that has adjacent characters that
are the same
QUESTION 4I. Write C code corresponding to this function. Make it as compact as you can.
void ensmallen(char* dst, const char* src) {
while ((*dst = *src)) {
while (*dst == *src)
++src;
++dst;
}
}
Or, a little less compactly:
void ensmallen(char* dst, const char* src) {
while (*src) {
*dst = *src;
while (*src == *dst)
++src;
++dst;
}
*dst = 0;
}
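To see the behavior concretely, here’s a small test harness. (This harness is our own illustration; the buffer size and inputs are arbitrary.)

#include <assert.h>
#include <string.h>

int main() {
    char buf[16];
    ensmallen(buf, "aabbbcc");
    assert(strcmp(buf, "abc") == 0);   // runs of equal characters collapse to one
    ensmallen(buf, "");
    assert(strcmp(buf, "") == 0);      // empty input gives empty output (Question 4G)
    return 0;
}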
5. Machine programming
Intel really messed up this time. They’ve released a processor, the Fartium Core Trio, where every instruction is broken except the ones on this list.
 1. `cmpl %ecx, %edx`
 2. `decl %edx`
 3. `incl %eax`
 4. `je L1`
 5. `jl L2`
 6. `jmp L3`
 7. `movl 4(%esp), %ecx` [movc]
 8. `movl 8(%esp), %edx` [movd]
 9. `movl (%ecx,%eax,4), %ecx` [movx]
10. `ret`
11. `xchgl %eax, %ecx`
12. `xorl %eax, %eax`
(In case you forgot, `xchgl` swaps two values—here, the values in two registers—without modifying condition codes.)
“So what if it’s buggy,” says Andy Grove; “it can still run programs.” For instance, he argues convincingly that this function:
void do_nothing(void) {
}
is implemented correctly by this Fartium instruction sequence:
ret
Your job is to implement more complex functions using only Fartium instructions. Your implementations must have the same semantics as the C functions, but may perform much worse than one might expect. You may leave off arguments and write instruction numbers (#1–12) or instruction names (for `mov`, use the bracketed abbreviations). Indicate where labels `L1–L3` point (if you need them). Assume that on function entry, the stack is set up as on a normal x86.
QUESTION 5A.
int return_zero(void) {
return 0;
}
`xorl %eax, %eax; ret`. (#12, #10)

`%eax` has unknown value when a function begins, so we need to clear it.
QUESTION 5B.
int identity(int a) {
return a;
}
`movl 4(%esp), %ecx; xchgl %eax, %ecx; ret`. (#7, #11, #10)

At function entry, the value on the top of the stack, at `(%esp) = 0(%esp)`, is the return address. Arguments are stored immediately above that, so `4(%esp)` is the first argument.
QUESTION 5C.
void infinite_loop(void) {
while (1)
/* do nothing */;
}
`L3: jmp L3`. (L3: #6)
QUESTION 5D.
typedef struct point {
int x;
int y;
int z;
} point;
int extract_z(point *p) {
return p->z;
}
movl 4(%esp), %ecx
xorl %eax, %eax
incl %eax
incl %eax
movl (%ecx,%eax,4), %ecx
xchgl %eax, %ecx
ret
(#7 #12 #3 #3 #9 #11 #10)
The value we want is located 8 bytes after the `p` pointer. In x86 assembly, this is written `8(%register_containing_p)`. Only one Fartium instruction could work, namely #9, `movl (%ecx,%eax,4), %ecx`. (The other indirect loads use `%esp` as a base, but we aren’t given an instruction that could modify `%esp` the way we need to.) This format uses the address `%ecx + 4*%eax`, so we must load `%eax` with 2.
So much for the easy ones. Now complete one out of the following 3 questions, or more than one for extra credit. (Question 5G is worth more than the others.)
QUESTION 5E. [Reminder: Complete at least one of 5E–5G for full credit.]
int add(int a, int b) {
return a + b;
}
movl 4(%esp), %ecx # %ecx := a
movl 8(%esp), %edx # %edx := b
xorl %eax, %eax # %eax := 0
xchgl %eax, %ecx # now %eax == a and %ecx == 0
L3: cmpl %ecx, %edx # compare %edx and %ecx (which is 0)
je L1 # "if %edx == 0 goto L1"
incl %eax # ++%eax
decl %edx # --%edx
jmp L3
L1: ret
(#7 #8 #12 #11 L3: #1 #4 #3 #2 #6 L1: #10)
The loop at `L3` executes `b` times, incrementing `%eax` each time. Here’s morally equivalent C:

int add(int a, int b) {
    while (b != 0) {
        ++a; --b;
    }
    return a;
}

This takes a long time if `b < 0`, but it does work! We saw many other correct answers. Common errors included comparing `a` and `b` while incrementing `a` and decrementing `b`, something like this:

int add(int a, int b) {
    int result = a;
    while (b < a) {
        ++result; ++a; --b;
    }
    return a;
}

But in this design `a` and `b` can pass each other, and `result` is incremented half as many times as it should be.
QUESTION 5F. [Reminder: Complete at least one of 5E–5G for full credit.]
int array_dereference(int *a, int i) {
return a[i];
}
movl 8(%esp), %edx # %edx := i
xorl %eax, %eax # %eax := 0
L3: xchgl %eax, %ecx
cmpl %ecx, %edx # compare %edx and %ecx
xchgl %eax, %ecx
je L1 # "if %eax == i goto L1"
incl %eax # ++%eax
jmp L3
L1: movl 4(%esp), %ecx # %ecx := a
movl (%ecx,%eax,4), %ecx # %ecx := a[i]
xchgl %eax, %ecx
ret
(#8 #12 L3: #11 #1 #11 #4 #3 #6 L1: #7 #9 #11 #10)
QUESTION 5G. [Reminder: Complete at least one of 5E–5G for full credit.]
int traverse_array_tree(int *a, int x) {
int i = 0;
while (1) {
if (x == a[i])
return i;
else if (x < a[i])
i = a[i+1];
else
i = a[i+2];
}
}
(This funky function traverses a binary tree that’s represented as an array of ints. It returns the position of the `x` argument in this “tree.” For example, given the following array:

int a[] = {100, 3, 6, 50, 9, 12, 150, 0, 0, 25, 0, 0, 80, 0, 0};

the call `traverse_array_tree(a, 100)` returns 0, because that’s the position of 100 in `a`. The call `traverse_array_tree(a, 80)` first examines position 0; since `a[0] == 100` and 80 < 100, it jumps to position `a[0+1] == 3`; since `a[3] == 50` and 80 > 50, it jumps to position `a[3+2] == 12`; and then it returns 12, since `a[12] == 80`. The code breaks if `x` isn’t in the tree; don’t worry about that.)
movl 8(%esp), %edx # %edx := x
xorl %eax, %eax # %eax := 0 (holds i)
L3: movl 4(%esp), %ecx # %ecx := a
movl (%ecx,%eax,4), %ecx # %ecx := a[i]
cmpl %ecx, %edx # compare x and a[i]
je L1 # "if x == a[i] goto L1"
jl L2 # "if x < a[i] goto L2"
incl %eax
movl 4(%esp), %ecx
movl (%ecx,%eax,4), %ecx
xchgl %eax, %ecx # i := a[i+1]
jmp L3
L2: incl %eax
incl %eax
movl 4(%esp), %ecx
movl (%ecx,%eax,4), %ecx
xchgl %eax, %ecx # i := a[i+2]
jmp L3
L1: ret # return i
(#8 #12 L3: #7 #9 #1 #4 #5 #3 #7 #9 #11 #6 L2: #3 #3 #7 #9 #11 #6 L1: #10)
We accepted solutions that misinterpreted the order of arguments for `cmpl`. (In AT&T syntax, “`cmpl %ecx, %edx`” performs the subtraction `%edx − %ecx`, so after such a comparison, `jl` will branch if `%edx < %ecx`.)
6. Virtual memory
QUESTION 6A. What is the x86 page size? Circle all that apply.

1. 4096 bytes
2. 64 cache lines
3. 512 words
4. 0x1000 bytes
5. 2^16 bits
6. None of the above

#1, #2, #4. The cache line size is 64 = 2^6 bytes, and 2^6 × 2^6 = 2^12. The word size is 4 bytes; 512 × 4 = 2048, not 4096. There are 8 bits per byte; 2^16/8 = 2^13, not 2^12.
The following questions concern the sizes of page tables. Answer the questions in units of pages. For instance, the page tables in WeensyOS (Assignment 4) each contained one level-1 page table page and one level-2 page table page, for a total size of 2 pages per page table.
QUESTION 6B. What is the maximum size (in pages) of an x86 page table?
2^10 level-2 page table pages + 1 level-1 page table page = 1025.
Despite the example above, many people misinterpreted this question as including the physical pages referenced by a page directory, and came up with answers like 2^20.
QUESTION 6C. What is the minimum size (in pages) of an x86 page table that would allow a process to access 2^22 distinct physical addresses?
One.
Whaaat?! Think about a page directory page where one of the entries referred to the page directory page itself, and the other entries referred to different pages. Like this PDP, stored at physical address 0x1000 (contents shown as physical addresses; each entry would also have its permission bits set):

| Index | Contents |
|---|---|
| 0 | 0x1000 (the PDP itself) |
| 1 | 0x2000 |
| 2 | 0x3000 |
| 3 | 0x4000 |
| … | … |
| 1023 | 0x400000 |
With this page directory in force, the 2^22 virtual addresses 0x0 through 0x3FFFFF (which all have PDI 0) access the 2^22 distinct physical addresses 0x1000 through 0x400FFF.
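Here’s a sketch of kernel code that would build such a page directory. (This is our illustration, assuming WeensyOS-style pageentry_t and PTE_* constants; it is not from the exam.)

pageentry_t* pdp = (pageentry_t*) 0x1000;            // the PDP lives at physical address 0x1000
for (int i = 0; i < 1024; ++i)
    pdp[i] = (0x1000 + i * 0x1000) | PTE_P | PTE_W;  // entry 0 maps the PDP itself
// Since entry 0 refers back to the PDP, the same physical page serves as both
// the level-1 and the level-2 table for virtual addresses 0x0 through 0x3FFFFF.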
The x86 architecture we discussed in class has 32-bit virtual addresses and 32-bit physical addresses. Extensions to the x86 architecture have increased both these limits.
- Physical Address Extensions (PAE) allow 32-bit machines to access up to 2^52 bytes of physical memory (which is about 4,000,000 GB). That is, virtual addresses are 32 bits, and physical addresses are 52 bits.
- The x86-64 architecture evolves the x86 architecture to a 64-bit word size. x86-64 pointers are 64 bits wide instead of 32. However, only 48 of those bits are meaningful: the upper 16 bits of each virtual address are ignored. Thus, virtual addresses are 48 bits. As with PAE, physical addresses are 52 bits.
QUESTION 6D. Which of these two machines would support a higher number of concurrent processes?
- x86 with PAE with 100 GB of physical memory.
- x86-64 with 20 GB of physical memory.
#1 x86 with PAE. Each concurrent process occupies some space in physical memory, and #1 has more physical memory.
(Real operating systems swap, so either machine could support more processes than fit in virtual memory, but this would cause thrashing. #1 supports more processes before it starts thrashing.)
QUESTION 6E. Which of these two machines would support a higher maximum number of threads per process?
- x86 with PAE with 100 GB of physical memory.
- x86-64 with 20 GB of physical memory.
#2 x86-64. Each thread in a process needs some address space for its stack, and an x86-64 process address space is much bigger than an x86’s.
7. Virtual memory and kernel programming
These problems consider implementations of virtual memory features in a
WeensyOS-like operating system. Recall the signatures and specifications
of the virtual_memory_lookup
and virtual_memory_map
functions:
// virtual_memory_map(pagetable, va, pa, sz, perm)
//    Map virtual address range [va, va+sz) in pagetable.
//    When X >= 0 && X < sz, the new pagetable will map virtual address
//    va+X to physical address pa+X with permissions perm.
//
//    Preconditions:
//    * va, pa, and sz must be multiples of PAGESIZE (4096).
//    * The level-2 pagetables referenced by the virtual address range
//      must exist and be writable (e.g., va + sz < MEMSIZE_VIRTUAL).
//
//    Typically perm is a combination of PTE_P (the memory is Present),
//    PTE_W (the memory is Writable), and PTE_U (the memory may be
//    accessed by User applications). If !(perm & PTE_P), pa is ignored.
void virtual_memory_map(pageentry_t* pagetable, uintptr_t va, uintptr_t pa, size_t sz, int perm);
// virtual_memory_lookup(pagetable, va)
//    Returns information about the mapping of the virtual address va in
//    pagetable. The information is returned as a vamapping object,
//    which has the following components:
typedef struct vamapping {
int pn; // physical page number; -1 if unmapped
uintptr_t pa; // physical address; (uintptr_t) -1 if unmapped
int perm; // permissions; 0 if unmapped
} vamapping;
vamapping virtual_memory_lookup(pageentry_t* pagetable, uintptr_t va);
Also recall that WeensyOS tracks physical memory using an array of `pageinfo` structures:
typedef struct physical_pageinfo {
int8_t owner;
int8_t refcount; // 0 means the page is free
} physical_pageinfo;
static physical_pageinfo pageinfo[PAGENUMBER(MEMSIZE_PHYSICAL)];
The WeensyOS kernel occupies virtual addresses 0 through 0xFFFFF; the process address space starts at `PROC_START_ADDR == 0x100000` and goes up to (but not including) `MEMSIZE_VIRTUAL == 0x300000`.
QUESTION 7A. True or false: On x86 Linux, like on WeensyOS, the kernel occupies low virtual addresses.
False
QUESTION 7B. On WeensyOS, which region of a process’s address space is closest to the kernel’s address space? Choose from code, data, stack, and heap.
Code
QUESTION 7C. On Linux on an x86 machine, which region of a process’s address space is closest to the kernel’s address space? Choose from code, data, stack, and heap.
Stack
Recall that the WeensyOS `sys_page_alloc(addr)` system call allocates a new physical page at the given virtual address. Here’s an example kernel implementation of `sys_page_alloc`, taken from the WeensyOS interrupt function:
case INT_SYS_PAGE_ALLOC: {
    uintptr_t addr = current->p_registers.reg_eax; // address is passed to kernel in %eax
    // [A]
    int free_pn = find_free_physical_page();
    if (free_pn < 0) { // no free physical pages
        console_printf(CPOS(24, 0), 0x0C00, "Out of physical memory!\n");
        current->p_registers.reg_eax = -1; // return result in %eax
        break; // will call run(current)
    }
    // [B]
    // otherwise, allocate the page
    assert(pageinfo[free_pn].refcount == 0);
    pageinfo[free_pn].refcount += 1;
    pageinfo[free_pn].owner = current->p_pid;
    // [C]
    // and map it into the user’s address space
    virtual_memory_map(current->p_pagetable, addr, PAGEADDRESS(free_pn), PAGESIZE, PTE_P | PTE_U | PTE_W);
    current->p_registers.reg_eax = 0;
    // [D]
    break;
}
QUESTION 7D. Thanks to insufficient checking, this implementation allows a WeensyOS process to crash the operating system or even take it over. This kernel is not isolated. What the kernel should do is return −1 when the calling process supplies bad arguments. Write code that, if executed at slot [A], would preserve kernel isolation and handle bad arguments correctly.
if (addr % PAGESIZE != 0 || addr < PROC_START_ADDR || addr >= MEMSIZE_VIRTUAL) {
current->p_registers.reg_eax = -1;
break;
}
QUESTION 7E. This implementation has another problem, which the following process would trigger:
void process_main(void) {
heap_top = ROUNDUP((uint8_t*) end, PAGESIZE); // first address in heap region
while (1) {
sys_page_alloc(heap_top);
sys_yield();
}
}
This process code repeatedly allocates a page at the same address. What should happen is that the kernel should repeatedly deallocate the old page and replace it with a newly-allocated page. But that’s not what will happen given the example implementation.
What will happen instead? And what is the name of this kind of problem?
Eventually the OS will run out of physical memory. At least it will print “Out of physical memory!” (that was in the code we provided). This is a memory leak.
QUESTION 7F. Write code that would fix the problem, and name the slot in the `INT_SYS_PAGE_ALLOC` implementation where your code should go.
vamapping vm = virtual_memory_lookup(current->p_pagetable, addr);
if (vm.perm)
pageinfo[vm.pn].refcount -= 1;
This goes in slot B or slot C. Slot D is too late; it would free the newly mapped page. Slot A is too early, for a subtle reason. Imagine that the page at `addr` was shared with another process, so `pageinfo[vm.pn].refcount > 1`. Then, if there was no free memory, it would be possible for the implementation to decrement the old page’s reference count, but fail to allocate a new page! This would break the kernel’s invariants, since the `pageinfo` reference count would be one off from the actual number of references in page tables.
8. Cost expressions
In the following questions, you will reason about the abstract costs of various operations, using the following tables of constants.
Table of Basic Costs
| Constant | Meaning |
|---|---|
| S | System call overhead (i.e., entering and exiting the kernel) |
| F | Page fault cost (i.e., entering and exiting the kernel) |
| P | Cost of allocating a new physical page |
| M | Cost of installing a new page mapping |
| B | Cost of copying a byte |
Table of Sizes
| Constant | Definition | Meaning |
|---|---|---|
| n_k | | Number of memory pages allocated to the kernel |

Per-process sizes (defined for each process p):

| Constant | Definition | Meaning |
|---|---|---|
| n_p | | Number of memory pages allocated to p |
| r_p | | Number of read-only memory pages allocated to p |
| w_p | = n_p − r_p | Number of writable memory pages allocated to p |
| m_p | | Number of memory pages actually modified by p after the previous fork() |
One of our tiny operating systems from class (OS02) included a program that called a recursive function. When the recursive function’s argument grew large enough, the stack pointer moved beyond the memory actually allocated for the stack, and the program crashed.
QUESTION 8A. In our first solution for this problem, the process called the `sys_page_alloc(void* addr)` system call, which allocated and mapped a single new page at address `addr` (the new stack page). Write an expression for the cost of this `sys_page_alloc()` system call in terms of the constants above.
S + P + M
QUESTION 8B. Our second solution for this problem changed the operating system’s page fault handler. When a fault occurred in a process’s stack region, the operating system allocated a new page to cover the corresponding address and restarted the process. Write an expression for the cost of such a fault in terms of the constants above.
F + P + M
QUESTION 8C. Design a revised version of `sys_page_alloc` that supports batching. Give its signature and describe its behavior.

Example: `sys_page_alloc(void* addr, int npages)`, which allocates and maps `npages` consecutive pages starting at address `addr`.
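A minimal kernel-side sketch of the batched version, reusing the WeensyOS-style helpers from Question 7. (The choice of %ecx for the page count is our own assumption, and argument checking is omitted.)

case INT_SYS_PAGE_ALLOC: {
    uintptr_t addr = current->p_registers.reg_eax;
    int npages = current->p_registers.reg_ecx;       // hypothetical: count passed in %ecx
    for (int i = 0; i < npages; ++i) {               // one S total, then npages × (P + M)
        int pn = find_free_physical_page();          // cost P
        assert(pn >= 0);                             // a real kernel would fail gracefully
        pageinfo[pn].refcount += 1;
        pageinfo[pn].owner = current->p_pid;
        virtual_memory_map(current->p_pagetable, addr + i * PAGESIZE,
                           PAGEADDRESS(pn), PAGESIZE, PTE_P | PTE_U | PTE_W); // cost M
    }
    current->p_registers.reg_eax = 0;
    break;
}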
QUESTION 8D. Write an expression for the cost of a call to your batching allocation API.
Can vary; for this example, S + npages*(P + M)
In the remaining questions, a process p calls `fork()`, which creates a child process, c.

Assume that the base cost of performing a `fork()` system call is Φ. This cost includes the `fork()` system call overhead (S), the overhead of allocating a new process, the overhead of allocating a new page directory with kernel mappings, and the overhead of copying registers. But it does not include overhead from allocating, copying, or mapping other memory.
QUESTION 8E. Consider the following implementations of `fork()`:

A. Naive fork: Copy all process memory (Assignment 4, Step 5).
B. Eager fork: Copy all writable process memory; share read-only process memory, such as code (Assignment 4, Step 6).
C. Copy-on-write fork: Initially share all memory as read-only. Create writable copies later, on demand, in response to write faults (Assignment 4 extra credit).
Which expression best represents the total cost of the `fork()` system call in process p, for each of these fork implementations? Only consider the system call itself, not later copy-on-write faults.

(Note: Per-process variables, such as n, are defined for each process. So, for example, n_p is the number of pages allocated to the parent process p, and n_c is the number of pages allocated to the child process c.)
1. Φ
2. Φ + n_p × M
3. Φ + (n_p + w_p) × M
4. Φ + n_p × 2^12 × (B + F)
5. Φ + n_p × (2^12·B + P + M)
6. Φ + n_p × (P + M)
7. Φ + w_p × (2^12·B + P + M)
8. Φ + n_p × (2^12·B + P + M) − r_p × (2^12·B + P)
9. Φ + n_p × M + m_c × (P + F)
10. Φ + n_p × M + m_c × (2^12·B + F + P)
11. Φ + n_p × M + (m_p + m_c) × (P + F)
12. Φ + n_p × M + (m_p + m_c) × (2^12·B + F + P)
A: #5, B: #8 (good partial credit for #7), C: #2
QUESTION 8F. When would copy-on-write fork be more efficient than eager fork (meaning that the sum of all fork-related overheads, including faults for pages that were copied on write, would be less for copy-on-write fork than for eager fork)? Circle the best answer.

1. When n_p < n_k.
2. When w_p × F < w_p × (M + P).
3. When m_c × (F + M + P) < w_p × (M + P).
4. When (m_p + m_c) × (F + M + P + 2^12·B) < w_p × (P + 2^12·B).
5. When (m_p + m_c) × (F + P + 2^12·B) < w_p × (P + M + 2^12·B).
6. When m_p < m_c.
7. None of the above.
#4
9. Processes
This question builds versions of the existing system calls based on new abstractions. Here are three system calls that define a new abstraction called a rendezvous.
`int newrendezvous(void)`: Returns a rendezvous ID that hasn’t been used yet.

`int rendezvous(int rid, int data)`: Blocks the calling process P1 until some other process P2 calls `rendezvous()` with the same `rid` (rendezvous ID). Then, both of the system calls return, but P1’s system call returns P2’s `data` and vice versa. Thus, the two processes swap their data. Rendezvous acts pairwise; if three processes call rendezvous, then two of them will swap values and the third will block, waiting for a fourth.

`void freezerendezvous(int rid, int freezedata)`: Freezes the rendezvous `rid`. All future calls to `rendezvous(rid, data)` will immediately return `freezedata`.
Here's an example. The two columns represent two processes. Assume they are the only processes using rendezvous ID 0.

| Process A | Process B |
|---|---|
| `int result;` | `int result;` |
| `result = rendezvous(0, 5);` | `printf("About to rendezvous\n");` |
| | `result = rendezvous(0, 600);` |
| *(The processes swap data; both become runnable)* | |
| `printf("Process A got %d\n", result);` | `printf("Process B got %d\n", result);` |
This code will print
About to rendezvous
Process B got 5
Process A got 600
(the last 2 lines might appear in either order).
QUESTION 9A. How might you implement pipes in terms of rendezvous? Try to figure out analogues for the pipe(), close(), read(), and write() system calls (perhaps with different signatures), but only worry about reading and writing 1 character at a time.
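One possible design (a sketch of one approach, not the only correct answer): let a pipe simply be a rendezvous ID, with one character transferred per rendezvous. The helper names below are our own.

int my_pipe(void) {                 // analogue of pipe(): returns the "pipe"
    return newrendezvous();
}
void my_write(int rid, char c) {    // analogue of write(fd, &c, 1)
    rendezvous(rid, c);             // blocks until a reader arrives; the swapped-back value is ignored
}
int my_read(int rid) {              // analogue of read(): returns one character, or -1 at EOF
    return rendezvous(rid, 0);
}
void my_close(int rid) {            // analogue of close() on the write end
    freezerendezvous(rid, -1);      // all future reads immediately return -1 (EOF)
}

This already hints at the answer to Question 9B: a rendezvous pipe has no buffering (every write blocks until a reader shows up), and after a freeze, writes also return immediately rather than failing, so it cannot reproduce every feature of real pipes.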
QUESTION 9B. Can a rendezvous-pipe support all pipe features?
10. Process management
Here’s the skeleton of a shell function implementing a simple two-command pipeline, such as “`cmd1 | cmd2`”.
void simple_pipe(const char* cmd1, char* const* argv1, const char* cmd2, char* const* argv2) {
int pipefd[2], r, status;
[A]
pid_t child1 = fork();
if (child1 == 0) {
[B]
execvp(cmd1, argv1);
}
assert(child1 > 0);
[C]
pid_t child2 = fork();
if (child2 == 0) {
[D]
execvp(cmd2, argv2);
}
assert(child2 > 0);
[E]
}
And here is a grab bag of system calls.
[1] close(pipefd[0]);
[2] close(pipefd[1]);
[3] dup2(pipefd[0], STDIN_FILENO);
[4] dup2(pipefd[0], STDOUT_FILENO);
[5] dup2(pipefd[1], STDIN_FILENO);
[6] dup2(pipefd[1], STDOUT_FILENO);
[7] pipe(pipefd);
[8] r = waitpid(child1, &status, 0);
[9] r = waitpid(child2, &status, 0);
Your task is to assign system call IDs, such as “1”, to slots, such as “A”, to achieve several behaviors, including a correct pipeline and several incorrect pipelines. For each question:

- You may use each system call ID once, more than once, or not at all.
- You may use zero or more system call IDs per slot. Write them in the order they should appear in the code.
- You may assume that no signals are delivered to the shell process (so no system call ever returns an EINTR error).
- The function should wait for both commands in the pipeline to complete before returning.
- It may help to detach the last “Reference material” page of the exam.
QUESTION 10A. Implement a correct foreground pipeline.

| A | B (child1) | C | D (child2) | E |
|---|---|---|---|---|
| 7 | 6, 1, 2 | | 3, 1, 2 | 1, 2, 8, 9 |

OR

| A | B (child1) | C | D (child2) | E |
|---|---|---|---|---|
| 7 | 6, 1, 2 | 2 | 3, 1 | 1, 8, 9 |
QUESTION 10B. Implement a pipeline so that, given arguments corresponding to “`echo foo | wc -c`”, the `wc` process reads “foo” from its standard input but does not exit thereafter. For partial credit describe in words how this might happen.

Anything that doesn’t close the pipe’s write end will do it. Below we leave both ends of the pipe open in the shell. We could also enter just “3” in slot D.
| A | B (child1) | C | D (child2) | E |
|---|---|---|---|---|
| 7 | 6, 1, 2 | | 3, 1, 2 | 8, 9 |
QUESTION 10C. Implement a pipeline so that, given arguments corresponding to “`echo foo | wc -c`”, “foo” is printed to the shell’s standard output and the `wc` process prints “0”. (In a correctly implemented pipeline, `wc` would print 4, which is the number of characters in “foo\n”.) For partial credit describe in words how this might happen.

Anything that doesn’t redirect the left-hand side’s standard output will do it. It is important that the read end of the pipe be properly redirected, or `wc` would block reading from the shell’s standard input.
| A | B (child1) | C | D (child2) | E |
|---|---|---|---|---|
| 7 | 1, 2 | | 3, 1, 2 | 1, 2, 8, 9 |
QUESTION 10D. Implement a pipeline that appears to work correctly on “`echo foo | wc -c`”, but always blocks forever if the left-hand command outputs more than 65536 characters. For partial credit describe in words how this might happen.
This happens when we execute the two sides of the pipe in series: first the left-hand side, then the right-hand side. Since the pipe contains 64KiB of buffering, this will often appear to work.
| A | B (child1) | C | D (child2) | E |
|---|---|---|---|---|
| 7 | 6, 1, 2 | 8 | 3, 1, 2 | 1, 2, 9 |
QUESTION 10E. Implement a pipeline so that, given arguments corresponding to “`echo foo | wc -c`”, both `echo` and `wc` report a “Bad file descriptor” error. (This error, which corresponds to EBADF, is returned when a file descriptor is not valid or does not support the requested operation.) For partial credit describe in words how this might happen.

Given these system calls, the only way to make this happen is to redirect the wrong ends of the pipe into stdin/stdout.
| A | B (child1) | C | D (child2) | E |
|---|---|---|---|---|
| 7 | 4, 1, 2 | | 5, 1, 2 | 1, 2, 8, 9 |
11. Networking
QUESTION 11A. Which of the following system calls should a programmer expect to sometimes block (i.e., to return after significant delay)? Circle all that apply.
1. `socket`
2. `read`
3. `accept`
4. `listen`
5. `connect`
6. `write`
7. `usleep`
8. None of these
#2 `read`, #3 `accept`, #5 `connect`, (#6 `write`), #7 `usleep`. (`write` can definitely block if the reading end hasn’t caught up, but we didn’t emphasize this in class, so no points off.)
QUESTION 11B. Below are seven message sequence diagrams demonstrating the operation of a client–server RPC protocol. A request such as “get(X)” means “fetch the value of the object named X”; the response contains that value. Match each network property or programming strategy below with the diagram with which it best corresponds. You will use every diagram once.
1. Loss
2. Delay
3. Reordering
4. Duplication
5. Batching
6. Prefetching
7. Exponential backoff
#1—B, #2—C, #3—F, #4—D, #5—G, #6—A, #7—E (A—#6, B—#1, C—#2, D—#4, E—#7, F—#3, G—#5)
While G could also represent prefetching, A definitely does not represent batching at the RPC level—each RPC contains one request—so under the rule that each diagram is used once, we must say G is batching and A is prefetching.
(The seven diagrams, A through G, are not reproduced here.)
12. Threads
The following code performs a matrix multiplication, c = ab
, where
a
, b
, and c
are all square matrices of dimension sz
. It uses the
cache-friendly ikj index ordering.
#define MELT(matrix, sz, row, col) (matrix)[(row)*(sz) + (col)]
void matrix_multiply(double* c, const double* a, const double* b, size_t sz) {
for (size_t i = 0; i < sz; ++i)
for (size_t j = 0; j < sz; ++j)
MELT(c, sz, i, j) = 0;
for (size_t i = 0; i < sz; ++i)
for (size_t k = 0; k < sz; ++k)
for (size_t j = 0; j < sz; ++j)
MELT(c, sz, i, j) += MELT(a, sz, i, k) * MELT(b, sz, k, j);
}
But matrix multiplication is a naturally parallelizable problem. Here’s some code that uses threads to multiply even faster on a multicore machine. We use `sz` parallel threads, one per row of `c`.
typedef struct matrix_args {
double* c;
const double* a;
const double* b;
size_t sz;
size_t i;
} matrix_args;
void* matrix_multiply_ikj_thread(void* arg) {
(α) matrix_args* m = (matrix_args*) arg;
(β) for (size_t j = 0; j < m->sz; ++j)
(γ) MELT(m->c, m->sz, m->i, j) = 0;
(δ) for (size_t k = 0; k < m->sz; ++k)
(ε) for (size_t j = 0; j < m->sz; ++j)
(ζ) MELT(m->c, m->sz, m->i, j) += MELT(m->a, m->sz, m->i, k) * MELT(m->b, m->sz, k, j);
(η) return NULL;
}
void matrix_multiply_ikj(double* c, const double* a, const double* b, size_t sz) {
(1) pthread_t* threads = (pthread_t*) malloc(sizeof(pthread_t) * sz);
(2) for (size_t i = 0; i < sz; ++i) {
(3) matrix_args m = { c, a, b, sz, i };
(4) int r = pthread_create(&threads[i], NULL, &matrix_multiply_ikj_thread, &m);
(5) assert(r == 0);
(6) }
(7) for (size_t i = 0; i < sz; ++i)
(8) pthread_join(threads[i], NULL);
(9) free(threads);
}
But when run, this code gives wildly incorrect results.
QUESTION 12A. What is wrong? Describe why the problem is a synchronization issue.
The `matrix_multiply_ikj` function starts many threads, each with its own logically different set of `matrix_args`. But `matrix_multiply_ikj` allocates these `matrix_args` structures on the stack! The `m` variable is initialized on each loop iteration, but then the variable’s stack space is immediately reused for the next loop iteration. It is very likely that during a thread’s execution, its `matrix_args` have been replaced by other arguments. This is a synchronization issue because the code would work correctly if `matrix_multiply_ikj_thread` were called serially, rather than concurrently. The `matrix_multiply_ikj_thread` function must synchronize with `matrix_multiply_ikj`’s reuse of `m`.
QUESTION 12B. Write C code showing how the problem could be fixed with changes only to `matrix_multiply_ikj`. Refer to the numbered lines to indicate replacements and/or insertions. Use one or more additional heap allocations and no additional calls to pthread functions. Free any memory you allocate once it is safe to do so.

There are many solutions, but here’s one: we place each thread’s arguments in different, heap-allocated memory. This solves the synchronization issue by eliminating the memory reuse. New and changed lines are marked with `*`.
void matrix_multiply_ikj(double *c, const double *a, const double *b, size_t sz) {
(1) pthread_t *threads = (pthread_t *) malloc(sizeof(pthread_t) * sz);
* matrix_args *marr = (matrix_args *) malloc(sizeof(matrix_args) * sz);
(2) for (size_t i = 0; i < sz; ++i) {
(3) matrix_args m = { c, a, b, sz, i };
* marr[i] = m;
* int r = pthread_create(&threads[i], NULL, &matrix_multiply_ikj_thread,
* &marr[i]);
(5) assert(r == 0);
(6) }
(7) for (size_t i = 0; i < sz; ++i)
(8) pthread_join(threads[i], NULL);
(9) free(threads);
* free(marr);
}
On single-core machines, the kij order performs almost as fast as the ikj order. Here’s a version of the parallel matrix multiplication code that uses kij.
typedef struct matrix_args_kij {
double* c;
const double* a;
const double* b;
size_t sz;
size_t k;
} matrix_args_kij;
void* matrix_multiply_kij_thread(void* arg) {
(α) matrix_args_kij* m = (matrix_args_kij*) arg;
(β) for (size_t i = 0; i < m->sz; ++i)
(γ) for (size_t j = 0; j < m->sz; ++j)
(δ) MELT(m->c, m->sz, i, j) += MELT(m->a, m->sz, i, m->k) * MELT(m->b, m->sz, m->k, j);
(ε) return NULL;
}
void matrix_multiply_kij(double* c, const double* a, const double* b, size_t sz) {
(1) pthread_t* threads = (pthread_t*) malloc(sizeof(pthread_t) * sz);
(2) for (size_t i = 0; i < sz; ++i)
(3) for (size_t j = 0; j < sz; ++j)
(4) MELT(c, sz, i, j) = 0;
(5) for (size_t k = 0; k < sz; ++k) {
(6) matrix_args_kij m = { c, a, b, sz, k };
(7) int r = pthread_create(&threads[k], NULL, &matrix_multiply_kij_thread, &m);
(8) assert(r == 0);
(9) }
(10) for (size_t k = 0; k < sz; ++k)
(11) pthread_join(threads[k], NULL);
(12) free(threads);
}
This version has the same problem as the previous one, plus another problem. Even after your fix from 12B is applied, this version produces incorrect results.
QUESTION 12C. What is the new problem? Describe why it is a synchronization issue.
The new problem is that now, different `matrix_multiply_kij_thread` threads might try to modify the same matrix element at the same time. This is a synchronization issue because the concurrent access to a matrix element might cause some updates to get lost. In the previous version, each `matrix_multiply_ikj_thread` thread worked on a different matrix row, so no synchronization was required.
QUESTION 12D. Write pseudocode or C code that fixes this problem. You should refer to pthread functions. For full credit your solution should have low contention.
The simplest solutions introduce mutual exclusion locking. This means different threads can’t modify the same matrix element simultaneously. To reduce contention, the locking should be fine-grained—for instance, there could be one lock per matrix element. But other tradeoffs are possible; one lock per matrix element is a lot; you could instead have one lock per matrix row or column.
Here’s working code, including the fix for the `matrix_args` reuse problem. (We didn’t require working code.)
typedef struct matrix_args_kij {
double *c;
const double *a;
const double *b;
* pthread_mutex_t *locks;
size_t sz;
size_t k;
} matrix_args_kij;
void *matrix_multiply_kij_thread(void *arg) {
matrix_args_kij *m = (matrix_args_kij *) arg;
for (size_t i = 0; i < m->sz; ++i)
for (size_t j = 0; j < m->sz; ++j) {
* pthread_mutex_lock(&m->locks[i * m->sz + j]);
MELT(m->c, m->sz, i, j) += MELT(m->a, m->sz, i, m->k) * MELT(m->b, m->sz, m->k, j);
* pthread_mutex_unlock(&m->locks[i * m->sz + j]);
}
return NULL;
}
void matrix_multiply_kij(double *c, const double *a, const double *b, size_t sz) {
pthread_t *threads = (pthread_t *) malloc(sizeof(pthread_t) * sz);
* matrix_args_kij *marr = (matrix_args_kij *) malloc(sizeof(matrix_args_kij) * sz);
* pthread_mutex_t *locks = (pthread_mutex_t *) malloc(sizeof(pthread_mutex_t) * sz * sz);
for (size_t i = 0; i < sz; ++i)
for (size_t j = 0; j < sz; ++j) {
MELT(c, sz, i, j) = 0;
* pthread_mutex_init(&locks[i * sz + j], NULL);
}
for (size_t k = 0; k < sz; ++k) {
* matrix_args_kij m = { c, a, b, locks, sz, k };
* marr[k] = m;
* int r = pthread_create(&threads[k], NULL, &matrix_multiply_kij_thread,
&marr[k]);
assert(r == 0);
}
for (size_t k = 0; k < sz; ++k)
pthread_join(threads[k], NULL);
free(threads);
* free(marr);
* free(locks);
}
13. Synchronization and concurrency
Most synchronization objects have at least two operations. Mutual-exclusion locks support lock and unlock; condition variables support wait and signal; and from section notes you may remember the semaphore synchronization object, one of the earliest synchronization objects ever invented, which supports P and V.
In this problem, you’ll work with a synchronization object with only one operation, which we call a hemiphore. Hemiphores behave like the following; it is very important that you understand this pseudocode.
typedef struct hemiphore {
int value;
} hemiphore;
// Initialize the hemiphore to value 0.
void hemiphore_init(hemiphore* h) {
h->value = 0;
}
// Block until the hemiphore has value >= bound, then ATOMICALLY increment its value by delta.
void H(hemiphore* h, int bound, int delta) {
// This is pseudocode; a real hemiphore implementation would block, not spin, and would
// ensure that the test and the increment happen in one atomic step.
while (h->value < bound)
sched_yield();
h->value += delta;
}
Once a hemiphore is initialized with `hemiphore_init`, application code should access the hemiphore only through the `H` operation.
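For intuition, the semaphore operations mentioned at the start of this section map directly onto `H` (our observation, not part of the question):

void P(hemiphore* h) { H(h, 1, -1); }        // block until value >= 1, then take one unit
void V(hemiphore* h) { H(h, INT_MIN, 1); }   // bound INT_MIN never blocks; release one unit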
QUESTION 13A. Use hemiphores to implement mutual-exclusion locks. Fill out the code below. (You may not need to fill in every empty slot. You may use standard C constants; for example, `INT_MIN` is the smallest possible value for a variable of type `int`, which on a 32-bit machine is −2147483648.)
typedef struct mutex {
hemiphore h;
} mutex;
void mutex_init(mutex* m) {
hemiphore_init(&m->h);
}
void mutex_lock(mutex* m) {
    // Wait until the hemiphore reaches 0 (unlocked), then atomically drop it to -1 (locked).
    H(&m->h, 0, -1);
}
void mutex_unlock(mutex* m) {
    // The value is -1 while locked, so bound -1 never blocks; restore the value to 0.
    H(&m->h, -1, 1);
}
QUESTION 13B. Use hemiphores to implement condition variables. Fill out the code below. You may assume that the implementation of `mutex` is your hemiphore-based implementation from above (so, for instance, `cond_wait` may access the hemiphore `m->h`). See the Hints at the end of the question.
typedef struct condvar {
mutex m;
hemiphore h;
int nwaiting;
} condvar;
void cond_init(condvar* c) {
mutex_init(&c->m);
hemiphore_init(&c->h);
c->nwaiting = 0;
}
void cond_signal(condvar* c) {
mutex_lock(&c->m);
if (c->nwaiting > 0) {
H(&c->h, INT_MIN, 1);
--c->nwaiting;
}
mutex_unlock(&c->m);
}
void cond_wait(condvar* c, mutex* m) {
mutex_lock(&c->m);
++c->nwaiting;
mutex_unlock(&c->m);
mutex_unlock(m);
H(&c->h, 0, -1);
mutex_lock(m);
}
The `nwaiting` variable ensures that `cond_signal(c)` does nothing if no one is waiting. The `mutex_unlock(m)` must happen before `H` (which can block); it must happen after `mutex_lock(&c->m)` (to avoid sleep-wakeup races).

The following code is broken if no one is waiting when `signal` is called, because it “stores” the signal for later. It works otherwise, though—it even avoids sleep-wakeup races.
typedef struct condvar {
hemiphore h;
} condvar;
void cond_init(condvar* c) {
hemiphore_init(&c->h);
}
void cond_signal(condvar* c) {
H(&c->h, INT_MIN, 1);
}
void cond_wait(condvar* c, mutex* m) {
mutex_unlock(m);
H(&c->h, 0, -1);
mutex_lock(m);
}
Hints. For full credit:

- If no thread is waiting on condition variable `c`, then `cond_signal(c)` should do nothing.
- Assume N threads are waiting on condition variable `c`. Then N calls to `cond_signal(c)` are both necessary and sufficient to wake them all up.
- Your solution must not add new sleep-wakeup race conditions to the user’s code. (That is, no sleep-wakeup race conditions unless the user uses mutexes incorrectly.)
QUESTION 13C. Use pthread mutexes and condition variables to implement hemiphores. Fill out the code below. See the hints after the question.
typedef struct hemiphore {
pthread_mutex_t m;
int value;
pthread_cond_t c;
} hemiphore;
void hemiphore_init(hemiphore* h) {
pthread_mutex_init(&h->m, NULL);
h->value = 0;
pthread_cond_init(&h->c, NULL);
}
void H(hemiphore* h, int bound, int delta) {
pthread_mutex_lock(&h->m);
while (h->value < bound)
pthread_cond_wait(&h->c, &h->m);
h->value += delta;
pthread_cond_broadcast(&h->c);
pthread_mutex_unlock(&h->m);
}
The `pthread_mutex_lock`s protect `h->value` from access conflicts. It is not correct to simply `pthread_cond_signal(&h->c)`, since different waiters might be waiting for different bounds (−1). You don’t need to broadcast/signal each time; if `delta <= 0` there’s no point.
Hints. The pthread mutex and condition variable operations have the following signatures. You should pass NULL for any attributes arguments. Don’t worry about the `pthread_mutex_destroy` and `pthread_cond_destroy` operations, and feel free to abbreviate (e.g. “`lock`” instead of “`pthread_mutex_lock`”).
pthread_mutex_init(pthread_mutex_t* m, const pthread_mutexattr_t* attributes)
pthread_mutex_lock(pthread_mutex_t* m)
pthread_mutex_unlock(pthread_mutex_t* m)
pthread_cond_init(pthread_cond_t* c, const pthread_condattr_t* attributes)
pthread_cond_signal(pthread_cond_t* c) (wakes up at most one thread waiting on c)
pthread_cond_broadcast(pthread_cond_t* c) (wakes up all threads waiting on c)
pthread_cond_wait(pthread_cond_t* c, pthread_mutex_t* m)
QUESTION 13D. Consider the following two threads, which use a shared hemiphore `h` with initial value 0.

Thread 1:

H(&h, 1000, 1);
printf("Thread 1 done\n");

Thread 2:

while (1) {
    H(&h, 0, 1);
    H(&h, 0, -1);
}
Thread 2 will never block, and the hemiphore’s value will alternate between 1 and 0. Thread 1 will never reach the `printf`, because the hemiphore’s value never reaches 1000. However, in most people’s first implementation of hemiphores using pthread mutexes and condition variables, Thread 1 will not block. Every call to `H` in Thread 2 will effectively wake up Thread 1. Though Thread 1 will then check the hemiphore’s value and immediately go back to sleep, doing so wastes CPU time.
Design an implementation of hemiphores using pthread mutexes and condition variables that solves this problem. In your revised implementation, Thread 1 above should block forever. For full credit, write C code. For partial credit, write pseudocode or English describing your design.
Hint. One working implementation constructs a linked list of “waiter” objects, where each waiter object is on a different thread’s stack, as initially sketched below. You can use such objects or not as you please.
This is a tough one.
typedef struct hemiphore_waiter {
struct hemiphore_waiter* next;
int bound;
pthread_cond_t c;
} hemiphore_waiter;
typedef struct hemiphore {
pthread_mutex_t m;
int value;
hemiphore_waiter* waiters;
} hemiphore;
void hemiphore_init(hemiphore* h) {
pthread_mutex_init(&h->m, NULL);
h->value = 0;
h->waiters = NULL;
}
void H(hemiphore* h, int bound, int delta) {
hemiphore_waiter w;
w.bound = bound;
pthread_cond_init(&w.c, NULL); // no points off if missing
pthread_mutex_lock(&h->m);
while (h->value < bound) {
w.next = h->waiters;
h->waiters = &w;
pthread_cond_wait(&w.c, &h->m);
}
h->value += delta;
// no points off for linked list messups
for (hemiphore_waiter** ww = &h->waiters; *ww; )
if (h->value >= (*ww)->bound) {
pthread_cond_signal(&(*ww)->c);
*ww = (*ww)->next;
} else
ww = &(*ww)->next;
pthread_mutex_unlock(&h->m);
}
Here’s a substantial-partial-credit solution that tracks the largest bound any waiter is waiting for.
typedef struct hemiphore {
pthread_mutex_t m;
int value;
int max_bound;
pthread_cond_t c;
} hemiphore;
void hemiphore_init(hemiphore* h) {
pthread_mutex_init(&h->m, NULL);
h->value = 0;
h->max_bound = INT_MIN;
pthread_cond_init(&h->c, NULL);
}
void H(hemiphore* h, int bound, int delta) {
pthread_mutex_lock(&h->m);
while (h->value < bound) {
if (h->max_bound < bound)
h->max_bound = bound;
pthread_cond_wait(&h->c, &h->m);
}
h->value += delta;
if (h->value >= h->max_bound) {
h->max_bound = INT_MIN;
pthread_cond_broadcast(&h->c);
}
pthread_mutex_unlock(&h->m);
}
14. Miscellany
QUESTION 14A. True or false: Any C arithmetic operation has a well-defined result.
False
QUESTION 14B. True or false: Any x86 processor instruction has a well-defined result.
True
QUESTION 14C. True or false: By executing a trap instruction, a process can force an operating system kernel to execute arbitrary code.
False
QUESTION 14D. True or false: By manipulating process memory and registers, an operating system kernel can force a process to execute arbitrary instructions.
True
QUESTION 14E. True or false: All signals are sent explicitly via the
kill()
system call.
False
QUESTION 14F. True or false: An operating system’s buffer cache is generally fully associative.
True
QUESTION 14G. True or false: The least-recently-used eviction policy is more useful for very large files that are read sequentially than it is for stacks.
False
QUESTION 14H. True or false: Making a cache bigger can lower its hit rate for a given workload.
True
QUESTION 14I. True or false: x86 processor caches are coherent (i.e., always appear to contain the most up-to-date values).
True
QUESTION 14J. True or false: A socket file descriptor supports either reading or writing, but not both.
False; it supports both