Final Sample Question Solutions

The final will be cumulative, though it will be weighted more towards the second half of the class. So why not check out:

This bank of questions is taken from prior midterms and finals. The course does change from year to year, so some of the questions may refer to concepts we did not emphasize this year, and some concepts we did emphasize this year may not be represented here.

The final will 3 hours long. It will be open-note, open-book, open-computer, semiopen-network, using rules very similar to those in the midterm.

FUN-1. Computer arithmetic

Bitwise operators and computer arithmetic can represent vectors of bits, which in turn are useful for representing sets. For example, say we have a function bit that maps elements to distinct bits; thus, bit(X) == (1 << u) for some u. Then a set {X0, X1, X2, …, Xn} can be represented as bit(X0) | bit(X1) | bit(X2) | … | bit(Xn). Element Xi is in the set with representation n if and only if (bit(Xi) & n) != 0.

QUESTION FUN-1A. What is the maximum number of set elements that can be represented in a single unsigned variable on an x86 machine?

QUESTION FUN-1B. Match each set operation with the C operator(s) that could implement that operation. (Complement is a unary operation.)

intersection	`==`
equality	`~`
complement	`&`
union	`^`
toggle membership (flip whether an element is in the set)	`\|`


intersection	`&`
equality	`==`
complement	`~`
union	`
toggle membership	`^`

QUESTION FUN-1C. Complete this function, which should return the set difference between the sets with representations a and b. This is the set containing exactly those elements of set a that are not in set b.

unsigned set_difference(unsigned a, unsigned b) {

return a & ~b;

OR   return a - (a & b);

OR return a & ~(a & b);

QUESTION FUN-1D. Below we’ve given a number of C expressions, some of their values, and some of their set representations for a set of elements. For example, the first row says that the integer value of expression 0 is just 0, which corresponds to an empty set. Fill in the blanks. This will require figuring out which bits correspond to the set elements A, B, C, and D, and the values for the 32-bit int variables a, x, and s. No arithmetic operation overflows; abs(x) returns the absolute value of x (that is, x < 0 ? -x : x).

Expression `e`	Integer value	Represented set
`0`	0	`{}`
`a == a`	1	`{A}`
`(unsigned) ~a < (unsigned) a`	1	`{A}`
`a < 0`	1	`{A}`
`(1 << (s/2)) - 1`	15	`{A,B,C,D}`
`a * a`	4	`{C}`
`abs(a)`	2	`{D}`
`x & (x - 1)`	0	`{}`
`x - 1`	3	`{A,D}`
`x`	4	`{C}`
`s`	8	`{B}`

FUN-2. Bit Tac Toe

Brenda Bitdiddle is implementing tic-tac-toe using bitwise arithmetic. Her implementation starts like this:

typedef struct {
    unsigned moves[2];
} tictactoe;
#define XS 0
#define OS 1

void tictactoe_init(tictactoe* b) {
    b->moves[XS] = b->moves[OS] = 0;
}

static const unsigned ttt_values[3][3] = {
    { 0x001, 0x002, 0x004 },
    { 0x010, 0x020, 0x040 },
    { 0x100, 0x200, 0x400 }
};

`     // Mark a move by player `p` at row `row` and column `col`. `
`     // Return 0 on success; return –1 if position `row,col` has already been used. `
    int tictactoe_move(tictactoe* b, int p, int row, int col) {

1. assert(row >= 0 && row < 3 && col >= 0 && col < 3); 2. assert(p == XS || p == OS); 3. /* TODO: check for position reuse */ 4. b->moves[p] |= ttt_values[row][col]; 5. return 0;

Each position on the board is assigned a distinct bit.

QUESTION FUN-2A. Brenda’s current code doesn’t check whether a move reuses a position. Write a snippet of C code that returns –1 if an attempted move is reusing a position. This snippet will replace line 3.

Lots of people misinterpreted this to mean the player reused their own position and ignored the other player. That mistake is allowed (no points off).

if ((b->moves[XS] | b->moves[OS]) & ttt_values[row][col])
    return -1;
OR if ((b->moves[XS] | b->moves[OS] | ttt_values[row][col]) == (b->moves[XS] | b->moves[OS]))
       return -1;
OR if ((b->moves[XS] + b->moves[OS]) & ttt_values[row][col])
       return -1;
OR if ((b->moves[p] ^ ttt_values[row][col]) < b->moves[p])
       return -1;

etc.

QUESTION FUN-2B. Complete the following function. You may use the following helper function:

int popcount(unsigned n) Return the number of 1 bits in n. (Stands for “population count”; is implemented on recent x86 processors by a single instruction, popcnt.)

For full credit, your code should consist of a single “return” statement with a simple expression, but for substantial partial credit write any correct solution.

// Return the number of moves that have happened so far.
int tictactoe_nmoves(const tictactoe* b) {

return popcount(b->moves[XS] | b->moves[OS]);

QUESTION FUN-2C. Write a simple expression that, if nonzero, indicates that player XS has a win on board b across the main diagonal (has marks in positions 0,0, 1,1, and 2,2).

(b->moves[XS] & 0x421) == 0x421

Lydia Davis notices Brenda’s code and has a brainstorm. “If you use different values,” she suggests, “it becomes easy to detect any win.” She suggests:

static const unsigned ttt_values[3][3] = {
    { 0x01001001, 0x00010002, 0x10100004 },
    { 0x00002010, 0x22020020, 0x00200040 },
    { 0x40004100, 0x00040200, 0x04400400 }
};

QUESTION FUN-2D. Repeat Question 1A for Lydia’s values: Write a snippet of C code that returns –1 if an attempted move is reusing a position. This snippet will replace line 3 in Brenda’s code.

if ((b->moves[XS] | b->moves[OS]) & ttt_values[row][col])
    return -1;

QUESTION FUN-2E. Repeat Question 1B for Lydia’s values: Use popcount to complete tictactoe_nmoves.

int tictactoe_nmoves(const tictactoe* b) {

return popcount((b->moves[0] | b->moves[1]) & 0x777);

    `**`OR`**` return popcount((b->moves[0] | b->moves[1]) & 0x777000);

}

QUESTION FUN-2F. Complete the following function for Lydia’s values. For full credit, your code should consist of a single “return” statement containing exactly two constants, but for substantial partial credit write any correct solution.

` // Return nonzero if player `p` has won, 0 if `p` has not won. `
int tictactoe_check_win(const tictactoe* b, int p) {
    assert(p == XS || p == OS);

return (b->moves[p] + 0x11111111) & 0x88888888;

// Here’s another amazing possibility (Allen Chen and others):

return b->moves[p] & (b->moves[p] << 1) & (b->moves[p] << 2);

FUN-3. Data Representation

Write the value of the variable or expression in each problem -- use signed decimal representation.

For example, if we gave you:

int i = 0xA;
int j = 0xFFFFFFFF;

you would write A) 10 B) -1

QUESTION FUN-3A. int i = 0xFFFF; (You may write this either in decimal or as an expression using a power of 2)

2¹⁶ − 1 or 65535

QUESTION FUN-3B. short s = 0xFFFF; (You may write this either in decimal or as an expression using a power of 2)

−1

QUESTION FUN-3C. unsigned u = 1 \<\< 10;

1024 (or 2¹⁰).

QUESTION FUN-3D. From WeensyOS: unsigned long l = PTE_P \| PTE_U;

QUESTION FUN-3E. int j = ~0;

−1

QUESTION FUN-3F. From WeensyOS: sizeof(x86_pagetable);

4096 or 2¹²

QUESTION FUN-3G. Given this structure:

struct s {
    char c;
    short s;
    long l;
};
struct s *ps;

This expression: sizeof(ps);

TRICK QUESTION! 8

QUESTION FUN-3H. Using the structure above: sizeof(\*ps);

QUESTION FUN-3I. unsigned char u = 0xABC;

0xBC == 11*16 + 12 == 160 + 16 + 12 == 188

QUESTION FUN-3J. signed char c = 0xABC;

0xBC has bit 0 on, so the value is less than zero. We seek the value x so that 0xBC + x == 0x100; the answer will equal −x. The answer is 0x44: 0xBC + 4 == 0xC0, and 0xC0 + 0x40 == 0x100. So −0x44 == −4*16 − 4 == −68.

FUN-4. Memory and Pointers

Two processes are mapping a file into their address space. The mapped file contains an unsorted linked list of integers. As the processes cannot ensure that the file will be mapped at the same virtual address, they use relative pointers to link elements in the list. A relative pointer holds not an address, but an offset that user code can use to calculate a true address. Our processes define the offset as relative to the start of the file.

Thus, each element in the linked list is represented by the following structure:

struct ll_node {
    int value;
    size_t offset;
};

offset == (size_t) -1 indicates the end of the list. Other offset values represent the position of the next item in the list, calculated relative to the start of the file.

QUESTION FUN-4A. Write a function to find an item in the list. The function's prototype is:

struct ll_node* find_element(void* mapped_file, struct ll_node* list, int value);

The mapped_file parameter is the address of the mapped file data; the list parameter is a pointer to the first node in the list; and the value parameter is the value for which we are searching. The function should return a pointer to the linked list element if the value appears in the list or NULL if the value is not in the list.

struct ll_node* find_element(void* mapped_file, struct ll_node* list, int value) {
    while (1) {
        if (list->value == value)
            return list;
        if (list->offset == (size_t) -1)
            return NULL;
        list = (struct ll_node*) ((char*) mapped_file + list->offset);
    }
}

ASM-1. Data structure assembly

Here are four assembly functions, f1 through f4.

f1:
    movl    4(%esp), %eax
    movl    8(%esp), %ecx
    testl   %ecx, %ecx
    jle .L2
    xorl    %edx, %edx
.L3:
    movl    4(%eax), %eax
    incl    %edx
    cmpl    %ecx, %edx
    jne .L3
.L2:
    movl    (%edx), %eax
    ret

f2:
    movl    8(%esp), %edx
    leal    0(,%edx,4), %ecx
    movl    4(%esp), %eax
    movl    (%eax,%ecx), %eax
    addl    %ecx, %eax
    movl    (%eax), %eax
    ret

f3:
    pushl   %esi
    pushl   %ebx
    movl    12(%esp), %ecx
    movl    16(%esp), %esi
    movl    20(%esp), %eax
    testl   %esi, %esi
    jle .L9
    xorl    %edx, %edx
.L10:
    movl    %eax, %ebx
    andl    $1, %ebx
    movl    4(%ecx,%ebx,4), %ecx
    incl    %edx
    sarl    %eax
    cmpl    %esi, %edx
    jne .L10
.L9:
    movl    (%ecx), %eax
    popl    %ebx
    popl    %esi
    ret

f4:
    movl    8(%esp), %edx
    movl    4(%esp), %eax
    movl    (%eax,%edx,4), %eax
    ret

QUESTION ASM-1A. Each function returns a value loaded from some data structure. Which function uses which data structure?

Array
Array of pointers to arrays
Linked list
Binary tree

Array—f4; Array of pointers to arrays—f2; Linked list—f1; Binary tree—f3

QUESTION ASM-1B. The array data structure is an array of type T. Considering the code for the function that manipulates the array, which of the following types are likely possibilities for T? Circle all that apply.

char
int
unsigned long
unsigned long long
char*
None of the above

int`, `unsigned long`, `char *

ASM-2. Where’s Waldo?

In the following questions, we give you C code and a portion of the assembly generated by some compiler for that code. (Sometimes we blank out a part of the assembly.) The C code contains a variable, constant, or function called waldo, and a point in the assembly is marked with asterisks ***. Your job is to find Waldo: write an assembly expression or constant that holds the value of waldo at the marked point. We’ve done the first one for you.

NON-QUESTION: Where’s Waldo?

int identity(int waldo) {
    return waldo;
}

00000000004007f6 `<identity>`:
  4007f6:       55                      push   %rbp
  4007f7:       48 89 e5                mov    %rsp,%rbp
  4007fa:       89 7d fc                mov    %edi,-0x4(%rbp)
  4007fd:       8b 45 fc                mov    -0x4(%rbp),%eax

***

  400800:       5d                      pop    %rbp
  400801:       c3                      retq

ANSWER: %edi, -0x4(%rbp), %eax, and %rax all hold the value of waldo at the marked point, so any of them is a valid answer. If the asterisks came before the first instruction, only %edi would work.

QUESTION ASM-2A: Where’s Waldo?

int f1(int a, int b, int waldo, int d) {
    if (a > b)
        return waldo;
    else
        return d;
}

0000000000400802 `<f1>`:

***

  400802:       55                      push   %rbp
  400803:       48 89 e5                mov    %rsp,%rbp
  400806:       89 7d fc                mov    %edi,-0x4(%rbp)
  400809:       89 75 f8                mov    %esi,-0x8(%rbp)
  40080c:       89 55 f4                mov    %edx,-0xc(%rbp)
  40080f:       89 4d f0                mov    %ecx,-0x10(%rbp)
  400812:       8b 45 fc                mov    -0x4(%rbp),%eax
  400815:       3b 45 f8                cmp    -0x8(%rbp),%eax
  400818:       7e 05                   jle    40081f <f1+0x1d>
  40081a:       8b 45 f4                mov    -0xc(%rbp),%eax
  40081d:       eb 03                   jmp    400822 <f1+0x20>
  40081f:       8b 45 f0                mov    -0x10(%rbp),%eax
  400822:       5d                      pop    %rbp
  400823:       c3                      retq

%edx

QUESTION ASM-2B: Where’s Waldo?

int int_array_get(int* a, int waldo) {
    int x = a[waldo];
    return x;
}

00000000004007d9 `<int_array_get>`:
INSTRUCTIONS OMITTED

***

 4007dc:       8b 04 b7                mov    (%rdi,%rsi,4),%eax
 4007df:       c3                      retq

%rsi

QUESTION ASM-2C: Where’s Waldo?

int matrix_get(int** matrix, int row, int col) {
    int* waldo = matrix[row];
    return waldo[col];
}

00000000004007e0 `<matrix_get>`:
 4007e0:       48 63 f6                movslq %esi,%rsi
 4007e3:       48 63 d2                movslq %edx,%rdx

***

 4007e6:       ?? ?? ?? ??             mov    ??,%rax
 4007ea:       8b 04 90                mov    (%rax,%rdx,4),%eax
 4007ed:       c3                      retq

(%rdi,%rsi,8)

QUESTION ASM-2D: Where’s Waldo?

int f5(int x) {
    extern int waldo(int);
    return waldo(x * 45);
}

0000000000400be0 `<f5>`:

***

 400be0:       6b ff 2d                imul   $0x2d,%edi,%edi
 400be3:       eb eb                   jmp    400bd0

0x400bd0

QUESTION ASM-2E: Where’s Waldo?

int factorial(int waldo) {
    if (waldo < 2)
        return 1;
    else
        return waldo * factorial(waldo - 1);
}

    0000000000400910 `<factorial>`:
     400910:       83 ff 01                cmp    $0x1,%edi
     400913:       b8 01 00 00 00          mov    $0x1,%eax
     400918:       7e 13                   jle    .L2 <factorial+0x1b>
     40091a:       [6 bytes of padding (a no-op instruction)]

.L1: ***

     400920:       0f af c7                imul   %edi,%eax
     400923:       83 ef 01                sub    $0x1,%edi
     400926:       83 ff 01                cmp    $0x1,%edi
     400929:       75 f5                   jne    .L1 <factorial+0x10>
.L2: 40092b:       f3 c3                   repz retq

%edi

QUESTION ASM-2F: Where’s Waldo?

Currently using 32-bit assembly

int binary_search(const char* needle, const char** haystack, unsigned sz) {
    unsigned waldo = 0, r = sz;
    while (waldo < r) {
        unsigned m = waldo + ((r - waldo) >> 1);
        if (strcmp(needle, haystack[m]) < 0)
            r = m;
        else if (strcmp(needle, haystack[m]) == 0)
            waldo = r = m;
        else
            waldo = m + 1;
    }
    return waldo;
}

80484ab `<binary_search>`:
     INSTRUCTIONS OMITTED
.L1: 80484c3:       89 fe                   mov    %edi,%esi
     80484c5:       29 de                   sub    %ebx,%esi
     80484c7:       d1 ee                   shr    %esi
     80484c9:       01 de                   add    %ebx,%esi
     80484cb:       8b 44 b5 00             mov    0x0(%ebp,%esi,4),%eax
     80484cf:       89 44 24 04             mov    %eax,0x4(%esp)
     80484d3:       8b 44 24 30             mov    0x30(%esp),%eax
     80484d7:       89 04 24                mov    %eax,(%esp)
     80484da:       e8 11 fe ff ff          call   80482f0 <strcmp@plt>
     80484df:       85 c0                   test   %eax,%eax
     80484e1:       78 09                   js     .L2 <binary_search+0x41>
     80484e3:       85 c0                   test   %eax,%eax
     80484e5:       74 13                   je     80484fa <binary_search+0x4f>

***

     80484e7:       8d 5e 01                lea    0x1(%esi),%ebx
     80484ea:       eb 02                   jmp    .L3 <binary_search+0x43>
.L2: 80484ec:       89 f7                   mov    %esi,%edi
.L3: 80484ee:       39 df                   cmp    %ebx,%edi
     80484f0:       77 d1                   ja     .L1 <binary_search+0x18>
     INSTRUCTIONS OMITTED

%ebx

In the remaining questions, you are given assembly compiled from one of the above functions by a different compiler, or at a different optimization level. Your goal is to figure out what C code corresponds to the given assembly.

QUESTION ASM-2G:

Currently using 32-bit assembly

804851d `<waldo>`:
804851d:       55                      push   %ebp
804851e:       89 e5                   mov    %esp,%ebp
8048520:       83 ec 18                sub    $0x18,%esp
8048523:       83 7d 08 01             cmpl   $0x1,0x8(%ebp)
8048527:       7f 07                   jg     8048530
8048529:       b8 01 00 00 00          mov    $0x1,%eax
804852e:       eb 10                   jmp    8048540
8048530:       8b 45 08                mov    0x8(%ebp),%eax
8048533:       48                      dec    %eax
8048534:       89 04 24                mov    %eax,(%esp)
8048537:       e8 e1 ff ff ff          call   804851d
804853c:       0f af 45 08             imul   0x8(%ebp),%eax
8048540:       c9                      leave  
8048541:       c3                      ret

What’s Waldo? Circle one.

f1
f5

matrix_get
permutation_compare

factorial
binary_search

5. factorial

QUESTION ASM-2H:

Currently using 32-bit assembly

8048425 `<waldo>`:
8048425:       55                      push   %ebp
8048426:       89 e5                   mov    %esp,%ebp
8048428:       8b 45 08                mov    0x8(%ebp),%eax
804842b:       3b 45 0c                cmp    0xc(%ebp),%eax
804842e:       7e 05                   jle    8048435 <waldo+0x10>
8048430:       8b 45 10                mov    0x10(%ebp),%eax
8048433:       eb 03                   jmp    8048438 <waldo+0x13>
8048435:       8b 45 14                mov    0x14(%ebp),%eax
8048438:       5d                      pop    %ebp
8048439:       c3                      ret

What’s Waldo? Circle one.

f1
f5

matrix_get
permutation_compare

factorial
binary_search

1. f1

QUESTION ASM-2I:

00000000004008b4 `<waldo>`:
 4008b4:       55                      push   %rbp
 4008b5:       48 89 e5                mov    %rsp,%rbp
 4008b8:       48 83 ec 10             sub    $0x10,%rsp
 4008bc:       89 7d fc                mov    %edi,-0x4(%rbp)
 4008bf:       8b 45 fc                mov    -0x4(%rbp),%eax
 4008c2:       6b c0 2d                imul   $0x2d,%eax,%eax
 4008c5:       89 c7                   mov    %eax,%edi
 4008c7:       e8 9e 05 00 00          callq  400e6a
 4008cc:       c9                      leaveq 
 4008cd:       c3                      retq   
80484a1 `<waldo>`:
80484a1:       55                      push   %ebp
80484a2:       89 e5                   mov    %esp,%ebp
80484a4:       83 ec 18                sub    $0x18,%esp
80484a7:       8b 55 08                mov    0x8(%ebp),%edx
80484aa:       89 d0                   mov    %edx,%eax
80484ac:       c1 e0 02                shl    $0x2,%eax
80484af:       01 d0                   add    %edx,%eax
80484b1:       01 c0                   add    %eax,%eax
80484b3:       01 d0                   add    %edx,%eax
80484b5:       c1 e0 02                shl    $0x2,%eax
80484b8:       01 d0                   add    %edx,%eax
80484ba:       89 04 24                mov    %eax,(%esp)
80484bd:       e8 2b 01 00 00          call   80485ed
80484c2:       c9                      leave  
80484c3:       c3                      ret

What’s Waldo? Circle one.

f1
f5

matrix_get
permutation_compare

factorial
binary_search

2. f5

ASM-3. Disassembly I

Here’s some assembly produced by compiling a C program with gcc.

.LC1:
    .string "%d %d\n"

        .globl  f
        .type   f, @function
f:
        pushq   %rbp
        movl    $1, %ecx
.L7:
        movl    %ecx, %r8d
        movl    $1, %edx
        imull   %ecx, %r8d
.L2:
        movl    %edx, %esi
        leal    (%rdx,%rcx), %edi
        movl    $1, %eax
        imull   %edx, %esi
        addl    %r8d, %esi
.L6:
        cmpl    %edi, %eax
        jg      .L10
        movl    %eax, %r9d
        imull   %eax, %r9d
        cmpl    %r9d, %esi
        je      .L3
        incl    %eax
        jmp     .L6
.L10:
        incl    %edx
        cmpl    %edx, %ecx
        jge     .L2
        incl    %ecx
        jmp     .L7
.L3:
        pushq   %rax
        movl    $.LC0, %esi
        movl    $1, %edi
        xorl    %eax, %eax
        call    __printf_chk
        movl    $1, %eax
        popq    %rdx
        popq    %rbp
        ret

QUESTION ASM-3A. How many arguments might this function have? Circle all that apply.

0
1
2
3 or more

All

QUESTION ASM-3B. What might this function return? Circle all that apply.

0
1
−1
Its first argument, whatever that argument is
A square number other than 0 or 1
None of the above

Choice #2 (1)

QUESTION ASM-3C. Of these registers, which are callee-saved registers that the function saves and restores? Circle all that apply.

%rbx
%rcx
%rdx
%rbp
%rsi
%rdi
%r12
None of the above

%rbp only; of the others, only %rbx and %r12 are callee-saved

QUESTION 3D. This function handles signed integers. If we changed the C source to use unsigned integers instead, which instructions would change? Circle all that apply.

movl
imull
addl
cmpl
je
jge
popl
None of the above

jge

We accepted circled imull or not! Although x86 imull is signed, in fact as used in C it’s equivalent to mull, and gcc does use imull for unsigned multiplication here. From the Intel manuals:

“[These] forms [of imul] may also be used with unsigned operands because the lower half of the product is the same regardless if the operands are signed or unsigned. The CF and OF flags, however, cannot be used to determine if the upper half of the result is non-zero.”

QUESTION 3E. What might this function print? Circle all that apply.

0 0
1 1
3 4
4 5
6 8
None of the above

Choice #3 (3 4) only. The function searches for a solution to x² + y² == z², under the constraint that x ≤ y. When it finds one, it prints x and y and then returns. It always starts from 1 1 and increments x and y one at a time, so it can only print 3 4.

ASM-4. Disassembly II

The questions in this section concern a function called ensmallen, which has the following assembly.

 ensmallen:

1. movzbl (%rsi), %edx 2. testb %dl, %dl 3. movb %dl, (%rdi) 4. jne .L22 5. jmp .L23 6. .L18: 7. addq $1, %rsi 8. .L22: 9. movzbl (%rsi), %eax 10. cmpb %dl, %al 11. je .L18 12. addq $1, %rdi 13. testb %al, %al 14. movb %al, (%rdi) 15. je .L23 16. movl %eax, %edx 17. jmp .L22 18. .L23: 19. ret

QUESTION ASM-4A. How many arguments is this function likely to take? Give line numbers that helped you determine an answer.

2. Lines 1 & 3

QUESTION ASM-4B. Are the argument(s) pointers? Give line numbers that helped you determine an answer.

Yes. Lines 1, 3, 9, 14

QUESTION ASM-4C. What type(s) are the argument(s) likely to have? Give line numbers that helped you determine an answer.

unsigned char*. Lines 1, 3, 9, and 14 are byte-moving instructions. The z in movzbl (Lines 1 and 9) indicates zero-extension, i.e., unsigned char. But char* is possible too; the characters are only compared for equality with each other (Line 10) or zero (Lines 2/4 and 13/15), so we can’t really distinguish signed from unsigned.

QUESTION ASM-4D. Write a likely signature for the function. Use return type void.

void ensmallen(unsigned char* a, unsigned char* b)

QUESTION ASM-4E. Write an alternate likely signature for the function, different from your last answer. Again, use return type void.

void ensmallen(unsigned char* a, const unsigned char* b)
void ensmallen(char* a, char* b)
void ensmallen(void* dst, const void* src)
etc., etc.

QUESTION ASM-4F. Which callee-saved registers does this function use? Give line numbers that helped you determine an answer.

None except possibly %rsp (no callee-saved registers are referenced in the code).

QUESTION ASM-4G. The function has an “input” and an “output”. Give an “input” that would cause the CPU to jump from line 8 to label .L26, and describe what is placed in the “output” for that “input”.

The input is an empty string (""), and the function puts an empty string in the output.

You might think the function’s output was the value of its %eax register what it returned. But remember that functions without return values can also use %eax, and we told you above that this function’s return type is void! ensmallen’s “output” is most likely the string pointed to by its first parameter. In that sense ensmallen is sort of like strcpy or memcpy.

QUESTION ASM-4H. Give an “input” for which the corresponding “output” is not a copy of the “input”. Your answer must differ from the previous answer.

"aaaa" (output is "a"); any string that has adjacent characters that are the same

QUESTION ASM-4I. Write C code corresponding to this function. Make it as compact as you can.

void ensmallen(char* dst, const char* src) {
    while ((*dst = *src)) {
        while (*dst == *src)
            ++src;
        ++dst;
    }
}

Or, a little less compactly:

void ensmallen(char* dst, const char* src) {
    while (*src) {
        *dst = *src;
        while (*src == *dst)
            ++src;
        ++dst;
    }
    *dst = 0;
}

ASM-5. Machine programming

Intel really messed up this time. They’ve released a processor, the Fartium Core Trio, where every instruction is broken except the ones on this list.


1.	`cmpl %ecx, %edx`
2.	`decl %edx`
3.	`incl %eax`
4.	`je L1`
5.	`jl L2`
6.	`jmp L3`
7.	`movl 4(%esp), %ecx` `[movc]`
8.	`movl 8(%esp), %edx` `[movd]`
9.	`movl (%ecx,%eax,4), %ecx` `[movx]`
10.	`ret`
11.	`xchgl %eax, %ecx`
12.	`xorl %eax, %eax`

(In case you forgot, xchgl swaps two values—here, the values in two registers—without modifying condition codes.)

“So what if it’s buggy,” says Andy Grove; “it can still run programs.” For instance, he argues convincingly that this function:

 void do_nothing(void) {
 }

is implemented correctly by this Fartium instruction sequence:

ret

Your job is to implement more complex functions using only Fartium instructions. Your implementations must have the same semantics as the C functions, but may perform much worse than one might expect. You may leave off arguments and write instruction numbers (#1–12) or instruction names (for mov, use the bracketed abbreviations). Indicate where labels L1–L3 point (if you need them). Assume that on function entry, the stack is set up as on a normal x86.

QUESTION ASM-5A.

 int return_zero(void) {
     return 0;
 }

xorl %eax, %eax; ret. (#12, #10)

%eax has unknown value when a function begins, so we need to clear it.

QUESTION ASM-5B.

 int identity(int a) {
     return a;
 }

movl 4(%esp), %ecx; xchgl %eax, %ecx; ret. (#7, #11, #10)

At function entry, the value on the top of the stack, at (%esp) = 0(%esp), is the return address. Arguments are stored immediately above that, so 4(%esp) is the first argument.

QUESTION ASM-5C.

 void infinite_loop(void) {
     while (1)
         /* do nothing */;
 }

L3: jmp L3. (L3: #6)

QUESTION ASM-5D.

 typedef struct point {
     int x;
     int y;
     int z;
 } point;
 
 int extract_z(point *p) {
     return p->z;
 }

 movl 4(%esp), %ecx
 xorl %eax, %eax
 incl %eax
 incl %eax
 movl (%ecx,%eax,4), %ecx
 xchgl %eax, %ecx
 ret

(#7 #12 #3 #3 #9 #11 #10)

The value we want is located 8 bytes after the p pointer. In x86 assembly, this is written 8(%register_containing_p). Only one Fartium instruction could work, namely #9, movl (%ecx,%eax,4), %ecx. (The other indirect loads use %esp as a base, but we aren’t given an instruction that could modify %esp the way we need to.) This format uses the address %ecx + 4*%eax, so we must load %eax with 2.

So much for the easy ones. Now complete one out of the following 3 questions, or more than one for extra credit. (Question 5G is worth more than the others.)

QUESTION ASM-5E.

 int add(int a, int b) {
     return a + b;
 }

     movl 4(%esp), %ecx      # %ecx := a
     movl 8(%esp), %edx      # %edx := b
     xorl %eax, %eax         # %eax := 0
     xchgl %eax, %ecx        # now %eax == a and %ecx == 0
 L3: cmpl %ecx, %edx         # compare %edx and %ecx (which is 0)
     je L1                   # "if %edx == 0 goto L1"
     incl %eax               # ++%eax
     decl %edx               # --%edx
     jmp L3
 L1: ret

(#7 #8 #12 #11 L3: #1 #4 #3 #2 #6 L1: #10)

The loop at L3 executes b times, incrementing %eax each time. Here’s morally equivalent C:

 int add(int a, int b) {
     while (b != 0) {
         ++a; --b;
     }
     return a;
 }

This takes a long time if b < 0, but it does work! We saw many other correct answers. Common errors included comparing incrementing a and decrementing b, something like this:

 int add(int a, int b) {
     int result = a;
     while (b < a) {
         ++result; ++a; --b;
     }
     return a;
 }

But in this design a and b can pass each other, and result is incremented half as many times as it should be.

QUESTION ASM-5F.

 int array_dereference(int *a, int i) {
     return a[i];
 }

     movl 8(%esp), %edx        # %edx := i
     xorl %eax, %eax           # %eax := 0
 L3: xchgl %eax, %ecx
     cmpl %ecx, %edx           # compare %edx and %ecx
     xchgl %eax, %ecx
     je L1                     # "if %eax == i goto L1"
     incl %eax                 # ++%eax
     jmp L3
 L1: movl 4(%esp), %ecx        # %ecx := a
     movl (%ecx,%eax,4), %ecx  # %ecx := a[i]
     xchgl %eax, %ecx
     ret

(#8 #12 L3: #11 #1 #11 #4 #3 #6 L1: #7 #9 #11 #10)

QUESTION ASM-5G.

 int traverse_array_tree(int *a, int x) {
     int i = 0;
     while (1) {
         if (x == a[i])
             return i;
         else if (x < a[i])
             i = a[i+1];
         else
             i = a[i+2];
     }
 }

(This funky function traverses a binary tree that’s represented as an array of ints. It returns the position of the x argument in this “tree.” For example, given the following array:

 int a[] = {100, 3, 6,  50, 9, 12,  150, 0, 0,  25, 0, 0,  80, 0, 0};

the call traverse_array_tree(a, 100) returns 0, because that’s the position of 100 in a. The call traverse_array_tree(a, 80) first examines position 0; since a[0] == 100 and 80 < 100, it jumps to position a[0+1] == 3; since a[3] == 50 and 80 > 50, it jumps to position a[3+2] == 12; and then it returns 12, since a[12] == 80. The code breaks if x isn’t in the tree; don’t worry about that.)

     movl 8(%esp), %edx        # %edx := x
`      xorl %eax, %eax           # %eax := 0 (holds `i`) `
 L3: movl 4(%esp), %ecx        # %ecx := a
     movl (%ecx,%eax,4), %ecx  # %ecx := a[i]
     cmpl %ecx, %edx           # compare x and a[i]
     je L1                     # "if x == a[i] goto L1"
     jl L2                     # "if x < a[i] goto L2"
 
     incl %eax
     movl 4(%esp), %ecx
     movl (%ecx,%eax,4), %ecx
     xchgl %eax, %ecx          # i := a[i+1]
     jmp L3
 
 L2: incl %eax
     incl %eax
     movl 4(%esp), %ecx
     movl (%ecx,%eax,4), %ecx
     xchgl %eax, %ecx          # i := a[i+2]
     jmp L3
 
 L1: ret                       # return i

(#8 #12 L3: #7 #9 #1 #4 #5 #3 #7 #9 #11 #6 L2: #3 #3 #7 #9 #11 #6 L1: #10)

We accepted solutions that misinterpreted the order of arguments for cmpl. (In AT&T syntax, “cmpl %ecx, %edx” performs the subtraction %edx − %ecx, so after such a comparison, jl will branch if %edx < %ecx.)

ASM-6. Program Layout

For the following questions, select the part(s) of memory from the list below that best describes where you will find the object.

heap
stack
between the heap and the stack
in a read-only data segment
in a text segment
in a read/write data segment
in a register

Assume the following code, compiled without optimization.

#include <errno.h>
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// The following is copied from stdio.h for your reference
#define EOF (-1)

 1    unsigned long
 2    fib (unsigned long n)
 3    {
 4        if (n < 2)
 5            return (n);
 6        return (fib(n - 1) + fib(n - 2));
 7    }
 8
 9    int
10    main(int argc, char *argv[])
11    {
12        extern int optind;
13        char ch;
14        unsigned long f, n;
15
16        /* Command line processing. */
17        while ((ch = getopt(argc, argv, "h")) != EOF)
18            switch (ch) {
19            case 'h':
20            case '?':
21            default:
22                return (usage());
23            }
24
25        argc -= optind;
26        argv += optind;
27
28        if (argc != 1)
29            return (usage());
30
31        n = strtoul(strdup(argv[0]), NULL, 10);
32        if (n == 0 && errno == EINVAL)
33            return (usage());
34
35        /* Now call one of the fib routines. */
36        f = fib(n);
37        printf("fib(%lu) = %lu\n", n, f);
38
39        return (0);
40    }

QUESTION ASM-6A. The string "fib(%lu) = %lu\n" (line 37).

Read-only data segment (text segment also acceptable)

QUESTION ASM-6B. optind (line 25).

Read/write data segment

QUESTION ASM-6C. When executing at line 4, where you will find the address to which fib returns.

Stack

QUESTION ASM-6D. Where will you find the value of EOF that is compared to the return value of getopt in line 17.

Register—although this register is likely to be hidden inside the processor, not one of the ones that have programmable names. Alternately, text segment, since the −1 will be encoded into some instruction.

QUESTION ASM-6E. getopt (line 17)

Text segment; alternately: Between the heap and the stack (because that’s where shared libraries tend to be loaded)

QUESTION ASM-6F. fib (lines 1-7)

Text segment

QUESTION ASM-6G. the variable f (line 36)

QUESTION ASM-6H. the string being passed to strtoul (line 31)

Heap

QUESTION ASM-6I. strdup (line 31)

Text segment or between heap & stack (same as getopt)

QUESTION ASM-6J. The value of the fib function when we return from fib (line 6).

ASM-7. Assembly and Data Structures

Consider the following assembly function.

func:
        xorl    %eax, %eax
        cmpb    $0, (%rdi)
        je      .L27
.L26:
        addq    $1, %rdi
        addl    $1, %eax
        cmpb    $0, (%rdi)
        jne     .L26
.L27:
        rep ret

QUESTION ASM-7A. How many parameters does this function appear to have?

QUESTION ASM-7B. What do you suppose the type of that parameter is?

const char\* (or const unsigned char\*, char\*, etc.)

QUESTION ASM-7C. Write C code that corresponds to it.

It’s strlen!

int strlen(const char* x) {
    int n = 0;
    for (; *x; ++x)
        ++n;
    return n;
}

KERN-1. Virtual memory

QUESTION KERN-1A. What is the x86 page size? Circle all that apply.

4096 bytes
64 cache lines
512 words
0x1000 bytes
2¹⁶ bits
None of the above

#1, #2, #4. The cache line size is 64 = 2⁶, and 2⁶×2⁶ = 2¹². The word size is 4; 512×4 = 2048, not 4096. There are 8 bits per byte; 2¹⁶/8 = 2¹³, not 2¹².

The following questions concern the sizes of page tables. Answer the questions in units of pages. For instance, the page directories in 32-bit x86 WeensyOS each contained one level-1 page table page and one level-2 page table page, for a total size of 2 pages per page table.

QUESTION KERN-1B. What is the maximum size (in pages) of a 32-bit x86 page table?

2¹⁰ level-2 page table pages+ 1 level-1 page table page = 1025.

Despite the example above, many people misinterpreted this question as including the physical pages referenced by a page directory, and came up with answers like 2²⁰.

QUESTION KERN-1C. What is the minimum size (in pages) of an x86 page table that would allow a process to access 2²² distinct physical addresses?

One.

Whaaat?! Think about a page directory page where one of the entries referred to the level-1 page table page itself, and the other entries referred to different pages. Like this:

Physical Page	Index	(Physical Address)	Contents
`0x1000`	0	`(0x1000)`	`0x1007`
	1	`(0x1004)`	`0x2007`
	2	`(0x1008)`	`0x3007`
	3	`(0x100C)`	`0x4007`
...
	1023	`(0x1FFC)`	`0x400007`

With this page directory in force, the 2²² virtual addresses 0x0 through 0x3FFFFF (which all have L1INDEX 0) access the 2²² distinct physical addresses 0x1000 through 0x400FFF.

The 32-bit x86 architecture has 32-bit virtual addresses and 32-bit physical addresses. Extensions to the x86 architecture have increased both these limits.

Physical Address Extensions (PAE) allow 32-bit machines to access up to 2⁵² bytes of physical memory (which is about 4000000 GB). That is, virtual addresses are 32 bits, and physical addresses are 52 bits.
The x86-64 architecture evolves the x86 architecture to a 64-bit word size. x86-64 pointers are 64 bits wide instead of 32. However, only 48 of those bits are meaningful: the upper 16 bits of each virtual address are ignored. Thus, virtual addresses are 48 bits. As with PAE, physical addresses are 52 bits.

QUESTION KERN-1D. Which of these two machines would support a higher number of concurrent processes?

x86 with PAE with 100 GB of physical memory.
x86-64 with 20 GB of physical memory.

#1 x86 with PAE. Each concurrent process occupies some space in physical memory, and #1 has more physical memory.

(Real operating systems swap, so either machine could support more processes than fit in virtual memory, but this would cause thrashing. #1 supports more processes before it starts thrashing.)

QUESTION KERN-1E. Which of these two machines would support a higher maximum number of threads per process?

x86 with PAE with 100 GB of physical memory.
x86-64 with 20 GB of physical memory.

#2 x86-64. Each thread in a process needs some address space for its stack, and an x86-64 process address space is much bigger than an x86’s.

KERN-2. Virtual memory and kernel programming

These problems consider implementations of virtual memory features in a WeensyOS-like operating system. Recall the signatures and specifications of the virtual_memory_lookup and virtual_memory_map functions:

// virtual_memory_map(pagetable, va, pa, sz, perm)
` //    Map virtual address range `[va, va+sz)` in `pagetable`. `
` //    When `X >= 0 && X < sz`, the new pagetable will map virtual address `
` //    `va+X` to physical address `pa+X` with permissions `perm`. `
//
//    Preconditions:
` //    * `va`, `pa`, and `sz` must be multiples of PAGESIZE (4096). `
//    * The level-2 pagetables referenced by the virtual address range
` //      must exist and be writable (e.g., `va + sz < MEMSIZE_VIRTUAL`). `
//
` //    Typically `perm` is a combination of `PTE_P` (the memory is Present), `
` //    `PTE_W` (the memory is Writable), and `PTE_U` (the memory may be `
` //    accessed by User applications). If `!(perm & PTE_P)`, `pa` is ignored. `
void virtual_memory_map(pageentry_t* pagetable, uintptr_t va, uintptr_t pa, size_t sz, int perm);

// virtual_memory_lookup(pagetable, va)
` //    Returns information about the mapping of the virtual address `va` in `
` //    `pagetable`. The information is returned as a `vamapping` object, `
//    which has the following components:
typedef struct vamapping {
    int pn;           // physical page number; -1 if unmapped
    uintptr_t pa;     // physical address; (uintptr_t) -1 if unmapped
    int perm;         // permissions; 0 if unmapped
} vamapping;

vamapping virtual_memory_lookup(pageentry_t* pagetable, uintptr_t va);

Also recall that WeensyOS tracks physical memory using an array of pageinfo structures:

typedef struct physical_pageinfo {
    int8_t owner;
    int8_t refcount; // 0 means the page is free
} physical_pageinfo;
static physical_pageinfo pageinfo[PAGENUMBER(MEMSIZE_PHYSICAL)];

The WeensyOS kernel occupies virtual addresses 0 through 0xFFFFF; the process address space starts at PROC_START_ADDR == 0x100000 and goes up to (but not including) MEMSIZE_VIRTUAL == 0x300000.

QUESTION KERN-2A. True or false: On x86 Linux, like on WeensyOS, the kernel occupies low virtual addresses.

False

QUESTION KERN-2B. On WeensyOS, which region of a process’s address space is closest to the kernel’s address space? Choose from code, data, stack, and heap.

Code

QUESTION KERN-2C. On Linux on an x86 machine, which region of a process’s address space is closest to the kernel’s address space? Choose from code, data, stack, and heap.

Stack

Recall that the WeensyOS sys_page_alloc(addr) system call allocates a new physical page at the given virtual address. Here’s an example kernel implementation of sys_page_alloc, taken from the WeensyOS interrupt function:

case INT_SYS_PAGE_ALLOC: {
    uintptr_t addr = current->p_registers.reg_eax; // address is passed to kernel in %eax

// [A]

    int free_pn = find_free_physical_page();
    if (free_pn < 0) { // no free physical pages
        console_printf(CPOS(24, 0), 0x0C00, "Out of physical memory!\n");
        current->p_registers.reg_eax = -1; // return result in %eax
        break; // will call run(current)
    }

// [B]

    // otherwise, allocate the page
    assert(pageinfo[free_pn].refcount == 0);
    pageinfo[free_pn].refcount += 1;
    pageinfo[free_pn].owner = current->p_pid;

// [C]

    // and map it into the user’s address space
    virtual_memory_map(current->p_pagetable, addr, PAGEADDRESS(free_pn), PAGESIZE, PTE_P | PTE_U | PTE_W);
    current->p_registers.reg_eax = 0;

// [D]

    break;
}

QUESTION KERN-2D. Thanks to insufficient checking, this implementation allows a WeensyOS process to crash the operating system or even take it over. This kernel is not isolated. What the kernel should do is return −1 when the calling process supplies bad arguments. Write code that, if executed at slot [A], would preserve kernel isolation and handle bad arguments correctly.

if (addr % PAGESIZE != 0 || addr < PROC_START_ADDR || addr >= MEMSIZE_VIRTUAL) {
    current->p_registers.reg_eax = -1;
    break;
}

QUESTION KERN-2E. This implementation has another problem, which the following process would trigger:

void process_main(void) {
    heap_top = ROUNDUP((uint8_t*) end, PAGESIZE); // first address in heap region
    while (1) {
        sys_page_alloc(heap_top);
        sys_yield();
    }
}

This process code repeatedly allocates a page at the same address. What should happen is that the kernel should repeatedly deallocate the old page and replace it with a newly-allocated page. But that’s not what will happen given the example implementation.

What will happen instead? And what is the name of this kind of problem?

Eventually the OS will run out of physical memory. At least it will print “Out of physical memory!” (that was in the code we provided). This is a memory leak.

QUESTION KERN-2F. Write code that would fix the problem, and name the slot in the INT_SYS_PAGE_ALLOC implementation where your code should go.

vamapping vm = virtual_memory_lookup(current->p_pagetable, addr);
if (vm.perm)
    pageinfo[vm.pn].refcount -= 1;

This goes in slot B or slot C. Slot D is too late; it would free the newly mapped page. Slot A is too early, for a subtle reason. Imagine that the page at addr was shared with another process, so pageinfo[vm.pn].refcount > 1. Then, if there was no free memory, it would be possible for the implementation to dereference the old page, but fail to allocate a new page! This would break the kernel’s invariants, since that the pageinfo reference count would be one off from the actual number of references in page tables.

KERN-3. Kernel programming

WeensyOS processes are quite isolated: the only way they can communicate is by using the console. Let’s design some system calls that will allow processes to explicitly share pages of memory. Then the processes can communicate by writing and reading the shared memory region. Here are two new WeensyOS system calls that allow minimal page sharing; they return 0 on success and –1 on error.

int share(pid_t p, void* addr) Allow process p to access the page at address addr.

int attach(pid_t p, void* remote_addr, void* local_addr) Access the page in process p’s address space at address remote_addr. That physical page is added to the calling process’s address space at address local_addr, replacing any page that was previously mapped there. It is an error if p has not shared the page at remote_addr with the calling process.

Here’s an initial implementation of these system calls, written as clauses in the WeensyOS kernel’s exception function.

case INT_SYS_SHARE: {
    pid_t p = current->p_registers.reg_eax;
    uintptr_t addr = current->p_registers.reg_ecx;

// [A]

    int shindex = current->p_nshared;
    if (shindex >= MAX_NSHARED)
        goto return_error;

// [B]

    ++current->p_nshared; 
    current->p_shared[shindex].sh_addr = addr;
    current->p_shared[shindex].sh_partner = p;
    current->p_registers.reg_eax = 0;
    break;
}

case INT_SYS_ATTACH: {
    pid_t p = current->p_registers.reg_eax;
    uintptr_t remote_addr = current->p_registers.reg_ecx;
    uintptr_t local_addr = current->p_registers.reg_edx;

// [C]

    int shindex = -1;
    for (int i = 0; i < processes[p].p_nshared; ++i)
        if (processes[p].p_shared[i].sh_addr == remote_addr
            && processes[p].p_shared[i].sh_partner == current->p_pid)
            shindex = i;
    if (shindex == -1)
        goto return_error;

// [D]

    vamapping vam = virtual_memory_lookup(processes[p].p_pagetable, remote_addr);

// [E]

    virtual_memory_map(current->p_pagetable, local_addr,
                       vam.pa, PAGESIZE, PTE_P|PTE_W|PTE_U);

// [F]

    current->p_registers.reg_eax = 0;
    break;
}

return_error:
    current->p_registers.reg_eax = -1;
    break;

Some notes:

The implementation stores sharing records in an array. A process may call share successfully at most MAX_NSHARED times. After that, its future share calls will return an error.
processes[p].p_nshared is initialized to 0 for all processes.
Assume that WeensyOS has been implemented as in Problem Set 4 up through step 6 (shared read-only memory).

QUESTION KERN-3A. True or false: Given this implementation, a single WeensyOS process can cause the kernel to crash simply by calling share one or more times (with no process ever calling attach). If true, give an example of a call or calls that would likely crash the kernel.

False

QUESTION KERN-3B. True or false: Given this implementation, a single WeensyOS process can cause the kernel to crash simply by calling attach one or more times (with no process ever calling share). If true, give an example of a call or calls that would likely crash the kernel.

True. If the user supplies an out-of-range process ID argument, the kernel will try to read out of bounds of the processes array. Example call: attach(0x1000000, 0, 0).

QUESTION KERN-3C. True or false: Given this implementation, WeensyOS processes 2 and 3 could work together to obtain write access to the kernel code located at address KERNEL_START_ADDR. If true, give an example of calls that would obtain this access.

True, since the attach and share code don’t check whether the user process is allowed to access its memory. An example:

#2: share(3, KERNEL_START_ADDR)
#3: attach(2, KERNEL_START_ADDR, 0x110000)

QUESTION KERN-3D. True or false: Given this implementation, WeensyOS processes 2 and 3 could work together to obtain write access to any memory, without crashing or modifying kernel code or data. If true, give an example of calls that would obtain access to a page mapped at address 0x110000 in process 5.

The best answer here is false. Processes are able to gain access to any page mapped in one of their page tables. But it’s not clear whether 5’s 0x110000 is mapped in either of the current process’s page tables. Now, 2 and 3 could first read the processes array (via share/attach) to find the physical address of 5’s page table; then, if 2 and 3 are in luck and the page table itself is mapped in their page table, they could read that page table to find the physical address of 0x110000; and then, if 2 and 3 are in luck again, map that page using the VA accessible in one of their page tables (which would differ from 0x110000). But that might not work.

QUESTION KERN-3E. True or false: Given this implementation, WeensyOS child processes 2 and 3 could work together to modify the code run by a their shared parent, process 1, without crashing or modifying kernel code or data. If true, give an example of calls that would obtain write access to process 1’s code, which is mapped at address PROC_START_ADDR.

True; since process code is shared after step 6, the children can map their own code read/write, and this is the same code as the parent’s.

#2: share(3, PROC_START_ADDR)
#3: attach(2, PROC_START_ADDR, PROC_START_ADDR)

QUESTION KERN-3F. Every “true” answer to the preceding questions is a bug in WeensyOS’s process isolation. Fix these bugs. Write code snippets that address these problems, and say where they go in the WeensyOS code (for instance, you could refer to bracketed letters to place your snippets); or for partial credit describe what your code should do.

Here’s one possibility.

Prevent share from sharing an invalid address:

[A] if ((addr & 0xFFF) || addr < PROC_START_ADDR)
        return -1;

NB don’t need to check addr < MEMSIZE_VIRTUAL as long as we check the return value from virtual_memory_lookup below (but that doesn’t hurt).

Prevent attach from accessing an invalid process or mapping at an invalid address:

[C] if (p >= NPROC || (local_addr & 0xFFF) || local_addr < PROC_START_ADDR || local_addr >= MEMSIZE_VIRTUAL)
        return -1;

We do need to check MEMSIZE_VIRTUAL here.

Check the mapping at remote_addr before installing it:

[E] if (!(vam.perm & PTE_U)
        return -1;
In virtual_memory_map: Use vam.perm instead of PTE_U|PTE_P|PTE_W

For greatest justice we would also fix a potential memory leak caused by attaching over an address that already had a page, but this isn’t necessary.

KERN-4. Teensy OS VM System

The folks at Teensy Computers, Inc, need your help with their VM system. The hardware team that developed the VM system abruptly left and the folks remaining aren't quite sure how VM works. I volunteered you to help them.

The Teensy machine has a 16-bit virtual address space with 4 KB pages. The Teensy hardware specifies a single-level page table. Each entry in the page table is 16-bits. Eight of those bits are reserved for the physical page number and 8 of the bits are reserved for flag values. Sadly, the hardware designers did not document what the bits do!

QUESTION KERN-4A. How many pages are in the Teensy virtual address space?

16 (2⁴)

QUESTION KERN-4B. How many bits comprise a physical address?

20 (8 bits of physical page number + 12 bits of page offset)

QUESTION KERN-4C. Is the physical address space larger or smaller than the virtual address space?

Larger!

QUESTION KERN-4D. Write, in hex, a PAGE_OFFSET_MASK (the value that when anded with an address returns the offset of the address on a page).

0xFFF

QUESTION KERN-4E. Write a C expression that takes a virtual address, in the variable vaddr, and returns the virtual page number.

(vaddr >> 12) OR ((vaddr) & 0xF000 >> 12)

You are now going to work with the Teensy engineers to figure out what those other bits in the page table entries mean! Fortunately, they have some engineering notes from the hardware team—they need your help in making sense of them. Each letter below has the contents of a note, state what you can conclude from that note about the lower 8 bits of the page table entries.

QUESTION KERN-4F. “Robin, I ran 8 tests using a kernel that did nothing other than loop infinitely -- for each test I set a different bit in all the PTEs of the page table. All of them ended up in the exception handler except for the one where I set bit 4. Any idea what this means?”

Bit 4 is the “preset/valid bit”, the equivalent of x86 PTE_P.

QUESTION KERN-4G. “Lynn, I'm writing a memory test that iterates over all of memory making sure that I can read back the same pattern I write into memory. If I don't set bit 7 of the page table entries to 1, I get permission faults. Do you know what might be happening?”

Bit 1 is the “writable bit”, the equivalent of x86 PTE_W.

QUESTION KERN-4H. “Pat, I almost have user level processes running! It seems that the user processes take permission faults unless I have both bit 4 and bit 3 set. Do you know why?”

Bit 3 is the “user/unprivileged bit”, the equivalent of x86 PTE_U.

KERN-5. Teensy OS Page Tables

The Teensy engineers are well on their way now, but they do have a few bugs and they need your help debugging the VM system. They hand you the following page table, using the notation we used for Assignment 6 for permissions, and need your help specifying correct behavior for the operations that follow.

Index	Physical Page Number	Permissions
0	0x00	PTE_U
1	0x01	PTE_P
2	0x02	PTE_P PTE_W
3	0x03	PTE_P PTE_U PTE_W
4	0xFF	PTE_U PTE_W
5	0xFE	PTE_U
6	0x80	PTE_W
7	0x92	PTE_P PTE_W PTE_U
8	0xAB	PTE_P PTE_W PTE_U
9	0x09	PTE_P PTE_U
10	0xFE	PTE_P PTE_U
11	0x00	PTE_W
12	0x11	PTE_U
Rest of PTEs follow and are all invalid

For each problem below, write either the physical address of the given virtual address or identify what fault would be produced. The fault types should be one of:

Invalid page access (there is no mapping for the requested page)
Privilege violation (user level process trying to access a supervisor page)
Permission violation (attempt to write a read-only page)

QUESTION KERN-5A. The kernel dereferences a NULL pointer

Invalid page access

QUESTION KERN-5B. A user process dereferences a NULL pointer

Invalid page access

QUESTION KERN-5C. The kernel writes to the address 0x8432

0xAB432

QUESTION KERN-5D. A user process writes to the address 0xB123

Invalid page access (when both PTE_P and PTE_U are missing, it's PTE_P that counts)

QUESTION KERN-5E. The kernel reads from the address 0x9876

0x09876

QUESTION KERN-5F. A user process reads from the address 0x7654

0x92654

QUESTION KERN-5G. A user process writes to the address 0xABCD

Permission violation

QUESTION KERN-5H. A user process writes to the address 0x2321

Privilege violation

IO-1. Cost expressions

In the following questions, you will reason about the abstract costs of various operations, using the following tables of constants.

Table of Basic Costs


S	System call overhead (i.e., entering and exiting the kernel)
F	Page fault cost (i.e., entering and exiting the kernel)
P	Cost of allocating a new physical page
M	Cost of installing a new page mapping
B	Cost of copying a byte

Table of Sizes


n_k		Number of memory pages allocated to the kernel
Per-process sizes (defined for each process p)
n_p		Number of memory pages allocated to p
r_p		Number of read-only memory pages allocated to p
w_p	= n_p − r_p	Number of writable memory pages allocated to p
m_p		Number of memory pages actually modified by p after the previous `fork()`

One of our tiny operating systems from class (OS02) included a program that called a recursive function. When the recursive function’s argument grew large enough, the stack pointer moved beyond the memory actually allocated for the stack, and the program crashed.

QUESTION IO-1A. In our first solution for this problem, the process called the sys_page_alloc(void *addr) system call, which allocated and mapped a single new page at address addr (the new stack page). Write an expression for the cost of this sys_page_alloc() system call in terms of the constants above.

S + P + M

QUESTION IO-1B. Our second solution for this problem changed the operating system’s page fault handler. When a fault occurred in a process’s stack region, the operating system allocated a new page to cover the corresponding address and restarted the process. Write an expression for the cost of such a fault in terms of the constants above.

F + P + M

QUESTION IO-1C. Design a revised version of sys_page_alloc that supports batching. Give its signature and describe its behavior.

Example: sys_page_alloc(void *addr, int npages)

QUESTION IO-1D. Write an expression for the cost of a call to your batching allocation API.

Can vary; for this example, S + npages*(P + M)

In the remaining questions, a process p calls fork(), which creates a child process, c.

Assume that the base cost of performing a fork() system call is Φ. This cost includes the fork() system call overhead (S), the overhead of allocating a new process, the overhead of allocating a new page directory with kernel mappings, and the overhead of copying registers. But it does not include overhead from allocating, copying, or mapping other memory.

QUESTION IO-1E. Consider the following implementations of fork():


A.	Naive fork: Copy all process memory (WeensyOS, Step 5).
B.	Eager fork: Copy all writable process memory; share read-only process memory, such as code (WeensyOS, Step 6).
C.	Copy-on-write fork: initially share all memory as read-only. Create writable copies later, on demand, in response to write faults (WeensyOS extra credit).

Which expression best represents the total cost of the fork() system call in process p, for each of these fork implementations? Only consider the system call itself, not later copy-on-write faults.

(Note: Per-process variables, such as n, are defined for each process. So, for example, n_p is the number of pages allocated to the parent process p, and n_c is the number of pages allocated to the child process c.)

Φ
Φ + n_p × M
Φ + (n_p + w_p) × M
Φ + n_p × 2¹² × (B + F)
Φ + n_p × (2¹²B + P + M)
Φ + n_p × (P + M)
Φ + w_p × (2¹²B + P + M)
Φ + n_p × (2¹²B + P + M) − r_p × (2¹²B + P)
Φ + n_p × M + m_c × (P + F)
Φ + n_p × M + m_c × (2¹²B + F + P)
Φ + n_p × M + (m_p+m_c) × (P + F)
Φ + n_p × M + (m_p+m_c) × (2¹²B + F + P)

A: #5, B: #8 (good partial credit for #7), C: #2

QUESTION IO-1F. When would copy-on-write fork be more efficient than eager fork (meaning that the sum of all fork-related overheads, including faults for pages that were copied on write, would be less for copy-on-write fork than eager fork)? Circle the best answer.

When n_p < n_k.
When w_p × F < w_p × (M + P).
When m_c × (F + M + P) < w_p × (M + P).
When (m_p+m_c) × (F + M + P + 2¹²B) < w_p × (P + 2¹²B).
When (m_p+m_c) × (F + P + 2¹²B) < w_p × (P + M + 2¹²B).
When m_p < m_c.
None of the above.

SH-1. Processes

This question builds versions of the existing system calls based on new abstractions. Here are three system calls that define a new abstraction called a rendezvous.

int newrendezvous(void) Returns a rendezvous ID that hasn’t been used yet.

int rendezvous(int rid, int data) Blocks the calling process P1 until some other process P2 calls rendezvous() with the same rid (rendezvous ID). Then, both of the system calls return, but P1’s system call returns P2’s data and vice versa. Thus, the two processes swap their data. Rendezvous acts pairwise; if three processes call rendezvous, then two of them will swap values and the third will block, waiting for a fourth.

void freezerendezvous(int rid, int freezedata) Freezes the rendezvous rid. All future calls to rendezvous(rid, data) will immediately return freezedata.

Here's an example. The two columns represent two processes. Assume they are the only processes using rendezvous ID 0.


`int result = rendezvous(0, 5);`	`printf("About to rendezvous\n");`
	`int result = rendezvous(0, 600);`
/* The processes swap data;	both become runnable */
`printf("Process A got %d\n", result);`	`printf("Process B got %d\n", result);`

This code will print

About to rendezvous
Process B got 5
Process A got 600

(the last 2 lines might appear in either order).

QUESTION SH-1A. How might you implement pipes in terms of rendezvous? Try to figure out analogues for the pipe(), close(), read(), and write() system calls (perhaps with different signatures), but only worry about reading and writing 1 character at a time.

Here’s one mapping.

pipe(): newrendezvous(). We use a rendezvous ID as the equivalent of a pipe file descriptor.
write(p, &ch, 1): To write a single character ch to the “pipe” p (that is, the rendezvous with ID p), call rendezvous(p, ch).
read(p, &ch, 1): To read a single character ch from the “pipe” p, call ch = rendezvous(p, -1).
close(p): To close the “pipe” p, call freezerendezvous(p, -1). Now all future read and write calls will return -1.

Most mappings will have these features.

QUESTION SH-1B. Can a rendezvous-pipe support all pipe features?

No. For example, a rendezvous-pipe doesn’t deliver a signal when a process tries to write to a closed pipe. Since the rendezvous-pipe doesn’t distinguish between read and write ends, and since rendezvous aren’t reference-counted like file descriptors, if a “writer” process exits without closing the rendezvous-pipe, a reader won’t get EOF when they read—it will instead block indefinitely. Unlike pipes, which like all file descriptors are protected from access by unrelated processes, rendezvous aren’t protected; anyone who can guess the rendezvous ID can use the rendezvous. Etc.

SH-2. Process management

Here’s the skeleton of a shell function implementing a simple two-command pipeline, such as “cmd1 | cmd2”.

void simple_pipe(const char* cmd1, char* const* argv1, const char* cmd2, char* const* argv2) {
    int pipefd[2], r, status;

[A]

    pid_t child1 = fork();
    if (child1 == 0) {

[B]

        execvp(cmd1, argv1);
    }
    assert(child1 > 0);

[C]

    pid_t child2 = fork();
    if (child2 == 0) {

[D]

        execvp(cmd2, argv2);
    }
    assert(child2 > 0);

[E]

And here is a grab bag of system calls.

[1] close(pipefd[0]);
[2] close(pipefd[1]);
[3] dup2(pipefd[0], STDIN_FILENO);
[4] dup2(pipefd[0], STDOUT_FILENO);
[5] dup2(pipefd[1], STDIN_FILENO);
[6] dup2(pipefd[1], STDOUT_FILENO);
[7] pipe(pipefd);
[8] r = waitpid(child1, &status, 0);
[9] r = waitpid(child2, &status, 0);

Your task is to assign system call IDs, such as “1”, to slots, such as “A”, to achieve several behaviors, including a correct pipeline and several incorrect pipelines. For each question:

You may use each system call ID once, more than once, or not at all.
You may use zero or more system call IDs per slot. Write them in the order they should appear in the code.
You may assume that no signals are delivered to the shell process (so no system call ever returns an EINTR error).
The function should wait for both commands in the pipeline to complete before returning.
It may help to detach the last “Reference material” page of the exam.

QUESTION SH-2A. Implement a correct foreground pipeline.

`A`	`B (child1)`	`C`	`D (child2)`	`E`
7	6, 1, 2 or 6, 2, 1 or 1, 6, 2		3, 1, 2 or 3, 2, 1 or 2, 3, 1	1, 2, 8, 9 or 2, 1, 9, 8, etc.
OR
7	6, 1, 2 or 6, 2, 1 or 1, 6, 2	2	3, 1	1, 8, 9 or 1, 9, 8

QUESTION SH-2B. Implement a pipeline so that, given arguments corresponding to “echo foo | wc -c”, the wc process reads “foo” from its standard input but does not exit thereafter. For partial credit describe in words how this might happen.

Anything that doesn’t close the pipe’s write end will do it. Below we leave both ends of the pipe open in the shell. We could also enter just “3” in slot D.

`A`	`B (child1)`	`C`	`D (child2)`	`E`
7	6, 1, 2		3, 1, 2	8, 9

QUESTION SH-2C. Implement a pipeline so that, given arguments corresponding to “echo foo | wc -c”, “foo” is printed to the shell’s standard output and the wc process prints “0”. (In a correctly implemented pipeline, “wc” would print 4, which is the number of characters in “foo\n”.) For partial credit describe in words how this might happen.

Anything that doesn’t redirect the left-hand side’s standard output will do it. It is important that the read end of the pipe be properly redirected, or wc would block reading from the shell’s standard input.

`A`	`B (child1)`	`C`	`D (child2)`	`E`
7	1, 2 (anything without 6)		3, 1, 2	1, 2, 8, 9

QUESTION SH-2D. Implement a pipeline that appears to work correctly on “echo foo | wc -c”, but always blocks forever if the left-hand command outputs more than 65536 characters. For partial credit describe in words how this might happen.

This happens when we execute the two sides of the pipe in series: first the left-hand side, then the right-hand side. Since the pipe contains 64KiB of buffering, this will often appear to work.

`A`	`B (child1)`	`C`	`D (child2)`	`E`
7	6, 1, 2	8	3, 1, 2	1, 2, 9

QUESTION SH-2E. Implement a pipeline so that, given arguments corresponding to “echo foo | wc -c”, both echo and wc report a “Bad file descriptor” error. (This error, which corresponds to EBADF, is returned when a file descriptor is not valid or does not support the requested operation.) For partial credit describe in words how this might happen.

Given these system calls, the only way to make this happen is to redirect the wrong ends of the pipe into stdin/stdout.

`A`	`B (child1)`	`C`	`D (child2)`	`E`
7	4, 1, 2		5, 1, 2	1, 2, 8, 9

SH-3. Processes

Consider the two programs shown below.

// Program 1
#include <stdio.h>
#include <unistd.h>
int
main(void)
{
    printf("PID %d running prog1\n", getpid());
}

// Program 2
#include <stdio.h>
#include <unistd.h>
int
main(void)
{
    char *argv[2];
    argv[0] = "prog1";
    argv[1] = NULL;
    printf("PID %d running prog2\n", getpid());
    int r = execv("./prog1", argv);
    printf("PID %d exiting from prog2\n", getpid());
}

QUESTION SH-3A. How many different PIDs will print out if you run Program 2?

1. exec doesn’t change the process’s PID.

QUESTION SH-3B. How many lines of output will you see?

2: “PID xxx running prog2” and “PID xxx running prog1”

Now, let's assume that we change Program 2 to the following:

// Program 2B
#include <stdio.h>
#include <unistd.h>
int
main(void)
{
    char* argv[2];
    argv[0] = "prog1";
    argv[1] = NULL;

    printf("PID %d running prog2\n", getpid());
    pid_t p = fork();
    if (p == 0) {
        int r = execv("./prog1", argv);
    } else {
        printf("PID %d exiting from prog2\n", getpid());
    }
}

QUESTION SH-3C. How many different PIDs will print out if you run Program 2B?

2, one for the parent and one for the child.

QUESTION SH-3D. How many lines of output will you see?

3: “PID xxx running prog2”, “PID yyy running prog1”, and “PID xxx exiting from prog2”.

Finally, consider this version of Program 2.

// Program 2C
#include <stdio.h>
#include <unistd.h>
int
main(void)
{
    char *argv[2];
    argv[0] = "prog1";
    argv[1] = NULL;

    printf("PID %d running prog2\n", getpid());
    pid_t p = fork();
    pid_t q = fork();
    if (p == 0 || q == 0) {
        int r = execv("./prog1", argv);
    } else {
        printf("PID %d exiting from prog2\n", getpid());
    }
}

QUESTION SH-3E. How many different PIDs will print out if you run Program 2C?

The initial ./prog2 prints its PID.
The ./prog2 will fork twice, creating a p-child and a q-child.
The p-child forks once more, creating a p/q-child.
All three children exec ./prog1, which prints their PIDs.

QUESTION SH-3F. How many lines of output will you see?

SH-4. Be a CS61 TF!

You are a CS61 teaching fellow. A student working on A4 is having difficulty getting pipes working. S/he comes to you for assistance. The function below is intended to traverse a linked list of commands, fork/exec the indicated processes, and hook up the pipes between commands correctly. The student has commented it reasonably, but is quite confused about how to finish writing the code. Can you help? Figure out what code to add at points A, B, and C.

#include "sh61.h"

typedef struct command command;
struct command {
    command *next; // Next in sequence of commands
    int argc;      // number of arguments
    int ispipe;    // pipe symbol follows this command
    char** argv;   // arguments, terminated by NULL
    pid_t pid;     // pid running this command
};

void
do_pipes(command *c)
{
    pid_t newpid;
    int havepipe = 0;   // We had a pipe on the previous command
    int lastpipe[2]= {-1, -1};
    int curpipe[2];

    do {
        if (c->ispipe)
            assert(pipe(curpipe) == 0);

        newpid = fork();
        switch (newpid) {
            case 0:
                if (havepipe) {
                    // There was a pipe on the last command; It's stored
                    // in lastpipe; I need to hook it up to this process???
                    // **** PART A ****
                }
                if (c->ispipe) {
                    // The current command is a pipe -- how do I hook it up???
                    // **** PART B ****
                }

                execvp(c->argv[0], c->argv);
                
                fprintf(stderr, "Exec failed\n");
                c->pid = -1;
                break;
            case -1:
                c->pid = newpid;
                break;
            default:
                // I bet there is some cleanup I have to do here!?
                // **** PART C ****

                // Set up for the next command
                havepipe = c->ispipe;
                if (c->ispipe) {
                    lastpipe[0] = curpipe[0];
                    lastpipe[1] = curpipe[1];
                }
                c->pid = newpid;
                c = c->next;
                break;
        }
    } while (newpid != -1 && havepipe);
}

QUESTION SH-4A. What should go in the Part A space above, in anything?

close(lastpipe[1]);
dup2(lastpipe[0], STDIN_FILENO);
close(lastpipe[0]);

QUESTION SH-4B. What should go in the Part B space above, in anything?

close(curpipe[0]);
dup2(curpipe[1], STDOUT_FILENO);
close(curpipe[1]);

QUESTION SH-4C. What should go in the Part C space above, in anything?

if (havepipe) {
    close(lastpipe[0]);
    close(lastpipe[1]);
}

NET-1. Networking

QUESTION NET-1A. Which of the following system calls should a programmer expect to sometimes block (i.e., to return after significant delay)? Circle all that apply.


1. `socket`		5. `connect`
2. `read`		6. `write`
3. `accept`		7. `usleep`
4. `listen`		8. None of these

#2 read, #3 accept, #5 connect, (#6 write), #7 usleep. (write can definitely block if the reading end hasn’t caught up, but we didn’t emphasize this in class, so no points off.)

QUESTION NET-1B. Below are seven message sequence diagrams demonstrating the operation of a client–server RPC protocol. A request such as “get(X)” means “fetch the value of the object named X”; the response contains that value. Match each network property or programming strategy below with the diagram with which it best corresponds. You will use every diagram once.


1. Loss	4. Duplication	7. Exponential backoff
2. Delay	5. Batching
3. Reordering	6. Prefetching

#1—B, #2—C, #3—F, #4—D, #5—G, #6—A, #7—E (A—#6, B—#1, C—#2, D—#4, E—#7, F—#3, G—#5)

While G could also represent prefetching, A definitely does not represent batching at the RPC level—each RPC contains one request—so under the rule that each diagram is used once, we must say G is batching and A is prefetching.

Finalfig2012_1.gif	Finalfig2012_2.gif	Finalfig2012_3.gif	Finalfig2012_4.gif
A	B	C	D

Finalfig2012_5.gif	Finalfig2012_6.gif	Finalfig2012_7.gif
E	F	G

NET-2. Making Network Servers Robust

QUESTION NET-2A. You've built a network server, list the resources that you might run out of if someone launched a DoS attack on you.

At least: file descriptors, memory (stack), processes. There’re a lot of correct answers, though! You can run out of virtual memory or even physical memory.

QUESTION NET-2B. Sam suggests that you just create a separate thread to handle each incoming connection. Why isn't this necessarily going to work?

Each thread requires a stack and it's easy to run out of space for those stacks. Threads also occupy other resources—in Linux, each thread even has a PID!

QUESTION NET-2C. A server sets up a socket to listen on a connection. When a client wants to establish a connection, how does the server manage the multiple clients? In your answer indicate what system call or calls are used and what they do.

You call accept, which creates a new fd that is particular to the connection with a particular client. That is, the server uses a different fd for each client.

QUESTION NET-2D. Which of the following system calls might block?

accept
bind
connect
listen
setsockopt
select
socket

accept, connect, select

SYNCH-1. Threads

The following code performs a matrix multiplication, c = ab, where a, b, and c are all square matrices of dimension sz. It uses the cache-friendly ikj index ordering.

#define MELT(matrix, sz, row, col) (matrix)[(row)*(sz) + (col)]

void matrix_multiply(double* c, const double* a, const double* b, size_t sz) {
    for (size_t i = 0; i < sz; ++i)
        for (size_t j = 0; j < sz; ++j)
            MELT(c, sz, i, j) = 0;
    for (size_t i = 0; i < sz; ++i)
        for (size_t k = 0; k < sz; ++k)
            for (size_t j = 0; j < sz; ++j)
                MELT(c, sz, i, j) += MELT(a, sz, i, k) * MELT(b, sz, k, j);
}

But matrix multiplication is a naturally parallelizable problem. Here’s some code that uses threads to multiply even faster on a multicore machine. We use sz parallel threads, one per row of c.

     typedef struct matrix_args {
         double* c;
         const double* a;
         const double* b;
         size_t sz;
         size_t i;
     } matrix_args;
 
     void* matrix_multiply_ikj_thread(void* arg) {
 (α)     matrix_args* m = (matrix_args*) arg;
 (β)     for (size_t j = 0; j < m->sz; ++j)
 (γ)         MELT(m->c, m->sz, m->i, j) = 0;
 (δ)     for (size_t k = 0; k < m->sz; ++k)
 (ε)         for (size_t j = 0; j < m->sz; ++j)
 (ζ)             MELT(m->c, m->sz, m->i, j) += MELT(m->a, m->sz, m->i, k) * MELT(m->b, m->sz, k, j);
 (η)     return NULL;
      }
 
     void matrix_multiply_ikj(double* c, const double* a, const double* b, size_t sz) {
 (1)     pthread_t* threads = (pthread_t*) malloc(sizeof(pthread_t) * sz);
 (2)     for (size_t i = 0; i < sz; ++i) {
 (3)         matrix_args m = { c, a, b, sz, i };
 (4)         int r = pthread_create(&threads[i], NULL, &matrix_multiply_ikj_thread, &m);
 (5)         assert(r == 0);
 (6)     }
 (7)     for (size_t i = 0; i < sz; ++i)
 (8)         pthread_join(threads[i], NULL);
 (9)     free(threads);
     }

But when run, this code gives wildly incorrect results.

QUESTION SYNCH-1A. What is wrong? Describe why the problem is a synchronization issue.

The matrix_multiply_ikj function starts many threads, each with its own logically different set of matrix_args. But matrix_multiply_ikj allocates these matrix_args structures on the stack! The m variable is initialized on each loop iteration, but then the variable’s stack space is immediately reused for the next loop iteration. It is very likely that during a thread’s execution, its matrix_args have been replaced by other arguments. This is a synchronization issue because the code would work correctly if matrix_multiply_ikj_thread was called serially, rather than concurrently. The matrix_multiply_ikj_thread function must synchronize with matrix_multiply_ikj’s reuse of m.

QUESTION SYNCH-1B. Write C code showing how the problem could be fixed with changes only to matrix_multiply_ikj. Refer to the numbered lines to indicate replacements and/or insertions. Use one or more additional heap allocations and no additional calls to pthread functions. Free any memory you allocate once it is safe to do so.

There are many solutions, but here’s one: we place each thread’s arguments in different, heap-allocated memory. This solves the synchronization issue by eliminating the memory reuse. New and changed lines are marked with ***.

     void matrix_multiply_ikj(double *c, const double *a, const double *b, size_t sz) {
 (1)     pthread_t *threads = (pthread_t *) malloc(sizeof(pthread_t) * sz);
   *     matrix_args *marr = (matrix_args *) malloc(sizeof(matrix_args) * sz);         
 (2)     for (size_t i = 0; i < sz; ++i) {
 (3)         matrix_args m = { c, a, b, sz, i };
   *         marr[i] = m;
   *         int r = pthread_create(&threads[i], NULL, &matrix_multiply_ikj_thread,
   *                                &marr[i]);
 (5)         assert(r == 0);
 (6)     }
 (7)     for (size_t i = 0; i < sz; ++i)
 (8)         pthread_join(threads[i], NULL);
 (9)     free(threads);
   *     free(marr);
     }

On single-core machines, the kij order performs almost as fast as the ikj order. Here’s a version of the parallel matrix multiplication code that uses kij.

     typedef struct matrix_args_kij {
         double* c;
         const double* a;
         const double* b;
         size_t sz;
         size_t k;
     } matrix_args_kij;
 
     void* matrix_multiply_kij_thread(void* arg) {
 (α)     matrix_args_kij* m = (matrix_args_kij*) arg;
 (β)     for (size_t i = 0; i < m->sz; ++i)
 (γ)         for (size_t j = 0; j < m->sz; ++j)
 (δ)             MELT(m->c, m->sz, i, j) += MELT(m->a, m->sz, i, m->k) * MELT(m->b, m->sz, m->k, j);
 (ε)     return NULL;
      }
 
     void matrix_multiply_kij(double* c, const double* a, const double* b, size_t sz) {
 (1)     pthread_t* threads = (pthread_t*) malloc(sizeof(pthread_t) * sz);
 (2)     for (size_t i = 0; i < sz; ++i)
 (3)         for (size_t j = 0; j < sz; ++j)
 (4)             MELT(c, sz, i, j) = 0;
 (5)     for (size_t k = 0; k < sz; ++k) {
 (6)         matrix_args_kij m = { c, a, b, sz, k };
 (7)         int r = pthread_create(&threads[k], NULL, &matrix_multiply_kij_thread, &m);
 (8)         assert(r == 0);
 (9)     }
(10)     for (size_t k = 0; k < sz; ++k)
(11)         pthread_join(threads[k], NULL);
(12)     free(threads);
     }

This problem has the same problem as the previous version, plus another problem. Even after your fix from 8A–8B is applied, this version produces incorrect results.

QUESTION SYNCH-1C. What is the new problem? Describe why it is a synchronization issue.

The new problem is that now, different matrix_multiply_kij_thread functions might try to modify the same matrix element at the same time. This is a synchronization issue because the concurrent access to a matrix element might cause some updates to get lost. In the previous version, each matrix_multiply_ikj_thread thread worked on a different matrix row, so no synchronization was required.

QUESTION SYNCH-1D. Write pseudocode or C code that fixes this problem. You should refer to pthread functions. For full credit your solution should have low contention.

The simplest solutions introduce mutual exclusion locking. This means different threads can’t modify the same matrix element simultaneously. To reduce contention, the locking should be fine-grained—for instance, there could be one lock per matrix element. But other tradeoffs are possible; one lock per matrix element is a lot; you could instead have one lock per matrix row or column.

Here’s working code, including the fix for the matrix_args reuse problem. (We didn’t require working code.)

     typedef struct matrix_args_kij {
         double *c;
         const double *a;
         const double *b;
  *      pthread_mutex_t *locks;
         size_t sz;
         size_t k;
     } matrix_args;
 
     void *matrix_multiply_kij_thread(void *arg) {
         matrix_args_kij *m = (matrix_args_kij *) arg;
         for (size_t i = 0; i < m->sz; ++i)
             for (size_t j = 0; j < m->sz; ++j) {
  *              pthread_mutex_lock(&m->locks[i * m->sz + j]);
                 MELT(m->c, m->sz, i, j) += MELT(m->a, m->sz, i, m->k) * MELT(m->b, m->sz, m->k, j);
  *              pthread_mutex_unlock(&m->locks[i * m->sz + j]);
             }
         return NULL;
      }
 
     void matrix_multiply_kij(double *c, const double *a, const double *b, size_t sz) {
         pthread_t *threads = (pthread_t *) malloc(sizeof(pthread_t) * sz);
  *      matrix_args_kij *marr = (matrix_args_kij *) malloc(sizeof(matrix_args_kij) * sz);
  *      pthread_mutex_t *locks = (pthread_mutex_t *) malloc(sizeof(pthread_mutex_t) * sz * sz);
         for (size_t i = 0; i < sz; ++i)
             for (size_t j = 0; j < sz; ++j) {
                 MELT(c, sz, i, j) = 0;
  *              pthread_mutex_init(&locks[i * sz + j], NULL);
             }
         for (size_t k = 0; k < sz; ++k) {
  *          matrix_args_kij m = { c, a, b, locks, sz, k };
  *          marr[k] = m;
  *          int r = pthread_create(&threads[k], NULL, &matrix_multiply_kij_thread,
                                    &marr[k]);
             assert(r == 0);
         }
         for (size_t k = 0; k < sz; ++k)
             pthread_join(threads[k], NULL);
         free(threads);
  *      free(marr);
  *      free(locks);
     }

SYNCH-2. Synchronization and concurrency

Most synchronization objects have at least two operations. Mutual-exclusion locks support lock and unlock; condition variables support wait and signal; and from section notes you may remember the semaphore synchronization object, one of the earliest synchronization objects ever invented, which supports P and V.

In this problem, you’ll work with a synchronization object with only one operation, which we call a hemiphore. Hemiphores behave like the following; it is very important that you understand this pseudocode.

typedef struct hemiphore {
    int value;
} hemiphore;

// Initialize the hemiphore to value 0.
void hemiphore_init(hemiphore* h) {
    h->value = 0;
}

` // Block until the hemiphore has value >= `bound`, then  ``**`atomically`**``  increment its value by `delta`. `
void H(hemiphore* h, int bound, int delta) {
    // This is pseudocode; a real hemiphore implementation would block, not spin, and would
    // ensure that the test and the increment happen in one atomic step.
    while (h->value < bound)
        sched_yield();
    h->value += delta;
}

Once a hemiphore is initialized with hemiphore_init, application code should access the hemiphore only through the H operation.

QUESTION SYNCH-2A. Use hemiphores to implement mutual-exclusion locks. Fill out the code below. (You may not need to fill in every empty slot. You may use standard C constants; for example, INT_MIN is the smallest possible value for a variable of type int, which on a 32-bit machine is −2147483648.)

typedef struct mutex {
    hemiphore h;
} mutex;

void mutex_init(mutex* m) {
    hemiphore_init(&m->h);
}

void mutex_lock(mutex* m) {

H(&m->h, 0, -1);

void mutex_unlock(mutex* m) {

H(&m->h, -1, 1);

QUESTION SYNCH-2B. Use hemiphores to implement condition variables. Fill out the code below. You may assume that the implementation of mutex is your hemiphore-based implementation from above (so, for instance, cond_wait may access the hemiphore m->h). See the Hints at the end of the question.

typedef struct condvar {
    mutex m;
    hemiphore h;

int nwaiting;

} condvar;

void cond_init(condvar* c) {
    mutex_init(&c->m);
    hemiphore_init(&c->h);

c->nwaiting = 0;

void cond_signal(condvar* c) {

mutex_lock(&c->m); if (c->nwaiting > 0) { H(&c->h, INT_MIN, 1); --c->nwaiting; } mutex_unlock(&c->m);

void cond_wait(condvar* c, mutex* m) {

mutex_lock(&c->m); ++c->nwaiting; mutex_unlock(&c->m); mutex_unlock(m); H(&c->h, 0, -1); mutex_lock(m);

The nwaiting variable ensures that cond_signal(c) does nothing if no one is waiting. The mutex_unlock(m) must happen before H (which can block); it must happen after mutex_lock(&c->m) (to avoid sleep-wakeup races).

The following code is broken if no one is waiting when signal is called, because it “stores” the signal for later. It works otherwise, though—it even avoids sleep-wakeup races.

typedef struct condvar {
    hemiphore h;
} condvar;

void cond_init(condvar* c) {
    hemiphore_init(&c->h);
}

void cond_signal(condvar* c) {

H(&c->h, INT_MIN, 1);

void cond_wait(condvar* c, mutex* m) {

mutex_unlock(m); H(&c->h, 0, -1); mutex_lock(m);

Hints. For full credit:

If no thread is waiting on condition variable c, then cond_signal(c) should do nothing.
Assume N threads are waiting on condition variable c. Then N calls to cond_signal(c) are both necessary and sufficient to wake them all up.
Your solution must not add new sleep-wakeup race conditions to the user’s code. (That is, no sleep-wakeup race conditions unless the user uses mutexes incorrectly.)

QUESTION SYNCH-2C. Use pthread mutexes and condition variables to implement hemiphores. Fill out the code below. See the hints after the question.

typedef struct hemiphore {
    pthread_mutex_t m;
    int value;
    pthread_cond_t c;
} hemiphore;

void hemiphore_init(hemiphore* h) {
    pthread_mutex_init(&h->m);
    h->value = 0;
    pthread_cond_init(&h->c);
}

void H(hemiphore* h, int bound, int delta) {

pthread_mutex_lock(&h->m); while (h->value < bound) pthread_cond_wait(&h->c, &h->m); h->value += delta; pthread_cond_broadcast(&h->c); pthread_mutex_unlock(&h->m);

The pthread_mutex_locks protect h->value from access conflicts. It is not correct to simply pthread_cond_signal(&h->c), since different waiters might be waiting for different bounds (-1). You don’t need to broadcast/signal each time; if delta <= 0 there’s no point.

Hints. The pthread mutex and condition variable operations have the following signatures. You should pass NULL for any attributes arguments. Don’t worry about the pthread_mutex_destroy and pthread_cond_destroy operations, and feel free to abbreviate (e.g. “lock” instead of “pthread_mutex_lock”).

pthread_mutex_init(pthread_mutex_t* m, const pthread_mutexattr_t* attributes)
pthread_mutex_lock(pthread_mutex_t* m)
pthread_mutex_unlock(pthread_mutex_t* m)
pthread_cond_init(pthread_cond_t* c, const pthread_condattr_t* attributes)
pthread_cond_signal(pthread_cond_t* c) (wakes up at most one thread waiting on c)
pthread_cond_broadcast(pthread_cond_t* c) (wakes up all threads waiting on c)
pthread_cond_wait(pthread_cond_t* c, pthread_mutex_t* m)

QUESTION SYNCH-2D. Consider the following two threads, which use a shared hemiphore h with initial value 0.

Thread 1 Thread 2

H(&h, 1000, 1);               while (1) {
printf("Thread 1 done\n");        H(&h, 0, 1);
                                  H(&h, 0, -1);
                              }

Thread 2 will never block, and the hemiphore’s value will alternate between 1 and 0. Thread 1 will never reach the printf, because the hemiphore’s value never reaches 1000. However, in most people’s first implementation of hemiphores using pthread mutexes and condition variables, Thread 1 will not block. Every call to H in Thread 2 will effectively wake up Thread 1. Though Thread 1 will then check the hemiphore’s value and immediately go back to sleep, doing so wastes CPU time.

Design an implementation of hemiphores using pthread mutexes and condition variables that solves this problem. In your revised implementation, Thread 1 above should block forever. For full credit, write C code. For partial credit, write pseudocode or English describing your design.

Hint. One working implementation constructs a linked list of “waiter” objects, where each waiter object is on a different thread’s stack, as initially sketched below. You can use such objects or not as you please.

This is a tough one.

typedef struct hemiphore_waiter {
    struct hemiphore_waiter* next;

int bound; pthread_cond_t c;

} hemiphore_waiter;

typedef struct hemiphore {
    pthread_mutex_t m;
    int value;
    hemiphore_waiter* waiters;
} hemiphore;

void hemiphore_init(hemiphore* h) {
    pthread_mutex_init(&h->m);
    h->value = 0;
    h->waiters = NULL;
}

void H(hemiphore* h, int bound, int delta) {
    hemiphore_waiter w;

w.bound = bound;

    `**`pthread_cond_init(&w.c);`**`                // no points off if missing

pthread_mutex_lock(&h->m); while (h->value < bound) { w.next = h->waiters; h->waiters = &w; pthread_cond_wait(&w.c, &h->m); } h->value += delta;

    // no points off for linked list messups

for (hemiphore_waiter** ww = &h->waiters; *ww; ) if (h->value >= (*ww)->bound) { pthread_cond_signal(&(*ww)->c); *ww = (*ww)->next; } else ww = &(*ww)->next; pthread_mutex_unlock(&h->m);

Here’s a substantial-partial-credit solution that tracks the lowest bound anyone cares about.

typedef struct hemiphore {
    pthread_mutex_t m;
    int value;

int max_bound; pthread_cond_t c;

} hemiphore;

void hemiphore_init(hemiphore* h) {
    pthread_mutex_init(&h->m);
    h->value = 0;

h->max_bound = INT_MIN; pthread_cond_init(&h->c);

void H(hemiphore* h, int bound, int delta) {

pthread_mutex_lock(&h->m); while (h->value < bound) { if (h->max_bound < bound) h->max_bound = bound; pthread_cond_wait(&h->c, &h->m); } h->value += delta; if (h->value >= h->max_bound) { h->max_bound = INT_MIN; pthread_cond_broadcast(&h->c); } pthread_mutex_unlock(&h->m);

SYNCH-3. Pipes and synchronization

In the following questions, you will implement a mutex using a pipe, and a limited type of pipe using a mutex.

The definitions of the pthread mutex and condition variable operations are as follows.

int pthread_mutex_init(pthread_mutex_t* m, const pthread_mutexattr_t* attr) Create a new mutex with attributes defined by attr. (For this question, attr is ignored.)

int pthread_mutex_lock(pthread_mutex_t* m) Locks m. If the mutex is already locked, the calling thread will block until the mutex becomes available.

int pthread_mutex_unlock(pthread_mutex_t* m) Unlocks m. Calling pthread_mutex_unlock with a mutex that the calling thread does not hold will result in undefined behavior.

int pthread_cond_init(pthread_cond_t* c, const pthread_condattr_t* attr) Create a new condition variable with attributes defined by attr. (For this question, attr is ignored.)

int pthread_cond_signal(pthread_cond_t* c) Unblocks one thread waiting for c.

int pthread_cond_wait(pthread_cond_t* c, pthread_mutex_t* m) Atomically unlocks m and blocks the calling thread on the condition c. When the condition is signaled, the thread locks m and returns. Calling pthread_cond_wait with an unlocked mutex will result in undefined behavior.

The operations return 0 on success. Although errors are possible (for instance, ENOMEM if there’s not enough memory to allocate a new mutex) you may assume that they don’t occur.

QUESTION SYNCH-3A. In this question, you are to implement mutex functionality using a pipe. Fill in the definitions of pipe_mutex_init, pipe_mutex_lock, and pipe_mutex_unlock. You should be able to implement the same functionality as the pthread versions (assuming no other code accesses the pipe).

typedef struct pipe_mutex {
    int fd[2];
} pipe_mutex;

int pipe_mutex_init(pipe_mutex* m) {     
    if (pipe(&m->fd) < 0)
        return -1;

    write(m->fd[1], "X", 1);

    return 0;
}

int pipe_mutex_lock(pipe_mutex* m) {

    char buf;
    read(m->fd[0], &buf, 1);

A while loop would be in some ways even better, but you were allowed to assume no rando error returns.

int pipe_mutex_unlock(pipe_mutex* m) {

    write(m->fd[1], "X", 1);

In the next questions, you will help implement pipe functionality using an in-memory buffer and a mutex. This “mutex pipe” will only work between threads of the same process (in contrast to a regular pipe, which also works between processes). An initial implementation of mutex pipes is as follows; you will note that it contains no mutexes.

    typedef struct mutex_pipe {

1. char buf[BUFSIZ]; 2. size_t head; 3. size_t sz;

    } mutex_pipe;

    int mutex_pipe_init(mutex_pipe* p) {

6. p->head = p->sz = 0; 7. memset(&p->buf[0], 0, sizeof(p->buf)); 8. return 0;

`     // Read up to `sz` bytes from the mutex_pipe into `buf` and return the number of bytes `
    // read. If no bytes are available, wait until at least one byte can be read.
    ssize_t mutex_pipe_read(mutex_pipe* p, char* buf, size_t sz) {

10. size_t n = 0; 11. while (n < sz && (p->sz != 0 || n == 0)) { 12. size_t ncopy = p->sz; 13. if (ncopy > sizeof(p->buf) - p->head) 14. ncopy = sizeof(p->buf) - p->head; 15. if (ncopy > sz - n) 16. ncopy = sz - n; 17. memcpy(&buf[n], &p->buf[p->head], ncopy); 18. n += ncopy; 19. p->head += ncopy; 20. p->head = p->head % sizeof(p->buf); 21. p->sz -= ncopy; 22. } 23. return n;

`     // Write up to `sz` bytes from `buf` into the mutex_pipe and return the number of bytes `
    // written. If no space is available, wait until at least one byte can be written.
    ssize_t mutex_pipe_write(mutex_pipe* p, const char* buf, size_t sz) {

30. size_t n = 0; 31. while (n < sz && (p->sz != sizeof(p->buf) || n == 0)) { 32. size_t tail = p->head + p->sz; 33. tail = tail % sizeof(p->buf); 34. size_t ncopy = sizeof(p->buf) - p->sz; 35. if (ncopy > sizeof(p->buf) - tail) 36. ncopy = sizeof(p->buf) - tail; 37. if (ncopy > sz - n) 38. ncopy = sz - n; 39. memcpy(&p->buf[tail], &buf[n], ncopy); 40. n += ncopy; 41. p->sz += ncopy; 42. } 43. return n;

The last page of this exam has a copy of that code that you can remove and keep.

NOT A QUESTION. It would be wise to work through an example. For example, assume BUFSIZ == 4, and figure out how the following calls would behave.

mutex_pipe_write(p, "Hi", 2);
mutex_pipe_read(p, buf, 4);
mutex_pipe_write(p, "Test", 4);
mutex_pipe_read(p, buf, 3);

First let’s reason about this code in the absence of threads.

QUESTION SYNCH-3B. Which of the following changes could, if made in isolation, result in undefined behavior when a mutex pipe was used? Circle all that apply.

Eliminating line 6
Eliminating line 7
Eliminating lines 13–14
Eliminating lines 15–16
Eliminating line 18
Eliminating line 19

6, 13–14, and 15–16.

6: Accesses to uninitialized variables cause undefined behavior. 13–14: Could cause accesses off the end of p->buf. 15–16: Could cause accesses off the end of buf.

7: If this is the only change, no problem; the existing code never accesses bytes that were not written by mutex_pipe_write. 18: This causes mutex_pipe_read to spin forever (since n is not increased), but that’s not undefined. 19: This causes mutex_pipe_read to read the same data over and over again (since p->head never advances), but that’s not undefined.

QUESTION SYNCH-3C. Which of the following changes could, if made in isolation, cause a mutex_pipe_read to return incorrect data (that is, the byte sequence produced by read will not equal the byte sequence passed to write)? Circle all that apply.

Eliminating line 33
Eliminating lines 35–36
Eliminating lines 37–38
Eliminating line 39
Eliminating line 40
Eliminating line 41

35–36, 37–38, and 39.

35–36: Copies some of the written data past the end of the buffer. Not only does this cause undefined behavior, the data’s lost to the reader. 37–38: Copies some unwritten data (data past the end of the write buffer) into the pipe. 39: Take this away and nothing gets written to the buffer! But the size still grows so the reader thinks there’s data there. 33: This is a tricky one. Here’s what happens: No mutex_pipe_write can write data that “wraps around” the buffer. Assume BUFSIZ == 4, p->head == 3, p->sz == 1, and mutex_pipe_write(p, "X", 1) is called. Basically tail is set to 4, and lines 35–36 will set tail = 0! So then mutex_pipe_write will spin forever; but in the meantime mutex_pipe_read will not read anything incorrect. 40: n never advances, so the same portion of the data (the first portion) is written over and over again into the pipe until it fills up. This leaves the pipe in a bad state—containing data that was never written in that order—but then mutex_pipe_write spins forever, so the reader can never read it. 41: Doesn’t advance sz: the buffer always appears empty, so the reader never observes incorrect data.

It should be considered OK to select #40.

QUESTION SYNCH-3D. Which of the following changes could, if made in isolation, cause a call to mutex_pipe_write to never return (when a correct implementation would return)? Circle all that apply.

Eliminating line 33
Eliminating lines 35–36
Eliminating lines 37–38
Eliminating line 39
Eliminating line 40
Eliminating line 41

33, 35–36, 37–38, and 40.

33 and 40: covered above. 35–36 and 37–38: undefined behavior can cause anything to happen! Maybe no points off for this, though.

QUESTION SYNCH-3E. Write an invariant for p->sz. An invariant is a statement about the value of p->sz that is always true. Write your invariant in the form of an assertion; for full credit give the most specific true invariant you can. (“p->sz is an integer” is unspecific, but true; “p->sz == 4” is specific, but false.)

assert(  p->sz >= 0 && p->sz <= BUFSIZ  );

But in fact p->sz is a size_t, so p->sz >= 0 is a tautology and assert(p->sz <= BUFSIZ) works too.

QUESTION SYNCH-3F. Write an invariant for p->head. For full credit give the most specific true invariant you can.

assert(  p->head >= 0 && p->head < BUFSIZ  );

Again, assert(p->head < BUFSIZ) is equivalent.

In the remaining questions, you will add synchronization objects and operations to make your mutex pipe work in a multithreaded program. Here is your starting point:

    typedef struct mutex_pipe {

1. char buf[BUFSIZ]; 2. size_t head; 3. size_t sz; 4. pthread_mutex_t m;

    } mutex_pipe;

    int mutex_pipe_init(mutex_pipe* p) {

5. pthread_mutex_init(&p->m, NULL); 6. p->head = p->sz = 0; 7. memset(&p->buf[0], 0, sizeof(p->buf)); 8. return 0;

    (the rest of the code as in the prior questions)

QUESTION SYNCH-3G. Add calls to “lock” (pthread_mutex_lock) and “unlock” (pthread_mutex_unlock) that protect the mutex pipe from race condition bugs. Write one or more snippets of C code and give line numbers after which the snippets should appear. For full credit, your solution must not deadlock—if one thread is reading from a pipe and another thread is writing to the pipe, then both threads must eventually make progress.

Add pthread_mutex_lock(&p->m); after lines: 10, 30
Add pthread_mutex_unlock(&p->m); after lines: 22, 42
Other changes (if any):

After #17 & #39 (or anywhere between #17-#21 and #39-#41):

if (ncopy == 0) {
    pthread_mutex_unlock(&p->m);   // or just "unlock"
    sched_yield();   // optional
    pthread_mutex_lock(&p->m);     // or just "lock"
}

Some people might put a sched_yield() in: nice!

Alternately, you could add the following code after #21 & #41:

if (n == 0) {
    pthread_mutex_unlock(&p->m);   // or just "unlock"
    sched_yield();   // optional
    pthread_mutex_lock(&p->m);     // or just "lock"
}

But this n == 0 test doesn’t work if placed immediately after #17 or #39. It is important that the pipe be in a consistent state before the mutex is released, meaning that all data copied to/from the pipe must be reflected in the pipe data structure. For instance, if we unlocked in read before updating p-\>head or p-\>sz, there’s a risk that some pipe data would be read twice. It’s safe to unlock immediately when ncopy == 0 because in that case the pipe data structure remains unchanged.

QUESTION SYNCH-3H. Your solution to the last question has poor utilization. For instance, a thread that calls mutex_pipe_read on an empty mutex pipe will spin forever, rather than block. Introduce one or more condition variables so that mutex_pipe_read will block until data is available. Write one or more snippets of C code and give line numbers after which the snippets should appear.

Add to struct mutex_pipe: pthread_cond_t c;
Add to mutex_pipe_init after line 7: pthread_cond_init(&c, NULL);
Other changes:

After #17, instead of the code above:

if (ncopy == 0)
    pthread_cond_wait(&p->c, &p->m);

After #39:

if (n != 0)
    pthread_cond_signal(&p->c);

(This can go anywhere after n is calculated.)

SYNCH-4. Race conditions

Most operating systems support process priority levels, where the kernel runs higher-priority processes more frequently than lower-priority processes. A hypothetical Unix-like operating system called “Boonix” has two priority levels, normal and batch. A Boonix parent process changes the priority level of one of its children with this system call:

int setbatch(pid_t p) Sets process p to have batch priority. All future children of p will also have batch priority. Returns 0 on success, –1 on error. Errors include ESRCH, if p is not a child of the calling process.

Note that a process cannot change its own batch status.

You’re writing a Boonix shell that can run commands with batch priority. If c->isbatch is nonzero, then c should run with batch priority, as should its children. Your start_command function looks like this:

    pid_t start_command(command* c) {

1. c->pid = fork(); 2. if (c->pid == 0) { 3. handle_pipes(c); 4. handle_redirections(c); 5. (void) execvp(c->argv[0], c->argv); 6. // if we get here, execvp failed 7. perror("execvp"); 8. exit(1); 9. } 10. assert(c->pid > 0); 11. if (c->isbatch) 12. setbatch(c->pid); 13. return c->pid;

This shell has two race conditions, one more serious.

QUESTION SYNCH-4A. In some cases, c will change to batch priority after it starts running. Draw a dependency diagram demonstrating this race condition, or briefly describe it.

This happens if the child manages to execvp before the parent calls setbatch.

QUESTION SYNCH-4B. In some cases, c or one of its children could run forever with normal priority. Draw a dependency diagram demonstrating this race condition, or briefly describe it.

This happens if the child execvps, and then forks another child, before the parent calls setbatch. The grandchild will stick at normal priority.

In the remaining questions, you will fix these race conditions in three different ways. The first uses a new system call:

int isbatch() Returns 1 if the calling process has batch priority, 0 if it has normal priority.

QUESTION SYNCH-4C. Use isbatch to prevent both race conditions. Write a snippet of C code and give the line number after which it should appear. You should need one code snippet.

After #2:

while (c->isbatch && !isbatch())
    /* spin */;

QUESTION SYNCH-4D. Use the pipe system call and friends to prevent both race conditions. Write snippets of C code and give the line numbers after which they should appear. You should need several snippets. Make sure you clean up any extraneous file descriptors before running the command or returning from start_command.

A lot of different ways to do this. Here we create the pipe always but use it only if c->isbatch.

Before #1:

int pipefd[2];
pipe(pipefd);

After #3:

if (c->isbatch) {
    char buf;
    read(pipefd[0], &buf, 1);
}
close(pipefd[0]);
close(pipefd[1]);

After #12:

write(pipefd[1], "X", 1);
close(pipefd[0]);
close(pipefd[1]);

This alternate solution relies on end-of-file.

Before #1:

int pipefd[2];
pipe(pipefd);

After #3:

close(pipefd[1]);
if (c->isbatch)
    read(pipefd[0], &pipefd, 1);  // won’t ever read any chars, so this is safe
close(pipefd[0]);

After #12:

close(pipefd[0]);
close(pipefd[1]);

QUESTION SYNCH-4E. Why should the pipe solution be preferred to the isbatch solution? A sentence, or the right single word, will suffice.

Utilization. The pipe will block; the setbatch polls.

QUESTION SYNCH-4F. Suggest a change to the setbatch system call’s behavior that could fix both race conditions, and say how to use this new setbatch in start_command. Write one or more snippets of C code and give the line numbers after which they should appear.

Simple: Allow a process to set its own batchness. Then get rid of the call in the parent. In the child, after #2:

if (c->isbatch)
    setbatch(getpid());

MISC-1. Debugging

In the following short-answer questions, you have access to five debugging tools: top, strace, gdb, valgrind, and man. You can’t change program source code or use other tools. Answer the questions briefly (a couple sentences at most).

QUESTION MISC-1A. You are given a program that appears to “get stuck” when run. How would you distinguish whether the program blocked forever (e.g., made a system call that never returned) or entered an infinite loop?

You can use top: does it report the process is using 100% of the CPU?

You can use strace: is the last thing in the strace an incomplete system call?

QUESTION MISC-1B. You are given a program that uses a lot of memory. How would you tell whether the program leaks memory?

Use valgrind and check if it reports any memory leaks.

QUESTION MISC-1C. You are given a program that produces weird answers. How would you check if it invoked undefined behavior?

Use valgrind and check if it reports undefined behavior. GDB is also acceptable here.

QUESTION MISC-1D. You are given a program that blocks forever. How would you tell where the program blocked (which function called the blocking system call)?

Run it under gdb. When it blocks, hit Ctrl-C and then enter backtrace/bt to get a backtrace.

QUESTION MISC-1E. You are given a program that takes a long time to produce a result. How would you tell whether the program was using system calls unintelligently?

Run it under strace and look for stupidity, such as many system calls that report errors, many system calls that are redundant, lots of reads that return short counts, etc.

QUESTION MISC-1F. You are given a program that exits with a system call error, but doesn’t explain what happened in detail. How would you find what error condition occurred and understand the conditions that could cause that error?

Run it under strace to find the error condition: look for a system call that returned the error. Then use man on that system call and read about the error (the errno description).

MISC-2. Miscellany

QUESTION MISC-2A. True or false in conventional Unix systems?

File descriptors are often used to communicate among processes on the same machine.

True
File descriptors are often used to communicate among processes on different machines.

True
File descriptors are often used to communicate with persistent storage.

True
File descriptors are often used to access primary memory.

False
File descriptors are often used to create child processes.

False

QUESTION MISC-2B. Match the process isolation feature on the left with the hardware feature that helps enforce it on the right. Use each hardware feature once (make the best match you can).

Protected control transfer (processes can transfer control to the kernel only at defined entry points)
Memory protection (one process cannot modify another process’s memory)
Interrupt protection (only the kernel can disable interrupts)
CPU protection (the kernel always regains control of the CPU eventually)

Traps
Privileged mode (dangerous instructions fault unless the CPU is in privileged mode)
Timer interrupts
Page tables

1—a, 2—d, 3—b, 4—c

The remaining questions refer to the following lines of code.

1. close(fd); 2. connect(fd, sockaddr, socklen); 3. listen(fd); 4. mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0); 5. read(fd, buf, 4096); 6. write(fd, buf, 4096);

QUESTION MISC-2C. If a program executes the following line without error, which lines could be executed next without error? List all numbers that apply.

fd = open("/home/cs61user/cs61-psets/pset6/pong61.c", O_RDWR);

1, 4, 5, 6

QUESTION MISC-2D. If a program executes the following line without error, which lines could be executed next without error? List all numbers that apply.

fd = socket(AF_INET, SOCK_STREAM, 0);

1, 2, 3

QUESTION MISC-2E. If a program executes the following lines without error, which lines could be executed next without error? List all numbers that apply.

pipe(pipefd); fd = pipefd[0];

1, 5

MISC-3. More Miscellany

QUESTION MISC-3A. True or false: Any C arithmetic operation has a well-defined result.

False

QUESTION MISC-3B. True or false: Any x86 processor instruction has a well-defined result.

True

QUESTION MISC-3C. True or false: By executing a trap instruction, a process can force an operating system kernel to execute arbitrary code.

False

QUESTION MISC-3D. True or false: By manipulating process memory and registers, an operating system kernel can force a process to execute arbitrary instructions.

True

QUESTION MISC-3E. True or false: All signals are sent explicitly via the kill() system call.

False

QUESTION MISC-3F. True or false: An operating system’s buffer cache is generally fully associative.

True

QUESTION MISC-3G. True or false: The least-recently-used eviction policy is more useful for very large files that are read sequentially than it is for stacks.

False

QUESTION MISC-3H. True or false: Making a cache bigger can lower its hit rate for a given workload.

True

QUESTION MISC-3I. True or false: x86 processor caches are coherent (i.e., always appear to contain the most up-to-date values).

True

QUESTION MISC-3J. True or false: A socket file descriptor supports either reading or writing, but not both.

False; it supports both

MISC-4. Pot Pourri

Parts A-D pertain to the data structures and hexdump output shown here.

struct x {
    unsigned long ul;
    unsigned short us;
    unsigned char uc;
} *sp;

// Hexdump output of some program running on the appliance
08c1b008  e9 11 cf d0 0d d0 3f f3  63 61 74 00 0d f0 fe ca  |......?.cat.....|
08c1b018  5e ea 15 0d de c0 ad de                           |^.......|

You are told that sp = 0x08c1b008.

QUESTION MISC-4A. What is the value (in hex) of sp->ul?

0xd0cf11e9

QUESTION MISC-4B. What is the value (in hex) of sp->uc?

0x3f

QUESTION MISC-4C. At what address will you find the string "cat"?

0x08c1b010

QUESTION MISC-4D. You think that the bytes after the string "cat" comprise an array of 3 integers; what is the value (in hex) of the middle one of those?

0x0d15ea5e

QUESTION MISC-4E. What is the following binary value expressed in hexadecimal: 01011010?

0x5a

QUESTION MISC-4F. What is the value of the hex number 0x7FF in decimal?

255 + 7*256 == 8*256 − 1 == 2*4*256 − 1 == 2*1024 − 1 == 2047

QUESTION MISC-4G. Is 0x98765432 a valid return from malloc?

No, because it isn’t aligned properly—malloc will always return a pointer whose alignment could work for any basic type, which on x86-64, means the last digit must be either 0 or 8 (and most x86-64 mallocs actually align their data to 16-byte boundaries!)

QUESTION MISC-4H. What is the minimum number of x86 instruction bytes you need to write an infinite loop?

Two bytes: 0xeb 0xfe

QUESTION MISC-4I. True or False: Every declaration in C code allocates space for an object.

False. Extern declarations, such as function declarations or extern int x;, don’t allocate space.

QUESTION MISC-4J. True or False: Processes cannot share memory.

False; immediately after fork the parent and child processes share all physical memory!

For parts K–O, assume we are running on the appliance and we initialize ival, p, and q as shown below. Write the value of the expression -- you may express the values in hex if that's simpler, just be sure to prefix them with 0x to make it clear that you are doing so. For True/False questions, there is no need to correct or provide a counterexample for any statements that are false.

int ival[4] = {0x12345678, 0x9ABCDEF0, 0x13579BDF,0x2468ACE0};
int* p = &ival[0];
int* q = &ival[3];
int* x = p + 1;
char* cp = (char*) (q - 2);

QUESTION MISC-4K. q - p

QUESTION MISC-4L. ((char \*)q - (char \*)p)

QUESTION MISC-4M. x - p

QUESTION MISC-4N. \*((short \*)((char \*)x+2))

0x9ABC

QUESTION MISC-4O. \*cp

0xF0

QUESTION MISC-4P. What system call allows you to block on a collection of file descriptors?

select (also poll, pselect, epoll, …)

QUESTION MISC-4Q. What system call creates a communication channel that can only be used among related processes?

pipe

QUESTION MISC-4R. What system call can change the attributes of a file descriptor so you can poll on it rather than block?

fcntl

QUESTION MISC-4S. What system call produces a file descriptor on which a server can exchange messages with a client?

socket

QUESTION MISC-4T. True or False: A program and a process are the same thing.

False