2014/MidtermSolutions

From CS61
Jump to: navigation, search

CS61 Midterm 2014 Solutions

Rules

You have 83 minutes to complete this midterm.

Do your work in the exam booklet. Use the backs of pages if you need to, but write your answers on the fronts.

Some problems are harder than others and they are not ordered by difficulty. Look over all the problems and use your time wisely. You have about 2.3 minutes per problem part.

The exam is open book, open note, open computer. You may access the book, and your own notes in paper form. You may also use a computer or equivalent to access class materials. However, you may not access non-class materials except as explicitly allowed below, and you may not write programs to answer questions. Specifically:

  • You may access a browser and a PDF reader.
  • You may access your own notes electronically.
  • You may access an Internet site on which your own notes are stored.
  • You may access system manual pages (man).
  • You may access the cs61.seas.harvard.edu/wiki/2014 site.

But:

  • You may not access Google or Wikipedia or anything else except as directly linked from the cs61.seas.harvard.edu/wiki/2014 site.
  • You may not access Piazza.
  • You may not access course videos.
  • You may not run a C compiler, assembler, on-line disassembler, calculator, or anything similar. Simple reading applications only.
  • You may not access other students’ notes.
  • You absolutely may not contact other humans via IM or anything like it.
  • You may not access solutions from any previous exam, by paper or computer, except for those linked on the course wiki.

Any violations of this policy, or the spirit of this policy, are breaches of academic honesty and will be treated accordingly. Please appreciate our flexibility and behave honestly and honorably.

Name

Your name: ____________________________________________________________________
Your student ID (HUID/DCE ID): ____________________________________________________________________

You must sign the following statement for us to grade the exam:

I have neither given nor received inappropriate help on this exam.

_________________________________________________________________ __________________________
Signature Date

0. Ground rules

Assume a 32-bit x86 architecture unless explicitly told otherwise.

Write your assumptions if a problem seems unclear, and write down your reasoning for partial credit.

1. Data representation (12 points)

QUESTION 1A. Arrange the following values in increasing numeric order. Assume that x is an int with value 8192.

  1. EOF
  2. x & ~x
  3. (signed char) 0x47F
  4. x | ~x
  1. 1000
  2. (signed char) 65535
  3. The size of the stdio cache
  4. -0x80000000

A possible answer might be “a < b < c = d < e < f < g < h.”

4 points

h < a = d = f < b < c < e < g

For each of the remaining questions, write one or more arguments that, when passed to the provided function, will cause it to return the integer 61 (which is 0x3d hexadecimal). Write the expected number of arguments of the expected types.

QUESTION 1B.

int f1(int n) {
    return 0x11 ^ n;
}
2 points. 0x2c == 44

QUESTION 1C.

int f2(const char* s) {
    return strtol(s, NULL, 0);
}
2 points. "61", " 0x3d", " 61 ", etc.

QUESTION 1D. Your answer should be different from the previous answer.

int f3(const char* s) {
    return strtol(s, NULL, 0);
}
2 points

QUESTION 1E. For this problem, you will also need to define a global variable. Give its type and value.

f4:
    movl 4(%esp), %eax    # 4(%esp) is the first argument to the function
    andl $5, %eax
    addl %eax, %eax
    addl 8(%esp), %eax    # 8(%esp) is the second argument to the function
    movzbl y, %edx
    subl %edx, %eax
    ret
2 points. This code was compiled from:
int f4(int a, int b) {
    extern unsigned char y;
    return (a & 5) * 2 + b - y;
}

A valid solution is a=0, b=61, unsigned char y=0.

2. Sizes and alignments (12 points)

QUESTION 2A. Use the following members to create a struct of size 16. You will use each member exactly once, and char a comes first.

  1. char a; (we’ve written this for you)
  2. unsigned char b;
  3. short c;
  4. int d;
struct size_16 {
    char a;



};

Impossible! Bad question; we should have worded it to allow “impossible” to be an expected answer.

QUESTION 2B. Repeat Question 2A, but create a struct with size 12.

struct size_12 {
    char a;



};

2 points. abdc, acbd, acdb

QUESTION 2C. Repeat Question 2A, but create a struct with size 8.

struct size_8 {
    char a;



};

4 points. abcd

QUESTION 2D. Consider the following structs:

struct x {
    T x1;
    U x2;
};
struct y {
    struct x y1;
    V y2;
};

Give definitions for T, U, and V so that there is one byte of padding in struct x after x2, and two bytes of padding in struct y after y1.

4 points. Example: T = short[2]; U = char; V = int

3. Dynamic memory allocation (16 points)

QUESTION 3A. True or false?

  1. free(NULL) is an error.
  2. malloc(0) can never return NULL.
3 points. False, False

QUESTION 3B. Give values for sz and nmemb so that calloc(sz, nmemb) will always return NULL (on a 32-bit x86 machine), but malloc(sz * nmemb) might or might not return null.

3 points. (size_t) -1, (size_t) -1—anything that causes an overflow

Consider the following 8 statements. (p and q have type char*.)

  1. free(p);
  2. free(q);
  3. p = q;
  4. q = NULL;
  5. p = (char*) malloc(12);
  6. q = (char*) malloc(8);
  7. p[8] = 0;
  8. q[4] = 0;

QUESTION 3C. Put the statements in an order that would execute without error or evoking undefined behavior. Memory leaks count as errors. Use each statement exactly once. Sample answer: “abcdefgh.”

POINT TOTALS: 3C-F are graded together.
  • 10 points for 4/4 correct
  • 9 points for 3/4 correct
  • 7 points for 2/4 correct
  • 4 points for 1/4 correct
  • Additional partial credit is allowed
cdefghab (and others). Expect “OK”

QUESTION 3D. Put the statements in an order that would cause one double-free error, and no other error or undefined behavior (except possibly one memory leak). Use each statement exactly once.

efghbcad (and others). Expect “double-free + memory leak”

QUESTION 3E. Put the statements in an order that would cause one memory leak (one allocated piece of memory is not freed), and no other error or undefined behavior. Use each statement exactly once.

efghadbc (and others). Expect “memory leak”

QUESTION 3F. Put the statements in an order that would cause one boundary write error, and no other error or undefined behavior. Use each statement exactly once.

eafhcgbd (and others). Expect “out of bounds write”

4. Pointers and debugging allocators (12 points)

You are debugging some students’ m61 code from Problem Set 1. The codes use the following metadata:

typedef struct meta { ...
    struct meta* next;
    struct meta* prev;
} meta;

meta* mhead;    // head of active allocations list

Their linked-list manipulations in m61_malloc are similar.

void* m61_malloc(size_t sz, const char* file, int line) {
    ...
    meta* m = (meta*) ptr;
    m->next = mhead;
    m->prev = NULL;
    if (mhead)
        mhead->prev = m;
    mhead = m;
    ...
}

But their linked-list manipulations in m61_free differ.

Alice’s code:

void m61_free(void* ptr, ...) { ...  
    meta* m = (meta*) ptr - 1;
    if (m->next != NULL)
        m->next->prev = m->prev;
    if (m->prev == NULL)
        mhead = NULL;
    else
        m->prev->next = m->next;
    ...
}

Bob’s code:

void m61_free(void* ptr, ...) { ...  
    meta* m = (meta*) ptr - 1;
    if (m->next)
        m->next->prev = m->prev;
    if (m->prev)
        m->prev->next = m->next;
    ...
}

Chris’s code:

void m61_free(void* ptr, ...) { ...  
    meta* m = (meta*) ptr - 1;
    m->next->prev = m->prev;
    m->prev->next = m->next;
    ...
}

Donna’s code:

void m61_free(void* ptr, ...) { ...  
    meta* m = (meta*) ptr - 1;
    if (m->next)
        m->next->prev = m->prev;
    if (m->prev)
        m->prev->next = m->next;
    else
        mhead = m->next;
    ...
}

You may assume that all code not shown is correct.

QUESTION 4A. Whose code will segmentation fault on this input? List all students that apply.

int main() {
    void* ptr = malloc(1);
    free(ptr);
}
Grade 4A-4D together. Point assignments:
  • 4/4 correct: 12 points
  • 3/4 correct: 10 points
  • 2/4 correct: 7 points
  • 1/4 correct: 4 points
Chris

QUESTION 4B. Whose code might report something like “invalid free of pointer [ptr1], not allocated” on this input? (Because a list traversal starting from mhead fails to find ptr1.) List all students that apply. Don’t include students whose code would segfault before the report.

int main() {
    void* ptr1 = malloc(1);
    void* ptr2 = malloc(1);
    free(ptr2);
    free(ptr1);   // <- message printed here
}
Alice

QUESTION 4C. Whose code would improperly report something like “LEAK CHECK: allocated object [ptr1] with size 1” on this input? (Because the mhead list appears not empty, although it should be.) List all students that apply. Don’t include students whose code would segfault before the report.

int main() {
    void* ptr1 = malloc(1);
    free(ptr1);
    m61_printleakreport();
}
Bob

QUESTION 4D. Whose linked-list code is correct for all inputs? List all that apply.

Donna

5. Arena allocation (15 points)

Chimamanda Ngozi Adichie is a writing a program that needs to allocate and free a lot of nodes, where a node is defined as follows:

typedef struct node {
    int key;
    void* value;
    struct node* left;
    struct node* right;      // also used in free list
} node;

She uses a variant of the arena allocator we saw in Lectures 5 and 6. Here’s part of the code.

typedef struct arena_group {
    struct arena_group* next_group;
    node nodes[1024];
} arena_group;

typedef struct arena {
    node* frees;
    arena_group* groups;
} arena;

node* node_alloc(arena* a) {
    if (!a->frees) {
        arena_group* g = (arena_group*) malloc(sizeof(arena_group));
        // ... link g to a->groups ...
        for (size_t i = 0; i != 1023; ++i)
            g->nodes[i].right = &g->nodes[i + 1];
        g->nodes[1023].right = NULL;
        a->frees = &g->nodes[0];
    }
    node* n = a->frees;
    a->frees = n->right;
    return n;
}

void node_free(arena* a, node* n) {
    n->right = a->frees;
    a->frees = n;
}

QUESTION 5A. True or false?

  1. This allocator never has external fragmentation.
  2. This allocator never has internal fragmentation.
3 points. True, True

QUESTION 5B. Chimamanda’s frenemy Paul Auster notices that if many nodes are allocated right in a row, every 1024th allocation seems much more expensive than the others. The reason is that every 1024th allocation initializes a new group, which in turn adds 1024 nodes to the free list. Chimamanda decides instead to allow a single element of the free list to represent many contiguous free nodes. The average allocation might get a tiny bit slower, but no allocation will be much slower than average. Here’s the start of her idea:

node* node_alloc(arena* a) {
    if (!a->frees) {
        arena_group* g = (arena_group*) malloc(sizeof(arena_group));
        // ... link g to a->groups ...
        g->nodes[0].key = 1024;   // g->nodes[0] is the 1st of 1024 contiguous free nodes
        g->nodes[0].right = NULL;
        a->frees = &g->nodes[0];
    }
    node* n = a->frees;
    // ???
    return n;
}

Complete this function by writing code to replace // ???.

3 points

if (n->key == 1)
    a->frees = n->right;
else {
    a->frees = n + 1;
    a->frees->key = n->key - 1;
    a->frees->right = n->right;
}

Another solution:

if (n->right)
    a->frees = n->right;
else if (n->key == 1)
    a->frees = NULL;
else {
    a->frees = n + 1;
    a->frees->key = n->key - 1;
}

QUESTION 5C. Write a node_free function that works with the node_alloc function from the previous question.

void node_free(arena* a, node* n) {

3 points

    n->right = a->frees;
    n->key = 1;
    a->frees = n;

Or, if you use the solution above:

    n->right = a->frees;
    a->frees = n;
}

QUESTION 5D. Complete the following new function.

// Return the arena_group containing node n. n must be a node returned by
// a previous call to node_alloc(a).
arena_group* node_find_group(arena* a, node* n) {
    for (arena_group* g = a->groups; g; g = g->next_group) {

3 points

        if (&g->nodes[0] <= n && n <= &g->nodes[1023])
            return g;
    }
    return NULL;
}

QUESTION 5E. Chimamanda doesn’t like that the node_find_group function from Question 5D takes O(G) time, where G is the number of allocated arena_groups. She remembers a library function that might help, posix_memalign:

int posix_memalign(void** memptr, size_t alignment, size_t size);

The function posix_memalign() allocates size bytes and places the address of the allocated memory in *memptr. The address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *). ...

“Cool,” she says, “I can use this to speed up node_find_group!” She now allocates a new group with the following code:

arena_group* g;
int r = posix_memalign(&g, 32768, sizeof(arena_group));
assert(r == 0); // posix_memalign succeeded

Given this allocation strategy, write a version of node_find_group that takes O(1) time.

arena_group* node_find_group(arena* a, node* n) {

3 points

    uintptr_t n_addr = (uintptr_t) n;
    return (arena_group*) (n_addr - n_addr % 32768);
}

6. IO caching and strace (18 points)

Elif Batuman is investigating several program executables left behind by her ex-roommate Fyodor. She runs each executable under strace in the following way:

strace -o strace.txt ./EXECUTABLE files/text1meg.txt > files/out.txt

Help her figure out properties of these programs based on their system call traces.

QUESTION 6A. Program ./mysterya:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x8193000
brk(0x81b5000)                          = 0x81b5000
read(3, "A", 1)                         = 1
write(1, "A", 1)                        = 1
read(3, "\n", 1)                        = 1
write(1, "\n", 1)                       = 1
read(3, "A", 1)                         = 1
write(1, "A", 1)                        = 1
read(3, "'", 1)                         = 1
write(1, "'", 1)                        = 1
read(3, "s", 1)                         = 1
write(1, "s", 1)                        = 1
...

Circle at least one option in each column.

  1. Sequential IO
  2. Reverse sequential IO
  3. Strided IO
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other
3 points. 1, a, i, D

QUESTION 6B. Program ./mysteryb:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x96c5000
brk(0x96e6000)                          = 0x96e6000
read(3, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 2048) = 2048
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 2048) = 2048
read(3, "kad\nAkron\nAkron's\nAl\nAl's\nAla\nAl"..., 2048) = 2048
write(1, "kad\nAkron\nAkron's\nAl\nAl's\nAla\nAl"..., 2048) = 2048
...

Circle at least one option in each column.

  1. Sequential IO
  2. Reverse sequential IO
  3. Strided IO
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other
3 points. 1, b/c, ii, B

QUESTION 6C. Program ./mysteryc:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9064000
brk(0x9085000)                          = 0x9085000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1046528, SEEK_SET)             = 1046528
read(3, "ingau\nRheingau's\nRhenish\nRhianno"..., 2048) = 2048
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1044480, SEEK_SET)             = 1044480
read(3, "Quinton\nQuinton's\nQuirinal\nQuisl"..., 2048) = 2048
write(1, "ehR\neehR\naehR\ns'hR\nhR\nsdlonyeR\ns"..., 2048) = 2048
lseek(3, 1042432, SEEK_SET)             = 1042432
read(3, "emyslid's\nPrensa\nPrensa's\nPrenti"..., 2048) = 2048
write(1, "\ns'nailitniuQ\nnailitniuQ\nnniuQ\ns"..., 2048) = 2048
lseek(3, 1040384, SEEK_SET)             = 1040384
read(3, "Pindar's\nPinkerton\nPinocchio\nPin"..., 2048) = 2048
write(1, "rP\ndilsymerP\ns'regnimerP\nregnime"..., 2048) = 2048
...

Circle at least one option in each column.

  1. Sequential IO
  2. Reverse sequential IO
  3. Strided IO
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other
3 points. 2, c, ii, B

QUESTION 6D. Program ./mysteryd:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9a0e000
brk(0x9a2f000)                          = 0x9a2f000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1048575, SEEK_SET)             = 1048575
read(3, "o", 2048)                      = 1
lseek(3, 1048574, SEEK_SET)             = 1048574
read(3, "Ro", 2048)                     = 2
lseek(3, 1048573, SEEK_SET)             = 1048573
read(3, "\nRo", 2048)                   = 3
...
lseek(3, 1046528, SEEK_SET)             = 1046528
read(3, "ingau\nRheingau's\nRhenish\nRhianno"..., 2048) = 2048
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1046527, SEEK_SET)             = 1046527
read(3, "eingau\nRheingau's\nRhenish\nRhiann"..., 2048) = 2048
lseek(3, 1046526, SEEK_SET)             = 1046526
read(3, "heingau\nRheingau's\nRhenish\nRhian"..., 2048) = 2048
...

Circle at least one option in each column.

  1. Sequential IO
  2. Reverse sequential IO
  3. Strided IO
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other
3 points. 2, b, ii, B

QUESTION 6E. Program ./mysterye:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x93e5000
brk(0x9407000)                          = 0x9407000
read(3, "A", 1)                         = 1
read(3, "\n", 1)                        = 1
read(3, "A", 1)                         = 1
...
read(3, "A", 1)                         = 1
read(3, "l", 1)                         = 1
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 1024) = 1024
read(3, "t", 1)                         = 1
read(3, "o", 1)                         = 1
read(3, "n", 1)                         = 1
...

Circle at least one option in each column.

  1. Sequential IO
  2. Reverse sequential IO
  3. Strided IO
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other

3 points. 1, a, ii, C

Some people will circle C and D, because there’s no read cache, so the read cache is “other.” That’s OK.

QUESTION 6F. Program ./mysteryf:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9281000
brk(0x92a3000)                          = 0x92a3000
read(3, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 4096) = 4096
write(1, "A", 1)                        = 1
write(1, "\n", 1)                       = 1
write(1, "A", 1)                        = 1
...
write(1, "A", 1)                        = 1
write(1, "l", 1)                        = 1
read(3, "ton's\nAludra\nAludra's\nAlva\nAlvar"..., 4096) = 4096
write(1, "t", 1)                        = 1
write(1, "o", 1)                        = 1
write(1, "n", 1)                        = 1
...

Circle at least one option in each column.

  1. Sequential IO
  2. Reverse sequential IO
  3. Strided IO
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other
3 points. 1, b/c, i, A

7. Processor cache (15 points)

The following questions use the following C definition for an NxM matrix (the matrix has N rows and M columns).

typedef struct matrix {
    unsigned N;
    unsigned M;
    double elt[0];
} matrix;

matrix* matrix_create(unsigned N, unsigned M) {
    matrix* m = (matrix*) malloc(sizeof(matrix) + N * M * sizeof(double));
    m->N = N;
    m->M = M;
    for (size_t i = 0; i < N * M; ++i)
        m->elt[i] = 0.0;
    return m;
}

Typically, matrix data is stored in row-major order: element mij (at row i and column j) is stored in m->elt[i*m->M + j]. We might write this in C using an inline function:

inline double* melt1(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i * m->M + j];
}

But that’s not the only possible method to store matrix data. Here are several more.

inline double* melt2(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i + j * m->N];
}

inline double* melt3(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i + ((m->N - i + j) % m->M) * m->N];
}

inline double* melt4(matrix* m, unsigned i, unsigned j) {
    return &m->elt[i + ((i + j) % m->M) * m->N];
}

inline double* melt5(matrix* m, unsigned i, unsigned j) {
    assert(m->M % 8 == 0);
    unsigned k = (i/8) * (m->M/8) + (j/8);
    return &m->elt[k*64 + (i % 8) * 8 + j % 8];
}

QUESTION 7A. Which method (of melt1melt5) will have the best processor cache behavior if most matrix accesses use loops like this?

for (unsigned j = 0; j < 100; ++j)
    for (unsigned i = 0; i < 100; ++i)
        f(*melt(m, i, j));
7A-7D are graded together.
  • 4/4 correct: 9 points
  • 3/4 correct: 8 points
  • 2/4 correct: 6 points
  • 1/4 correct: 4 points
melt2

QUESTION 7B. Which method will have the best processor cache behavior if most matrix accesses use loops like this?

for (unsigned i = 0; i < 100; ++i)
    f(*melt(m, i, i));
melt3

QUESTION 7C. Which method will have the best processor cache behavior if most matrix accesses use loops like this?

for (unsigned i = 0; i < 100; ++i)
    for (unsigned j = 0; j < 100; ++j)
        f(*melt(m, i, j));
melt1 (but melt5 is almost as good!)

QUESTION 7D. Which method will have the best processor cache behavior if most matrix accesses use loops like this?

for (int di = -3; di <= 3; ++di)
    for (int dj = -3; dj <= 3; ++dj)
        f(*melt(m, I + di, J + dj));
melt5

QUESTION 7E. Here is a matrix-multiply function in ikj order.

matrix* matrix_multiply(matrix* a, matrix* b) {
    assert(a->M == b->N);
    matrix* c = matrix_create(a->N, b->M);
    for (unsigned i = 0; i != a->N; ++i)
        for (unsigned k = 0; k != a->M; ++k)
            for (unsigned j = 0; j != b->M; ++j)
                *melt(c, i, j) += *melt(a, i, k) * *melt(b, k, j);
}

This loop order is cache-optimal when data is stored in melt1 order. What loop order is cache-optimal for melt2?

3 points. jki is best; kji is a close second.

QUESTION 7F. You notice that accessing a matrix element using melt1 is very slow. After some debugging, it seems like the processor on which you are running code has a very slow multiply instruction. Briefly describe a change to struct matrix that would let you write a version of melt1 with no multiply instruction. You may add members, change sizes, or anything you like.

3 points. Example answers:
  1. Add a double** rows member that points to each row so you don't need to multiply
  2. Round M up to a power of 2 and use shifts