Ancient CS 61 Content Warning!!!!!1!!!
This is not the current version of the class.
This site was automatically translated from a wiki. The translation may have introduced mistakes (and the content might have been wrong to begin with).

Midterm

We’ve heard a lot of thoughts, and some despair, about the midterm, and we want to share how we think about exams in this class.

Processor cache performance

As you have seen already, accessing values that are stored in the processor cache is much faster than accessing values in memory. In the following examples (brought to you by CS165), assume that a is a large array with n elements.

Between (1) and (2), which is faster?

(1):

for (int i = 0; i < n; i++) {
    min = a[i] < min ? a[i] : min;
}
for (int i = 0; i < n; i++) {
    max = a[i] > max ? a[i] : max;
}

(2):

for (int i = 0; i < n; i++) {
    min = a[i] < min ? a[i] : min;
    max = a[i] > max ? a[i] : max;
}

Now let's also use an array b, which has the same number of elements as a. Between (3) and (4), which is faster? Are there any drawbacks to using the solution that better utilizes the processor cache?

(3):

for (int i = 0; i < n; i++) {
    min = a[i] < min ? a[i] : min;
}
for (int i = 0; i < n; i++) {
    max = b[i] > max ? b[i] : max;
}

(4):

for (int i = 0; i < n; i++) {
    min = a[i] < min ? a[i] : min;
    max = b[i] > max ? b[i] : max;
}

Data structure cache performance

It is important to use data structures that will take full advantage of the cache. Let’s look at some popular data structures and analyze their performance with a processor cache.

A “key-value” data structure, or “KV store”, maps keys to values. Many key-value structures support these operations: get(key), which returns the value associated with key (or a sentinel value if there is none); put(key, value), which associates value with key; and delete(key), which removes key and its value.

Often a key-value data structure uses strings for keys and values, but there are other possibilities. For example, you can imagine one that maps ints to ints.

Consider the following int-based key-value data structures. In each case, we have written the get function for you.

(1) A 1D array:

#define MAXKEY …
struct kv1 {
    int v[MAXKEY];
};
int get(struct kv1* kv, int key) {
    if (key >= 0 && key < MAXKEY) {
        return kv->v[key];
    } else {
        return NONEXISTENT_VALUE;
    }
}

(2) A 2D array stored in “row-major” order (this data structure uses two ints for its key):

#define NROWS …
#define NCOLS …
struct kv2 {
    int v[NROWS * NCOLS];
};
int get(struct kv2* kv, int key_row, int key_col) {
    if (key_row >= 0 && key_row < NROWS && key_col >= 0 && key_col < NCOLS) {
        return kv->v[key_row * NCOLS + key_col];
    } else {
        return NONEXISTENT_VALUE;
    }
}

(3) A singly-linked list:

struct kv3 {
    int k;
    int v;
    struct kv3* next;
};
int get(struct kv3* kv, int key) {
    for (; kv; kv = kv->next) {
        if (kv->k == key) {
            return kv->v;
        }
    }
    return NONEXISTENT_VALUE;
}

(4) A chained hash table: In a chained hash table, every bucket contains a linked list of all the elements that hash to that bucket. To add an element, you simply add it to the linked list in its bucket.

typedef struct kv4_node {
    int k;
    int v;
    struct kv4_node* next;
} kv4_node;
struct kv4 {
    kv4_node* buckets[NBUCKETS];
};
unsigned hash(int key) { /* return hash code */ }
int get(struct kv4* kv, int key) {
    unsigned h = hash(key) % NBUCKETS;
    for (kv4_node* n = kv->buckets[h]; n; n = n->next) {
        if (n->k == key) {
            return n->v;
        }
    }
    return NONEXISTENT_VALUE;
}

(5) An open-coded hash table: In an open-coded hash table, every bucket holds at most one entry. If a collision happens, you find another bucket that doesn't yet have a value in it. Here we use linear probing: on a collision, look at the next bucket; if it is free, store the value there, and if not, repeat the process with the bucket after that.

#define INVALID_KEY INT_MIN
typedef struct kv5_slot {
    int k;
    int v;
} kv5_slot;
struct kv5 {
    kv5_slot buckets[NBUCKETS];
};
unsigned hash(int key) { /* return hash code */ }
int get(struct kv5* kv, int key) {
    unsigned h = hash(key) % NBUCKETS;
    // assume at least 1 bucket is empty
    while (1) {
        if (kv->buckets[h].k == key) {
            return kv->buckets[h].v;
        } else if (kv->buckets[h].k == INVALID_KEY) {
            return NONEXISTENT_VALUE;
        } else {
            h = (h + 1) % NBUCKETS;
        }
    }
}

(6) A binary tree:

struct kv6 {
    int k;
    int v;
    struct kv6* left_child;
    struct kv6* right_child;
};
int get(struct kv6* kv, int key) {
    while (kv) {
        if (kv->k == key) {
            return kv->v;
        } else if (kv->k < key) {
            kv = kv->right_child;
        } else {
            kv = kv->left_child;
        }
        }
    }
    return NONEXISTENT_VALUE;
}

Draw a picture of each of these structures so you understand their design.

Now assume that each key-value structure holds N elements, where N is much larger than the processor cache, and imagine a program that repeatedly calls get on a structure. For each structure, how many processor cache misses will a single call to get cause?

Now give examples of workloads that would be best for each key-value structure.

Matrix multiplication

Pull the latest version of the sections repository by entering the following in the terminal:

git pull

The cs61-section/s07 directory contains a program called matrixmultiply (source in matrixmultiply.c). This program creates two random square matrices, multiplies them together, prints some information about their product, then exits.

Your job is to transform the matrix_multiply function so that it runs as fast as possible, while producing the same results as the original program (which you can run as ./base-matrixmultiply) for all arguments.

Matrix multiplication background

A matrix is a 2D array of numbers. An m×n matrix has m rows and n columns; the value in row i and column j is written m(i,j), or, in more C-like notation, m[i][j].

A square matrix has the same number of rows and columns: m = n.

The product of two matrices a and b, written a×b or a*b, is calculated by the following rules. The product is defined only when the number of columns in a equals the number of rows in b: if a is m×n, then b must be n×p, and the product c = a×b is m×p. Each element of the product combines a row of a with a column of b: c[i][j] = a[i][0]*b[0][j] + a[i][1]*b[1][j] + … + a[i][n-1]*b[n-1][j].

Some pictures might help. Here’s how the dimensions work:

[Missing figure Matmult002.png: the dimensions of a matrix product]

And here are the values used to compute element c[i][j] of the destination matrix:

[Missing figure Matmult003.png: a row of a and a column of b combining into one element of c]

Matrices in most languages are stored in row-major order. That means that we store the matrix in a single array, where the values for each row are grouped together. For instance, if a were a 4x4 square matrix, we’d store its elements in this order:

| a[0,0] | a[0,1] | a[0,2] | a[0,3] | a[1,0] | a[1,1] | a[1,2] | a[1,3] | a[2,0] | a[2,1] | a[2,2] | a[2,3] | a[3,0] | a[3,1] | a[3,2] | a[3,3] |

Thus, element a[i][j] is stored at array index a[i*NCOLS + j], where NCOLS is the number of columns (here, 4).

Code

Once you get the “aha” insight, you should be able to make ./matrixmultiply take no more than 0.1x the time of ./base-matrixmultiply, while producing statistics that are 0% off.

Try to explain why the performance improvement is as big as it is.