# 2016/Storage1X

## Matrix multiplication

The `cs61-exercises/storage1x` directory contains a program called `matrixmultiply` (source in `matrixmultiply.c`). This program creates two random square matrices, multiplies them together, prints some information about their product, then exits.

Your job is to transform the `fast_matrix_multiply` function so that it runs *as fast as possible*, while producing almost exactly the same results as the original `base_matrix_multiply` *for all arguments*.

## Matrix multiplication background

A matrix is a 2D array of numbers. An *m*×*n* matrix has *m* rows and *n* columns; the value in row *i* and column *j* is written *m*_{ij}, or, in more C-like notation, *m*[*i*, *j*] or *m*[*i*][*j*].

A *square* matrix has the same number of rows and columns: *m* = *n*.

The *product* of two matrices *a* and *b*, written *a*×*b* or *a*\**b*, is calculated by the following rules.

- *a* must have the same number of columns as *b* has rows.
- If *a* has dimensions *m*×*n*, and *b* has dimensions *n*×*p*, then *a*×*b* has dimensions *m*×*p*: it has *a*’s row count and *b*’s column count.
- Let *c* = *a*×*b*. Then we compute *c*[*i*, *j*] as *c*_{ij} = ∑_{0≤k<n} *a*_{ik}·*b*_{kj}. (A C sketch of this computation follows the list.)
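To make the definition concrete, here is a minimal C sketch of the triple loop it describes. The function name and parameter types are hypothetical; `matrixmultiply.c` defines its own matrix representation and helpers.

```c
// Hypothetical sketch: compute c = a * b, where a is m x n, b is n x p,
// and c is m x p, following c[i][j] = sum over k of a[i][k] * b[k][j].
void multiply(int m, int n, int p,
              const double a[m][n], const double b[n][p], double c[m][p]) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < p; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k) {
                sum += a[i][k] * b[k][j];   // a_ik * b_kj
            }
            c[i][j] = sum;                  // c_ij
        }
    }
}
```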

Some pictures might help. Here’s how the dimensions work:

*(figure: the dimensions of a, b, and a×b)*

And here are the values used to compute element *c*_{ij} of the destination matrix:

*(figure: row i of a and column j of b)*

Matrices in most languages are stored in *row-major order*. That means that we store the matrix in a single array, where the values for each row are grouped together. For instance, if `a` were a 4×4 square matrix, we’d store its elements in this order:

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|
| a[0,0] | a[0,1] | a[0,2] | a[0,3] | a[1,0] | a[1,1] | a[1,2] | a[1,3] | a[2,0] | a[2,1] | a[2,2] | a[2,3] | a[3,0] | a[3,1] | a[3,2] | a[3,3] |

Thus, element *a*_{ij} is stored in array element `a[i*NCOLS + j]`, where `NCOLS` is the number of columns (here, 4).
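A small accessor in this style might look like the sketch below. The name `element` and its signature are hypothetical; the `me` helper in `matrixmultiply.c` may be defined differently.

```c
#include <stddef.h>

// Return a pointer to element (i, j) of a matrix stored in row-major order
// with `ncols` columns per row. Hypothetical sketch; the real accessor in
// matrixmultiply.c (the `me` function) may have a different interface.
static inline double* element(double* m, size_t ncols, size_t i, size_t j) {
    return &m[i * ncols + j];
}
```

For the 4×4 example above, `*element(a, 4, 3, 2)` refers to `a[3,2]`, stored at flat index 3\*4 + 2 = 14.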

## Code

- The function `me` is used to access an element of a matrix.
- You should modify only the function `fast_matrix_multiply`.
- *Think about the cache!* (One cache-aware rearrangement is sketched after this list.)
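As one illustration of cache-aware restructuring (a sketch only, not necessarily the transformation this exercise intends), interchanging the two inner loops makes the innermost loop scan `b` and `c` along rows, which are contiguous in row-major storage, instead of walking `b` down a column:

```c
// Same result as the straightforward i-j-k loop, but in i-k-j order.
// The innermost loop now reads b[k][...] and writes c[i][...] sequentially,
// so each cache line fetched from b and c is fully used before eviction.
// Hypothetical sketch; fast_matrix_multiply in matrixmultiply.c works with
// the course's matrix type and the `me` accessor.
void multiply_ikj(int m, int n, int p,
                  const double a[m][n], const double b[n][p], double c[m][p]) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < p; ++j) {
            c[i][j] = 0.0;            // clear the output row first
        }
        for (int k = 0; k < n; ++k) {
            double aik = a[i][k];     // reused across the whole inner loop
            for (int j = 0; j < p; ++j) {
                c[i][j] += aik * b[k][j];
            }
        }
    }
}
```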

## Advanced work

Once you get the “aha” insight you should be able to make `fast_matrix_multiply` take no more than 0.1x the time of `base_matrix_multiply`, while producing statistics that are 0% off.

We will crown a winner.

If you have more time to work, complete the following.

- Find the *fastest* matrix multiplier you can.
- Find the *slowest* matrix multiplier you can.
- Explain *why* these multipliers are fast and slow.
- Find an *even faster* matrix multiplier in a library. We found one that runs in less than 0.01x the time of `base_matrix_multiply` (though the statistics are very slightly off). How is this library producing even faster results?
- If you run `./matrixmultiply -n N` several times with the same `N`, you should notice that the corner statistics all stay about the same, even though the input matrices are being initialized randomly, and differently, each time. Why?

## Survey

Please fill out the survey!

## Solution video