# Matrix multiplication

The `cs61-exercises/storage1x` directory contains a program called `matrixmultiply` (source in `matrixmultiply.c`). This program creates two random square matrices, multiplies them together, prints some information about their product, then exits.

Your job is to transform the `fast_matrix_multiply` function so that it runs as fast as possible, while producing almost exactly the same results as the original `base_matrix_multiply` for all arguments.

## Matrix multiplication background

A matrix is a 2D array of numbers. An mxn matrix has m rows and n columns; the value in row i and column j is written mij, or, in more C-like notation, m[ij] or m[i][j].

A square matrix has the same number of rows and columns: m = n.

The product of two matrices a and b, written a×b or a*b, is calculated by the following rules.

• a must have the same number of columns as b has rows.
• If a has dimensions mxn, and b has dimensions nxp, then a×b has dimensions mxp: it has a’s row count and b’s column count.
• Let c = a×b. Then we compute c[i,j] as:
cij = ∑0≤k<n aik * bkj.

Some pictures might help. Here’s how the dimensions work:

And here’s the values used to compute element cij of the destination matrix:

Matrixes in most languages are stored in row-major order. That means that we store the matrix in a single array, where the values for each row are grouped together. For instance, if `a` were a 4x4 square matrix, we’d store its elements in this order:

```| a[0,0] | a[0,1] | a[0,2] | a[0,3] | a[1,0] | a[1,1] | a[1,2] | a[1,3] | a[2,0] | a[2,1] | a[2,2] | a[2,3] | a[3,0] | a[3,1] | a[3,2] | a[3,3] |
```

Thus, element aij is stored in array index `a[i*NCOLS + j]`, where `NCOLS` is the number of columns (here, 4).

## Code

• The function `me` is used to access an element of a matrix.
• You should only modify function `fast_matrix_multiply`.

Once you get the “aha” insight you should be able to make `fast_matrix_multiply` take no more than 0.1x the time of `base_matrix_multiply`, while producing statistics that are 0% off.

We will crown a winner.

If you have more time to work, complete the following.

• Find the fastest matrix multiplier you can.
• Find the slowest matrix multiplier you can.
• Explain why these multipliers are fast and slow.
• Find an even faster matrix multiplier in a library. We found one that runs in less than 0.01x the time of `base_matrix_multiply` (though the statistics are very slightly off). How is this library producing even faster results?
• If you run `./matrixmultiply -n N` several times with the same `N`, you should notice that the corner statistics all stay about the same—even though the input matrices are being initialized randomly, and differently, each time. Why?