2016/Storage1X
Contents
Matrix multiplication
The cs61-exercises/storage1x
directory contains a program called matrixmultiply
(source in matrixmultiply.c
). This program creates two random square matrices, multiplies them together, prints some information about their product, then exits.
Your job is to transform the fast_matrix_multiply
function so that it runs as fast as possible, while producing almost exactly the same results as the original base_matrix_multiply
for all arguments.
Matrix multiplication background
A matrix is a 2D array of numbers. An mxn matrix has m rows and n columns; the value in row i and column j is written m_{ij}, or, in more C-like notation, m[i, j] or m[i][j].
A square matrix has the same number of rows and columns: m = n.
The product of two matrices a and b, written a×b or a*b, is calculated by the following rules.
- a must have the same number of columns as b has rows.
- If a has dimensions mxn, and b has dimensions nxp, then a×b has dimensions mxp: it has a’s row count and b’s column count.
- Let c = a×b. Then we compute c[i,j] as:
c_{ij} = ∑_{0≤k<n} a_{ik} * b_{kj}.
Some pictures might help. Here’s how the dimensions work:
And here’s the values used to compute element c_{ij} of the destination matrix:
Matrixes in most languages are stored in row-major order. That means that we store the matrix in a single array, where the values for each row are grouped together. For instance, if a
were a 4x4 square matrix, we’d store its elements in this order:
| a[0,0] | a[0,1] | a[0,2] | a[0,3] | a[1,0] | a[1,1] | a[1,2] | a[1,3] | a[2,0] | a[2,1] | a[2,2] | a[2,3] | a[3,0] | a[3,1] | a[3,2] | a[3,3] |
Thus, element a_{ij} is stored in array index a[i*NCOLS + j]
, where NCOLS
is the number of columns (here, 4).
Code
- The function
me
is used to access an element of a matrix. - You should only modify function
fast_matrix_multiply
. - Think about the cache!
Advanced work
Once you get the “aha” insight you should be able to make fast_matrix_multiply
take no more than 0.1x the time of base_matrix_multiply
, while producing statistics that are 0% off.
We will crown a winner.
If you have more time to work, complete the following.
- Find the fastest matrix multiplier you can.
- Find the slowest matrix multiplier you can.
- Explain why these multipliers are fast and slow.
- Find an even faster matrix multiplier in a library. We found one that runs in less than 0.01x the time of
base_matrix_multiply
(though the statistics are very slightly off). How is this library producing even faster results? - If you run
./matrixmultiply -n N
several times with the sameN
, you should notice that the corner statistics all stay about the same—even though the input matrices are being initialized randomly, and differently, each time. Why?
Survey
Please fill out the survey!
Solution video