
Final Review Solutions

Group 1

A. Here’s the sequence with accesses labeled as hit (h), cold miss (m), capacity miss (M), or conflict miss (µ):

 40m 44m 48m 4Cm 70m 74m 78m 7Cm

At this point the cache contains all 8 words seen so far.

 40h 44h 80m 84m 88m 8Cm 90m 94m 98m 9Cm 00m 04m 08m 0Cm

At this point we’ve seen a total of 20 distinct words, so 4 words must have been evicted. But which 4? Since this is LRU, we can determine the current contents of the cache by collecting the 16 most recently accessed words, going backwards from the current point. Here, these are 0C 08 04 00 9C 98 94 90 8C 88 84 80 44 40 7C 78; the evicted words are 48, 4C, 70, and 74. So the next accesses are:

 40h 44h 48M

Why M? We’ve seen 48 before, so it is not a cold miss; and since this cache is fully associative, it cannot be a conflict miss. That leaves a capacity miss.

 10m 14m 18m 1Cm 20m

That's a total of 4 hits, 25 cold misses, and 1 capacity miss. The miss rate is 26/30.
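
To double-check counts like these, it can help to replay the trace in code. Here’s a minimal sketch (not the course’s reference code) of a fully associative cache simulator; the trace is reconstructed from the labeled accesses above, and the capacity is 16 one-word lines:

    #include <algorithm>
    #include <cstdio>
    #include <deque>
    #include <vector>

    // Replays `trace` through a fully associative cache with `capacity`
    // lines. With `lru` true, a hit refreshes a line's recency; with it
    // false, replacement is FIFO. Returns the number of misses.
    static int count_misses(const std::vector<unsigned>& trace,
                            bool lru, size_t capacity) {
        std::deque<unsigned> cache;        // front = oldest, back = newest
        int misses = 0;
        for (unsigned a : trace) {
            auto it = std::find(cache.begin(), cache.end(), a);
            if (it != cache.end()) {       // hit
                if (lru) {
                    cache.erase(it);       // LRU: the hit line becomes newest
                    cache.push_back(a);
                }                          // FIFO: a hit changes nothing
            } else {                       // miss
                ++misses;
                if (cache.size() == capacity)
                    cache.pop_front();     // evict oldest (LRU or first-in)
                cache.push_back(a);
            }
        }
        return misses;
    }

    int main() {
        std::vector<unsigned> trace = {
            0x40, 0x44, 0x48, 0x4C, 0x70, 0x74, 0x78, 0x7C, 0x40, 0x44,
            0x80, 0x84, 0x88, 0x8C, 0x90, 0x94, 0x98, 0x9C, 0x00, 0x04,
            0x08, 0x0C, 0x40, 0x44, 0x48, 0x10, 0x14, 0x18, 0x1C, 0x20
        };
        printf("LRU misses: %d\n", count_misses(trace, true, 16));  // 26
    }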

B. A similar analysis gives

 40m 44m 48m 4Cm 70m 74m 78m 7Cm 40h 44h 80m 84m 88m 8Cm 90m 94m 98m 9Cm 00m 04m 08m 0Cm

But what does the cache contain now? With FIFO, eviction follows insertion order, so the four oldest words (40 44 48 4C) are gone and the cache holds 70 74 78 7C 80 84 88 8C 90 94 98 9C 00 04 08 0C.

 40M 44M 48M 10m 14m 18m 1Cm 20m

2 hits, 25 cold misses, 3 capacity misses. 28/30 miss rate.
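
The sketch above confirms this, too: since a FIFO hit leaves the queue untouched, adding this line to main() reports 28 misses.

    printf("FIFO misses: %d\n", count_misses(trace, false, 16));  // 28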

C. Batching! Increase the cache line size. For instance, with cache line size 16 bytes (= 4 words) and LRU, we’d see

 40m 44h 48h 4Ch 70m 74h 78h 7Ch 40h 44h 80m 84h 88h 8Ch 90m 94h 98h 9Ch 00m 04h 08h 0Ch 40h 44h 48h 10m 14h 18h 1Ch 20m

23 hits, 7 cold misses, 0 capacity misses, 7/30 miss rate.
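
In the sketch above, a larger line size just means simulating line tags instead of word addresses. Assuming the total cache size stays at 64 bytes (so 4 lines of 16 bytes), dropping the low 4 bits of each address and shrinking the capacity reproduces the count:

    std::vector<unsigned> tags;
    for (unsigned a : trace)
        tags.push_back(a >> 4);   // a 16-byte line is identified by its tag
    printf("line misses: %d\n", count_misses(tags, true, 4));  // 7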

D. Primary memory is a cache for the disk in that it contains the buffer cache.

The buffer cache is (obviously) a cache for the disk.

The disk is not typically a cache, but it can act as a cache—for instance, for computers located elsewhere on the network: your browser probably stores some local copies of remote web pages on disk.

General-purpose registers often act as a cache for primary memory.

E. Strided access patterns.

F. True, true, true, false.

Group 2

A. transfer2, transfer4, and transfer5 are correctly synchronized.

transfer3 will never result in an incorrect balance, but it can cause deadlock.

B. transfer1. Imagine two concurrent calls, transfer1(1, 2, 10) and transfer1(1, 3, 10). Both calls modify accounts[1].amount without synchronization; that data race causes undefined behavior.
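
The transfer functions themselves aren’t reproduced here, but the bug in part B needs nothing more than an unprotected read-modify-write. A hypothetical transfer1-style sketch (the account type and field names are assumptions):

    #include <mutex>

    struct account {
        unsigned long amount;
        std::mutex lock;
    };

    account accounts[100];

    // Hypothetical transfer1-style code: no locking at all. `amount -= value`
    // is a read-modify-write, not an atomic step, so two concurrent calls
    // touching the same account form a data race -- undefined behavior in C++.
    void transfer1_style(int from, int to, unsigned long value) {
        accounts[from].amount -= value;   // racy
        accounts[to].amount += value;     // racy
    }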

C. transfer3. Consider concurrent calls transfer3(1, 2, 10) and transfer3(2, 1, 10) and you will see the deadlock.
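
The deadlock in part C is the classic lock-ordering cycle. Continuing the hypothetical sketch above:

    // Hypothetical transfer3-style code: lock `from`, then `to`. With
    // transfer3(1, 2, 10) holding accounts[1].lock and waiting for
    // accounts[2].lock, while transfer3(2, 1, 10) holds accounts[2].lock
    // and waits for accounts[1].lock, neither call can ever proceed.
    void transfer3_style(int from, int to, unsigned long value) {
        accounts[from].lock.lock();
        accounts[to].lock.lock();         // possible deadlock here
        accounts[from].amount -= value;
        accounts[to].amount += value;
        accounts[to].lock.unlock();
        accounts[from].lock.unlock();
    }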

D. transfer4 and transfer5 have higher utilization because they use fine-grained locking, so transfers involving joint accounts can proceed in parallel. We doubt there’s any significant difference between their utilizations.

E. transfer2: the coarse-grained lock means that independent transfers cannot proceed in parallel.
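
One standard way to get transfer4/transfer5-style behavior, fine-grained locking without transfer3’s deadlock, is to acquire the two account locks in a fixed global order. This is a sketch of the technique, not necessarily what the handout’s code does:

    #include <algorithm>

    // Fine-grained and deadlock-free: every thread locks the lower-numbered
    // account first, so no cycle of waiting threads can form. Transfers on
    // disjoint account pairs still run in parallel. (Assumes from != to.)
    void transfer_ordered(int from, int to, unsigned long value) {
        std::lock_guard<std::mutex> g1(accounts[std::min(from, to)].lock);
        std::lock_guard<std::mutex> g2(accounts[std::max(from, to)].lock);
        accounts[from].amount -= value;
        accounts[to].amount += value;
    }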

Group 3

This is a real think problem. You should think about it some more.

Group 4

Assume that bar has 4 elements and start address 0x1000. We can write out the address of every element in bar as follows:

bar[0].x[0] 0x1000
bar[0].x[1] 0x1004
bar[0].x[2] 0x1008
bar[0].y[0] 0x100C
bar[0].y[1] 0x1010
bar[0].y[2] 0x1014
bar[1].x[0] 0x1018
bar[1].x[1] 0x101C
bar[1].x[2] 0x1020
bar[1].y[0] 0x1024
bar[1].y[1] 0x1028
bar[1].y[2] 0x102C
bar[2].x[0] 0x1030
...
bar[2].y[2] 0x1044
...
bar[3].y[2] 0x105C
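
This table is consistent with an element type along the following lines (an assumed reconstruction; the problem’s actual declaration isn’t shown here):

    #include <cstddef>

    struct foo {
        int x[3];   // offsets 0x00, 0x04, 0x08 within each element
        int y[3];   // offsets 0x0C, 0x10, 0x14 within each element
    };              // sizeof(foo) == 0x18, so bar[1] starts at 0x1018

    static_assert(sizeof(foo) == 0x18, "each element spans 24 bytes");
    static_assert(offsetof(foo, y) == 0xC, "y starts 12 bytes in");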

snippet2() is better than snippet1() because it accesses memory more sequentially, which gives better spatial locality and is friendlier to the processor cache. snippet1()’s access order for the array above is 0x1000, 0x100C, 0x1018, 0x1024, ...; snippet2()’s access order is 0x1000, 0x100C, 0x1004, 0x1010, 0x1008, 0x1014, 0x1018, ....

The same logic applies to snippet3(), whose access order is 0x1000, 0x1004, 0x1008, 0x100C, ...: purely sequential.
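
The snippets aren’t reproduced here, but loop nests producing those access orders would look roughly like this (a plausible reconstruction, with use() standing in for whatever work the snippets do; assume bar is a std::vector<foo>):

    // snippet1-style: j outer, i inner. Each inner step jumps a whole
    // 0x18-byte element: 0x1000, 0x100C, 0x1018, 0x1024, ...
    for (int j = 0; j < 3; ++j)
        for (size_t i = 0; i < bar.size(); ++i)
            use(bar[i].x[j], bar[i].y[j]);

    // snippet2-style: i outer, j inner. All accesses stay inside one
    // element before moving on: 0x1000, 0x100C, 0x1004, 0x1010, ...
    for (size_t i = 0; i < bar.size(); ++i)
        for (int j = 0; j < 3; ++j)
            use(bar[i].x[j], bar[i].y[j]);

    // snippet3-style: finish x before starting y, giving purely
    // sequential addresses: 0x1000, 0x1004, 0x1008, 0x100C, ...
    for (size_t i = 0; i < bar.size(); ++i) {
        for (int j = 0; j < 3; ++j) use(bar[i].x[j]);
        for (int j = 0; j < 3; ++j) use(bar[i].y[j]);
    }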

snippet4() may be faster than snippet3() because it evaluates the bar.size() function once, rather than bar.size() times. If that function call has significant overhead, then it’s useful to avoid the repeated overhead. In practice the C++ compiler may be able to do this automatically, depending on whether it can determine that doSomething() leaves bar.size() unchanged.
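
Concretely, the difference is between loop shapes like these (again a hedged reconstruction; doSomething() is named in the problem but its body is not shown):

    // snippet3-style: bar.size() is re-evaluated on every iteration. If the
    // compiler cannot prove doSomething() leaves the size unchanged, it
    // must keep the call inside the loop.
    for (size_t i = 0; i < bar.size(); ++i)
        doSomething(bar[i]);

    // snippet4-style: evaluate the size once, outside the loop.
    size_t n = bar.size();
    for (size_t i = 0; i < n; ++i)
        doSomething(bar[i]);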