Final Review Solutions
Group 1
A. Here’s the sequence with accesses labeled as hit (h), cold miss (m), capacity miss (M), or conflict miss (µ):
40m 44m 48m 4Cm 70m 74m 78m 7Cm
At this point the cache contains all 8 words seen so far.
40h 44h 80m 84m 88m 8Cm 90m 94m 98m 9Cm 00m 04m 08m 0Cm
At this point we’ve seen a total of 20 words, so 4 words must have been evicted. But which 4 words? Since this is LRU, we can determine the current contents of the cache by collecting the 16 most recently accessed lines, going backwards from the current point. Here, these are 0C 08 04 00 9C 98 94 90 8C 88 84 80 44 40 7C 78. So the next accesses are:
40h 44h 48M
Why M? Because we’ve seen 48 before, it is not a cold miss. Since this cache is fully associative, it is not a conflict miss.
10m 14m 18m 1Cm 20m
That's a total of 4 hits, 25 cold misses, and 1 capacity miss. The miss rate is 26/30.
B. A similar analysis gives
40m 44m 48m 4Cm 70m 74m 78m 7Cm 40h 44h 80m 84m 88m 8Cm 90m 94m 98m 9Cm 00m 04m 08m 0Cm
But what does the cache contain now? In FIFO, it has 70 74 78 7C 80 84 88 8C 90 94 98 9C 00 04 08 0C.
40M 44M 48M 10m 14m 18m 1Cm 20m
2 hits, 25 cold misses, 3 capacity misses. 28/30 miss rate.
C. Batching! Increase the cache line size. For instance, with cache line size 16 bytes (= 4 words) and LRU, we’d see
40m 44h 48h 4Ch 70m 74h 78h 7Ch 40h 44h 80m 84h 88h 8Ch 90m 94h 98h 9Ch 00m 04h 08h 0Ch 40h 44h 48h 10m 14h 18h 1Ch 20m
23 hits, 7 cold misses, 0 capacity misses, 7/30 miss rate.
D. Primary memory is a cache for the disk in that it contains the buffer cache.
The buffer cache is (obviously) a cache, for the disk.
The disk is not typically a cache, but it can act as a cache—for instance, for computers located elsewhere on the network: your browser probably stores some local copies of remote web pages on disk.
General-purpose registers are often a cache for primary memory.
E. Strided access patterns.
F. True, true, true, false.
Group 2
A. transfer2
, transfer4
, and transfer5
are correctly
synchronized.
transfer3
will never result in an incorrect balance, but it can cause
deadlock.
B. transfer1
. Imagine two concurrent calls, transfer1(1, 2, 10)
and transfer1(1, 3, 10)
. These calls could simultaneously modify
accounts[1].amount
, which will cause undefined behavior.
C. transfer3
. Consider concurrent calls transfer3(1, 2, 10)
and
transfer3(2, 1, 10)
and you will see the deadlock.
D. transfer4
and transfer5
have higher utilization because they
use fine-grained locking, so transfers involving joint accounts can
proceed in parallel. We doubt there’s any significant difference between
their utilizations.
E. transfer2
: the coarse-grained lock means that independent
transfers cannot proceed in parallel.
Group 3
This is a real think problem. You should think about it some more.
Group 4
Assume that bar
has 4 elements and has start address 0x1000. We can
write the addresses of every element in bar
, as follows:
bar[0].x[0] | 0x1000 |
bar[0].x[1] | 0x1004 |
bar[0].x[2] | 0x1008 |
bar[0].y[0] | 0x100C |
bar[0].y[1] | 0x1010 |
bar[0].y[2] | 0x1014 |
bar[1].x[0] | 0x1018 |
bar[1].x[1] | 0x101C |
bar[1].x[2] | 0x1020 |
bar[1].y[0] | 0x1024 |
bar[1].y[1] | 0x1028 |
bar[1].y[2] | 0x102C |
bar[2].x[0] | 0x1030 |
... | |
bar[2].y[2] | 0x1044 |
... | |
bar[3].y[2] | 0x105C |
snippet2()
is better than snippet1()
because it accesses memory in a
more sequential manner, which has better spatial locality and is
friendlier to the processor cache. snippet1()
’s access order for the
array above is 0x1000, 0x100C, 0x1018, 0x1024, .... snippet2
’s access
order is 0x1000, 0x100C, 0x1004, 0x1010, 0x1008, 0x1014, 0x1018, ....
The same logic applies to snippet3()
, whose access order is 0x1000,
0x1004, 0x1008, 0x100C, ....: pure sequential.
snippet4()
may be faster than snippet3()
because it evaluates the
bar.size()
function once, rather than bar.size()
times. If that
function call has significant overhead, then it’s useful to avoid the
repeated overhead. In practice the C++ compiler may be able to do this
automatically, depending on whether it can determine that
doSomething()
leaves bar.size()
unchanged.