Storage 3: Eviction policies and coherence

Overview

In this lecture, we talk about some metaphors for caching, present some eviction policies and anomalies, and discuss different types of file.

Full lecture notes on storage — Textbook readings

What is a cache?

A small amount of fast storage used to speed up access to slower storage

Food
- Stomach → fridge → supermarket → farm
Books
- Desk → library → book depository → manuscript
Brain
- Memories → experiences

What makes caches different in computer systems?

Ease of duplication
- Bytes are cheap to copy
- Copies indistinguishable from originals
So a systems cache (fast storage) often contains a copy of data from slower storage
Orders of magnitude
- Hard disk may be 10,000,000x slower to access than a register
- Snails are < 1000x slower than the fastest human ever

Caches in our units so far

Processor caches (unit 1)
- Fast cache memory contains copies of primary memory
Translation lookaside buffer (TLB) (unit 3)
- Fast memory contains results of page table lookups
Buffer cache (unit 4)
- Primary memory contains copies of drive storage
Stdio cache (unit 4)
- Application memory contains copies of buffer cache
These caches build on one another!
Each uses similar strategies: batching, prefetching, write coalescing, parallelism

Types of file

The operating system kernel uses file descriptors, and the read and write system calls, to standardize access to many types of storage
Most storage fits into one of two categories
Random access
- Finite length
- Seekable (can read from a specific position)
- Often mappable (can use the mmap system call)
- Example: drive storage; finite files on the file system: your desktop, your documents
Streaming
- Unknown, possibly-infinite length
- Not seekable (can only read from current position)
- Not mappable
- Examples: keyboard, network, output of another program
Cache concepts apply to both!
- Example: pipe buffer from WeensyOS pipe supports batching and write coalescing for streaming communication

Coherence and invalidation

Caches aim to copy slower storage, but they can get out of sync
- Cache might contain older data than slower storage
- Cache might contain newer data than slower storage
A cache is coherent when access via cache cannot be distinguished from direct access
- Processor cache is coherent
- TLB cache is not coherent
- Buffer cache is coherent
- Stdio cache is not coherent
- Browser cache is not coherent
Coherent caches generally mediate all access
- Processor cache: Processor ensures all memory accesses use cache
- Buffer cache: Kernel ensures all file accesses use cache

Cache slots and access patterns

A cache slot is a region of the cache that can cache a contiguous range of data from underlying storage
- Example for stdio cache: A buffer caching a range of file data
- Example for processor cache: A block of cache memory caching a range of physical memory
An access pattern is a sequence of accesses to underlying storage
- Example for stdio cache: read 3 bytes; read 1 byte; seek to offset 12; …
- Example for processor cache: read 1 byte at address A; read 8 bytes at address B; …

Hits and misses

A hit is an access that can be satisfied without contacting underlying storage
A miss is an access that must contact underlying storage
- A read miss requires filling the cache from underlying storage
- A write miss may require flushing data from the cache to underlying storage
The hit rate is the percentage of accesses that are hits
- Bigger hit rates are better

How many slots should a cache have?

One (single-slot cache) or more than one (multi-slot cache)?
What access patterns work well on single-slot caches?
- Repeated access
- Sequential access
- Reverse-sequential-access?
What access patterns work poorly on single-slot caches?
- Random access
- Strided access
Which of the specific caches we’ve discussed should use multiple slots?

Multi-slot cache abstraction

Many aspects of multi-slot cache design are independent of addresses and the type of underlying data
Abstract these away into a reference string
- A list of addresses, each representing one access
Multi-slot cache diagram: reference string on top, each slot is a row
- Column gives a snapshot of the cache’s contents after that access

Refstr→	1	2	3	4	1	2	5	1	2	3	4	5
Slot A	①	1	1	④	4	4	⑤	5	5	5	5	5
B		②	2	2	①	1	1	1	1	③	3	3
C			③	3	3	②	2	2	2	2	④	4
Time→	1	2	3	4	5	6	7	8	9	10	11	12

FIFO, 3 slots, hit rate 3/12

Associativity

Some multi-slot caches restrict how slots are assigned
Direct-mapped cache
- An address A can fit in exactly one slot S(A)
Fully associative cache
- Any address can fit in any slot
- Software must choose a slot
Set-associative cache
- An address A can fit in one of a subset of slots SS(A)

Eviction

Freeing a slot for reuse is called eviction
An eviction policy chooses which slot to evict
- For instance, if new data needs to be cached
- Direct-mapped caches have a trivial eviction policy
A good eviction policy gets rid of data that isn’t needed in the near future

Eviction and writes

A write cache may have dirty slots
- A dirty slot contains data newer than underlying storage
To evict a dirty slot, it must be flushed first
- Write modified data to underlying storage

Eviction policies for fully-associative caches

When a slot is needed, evict the slot that:

FIFO (First In, First Out)
- …was least recently filled (or assigned)
Are there others?

Eviction policies for fully-associative caches

When a slot is needed, evict the slot that:

FIFO (First In, First Out)
- …was least recently filled (or assigned)
LIFO (Last In, First Out)
- …was most recently filled
LRU (Least Recently Used)
- …was least recently used
MRU (Most Recently Used)
- …was most recently used
LFU (Least Frequently Used)
- …was used least often
Priority, aging, …

Question

Can you come up with reference strings that work well for each of those policies?

The optimal eviction algorithm

Bélády’s Optimal Algorithm
- When a slot is needed, evict the slot that:
- …contains the data that will be used farthest in the future

Example

	1	2	3	4
A	①	1	1	?
B		②	2	?
C			③	?

Optimal

	1	2	3	4	1	2	5	1	2	3	4	5
A	①	1	1	1	1	1	1	1	1	③	3	3
B		②	2	2	2	2	2	2	2	2	④	4
C			③	④	4	4	⑤	5	5	5	5	5

Optimal, 3 slots, hit rate 5/12

When can you predict the future?

More slots means better hit rate!

	1	2	3	4	1	2	5	1	2	3	4	5
A	①	1	1	④	4	4	⑤	5	5	5	5	5
B		②	2	2	①	1	1	1	1	③	3	3
C			③	3	3	②	2	2	2	2	④	4

FIFO, 3 slots, hit rate 3/12

	1	2	3	4
A	①	1	1	1
B		②	2	2
C			③	3
D				④

FIFO, 4 slots, hit rate ?

	1	2	3	4	1	2	5	1	2	3	4	5
A	①	1	1	1	1	1	⑤	5	5	5	④	4
B		②	2	2	2	2	2	①	1	1	1	⑤
C			③	3	3	3	3	3	②	2	2	2
D				④	4	4	4	4	4	③	3	3

FIFO, 4 slots, hit rate 2/12

Well, maybe not.
This is called Bélády’s anomaly

More slots means better hit rate for LRU

	1	2	3	4	1	2	5	1	2	3	4	5
A	①	1	1	④	4	4	⑤	5	5	③	3	3
B		②	2	2	①	1	1	1	1	1	④	4
C			③	3	3	②	2	2	2	2	2	⑤

LRU, 3 slots, hit rate 2/12

	1	2	3	4	1	2	5	1	2	3	4	5
A	①	1	1	1	1	1	1	1	1	1	1	⑤
B		②	2	2	2	2	2	2	2	2	2	2
C			③	3	3	3	⑤	5	5	5	④	4
D				④	4	4	4	4	4	③	3	3

LRU, 4 slots, hit rate 4/12

`coherence.cc`

Forking, piping, and stdio

Special files

/dev/null: Contains nothing
- Any read returns end of file
- Any write succeeds
/dev/urandom: Contains random data
- Any read returns random bytes
/dev/zero: Contains zero bytes
- Any read returns all zeros

Streams

program1 | program2
Network connections

Write semantics

Always safe to read more data than requested
Not necessarily safe to write more data than requested!
- Safe with coherent caches, not safe with incoherent caches
- ./w-stdiobyterev vs. ./r-stdiobyterev

Memory mapping

mmap system call
r-mmapbyte, w-mmapbyte

	1	2	3	4	1	2	5	1	2	3	4	5
A	①	1	1	1	1	1	⑤	5	5	5	④	4
B		②	2	2	2	2	2	①	1	1	1	⑤
C			③	3	3	3	3	3	②	2	2	2
D				④	4	4	4	4	4	③	3	3

	1	2	3	4	1	2	5	1	2	3	4	5
A	①	1	1	1	1	1	1	1	1	1	1	⑤
B		②	2	2	2	2	2	2	2	2	2	2
C			③	3	3	3	⑤	5	5	5	④	4
D				④	4	4	4	4	4	③	3	3

	1	2	3	4	1	2	5	1	2	3	4	5
A	①	1	1	1	1	1	⑤	5	5	5	④	4
B		②	2	2	2	2	2	①	1	1	1	⑤
C			③	3	3	3	3	3	②	2	2	2
D				④	4	4	4	4	4	③	3	3

	1	2	3	4	1	2	5	1	2	3	4	5
A	①	1	1	1	1	1	1	1	1	1	1	⑤
B		②	2	2	2	2	2	2	2	2	2	2
C			③	3	3	3	⑤	5	5	5	④	4
D				④	4	4	4	4	4	③	3	3

Storage 3: Eviction policies and coherence

Overview

What is a cache?

Related phenomena in life

What makes caches different in computer systems?

Caches in our units so far

Types of file

Coherence and invalidation

Cache slots and access patterns

Hits and misses

How many slots should a cache have?

Multi-slot cache abstraction

Associativity

Eviction

Eviction and writes

Eviction policies for fully-associative caches

Eviction policies for fully-associative caches

Question

The optimal eviction algorithm

Example

When can you predict the future?

More slots means better hit rate!

More slots means better hit rate for LRU

coherence.cc

Forking, piping, and stdio

Special files

Streams

Write semantics

Memory mapping

`coherence.cc`

	1	2	3	4	1	2	5	1	2	3	4	5
A	①	1	1	1	1	1	⑤	5	5	5	④	4
B		②	2	2	2	2	2	①	1	1	1	⑤
C			③	3	3	3	3	3	②	2	2	2
D				④	4	4	4	4	4	③	3	3

	1	2	3	4	1	2	5	1	2	3	4	5
A	①	1	1	1	1	1	1	1	1	1	1	⑤
B		②	2	2	2	2	2	2	2	2	2	2
C			③	3	3	3	⑤	5	5	5	④	4
D				④	4	4	4	4	4	③	3	3