Overview
In this lecture, we talk about some metaphors for caching, present some
eviction policies and anomalies, and discuss different types of file.
Full lecture notes on storage — Textbook readings
What is a cache?
- A small amount of fast storage used to speed up access to slower storage
- Food
- Stomach → fridge → supermarket → farm
- Books
- Desk → library → book depository → manuscript
- Brain
What makes caches different in computer systems?
- Ease of duplication
- Bytes are cheap to copy
- Copies indistinguishable from originals
- So a systems cache (fast storage) often contains a copy of data from slower storage
- Orders of magnitude
- Hard disk may be 10,000,000x slower to access than a register
- Snails are < 1000x slower than the fastest human ever
Caches in our units so far
- Processor caches (unit 1)
- Fast cache memory contains copies of primary memory
- Translation lookaside buffer (TLB) (unit 3)
- Fast memory contains results of page table lookups
- Buffer cache (unit 4)
- Primary memory contains copies of drive storage
- Stdio cache (unit 4)
- Application memory contains copies of buffer cache
- These caches build on one another!
- Each uses similar strategies: batching, prefetching, write coalescing,
parallelism
Types of file
- The operating system kernel uses file descriptors, and the
read and
write system calls, to standardize access to many types of storage
- Most storage fits into one of two categories
- Random access
- Finite length
- Seekable (can read from a specific position)
- Often mappable (can use the
mmap system call)
- Example: drive storage; finite files on the file system: your desktop,
your documents
- Streaming
- Unknown, possibly-infinite length
- Not seekable (can only read from current position)
- Not mappable
- Examples: keyboard, network, output of another program
- Cache concepts apply to both!
- Example: pipe buffer from WeensyOS pipe supports batching and write
coalescing for streaming communication
Coherence and invalidation
- Caches aim to copy slower storage, but they can get out of sync
- Cache might contain older data than slower storage
- Cache might contain newer data than slower storage
- A cache is coherent when access via cache cannot be distinguished from
direct access
- Processor cache is coherent
- TLB cache is not coherent
- Buffer cache is coherent
- Stdio cache is not coherent
- Browser cache is not coherent
- Coherent caches generally mediate all access
- Processor cache: Processor ensures all memory accesses use cache
- Buffer cache: Kernel ensures all file accesses use cache
Cache slots and access patterns
- A cache slot is a region of the cache that can cache a contiguous
range of data from underlying storage
- Example for stdio cache: A buffer caching a range of file data
- Example for processor cache: A block of cache memory caching a range of physical memory
- An access pattern is a sequence of accesses to underlying storage
- Example for stdio cache: read 3 bytes; read 1 byte; seek to offset 12; …
- Example for processor cache: read 1 byte at address A; read 8 bytes at address B; …
Hits and misses
- A hit is an access that can be satisfied without contacting underlying storage
- A miss is an access that must contact underlying storage
- A read miss requires filling the cache from underlying storage
- A write miss may require flushing data from the cache to underlying storage
- The hit rate is the percentage of accesses that are hits
- Bigger hit rates are better
How many slots should a cache have?
- One (single-slot cache) or more than one (multi-slot cache)?
- What access patterns work well on single-slot caches?
- Repeated access
- Sequential access
- Reverse-sequential-access?
- What access patterns work poorly on single-slot caches?
- Random access
- Strided access
- Which of the specific caches we’ve discussed should use multiple slots?
Multi-slot cache abstraction
- Many aspects of multi-slot cache design are independent of addresses and the
type of underlying data
- Abstract these away into a reference string
- A list of addresses, each representing one access
- Multi-slot cache diagram: reference string on top, each slot is a row
- Column gives a snapshot of the cache’s contents after that access
FIFO, 3 slots, hit rate 3/12
Associativity
- Some multi-slot caches restrict how slots are assigned
- Direct-mapped cache
- An address A can fit in exactly one slot S(A)
- Fully associative cache
- Any address can fit in any slot
- Software must choose a slot
- Set-associative cache
- An address A can fit in one of a subset of slots SS(A)
Eviction
- Freeing a slot for reuse is called eviction
- An eviction policy chooses which slot to evict
- For instance, if new data needs to be cached
- Direct-mapped caches have a trivial eviction policy
- A good eviction policy gets rid of data that isn’t needed in the near future
Eviction and writes
- A write cache may have dirty slots
- A dirty slot contains data newer than underlying storage
- To evict a dirty slot, it must be flushed first
- Write modified data to underlying storage
Eviction policies for fully-associative caches
When a slot is needed, evict the slot that:
- FIFO (First In, First Out)
- …was least recently filled (or assigned)
- Are there others?
Eviction policies for fully-associative caches
When a slot is needed, evict the slot that:
- FIFO (First In, First Out)
- …was least recently filled (or assigned)
- LIFO (Last In, First Out)
- …was most recently filled
- LRU (Least Recently Used)
- MRU (Most Recently Used)
- LFU (Least Frequently Used)
- Priority, aging, …
Question
- Can you come up with reference strings that work well for each of those
policies?
The optimal eviction algorithm
- Bélády’s Optimal Algorithm
- When a slot is needed, evict the slot that:
- …contains the data that will be used farthest in the future
Example
|
1 |
2 |
3 |
4 |
1 |
2 |
5 |
1 |
2 |
3 |
4 |
5 |
| A |
① |
~1~ |
~1~ |
? |
|
|
|
|
|
|
|
|
| B |
|
② |
~2~ |
? |
|
|
|
|
|
|
|
|
| C |
|
|
③ |
? |
|
|
|
|
|
|
|
|
Optimal
|
1 |
2 |
3 |
4 |
1 |
2 |
5 |
1 |
2 |
3 |
4 |
5 |
| A |
① |
~1~ |
~1~ |
~1~ |
1 |
~1~ |
~1~ |
1 |
~1~ |
③ |
~3~ |
~3~ |
| B |
|
② |
~2~ |
~2~ |
~2~ |
2 |
~2~ |
~2~ |
2 |
~2~ |
④ |
~4~ |
| C |
|
|
③ |
④ |
~4~ |
~4~ |
⑤ |
~5~ |
~5~ |
~5~ |
~5~ |
5 |
Optimal, 3 slots, hit rate 5/12
When can you predict the future?
More slots means better hit rate!
|
1 |
2 |
3 |
4 |
1 |
2 |
5 |
1 |
2 |
3 |
4 |
5 |
| A |
① |
~1~ |
~1~ |
④ |
~4~ |
~4~ |
⑤ |
~5~ |
~5~ |
~5~ |
~5~ |
5 |
| B |
|
② |
~2~ |
~2~ |
① |
~1~ |
~1~ |
1 |
~1~ |
③ |
~3~ |
~3~ |
| C |
|
|
③ |
~3~ |
~3~ |
② |
~2~ |
~2~ |
2 |
~2~ |
④ |
~4~ |
FIFO, 3 slots, hit rate 3/12
|
1 |
2 |
3 |
4 |
1 |
2 |
5 |
1 |
2 |
3 |
4 |
5 |
| A |
① |
~1~ |
~1~ |
~1~ |
|
|
|
|
|
|
|
|
| B |
|
② |
~2~ |
~2~ |
|
|
|
|
|
|
|
|
| C |
|
|
③ |
~3~ |
|
|
|
|
|
|
|
|
| D |
|
|
|
④ |
|
|
|
|
|
|
|
|
FIFO, 4 slots, hit rate ?
|
1 |
2 |
3 |
4 |
1 |
2 |
5 |
1 |
2 |
3 |
4 |
5 |
| A |
① |
~1~ |
~1~ |
~1~ |
1 |
~1~ |
⑤ |
~5~ |
~5~ |
~5~ |
④ |
~4~ |
| B |
|
② |
~2~ |
~2~ |
~2~ |
2 |
~2~ |
① |
~1~ |
~1~ |
~1~ |
⑤ |
| C |
|
|
③ |
~3~ |
~3~ |
~3~ |
~3~ |
~3~ |
② |
~2~ |
~2~ |
~2~ |
| D |
|
|
|
④ |
~4~ |
~4~ |
~4~ |
~4~ |
~4~ |
③ |
~3~ |
~3~ |
FIFO, 4 slots, hit rate 2/12
- Well, maybe not.
- This is called Bélády’s anomaly
More slots means better hit rate for LRU
|
1 |
2 |
3 |
4 |
1 |
2 |
5 |
1 |
2 |
3 |
4 |
5 |
| A |
① |
~1~ |
~1~ |
④ |
~4~ |
~4~ |
⑤ |
~5~ |
~5~ |
③ |
~3~ |
~3~ |
| B |
|
② |
~2~ |
~2~ |
① |
~1~ |
~1~ |
1 |
~1~ |
~1~ |
④ |
~4~ |
| C |
|
|
③ |
~3~ |
~3~ |
② |
~2~ |
~2~ |
2 |
~2~ |
~2~ |
⑤ |
LRU, 3 slots, hit rate 2/12
|
1 |
2 |
3 |
4 |
1 |
2 |
5 |
1 |
2 |
3 |
4 |
5 |
| A |
① |
~1~ |
~1~ |
~1~ |
1 |
~1~ |
~1~ |
1 |
~1~ |
~1~ |
~1~ |
⑤ |
| B |
|
② |
~2~ |
~2~ |
~2~ |
2 |
~2~ |
~2~ |
2 |
~2~ |
~2~ |
~2~ |
| C |
|
|
③ |
~3~ |
~3~ |
~3~ |
⑤ |
~5~ |
~5~ |
~5~ |
④ |
~4~ |
| D |
|
|
|
④ |
~4~ |
~4~ |
~4~ |
~4~ |
~4~ |
③ |
~3~ |
~3~ |
LRU, 4 slots, hit rate 4/12
coherence.cc
Forking, piping, and stdio
Special files
/dev/null: Contains nothing
- Any read returns end of file
- Any write succeeds
/dev/urandom: Contains random data
- Any read returns random bytes
/dev/zero: Contains zero bytes
- Any read returns all zeros
Streams
program1 | program2
- Network connections
Write semantics
- Always safe to read more data than requested
- Not necessarily safe to write more data than requested!
- Safe with coherent caches, not safe with incoherent caches
./w-stdiobyterev vs. ./r-stdiobyterev
Memory mapping
mmap system call
r-mmapbyte, w-mmapbyte