Systems Programming and Machine Organization
We think a lot in computer science about costs: time cost, space cost, memory efficiency. And these costs fundamentally shape the kinds of solutions we build. But you may not think as often about financial costs! These also fundamentally shape the systems we build—in some ways more fundamentally—and the costs of storage technologies have changed at least as dramatically as their capacities and speeds.
Costs per megabyte of storage technologies (constant 2010 dollars)
Cost per byte relative to that of a hard disk in ~2017:
These are initial purchase costs & don’t include maintenance and power. DRAM uses more energy than hard disks, which use more than flash/SSD. (Prices from here and here. $1.00 in 1955 is $8.14 in 2010 dollars.)
Latency and throughput
Latency means the time it takes to complete an request.
Throughput means the number of requests that can be completed per unit of time.
When we’re talking about storage, latency is typically measured in seconds and throughput in (mega)bytes per second, or requests per second.
Latency and throughput clearly have a relationship, but they are not simple inverses: a system can have high latency and high throughput. For example, consider a disk that can handle 1 request at a time, and takes 1 second to handle a request. This disk has latency 1 sec and throughput 1 B/sec. But we can improve its throughput without changing its latency in several ways. For instance:
- We can let a request handle up to 5000 bytes of data. Latency doesn’t change, but throughput is now 5000 B/sec. This strategy—allowing a single request to do multiple requests’ worth of work—is called batching.
- We can let the disk handle up to 5000 simultaneous requests. This has the same non-effect on latency, and the same effect on throughput. This strategy—allowing multiple requests to happen at the same time—is called parallelism.
The storage hierarchy
This is the list of storage media typically available to modern computers. Sizes vary; these sizes are based on those for my desktop iMac. Storage closer to the top of the list is faster, smaller, and more expensive; storage closer to the bottom is larger, slower, and less expensive.
- Registers: ~1KB. An x86-64 has tens of registers, with ~1KB of aggregate storage (each register is of course smaller).
- Processor caches: ~9MB. This uses SRAM technology, which is faster and more expensive than primary memory. It’s typically divided into levels; for instance:
- Level 1 cache (128KB total, 32KB/core) is divided into separate units, one per core. This restriction makes access faster, since there's no need to arbitrate between cores.
- Level 2 cache (512KB total, 256KB/socket) is shared across pairs of cores.
- Level 3 cache (8MB) is shared all cores.
- Primary memory: 8GB. This is the main storage location of data for running programs. It uses DRAM technology.
- Solid-state drives (SSDs), also called flash drives (since they use flash memory): ~512GB.
- Hard disk drives (HDDs): ~2TB or more.
And then there are yet lower levels in distributed systems. For instance, Google and Facebook have petabytes or exabytes of data, distributed among many computers. You can think of this as another storage layer, networked storage, that is typically slower to access than HDDs.
The two lowest levels of this hierarchy, SSDs and hard disks, are called stable storage, because they keep their data when the power goes out. The upper levels of the hierarchy (primary memory, processor caches, and registers) do not preserve data across power outages, and are thus called volatile. Watch the Mona Lisa disappear from memory after the power is cut!
Rough speeds of storage technologies
A typical recent disk might rotate at 7200 RPM, have ~9ms average seek time, ~4ms average rotational latency, and 210MB/s maximum sustained transfer rate. (CS:APP3e)
To load 4 pages (that is, 4x4KB = 16KB) from disk takes ~8ms.
In general terms, a cache is a small amount of fast storage, used to optimize access to slower storage.
We see many caches in the course. We’ve already discussed the processor cache. This unit focuses on the buffer cache, which is the operating system’s cache in primary memory of data from disks, and the stdio cache, which is a process’s cache in its primary memory of data from the buffer cache.
The strace tool lets us look at a program’s system calls, including their arguments and return values. Example use:
strace -o strace.out PROGRAM ARG...
The blktrace tool lets us examine how the operating system manipulates a disk. Its output is not human readable; the blkparse tool prints out a (barely) human-readable version of a given block trace. Example use:
$ sudo blktrace DISKNAME & $ PROGRAM ... $ fg [hit Control-C to quit blktrace] $ blkparse DISKNAME | less