Ancient CS 61 Content Warning!!!!!1!!!
This is not the current version of the class.
This site was automatically translated from a wiki. The translation may have introduced mistakes (and the content might have been wrong to begin with).

Storage 1

We think a lot in computer science about costs: time cost, space cost, memory efficiency. And these costs fundamentally shape the kinds of solutions we build. But you may not think as often about financial costs! These also fundamentally shape the systems we build—in some ways more fundamentally—and the costs of storage technologies have changed at least as dramatically as their capacities and speeds.

Costs per megabyte of storage technologies (constant 2010 dollars)

Year Memory Flash/SSD Hard disk
~1955 $411,000,000.00 $6,233.00
1970 $734,000.00 $260.00
1990 $148.20 $5.45
2003 $0.09 $0.305 $0.00132
2010 $0.019 $0.00244 $0.000073
~2017 $0.0049 $0.00023 $0.000022

Cost per byte relative to that of a hard disk in ~2017:

Year Memory Flash/SSD Hard disk
~1955 18,600,000,000,000 280,000,000
1970 33,000,000,000 12,000,000
1990 6,700,000 250,000
2003 4,100 14,000 60
2010 860 110 3.3
~2017 220 10 1

These are initial purchase costs & don’t include maintenance and power. DRAM uses more energy than hard disks, which use more than flash/SSD. (Prices from here and here. $1.00 in 1955 is $8.14 in 2010 dollars.)

Latency and throughput

Latency means the time it takes to complete an request.

Throughput means the number of requests that can be completed per unit of time.

When we’re talking about storage, latency is typically measured in seconds and throughput in (mega)bytes per second, or requests per second.

Latency and throughput clearly have a relationship, but they are not simple inverses: a system can have high latency and high throughput. For example, consider a disk that can handle 1 request at a time, and takes 1 second to handle a request. This disk has latency 1 sec and throughput 1 B/sec. But we can improve its throughput without changing its latency in several ways. For instance:

The storage hierarchy

This is the list of storage media typically available to modern computers. Sizes vary; these sizes are based on those for my desktop iMac. Storage closer to the top of the list is faster, smaller, and more expensive; storage closer to the bottom is larger, slower, and less expensive.

And then there are yet lower levels in distributed systems. For instance, Google and Facebook have petabytes or exabytes of data, distributed among many computers. You can think of this as another storage layer, networked storage, that is typically slower to access than HDDs.

The two lowest levels of this hierarchy, SSDs and hard disks, are called stable storage, because they keep their data when the power goes out. The upper levels of the hierarchy (primary memory, processor caches, and registers) do not preserve data across power outages, and are thus called volatile. Watch the Mona Lisa disappear from memory after the power is cut!

Rough speeds of storage technologies

Type

Latency

Throughput
(sequential access)

Throughput
(random access)

Register

0.5ns

SRAM

4ns

DRAM

60ns

Flash (read)

50µs

550MB/s

365MB/s

Flash (write)

60µs

470MB/s

303MB/s

Hard disk

~4-9ms

156MB/s

~1MB/s

A typical recent disk might rotate at 7200 RPM, have ~9ms average seek time, ~4ms average rotational latency, and 210MB/s maximum sustained transfer rate. (CS:APP3e)

To load 4 pages (that is, 4x4KB = 16KB) from disk takes ~8ms.

Latency Numbers Every Programmer Should Know

Caching

In general terms, a cache is a small amount of fast storage, used to optimize access to slower storage.

We see many caches in the course. We’ve already discussed the processor cache. This unit focuses on the buffer cache, which is the operating system’s cache in primary memory of data from disks, and the stdio cache, which is a process’s cache in its primary memory of data from the buffer cache.

Tools

The strace tool lets us look at a program’s system calls, including their arguments and return values. Example use: strace -o strace.out PROGRAM ARG...

The blktrace tool lets us examine how the operating system manipulates a disk. Its output is not human readable; the blkparse tool prints out a (barely) human-readable version of a given block trace. Example use:

$ sudo blktrace DISKNAME &
$ PROGRAM ...
$ fg
[hit Control-C to quit blktrace]
$ blkparse DISKNAME | less