Data representation 4: Performance, computer arithmetic

Overview

We look more deeply into the performance mystery of intsort, then explore some properties of computer integer arithmetic, including undefined behavior.

The `intsort` conundrum

datarep-intsort/intsort.cc reads integers on standard input, and inserts them into a sorted container
- For instance, a linked list or a vector
- The container is sorted at all times
Compare linked list and vector performance
Linked list: one insertion takes O(N) + O(1) time
- O(N) time to find the insertion position
- O(1) time to insert
Vector: one insertion takes O(N) + O(N) time
- O(N) time to find the insertion position
- O(N) time to insert
Yet linked list is radically slower!
- 100,000 integers, random order, SAN=0, CS 61 grading server
- Vector 1 sec, linked list 42 sec
- 1,000,000 integers: vector 121 sec, linked list 9111 sec

An investigation

Let’s print out the integers’ addresses

for (auto& value : ints) {
    print_object(value);
}

Sample start of output for std::list and ./intgen -n 50000:

5626'6ed5'09d0  00 00 00 00
5626'6eda'8d50  01 00 00 00
5626'6ed5'6b90  02 00 00 00
5626'6edd'18f0  03 00 00 00
5626'6ec6'f090  04 00 00 00

Sample start of output for std::vector and ./intgen -n 50000:

7fe0'eb6c'6010  00 00 00 00
7fe0'eb6c'6014  01 00 00 00
7fe0'eb6c'6018  02 00 00 00
7fe0'eb6c'601c  03 00 00 00
7fe0'eb6c'6020  04 00 00 00

Analysis

std::vector elements’ memory addresses are contiguous: next to one another, no gaps
std::list elements’ memory address are not contiguous: big jumps, bigger the more integers there are
Contiguous memory is faster to access!
- Cache memory (unit 4)
But why do std::list addresses bounce around so much?
- Discuss!

Insertion order

Data structure	Insertion order	Elapsed times
`std::list`	`-r` (random)	27.48 sec	27.45 sec	27.42 sec
`std::vector`	`-r`	0.73 sec	0.73 sec	0.75 sec
`std::list`	`-u` (increasing)	7.43 sec	7.43 sec	7.36 sec
`std::vector`	`-u`	1.16 sec	1.14 sec	1.15 sec
`std::list`	`-d` (decreasing)	0.01 sec	0.01 sec	0.01 sec
`std::vector`	`-d`	0.35 sec	0.33 sec	0.34 sec

The elements of a std::list are laid out in memory in increasing order by insertion time!
- This means that when the input integers arrive in random order, traversing the list to find an insertion position involves many jumps to increasingly far-away points in memory—very bad for performance.
- But when input integers arrive in sequential order, then adjacent integers have nearby addresses—good for performance!
- Best for performance, though, is when no traversal is necessary at all (-d).
The elements of a std::vector are laid out in memory in increasing order by position.
- This requires moving elements around when inserting new elements, but we make up for that with efficient memory traversal.
- Only when inserting in descending order does std::list win.
Can we do better still?

Integer questions

How fast is integer arithmetic on modern computers?
There’s a correct answer and a fun answer
Correct answer: it is fast enough
Fun answer: it depends on the operation!
- And understanding integer representation lets you speed it up!

Benchmarking operations

datarep-arith/opbench.cc

int main(int argc, char* argv[]) {
    assert(argc == 2);
    unsigned long b = std::stoul(argv[1]);
    unsigned long n = 1'000'000'000;

    volatile unsigned long sum = 0;  // `volatile` inhibits optimizations
    for (unsigned long a = 0; a != n; ++a) {
      sum += a + b;
    }

    printf("%lu\n", sum);
}

Benchmark results

Addition +, subtraction -, multiplication * take about 1 ns
Division / and remainder % take ~5 ns!
- Division is inherently more complex than the other operations
- Integer division uses iterative algorithms that are hard to parallelize
But is division always difficult?

Bitwise arithmetic

Digital computers represent numbers in binary (base 2)
In addition to the normal arithmetic operations (+, -, *, /, %), computers support efficient operations that operate on bits independently
C and C++ bitwise operators:
- ~ (unary): complement
- &: bitwise and
- |: bitwise or
- ^: bitwise exclusive or
- <<: left shift
- >>: right shift

Truth tables

`~`	output
0	1
1	0

`&`	0	1
0	0	0
1	0	1

`\|`	0	1
0	0	1
1	1	1

`^`	0	1
0	0	1
1	1	0

Beware precedence

C did not pick the right precedence for bitwise operators
x & 1 != 0 is parsed as x & (1 != 0)
Most uses of bitwise operators require parentheses

From conventional to bitwise

u * 2?
- u << 1
u / 2?
- u >> 1
u % 2?
- u & 1

Fun with bits

What’s the mathematical intuition behind the expression u - (u & 15)?
- u rounded down to the nearest multiple of 16
- Like (u / 16) * 16, but likely faster
- Equivalent to u & ~15!
What can you learn about u from the value u & (u - 1)?
- Whether x is a power of two. (u & (u - 1)) == 0 iff u is 0 or a power of 2

Integer overflow

Computer integers, unlike mathematical integers, have limited size
- int is 32 bits, unsigned long 64 bits
- Very important for performance (computer as assembly line)
What happens when we add 1 to the largest representable integer?
- Overflow: the result is unrepresentable
- But the computer returns something anyway
- Different consequences for unsigned and signed integers

Unsigned integer overflow

UINT_MAX + 1 == 0

Negative unsigned numbers

-x is the number where x + (-x) == 0
So, in unsigned arithmetic, UINT_MAX == -1!

Negation and bitwise

Can you define unary negation -u using other operators (bitwise and +)?
- -u ≡ ~u + 1

Two’s-complement representation

Modern computers represent signed integers using two’s complement
Then unsigned and signed addition and subtraction can use the same instructions
So -1 in signed arithmetic has the same representation as UINT_MAX in unsigned arithmetic!

Signed integer overflow

INT_MAX + 1 == …
INT_MAX + 1 == 💩
- Undefined behavior
- Sanitizer reports error
- Optimizer can assume it never happens
- Computer hardware acts like unsigned overflow

More examples of undefined behaviors

Dividing an integer by 0
Using an uninitialized variable
Violating the live object rule
Modifying a const object
Accessing data beyond the bounds of an array
Comparing pointers to unrelated objects using < <= > >=
Accessing an invalid iterator
Accessing an object using a pointer of incompatible type

Using bits

Bitwise operators allow us to efficiently represent small sets
A set is a collection of elements, each of which might be present or absent
An integer is a collection of bits, each of which might be 1 or 0
Set operations, such as union, intersection, and membership-test, correspond to bitwise operations

Bitset example

Consider sets taken from the universe {e, f, g, h}
Assign each element m a distinct bit value ℬ(m) (which must be a power of two)
- For example, ℬ(e) = 1, ℬ(f) = 2, ℬ(g) = 4, ℬ(h) = 8
A set of elements is represented as the sum of the ℬ values
- ℬ({}) =
- ℬ({}) = 0
- ℬ({e, g}) =
- ℬ({e, g}) = ℬ(e) + ℬ(g) = 5
- ℬ({e, f, g, h}) =
- ℬ({e, f, g, h}) = 15
Union and intersection
- ℬ(X ∪ Y) =
- ℬ(X ∪ Y) = (ℬ(X) | ℬ(Y))
- ℬ(X ∩ Y) =
- ℬ(X ∩ Y) = (ℬ(X) & ℬ(Y))
- Map to bitwise & and |!
- For example, {e, f} ∪ {e, h} = {e, f, h}, and indeed (3 | 9) == 11
Set difference maps to & with ~
- ℬ(X ∖ Y) = (ℬ(X) & ~ℬ(Y))
Membership testing maps to &
- m ∈ X ⟺ (ℬ(m) & ℬ(X)) != 0