Overview
We look more deeply into the performance mystery of intsort
, then explore
some properties of computer integer arithmetic, including undefined behavior.
The intsort
conundrum
datarep-intsort/intsort.cc
reads integers on standard input, and inserts
them into a sorted container
- For instance, a linked list or a vector
- The container is sorted at all times
- Compare linked list and vector performance
- Linked list: one insertion takes O(N) + O(1) time
- O(N) time to find the insertion position
- O(1) time to insert
- Vector: one insertion takes O(N) + O(N) time
- O(N) time to find the insertion position
- O(N) time to insert
- Yet linked list is radically slower!
- 100,000 integers, random order,
SAN=0
, CS 61 grading server
- Vector 1 sec, linked list 42 sec
- 1,000,000 integers: vector 121 sec, linked list 9111 sec
An investigation
-
Let’s print out the integers’ addresses
for (auto& value : ints) {
print_object(value);
}
-
Sample start of output for std::list
and ./intgen -n 50000
:
5626'6ed5'09d0 00 00 00 00
5626'6eda'8d50 01 00 00 00
5626'6ed5'6b90 02 00 00 00
5626'6edd'18f0 03 00 00 00
5626'6ec6'f090 04 00 00 00
-
Sample start of output for std::vector
and ./intgen -n 50000
:
7fe0'eb6c'6010 00 00 00 00
7fe0'eb6c'6014 01 00 00 00
7fe0'eb6c'6018 02 00 00 00
7fe0'eb6c'601c 03 00 00 00
7fe0'eb6c'6020 04 00 00 00
Analysis
std::vector
elements’ memory addresses are contiguous: next to one
another, no gaps
std::list
elements’ memory address are not contiguous: big jumps,
bigger the more integers there are
- Contiguous memory is faster to access!
- But why do
std::list
addresses bounce around so much?
Insertion order
Data structure |
Insertion order |
Elapsed times |
|
|
std::list |
-r (random) |
27.48 sec |
27.45 sec |
27.42 sec |
std::vector |
-r |
0.73 sec |
0.73 sec |
0.75 sec |
std::list |
-u (increasing) |
7.43 sec |
7.43 sec |
7.36 sec |
std::vector |
-u |
1.16 sec |
1.14 sec |
1.15 sec |
std::list |
-d (decreasing) |
0.01 sec |
0.01 sec |
0.01 sec |
std::vector |
-d |
0.35 sec |
0.33 sec |
0.34 sec |
- The elements of a
std::list
are laid out in memory in increasing order
by insertion time!
- This means that when the input integers arrive in random order,
traversing the list to find an insertion position involves many jumps to
increasingly far-away points in memory—very bad for performance.
- But when input integers arrive in sequential order, then adjacent
integers have nearby addresses—good for performance!
- Best for performance, though, is when no traversal is necessary at all (
-d
).
- The elements of a
std::vector
are laid out in memory in increasing order
by position.
- This requires moving elements around when inserting new elements, but we make up
for that with efficient memory traversal.
- Only when inserting in descending order does
std::list
win.
- Can we do better still?
Integer questions
- How fast is integer arithmetic on modern computers?
- There’s a correct answer and a fun answer
- Correct answer: it is fast enough
- Fun answer: it depends on the operation!
- And understanding integer representation lets you speed it up!
Benchmarking operations
int main(int argc, char* argv[]) {
assert(argc == 2);
unsigned long b = std::stoul(argv[1]);
unsigned long n = 1'000'000'000;
volatile unsigned long sum = 0; // `volatile` inhibits optimizations
for (unsigned long a = 0; a != n; ++a) {
sum += a + b;
}
printf("%lu\n", sum);
}
Benchmark results
- Addition
+
, subtraction -
, multiplication *
take about 1 ns
- Division
/
and remainder %
take ~5 ns!
- Division is inherently more complex than the other operations
- Integer division uses iterative algorithms that are hard to parallelize
- But is division always difficult?
Bitwise arithmetic
- Digital computers represent numbers in binary (base 2)
- In addition to the normal arithmetic operations (
+
, -
, *
, /
, %
),
computers support efficient operations that operate on bits
independently
- C and C++ bitwise operators:
~
(unary): complement
&
: bitwise and
|
: bitwise or
^
: bitwise exclusive or
<<
: left shift
>>
: right shift
Truth tables
Beware precedence
- C did not pick the right precedence for bitwise operators
x & 1 != 0
is parsed as x & (1 != 0)
- Most uses of bitwise operators require parentheses
From conventional to bitwise
Fun with bits
- What’s the mathematical intuition behind the expression
u - (u & 15)
?
u
rounded down to the nearest multiple of 16
- Like
(u / 16) * 16
, but likely faster
- Equivalent to
u & ~15
!
- What can you learn about
u
from the value u & (u - 1)
?
- Whether
x
is a power of two. (u & (u - 1)) == 0
iff u
is 0 or a power of 2
Integer overflow
- Computer integers, unlike mathematical integers, have limited size
int
is 32 bits, unsigned long
64 bits
- Very important for performance (computer as assembly line)
- What happens when we add
1
to the largest representable integer?
- Overflow: the result is unrepresentable
- But the computer returns something anyway
- Different consequences for unsigned and signed integers
Unsigned integer overflow
Negative unsigned numbers
-x
is the number where x + (-x) == 0
- So, in unsigned arithmetic,
UINT_MAX == -1
!
Negation and bitwise
- Can you define unary negation
-u
using other operators (bitwise and +
)?
Two’s-complement representation
- Modern computers represent signed integers using two’s complement
- Then unsigned and signed addition and subtraction can use the same instructions
- So
-1
in signed arithmetic has the same representation as UINT_MAX
in unsigned arithmetic!
Signed integer overflow
INT_MAX + 1 == …
INT_MAX + 1 == 💩
- Undefined behavior
- Sanitizer reports error
- Optimizer can assume it never happens
- Computer hardware acts like unsigned overflow
More examples of undefined behaviors
- Dividing an integer by 0
- Using an uninitialized variable
- Violating the live object rule
- Modifying a
const
object
- Accessing data beyond the bounds of an array
- Comparing pointers to unrelated objects using
< <= > >=
- Accessing an invalid iterator
- Accessing an object using a pointer of incompatible type
Using bits
- Bitwise operators allow us to efficiently represent small sets
- A set is a collection of elements, each of which might be present or absent
- An integer is a collection of bits, each of which might be 1 or 0
- Set operations, such as union, intersection, and membership-test, correspond
to bitwise operations
Bitset example
- Consider sets taken from the universe {
e
, f
, g
, h
}
- Assign each element
m
a distinct bit value ℬ(m
) (which must be a power of two)
- For example, ℬ(
e
) = 1
, ℬ(f
) = 2
, ℬ(g
) = 4
, ℬ(h
) = 8
- A set of elements is represented as the sum of the ℬ values
- ℬ({}) =
- ℬ({}) =
0
- ℬ({
e
, g
}) =
- ℬ({
e
, g
}) = ℬ(e
) + ℬ(g
) = 5
- ℬ({
e
, f
, g
, h
}) =
- ℬ({
e
, f
, g
, h
}) = 15
- Union and intersection
- ℬ(X ∪ Y) =
- ℬ(X ∪ Y) = (ℬ(X)
|
ℬ(Y))
- ℬ(X ∩ Y) =
- ℬ(X ∩ Y) = (ℬ(X)
&
ℬ(Y))
- Map to bitwise
&
and |
!
- For example, {
e
, f
} ∪ {e
, h
} = {e
, f
, h
}, and indeed (3 | 9) == 11
- Set difference maps to
&
with ~
- ℬ(X ∖ Y) = (ℬ(X)
&
~
ℬ(Y))
- Membership testing maps to
&
m
∈ X ⟺ (ℬ(m
) &
ℬ(X)) != 0