Overview
Last time, we discussed memory and addresses, introduced hexdump
for looking
at objects, and introduced a performance mystery. This time, we resolve the
mystery and then investigate object representations: sizes,
alignments, and layout rules.
Performance mystery
Recall the mystery we saw last time: a simple integer sorting program
#include <cstdio>
#include <cassert>
#include <list>
#include <vector>
#include "hexdump.hh"
int main() {
std::list<int> ls; // or std::vector<int>
// read integers from stdin, storing them in sorted order
int val;
while (fscanf(stdin, "%d", &val) == 1) {
auto it = ls.begin();
while (it != ls.end() && *it < val) {
++it;
}
ls.insert(it, val);
}
// print integers in sorted order
for (auto v : ls) {
fprintf(stdout, "%d\n", v);
}
}
runs much faster with std::vector
(a contiguous array) than std::list
(a
linked list)!
How much faster? Here are some results from several runs of ./intgen -n 100000 | time ./intsort > /dev/null
(on my desktop in Docker):
Data structure | Elapsed times | ||
---|---|---|---|
std::list |
27.48 sec | 27.45 sec | 27.42 sec |
std::vector |
0.73 sec | 0.73 sec | 0.75 sec |
An investigation
- Let’s print out the integers’ addresses as well!
- The current foreach loop,
for (auto v : ls)
, makes a copy of each element - To print the actual addresses, we use iterators, which represent
positions into the data structure
for (auto it = ls.begin(); it != ls.end(); ++it) { hexdump_object(*it); }
- Sample output for
std::list
and./intgen -n 5
:556014072f50 00 00 00 00 |....| 556014072ef0 01 00 00 00 |....| 556014072ed0 02 00 00 00 |....| 556014072f30 03 00 00 00 |....| 556014072f10 04 00 00 00 |....|
- Sample start of output for
std::list
and./intgen -n 50000
:56266ed509d0 00 00 00 00 |....| 56266eda8d50 01 00 00 00 |....| 56266ed56b90 02 00 00 00 |....| 56266edd18f0 03 00 00 00 |....| 56266ec6f090 04 00 00 00 |....|
- Sample output for
std::vector
and./intgen -n 5
:5621d7572f00 00 00 00 00 |....| 5621d7572f04 01 00 00 00 |....| 5621d7572f08 02 00 00 00 |....| 5621d7572f0c 03 00 00 00 |....| 5621d7572f10 04 00 00 00 |....|
- Sample start of output for
std::vector
and./intgen -n 50000
:7fe0eb6c6010 00 00 00 00 |....| 7fe0eb6c6014 01 00 00 00 |....| 7fe0eb6c6018 02 00 00 00 |....| 7fe0eb6c601c 03 00 00 00 |....| 7fe0eb6c6020 04 00 00 00 |....|
Analysis
std::vector
elements’ memory addresses are contiguous: next to one another, no gapsstd::list
elements’ memory address are not contiguous: big jumps, bigger the more integers there are- Contiguous memory is faster to access!
- Cache memory (unit 4)
- But why do
std::list
addresses bounce around so much?- Discuss!
Insertion order
Data structure | Insertion order | Elapsed times | ||
---|---|---|---|---|
std::list |
-r (random) |
27.48 sec | 27.45 sec | 27.42 sec |
std::vector |
-r |
0.73 sec | 0.73 sec | 0.75 sec |
std::list |
-u (increasing) |
7.43 sec | 7.43 sec | 7.36 sec |
std::vector |
-u |
1.16 sec | 1.14 sec | 1.15 sec |
std::list |
-d (decreasing) |
0.01 sec | 0.01 sec | 0.01 sec |
std::vector |
-d |
0.35 sec | 0.33 sec | 0.34 sec |
- The elements of a
std::list
are laid out in memory in increasing order by insertion time!- This means that when the input integers arrive in random order, traversing the list to find an insertion position involves many jumps to increasingly far-away points in memory—very bad for performance.
- But when input integers arrive in sequential order, then adjacent integers have nearby addresses—good for performance!
- Best for performance, though, is when no traversal is necessary at all (
-d
).
- The elements of a
std::vector
are laid out in memory in increasing order by position.- This requires moving elements around when inserting new elements, but we make up for that with efficient memory traversal.
- Only when inserting in descending order does
std::list
win.
- Can we do better still?
Primitive types
We now switch to ground-up exploration of C++ data representation.
C++’s primitive values include integers, floating point numbers, and pointers.
This program, sizes.cc
, prints out some of these. What do you see?
#include <cstdio>
#include "hexdump.hh"
int main() {
char c1 = 61;
int i1 = 61;
float f1 = 61;
int* p1 = &i1;
printf("c1: %d\n", (int) c1);
printf("i1: %d\n", i1);
printf("f1: %g\n", f1);
printf("p1: %p\n", p1);
hexdump_object(c1);
hexdump_object(i1);
hexdump_object(f1);
hexdump_object(p1);
}
Primitive type observations
char
is a kind of integerchar
is also special in C and C++: any object in memory can be observed as an array ofchar
1, any many objects can be manipulated as if they are arrays ofchar
- We call a
char
a byte hexdump
prints the contents of memory by treating it as an array ofchar
s- In modern implementations, a
char
holds exactly 8 bits
int
is a kind of integer- When
char
andint
variables hold the same values, their representations look similar int
takes more memory to represent
- When
float
is a kind of floating-point number- Same-valued
float
andint
have very different representations
- Same-valued
int*
is a kind of pointer- Takes yet more memory to represent
- Value is an address
Sizes and sizeof
- Every C++ object has a size
- Every object with the same type has the same size
- The size of an object is the number of bytes required to store the object
- For instance, memory returned by
malloc(SIZE)
can hold an object of sizeSIZE
- For instance, memory returned by
- The C++
sizeof
operator returns the size of an object
Sizes of primitive types on x86-64
Type |
Size |
---|---|
|
1 |
|
2 |
|
4 |
|
8 |
|
8 |
|
4 |
|
8 |
|
16 |
|
8 |
Abstract machine and hardware machine
- Programs are written with reference to a language standard
- The language standard is meant to define exactly how every program behaves when executed
- In some programming languages, such as Python and Java, the language
standard is opaque
- The standard completely defines the meaning of every program
- The same program should behave identically on any hardware
- The C and C++ standards are translucent
- The standard partially defines the meaning of a program
- Some aspects of a program can behave differently on different hardware
- Example: Sizes of primitive types
- Standard imposes some requirements, compiler can make choices accordingly
Standard sizes of primitive types
Type |
x86-64 Linux size |
Standard size |
---|---|---|
|
1 |
1 |
|
2 |
≥1 |
|
4 |
≥ |
|
8 |
≥ |
|
8 |
≥ |
|
4 |
≥4 (probably) |
|
8 |
≥ |
|
16 |
≥ |
|
8 |
N/A |
Alignment
- Hardware and compilers can impose restrictions on the addresses at which
certain types of object can appear
- The address of any
int
is a multiple of 4 (on x86-64) - The address of any pointer is a multiple of 8
- The address of any
- This is called alignment
- The C++
alignof
operator returns the alignment of a type or object- All objects of the same type have the same alignment
alignof(std::max_align_t)
is the maximum alignment for any type (16 on x86-64)
Alignments of primitive types
Type |
Alignment |
---|---|
|
1 |
|
2 |
|
4 |
|
8 |
|
8 |
|
4 |
|
8 |
|
16 |
|
8 |
- Except for
alignof(char)
, these values can change- On some x86-64 operating systems and compilers,
alignof(long)
= 4! - On x86-32 Linux,
alignof(double)
= 4
- On some x86-64 operating systems and compilers,
Compound types (collections)
- C++ offers several ways to construct compound objects from simpler ones
- Compound objects are objects and have sizes
- Compound objects are laid out in memory according to standard rules
Compound type example
#include <cstdio>
#include "hexdump.hh"
int main() {
int a[2] = {61, 62};
hexdump_object(a);
struct {
int a;
int b;
char c;
char d;
} s = {61, 62, 63, 64};
hexdump_object(s);
}
Array rule
- Array members are laid out sequentially in memory with no gaps
- Given
T a[N]
, and 0≤I
<N
:
a[I]
) = addressof(a
) + I
*sizeof(T)
sizeof(a)
= N
*sizeof(T)
Struct rule (?)
- Struct members are laid out sequentially in memory with gaps as required to ensure alignment
- The members of a compound object must obey alignment restrictions
- This imposes alignment requirements on the compound type
-
See, for example, the C++ standard 6.8.1.2 [basic.types.general]: “the underlying bytes making up [any simple] object can be copied into an array of char, unsigned char, or std::byte (17.2.1). If the content of that array is copied back into the object, the object shall subsequently hold its original value.” ↩︎