Overview
Last time, we discussed memory and addresses, introduced hexdump for looking
at objects, and introduced a performance mystery. This time, we resolve the
mystery and then investigate object representations: sizes,
alignments, and layout rules.
Performance mystery
Recall the mystery we saw last time: a simple integer sorting program
#include <cstdio>
#include <cassert>
#include <list>
#include <vector>
#include "hexdump.hh"
int main() {
    std::list<int> ls; // or std::vector<int>
    // read integers from stdin, storing them in sorted order
    int val;
    while (fscanf(stdin, "%d", &val) == 1) {
        auto it = ls.begin();
        while (it != ls.end() && *it < val) {
            ++it;
        }
        ls.insert(it, val);
    }
    // print integers in sorted order
    for (auto v : ls) {
        fprintf(stdout, "%d\n", v);
    }
}
runs much faster with std::vector (a contiguous array) than std::list (a
linked list)!
How much faster? Here are some results from several runs of ./intgen -n 100000 | time ./intsort > /dev/null (on my desktop in Docker):
| Data structure | Elapsed times | ||
|---|---|---|---|
| std::list | 27.48 sec | 27.45 sec | 27.42 sec | 
| std::vector | 0.73 sec | 0.73 sec | 0.75 sec | 
An investigation
- Let’s print out the integers’ addresses as well!
- The current foreach loop, for (auto v : ls), makes a copy of each element
- To print the actual addresses, we use iterators, which represent
positions into the data structure
for (auto it = ls.begin(); it != ls.end(); ++it) { hexdump_object(*it); }
- Sample output for std::listand./intgen -n 5:556014072f50 00 00 00 00 |....| 556014072ef0 01 00 00 00 |....| 556014072ed0 02 00 00 00 |....| 556014072f30 03 00 00 00 |....| 556014072f10 04 00 00 00 |....|
- Sample start of output for std::listand./intgen -n 50000:56266ed509d0 00 00 00 00 |....| 56266eda8d50 01 00 00 00 |....| 56266ed56b90 02 00 00 00 |....| 56266edd18f0 03 00 00 00 |....| 56266ec6f090 04 00 00 00 |....|
- Sample output for std::vectorand./intgen -n 5:5621d7572f00 00 00 00 00 |....| 5621d7572f04 01 00 00 00 |....| 5621d7572f08 02 00 00 00 |....| 5621d7572f0c 03 00 00 00 |....| 5621d7572f10 04 00 00 00 |....|
- Sample start of output for std::vectorand./intgen -n 50000:7fe0eb6c6010 00 00 00 00 |....| 7fe0eb6c6014 01 00 00 00 |....| 7fe0eb6c6018 02 00 00 00 |....| 7fe0eb6c601c 03 00 00 00 |....| 7fe0eb6c6020 04 00 00 00 |....|
Analysis
- std::vectorelements’ memory addresses are contiguous: next to one another, no gaps
- std::listelements’ memory address are not contiguous: big jumps, bigger the more integers there are
- Contiguous memory is faster to access!
- Cache memory (unit 4)
 
- But why do std::listaddresses bounce around so much?- Discuss!
 
Insertion order
| Data structure | Insertion order | Elapsed times | ||
|---|---|---|---|---|
| std::list | -r(random) | 27.48 sec | 27.45 sec | 27.42 sec | 
| std::vector | -r | 0.73 sec | 0.73 sec | 0.75 sec | 
| std::list | -u(increasing) | 7.43 sec | 7.43 sec | 7.36 sec | 
| std::vector | -u | 1.16 sec | 1.14 sec | 1.15 sec | 
| std::list | -d(decreasing) | 0.01 sec | 0.01 sec | 0.01 sec | 
| std::vector | -d | 0.35 sec | 0.33 sec | 0.34 sec | 
- The elements of a std::listare laid out in memory in increasing order by insertion time!- This means that when the input integers arrive in random order, traversing the list to find an insertion position involves many jumps to increasingly far-away points in memory—very bad for performance.
- But when input integers arrive in sequential order, then adjacent integers have nearby addresses—good for performance!
- Best for performance, though, is when no traversal is necessary at all (-d).
 
- The elements of a std::vectorare laid out in memory in increasing order by position.- This requires moving elements around when inserting new elements, but we make up for that with efficient memory traversal.
- Only when inserting in descending order does std::listwin.
 
- Can we do better still?
Primitive types
We now switch to ground-up exploration of C++ data representation.
C++’s primitive values include integers, floating point numbers, and pointers.
This program, sizes.cc, prints out some of these. What do you see?
#include <cstdio>
#include "hexdump.hh"
int main() {
    char c1 = 61;
    int i1 = 61;
    float f1 = 61;
    int* p1 = &i1;
    printf("c1: %d\n", (int) c1);
    printf("i1: %d\n", i1);
    printf("f1: %g\n", f1);
    printf("p1: %p\n", p1);
    hexdump_object(c1);
    hexdump_object(i1);
    hexdump_object(f1);
    hexdump_object(p1);
}
Primitive type observations
- charis a kind of integer- charis also special in C and C++: any object in memory can be observed as an array of- char1, any many objects can be manipulated as if they are arrays of- char
- We call a chara byte
- hexdumpprints the contents of memory by treating it as an array of- chars
- In modern implementations, a charholds exactly 8 bits
 
- intis a kind of integer- When charandintvariables hold the same values, their representations look similar
- inttakes more memory to represent
 
- When 
- floatis a kind of floating-point number- Same-valued floatandinthave very different representations
 
- Same-valued 
- int*is a kind of pointer- Takes yet more memory to represent
- Value is an address
 
Sizes and sizeof
- Every C++ object has a size
- Every object with the same type has the same size
- The size of an object is the number of bytes required to store the object
- For instance, memory returned by malloc(SIZE)can hold an object of sizeSIZE
 
- For instance, memory returned by 
- The C++ sizeofoperator returns the size of an object
Sizes of primitive types on x86-64
| Type | Size | 
|---|---|
| 
 | 1 | 
| 
 | 2 | 
| 
 | 4 | 
| 
 | 8 | 
| 
 | 8 | 
| 
 | 4 | 
| 
 | 8 | 
| 
 | 16 | 
| 
 | 8 | 
Abstract machine and hardware machine
- Programs are written with reference to a language standard
- The language standard is meant to define exactly how every program behaves when executed
- In some programming languages, such as Python and Java, the language
standard is opaque
- The standard completely defines the meaning of every program
- The same program should behave identically on any hardware
 
- The C and C++ standards are translucent
- The standard partially defines the meaning of a program
- Some aspects of a program can behave differently on different hardware
 
- Example: Sizes of primitive types
- Standard imposes some requirements, compiler can make choices accordingly
 
Standard sizes of primitive types
| Type | x86-64 Linux size | Standard size | 
|---|---|---|
| 
 | 1 | 1 | 
| 
 | 2 | ≥1 | 
| 
 | 4 | ≥ | 
| 
 | 8 | ≥ | 
| 
 | 8 | ≥ | 
| 
 | 4 | ≥4 (probably) | 
| 
 | 8 | ≥ | 
| 
 | 16 | ≥ | 
| 
 | 8 | N/A | 
Alignment
- Hardware and compilers can impose restrictions on the addresses at which
certain types of object can appear
- The address of any intis a multiple of 4 (on x86-64)
- The address of any pointer is a multiple of 8
 
- The address of any 
- This is called alignment
- The C++ alignofoperator returns the alignment of a type or object- All objects of the same type have the same alignment
- alignof(std::max_align_t)is the maximum alignment for any type (16 on x86-64)
 
Alignments of primitive types
| Type | Alignment | 
|---|---|
| 
 | 1 | 
| 
 | 2 | 
| 
 | 4 | 
| 
 | 8 | 
| 
 | 8 | 
| 
 | 4 | 
| 
 | 8 | 
| 
 | 16 | 
| 
 | 8 | 
- Except for alignof(char), these values can change- On some x86-64 operating systems and compilers, alignof(long)= 4!
- On x86-32 Linux, alignof(double)= 4
 
- On some x86-64 operating systems and compilers, 
Compound types (collections)
- C++ offers several ways to construct compound objects from simpler ones
- Compound objects are objects and have sizes
- Compound objects are laid out in memory according to standard rules
Compound type example
#include <cstdio>
#include "hexdump.hh"
int main() {
    int a[2] = {61, 62};
    hexdump_object(a);
    struct {
        int a;
        int b;
        char c;
        char d;
    } s = {61, 62, 63, 64};
    hexdump_object(s);
}
Array rule
- Array members are laid out sequentially in memory with no gaps
- Given T a[N], and 0≤I<N:
a[I]) = addressof(a) + I*sizeof(T)sizeof(a) = N*sizeof(T)Struct rule (?)
- Struct members are laid out sequentially in memory with gaps as required to ensure alignment
- The members of a compound object must obey alignment restrictions
- This imposes alignment requirements on the compound type
- 
See, for example, the C++ standard 6.8.1.2 [basic.types.general]: “the underlying bytes making up [any simple] object can be copied into an array of char, unsigned char, or std::byte (17.2.1). If the content of that array is copied back into the object, the object shall subsequently hold its original value.” ↩︎