Data representation 2: Sizes and layout

Overview

Last time, we discussed memory and addresses, introduced hexdump for looking at objects, and introduced a performance mystery. This time, we resolve the mystery and then investigate object representations: sizes, alignments, and layout rules.

Performance mystery

Recall the mystery we saw last time: a simple integer sorting program

datarep2/intsort.cc

#include <cstdio>
#include <cassert>
#include <list>
#include <vector>
#include "hexdump.hh"

int main() {
    std::list<int> ls; // or std::vector<int>

    // read integers from stdin, storing them in sorted order
    int val;
    while (fscanf(stdin, "%d", &val) == 1) {
        auto it = ls.begin();
        while (it != ls.end() && *it < val) {
            ++it;
        }
        ls.insert(it, val);
    }

    // print integers in sorted order
    for (auto v : ls) {
        fprintf(stdout, "%d\n", v);
    }
}

runs much faster with std::vector (a contiguous array) than std::list (a linked list)!

How much faster? Here are some results from several runs of ./intgen -n 100000 | time ./intsort > /dev/null (on my desktop in Docker):

Data structure	Elapsed times
`std::list`	27.48 sec	27.45 sec	27.42 sec
`std::vector`	0.73 sec	0.73 sec	0.75 sec

An investigation

Let’s print out the integers’ addresses as well!
The current foreach loop, for (auto v : ls), makes a copy of each element
To print the actual addresses, we use iterators, which represent positions into the data structure
```
for (auto it = ls.begin(); it != ls.end(); ++it) {
    hexdump_object(*it);
}
```

Sample output for std::list and ./intgen -n 5:

556014072f50  00 00 00 00                                       |....|
556014072ef0  01 00 00 00                                       |....|
556014072ed0  02 00 00 00                                       |....|
556014072f30  03 00 00 00                                       |....|
556014072f10  04 00 00 00                                       |....|

Sample start of output for std::list and ./intgen -n 50000:

56266ed509d0  00 00 00 00                                       |....|
56266eda8d50  01 00 00 00                                       |....|
56266ed56b90  02 00 00 00                                       |....|
56266edd18f0  03 00 00 00                                       |....|
56266ec6f090  04 00 00 00                                       |....|

Sample output for std::vector and ./intgen -n 5:

5621d7572f00  00 00 00 00                                       |....|
5621d7572f04  01 00 00 00                                       |....|
5621d7572f08  02 00 00 00                                       |....|
5621d7572f0c  03 00 00 00                                       |....|
5621d7572f10  04 00 00 00                                       |....|

Sample start of output for std::vector and ./intgen -n 50000:

7fe0eb6c6010  00 00 00 00                                       |....|
7fe0eb6c6014  01 00 00 00                                       |....|
7fe0eb6c6018  02 00 00 00                                       |....|
7fe0eb6c601c  03 00 00 00                                       |....|
7fe0eb6c6020  04 00 00 00                                       |....|

Analysis

std::vector elements’ memory addresses are contiguous: next to one another, no gaps
std::list elements’ memory address are not contiguous: big jumps, bigger the more integers there are
Contiguous memory is faster to access!
- Cache memory (unit 4)
But why do std::list addresses bounce around so much?
- Discuss!

Insertion order

Data structure	Insertion order	Elapsed times
`std::list`	`-r` (random)	27.48 sec	27.45 sec	27.42 sec
`std::vector`	`-r`	0.73 sec	0.73 sec	0.75 sec
`std::list`	`-u` (increasing)	7.43 sec	7.43 sec	7.36 sec
`std::vector`	`-u`	1.16 sec	1.14 sec	1.15 sec
`std::list`	`-d` (decreasing)	0.01 sec	0.01 sec	0.01 sec
`std::vector`	`-d`	0.35 sec	0.33 sec	0.34 sec

The elements of a std::list are laid out in memory in increasing order by insertion time!
- This means that when the input integers arrive in random order, traversing the list to find an insertion position involves many jumps to increasingly far-away points in memory—very bad for performance.
- But when input integers arrive in sequential order, then adjacent integers have nearby addresses—good for performance!
- Best for performance, though, is when no traversal is necessary at all (-d).
The elements of a std::vector are laid out in memory in increasing order by position.
- This requires moving elements around when inserting new elements, but we make up for that with efficient memory traversal.
- Only when inserting in descending order does std::list win.
Can we do better still?

Primitive types

We now switch to ground-up exploration of C++ data representation.

C++’s primitive values include integers, floating point numbers, and pointers. This program, sizes.cc, prints out some of these. What do you see?

datarep2/sizes.cc

#include <cstdio>
#include "hexdump.hh"

int main() {
    char c1 = 61;
    int i1 = 61;
    float f1 = 61;
    int* p1 = &i1;

    printf("c1: %d\n", (int) c1);
    printf("i1: %d\n", i1);
    printf("f1: %g\n", f1);
    printf("p1: %p\n", p1);

    hexdump_object(c1);
    hexdump_object(i1);
    hexdump_object(f1);
    hexdump_object(p1);
}

Primitive type observations

char is a kind of integer
- char is also special in C and C++: any object in memory can be observed as an array of char¹, any many objects can be manipulated as if they are arrays of char
- We call a char a byte
- hexdump prints the contents of memory by treating it as an array of chars
- In modern implementations, a char holds exactly 8 bits
int is a kind of integer
- When char and int variables hold the same values, their representations look similar
- int takes more memory to represent
float is a kind of floating-point number
- Same-valued float and int have very different representations
int* is a kind of pointer
- Takes yet more memory to represent
- Value is an address

Sizes and `sizeof`

Every C++ object has a size
Every object with the same type has the same size
The size of an object is the number of bytes required to store the object
- For instance, memory returned by malloc(SIZE) can hold an object of size SIZE
The C++ sizeof operator returns the size of an object

Sizes of primitive types on x86-64

Type	Size
`char`	1
`short`	2
`int`	4
`long`	8
`long long`	8
`float`	4
`double`	8
`long double`	16
`T*`	8

Abstract machine and hardware machine

Programs are written with reference to a language standard
The language standard is meant to define exactly how every program behaves when executed
In some programming languages, such as Python and Java, the language standard is opaque
- The standard completely defines the meaning of every program
- The same program should behave identically on any hardware
The C and C++ standards are translucent
- The standard partially defines the meaning of a program
- Some aspects of a program can behave differently on different hardware
Example: Sizes of primitive types
- Standard imposes some requirements, compiler can make choices accordingly

Standard sizes of primitive types

Type	x86-64 Linux size	Standard size
`char`	1	1
`short`	2	≥1
`int`	4	≥`sizeof(short)`
`long`	8	≥`sizeof(int)`
`long long`	8	≥`sizeof(long)`
`float`	4	≥4 (probably)
`double`	8	≥`sizeof(float)`, ≥8 (probably)
`long double`	16	≥`sizeof(double)`
`T*`	8	N/A

Alignment

Hardware and compilers can impose restrictions on the addresses at which certain types of object can appear
- The address of any int is a multiple of 4 (on x86-64)
- The address of any pointer is a multiple of 8
This is called alignment
The C++ alignof operator returns the alignment of a type or object
- All objects of the same type have the same alignment
- alignof(std::max_align_t) is the maximum alignment for any type (16 on x86-64)

Alignments of primitive types

Type	Alignment
`char`	1
`short`	2
`int`	4
`long`	8
`long long`	8
`float`	4
`double`	8
`long double`	16
`T*`	8

Except for alignof(char), these values can change
- On some x86-64 operating systems and compilers, alignof(long) = 4!
- On x86-32 Linux, alignof(double) = 4

Compound types (collections)

C++ offers several ways to construct compound objects from simpler ones
Compound objects are objects and have sizes
Compound objects are laid out in memory according to standard rules

Compound type example

datarep2/compounds.cc

#include <cstdio>
#include "hexdump.hh"

int main() {
    int a[2] = {61, 62};
    hexdump_object(a);

    struct {
        int a;
        int b;
        char c;
        char d;
    } s = {61, 62, 63, 64};
    hexdump_object(s);
}

Array rule

Array members are laid out sequentially in memory with no gaps
Given T a[N], and 0≤I<N:

addressof(a[I]) = addressof(a) + I*sizeof(T)

sizeof(a) = N*sizeof(T)

Struct rule (?)

Struct members are laid out sequentially in memory with gaps as required to ensure alignment
The members of a compound object must obey alignment restrictions
This imposes alignment requirements on the compound type

See, for example, the C++ standard 6.8.1.2 [basic.types.general]: “the underlying bytes making up [any simple] object can be copied into an array of char, unsigned char, or std::byte (17.2.1). If the content of that array is copied back into the object, the object shall subsequently hold its original value.” ↩︎