This is not the current version of the class.

Data representation 2: Sizes and layout

Overview

Last time, we discussed memory and addresses, introduced hexdump for looking at objects, and introduced a performance mystery. This time, we resolve the mystery and then investigate object representations: sizes, alignments, and layout rules.

Performance mystery

Recall the mystery we saw last time: a simple integer sorting program

#include <cstdio>
#include <cassert>
#include <list>
#include <vector>
#include "hexdump.hh"

int main() {
    std::list<int> ls; // or std::vector<int>

    // read integers from stdin, storing them in sorted order
    int val;
    while (fscanf(stdin, "%d", &val) == 1) {
        auto it = ls.begin();
        while (it != ls.end() && *it < val) {
            ++it;
        }
        ls.insert(it, val);
    }

    // print integers in sorted order
    for (auto v : ls) {
        fprintf(stdout, "%d\n", v);
    }
}

runs much faster with std::vector (a contiguous array) than std::list (a linked list)!

How much faster? Here are some results from several runs of ./intgen -n 100000 | time ./intsort > /dev/null (on my desktop in Docker):

Data structure Elapsed times
std::list 27.48 sec 27.45 sec 27.42 sec
std::vector 0.73 sec 0.73 sec 0.75 sec

An investigation

Analysis

Insertion order

Data structure Insertion order Elapsed times
std::list -r (random) 27.48 sec 27.45 sec 27.42 sec
std::vector -r 0.73 sec 0.73 sec 0.75 sec
std::list -u (increasing) 7.43 sec 7.43 sec 7.36 sec
std::vector -u 1.16 sec 1.14 sec 1.15 sec
std::list -d (decreasing) 0.01 sec 0.01 sec 0.01 sec
std::vector -d 0.35 sec 0.33 sec 0.34 sec

Primitive types

We now switch to ground-up exploration of C++ data representation.

C++’s primitive values include integers, floating point numbers, and pointers. This program, sizes.cc, prints out some of these. What do you see?

#include <cstdio>
#include "hexdump.hh"

int main() {
    char c1 = 61;
    int i1 = 61;
    float f1 = 61;
    int* p1 = &i1;

    printf("c1: %d\n", (int) c1);
    printf("i1: %d\n", i1);
    printf("f1: %g\n", f1);
    printf("p1: %p\n", p1);

    hexdump_object(c1);
    hexdump_object(i1);
    hexdump_object(f1);
    hexdump_object(p1);
}

Primitive type observations

Sizes and sizeof

Sizes of primitive types on x86-64

Type

Size

char

1

short

2

int

4

long

8

long long

8

float

4

double

8

long double

16

T*

8

Abstract machine and hardware machine

Standard sizes of primitive types

Type

x86-64 Linux size

Standard size

char

1

1

short

2

≥1

int

4

sizeof(short)

long

8

sizeof(int)

long long

8

sizeof(long)

float

4

≥4 (probably)

double

8

sizeof(float), ≥8 (probably)

long double

16

sizeof(double)

T*

8

N/A

Alignment

Alignments of primitive types

Type

Alignment

char

1

short

2

int

4

long

8

long long

8

float

4

double

8

long double

16

T*

8

Compound types (collections)

Compound type example

#include <cstdio>
#include "hexdump.hh"

int main() {
    int a[2] = {61, 62};
    hexdump_object(a);

    struct {
        int a;
        int b;
        char c;
        char d;
    } s = {61, 62, 63, 64};
    hexdump_object(s);
}

Array rule

addressof(a[I]) = addressof(a) + I*sizeof(T)
sizeof(a) = N*sizeof(T)

Struct rule (?)


  1. See, for example, the C++ standard 6.8.1.2 [basic.types.general]: “the underlying bytes making up [any simple] object can be copied into an array of char, unsigned char, or std::byte (17.2.1). If the content of that array is copied back into the object, the object shall subsequently hold its original value.” ↩︎