Data representation 3: Layout and allocators

Overview

Last time, we examined the way C++ compilers assign sizes to objects, and the way they lay out compound objects in memory. This time we explore struct alignment in more depth and introduce pointer arithmetic and segments (general areas of memory).

Objects in memory

Every object occupies a contiguous range of memory
Objects that might exist at the same time occupy disjoint ranges of memory
- Live objects never overlap

Alignments and compound types

Hardware (and the compiler) impose alignment restrictions on primitive types
- Whenever an object of type T is stored in memory, its address must be a multiple of alignof(T)
- Standard says alignments must be powers of two
- Therefore every alignment is an integral multiple of all smaller alignments
The members of a compound object must obey alignment restrictions
This imposes alignment requirements on compound types
- Array: alignof(T[N]) = alignof(T)
- Struct: alignof(struct { T₁ m₁; …; T_N m_N; }) = max_I alignof(T_I)
For structs, alignment can require adding padding between some members
- You can change a struct’s size by reordering its members!

Struct alignment practice

struct s {
    char a;
    T1 b;
    T2 c;
    T3 d;
};

T1, T2, and T3 are some permutation of unsigned char, int, and short.
Can you make s have size 8?
Can you make s have alignment 8?

Struct rule, mathematically

Given struct { T₁ m₁; T₂ m₂; … T_N m_N; } s:
alignof(s) = max_I alignof(T_I)
A helper function, offsetof, computes the offset of each member relative to the address of the struct. The members are laid out in order, starting at the beginning of the struct, with minimal padding added between members to respect member alignments.

offsetof(s, m₁) = 0

For I>1, offsetof(s, m_I) =
⎡
⎢
⎢

offsetof(s, m_I-1) + sizeof(T_I-1)

alignof(T_I)

⎤
⎥
⎥
× alignof(T_I)

(The expression \lceil X / Y \rceil \times Y rounds up X to the nearest multiple of Y.) Note offsetof(s, m_I) is always a multiple of alignof(T_I), as we require.
Finally, we can compute the size of the whole structure:

sizeof(s) =

⎡
⎢
⎢

offsetof(s, m_N) + sizeof(T_N)

alignof(s)

⎤
⎥
⎥

× alignof(s)

Alignment and `malloc`

Malloc rule: Any non-null pointer returned by malloc and friends has the maximum alignment of any type (alignof(std::max_align_t))
- On x86-64, this is 16
- C++ new need not follow this rule; new T returns a pointer aligned for T
Standard vs. ABI
- All C++ implementations on all machines lay out struct members sequentially (standard requirement)
- Padding minimization is not required by the standard, but is required on x86-64 Linux (application-binary interface)

Pointer representation and address representation

On our machine, the representation of a pointer equals the representation of the address that holds the corresponding object
Can cast pointers to and from address type
- The type of an address is uintptr_t
If two objects have the same address, can cast one type of pointer to the other
- This can happen with struct members and other aliases

Pointer arithmetic

Pointers and arrays are conceptually linked
We can do arithmetic on pointers into an array
The results are the same as doing arithmetic on array indexes
This idea generalized gave rise to C++ iterators!

Pointer comparisons

T a[N];
int i = ..., j = ...;
T* p1 = &a[i];
T* p2 = &a[j];

p1 == p2        ⇔        i == j
p1 != p2        ⇔        i != j
p1 < p2         ⇔        i < j
p1 > p2         ⇔        i > j
p1 <= p2        ⇔        i <= j
p1 >= p2        ⇔        i >= j

Pointer addition and subtraction

T a[N];
int i = ..., j = ..., k = ...;
T* p1 = &a[i];
T* p2 = &a[j];

p1 + k     ==     &a[i] + k     ==    &a[i + k]
p1 - k     ==     &a[i] - k     ==    &a[i - k]
p1 - p2    ==   &a[i] - &a[j]   ==    i - j        // type of `p1 - p2` is `ptrdiff_t` == `long`

p1 += k    ==    p1 = p1 + k    ==    p1 = &a[i + k]
++p1       ==    p1 = p1 + 1    ==    p1 = &a[i + 1]

Valid indexes

For an array of size N:
- Can form pointers &a[0]…&a[N]
- Can dereference pointers &a[0]…&a[N-1]
<, >, <=, >=, - are valid with pointers into the same array
Any single (non-array) object can be treated as an array of size 1

Pointer arithmetic vs. address arithmetic

Given int a[2]; int* p1 = &a[0]; int* p2 = &a[1];
The difference between the address values of p1 and p2 is 4
- Because the objects a[0] and a[1] are 4 bytes big, objects cannot overlap, and arrays are laid out sequentially
If we convert each pointer to an address and subtract
- Equals the size of the element times the difference in indexes
If we subtract the pointers
- Answer defined to equal difference in indexes
So addrdiff == ptrdiff * sizeof(T)
- And addrdiff >= ptrdiff

Union types

int main() {
    union {
        int a;
        int b;
        char c;
        char d;
    } u;
    u.a = 61;
    hexdump_object(u);
}

Union types define objects that might have one of several different underlying representations
- One active member at a time, defined by most recent assignment
- Useful for special cases
Given union { T₁ m₁; T₂ m₂; … T_N m_N; } u:

addressof(u.m_I) = addressof(u)

sizeof(u) = max_I sizeof(T_I)

alignof(u) = max_I alignof(T_I)

Segments

We’ve now seen many objects with many different addresses.
Can we classify those into groups? Are there any patterns?
- Yes! Groups of addresses are called segments.

Segment explorer

#include <cstdio>
#include "hexdump.hh"

int i1 = 61;
const int i2 = 62;

int main() {
    int i3 = 63;
    int* i4 = new int{64};

    hexdump_named_object(i1);
    hexdump_named_object(i2);
    hexdump_named_object(i3);
    hexdump_named_object(i4);
    hexdump_named_object(*i4);
}

Segment names

i1, a modifiable global object, is in the data segment
i2, a non-modifiable global object, is in the code/text/readonly segment
i3 and i4, local variable objects, are in the stack segment
*i4, a dynamically-allocated object, is in the heap segment