Overview
Last time, we examined the way C++ compilers assign sizes to objects, and the way they lay out compound objects in memory. We ended on a cliffhanger about alignment. This time we explain alignment in depth, and then describe an arena allocator, a purpose-built dynamic memory allocator that improves performance when many small objects are allocated and freed.
Mysterious struct sizes
#include <cstdio>
#include "hexdump.hh"
int main() {
struct {
int a;
int b;
char c;
char d;
} s1 = {61, 62, 63, 64};
struct {
int a;
char b;
int c;
char d;
} s2 = {61, 62, 63, 64};
hexdump_object(s1);
hexdump_object(s2);
}
Alignments and compound types
- Hardware (and the compiler) impose alignment restrictions on primitive types
- Whenever an object of type
T
is stored in memory, its address must be a multiple ofalignof(T)
- Standard says alignments must be powers of two
- Therefore every alignment is an integral multiple of all smaller alignments
- Whenever an object of type
- The members of a compound object must obey alignment restrictions
- This imposes alignment requirements on compound types
- Array:
alignof(T[N])
=alignof(T)
- Union:
alignof(union { T1 m1; …; TN mN; })
= maxI
alignof(TI)
- Struct:
alignof(struct { T1 m1; …; TN mN; })
= maxI
alignof(TI)
- Array:
- For structs, alignment can require adding padding between some members
- You can change a struct’s size by reordering its members!
Alignment and size
- For an array
T a[N]
:- Array layout rule: addressof(
a[I]
) = addressof(a
) +I
×sizeof(T)
- Array alignment rule:
alignof(a)
=alignof(T)
- Array layout rule: addressof(
- What can you deduce about the relationship between
alignof(T)
andsizeof(T)
?
- addressof(
a[1]
) = addressof(a[0]
) +sizeof(T)
- addressof(
a[0]
) = addressof(a
) is a multiple ofalignof(a)
=alignof(T)
- addressof(
a[1]
) must be a multiple ofalignof(T)
- So
sizeof(T)
must be a multiple ofalignof(T)
!
Struct rule
- Struct members are laid out sequentially in memory
- In the order they are declared
- With minimum padding required to obey alignment restrictions
Struct example 1
struct s { // <- Alignment = ?, Size = ?
// <- Padding = ?
int a; // <- Alignment = 4, Offset = ?
// <- Padding = ?
char b; // <- Alignment = 1, Offset = ?
// <- Padding = ?
int c; // <- Alignment = 4, Offset = ?
// <- Padding = ?
char d; // <- Alignment = 1, Offset = ?
// <- Padding = ?
};
Struct example 1, completed
struct s { // <- Alignment = 4, Size = 16
// <- Padding = 0 (always)
int a; // <- Alignment = 4, Offset = 0 (always)
// <- Padding = 0 (`b` is less aligned than `a`)
char b; // <- Alignment = 1, Offset = 4
// <- Padding = 3 (recover alignment for `c`)
int c; // <- Alignment = 4, Offset = 8
// <- Padding = 0
char d; // <- Alignment = 1, Offset = 12
// <- Padding = 3 (recover alignment for `sizeof(s)`)
};
Struct rule, mathematically
Given
struct { T1 m1; T2 m2; … TN mN; } s
:alignof(s)
= maxI
alignof(TI)
A helper function,
offsetof
, computes the offset of each member relative to the address of the struct. The members are laid out in order, starting at the beginning of the struct, with minimal padding added between members to respect member alignments.offsetof(s, m1)
= 0ForI>1
,offsetof(s, mI)
=
×⎡
⎢
⎢offsetof(s, mI-1)
+sizeof(TI-1)
alignof(TI)
⎤
⎥
⎥alignof(TI)
(The expression \lceil X / Y \rceil \times Y rounds up X to the nearest multiple of Y.) We can see that
offsetof(s, mI)
is always a multiple ofalignof(TI)
, as we require.Finally, we can compute the size of the whole structure:
sizeof(s)
=⎡
⎢
⎢offsetof(s, mN)
+sizeof(TN)
alignof(s)
⎤
⎥
⎥alignof(s)
Alignment and malloc
- Malloc rule: Any non-null pointer returned by
malloc
and friends has the maximum alignment of any type (alignof(std::max_align_t)
)- On x86-64, this is 16
- C++
new
need not follow this rule;new T
returns a pointer aligned forT
- Standard vs. ABI
- All C++ implementations on all machines lay out struct members sequentially (standard requirement)
- Padding minimization is not required by the standard, but is required on x86-64 Linux (application-binary interface)
Performance and layout
// access 10M integers in up, down, or random order
unsigned long sum = 0;
unsigned long rand_sum = 0;
for (int i = 0; i != size; ++i) {
int r = rand() % size;
int idx;
if (style == access_up) {
idx = i;
} else if (style == access_down) {
idx = size - i - 1;
} else if (style == access_random) {
idx = r;
}
sum += v[idx];
rand_sum += r;
}
- Loop accesses a 100,000,000—element array of
int
s - In ascending, descending, or random order
- What’s different between the orders? Will performance change?
Sequential faster than random
- Many aspects of computer hardware and software are built to optimize for
compact and sequential memory access
- Processor cache, prefetching algorithms, etc.—subjects of Unit 3
- Consequences for data structure layout
- Vectors often faster than linked lists
- Fewer allocations often faster than more allocations
- Be aware of these characteristics, but don’t over-optimize prematurely
Performance consequences of memory allocation
// allocate and free nodes `noperations` times
long counter = 0;
for (unsigned i = 0; i != noperations; ++i) {
int pos = random_node_index(randomness);
if (n[pos] == nullptr) {
n[pos] = new node;
n[pos]->contents = counter;
++counter;
} else {
delete n[pos];
n[pos] = nullptr;
}
}
- Loop allocates & deallocates “nodes” 20,000,000 times
- Up to 8,192 nodes allocated at a time
- Models tasks like tree building & freeing, etc.
mb0.cc
uses a general-purpose memory allocator- Can we speed it up?
Memory arenas
- An arena allocator is an object whose purpose is to handle the allocation and deallocation of other objects
- It can be faster than a general-purpose allocator because it can make assumptions about the objects it allocates, such as that they are all the same size, or that they will all be freed at the same time