Overview
Last time, we examined the way C++ compilers assign sizes to objects, and the way they lay out compound objects in memory. This time we explore struct alignment in more depth and introduce pointer arithmetic and segments (general areas of memory).
Objects in memory
- Every object occupies a contiguous range of memory
- Objects that might exist at the same time occupy disjoint ranges of memory
- Live objects never overlap
Alignments and compound types
- Hardware (and the compiler) impose alignment restrictions on primitive types
- Whenever an object of type
T
is stored in memory, its address must be a multiple ofalignof(T)
- Standard says alignments must be powers of two
- Therefore every alignment is an integral multiple of all smaller alignments
- Whenever an object of type
- The members of a compound object must obey alignment restrictions
- This imposes alignment requirements on compound types
- Array:
alignof(T[N])
=alignof(T)
- Struct:
alignof(struct { T1 m1; …; TN mN; })
= maxI
alignof(TI)
- Array:
- For structs, alignment can require adding padding between some members
- You can change a struct’s size by reordering its members!
Struct alignment practice
struct s {
char a;
T1 b;
T2 c;
T3 d;
};
T1
,T2
, andT3
are some permutation ofunsigned char
,int
, andshort
.- Can you make
s
have size 8? - Can you make
s
have alignment 8?
Struct rule, mathematically
Given
struct { T1 m1; T2 m2; … TN mN; } s
:alignof(s)
= maxI
alignof(TI)
A helper function,
offsetof
, computes the offset of each member relative to the address of the struct. The members are laid out in order, starting at the beginning of the struct, with minimal padding added between members to respect member alignments.offsetof(s, m1)
= 0ForI>1
,offsetof(s, mI)
=
×⎡
⎢
⎢offsetof(s, mI-1)
+sizeof(TI-1)
alignof(TI)
⎤
⎥
⎥alignof(TI)
(The expression \lceil X / Y \rceil \times Y rounds up X to the nearest multiple of Y.) Note
offsetof(s, mI)
is always a multiple ofalignof(TI)
, as we require.Finally, we can compute the size of the whole structure:
sizeof(s)
=⎡
⎢
⎢offsetof(s, mN)
+sizeof(TN)
alignof(s)
⎤
⎥
⎥alignof(s)
Alignment and malloc
- Malloc rule: Any non-null pointer returned by
malloc
and friends has the maximum alignment of any type (alignof(std::max_align_t)
)- On x86-64, this is 16
- C++
new
need not follow this rule;new T
returns a pointer aligned forT
- Standard vs. ABI
- All C++ implementations on all machines lay out struct members sequentially (standard requirement)
- Padding minimization is not required by the standard, but is required on x86-64 Linux (application-binary interface)
Pointer representation and address representation
- On our machine, the representation of a pointer equals the representation of the address that holds the corresponding object
- Can cast pointers to and from address type
- The type of an address is
uintptr_t
- The type of an address is
- If two objects have the same address, can cast one type of pointer to the other
- This can happen with struct members and other aliases
Pointer arithmetic
- Pointers and arrays are conceptually linked
- We can do arithmetic on pointers into an array
- The results are the same as doing arithmetic on array indexes
- This idea generalized gave rise to C++ iterators!
Pointer comparisons
T a[N];
int i = ..., j = ...;
T* p1 = &a[i];
T* p2 = &a[j];
p1 == p2 ⇔ i == j
p1 != p2 ⇔ i != j
p1 < p2 ⇔ i < j
p1 > p2 ⇔ i > j
p1 <= p2 ⇔ i <= j
p1 >= p2 ⇔ i >= j
Pointer addition and subtraction
T a[N];
int i = ..., j = ..., k = ...;
T* p1 = &a[i];
T* p2 = &a[j];
p1 + k == &a[i] + k == &a[i + k]
p1 - k == &a[i] - k == &a[i - k]
p1 - p2 == &a[i] - &a[j] == i - j // type of `p1 - p2` is `ptrdiff_t` == `long`
p1 += k == p1 = p1 + k == p1 = &a[i + k]
++p1 == p1 = p1 + 1 == p1 = &a[i + 1]
Valid indexes
- For an array of size
N
:- Can form pointers
&a[0]
…&a[N]
- Can dereference pointers
&a[0]
…&a[N-1]
- Can form pointers
<
,>
,<=
,>=
,-
are valid with pointers into the same array- Any single (non-array) object can be treated as an array of size 1
Pointer arithmetic vs. address arithmetic
- Given
int a[2]; int* p1 = &a[0]; int* p2 = &a[1];
- The difference between the address values of
p1
andp2
is 4- Because the objects
a[0]
anda[1]
are 4 bytes big, objects cannot overlap, and arrays are laid out sequentially
- Because the objects
- If we convert each pointer to an address and subtract
- Equals the size of the element times the difference in indexes
- If we subtract the pointers
- Answer defined to equal difference in indexes
- So
addrdiff == ptrdiff * sizeof(T)
- And
addrdiff >= ptrdiff
- And
Union types
int main() {
union {
int a;
int b;
char c;
char d;
} u;
u.a = 61;
hexdump_object(u);
}
- Union types define objects that might have one of several different underlying representations
- One active member at a time, defined by most recent assignment
- Useful for special cases
- Given
union { T1 m1; T2 m2; … TN mN; } u
:
u.mI
) = addressof(u
)sizeof(u)
= maxI
sizeof(TI)
alignof(u)
= maxI
alignof(TI)
Segments
- We’ve now seen many objects with many different addresses.
- Can we classify those into groups? Are there any patterns?
- Yes! Groups of addresses are called segments.
Segment explorer
#include <cstdio>
#include "hexdump.hh"
int i1 = 61;
const int i2 = 62;
int main() {
int i3 = 63;
int* i4 = new int{64};
hexdump_named_object(i1);
hexdump_named_object(i2);
hexdump_named_object(i3);
hexdump_named_object(i4);
hexdump_named_object(*i4);
}
Segment names
i1
, a modifiable global object, is in the data segmenti2
, a non-modifiable global object, is in the code/text/readonly segmenti3
andi4
, local variable objects, are in the stack segment*i4
, a dynamically-allocated object, is in the heap segment