Lecture 3
Segments
Review: What data is stored where? The addresses are taken from
executions of `./mexplore`.

| Object declaration | Lifetime | Segment | Example address range |
|---|---|---|---|
| Constant global | Static | Code (Text) | 0x400000 (≈1 × 2^22) |
| Global | Static | Data | 0x600000 (≈1.5 × 2^22) |
| Local | Automatic | Stack | 0x7fff448d0000 (≈2^25 × 2^22) |
| Anonymous, returned by `malloc` | Dynamic | Heap | 0x1a00000 (≈8 × 2^22) |
Constant global data and global data have the same lifetime, but are stored in different segments. The operating system uses different segments so it can prevent the program from modifying constants. It marks the code segment, which contains functions (instructions) and constant global data, as read-only, and any attempt to modify code-segment memory causes a crash (a “Segmentation violation”).
Automatic-lifetime and dynamic-lifetime data is allocated and freed as
the program runs. The compiler manages automatic-lifetime data using
the stack (we’ll see how soon); the application programmer manages
dynamic-lifetime data by calling `malloc` and `free`.
Static-lifetime data and compiled instructions exist for the whole program runtime. Before the program runs, the compiler writes the initial values for this data into object files, which the linker combines into an executable. The operating system loads those values into memory before it starts executing the program’s instructions. This means that an executable is normally at least as big as the static-lifetime data. There is an exception, however: a special, separate segment, the “bss” segment, is used to hold static-lifetime data with initial value zero. Such data is common; all static-lifetime data is initialized to zero unless otherwise specified in the program text. Rather than storing a bunch of zeros in the object files and executable, the compiler and linker simply track the location and size of all zero-initialized global data. The operating system sets this memory to zero during the program load process. Since clearing memory is faster than loading data from disk, this optimization saves both time (the program loads faster) and space (the executable is smaller).
Integer representation
A bit is the fundamental unit of digital information: it’s either 0 or 1.

C manages memory in units of bytes, which, on all modern machines,
contain 8 bits. (But not all machines!) C’s unit for a byte is `char`:
the abstract machine says a byte is stored in `char`. That means an
`unsigned char` holds values [0, 255] inclusive.

Other integer types can hold more values. On x86-64, an `unsigned short`
can hold values [0, 65535] inclusive. An `unsigned int` can hold values
[0, 4294967295] inclusive. And an `unsigned long` can hold values
[0, 18446744073709551615] inclusive.
The abstract machine doesn’t specify how large integers are stored in memory—it’s the compiler and hardware’s job to make that choice. But modern computers make one of two basic choices, called little endian and big endian. Here’s how it works.
- Write the large integer in hexadecimal format, including all leading
  zeros. For example, the `unsigned int` value 65534 would be written
  `0x0000FFFE`. There will be twice as many hexadecimal digits as
  `sizeof(TYPE)`.
- Both little-endian and big-endian modes divide the integer into its
  component bytes, which are its digits in base 256. In our example,
  they are, from most to least significant, 0x00, 0x00, 0xFF, and 0xFE.
- In little-endian representation, the bytes are stored in memory from
  least to most significant. If our example was stored at address 0x30,
  we would have:

  0x30: 0xFE   0x31: 0xFF   0x32: 0x00   0x33: 0x00

- In big-endian representation, the bytes are stored in memory from most
  to least significant, which is the reverse order.

  0x30: 0x00   0x31: 0x00   0x32: 0xFF   0x33: 0xFE
x86-64 uses little endian order. The Internet’s fundamental protocols, such as IP and TCP, use big endian order, which is therefore also called “network” byte order.
Alignment
Repeated executions of programs like `./mexplore` show that the C
compiler and library restrict the addresses at which some kinds of data
appear. In particular, the address of every `int` value is always a
multiple of 4, whether it’s located on the stack (automatic lifetime),
the data segment (static lifetime), or the heap (dynamic lifetime).
A bunch of observations will show you these rules:
| Type | Size | Address restriction |
|---|---|---|
| `char` (`signed char`, `unsigned char`) | 1 | No restriction |
| `short` (`unsigned short`) | 2 | Multiple of 2 |
| `int` (`unsigned int`) | 4 | Multiple of 4 |
| `long` (`unsigned long`) | 8 | Multiple of 8 |
| `float` | 4 | Multiple of 4 |
| `double` | 8 | Multiple of 8 |
| `T*` | 8 | Multiple of 8 |
These are the alignment restrictions for an x86-64 machine.
Alignment restrictions can make hardware simpler, and therefore faster. For instance, consider cache blocks, which we discussed in the first lecture. CPUs access memory through a transparent hardware cache. Data moves from primary memory, or RAM (which is large—a couple gigabytes on most laptops—and uses cheaper, slower technology) to the cache in units of 64 or 128 bytes. Those units are always aligned: on a machine with 128-byte cache blocks, the bytes with memory addresses [127, 128, 129, 130] live in two different cache blocks (with addresses [0, 127] and [128, 255]). But the 4 bytes with addresses [4n, 4n+1, 4n+2, 4n+3] always live in the same cache block. (This is true for any small power of two: the 8 bytes with addresses [8n,…,8n+7] always live in the same cache block.) In general, it’s often possible to make a system faster by leveraging restrictions—and here, the CPU hardware can load data faster when it can assume that the data lives in exactly one cache line.
The compiler, library, and operating system all work together to enforce alignment restrictions. In abstract machine terms, accessing data through an unaligned pointer causes undefined behavior. Note that this is undefined behavior even though alignment is optional on x86-64 hardware: the hardware is perfectly capable of dereferencing an unaligned pointer.
The compiler’s `__alignof__` special form returns a type or object’s
alignment. On x86-64 Linux, `__alignof__(T) == sizeof(T)` for all
fundamental types (the types built in to C: integer types, floating
point types, and pointers). But this isn’t always true; on x86-32
Linux, `double` has size 8 but alignment 4.
User-defined types: rules
The sizes and alignments for user-defined types—arrays, structs, and unions—are derived from a couple simple rules or principles. Here they are. The first rule applies to all types.
1. First-member rule. The address of the first member of a collection equals the address of the collection.
Thus, the address of an array is the same as the address of its first element. The address of a struct is the same as the address of the first member of the struct.
The next three rules depend on the class of user-defined type. Every C abstract machine enforces these rules.
2. Array rule. The address of the `i`th element of an array of type `T`
is `ADDRESSOF(array) + i * sizeof(T)`.
3. Struct rule. The second and subsequent members of a struct are laid out in order, with no overlap, subject to alignment constraints.
4. Union rule. All members of a union share the address of the union.
Every C abstract machine also enforces:

5. Malloc rule. Any non-NULL pointer returned by `malloc` has alignment
appropriate for any type. In other words, assuming the allocated size is
adequate, the pointer returned from `malloc` can safely be cast to `T*`
for any `T`.
Oddly, this holds even for small allocations. The C standard (the
abstract machine) requires that `malloc(1)` return a pointer whose
alignment is appropriate for any type, including types that don’t fit.

The malloc rule implies that every C implementation has a maximum
alignment. The GCC and clang compilers allow you to query that alignment
with this expression: `__alignof__(struct __attribute__((aligned)) {})`.
On x86-64 Linux, this returns 16. (This is the size and alignment of
`long double`, as well as some special types associated with “SIMD”
instructions that we won’t discuss in this course.) So every x86-64
Linux `malloc` implementation should always return pointers that are
16-byte aligned. The maximum alignment is a multiple of every other
alignment; on our machine (and most others), this works out simply
because every alignment is a power of two.
The last rule is not required by the abstract machine, but it’s how sizes and alignments on our machines work:
6. Minimum rule. The sizes and alignments of user-defined types, and the offsets of struct members, are minimized within the constraints of the other rules.
The minimum rule, and the sizes and alignments of basic types, are defined by the x86-64 Linux “ABI”—its Application Binary Interface. This specification standardizes how x86-64 Linux C compilers should behave, and lets users mix and match compilers without problems.
Consequences of the size and alignment rules
From these rules we can derive some interesting consequences.
For example, the offset of any struct member—which is the difference between the address of the member and the address of the containing struct—is a multiple of the member’s alignment.
To see why, consider a struct `s` with member `m` at offset `o`. The
malloc rule says that any pointer returned from `malloc` is correctly
aligned for `s`. Every pointer returned from `malloc` is maximally
aligned, equaling `16*x` for some integer `x`. The struct rule says
that the address of `m`, which is `16*x + o`, is correctly aligned.
That means that `16*x + o = __alignof__(m)*y` for some integer `y`.
Divide both sides by `a = __alignof__(m)` and you see that
`16*x/a + o/a = y`. But `16/a` is an integer (the maximum alignment is
a multiple of every alignment), so `16*x/a` is an integer. We can
conclude that `o/a` must also be an integer!
For another, the size of every type is a multiple of its alignment.

To see why, consider an array with two elements. By the array rule,
these elements have addresses `a` and `a + sizeof(T)`, where `a` is the
address of the array. Both of these addresses contain a `T`, so they
are both a multiple of `__alignof__(T)`. That means `sizeof(T)` is also
a multiple of `__alignof__(T)`.
Finally, we can also derive the necessity for padding at the end of structs. (How?)
Pointer representation
We distinguish pointers, which are concepts in the C abstract machine, from addresses, which are hardware concepts. A pointer combines an address and a type.
The memory representation of a pointer is the same as the memory representation of its address, so a pointer with address 0x1347810A is stored the same way as the integer with the same value.
The C abstract machine defines an unsigned integer type `uintptr_t`
that can hold any address. (You have to `#include <inttypes.h>` to get
the definition.) On most machines, including x86-64, `uintptr_t` is the
same as `unsigned long`. Casts between pointer types and `uintptr_t`
are information preserving, so this assertion will never fail:

```c
void* ptr = malloc(...);
uintptr_t addr = (uintptr_t) ptr;
void* ptr2 = (void*) addr;
assert(ptr == ptr2);
```
Since it is a 64-bit architecture, the size of an x86-64 address is 64 bits (8 bytes). That’s also the size of x86-64 pointers.
Pointer arithmetic
Since addresses are integers, we can clearly do all kinds of arithmetic on them, including useless arithmetic (multiplying two addresses together is unlikely to produce a useful result).
But the C language also lets us do arithmetic and comparisons on pointers. Here’s the rule:
Pointer arithmetic rule. In the C abstract machine, arithmetic on pointers produces the same result as arithmetic on the corresponding array indexes.
Specifically, consider an array `T a[n]` and a pointer `T* p = &a[i]`,
where `0 <= i <= n`. Then:

- For an integer `x` with `0 <= i + x <= n`, we have
  `p + x == &a[i + x]` and `x + p == &a[x + i]`.
- For an integer `x` with `0 <= i - x <= n`, we have
  `p - x == &a[i - x]`.
- For a pointer `T* q = &a[j]` with `0 <= j <= n`, we have
  `p - q == (ptrdiff_t) (i - j)` and `q - p == (ptrdiff_t) (j - i)`.
- For a pointer `T* q = &a[j]` with `0 <= j <= n`, we have `p < q` iff
  `i < j` (and similarly for `>`, `<=`, and `>=`).
(`ptrdiff_t` is a signed type large enough to hold the difference
between any two pointers. It’s defined by `#include <stddef.h>`; on
x86-64, it’s the same as `long`.)
The pointer arithmetic design is one of the basic building blocks of C,
and it has fundamental consequences for C’s expressivity. A library
design that takes advantage of pointer arithmetic makes it easy to
apply functions to subarrays as well as arrays. (Thanks to pointer
arithmetic and the properties of addition and subtraction, a pointer to
an array’s `i`th element behaves like the subarray starting at index
`i`.) It is therefore natural for C functions to take pointers where,
in other languages, you might expect to see arrays. The C++ language’s
iterator design takes the underlying abstract idea of pointer
arithmetic to an impressive and beautiful place, making it possible to
build very fast data types and generic algorithms.
But there are some gotchas having to do with undefined behavior.
- It is undefined behavior to form a pointer before the beginning of an
  array or after the end. That is, if `i + x < 0` or `i + x > n`, where
  `n` is the size of the underlying array, then `&p[i] + x` causes
  undefined behavior.
- It is undefined behavior to subtract or compare pointers that are not
  to the same array. (Exception: `==` and `!=` comparisons are OK.)

If you want to subtract or compare pointers that might not be to the
same array, do the subtraction or comparison using addresses by casting
the pointers to `uintptr_t` first.
The pointer arithmetic rule means that computations on pointers produce
different numeric results than computations on addresses. Consider
`m + x` where `x` is an integer. If `m` is an address type like
`uintptr_t`, then `m + x` is computed in modular arithmetic, using the
machine’s add instruction. But if `m` is a pointer type, then the
compiler must scale `x` by the size of the underlying type. For any
type `T`, we have:

```c
T* a = ...;
int i;
T* b = a + i; // assume defined behavior
assert((uintptr_t) b == (uintptr_t) a + i * sizeof(T));
assert(b - a == i);
assert((uintptr_t) b - (uintptr_t) a == i * sizeof(T));
```