Data representation 2: Sizes and layout

Learning more after lecture

Last time

We saw how some C++ objects are represented as bytes in computer memory. On an x86-64 machine, int objects take 4 bytes to represent. These bytes are stored in memory, which is an array of bytes, each of which has an integer address. An integer's bytes are stored at 4 consecutive addresses, with the lowest-valued place (the ones place) in the lowest-addressed byte, and the highest-valued place (the 2^24s place) in the highest-addressed byte. We also learned that pointers on x86-64 machines take 8 bytes to represent, and are stored similarly. We saw that dynamically-allocated memory and local variables are given qualitatively different kinds of address when a program runs: local variables have high addresses (like 0x7fff'ffff'f2da, near 2^47 = 0x8000'0000'0000), but dynamically-allocated memory has lower addresses (like 0x5020'0000'0010 or 0x21e'f2b0, depending on sanitizer and Docker settings). We saw that computer arithmetic can overflow, because 4 bytes (32 bits) can represent only 2^32 distinct values, not all of the infinitely many integers, and got a hint of how overflow can make life complicated for library designers. We saw that arrays of integers are laid out contiguously in memory, with no gaps. And we got our first hints of undefined behavior, when we used print_bytes to print more memory than our program was allowed to access, and when we caused integer overflow using signed arithmetic. Our programs are compiled by default with a sanitizer that can catch many such errors, but we can turn the sanitizer off (with make SAN=0) to live crazy. We learned what assertions are. Finally, we showed a representation of the machine code that the processor actually runs. Different C++ source code can compile to identical machine code, in which case the resulting functions behave identically; in fact, the processor can run machine code from anywhere, including images of Hello Kitty.
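The byte-level claims in this recap can be checked with assertions. The following is a minimal sketch, not code from lecture: the names int_bytes and check_int_layout are hypothetical, and the checks assume a little-endian machine with 4-byte int and 8-byte pointers, as on x86-64.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumptions of this sketch (true on x86-64, not guaranteed by C++ itself).
static_assert(sizeof(int) == 4, "sketch assumes 4-byte int");
static_assert(sizeof(void*) == 8, "sketch assumes 8-byte pointers");

// Copy the bytes that represent `x` into an array.
// Element 0 is the byte stored at the lowest address.
std::array<unsigned char, sizeof(int)> int_bytes(int x) {
    std::array<unsigned char, sizeof(int)> bytes;
    std::memcpy(bytes.data(), &x, sizeof(int));
    return bytes;
}

// Check the layout facts from the recap (hypothetical helper).
void check_int_layout() {
    // Little-endian: the ones place is in the lowest-addressed byte,
    // and the 2^24s place is in the highest-addressed byte.
    auto b = int_bytes(0x01020304);
    assert(b[0] == 0x04);
    assert(b[1] == 0x03);
    assert(b[2] == 0x02);
    assert(b[3] == 0x01);

    // Arrays are contiguous: neighboring elements are exactly
    // sizeof(int) bytes apart, with no gaps.
    int a[3] = {10, 20, 30};
    const unsigned char* p0 = reinterpret_cast<const unsigned char*>(&a[0]);
    const unsigned char* p1 = reinterpret_cast<const unsigned char*>(&a[1]);
    assert(static_cast<std::size_t>(p1 - p0) == sizeof(int));

    // Unsigned overflow is well defined and wraps modulo 2^32.
    // (Signed overflow, by contrast, is undefined behavior.)
    unsigned big = 0xFFFFFFFFu;
    assert(big + 1u == 0u);
}
```

Calling check_int_layout() from main should pass silently on an x86-64 machine; on a big-endian machine the byte-order assertions would fail, which is the point of stating the assumptions up front.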

This time

We investigate the lifetime and layout of C++ objects, including how that impacts performance.

Abstract machine

Wiggle room

Rules for memory layout

Disjoint objects and finite memory

Example programs

Results

Function local variables are stored at high addresses, in a region of memory called the stack. The portion of the stack allocated to a given function is called its stack frame. If two function executions in the same program thread are live at the same time, then they have disjoint stack frames, and the stack frame of the caller (the function that started executing first) has larger addresses than the stack frame of the callee. This is because stacks grow down. All space allocated to a function is reclaimed when the function returns. Since their space is reclaimed automatically, local objects are said to have automatic storage duration. Referring to an object with automatic storage duration after its function returns is undefined behavior: the program may crash, but it may also appear to work or silently produce garbage, and a sanitizer will often report the error. A function can return at most one object, and that object has a fixed size; if a function wants to return data of variable size, it must use dynamically allocated memory. The standard C++ library has many datatypes that use dynamic memory as part of their implementation.
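As an illustration of the last two points, here is a minimal sketch of a function that returns a variable amount of data through std::vector, a standard library type that keeps its elements in dynamically allocated memory. The names first_squares and check_first_squares are hypothetical, chosen for this example. The broken variant is left in a comment so that it is never compiled or run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the first `n` squares. The vector object itself is a local
// variable, but its elements live in dynamically allocated memory,
// so they survive the return when the vector is handed to the caller.
std::vector<int> first_squares(std::size_t n) {
    std::vector<int> squares;
    squares.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        squares.push_back(static_cast<int>(i * i));
    }
    return squares;
}

// BROKEN, for contrast (do not use): `squares` has automatic storage
// duration, so the returned pointer refers to reclaimed stack space,
// and dereferencing it is undefined behavior.
//
//     int* dangling_squares() {
//         int squares[4] = {0, 1, 4, 9};
//         return squares;
//     }

// Hypothetical helper: the caller chooses the size at run time.
void check_first_squares() {
    std::vector<int> v = first_squares(4);
    assert(v.size() == 4);
    assert(v[0] == 0 && v[1] == 1 && v[2] == 4 && v[3] == 9);
}
```

The caller never manages the dynamic memory directly: the vector frees its elements when it goes out of scope.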