Learning more after lecture
- Lecture notes (on the course site’s Lectures menu)
- Textbook readings (see the course site’s Textbook page)
- Lecture code at https://github.com/cs61/cs61-lectures/
Last time
We saw how some C++ objects are represented as bytes in computer
memory. On an x86-64 machine, int objects take 4 bytes to represent. These
bytes are stored in memory, which is an array of bytes, each of which has
an integer address. An integer's bytes are stored at 4 consecutive addresses,
with the lowest-valued place (the ones place) in the lowest-addressed byte,
and the highest-valued place (the 2^24s place) in the highest-addressed byte;
this byte order is called little-endian. We also learned that pointers on
x86-64 machines take 8 bytes to represent, and are stored similarly. We saw that
dynamically allocated memory and local variables are given qualitatively
different kinds of addresses when a program runs: local variables have high
addresses (like 0x7fff'ffff'f2da, near 2^47 = 0x8000'0000'0000), but
dynamically allocated memory has lower addresses (like 0x5020'0000'0010 or
0x21e'f2b0, depending on sanitizer and Docker settings). We saw that computer
arithmetic can overflow, because 4 bytes (32 bits) can represent only 2^32
distinct values, not all of the infinitely many integers, and got a hint of how
overflow can make life complicated for library designers. We saw that arrays of
integers are laid out contiguously in memory, with no gaps. And we got our first
hints of undefined behavior, when we used print_bytes to print more memory than
our program was allowed to access, and when we caused integer overflow using
signed arithmetic. Our programs are compiled by default with a sanitizer that
can catch many such errors, but we can turn the sanitizer off (with make SAN=0)
to live crazy. We learned what assertions are. Finally, we showed a
representation of the machine code that the processor actually runs. Different
C++ source code can compile to identical machine code, and functions with
identical machine code behave identically; in fact, the processor can run
machine code from anywhere, including images of Hello Kitty.
This time
We investigate the lifetime and layout of C++ objects, including how that impacts performance.
Abstract machine
- C++ programs are written for an abstract machine defined by the C++ technical standard
- The abstract machine says what C++ programs mean
- It also says which programs have no meaning!
- But the abstract machine doesn’t exist in the world
- A C++ compiler is a program that translates source code to machine
instructions that run on a processor
- The output of a correct compiler is a program that has the same observable effects as the abstract machine
- Example observable effect: bytes printed by printf
Wiggle room
- Sometimes the abstract machine is strict, sometimes loose
- Example strict requirement: sizeof(char) == 1
- Example strict requirement: After int x = 2;, the value of x is 2
- Example loose requirement: The numeric value of a pointer isn’t defined
- Example loose requirement: The as-if rule
- The compiler can transform its generated code however it wants, as long as the resulting program is no different in observable effect!
Rules for memory layout
- The memory representation of an object x comprises sizeof(x) contiguous bytes starting at the address &x
- All objects of the same type have the same size and memory layout
- Every access made by a running program must be to a live object, meaning an object within its lifetime (having storage that has been allocated and has not yet been released)
- Distinct objects that are live at the same time must occupy disjoint
addresses
- The compiler and the operating system work together to enforce this
Disjoint objects and finite memory
- How to implement the disjoint address requirement?
- One way to solve this: give every object a new address! *#&$!!!
- …
Example programs
datarep2/locals.cc
datarep2/functionlocals.cc
datarep2/strings.cc
datarep2/stringify.cc
datarep2/std-stringify.cc
Results
Function local variables are stored at high addresses, in a region of memory called the stack. The portion of the stack allocated to a given function call is called its stack frame. If two function executions in the same program thread are live at the same time, then they have disjoint stack frames, and the stack frame of the caller (the function that started executing first) has larger addresses than the stack frame of the callee. This is because stacks grow down, toward lower addresses, on x86-64.
All space allocated to a function's stack frame is reclaimed when the function returns. Since their space is reclaimed automatically, local objects are said to have automatic storage duration. Referring to an object with automatic storage duration after its function returns is undefined behavior: it might crash your program, but it might also appear to work while reading garbage. A function returns at most one object, and that object's type, and therefore its size, is fixed at compile time; so if a function wants to return data of variable size, it must use dynamically allocated memory. The standard C++ library has many datatypes, such as std::string and std::vector, that use dynamic memory as part of their implementation.