C and C++ grant programmers fine-grained, low-level control over memory,
including the ability to manufacture pointers to objects from integers. This
is powerful but dangerous. It lets us write hexdump
, for example, and
examine memory as a program runs. It also lets programmers make mistakes, and
those mistakes are central to the “disastrously central role that
[C] has played in our ongoing computer security
nightmare”. This section is about
recognizing those mistakes.
Preread: Live object rule & Kinds of memory error
Attendance at section is required and recorded for all college students. All college students should fill out this attendance form when attending section. For more on the attendance policy and what to do if you miss section refer to the syllabus here and here.
Attendance is not recorded for Extension students.
Live object rule
Most memory errors boil down to violations of this requirement:
Live object rule. Every memory access made by a C++ program must be to a live object, meaning an object within its lifetime (having storage that has been allocated and has not yet been released).
Different kinds of object storage have different lifetimes. Lifetimes generally correlate to the segments in which objects are stored.
-
Static lifetime (global variables in code and data segments): These objects live for as long as the program.
-
Automatic lifetime (local variables in the stack segment): These objects live for as long as their containing block. They are allocated at their point of declaration and released when the program exits the corresponding block.
-
Dynamic lifetime (dynamically-allocated variables in the heap segment): These objects have user-defined lifetime. They are allocated by an allocation function (
malloc
ornew
) and released by a deallocation function (free
ordelete
).
This program violates the live object rule four times: once for an automatic-lifetime object, once for a dynamic-lifetime object, once for a static-lifetime object, and once for no object at all.
int global[10];
int main() {
int* ptr;
{
int local;
ptr = &local;
}
*ptr = 2; // VIOLATION: `local`’s lifetime has ended
ptr = new int;
delete ptr;
*ptr = 2; // VIOLATION: the dynamically allocated int’s lifetime has ended
ptr = &global[10];
*ptr = 2; // VIOLATION: no object at `global[10]`, past bounds of array
ptr = (int*) 0x90127592174982;
*ptr = 2; // almost certainly VIOLATION: points to no object
}
Kinds of memory error
Memory errors are common enough in C and C++ that some frequently-occurring classes of error have specific names.
-
Invalid access: A read or write access that doesn’t access an object. Many kinds of invalid access have their own names, including:
- Null pointer dereference: A read or write access to a null pointer. (The null pointer never points to a live object.)
- Boundary read/write error: A read or write access immediately past the beginning or end of an object.
- Out-of-bounds access: A read or write access past the boundaries of an array.
- Stack buffer overflow: A write access past the end of an array with automatic lifetime (a local variable).
- Unaligned access: A read or write access to a pointer that’s not properly aligned for its type. (Every live object has a properly-aligned address.)
- Wild read/write: A read or write access to a pointer that’s nowhere near a live object.
- Aliasing violation: An access using a pointer that doesn’t correspond to the object type stored at that address (for instance, accessing a
std::string
using anint*
pointer).
-
Use-after-free: A read or write access to an object after its lifetime has ended. (This term is commonly applied when a program accesses a dynamically-allocated object after it was explicitly freed, but it is also possible to access a local object after it’s deallocated.)
-
Invalid free: Passing a pointer to
free
(ordelete
) that was not allocated bymalloc
(ornew
). -
Double free: Freeing dynamically-allocated memory that has already been freed.
-
Modify const: A write access to an object that is read-only (was originally declared
const
).
Invalid accesses and use-after-free errors violate the live object rule. Invalid and double frees violate the rules for dynamic memory allocation, but don’t violate the live object rules in and of themselves.
Sometimes a program that causes a memory error will crash right away; for
instance, dereferencing a null pointer almost always crashes the program. This
is good! We want errors to crash the program—a reliable crash is far
easier to debug. Sometimes, though, the consequences of an error are more
insidious. An error might cause no problem on some OSes, and cause a huge
security hole in others. Sanitizers (make SAN=1
) will catch many errors, but
not all.
Memory safe languages (offline)
Many other programming languages are memory-safe, meaning that in those languages, it is impossible to write a program that violates the live object rule. The key technology that makes memory safety possible is automatic garbage collection. In an automatically-garbage-collected language like Java, Python, or Javascript, every object is dynamically allocated, but the programmer never needs to call an explicit deallocation function. Instead, the language runtime library will periodically search the program’s memory space for objects that can no longer be referenced, and free any it finds.
Automatic garbage collection is wonderful, but heap-allocating every object can be expensive, so many languages use compiler analysis to determine which heap-allocated objects can actually be placed on the stack.
Simple examples
In the next parts of section, we’ll ask you to analyze a bunch of programs—perhaps in groups—and then gather back together and discuss what you’ve learned. For each program, please answer:
- Does the program contain a memory error? Does it violate the live object rule?
- What kinds of memory errors does it contain, if any?
Code for the examples is in the cs61-sections/datareps2
directory, but try
to find an answer on your own before running the compiler.
Exercise S1. Are there memory errors, and if so, which ones?
int f(int x) { return x; } int main() { printf("%d\n", f(2)); }
Exercise S2. Are there memory errors, and if so, which ones?
int global = 0; int* f(int x) { return &global; } int main() { int* ptr = f(2); printf("%d\n", *ptr); }
Exercise S3. Are there memory errors, and if so, which ones?
int* f(int x) { return &x; } int main() { int* ptr = f(2); printf("%d\n", *ptr); }
Exercise S4. Are there memory errors, and if so, which ones?
int main() { int* ptr; int x = 2; ptr = &x; printf("%d\n", *ptr); }
Exercise S5. Are there memory errors, and if so, which ones?
int main() { int* ptr; { int x = 2; ptr = &x; } printf("%d\n", *ptr); }
Buffer examples
These examples turn to programs that involve buffers, which are character arrays (including strings). Programs that read input do so into one or more buffers.
It’s worth noting that many C interfaces for I/O are vulnerable to stack buffer overflows, and these errors have caused critical software vulnerabilities over the years. For instance, the
gets
standard library function, which reads a line from standard input but offers no way to limit the number of characters read, is so dangerous that many operating systems now prohibit its use. The CWE (“Common Weakness Enumeration”) database has some illustrative examples of buffer overflows.
Exercise B1. Are there memory errors, and if so, which ones?
const char* f(int x) { return "example"; } int main() { printf("%s\n", f(2)); }
Exercise B2. Are there memory errors, and if so, which ones?
const char* f(int x) { char buf[100] = "example"; return buf; } int main() { printf("%s\n", f(2)); }
Exercise B3. Are there memory errors, and if so, which ones?
const char* f(int x) { char buf[100] = "example"; return buf; } int main() { printf("%p\n", f(2)); }
Exercise B4. Are there memory errors, and if so, which ones?
const char* f(int x) { char* buf = new char[100]; strcpy(buf, "example"); return buf; } int main() { const char* s = f(2); printf("%s\n", s); delete[] s; }
Exercise B5. Are there memory errors, and if so, which ones?
const char* f(int x) { char* buf = new char[100]; strcpy(buf, "example"); return buf; } int main() { const char* s = f(2); delete[] s; printf("%s\n", s); }
Exercise B6. Are there memory errors, and if so, which ones?
const char* f(int) { char* buf = new char[strlen("example")]; strcpy(buf, "example"); return buf; } int main() { const char* s = f(2); printf("%s\n", s); delete[] s; }
Hulk buffers (optional exercises)
Consider this function.
const char* hulk_players[] = {
"RUFFALO",
"BANA",
"FERRIGNO",
"MIYAUCHI"
};
void hulk_greeting(const char* name) {
char uc_name[16];
// make uppercase version of `name`
for (size_t i = 0; name[i] != 0; ++i) {
uc_name[i] = toupper(name[i]);
}
uc_name[strlen(name)] = 0;
printf("HULK SAY HELLO TO %s\n", uc_name);
// find player
for (size_t i = 0; hulk_players[i] != nullptr; ++i) {
if (strcmp(hulk_players[i], uc_name) == 0) {
printf("YOU GOOD ACTOR\n");
return;
}
}
printf("YOU NO PLAY HULK! HULK SMASH\n");
}
Exercise H1. What’s an argument to
hulk_greeting
that would cause no memory errors?
Exercise H2. What’s an argument to
hulk_greeting
that would cause a stack buffer overflow?
Exercise H3. What’s an argument to
hulk_greeting
that would cause an out-of-bounds read?
Exercise H4. What’s an argument to
hulk_greeting
that would cause a null pointer dereference?
Exercise H5. What’s an argument to
hulk_greeting
that would cause a wild read?
Exercise H6. What’s an argument to
hulk_greeting
that would cause a use after free error?
Linked list examples
Now let’s turn our mind to something a little tougher: linked lists. Linked lists are sometimes considered simple data structures, but manipulating them is actually quite tricky, and if you’re not careful your linked list code may encounter memory errors.
Our linked list, which is doubly-linked, uses these declarations:
struct node {
int value;
node* prev;
node* next;
};
extern node* head;
For each node:
- The
next
pointer points to the next node in the list, except that it isnullptr
for the last element (the tail or back element). - The
prev
pointer points to the previous node in the list, except that it isnullptr
for the first element (the head or front element).
Here’s an example function using this structure; it returns a pointer to the
list node with value v
, or nullptr
if there’s no such node.
node* find(int v) {
node* trav = head;
while (trav && trav->value != v) {
trav = trav->next;
}
return trav;
}
Exercise LL1. A data invariant is a statement about an object or data structure that is always true for any correct program. A function that modifies the data structure might break the invariant temporarily, but it should always restore the invariant before returning.
One data invariant for any list program using
ll.hh
would be:
- Either
head == nullptr
orhead
points to a livenode
object.What other data invariants should hold for list programs? Discuss! Ideally, you would enumerate a complete set of invariants, so that a program that preserves all the data invariants would never encounter a memory error involving lists.
Your invariants may use this definition: A
node
objectn
is a member of the list if and only ifn
is accessible by traversingnext
pointers starting athead
.We believe this set of invariants is complete:
- If
head != nullptr
, thenhead->prev == nullptr
.- If
n
is a member of the list, andn->prev != nullptr
, thenn->prev
is a live object and a member of the list.- If
n
is a member of the list, andn->prev != nullptr
, thenn->prev->next == n
.- If
n
is a member of the list, andn->next != nullptr
, thenn->next
is a live object and a member of the list.- If
n
is a member of the list, andn->next != nullptr
, thenn->next->prev == n
.
Given a set of data invariants, you can write functions whose only purpose is to check them, without modifying the relevant data structures or doing anything else. Such representation checker functions can be invaluable help in tracking down bugs in your code. For instance, if you’re having trouble with a data structure, try calling the representation checker at the end of every function that modifies the data structure. This will provide early warning as soon as the data structure goes wrong!
Offline Exercise. Write a representation checker function for the list.
It’s difficult to check the invariants about whether members of the list are live objects, but luckily many sanitizers will automatically check liveness as we traverse the list. Sanitizers don’t understand the list-specific invariants about
next
andprev
pointers, though, so we must check them ourselves.void check_list() { node* prev = nullptr; node* trav = head; while (trav) { assert(trav->prev == prev); prev = trav; trav = trav->next; } }
Exercise LL2. Write a function
void append(node* n)
that addsn
to the end of the list whose head ishead
. In addition to addingn
to the list, your function should setn->next
andn->prev
to the proper values. You may assume thatn
is a livenode
and not already a member of the list.(We’ve posted reference linked-list code, but you’ll get more out of this exercise by working on the code yourself.) You can write and test your code in
ll2.cc
.Here’s one correct version:
void append(node* n) { node* prev = nullptr; node* trav = head; while (trav) { prev = trav; trav = trav->next; } n->prev = prev; n->next = nullptr; if (prev) { prev->next = n; } else { head = n; } }
Hilariously our first version of this code was wrong. 🙄
Exercise LL3. Now swap your
append
function with a neighbor or friend and try to find memory bugs! Does theirappend
function contain memory errors? If so, which ones?
Exercise LL4. Write a function
void erase(node* n)
that removesn
from the list. You may assume thatn
is a live object and a member of the list. Whenerase
returns,n
should still be a live object, but no longer a member of the list. You can write and test your code inll4.cc
.Here’s one correct version:
void erase(node* n) { if (n->next) { n->next->prev = n->prev; } if (n->prev) { n->prev->next = n->next; } else { head = n->next; } }
Exercise LL5. Now swap your
erase
function with a neighbor or friend and try to find memory bugs!
Exercise LL6. Can this linked list function violate any linked-list invariants or cause memory bugs (either immediately, or in a later call to
erase
orinsert
orfind
)? If so, how?// Insert node `n` into the list immediately before node `before`. // If `before == nullptr`, append `n` to the list. void insert(node* before, node* n) { if (!before) { node* prev = head; while (prev->next != nullptr) { prev = prev->next; } n->prev = prev; n->next = nullptr; prev->next = n; } else { n->prev = before->prev; n->next = before; before->prev = n; } }
This code has a number of problems!
If the list is empty, then line 10 will cause a null pointer dereference.
If
before == head
, then this code should maken
the front of the list. But lines 17–19 do not updatehead
!If
before != nullptr && before != head
, then this code should update bothbefore->prev->next
andbefore->prev
in lines 17–19, but it doesn’t updatebefore->prev->next
.Error #1 will cause a memory error right away. Errors #2 and #3 will not, but they are still evil! Breaking a data structure leaves it in a wobbly, dangerous state, like a bridge with a crack in a stanchion; later operations can compound this problem, worsening the “crack” until a memory error occurs (and the bridge falls down). For instance, consider this sequence of operations using the provided
insert
and our suggestederase
. Theinsert
operation triggers error #3; the memory error happens later.0. initial state head == N1 N2 prev: nullptr prev: N1 next: N2 next: nullptr 1. insert(N2, N3); head == N1 N3 N2 prev: nullptr prev: N1 prev: N3 next: N2 XXX next: N2 next: nullptr 2. erase(N1); N1 N3 head == N2 [not on] prev: N1 prev: nullptr [list] next: N2 next: nullptr 3. // N1 can be freed now that it’s been removed from the list delete N1; N3 head == N2 prev: N1 !!! prev: nullptr next: N2 next: nullptr 4. erase(N3); // This will try to modify N3->prev->next and cause a use-after-free error!
The list invariants hold in the initial state, state 0, but they fail starting after
insert
in step 1. (N1->next->prev != N1
andN3->prev->next != N3
.) This error gets worse with more operations until a memory error happens in step 4. A good sanitizer might detect that memory error, but notice how far step 4 is from theinsert
bug that caused the problem! If you implemented a representation checker and checked the list representation at the end ofinsert
, that would detect the error basically right after it happened, helping you localize the problem and therefore debug faster.