Live object rule
C and C++ grant programmers fine-grained, low-level control over memory,
including the ability to manufacture pointers to objects from integers. This
is powerful but dangerous. It lets us write hexdump
, for example, and
examine memory as a program runs; and it is also central to the “disastrously
central role that
[C] has played in our ongoing computer security
nightmare”.
Most memory errors boil down to violations of this requirement:
Live object rule. Every memory access made by a C++ program must be to a live object, meaning an object within its lifetime (having storage that has been allocated and has not yet been released).
Different kinds of object storage have different lifetimes. Lifetimes generally correlate to the segments in which objects are stored.
-
Static lifetime (global variables; code and data segments): These objects live for as long as the program.
-
Automatic lifetime (local variables; stack segment): These objects live for as long as their containing block. They are allocated at their point of declaration and released when the program exits the corresponding block.
-
Dynamic lifetime (dynamically-allocated variables; heap segment): These objects have user-defined lifetime. They are allocated by an allocation function (
malloc
ornew
) and released by a deallocation function (free
ordelete
).
Exercise L1. Does this program violate the live object rule?
int f(int x) { return x; } int main() { printf("%d\n", f(2)); }
Exercise L2. Does this program violate the live object rule?
int global = 0; int* f(int x) { return &global; } int main() { int* ptr = f(2); printf("%d\n", *ptr); }
Exercise L3. Does this program violate the live object rule?
int* f(int x) { return &x; } int main() { int* ptr = f(2); printf("%d\n", *ptr); }
Exercise L4. Does this program violate the live object rule?
int main() { int* ptr; int x = 2; ptr = &x; printf("%d\n", *ptr); }
Exercise L5. Does this program violate the live object rule?
int main() { int* ptr; { int x = 2; ptr = &x; } printf("%d\n", *ptr); }
Exercise L6. Does this program violate the live object rule?
const char* f(int x) { return "example"; } int main() { printf("%s\n", f(2)); }
Exercise L7. Does this program violate the live object rule?
const char* f(int x) { char buf[100] = "example"; return buf; } int main() { printf("%s\n", f(2)); }
Exercise L8. Does this program violate the live object rule?
const char* f(int x) { char buf[100] = "example"; return buf; } int main() { printf("%p\n", f(2)); }
Exercise L9. Does this program violate the live object rule?
const char* f(int x) { char* buf = new char[100]; strcpy(buf, "example"); return buf; } int main() { const char* s = f(2); printf("%s\n", s); delete[] s; }
Exercise L10. Does this program violate the live object rule?
const char* f(int x) { char* buf = new char[100]; strcpy(buf, "example"); return buf; } int main() { const char* s = f(2); delete[] s; printf("%s\n", s); }
Exercise L11. Does this program violate the live object rule?
const char* f(int) { char* buf = new char[strlen("example")]; strcpy(buf, "example"); return buf; } int main() { const char* s = f(2); printf("%s\n", s); delete[] s; }
Classifying memory errors
Memory errors are common enough in C and C++ that some frequently-occurring classes of error have specific names. The first eight of these errors are violations of the live object rule; the last two are violations of the dynamic allocation specification.
- Wild read: Reading memory that doesn’t belong to any object.
- Wild write: Writing memory that doesn’t belong to any object.
- Use-after-free: An access to an object that has already been destroyed. (Though typically used for dynamically-allocated objects, the term can apply to local variables with automatic lifetimes.)
- Null pointer dereference: A read or write access involving a null pointer.
- Out-of-bounds access: A read or write to an array that goes out of the
array’s bounds. (Remember that forming a pointer at the end of an
array is OK: in an array
T a[N]
, it is OK to form the pointer&a[N]
, but not to dereference it.) - Stack buffer overflow: A write out-of-bounds access to an array with automatic lifetime (a local variable).
- Boundary read: A read to memory after the end of an object.
- Boundary write: A write to memory after the end of an object.
- Invalid free: Passing a pointer to
free
(ordelete
) that was not allocated bymalloc
(ornew
). - Double free: Freeing dynamically-allocated memory that has already been freed.
Note. Although “wild read” and “wild write” could be used for any live-object-rule violation, the intended connotation is generally access to a crazy area of memory unassociated with any known object.
Exercise C1. Which of the L* exercises above contain use-after-free errors?
Exercise C2. Which of the L* exercises contain out-of-bounds accesses?
Now consider this function.
const char* hulk_players[] = {
"RUFFALO",
"BANA",
"FERRIGNO",
"MIYAUCHI"
};
void hulk_greeting(const char* name) {
char uc_name[16];
// make uppercase version of `name`
for (size_t i = 0; name[i] != 0; ++i) {
uc_name[i] = toupper(name[i]);
}
uc_name[strlen(name)] = 0;
printf("HULK SAY HELLO TO %s\n", uc_name);
// find player
for (size_t i = 0; hulk_players[i] != nullptr; ++i) {
if (strcmp(hulk_players[i], uc_name) == 0) {
printf("YOU GOOD ACTOR\n");
return;
}
}
printf("YOU NO PLAY HULK! HULK SMASH\n");
}
Exercise C3. What’s an argument to
hulk_greeting
that would cause no memory errors?
Exercise C4. What’s an argument to
hulk_greeting
that would cause a stack buffer overflow?
Exercise C5. What’s an argument to
hulk_greeting
that would cause a out-of-bounds read?
Exercise C6. What’s an argument to
hulk_greeting
that would cause a null pointer dereference?
Exercise C7. What’s an argument to
hulk_greeting
that would cause a wild read?
Exercise C8. What’s an argument to
hulk_greeting
that would cause a use after free error?
It’s worth noting that many C interfaces for I/O are vulnerable to stack
buffer overflows, and these
errors have caused critical software vulnerabilities over the years. For
instance, the gets
standard library function, which reads a line from
standard input but offers no way to limit the number of characters read, is so
dangerous that many operating systems now prohibit its use. (The CWE (“Common
Weakness Enumeration”) database has some illustrative examples of buffer
overflows.)
Memory bugs
Exercises M1–M8. The
cs61-sections/s02
directory has several files namedmembug?.cc
. What memory bugs are present in each file, if any? Before running, predict the effect of the memory bug; then run the program to observe its effects.
membug1.cc
:membug2.cc
:membug3.cc
:membug4.cc
:membug5.cc
:membug6.cc
:membug7.cc
:membug8.cc
:
Addresses and performance (advanced material)
In the third lecture, we ran an accessor
program that
accessed a large array of integers in three orders, up, down, and random. We
saw results like this:
./accessor -u |
up order | 0.69 sec |
./accessor -d |
down order | 0.69 sec |
./accessor -r |
random order | 1.11 sec |
(The difference can be even larger; run ./accessor
yourself to find out.) We
said that this difference could be explained by the access pattern: computers
are faster at accessing data that is located close together, sequentially,
than at accessing random data.
A small change to the benchmark can cause even weirder results. The
s02/inserter.cc
program inserts 50,000 integers, either in sequential “up”
order (1, 2, 3, 4, …), reverse-sequential “down” order (49999, 49998, 49997,
…), or random order, into a linked list, keeping the list sorted at all times.
The critical loop that finds the correct insertion position looks like this:
auto it = ls.begin();
while (it != ls.end() && *it < idx) {
++it;
}
ls.insert(it, idx);
Exercise A1. Draw pictures demonstrating how the linked list grows for each insertion order. For example, here are the first steps in growing the list in up order:
start: list empty +----+ | ls | +----+ after inserting 0 +----+ +--------+ | ls |------->| node 0 | +----+ +--------+ after inserting 1 +----+ +--------+ +--------+ | ls |------->| node 0 |------->| node 1 | +----+ +--------+ +--------+ after inserting 2 +----+ +--------+ +--------+ +--------+ | ls |------->| node 0 |------->| node 1 |------->| node 2 | +----+ +--------+ +--------+ +--------+
Exercise A2. How many iterator operations will be made in the course of the benchmark for each insertion order? Which insertion order will cause the most computation, and which the least? (Consider checking your work by modifying
inserter.cc
to track operations.)
Exercise A3. Run
./inserter
with the-u
,-d
, and-r
arguments. Which order is fastest and which is slowest? Any idea why?
Exercise A4. Modify
s02/inserter.cc
to print out the addresses of (the first 100) integers in the list after it is created. Run experiments with the three flags (-u
,-d
,-r
). Does this help explain why-r
is slower than-d
at largeN
, even though-r
makes fewer traversal steps?