Section 2: Memory bugs

Live object rule

C and C++ grant programmers fine-grained, low-level control over memory, including the ability to manufacture pointers to objects from integers. This is powerful but dangerous. It lets us write hexdump, for example, and examine memory as a program runs; and it is also central to the “disastrously central role that [C] has played in our ongoing computer security nightmare”.

Most memory errors boil down to violations of this requirement:

Live object rule. Every memory access made by a C++ program must be to a live object, meaning an object within its lifetime (having storage that has been allocated and has not yet been released).

Different kinds of object storage have different lifetimes. Lifetimes generally correlate to the segments in which objects are stored.

Exercise L1. Does this program violate the live object rule?

int f(int x) {
    return x;
}

int main() {
    printf("%d\n", f(2));
}

Exercise L2. Does this program violate the live object rule?

int global = 0;

int* f(int x) {
    return &global;
}

int main() {
    int* ptr = f(2);
    printf("%d\n", *ptr);
}

Exercise L3. Does this program violate the live object rule?

int* f(int x) {
    return &x;
}

int main() {
    int* ptr = f(2);
    printf("%d\n", *ptr);
}

Exercise L4. Does this program violate the live object rule?

int main() {
    int* ptr;
    int x = 2;
    ptr = &x;
    printf("%d\n", *ptr);
}

Exercise L5. Does this program violate the live object rule?

int main() {
    int* ptr;
    {
        int x = 2;
        ptr = &x;
    }
    printf("%d\n", *ptr);
}

Exercise L6. Does this program violate the live object rule?

const char* f(int x) {
    return "example";
}

int main() {
    printf("%s\n", f(2));
}

Exercise L7. Does this program violate the live object rule?

const char* f(int x) {
    char buf[100] = "example";
    return buf;
}

int main() {
    printf("%s\n", f(2));
}

Exercise L8. Does this program violate the live object rule?

const char* f(int x) {
    char buf[100] = "example";
    return buf;
}

int main() {
    printf("%p\n", f(2));
}

Exercise L9. Does this program violate the live object rule?

const char* f(int x) {
    char* buf = new char[100];
    strcpy(buf, "example");
    return buf;
}

int main() {
    const char* s = f(2);
    printf("%s\n", s);
    delete[] s;
}

Exercise L10. Does this program violate the live object rule?

const char* f(int x) {
    char* buf = new char[100];
    strcpy(buf, "example");
    return buf;
}

int main() {
    const char* s = f(2);
    delete[] s;
    printf("%s\n", s);
}

Exercise L11. Does this program violate the live object rule?

const char* f(int) {
    char* buf = new char[strlen("example")];
    strcpy(buf, "example");
    return buf;
}

int main() {
    const char* s = f(2);
    printf("%s\n", s);
    delete[] s;
}

Classifying memory errors

Memory errors are common enough in C and C++ that some frequently-occurring classes of error have specific names. The first eight of these errors are violations of the live object rule; the last two are violations of the dynamic allocation specification.

  1. Wild read: Reading memory that doesn’t belong to any object.
  2. Wild write: Writing memory that doesn’t belong to any object.
  3. Use-after-free: An access to an object that has already been destroyed. (Though typically used for dynamically-allocated objects, the term can apply to local variables with automatic lifetimes.)
  4. Null pointer dereference: A read or write access involving a null pointer.
  5. Out-of-bounds access: A read or write to an array that goes out of the array’s bounds. (Remember that forming a pointer at the end of an array is OK: in an array T a[N], it is OK to form the pointer &a[N], but not to dereference it.)
  6. Stack buffer overflow: A write out-of-bounds access to an array with automatic lifetime (a local variable).
  7. Boundary read: A read to memory after the end of an object.
  8. Boundary write: A write to memory after the end of an object.
  9. Invalid free: Passing a pointer to free (or delete) that was not allocated by malloc (or new).
  10. Double free: Freeing dynamically-allocated memory that has already been freed.

Note. Although “wild read” and “wild write” could be used for any live-object-rule violation, the intended connotation is generally access to a crazy area of memory unassociated with any known object.

Exercise C1. Which of the L* exercises above contain use-after-free errors?

Exercise C2. Which of the L* exercises contain out-of-bounds accesses?

Now consider this function.

const char* hulk_players[] = {
    "RUFFALO",
    "BANA",
    "FERRIGNO",
    "MIYAUCHI"
};

void hulk_greeting(const char* name) {
    char uc_name[16];

    // make uppercase version of `name`
    for (size_t i = 0; name[i] != 0; ++i) {
        uc_name[i] = toupper(name[i]);
    }
    uc_name[strlen(name)] = 0;
    printf("HULK SAY HELLO TO %s\n", uc_name);

    // find player
    for (size_t i = 0; hulk_players[i] != nullptr; ++i) {
        if (strcmp(hulk_players[i], uc_name) == 0) {
            printf("YOU GOOD ACTOR\n");
            return;
        }
    }
    printf("YOU NO PLAY HULK! HULK SMASH\n");
}

Exercise C3. What’s an argument to hulk_greeting that would cause no memory errors?

Exercise C4. What’s an argument to hulk_greeting that would cause a stack buffer overflow?

Exercise C5. What’s an argument to hulk_greeting that would cause a out-of-bounds read?

Exercise C6. What’s an argument to hulk_greeting that would cause a null pointer dereference?

Exercise C7. What’s an argument to hulk_greeting that would cause a wild read?

Exercise C8. What’s an argument to hulk_greeting that would cause a use after free error?

It’s worth noting that many C interfaces for I/O are vulnerable to stack buffer overflows, and these errors have caused critical software vulnerabilities over the years. For instance, the gets standard library function, which reads a line from standard input but offers no way to limit the number of characters read, is so dangerous that many operating systems now prohibit its use. (The CWE (“Common Weakness Enumeration”) database has some illustrative examples of buffer overflows.)

Memory bugs

Exercises M1–M8. The cs61-sections/s02 directory has several files named membug?.cc. What memory bugs are present in each file, if any? Before running, predict the effect of the memory bug; then run the program to observe its effects.

  • membug1.cc:
  • membug2.cc:
  • membug3.cc:
  • membug4.cc:
  • membug5.cc:
  • membug6.cc:
  • membug7.cc:
  • membug8.cc:

Addresses and performance (advanced material)

In the third lecture, we ran an accessor program that accessed a large array of integers in three orders, up, down, and random. We saw results like this:

./accessor -u up order 0.69 sec
./accessor -d down order 0.69 sec
./accessor -r random order 1.11 sec

(The difference can be even larger; run ./accessor yourself to find out.) We said that this difference could be explained by the access pattern: computers are faster at accessing data that is located close together, sequentially, than at accessing random data.

A small change to the benchmark can cause even weirder results. The s02/inserter.cc program inserts 50,000 integers, either in sequential “up” order (1, 2, 3, 4, …), reverse-sequential “down” order (49999, 49998, 49997, …), or random order, into a linked list, keeping the list sorted at all times. The critical loop that finds the correct insertion position looks like this:

auto it = ls.begin();
while (it != ls.end() && *it < idx) {
    ++it;
}
ls.insert(it, idx);

Exercise A1. Draw pictures demonstrating how the linked list grows for each insertion order. For example, here are the first steps in growing the list in up order:

start: list empty
+----+
| ls |
+----+

after inserting 0
+----+        +--------+
| ls |------->| node 0 |
+----+        +--------+

after inserting 1
+----+        +--------+        +--------+
| ls |------->| node 0 |------->| node 1 |
+----+        +--------+        +--------+

after inserting 2
+----+        +--------+        +--------+        +--------+
| ls |------->| node 0 |------->| node 1 |------->| node 2 |
+----+        +--------+        +--------+        +--------+

Exercise A2. How many iterator operations will be made in the course of the benchmark for each insertion order? Which insertion order will cause the most computation, and which the least? (Consider checking your work by modifying inserter.cc to track operations.)

Exercise A3. Run ./inserter with the -u, -d, and -r arguments. Which order is fastest and which is slowest? Any idea why?

Exercise A4. Modify s02/inserter.cc to print out the addresses of (the first 100) integers in the list after it is created. Run experiments with the three flags (-u, -d, -r). Does this help explain why -r is slower than -d at large N, even though -r makes fewer traversal steps?