From CS61
Jump to: navigation, search

Garbage collection

The cs61-exercises/fundamentals5x directory contains a program called nogcbench that benchmarks a dynamic memory allocator, m61gc.c. The nogcbench program allocates an array, called chunks, which can hold pointers to K chunks of memory. Then, N times, it frees a randomly-chosen chunk and allocates another one. Finally, it frees all chunks and the chunks array and exits.

It also contains a gcbench program that does the same thing—except that it never calls free. If you run gcbench on Linux, it will quickly run out of memory and crash.

Your job is to finish the conservative garbage collector in m61gc.c so that gcbench completes all 1000000 allocations and exits with no memory leaks—even though gcbench never explicitly calls free.

All the places you’ll modify are in m61gc.c.

Mark-sweep garbage collection

CS:APP3e section 9.10 describes garbage collection in more detail.

The job of a garbage collector is to reclaim memory that hasn’t been explicitly freed, but that the program will never access again. The GC might run, for example, when the program runs out of memory. C is generally not a garbage-collected language, but there do exist garbage collectors for C, and you’ll write one today!

We can distinguish between precise and conservative garbage collectors. A precise GC precisely detects all garbage: if an allocation cannot be referenced, it will be freed. Precise GC requires a strongly-typed language. Conservative GCs might get confused, but they only get confused in a safe direction (that’s why we call them “conservative”). They might think that an unreferenceable allocation is live (i.e., not garbage), but they will never think that a live allocation is unreferenceable.

At a high level, a GC acts like this.

  1. The collector examines all of memory and finds all active allocations.
  2. The collector frees all inactive allocations.

The first step is the hardest. How can the GC find which allocations are active? Your GC, like many C GCs, will look for patterns in memory that look like pointers into the heap. A heap allocation is live iff a pointer to that allocation is stored somewhere in memory. So where to start? Your GC will look for pointers in the stack and in global variables.

But remember that heap pointers might also be stored on the heap! For instance, in the gcbench program, pointers to chunks are stored in the chunks array, which is itself dynamically allocated. So if we find a live allocation, we need to then recursively check the contents of that allocation for more pointers.

Your work

  1. Modify m61_gc to unmark all active allocations. Use the marked member of the allocation structure.
  2. Complete mark_allocations. If a valid pointer is found in the indicated range of memory, mark that allocation—and additionally check its contents for more pointers. Hint: Check out find_allocation.
  3. Complete m61_gc to free inactive allocations.

When you complete this, ./gcbench should complete without crashing and ending with either 0 allocations or some other small number!

Advanced work

When you first complete your GC, it will likely be pretty slow. Speed it up as much as you can!


Q1. The first change I see in m61_gc is:

    // unmark all active allocations

However, at this point, I don't know which allocations are active, so how can I do this?

A1. This is an initialization step: the state of your allocs is as you left it when your last run of the garbage collector finished -- that is all the allocations that were active last time have been left in your table. Before beginning a new GC pass, you want to clear all the marked bits so that you'll be able to tell which allocations get marked this time.

Solution video

This one’s pretty long, mostly because we talk through and debug some common errors.

  • 1:32—exploring benchmark
  • 4:42—using diff and pipes
  • 7:29—starting the exercise
  • 8:40–11:52—some common errors in the “unmark all allocations” loop
  • 17:56–20:55—some common errors in mark_allocations pointer loading
  • 23:52—a non-crashing solution!
  • 23:52–27:42—performance optimization: avoid double-marking
  • 28:30–36:50—precision fix: free all unmarked allocations