Garbage collection
The cs61-exercises/fundamentals5x
directory contains a program
called nogcbench
that benchmarks a dynamic memory allocator,
m61gc.c
. The nogcbench
program allocates an array, called
chunks
, which can hold pointers to K chunks of memory. Then, N
times, it frees a randomly-chosen chunk and allocates another one.
Finally, it frees all chunks and the chunks
array and exits.
It also contains a gcbench
program that does the same thing—except
that it never calls free. If you run gcbench
on Linux, it will
quickly run out of memory and crash.
Your job is to finish the conservative garbage collector in m61gc.c
so that gcbench
completes all 1000000 allocations and exits with no
memory leaks—even though gcbench
never explicitly calls free.
All the places you’ll modify are in m61gc.c
.
Mark-sweep garbage collection
CS:APP3e section 9.10 describes garbage collection in more detail.
The job of a garbage collector is to reclaim memory that hasn’t been explicitly freed, but that the program will never access again. The GC might run, for example, when the program runs out of memory. C is generally not a garbage-collected language, but there do exist garbage collectors for C, and you’ll write one today!
We can distinguish between precise and conservative garbage collectors. A precise GC precisely detects all garbage: if an allocation cannot be referenced, it will be freed. Precise GC requires a strongly-typed language. Conservative GCs might get confused, but they only get confused in a safe direction (that’s why we call them “conservative”). They might think that an unreferenceable allocation is live (i.e., not garbage), but they will never think that a live allocation is unreferenceable.
At a high level, a GC acts like this.
- The collector examines all of memory and finds all active allocations.
- The collector frees all inactive allocations.
The first step is the hardest. How can the GC find which allocations are active? Your GC, like many C GCs, will look for patterns in memory that look like pointers into the heap. A heap allocation is live iff a pointer to that allocation is stored somewhere in memory. So where to start? Your GC will look for pointers in the stack and in global variables.
But remember that heap pointers might also be stored on the heap! For
instance, in the gcbench
program, pointers to chunks are stored in
the chunks
array, which is itself dynamically allocated. So if we
find a live allocation, we need to then recursively check the contents
of that allocation for more pointers.
Your work
- Modify
m61_gc
to unmark all active allocations. Use themarked
member of theallocation
structure. - Complete
mark_allocations
. If a valid pointer is found in the indicated range of memory, mark that allocation—and additionally check its contents for more pointers. Hint: Check outfind_allocation
. - Complete
m61_gc
to free inactive allocations.
When you complete this, ./gcbench
should complete without crashing
and ending with either 0 allocations
or some other small number!
Advanced work
When you first complete your GC, it will likely be pretty slow. Speed it up as much as you can!
FAQ
Q1. The first change I see in m61_gc
is:
// unmark all active allocations
// YOUR CODE HERE
However, at this point, I don't know which allocations are active, so how can I do this?
A1. This is an initialization step: the state of your allocs
is as
you left it when your last run of the garbage collector finished -- that
is all the allocations that were active last time have been left in your
table. Before beginning a new GC pass, you want to clear all the marked
bits so that you'll be able to tell which allocations get marked
this time.
Solution video
This one’s pretty long, mostly because we talk through and debug some common errors.
- 1:32—exploring
benchmark
- 4:42—using
diff
and pipes - 7:29—starting the exercise
- 8:40–11:52—some common errors in the “unmark all allocations” loop
- 17:56–20:55—some common errors in
mark_allocations
pointer loading - 23:52—a non-crashing solution!
- 23:52–27:42—performance optimization: avoid double-marking
- 28:30–36:50—precision fix: free all unmarked allocations