- Ungraded intermediate checkin: Fri 9/17 (see Turnin, below, for more)
- Final turnin Fri 9/24 at 11:59:59pm EDT (one day later for DCE)
- Most of our problem sets, including this one, can be completed natively on Linux, Mac OS X (with Xcode and Homebrew), Docker, or, possibly, Windows Subsystem for Linux. We grade using Linux.
Goal
Most systems programming requires the explicit allocation and deallocation of dynamic memory. This lets us write fast code for modern machines, since we have full control over memory; and, sadly, it lets us write programs that crash due to memory problems. But we can also build tools that help us debug memory allocation problems.
In this problem set, you will write a debugging memory allocator. Specifically, your debugging allocator will:
- track memory usage,
- catch common programming errors (e.g., use after free, double free),
- detect writing off the end of dynamically allocated memory (e.g., writing 65 bytes into a 64-byte piece of memory), and
- catch less common, somewhat devious, programming errors.
You will also augment your debugging allocator with heavy hitter reporting that tells a programmer where most of the dynamically allocated memory is allocated.
The handout code for this problem set is in the pset1
directory. This
contains a skeleton memory allocator and many test programs. Your code
will go in m61.cc
.
Mechanics
This problem set serves several purposes for us and you.
- It gets you back into C/C++ programming, if you are a little rusty.
- It teaches you about byte wrangling, sizes, and the memory representations of information. This is great preparation for the rest of class.
- It introduces you to some common memory errors and shows how careful code can defend against them.
- You’ll learn how malloc works!
- It shows you some new algorithms.
- It will be fun!
Collaboration
This problem set is subject to the CS 61 collaboration policy.
Getting the code
Problem sets are released through GitHub Classroom. Our GitHub organization is “cs61”. Problem sets will be released using the cs61/cs61-f21-psets repository.
Follow the GitHub setup instructions to obtain your own
personal GitHub problem sets repository. (You may have already done this for
Pset 0.) Then, from your repository directory, run git pull handout main
to obtain pset 1, and git push
to save the results in your
GitHub. This should create a pset1/
subdirectory that contains your handout
code.
Planning, coding, and testing
Like much systems programming, the problem sets in this class reward
incremental work and test-driven development. Our handout code comes
with many tests, and our make check
targets run your code against all those
tests in order. It’s a good idea to work on the tests in order: get the simple
stuff working, then gradually add more complex functionality.
We encourage each student to complete this work independently (coding without a partner). Our solution involves tens of lines of code (not hundreds), which is modest, but doing the work yourself will teach you a lot.
The make check
commands will print out differences between actual and
expected outputs for each test. To print differences for a single test, try
make check-N
, as in make check-1
. If you want to see the full output of a
test, run that test on the command line; for example:
$ make test001
$ ./test001
Timing
A graduated submission process will encourage you to get started on the pset early.
- Every student should commit and upload an initial checkin by midnight on
Friday 9/17 (one day later for DCE). Your
m61
should pass at least teststest001.cc
throughtest013.cc
, but remember, this submission will not be graded and you won’t hear back from us unless we’re worried about your submission. - The complete problem set is due by Friday 9/24 (one day later for DCE).
We believe most students could comfortably complete the whole problem set, including heavy hitters, by 9/17. The earlier you complete the problem set, the earlier you can get started on the next one!
It is important to get started early—this problem set is not trivial even though you don’t have to write that much code to make it work.
Turnin
You will turn in your code using Git. In the pset1/ subdirectory, you
MUST include two text files called README.md
and AUTHORS.md
. The
README.md
file will include any extra credit attempted and comments
for graders. The AUTHORS.md
file should list your names as well as any
other collaborators and citations. (Just list “None” if you have none,
but we’d be surprised if you had no citations for the heavy hitter part
of the pset.)
Your submission will not be graded until you configure the grading server. Log in and enter your GitHub username and repository URL. Verify that all tests that you expect to pass continue to pass on the grading server.
Grading breakdown:
- 70% tested debugging allocator functions (the test programs we
hand out, plus others). If running make check reports
*** All tests succeeded!
you’ve probably got all these points. - 10% coding style. Remember to document your metadata.
- 20% design and implementation of heavy hitter reports.
Context
C-style memory allocation uses two basic functions, malloc
and
free
.
void* malloc(size_t size)
Allocate size
bytes of memory and returns a pointer to it. This
memory is not initialized (it can contain anything). Returns
nullptr
if the allocation failed (because size
was too big, or
memory is exhausted, or for whatever other reason).
void free(void* ptr)
Free a single block of memory previously allocated by malloc
.
The rules of malloc
and free
are simple: Allocate once, then
free once.
- Dynamically-allocated memory remains active until it explicitly
freed with a call to
free
. - A successful call to
malloc(sz)
returns a pointer ptr to “new” dynamically allocated memory. This means that the sz bytes of storage starting at address ptr are guaranteed not to overlap with any other active objects, including the program’s code and global variables, its stack, or with any other active dynamically allocated memory. - The pointer argument in
free(ptr)
must either equalnullptr
or a pointer to active dynamically-allocated memory. In particular:- It is not OK to call
free(ptr)
ifptr
points to the program’s code, or into its global variables, or into the stack. - It is not OK to call
free(ptr)
unlessptr
was returned by a previous call tomalloc
. - It is not OK to call
free(ptr)
ifptr
is currently inactive (i.e.,free(ptr)
was previously called with the same pointer argument, and theptr
memory block was not reused by anothermalloc()
).
- It is not OK to call
These errors are called invalid frees. The third error is also called a double free.
Some notes on boundary cases:
- In C++,
malloc(0)
should return a non-null pointer. Ifptr = malloc(0)
, thenptr
does not overlap with any other allocation and can be passed tofree()
. free(nullptr)
is allowed. It does nothing.malloc(sz)
returns memory whose alignment works for any object. (We’ll discuss alignment in class; for a preview, see CS:APP3e §3.9.3.) On 64-bit x86 machines, this means that the address value returned bymalloc()
must be evenly divisible by 16.
A secondary memory allocation function, calloc, allocates memory for an array of objects and clears it to zero.
void* calloc(size_t nmemb, size_t sz) {
void* ptr = malloc(sz * nmemb);
if (ptr != nullptr) {
memset(ptr, 0, sz * nmemb); // clear memory to 0
}
return ptr;
}
(NB: There’s actually a bug there! One of our tests would find it.)
You will work on our replacements for these functions, which are called
m61_malloc
, m61_free
, and m61_calloc
. Our versions of these
functions simply call basic versions, base_malloc
and base_free
. The m61
functions take extra filename and line number arguments; you’ll use these to
track where memory was allocated and to report where errors occur. Our header
file, m61.hh
, uses macros so that calls in the test programs supply these
arguments automatically.
A note on undefined behavior
Debugging allocators interact with undefined behavior. As we tell you in class, any program that invokes undefined behavior has no meaning. As far as the C++ language standard is concerned, once undefined behavior occurs, a program may do absolutely anything, such as force demons to fly out of your nose. Many of our tests explicitly invoke undefined behavior, and thus have no meaning. But helpful debuggers catch common bugs, and undefined-behavior bugs with malloc and free are common, so your helpful debugging allocator must produce specific warnings for these cases!
Is that even possible? Yes!
Debugging allocators work by making certain undefined behaviors well-defined. For instance, when a debugging allocator is in force, a program with a simple double free will reliably print a specific error message and exit before any truly undefined behavior occurs.
To accomplish this magic trick, debugging allocators make use of properties of
their implementations. Your debugging allocator should use our “base”
allocator, base_malloc
and base_free
, which is defined in basealloc.cc
.
These functions behave like malloc
and free
, but has the following
additional properties:
base_free
does not modify freed memory (the contents of freed storage remain unchanged until the storage is reused by a futurebase_malloc
).base_free
never returns freed memory to the operating system.
This makes it much easier to write a debugging allocator with
base_malloc/free
than with C’s default malloc/free
. For instance, the
following program is well-defined:
int main(int argc, char* argv[]) {
int* x = base_malloc(sizeof(int));
*x = 10;
base_free(x);
assert(*x == 10); // will always succeed: base_free doesn’t overwrite freed memory
}
Thus, although base_malloc
and base_free
can be used in the same way as
malloc
and free
, their behavior is undefined in fewer situations. In
contrast, this program, which uses the system malloc
and free
, always
exhibits undefined behavior (and the assertion will likely fail).
int main(int argc, char* argv[]) {
int* x = malloc(sizeof(int));
*x = 10;
free(x);
assert(*x == 10); // use-after-free undefined behavior according to language standard
}
If it uses the base_
allocator functions, your debugging allocator should
run tests 1 through 27 completely safely, with no sanitizer warnings. (Use
make SAN=1 check-1-27
to check.) Even though some of the test programs, such
as test020
, appear to invoke undefined behavior, your debugging allocator
library will catch the problems before the undefined behavior actually occurs.
But base_malloc
and base_free
do not eliminate all undefined behavior.
Double frees, invalid frees, and wild writes are still disastrous, and the
following program is just as bad as a version using the system allocator.
int main(int argc, char* argv[]) {
int* x = base_malloc(sizeof(int));
base_free(x);
base_free(x); // UNDEFINED BEHAVIOR DEATH
}
Your code should work with our handout version of basealloc.cc
. If you add
to or change basealloc.cc
, please explain this in your README.md
file.
Debugging allocator (70%)
This section describes the main functionality required for the debugging allocator, ordered from simpler to more complex, and from low-numbered to high-numbered tests.
Simple statistics (tests 1–17)
Implement the following function:
void m61_get_statistics(m61_statistics* stats)
Fill in the stats
structure with overall statistics about memory
allocations so far.
The m61_statistics
structure is defined like this:
struct m61_statistics {
unsigned long long nactive; // number of active allocations [#malloc - #free]
unsigned long long active_size; // number of bytes in active allocations
unsigned long long ntotal; // number of allocations, total
unsigned long long total_size; // number of bytes in allocations, total
unsigned long long nfail; // number of failed allocation attempts
unsigned long long fail_size; // number of bytes in failed allocation attempts
uintptr_t heap_min; // smallest address in any region ever allocated
uintptr_t heap_max; // largest address in any region ever allocated
};
Most of these statistics are easy to track, and you should tackle them first.
Run make check
to test your work. You’ll want to examine the test
programs to figure out what they’re doing and what output is expected. You
will notice some comments at the end of the file, such as this in
test001.cc
:
//! alloc count: active 0 total 0 fail 0
//! alloc size: active 0 total 0 fail 0
These lines define the expected output for the test. The make check
command checks your actual output against the expected output and
reports any discrepancies. (It does this by calling compare.pl
.)
Please note that you should not change the test files!
Stuck? Check our hints.
Allocation metadata (tests 18–19)
The hardest statistic to track is active_size
, since to track it, your
m61_free(ptr)
implementation must find the number of bytes allocated for
ptr
.
The best way to do this is called internal metadata. To implement internal
metadata, m61_malloc
adds additional space to the user’s requested size
when calling base_malloc
. It then stores metadata about the allocation,
including the original requested size, in the beginning portion of the
returned allocation. This metadata is called “internal metadata” because it is
stored inside the base allocation, rather than separately. You’ll define the
structure format for your internal metadata. m61_malloc
then returns a
pointer to the payload, which is the portion of the allocation following the
metadata. Your m61_free
code will take the payload pointer as input, and
then use address arithmetic to calculate the pointer to the corresponding
metadata (possible because the metadata has fixed size). From that metadata it
can read the size of the allocation. See CS:APP3e Figure 9.35 “Format of a
simple heap block” for an example of this type of layout.
Alternately, you could create a hash table that maps pointer values (or
addresses) to sizes. m61_malloc
would add an entry, and m61_free
would
check this table and then remove the entry. This is very straightforward, and
you might try this first.
For full credit, your code’s metadata must be bounded. This means that the
space allocated for metadata should not grow independently of the amount of
actively allocated space. If the user frees a lot of memory, then your code
must free most or all of the associated metadata. This means, for example,
that you should not store per-allocation metadata in an infinitely-growing
hash table or vector. (It is OK to keep a bounded amount of metadata about
freed allocations, such as statistics or a list of the last N freed
allocations.) Use test019
to check whether your metadata is bounded; the
test should report at most a few megabytes of peak memory allocated (not
hundreds or thousands of megabytes).
Too-big allocations (tests 20–23)
Your debugging malloc library should return a null pointer if the user asks for a too-big size that cannot be allocated. It also needs to be robust against integer overflow attacks. (See, for example CS:APP3e Aside "Security vulnerability in the XDR library", in section 2.3, p100.)
Our handout code’s m61_calloc
function is wrong. Fix this function and fix
any other integer overflow errors you find.
Invalid free and double-free detection (tests 24–34)
m61_free(ptr, file, line)
should print an error message and then
call C’s abort()
function when ptr
does not point to active
dynamically-allocated memory.
Some things to watch:
-
Be careful of calls like
free((void*) 0x16)
, where theptr
argument is notnullptr
but it also doesn’t point to heap memory. Your debugging allocator should not crash when passed such a pointer. It should print an error message and exit in an orderly way. -
The test programs define the desired error message format. Here’s our error message for test024:
MEMORY BUG: test024.cc:8: invalid free of pointer 0xffffffffffffffe0, not in heap
-
Error messages should be printed to standard error (using C’s
fprintf(stderr, ...)
, or, equivalently, C++’sstd::cerr
). -
Every error message must end in a newline. (If you leave out the newline, the message might not be printed in some environments.)
-
Different error situations require different error messages; the other test programs define the required messages.
-
Include the file name and line number of the problematic call to
free()
.
Boundary write error detection (tests 35–38)
A boundary error is when a program reads or writes memory beyond the actual dimensions of an allocated memory block. An example boundary write error is to write the 11th entry in an array of size 10:
int* array = (int*) malloc(10 * sizeof(int));
...
for (int i = 0; i <= 10 /* WHOOPS */; ++i) {
array[i] = calculate(i);
}
These kinds of errors are relatively common in practice. (Other errors can happen, such as writing to totally random locations in memory or writing to memory before the beginning of an allocated block, rather than after its end; but after-the-end boundary writes seem most common.)
A debugging memory allocator can’t detect boundary read errors, but
it can detect many boundary write errors. Your m61_free(ptr, file, line)
should print an error message and call abort()
if it detects
that the memory block associated with ptr
suffered a boundary write
error.
No debugging allocator can reliably detect all boundary write errors. For example, consider this:
int* array = (int*) malloc(10 * sizeof(int));
int secret = array[10]; // save boundary value
array[10] = 1384139431; // boundary write error
array[10] = secret; // restore old value! dmalloc can’t tell there was an error!
Or this:
int* array = (int*) malloc(10 * sizeof(int));
array[200000] = 0; // a boundary write error, but very far from the boundary!
We’re just expecting your code to catch common simple cases. You should definitely catch the case where the user writes one or more zero bytes directly after the allocated block.
Memory leak reporting (tests 39–42)
A memory leak happens when code allocates a block of memory but forgets to free it. Memory leaks are not as serious as other memory errors, particularly in short-running programs. They don’t cause a crash. (The operating system always reclaims all of a program’s memory when the program exits.) But in long-running programs, such as your browser, memory leaks have serious effect and are important to avoid.
Write an m61_print_leak_report()
function that, when called, prints a
report about every allocated object in the system. This report
should list every object that has been malloc
ed but not
free
d. Print the report to standard output (not standard
error). A report should follow the format of this example:
LEAK CHECK: test042.cc:13: allocated object 0x9b81140 with size 14
LEAK CHECK: test042.cc:14: allocated object 0x9b81110 with size 15
LEAK CHECK: test042.cc:11: allocated object 0x9b81170 with size 12
LEAK CHECK: test042.cc:18: allocated object 0x9b81050 with size 18
LEAK CHECK: test042.cc:10: allocated object 0x9b811e0 with size 11
LEAK CHECK: test042.cc:15: allocated object 0x9b810e0 with size 16
LEAK CHECK: test042.cc:16: allocated object 0x9b81080 with size 17
To use the leak checker, call m61_print_leak_report()
before exiting the
program, after cleaning up all known allocations using free
calls. Any
missing free
s should show up in the leak report.
To implement a leak checker, you’ll need to keep track of every active
allocated block of memory. It’s easiest to do this by adding more
information to the block metadata. You will also use the file
and
line
arguments to m61_malloc()
.
Note: You may assume that the
file
argument to these functions has static storage duration. This means you don’t need to allocate a copy of the string contents: it is perfectly safe to use theconst char* file
argument.Many students use a doubly-linked list for their metadata. Should you do this, check out our doubly-linked list implementation.
Advanced reports and checking
Our last test programs, test044.cc
and test045.cc
, require you to update
your reporting and error detection code to print better information and defend
against more diabolically invalid frees. You will need to read the test code
and understand what is being tested to defend against it.
Update your invalid free
message. After determining that a pointer is
invalid, your code should check whether the pointer is inside a
different allocated block. This will use the same structures you created
for the leak checker. If the invalid pointer is inside another block,
print out that block, like so:
MEMORY BUG: test044.cc:9: invalid free of pointer 0x833306c, not allocated
test044.cc:8: 0x833306c is 128 bytes inside a 2001 byte region allocated here
And make sure your invalid free detector can handle certain diabolical situations. What situations? Check the test code to find out!
Performance tests
Test programs test014.cc
and test015.cc
call m61_malloc
500,000 times,
supplying tens of thousands of different filename/line-number pairs. Your
solution should run test014
in a second or less. Our solution, with heavy
hitters, runs test014
in well under 1 second in Docker.
As mentioned above, test019.cc
should use at most a few megabytes of memory
during its execution. Hundreds or thousands of megabytes used indicates
unbounded metadata.
Style (10%)
Your code should be clean, clear, correct, and consistent. The most important style guideline is consistency. Our handout code follows a certain format (indentation amount, comment usage, variable names). We won’t grade you down if you choose a different format. But pick a format. Don’t write code that changes style from line to line.
For this problem set, additional coding guidelines will be incorporated into your grade.
- Document your metadata. Explain what additional data structures you used to track debugging and heavy-hitter information.
- Your metadata should be compact. The debugging allocator should introduce at most tens of bytes of overhead per allocation, not hundreds or thousands.
- Your malloc and free functions should be relatively fast. For full credit, you should add only O(1) additional work to a successful (i.e., non-erroneous) malloc or free call.
- Your malloc and free functions should not store unbounded metadata.
- Your implementation should not introduce memory errors or undefined
behaviors of its own. Again, your code should cleanly pass at least
make SAN=1 check-1-27
.
Heavy hitter reports (20%)
Memory allocation is expensive, and you can speed up a program a lot by optimizing how that program uses malloc and by optimizing malloc itself. (Did you know that both Google and Facebook have malloc specialists? Here’s Google’s tcmalloc, and Facebook liked jemalloc so much that they hired Jason Evans.)
But before optimizing a program, we must collect data. Programmer intuition is frequently wrong: programmers tend to assume the slowest code is either the code they found most difficult to write or the last thing they worked on. This recommends a memory allocation profiler—a tool that tracks and reports potential memory allocation problems.
Your job is to design and implement a particular kind of profiling, heavy hitter reports, for your memory allocator. This has two parts. You will:
-
Track the heaviest users of malloc() by code location (file and line). A “heavy” location is a location that is responsible for allocating many payload bytes.
-
Generate a readable report that summarizes this information, using this exact format (each location on one line, sorted in descending order by percentage):
HEAVY HITTER: hhtest-largealloc.cc:50: 348428831 bytes (~26.8%) HEAVY HITTER: hhtest-largealloc.cc:45: 297114828 bytes (~22.9%) HEAVY HITTER: hhtest-largealloc.cc:40: 244718839 bytes (~18.9%) HEAVY HITTER: hhtest-largealloc.cc:35: 201609163 bytes (~15.6%) HEAVY HITTER: hhtest-largealloc.cc:30: 98738132 bytes (~7.8%)
Rule 1: If a program makes lots of allocations, and a single line of code is responsible for 20% or more of the total payload bytes allocated by a program, then your heavy-hitter report should mention that line of code (possibly among others).
Rule 2: Your design should handle both large numbers of allocations
and large numbers of allocation sites. In particular, you should be
able to handle a program that calls malloc() at tens of thousands of
different file-line pairs (such as test014
).
Rule 3: Your report should include some information that helps the user decide which lines are likely to be the heaviest hitters, such as exact or estimated byte counts, or even just by ordering the hitters.
Note that you should only count payload bytes: don’t include allocator overhead such as metadata space.
How should you implement this? That’s up to you, but here are some pointers.
- Sampling is acceptable. It would be OK, for example, to report
information extrapolated from a sample of the allocations.
This could cut down the amount of data you need to store.
- You could sample exactly every Nth allocation, but random sampling
is usually better, since it avoids synchronization effects. (For
instance, if the program cycled among 4 different allocation sites,
then sampling every 20th allocation would miss 75% of the allocation
sites!) For random sampling you’ll need a source of randomness. Use
random()
or
drand48(),
or use the C++ standard library’s more modern randomness facility (see
test022.cc
for an example).
- You could sample exactly every Nth allocation, but random sampling
is usually better, since it avoids synchronization effects. (For
instance, if the program cycled among 4 different allocation sites,
then sampling every 20th allocation would miss 75% of the allocation
sites!) For random sampling you’ll need a source of randomness. Use
random()
or
drand48(),
or use the C++ standard library’s more modern randomness facility (see
- Clever, yet easy, algorithms can help you catch all heavy
hitters with O(1) space and simple data structures!
- A Simple Algorithm for Finding Frequent Elements in Streams and Bags, Karp, Shenker, and Papadimitriou
- Frequency Estimation of Internet Packet Streams with Limited Space, Demaine, López-Ortiz, and Munro. The paper’s context doesn’t matter; the relevant algorithms, "Algorithm MAJORITY" and "Algorithm FREQUENT," appear on pages 6-7, where they are simply and concisely presented. (You want FREQUENT, but MAJORITY is helpful for understanding.)
- YOU DO NOT NEED TO USE THESE ALGORITHMS! But why not take a look? They’re surprisingly simple.
We provide a test program for you to test heavy hitter reports,
hhtest.cc
. This program contains 40 different allocators that allocate
regions of different sizes. Its first argument, the skew, varies the
relative probabilities that each allocator is run. Before exiting, it
calls m61_print_heavy_hitters()
, a function defined in m61.cc
.
Running ./hhtest 0
will call every allocator with equal probability. But
some of the allocators allocate more data than others. So when we run our
simple heavy hitter detector against ./hhtest 0
, it reports:
HEAVY HITTER: hhtest-largealloc.cc:50: 348428831 bytes (~26.8%)
HEAVY HITTER: hhtest-largealloc.cc:45: 297114828 bytes (~22.9%)
HEAVY HITTER: hhtest-largealloc.cc:40: 244718839 bytes (~18.9%)
HEAVY HITTER: hhtest-largealloc.cc:35: 201609163 bytes (~15.6%)
HEAVY HITTER: hhtest-largealloc.cc:30: 98738132 bytes (~7.8%)
If we run ./hhtest 1
, however, then the first allocator is called twice as
often as the next allocator, which is called twice as often as the next
allocator, and so forth. There is almost no chance that allocator #39 is
called at all. The report for ./hhtest 1
is:
HEAVY HITTER: hhtest-tinyalloc.cc:5: 499043 bytes (~50.0%)
HEAVY HITTER: hhtest-tinyalloc.cc:10: 249136 bytes (~25.0%)
HEAVY HITTER: hhtest-tinyalloc.cc:15: 123995 bytes (~12.5%)
HEAVY HITTER: hhtest-tinyalloc.cc:20: 61625 bytes (~6.3%)
At some intermediate skews, though, and there may be no heavy hitters at
all. Our code reports nothing above 20% when run against ./hhtest 0.4
.
Negative skews call the large allocators more frequently.
./hhtest -0.4
:
HEAVY HITTER: hhtest-largealloc.cc:50: 3387377996 bytes (~40.5%)
HEAVY HITTER: hhtest-largealloc.cc:45: 2197955996 bytes (~26.3%)
HEAVY HITTER: hhtest-largealloc.cc:40: 1393219996 bytes (~16.7%)
HEAVY HITTER: hhtest-largealloc.cc:35: 861516348 bytes (~10.3%)
Try ./hhtest --help
to get a full description of hhtest
’s arguments.
You should test with many different arguments; for instance, make
sure you try different allocation “phases.” A great software
engineer would also create tests of her own; we encourage you to do
this!
This idea can be taken quite far. Google, for example, links a heavy-hitter detector with many important servers. It is possible (within Google) to connect to many servers and generate a graph of its current heavy hitter allocation sites, including their calling functions and relationships among functions. Here’s a little example (scroll down the page); here’s a bigger one. Have fun!!
Other links
Debugging allocators have a long history. dmalloc
is one of the early ones; here’s a list of some
others. Modern compilers
integrate both optional allocation debugging features and some debugging
features that are on by default. For instance, Mac OS and Linux’s memory
allocators can detect some boundary write errors, double frees, and so
forth. Recent clang and gcc -fsanitize=memory
arguments can catch
even more problems.
Feel free to look at the code and documentation for other allocators to get ideas, but make sure you cite them if you do.
Extra credit
We encourage you to learn more, flex your coding muscles, and show your mastery by doing some extra credit work! Please note that our policy is to not debug extra credit work for students.
Reallocation
Easy. Implement m61_realloc
. This function is based on the C
standard library’s realloc
, and should behave like this:
/// m61_realloc(ptr, sz, file, line)
/// Reallocate the dynamic memory pointed to by `ptr` to hold at least
/// `sz` bytes, returning a pointer to the new block. If `ptr` is
/// `nullptr`, behaves like `m61_malloc(sz, file, line)`. If `sz` is 0,
/// behaves like `m61_free(ptr, file, line)`. The allocation request
/// was at location `file`:`line`.
void* m61_realloc(void* ptr, size_t sz, const char* file, long line);
Frequent allocations
Moderately easy. Make your heavy-hitter detector find frequent allocations as well as heavy allocations. Allocating a single byte a billion times is expensive, even if the byte is immediately freed each time! So augment your heavy hitter detector to report call sites that are responsible for 20% or more of the total number of allocations the program made, even if those call sites aren’t responsible for many total bytes relative to other call sites. For good coding style, your solution should involve some code reuse.
More tests
Easy to difficult. Add more tests! Model your new tests after our existing
tests: add new files with names like test046.cc
. In your test files, explain
what is being tested and why your tests are not redundant with our existing
tests. Your debugging allocator should pass your tests; this may require
updating your allocator.
Here are some example tests you could add, but the point of this exercise is to think creatively.
- (Easy) Boundary write tests at different boundaries or with different offsets.
- (Moderate) Patterns that stress-test different parts of the allocator, such as double-free detection, performance, and filename/line-number pairs.
- (Hard) New kinds of wild-write and wild-free tests that attempt to corrupt
internal allocator state. Of course, it’s easy, and uninteresting, to write
a test that corrupts allocator state. The interesting thing is to write a
test that almost evades detection. For example, check out
nastytest1.cc
!
Alternately, show how to test your allocator itself using a test
framework, such as the afl
fuzz tester.
Fuzz testing is able to discover hidden problems in many kinds of code, but it
requires some setup.
Line number lookup
Difficult. Most of our test programs use macros that redefine
malloc
and free
to supply filename and line number information. But C++
discourages the use of fancy macros, and C++-style allocation, such as the
m61_allocator
class used in test035
and up, has no place for macros to go.
A C++ debugging memory allocator will use return address information to
identify call sites.
Change your debugging allocator so that m61_allocator
’s allocate
and
deallocate
functions pass information based on
__builtin_return_address
to the m61
functions. Then, make your allocator transform this information
numbers into real filename–line number pairs just before printing. For
instance, you can call out to an external program, such as Linux’s
addr2line
. But do not perform that transformation until printing, since the
transformation is expensive.
Use ./test043
to check your work. Before your fixes, the LEAK CHECK
lines
will start with ?:0
; afterwards, they should start with something like
.../test043.cc:NN
.
Notes
This lab was created for CS 61.