This section covers concepts from the C++ standard library, especially
container data structures and iterators, using an example that might remind
you of part of Problem Set 1.
Prework: Container data structures and std::map
Simultaneously enrolled college students are responsible for attending
section live (in-person or via zoom). If you cannot attend any section
live in a particular week, you may watch a recording (Canvas > Zoom > Cloud
Recordings) instead. Please self-report your live attendance (or that you watched
a recording) using this attendance form.
How section works
We release section material via the CS 61 course site (where you’re reading
this now) and the cs61-sections repository (obtain a local copy with a
command like git clone git@github.com:cs61/cs61-sections
). TFs will work
through the section material at their own pace, stopping at exercises,
questions, and discussions so students can work through the material together.
It’s best if you bring your laptop. Solutions for most section problems are
available here, but you will learn more if you try to solve the problems
yourself first.
Section works well if students ask questions, and sometimes TFs will take the
section in an unexpected direction based on student requests. Please speak up!
If you have questions from lecture, now’s the time to ask.
Students will be expected to have digested all material from section handouts.
Container data structures (prework and reference)
C++’s standard containers library
(ref1,
ref2,
ref3)
offers several powerful container data structures, including:
std::map<K, V>
, a balanced tree structure that can find items by key and
can enumerate all items in sorted order by key;
std::vector<T>
, a growable contiguous array of objects;
std::list<T>
, a linked list of objects;
std::unordered_map<K, V>
, a hash table that can quickly find items by key
and can enumerate all items, but not in any particular order;
std::multimap<K, V>
, a version of std::map
that can store multiple
values with the same key;
- and several more.
Container data structures have similar APIs, and the algorithms library
(ref1,
ref2) offers generic algorithms
that can operate on many containers.
C++ container APIs resemble APIs in other languages, but do have some
unique features, especially in the way they use iterators
(ref) to
represent positions in a data structure.
In this section, we’ll concentrate on std::map
.
std::map
std::map<K, V>
(ref1,
ref2,
ref3) is a
container that maps keys of type K
to values of type V
. The key
and value types can be almost any C++ types, but K
objects must support
comparison with the less-than operator <
. (Integers, pointers, and C++
std::string
s all support <
; so do many other types.)
This code creates an empty map, then prints every element of the map (there
are none, so it prints nothing).
std::map<uintptr_t, const char*> m;
for (auto it = m.begin(); it != m.end(); ++it) {
fprintf(stderr, "key %zu, value %s\n", it->first, it->second);
}
The it
object is an iterator that represents a position in, or cursor into,
the map. A map iterator “points at” a key–value pair; the key is in
it->first
, the value in it->second
. (The iterator is actually an object,
not a pointer, but it supports pointer notation anyway.)
Several map functions return iterators. The m.begin()
iterator points at the
first key–value pair in the map. The special m.end()
iterator represents the
absence of an item. Its position is “one past” of all the valid positions in
the map; it’s illegal to refer to m.end()->first
or m.end()->second
.
There’s also m.find(K)
, which returns an iterator pointing at the element
with a given key, and m.lower_bound(K)
and a.upper_bound(K)
, which return
iterators pointing at elements nearest a given key (in a sense defined
more precisely below).
Iterators can also be moved and compared. Move an iterator to the next
position with ++it
(as you see above), and to the previous position with
--it
. Compare map iterators with ==
and !=
; two iterators are equal if
and only if they point to the same element.
The type of m.begin()
is std::map<uintptr_t, const char*>::iterator
. Ugh!
Rather than type that whole thing, in most situations you can just say auto
and the compiler will figure it out.
Insert elements with the insert
member function, which takes a key–value
pair argument:
#include <map> // gain access to `std::map`
#include <cstdio>
#include <cstdint> // gain access to `uintptr_t`
int main() {
std::map<uintptr_t, const char*> m;
m.insert({0, "Hello"});
m.insert({2, "!"});
m.insert({1, "world"});
for (auto it = m.begin(); it != m.end(); ++it) {
fprintf(stderr, "key %zu, value %s\n", it->first, it->second);
}
}
// ==> key 0, value Hello
// key 1, value world
// key 2, value !
insert({K, V})
does nothing if the element with key K
already exists. If
you want to modify any existing element, use m.insert_or_assign(K, V)
or,
shorter, m[K] = V
.
Question. m.find(K)
returns an iterator pointing at the element with
key K
. What does it return if K
isn’t contained in the map? It’s got to
return something!
Question. How can you tell whether an element with key K
is in a map?
Show solution
Using what we’ve discussed so far, you can check whether m.find(K) != m.end()
.
There are also shorthands that might be a little faster or easier to read:
m.contains(K) == true
when K
is in the map, and m.count(K) != 0
when K
is in the map.
Hide solution
Question. Write a test program to help figure out the exact sense in
which m.lower_bound(K)
returns an iterator pointing at the element
“nearest” a given key.
Show solution
I’d want to know if lower_bound
returns the nearest iterator by going to
the nearest lower key, or the nearest higher key, or the nearest key
by distance. I’m also curious of when lower_bound
return m.end()
, if
it ever does. Here’s a program testing those questions:
#include <map>
#include <cstdio>
#include <cstdint>
std::map<uintptr_t, const char*> m;
void print_lower_bound(uintptr_t key) {
auto it = m.lower_bound(key);
if (it == m.end()) {
fprintf(stderr, "lower_bound for %zu is at end\n", key);
} else {
fprintf(stderr, "lower_bound for %zu is at key %zu\n", key, it->first);
}
}
int test() {
m.insert({1, ""});
m.insert({3, ""});
m.insert({11, ""});
print_lower_bound(0);
print_lower_bound(1);
print_lower_bound(2);
print_lower_bound(4);
print_lower_bound(10);
print_lower_bound(11);
print_lower_bound(12);
}
// ==> lower_bound for 0 is at key 1
// lower_bound for 1 is at key 1
// lower_bound for 2 is at key 3
// lower_bound for 4 is at key 11
// lower_bound for 10 is at key 11
// lower_bound for 11 is at key 11
// lower_bound for 12 is at end
Hide solution
Question. What does m.lower_bound(K)
return?
Show solution
It appears that m.lower_bound(K)
returns the iterator pointing at the
first element whose key is greater than or equal to K
, or
m.end()
if K
is greater than all keys in the map.
Hide solution
Question. What does m.upper_bound(K)
return?
Show solution
m.upper_bound(K)
returns the iterator pointing at the first
element whose key is greater than K
, or m.end()
if K
is
greater than or equal to all keys in the map.
Hide solution
Question. Say that m.lower_bound(k) == m.upper_bound(k)
for some key
k
. What can you infer about the contents of map m
?
Erase elements with erase
, which takes either a key or one or two iterators.
m.erase(K)
erases the element with key K
, if there is one. m.erase(FIRST, LAST)
erases all elements between iterator FIRST
and iterator LAST
;
m.erase(m.begin(), m.end())
clears the container. m.erase(IT)
erases the
single element at iterator IT
.
Cloud Flap Sensor 5000

Source image from Shutterstock
In this section, you’ll help build the cloud storage back end for the Flappy
Bird Company’s new Cloud Flap Sensor 5000. This high-tech sensor straps onto a
bird’s back and counts the number of times it flaps its wings. When possible,
the sensor sends the number of recent flaps back to your server over the cell
phone network.
A sample from the sensor has three components:
- The start time, an integer measured in seconds since January 1, 1970.
- The duration, an integer number of seconds.
- The flap count, which is the integer number of times the bird flapped
its wings between time
start
and time start + duration
.
(A real sensor message would have more components, like a sensor ID, but we’ll
assume for now there’s only one interesting bird.)
The sensor typically sends messages with increasing start
time and no gaps:
a message with start == S
and duration == D
would be followed by a message
with start == S+D
. However, birds go all over the place, so sometimes
messages are lost and there will be gaps. A valid sensor will never generate
samples that overlap in time or samples with zero duration
.
These are the important functions:
void add_sample(uintptr_t start, size_t duration, size_t flapcount);
bool has_sample(uintptr_t t);
(uintptr_t
and size_t
are types of unsigned integer, where “unsigned”
means they cannot represent negative numbers. On x86-64 machines, they are
both synonyms for unsigned long
, and they are large enough to represent
numbers from 0 to 18,446,744,073,709,551,615. More
information.)
add_sample
records a sample from the sensor.
has_sample
returns true
if a sample has been recorded that covers time
t
(that is, a sample with start <= t && t < start + duration
), and
false
if not.
You’ll implement add_sample
and has_sample
, using a global data structure
you’ll define.
Question. Think about how would you implement these functions. What data
structures might you use? What factors might affect that choice?
Cloud Flap Sensor Map
In the rest of this section, we’ll implement Cloud Flap Sensor 5000 using
std::map
. Here’s our proposed type for the global data structure (see in
datareps1/flapmap.hh
):
struct sample {
uintptr_t start;
size_t duration;
size_t flapcount;
};
std::map<uintptr_t, sample> flapmap;
using flapmap_iter = std::map<uintptr_t, sample>::iterator; // less typing later
The flapmap
holds every sample
received from the sensor, indexed by
start
time.
Exercise. Implement add_sample(uintptr_t start, size_t duration, size_t flapcount)
in flapmap1.cc
. Test your work.
Show solution
void add_sample(uintptr_t start, size_t duration, size_t flapcount) {
flapmap.insert({start, {start, duration, flapcount}});
}
Or, more verbosely:
void add_sample(uintptr_t start, size_t duration, size_t flapcount) {
sample s = { .start = start, .duration = duration, .flapcount = flapcount };
flapmap.insert({start, s});
}
Hide solution
Question. What does your add_sample
do with a duplicate sample
(meaning a sample with the same start
time as a previously-recorded
sample)?
Show solution
Ours throws the new sample away! If you used insert_or_assign
, yours
would replace the old sample with the new one.
Hide solution
(Offline) Discussion. If the Flap Sensor 5000 were to break, it might start
reporting invalid samples. It’s important to think about robustness while
you program. What kinds of arguments to add_sample
might indicate a broken
sensor? What should add_sample
do when passed an invalid sample?
Show solution
We’d be suspicious of an add_sample
call with several kinds of weird arguments:
duration
is zero.
start
represents a time unreasonably far in the past.
start + duration
represents a time unreasonably far into the future.
flapcount
represents an unrealistically large count.
- The
start
—start + duration
interval overlaps a previously recorded
sample.
There’s an art to determining what’s “unreasonably far” or
“unrealistically large”; picking good thresholds can require domain
knowledge or real-world experience. For example, the threshold for an
unrealistically large flap count depends on the kind of bird—for a
seagull, which usually flaps 2.5/sec, 20/sec is very unrealistic, but
hummingbirds can go up to 80/sec. You might think that any sample in the
future is invalid, but it’s super common for clocks in the real world to
get out of sync, so it’d be wise to accept future samples with some error
tolerance.
As for what to do with an invalid sample, there’s no right answer! Perhaps
it would be best to log an error message, or to simply ignore the sample?
What do you think? The answer might depend on the number of invalid
samples received, on the way in which they’re invalid, and on the
consequences of recording them. You’ll see later, for example, that some
of our code depends on the flapmap
containing no overlapping samples, so
it is important not to store those (or to clip them to avoid overlaps).
For the rest of this section, though, we’ll assume all received samples
are valid.
Hide solution
Question. How can you tell if a sample with start time start
is in the
map?
Question. has_sample
is answering a slightly different question than
whether a sample with start time start
is in the map. How is it different?
Show solution
has_sample(t)
is supposed to return true
if t
is contained in a
sample. It will be true if flapmap.find(start) != flapmap.end()
, but it
will also be true in other situations.
Hide solution
Exercise. Implement has_sample(t)
. Test your work.
Show solution
One good first question: What’s the simplest, most obviously correct
way to implement has_sample(t)
? A natural answer is by iterating through
all the samples!
bool has_sample_iter(uintptr_t t) {
for (auto it = flapmap.begin(); it != flapmap.end(); ++it) {
if (it->first <= t && t < it->first + it->second.duration) {
return true;
}
}
return false;
}
We can check the correctness of this “obviously correct” implementation by
writing some tests. Try to select a range of tests that catch edge cases, as
well as obvious cases. For this problem, you might start with:
- Test an empty
flapmap
(any time should return false).
- Test a single-sample
flapmap
with:
- A time before the sample (expect false);
- A time matching the start time (expect true);
- A time in the middle of the sample (expect true);
- A time matching the end time (expect false);
- A time after the end time (expect false).
- Test a multi-sample
flapmap
. In addition to the above, perhaps:
- Test a time in a gap between samples (expect false).
- Test the border time between adjacent samples (expect true).
A good way implement tests is with the assert
macro. The argument to
assert
is an expression that you, the programmer, believe is definitely
true. The running program checks whether the expression is actually true,
and crashes if it’s not.
int main() {
// empty flapmap tests
assert(!has_sample(0));
assert(!has_sample(-19348));
assert(!has_sample(192132471));
// single-sample tests
add_sample(10, 10, 1);
assert(!has_sample(0));
assert(has_sample(10));
assert(has_sample(15));
assert(!has_sample(20));
assert(!has_sample(25));
// multi-sample tests
add_sample(20, 10, 1);
add_sample(40, 10, 1);
assert(!has_sample(0));
assert(has_sample(10));
assert(has_sample(15));
assert(has_sample(20));
assert(has_sample(25));
assert(!has_sample(30));
assert(!has_sample(35));
assert(has_sample(40));
assert(has_sample(45));
assert(!has_sample(50));
}
A successful run of this test program prints nothing! But an unsuccessful run prints an
error and an indication of a crash.
A full testing strategy would add some more tests—such as random tests—but the code is
so simple that a small set of tests can give us good confidence.
It’s really useful to have a known-good implementation, even a slow
one, because we can test that implementation for consistency with an
optimized version. Nevertheless, the iterator version is awfully slow: it
tests every sample. We are guaranteed that samples in the flapmap
never
overlap; can we make use of that to speed up has_sample
?
We spent a long time on lower_bound
and upper_bound
so we could use
them here. Let’s start with upper_bound
.
bool has_sample(uintptr_t t) {
auto it = flapmap.upper_bound(t);
Now it
points at the first
sample whose start time is greater than t
(or to flapmap.end()
if all
samples have start time ≤t
). If that sample exists, it will
never contain t
; but the previous sample might! So let’s check the
previous sample.
bool has_sample(uintptr_t t) {
auto it = flapmap.upper_bound(t);
--it;
return it->first <= t && t < it->first + it->second.duration;
}
Is checking just the one previous sample enough? Well, let’s run the tests to
see how they do. You should get a crash! Do you see the bug in this code?
Our function might try to move it
before flapmap.begin()
.
This is super illegal: there’s nothing there. If it == flapmap.begin()
, then
the first sample’s start time is greater than t
, so the function should
return false.
bool has_sample(uintptr_t t) {
auto it = flapmap.upper_bound(t);
if (it == flapmap.begin()) {
return false;
} else {
--it;
return it->first <= t && t < it->first + it->second.duration;
}
}
This is correct: the tests pass, and we can reason through its
correctness.
- Any samples at or after
it
cannot contain t
, because they have start
time >t
.
- If
it == flapmap.begin()
, then all samples have start
time >t
, so no sample can contain t
.
- Otherwise, let
prev
be the sample immediately before it
. This is the
last sample with start time ≤ t
.
- We explicitly check whether
prev
contains t
.
- No sample before
prev
can contain t
, because samples don’t overlap.
Any sample preceding prev
has end time ≤ prev.start
≤ t
, and so cannot contain t
.
You can also use lower_bound
. The code is a little uglier,
but still fine.
bool has_sample_lower_bound(uintptr_t t) {
auto it = flapmap.lower_bound(t);
if (it->first == t) {
return true;
}
// Otherwise, might be contained in the previous sample
if (it == flapmap.begin() /* no previous sample */) {
return false;
} else {
--it;
return it->first <= t && t < it->first + it->second.duration;
}
}
Hide solution
Offline Exercise. Implement the function sample_overlaps(start, duration)
, which tests whether a sample with start time start
and
duration duration
would overlap any sample in flapmap
. Implement some
tests of your function. Start by drawing some pictures: How many different
kinds of overlapping samples are there? Then implement the corresponding
tests.
Show solution
bool sample_overlaps(uintptr_t start, size_t duration) {
return has_sample(start)
|| flapmap.lower_bound(start) != flapmap.lower_bound(start + duration);
}
We can count four distinct classes of overlapping, plus some interesting cases
where there is no overlapping. In the diagrams below, ====
represents
existing samples and <--->
represents the queried range.
1. ============
<-------------> overlapping at beginning
2. =============
<-------------> overlapping at end
3. =============================
<-------------> contained within
4. ====
<-------------> containing
5. ==== =====
<-------------> between (should not overlap)
6. ====== ========
<-------------> touching (should not overlap)
These tests cover those classes:
add_sample(0, 5, 1);
add_sample(10, 5, 1);
assert(sample_overlaps(4, 4));
assert(sample_overlaps(8, 4));
assert(sample_overlaps(1, 2));
assert(sample_overlaps(8, 10));
assert(!sample_overlaps(6, 3));
assert(!sample_overlaps(5, 5));
Hide solution
Too many samples
As you test Flap Sensor 5000 in the wild, you encounter a “success disaster”:
the sensor reports so many samples that your program runs out of memory!
Analysis shows that the problem is that the sensor is reporting more
information than you need: the sensor reports a flap count every second, but
your users only need average flap counts per hour. That said, your users
still want to make sure that all sample gaps are preserved (you should not
average over a gap in the sample stream).
Discussion. How would you propose changing the data collection process
to reduce memory usage under these constraints?
Sample coalescing
You decide to implement a sample coalescing strategy: add_sample
will
combine adjacent samples by adding their flap counts together.
For simplicity’s sake, the rest of this section will not limit how many
samples can be combined, but it’s not too difficult to limit coalescing
to at most one hour at a time. And remember, you can assume that samples
never overlap.
Exercise. Implement can_coalesce_up(flapmap_iter it)
, which tests
whether the sample at position it
can be combined with the next sample in
time. If it
has no next sample, can_coalesce_up(it)
should return false
.
Show solution
bool can_coalesce_up(flapmap_iter it) {
assert(it != flapmap.end());
// Check if next sample exists
auto next = it;
++next;
if (next == flapmap.end()) {
return false;
}
return it->first + it->second.duration == next->first;
}
Hide solution
Exercise. Implement coalesce_up(flapmap_iter it)
, which coalesces the
sample at position it
with the next sample in time. You can assume that
can_coalesce_up(it)
.
Show solution
void coalesce_up(flapmap_iter it) {
assert(can_coalesce_up(it));
auto next = it;
++next;
it->second.duration += next->second.duration;
it->second.flapcount += next->second.flapcount;
flapmap.erase(next);
}
Hide solution
Note. It’s important when modifying a container to keep track of a
property called iterator validity, which means whether an existing
iterator into the container remains valid. Different containers have
different validity properties. Maps are relatively generous: after an
insert
operation, all previous iterators into the container remain valid
(they still point at the same element). Iterators are also valid after
map::erase
, except that any iterators pointing at erased elements
become invalid and illegal to access.
Question. Say that flapmap
is completely coalesced, meaning that
there are no adjacent samples in flapmap
(so can_coalesce_up(it)
would
return false
for every iterator it
in the map). A new sample arrives for
times [start
, start+duration
). How many coalesce_up
operations would be
required to ensure that the new flapmap
was completely coalesced?
Show solution
Two, if samples can arrive out of order! For example, consider these samples:
add_sample(0, 10, 1);
add_sample(20, 10, 1);
// map now contains samples [0, 10) and [20, 30)
add_sample(10, 10, 1);
// map should contain just one sample, for [0, 30): this requires two coalescings
Hide solution
Exercise. Implement can_coalesce_down(flapmap_iter it)
, which tests
whether the sample at position it
can be combined with the previous
sample in time. If it
has no previous sample, can_coalesce_down(it)
should return false
.
Show solution
bool can_coalesce_down(flapmap_iter it) {
assert(it != flapmap.end());
// Check if previous sample exists
if (it == flapmap.begin()) {
return false;
}
auto prev = it;
--prev;
return prev->first + prev->second.duration == it->first;
}
Hide solution
Exercise. Implement a version of add_sample
that maximally coalesces
the new sample with adjacent samples.
Show solution
This version is probably easiest to read, which is a really important property.
void add_sample(uintptr_t start, size_t duration, size_t flapcount) {
// Strategy: first insert, then coalesce
flapmap.insert({start, {start, duration, flapcount}});
auto it = flapmap.find(start);
while (can_coalesce_down(it)) {
--it;
}
while (can_coalesce_up(it)) {
coalesce_up(it);
}
}
Hide solution
Discussion. Can you think of any ways to optimize this coalescing
add_sample
? Would you recommend implementing them?
Show solution
The version above isn’t the absolute fastest: it creates a new sample object
even if the new sample can be coalesced immediately. The simplest optimization
might be to check if the new sample can be coalesced with the most recent
sample in the map. That case seems likely to happen often, since samples will
generally arrive in increasing order.
void add_sample_faster_common_case(uintptr_t start, size_t duration, size_t flapcount) {
if (!flapmap.empty()) {
auto it = flapmap.end();
--it;
if (it->first + it->second.duration == start) {
it->second.duration += duration;
it->second.flapcount += flapcount;
return;
}
}
add_sample_original(start, duration, flapcount);
}
With a small tweak, this can coalesce samples with previous samples even if
samples are inserted out of order.
void add_sample_faster(uintptr_t start, size_t duration, size_t flapcount) {
auto it = flapmap.lower_bound(start);
if (it != flapmap.begin()) {
--it;
if (it->first + it->second.duration == start) {
it->second.duration += duration;
it->second.flapcount += flapcount;
return;
}
}
add_sample_original(start, duration, flapcount);
}
But that version doesn’t coalesce maximally! A sample that fills a gap will be
coalesced down, but not up.
We can do better by combining the original add_sample
code with our faster
version, reasoning carefully about where iterators point. Here’s the code:
void add_sample_integrated(uintptr_t start, size_t duration, size_t flapcount) {
auto it = flapmap.lower_bound(start);
// `it` points at the first sample with start time >= start.
// Change `it` to point at the previous sample: if our sample can be coalesced
// down, that’ll be the one we want.
if (it != flapmap.begin()) {
--it;
}
// Now `it` points at the *last* sample with start time *less than* start,
// if there is such a sample.
// Try coalescing down.
if (it != flapmap.end()
&& it->first + it->second.duration == start) {
it->second.duration += duration;
it->second.flapcount += flapcount;
// Now `it` points at the sample containing the new sample.
} else {
// Can’t coalesce down, so insert.
flapmap.insert({start, {start, duration, flapcount}});
it = flapmap.find(start);
// Now `it` points at the sample containing the new sample.
// (The following line is equivalent to `insert + find`, but even faster,
// because insert_or_assign returns the relevant iterator:)
// it = flapmap.insert_or_assign(start, {start, duration, flapcount});
}
// `it` always points at the new sample.
// Coalesce up.
while (can_coalesce_up(it)) {
coalesce_up(it);
}
}
Is this worth it? Well, that’s a difficult question. I’d say no,
unless you experience performance problems with the simpler solution.
Sometimes it’s emotionally hard to avoid optimizing, but before doing something
complicated, you should always develop a good testing
strategy.
Hide solution
Discussion. How does the Cloud Flap Sensor 5000 problem relate to
Problem Set 1?