This is not the current version of the class.

Datarep Section 1: C++ data structures

This section covers concepts from the C++ standard library, especially container data structures and iterators, using an example that might remind you of part of Problem Set 1.

Prework: Container data structures and std::map

Simultaneously enrolled college students are responsible for attending section live (in-person or via zoom). If you cannot attend any section live in a particular week, you may watch a recording (Canvas > Zoom > Cloud Recordings) instead. Please self-report your live attendance (or that you watched a recording) using this attendance form.

How section works

We release section material via the CS 61 course site (where you’re reading this now) and the cs61-sections repository (obtain a local copy with a command like git clone git@github.com:cs61/cs61-sections). TFs will work through the section material at their own pace, stopping at exercises, questions, and discussions so students can work through the material together. It’s best if you bring your laptop. Solutions for most section problems are available here, but you will learn more if you try to solve the problems yourself first.

Section works well if students ask questions, and sometimes TFs will take the section in an unexpected direction based on student requests. Please speak up! If you have questions from lecture, now’s the time to ask.

Students will be expected to have digested all material from section handouts.

Container data structures (prework and reference)

C++’s standard containers library (ref1, ref2, ref3) offers several powerful container data structures, including:

Container data structures have similar APIs, and the algorithms library (ref1, ref2) offers generic algorithms that can operate on many containers.

C++ container APIs resemble APIs in other languages, but do have some unique features, especially in the way they use iterators (ref) to represent positions in a data structure.

In this section, we’ll concentrate on std::map.

std::map

std::map<K, V> (ref1, ref2, ref3) is a container that maps keys of type K to values of type V. The key and value types can be almost any C++ types, but K objects must support comparison with the less-than operator <. (Integers, pointers, and C++ std::strings all support <; so do many other types.)

This code creates an empty map, then prints every element of the map (there are none, so it prints nothing).

std::map<uintptr_t, const char*> m;

for (auto it = m.begin(); it != m.end(); ++it) {
    fprintf(stderr, "key %zu, value %s\n", it->first, it->second);
}

The it object is an iterator that represents a position in, or cursor into, the map. A map iterator “points at” a key–value pair; the key is in it->first, the value in it->second. (The iterator is actually an object, not a pointer, but it supports pointer notation anyway.)

Several map functions return iterators. The m.begin() iterator points at the first key–value pair in the map. The special m.end() iterator represents the absence of an item. Its position is “one past” of all the valid positions in the map; it’s illegal to refer to m.end()->first or m.end()->second. There’s also m.find(K), which returns an iterator pointing at the element with a given key, and m.lower_bound(K) and a.upper_bound(K), which return iterators pointing at elements nearest a given key (in a sense defined more precisely below).

Iterators can also be moved and compared. Move an iterator to the next position with ++it (as you see above), and to the previous position with --it. Compare map iterators with == and !=; two iterators are equal if and only if they point to the same element.

The type of m.begin() is std::map<uintptr_t, const char*>::iterator. Ugh! Rather than type that whole thing, in most situations you can just say auto and the compiler will figure it out.

Insert elements with the insert member function, which takes a key–value pair argument:

#include <map> // gain access to `std::map`
#include <cstdio>
#include <cstdint> // gain access to `uintptr_t`

int main() {
    std::map<uintptr_t, const char*> m;
    m.insert({0, "Hello"});
    m.insert({2, "!"});
    m.insert({1, "world"});

    for (auto it = m.begin(); it != m.end(); ++it) {
        fprintf(stderr, "key %zu, value %s\n", it->first, it->second);
    }
}

// ==> key 0, value Hello
//     key 1, value world
//     key 2, value !

insert({K, V}) does nothing if the element with key K already exists. If you want to modify any existing element, use m.insert_or_assign(K, V) or, shorter, m[K] = V.

Question. m.find(K) returns an iterator pointing at the element with key K. What does it return if K isn’t contained in the map? It’s got to return something!

Show solution

Question. How can you tell whether an element with key K is in a map?

Show solution

Question. Write a test program to help figure out the exact sense in which m.lower_bound(K) returns an iterator pointing at the element “nearest” a given key.

Show solution

Question. What does m.lower_bound(K) return?

Show solution

Question. What does m.upper_bound(K) return?

Show solution

Question. Say that m.lower_bound(k) == m.upper_bound(k) for some key k. What can you infer about the contents of map m?

Show solution

Erase elements with erase, which takes either a key or one or two iterators. m.erase(K) erases the element with key K, if there is one. m.erase(FIRST, LAST) erases all elements between iterator FIRST and iterator LAST; m.erase(m.begin(), m.end()) clears the container. m.erase(IT) erases the single element at iterator IT.


Cloud Flap Sensor 5000

Flappy birds flapping

Source image from Shutterstock

In this section, you’ll help build the cloud storage back end for the Flappy Bird Company’s new Cloud Flap Sensor 5000. This high-tech sensor straps onto a bird’s back and counts the number of times it flaps its wings. When possible, the sensor sends the number of recent flaps back to your server over the cell phone network.

A sample from the sensor has three components:

  1. The start time, an integer measured in seconds since January 1, 1970.
  2. The duration, an integer number of seconds.
  3. The flap count, which is the integer number of times the bird flapped its wings between time start and time start + duration.

(A real sensor message would have more components, like a sensor ID, but we’ll assume for now there’s only one interesting bird.)

The sensor typically sends messages with increasing start time and no gaps: a message with start == S and duration == D would be followed by a message with start == S+D. However, birds go all over the place, so sometimes messages are lost and there will be gaps. A valid sensor will never generate samples that overlap in time or samples with zero duration.

These are the important functions:

void add_sample(uintptr_t start, size_t duration, size_t flapcount);
bool has_sample(uintptr_t t);

(uintptr_t and size_t are types of unsigned integer, where “unsigned” means they cannot represent negative numbers. On x86-64 machines, they are both synonyms for unsigned long, and they are large enough to represent numbers from 0 to 18,446,744,073,709,551,615. More information.)

You’ll implement add_sample and has_sample, using a global data structure you’ll define.

Question. Think about how would you implement these functions. What data structures might you use? What factors might affect that choice?


Cloud Flap Sensor Map

In the rest of this section, we’ll implement Cloud Flap Sensor 5000 using std::map. Here’s our proposed type for the global data structure (see in datareps1/flapmap.hh):

struct sample {
    uintptr_t start;
    size_t duration;
    size_t flapcount;
};

std::map<uintptr_t, sample> flapmap;
using flapmap_iter = std::map<uintptr_t, sample>::iterator; // less typing later

The flapmap holds every sample received from the sensor, indexed by start time.

Exercise. Implement add_sample(uintptr_t start, size_t duration, size_t flapcount) in flapmap1.cc. Test your work.

Show solution

Question. What does your add_sample do with a duplicate sample (meaning a sample with the same start time as a previously-recorded sample)?

Show solution

(Offline) Discussion. If the Flap Sensor 5000 were to break, it might start reporting invalid samples. It’s important to think about robustness while you program. What kinds of arguments to add_sample might indicate a broken sensor? What should add_sample do when passed an invalid sample?

Show solution

Question. How can you tell if a sample with start time start is in the map?

Show solution

Question. has_sample is answering a slightly different question than whether a sample with start time start is in the map. How is it different?

Show solution

Exercise. Implement has_sample(t). Test your work.

Show solution

Offline Exercise. Implement the function sample_overlaps(start, duration), which tests whether a sample with start time start and duration duration would overlap any sample in flapmap. Implement some tests of your function. Start by drawing some pictures: How many different kinds of overlapping samples are there? Then implement the corresponding tests.

Show solution

Too many samples

As you test Flap Sensor 5000 in the wild, you encounter a “success disaster”: the sensor reports so many samples that your program runs out of memory!1

Analysis shows that the problem is that the sensor is reporting more information than you need: the sensor reports a flap count every second, but your users only need average flap counts per hour. That said, your users still want to make sure that all sample gaps are preserved (you should not average over a gap in the sample stream).

Discussion. How would you propose changing the data collection process to reduce memory usage under these constraints?


Sample coalescing

You decide to implement a sample coalescing strategy: add_sample will combine adjacent samples by adding their flap counts together.

For simplicity’s sake, the rest of this section will not limit how many samples can be combined, but it’s not too difficult to limit coalescing to at most one hour at a time. And remember, you can assume that samples never overlap.

Exercise. Implement can_coalesce_up(flapmap_iter it), which tests whether the sample at position it can be combined with the next sample in time. If it has no next sample, can_coalesce_up(it) should return false.

Show solution

Exercise. Implement coalesce_up(flapmap_iter it), which coalesces the sample at position it with the next sample in time. You can assume that can_coalesce_up(it).

Show solution

Note. It’s important when modifying a container to keep track of a property called iterator validity, which means whether an existing iterator into the container remains valid. Different containers have different validity properties. Maps are relatively generous: after an insert operation, all previous iterators into the container remain valid (they still point at the same element). Iterators are also valid after map::erase, except that any iterators pointing at erased elements become invalid and illegal to access.

Question. Say that flapmap is completely coalesced, meaning that there are no adjacent samples in flapmap (so can_coalesce_up(it) would return false for every iterator it in the map). A new sample arrives for times [start, start+duration). How many coalesce_up operations would be required to ensure that the new flapmap was completely coalesced?

Show solution

Exercise. Implement can_coalesce_down(flapmap_iter it), which tests whether the sample at position it can be combined with the previous sample in time. If it has no previous sample, can_coalesce_down(it) should return false.

Show solution

Exercise. Implement a version of add_sample that maximally coalesces the new sample with adjacent samples.

Show solution

Discussion. Can you think of any ways to optimize this coalescing add_sample? Would you recommend implementing them?

Show solution

Discussion. How does the Cloud Flap Sensor 5000 problem relate to Problem Set 1?


  1. This is not realistic unless the sensor is broken. ↩︎