Section 1: C++ data structures

This section covers important parts of the C++ standard library, especially container data structures. We also continue to explore memory and data representation.

How section works

We release section material via the CS 61 course site (where you’re reading this now) and the cs61-sections repository (obtain a local copy with a command like git clone git@github.com:cs61/cs61-sections). TFs will work through the section material at their own pace, stopping at exercises so students can work through the material together. It’s best if you bring your laptop. Solutions for most section problems are available here, but you will learn more if you try to solve the problems yourself first.

Section works well if students ask questions, and sometimes TFs will take the section in an unexpected direction based on student requests. Please speak up! If you have questions from lecture, now’s the time to ask.

Students are expected to have digested all material from section handouts, even if your section didn’t finish.

C++’s Standard Template Library (STL)

C++ comes with a large library of useful data structures, including resizable arrays (std::vector), linked lists (std::list), ordered search trees (std::map), hash tables (std::unordered_map), and sets (std::set and std::unordered_set). It also comes with a library of useful algorithms, including std::sort (sorting) and std::lower_bound (binary searching). You may notice these structures and algorithms in handout code, and you may want to use these data structures yourself.

`std::vector` (resizable array)

std::vector<T> represents an array of objects of type T that can dynamically change size. Access into std::vector is as fast as access into a normal array, even though elements can be added and removed at runtime.

Here are some examples of basic std::vector methods.

s01/vector1.cc

#include <vector>     // include declaration of `std::vector`
#include <cstdio>
#include <cassert>

int main() {
    // Create a vector containing specific elements
    std::vector<int> a = { 1, 2, 3, 4 };

    // Create a vector containing 12 copies of the `long` value `61`
    std::vector<long> a2(12, 61);

    int e2 = a[2];    // get element
    assert(a[2] == 3);

    a[2] = 4;         // set element
    assert(a[2] == 4);

    a.size();         // return number of elements
    assert(a.size() == 4);

    a.push_back(5);   // add element to the end
    assert(a.size() == 5);
    assert(a[4] == 5);

    a.back();         // return last element (must not be empty)
    assert(a.back() == 5);

    a.pop_back();     // remove last element (but do not return it)
    assert(a.size() == 4);

    bool e = a.empty(); // return true iff `size() == 0`
    assert(!e);

    // Print vector contents
    printf("a: [");
    for (size_t i = 0; i != a.size(); ++i) {
        printf("%s%d", i ? ", " : "", a[i]);
    }
    printf("]\n");

    a.clear();        // erase all elements
    assert(a.empty());

    printf("Done!\n");
}

Go to the cs61-sections/s01 directory, run make, and then run ./vector1 to check that the assertions all succeed.

Exercise. Write code that creates an empty vector of longs called a.
std::vector<long> a;

Exercise. Write code that creates a vector containing 5 copies of the C string "HI". Try to do this at least three different ways. Use the compiler to check for syntax errors—for instance, put your code in vector2.cc and run make vector2.
The type of C strings is const char*. Here are three different ways; there are more:
std::vector<const char*> v1(5, "HI");

std::vector<const char*> v2{"HI", "HI", "HI", "HI", "HI"};

std::vector<const char*> v3
while (v3.size() != 5) {
    v3.push_back("HI");
}

Learn more:

Tutorial on std::vector from learncpp.com

Tutorial on std::vector from riptutorial.com

Reference material (comprehensive but scary): cplusplus.com, cppreference.com

Common methods on containers (reference)

Many C++ container types have methods with similar names. Here’s what they do.

container constructor: A “constructor” in C++ is a function that initializes an object. (This terminology is common to other object-oriented languages, such as Java.) Containers typically support several constructors that take different parameters. The empty constructor, which is called for a declaration like std::vector<T> a or std::vector<T> a{} with no arguments, leaves the container empty. An “initializer list,” such as std::vector<T> a = {x, y, z} or std::vector<T> a{x, y, z}, fills the container with the given elements.
container.size(): Return the number of elements in the container. The return type is usually size_t.
container.empty(): Return true iff the container is empty (has size() == 0).
container.clear(): Remove all elements from the container, leaving it empty().
container[index]: Return a modifiable reference to the element of the container at position index. “Modifiable reference” means the element value can be either read (as in e = v[i]) or written (as in v[i] = e). This method is only supported on containers where access by index is efficient, such as vectors, maps, and hash tables (but not linked lists).
container.back(): Return the last element in the container. This method is only supported on containers that represent a sequence, such as vectors and linked lists (but not maps and hash tables).
container.push_back(value): Add a new element to the end of the container, containing a copy of value.
container.pop_back(): Remove the last element in the container.

Iterators

C++ containers and algorithms rely on the iterator abstraction. An iterator indicates a current position in the container; the value at that position can be found by “dereferencing” the iterator as if it were a pointer. While many C++ container methods have close analogues in other programming languages, C++ iterators are their own thing. They are quite powerful and efficient.

Here’s the usual idiom for looping over the elements of a vector. (Iterators have long, complicated type names, so we usually use auto to avoid giving the whole type name: auto it = a.begin();)

std::vector<int> a = ...;
for (auto it = a.begin(); it != a.end(); ++it) {
    printf("%d\n", *it);
}

Because iterator methods have similar names for all containers, the exact same loop will work for a linked list or set, and related loops will work for maps and hash tables!

C++ also supports a “for-each” loop, written this way:

for (auto& elt : a) {
    printf("%d\n", elt);
}

But the for-each loop compiles into the iterator version behind the scenes.

What does the & in auto& elt mean? The & means that elt should be initialized by reference. As a result, in the body of the loop, elt refers to the actual elements of the vector, not copies of those elements. To understand the difference, consider this:
std::vector<int> a{1, 2, 3};
for (auto elt : a) {
    // `elt` is initialized as a copy of each element in turn
    elt *= 2; // modifies copies of the elements; the modifications are lost
}
assert(a == std::vector<int>{1, 2, 3}); // vector elements unchanged

for (auto& elt : a) {
    // `elt` is initialized as a REFERENCE (name, or alias) to each element in term
    elt *= 2; // modifies actual elements
}
assert(a == std::vector<int>{2, 4, 6}); // vector elements doubled
References are common in advanced C++. Anything you can do with references can be done through pointers instead, but references are safer than pointers and more like the calling conventions in other languages.

The most important iterator methods are:

container.begin(): Return an iterator pointing to the first element in the container.
container.end(): Return an iterator pointing “one past the end” of the container. (This is analogous to array sizes: the size of an array is “one past” the last valid index.) container.end() must never be dereferenced. If a container is empty, container.begin() == container.end().
++iterator: Move the iterator to point at the next element of the container. Note that you must not call ++it when it == container.end().
--iterator: Move the iterator to point at the previous element of the container. You must not call --it when it == container.begin().

Exercise. Write a loop that prints every other member of a vector of ints named a (so indexes 0, 2, 4, and so forth), but use iterators rather than indexes. This is trickier than it might appear!
Here’s one correct way.
auto it = a.begin();
while (it != a.end()) {
    printf("%d\n", *it);
    ++it;
    if (it != a.end()) {   // must not advance past end
        ++it;
    }
}
This simpler code isn’t correct: try running it on a vector with an odd number of elements and see what happens!
for (auto it = a.begin(); it != a.end(); it += 2) {
    printf("%d\n", *it);
}
(If you change the for loop’s test to it < a.end() it appears to work, but it’s still forms an iterator after a.end() which is illegal, so that version might break in a future release of the library or compiler.)

Another correct method uses a flag that tracks whether to print.
bool print = true;
for (auto it = a.begin(); it != a.end(); ++it) {
    if (print) {
        printf("%d\n", *it);
    }
    print = !print;
}

Home Exercise. Complete this function:
/** Return an iterator that points at the first occurrence of `value` in `a`.
 * For example, if `a` holds values [1, 2, 3], then `find_first(a, 1)`
 * should return `a.begin()`. */
std::vector<int>::iterator find_first(std::vector<int>& a, int value) {
    ...
}
You should decide what to return if value is not present in a!
std::vector<int>::iterator find_first(const std::vector<int>& a, int value) {
    auto it = a.begin();
    while (it != a.end() && *a != value) {
        ++it;
    }
    return it;
}
We return a.end() when the value is not found. A generic version of this function is already provided for you in the standard library, as std::find; it works for any element type and any container.

Iterator computations (reference)

Since std::vector’s underlying array representation allows efficient access at any index, vector iterators can efficiently be moved around by arbitrary distances. Such iterators are called random-access iterators.

To move a random-access iterator, you use syntax that looks like arithmetic.

iterator + count: Return an iterator that points count positions forward relative to iterator. For example, if it points at the 7th element of a vector, then it + 5 points at the 12th.
iterator - count: Return an iterator that points count positions backward relative to iterator.
it += count: Move it forward count positions.
it -= count: Move it backward count positions.
it1 - it2: (it1 and it2 must point into the same vector.) Return the number of positions from it2 to it1. For instance, (it + 5) - it == 5, and v.end() - v.begin() == v.size().

Modifying vectors

The following methods can be used to modify vectors. Many of them also apply to other containers.

container.insert(iterator, value): Inserts a copy of value into the container immediately before iterator, shifting *iterator and all later elements up. container.insert(container.end(), value) does the same thing as container.push_back(value). Returns an iterator pointing at the newly inserted element.
container.insert(iterator, {value1, value2, value3, ...}): Adds the values to the container before iterator.
container.erase(iterator): Removes the element pointed to by iterator, shifting all later elements down. Returns an iterator pointing at the first non-removed element, or end() if there is no such element; the input iterator becomes invalid.
container.erase(first, last): (first and last are iterators into the container.) Removes the elements from first up to, but not including, last, shifting all later elements down. container.erase(container.begin(), container.end()) does the same thing as container.clear().
vector.resize(size): Resize the vector to contain size elements. If size is bigger than the current size, adds new elements as required. vector.resize(0) does the same thing as vector.clear().

Growing a vector can force reallocation of the underlying storage for elements. When this happens, all previously-existing iterators into the vector are invalidated (they point into garbage memory and dereferencing them is illegal). This is why it‘s often important to use the return value from vector.insert() or vector.erase().

Question. Fill in the ?s. Check your answers by modifying vectorq.cc and running make vectorq && ./vectorq.

s01/vectorq.cc

std::vector<int> v = { 1, 2, 3, 4, 5 };

assert(v.size() == ?);
assert(v[3] == ?);

v.push_back(6);

assert(v.size() == ?);
assert(v.back() == ?);

v.erase(v.begin(), v.begin() + 1);

assert(v.size() == ?);
assert(v.back() == ?);
assert(v[0] == ?);

v.insert(v.begin() + 1, 10);
assert(v.size() == ?);
assert(v[0] == ?);
assert(v[1] == ?);

std::vector<int> v = { 1, 2, 3, 4, 5 };

v.size() == 5
v[3] == 4

v.push_back(6);

v.size() == 6
v.back() == 6

v.erase(v.begin(), v.begin() + 1);

v.size() == 5
v.back() == 6
v[0] == 2

v.insert(v.begin() + 1, 10);
v.size() == 6
v[0] == 2
v[1] == 10

Exercise. Complete this function:

/** Insert `value` into `a`, preserving the sort order. For example,
 * if `a` holds [1, 3, 4], then after `insert_sorted(a, 2)`, `a` should
 * hold [1, 2, 3, 4]. `a` must be sorted on input. */
void insert_sorted(std::vector<int>& a, int value) {
    ...
}

Exercise. Write a set of tests for your insert_sorted function. Here’s an example test to get you started:
void test_insert_sorted() {
    std::vector<int> a{};
    insert_sorted(a, 1);

    std::vector<int> expected = {1};
    assert(a == expected);   // vectors have the same elements in the same order
}
When writing tests, think about situations that might be difficult to get right (edge cases). What are some edge cases for insert_sorted? Another great way to write tests is to generate tests randomly! Computers are very fast, and can run a ton of tests in very little time. This process is sometimes called fuzzing.
Here are some example edge cases that might be hard to get right:

Inserting into an empty vector

Inserting a duplicate element

Inserting before the first element

Inserting after the last element

An is_sorted function that checks whether a vector is in sorted order can be useful for testing. One kind of random test might involve inserting random integers into a vector, then asserting that the vector is_sorted. That doesn’t check that the vector contains the right elements, or the right number of elements, but it does check that the elements in the vector are sorted.
bool is_sorted(std::vector<int>& a) {
    // search for an adjacent pair that’s out of sort order
    size_t pos = 0;
    while (pos + 1 < a.size() && a[pos] <= a[pos + 1]) {
        ++pos;
    }
    // OK if we reach the end before finding a misordered pair
    return pos + 1 >= a.size();
}
This function fully checks a call to insert_sorted by comparing its results with an alternate implementation. You wouldn’t want to use that implementation—it’s expensive!—but it’s obviously correct and can serve as ground truth. This kind of technique is very powerful.
void checked_insert_sorted(std::vector<int>& a, int value) {
    // Alternate implementation: push value at end, then sort
    std::vector<int> copy_of_a = a;
    copy_of_a.push_back(value);
    std::sort(copy_of_a.begin(), copy_of_a.end());

    // Implementation we want to check
    insert_sorted(a, value);

    // Validate by comparing results
    assert(a == copy_of_a);
}
And here’s a random tester that uses check_insert_sorted. If this tester succeeds, we should feel pretty confident that insert_sorted is correct.
#include <cstdio>
#include <cstdlib>
#include <cassert>
#include <vector>
#include <algorithm>

... insert_sorted ...
... checked_insert_sorted ...

int main() {
    // Create 1 million sorted vectors by checked insertion of random elements.
    // The vectors have random sizes from 50 to 99.
    for (int round = 0; round != 1000000; ++round) {
        size_t want_size = 50 + rand() % 50;
        std::vector<int> a;
        for (size_t i = 0; i != want_size; ++i) {
            checked_insert_sorted(a, rand());
        }
    }
}
But randomness might be likely to miss some cases you care about. Since we’re inserting random values chosen from a wide range (rand() returns numbers between 0 and 2,147,483,647), this tester is unlikely to validate duplicate values. It’s often a good idea to supplement random testing with your own edge cases.

Exercise. Complete this function, which should remove any elements of v that are even.
void remove_even_elements(std::vector<int> &v) {
    ...
}
Warning: This is tricky because the vector’s erase operation invalidates iterators!
To solve this problem using array indexes, you might write this:
void remove_even_elements(std::vector<int> &v) {
    for (size_t i = 0; i != v.size(); ++i) {
        if (v[i] % 2 == 0) {
            v.erase(v.begin() + i);
            --i;    // removed current element, which shifted future elements down;
                    // need to check this index again!
        }
    }
}
But doing this with iterators is tricky, because after v.erase(it), it becomes invalid and cannot be incremented or decremented! This iterator solution works by taking advantage of v.erase’s return value:
void remove_even_elements(std::vector<int> &v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (v[i] % 2 == 0) {
            it = v.erase(it);
        } else {
            ++i;
        }
    }
}
This pattern is also useful for lists, maps, and so forth.

Advanced Home Exercise. Write a version of insert_sorted that is efficient on large vectors. For example, you might use binary search to find the insertion position.

Vector representation

Exercise. A vector can change size as the program runs. In which segment do you think the elements of the vector are most likely to be stored?

In the heap. Dynamically-sized data must be stored in the heap! The stack, code, and data segments contain fixed-sized objects.

(That said, an optimized representation could reserve a fixed-size buffer for vectors containing small numbers of elements, and that buffer might be on the stack or in the data segment. A version of this “small-buffer optimization” is commonly implemented for strings).

Exercise. Use hexdump_object to print out the representation for a (non-empty) vector object. Also print out the representations of a vector iterator (it) and a vector element (*it or v[idx]), for all positions in the vector; and print out the representation of v.end(). What conclusions can you draw? How is a vector represented in memory?
Here’s some example code that prints these hexdumps:
std::vector<int> v = { 1, 2, 3, 4 };
printf("v:\n");
hexdump_object(v);
for (auto it = v.begin(); it != v.end(); ++it) {
    printf("element %zu:\n", it - v.begin());
    hexdump_object(it);
    hexdump_object(*it);
}
printf("end:\n");
auto it = v.end();
hexdump_object(it);
And some results, from Docker:
v:
7ffe534ca380  b0 3e 89 02 06 56 00 00  c0 3e 89 02 06 56 00 00  |.>...V...>...V..|
7ffe534ca390  c0 3e 89 02 06 56 00 00                           |.>...V..|
element 0:
7ffe534ca378  b0 3e 89 02 06 56 00 00                           |.>...V..|
560602893eb0  01 00 00 00                                       |....|
element 1:
7ffe534ca378  b4 3e 89 02 06 56 00 00                           |.>...V..|
560602893eb4  02 00 00 00                                       |....|
element 2:
7ffe534ca378  b8 3e 89 02 06 56 00 00                           |.>...V..|
560602893eb8  03 00 00 00                                       |....|
element 3:
7ffe534ca378  bc 3e 89 02 06 56 00 00                           |.>...V..|
560602893ebc  04 00 00 00                                       |....|
end:
7ffe534ca378  c0 3e 89 02 06 56 00 00                           |.>...V..|
And some observations based on those results.

In this implementation, the vector has size 24 bytes. It is a local variable, and is stored in the stack segment (we can tell from the very high addresses).

The vector’s contents are stored on the heap. (Each element has an address like 0x5606002893exx, which is far less than stack addresses but far greater than code or data segment addresses.)

Elements are stored contiguously and sequentially in array style. Element I is stored at location 0x5606002893eb0+4*I.

An iterator takes just 8 bytes to store. The iterator itself is a local variable and is stored on the stack (its location has a stack address, 0x7ffe534ca378). Its contents look like a pointer! When the iterator is pointing at element I, the content of the iterator is the address of that element! This relationship—the close correspondence between iterators into the vector and pointers at elements—is key to the efficiency of C++’s standard library.

The vector itself appears to contain three pointers. The first pointer is a pointer to the first element—it equals v.begin(). The second pointer is a pointer to one past the end element—it equals v.end(). Some more experiments can reveal that the third pointer is a “capacity” pointer indicating how much additional space has been reserved for growth; in this vector, no extra space has been reserved, so the capacity pointer equals the end pointer. Note that the vector’s size is not explicitly stored; v.size() must compute the result from the end and begin pointers.

Correspondences with other programming languages

C++	Python	Javascript	Java	PHP
`std::vector<T>`	`list`	`Array`	`ArrayList<T>`	`array`
`v.size()`	`len(l)`	`a.length`	`al.size()`	`count($a)`
`v.empty()`	`not l`	`a.length === 0`	`al.isEmpty()`	`empty($a)`
`v.clear()`	`l.clear()`	`a.length = 0`	`al.clear()`	`$a = []`
`v[idx]`	`l[idx]`	`a[idx]`	`al.get(idx)`	`$a[$idx]`
`v.back()`	`l[-1]`	`a[a.length - 1]`	`al.get(al.size() - 1)`	`$a[count($a) - 1]`
`v.push_back(e)`	`l.append(e)`	`a.push(e)`	`al.add(e)`	`array_push($a, $e)` or `$a[] = $e`
`v.pop_back()`	`l.pop()`	`a.pop()`	`al.remove(al.size() - 1)`	`array_pop($a)`
`v.insert(it, value)`	`l.insert(idx, value)`	`a.splice(idx, 0, value)`	`al.add(idx, value)`	`array_splice($a, $idx, 0, $value)`
`v.erase(it1, it2)`	`del l[idx1:idx2]`	`a.splice(idx1, idx2 - idx1)`	`al.removeRange(idx1, idx2)`	`array_splice($a, $idx1, $idx2 - $idx1)`

`std::list` (doubly-linked list)

The std::list type is another sequence structure that implements a doubly-linked list. Its methods resemble std::vector’s, but because indexing into a linked list at a specific position is not efficient, it does not support the index notation l[idx]—you have to use iterators to find a list element.

Reference material (comprehensive but scary): cplusplus.com, cppreference.com

`std::map` (ordered search tree)

std::map<KEY, VALUE> is a search data structure that maps KEYs to VALUEs, or, more precisely, objects of KEY type to objects of VALUE type. For example, std::map<std::string, int> maps std::string (C++ strings) to integers, and could be used to track the number of times different words appear in a text. Objects of KEY type must be comparable, and std::map stores its arguments in key order: if you iterate over the contents of the map, the results are returned in increasing order by key.

The contents of std::map are key–value pairs, implemented as std::pair<const KEY, VALUE> objects. std::pair is like a Python tuple: it has exactly two components, first and second, that can have different types.

Since std::map indexes by key, rather than position, its insert method looks a little different than std::vector’s. It also offers efficient find and count methods for finding or counting elements by key.

map.insert({key, value}): Insert a key-value pair into the map. (In this context, the {key, value} syntax creates a std::pair.) Does nothing if key is already present in the map. Returns a pair std::pair<iterator, bool>, where the iterator component is an iterator pointing at the entry for key, and the bool is true iff the element was inserted.
map[key] = value
map.insert_or_update({key, value}): Set key to value in the map. Unlike map.insert({key, value}), these operations always install the new value.
map.count(key): Return the number of map elements that with key key. For std::map, this is always 0 or 1.
map.find(key): Return an iterator pointing at the map element with key key, or map.end() if there is no such element.
map.erase(key): Remove the element with key key, if there is one. Returns the number of elements removed, which is 0 or 1. (Note that std::map also supports erase methods that take iterator arguments.)

Note that unlike with vector, changing the size of a map will not invalidate all iterators into the map.

s01/map1.cc

#include <map>
#include <string>
#include <cstdio>
#include <cassert>

int main() {
    // Map strings to integers
    std::map<std::string, int> m;

    // Insert without overwriting (leaves map unchanged if key present)
    m.insert({"one", 1});

    // Number of keys in map
    assert(m.size() == 1);
    assert(!m.empty());
    printf("m.size = %zu\n", m.size());

    // Test if key is in map
    size_t exists = m.count("one");
    assert(exists == 1);

    exists = m.count("two");
    assert(exists == 0);

    // Find matching element; returns `m.end()` if not found
    auto it0 = m.find("one");
    assert(it0 != m.end());
    // Iterator points to a key-value pair: `first` is the key, `second` the value
    assert(it0->first == "one");
    assert(it0->second == 1);
    printf("m.find(\"one\") -> %d\n", it0->second);

    auto it1 = m.find("two");
    assert(it1 == m.end());

    // Array syntax inserts or modifies
    m["one"] = 61;               // Insert into map (with overwrite semantics)
    assert(m["one"] == 61);
    // But beware; array syntax inserts a default if not found!
    int two_value = m["two"];
    assert(two_value == 0);
    assert(m.size() == 2);
    assert(m.find("two") != m.end());

    // Remove key
    m.erase("two");
    assert(m.size() == 1);

    // Iterate in sorted order
    m.insert({"two", 2});
    m.insert({"three", 3});
    m.insert({"four", 4});
    m.insert({"five", 5});
    for (auto it = m.begin(); it != m.end(); ++it) {
        // `it->first` is the key, `it->second` the value
        // (`it->first.c_str()` transforms a C++ string to printf form)
        printf("Found %s -> %d\n", it->first.c_str(), it->second);
    }

    printf("Done!\n");
}

Go to the cs61-sections/s01 directory, run make, and then run ./map1 to check that the assertions all succeed.

Exercise. Fill in the ?s. Check your answers by modifying mapq.cc and running make mapq && ./mapq.

s01/mapq.cc

std::map<int, int> m;

m.insert({1, 2});
assert(m[1] == ?);
assert(m.count(1) == ?);
assert(m.count(2) == ?);
assert(m.size() == ?);

int x = m[2];
assert(x == ?);
assert(m.size() == ?);

m.insert({1, 3});
assert(m[1] == ?);

m.erase(1);
assert(m.size() == ?);

m.insert({1, 3});
assert(m[1] == ?);

Home Exercise. Use hexdump_object to print out the representation for a (non-empty) map object. Also print out the representations of a map iterator (it) and element (*it), for all positions in the map; and print out the representation of m.end(). What conclusions can you draw? How is a map represented in memory? (Maps are more complicated than vectors, so don’t knock yourself out—just try to make some observations.)

Learn more:

Tutorial on std::map from riptutorial.com

Reference material (comprehensive but scary): cplusplus.com, cppreference.com

`std::unordered_map` (hash table)

The std::unordered_map<KEY, VALUE> data structure also maps keys to values, but it’s a hash table, not an ordered map, so it can be faster when order is not required. (That is, lookup can be take O(1) time in the number of elements, rather than O(log N) time like std::map.) std::unordered_map has the same methods as std::map, but instead of just requiring comparison on KEYs, it also requires a hash function. If you see a horrible error like

error: static_assert failed "the specified hash does not meet the Hash requirements"

or (EUUUUUGGGGGHHHHHHHHH—this kind of error is one of the very worst aspects of C++)

/usr/include/c++/7/bits/hashtable_policy.h: In instantiation of ‘struct std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > >’:
/usr/include/c++/7/type_traits:143:12:   required from ‘struct std::__and_<std::__is_fast_hash<std::hash<std::pair<int, int> > >, std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > > >’
/usr/include/c++/7/type_traits:154:31:   required from ‘struct std::__not_<std::__and_<std::__is_fast_hash<std::hash<std::pair<int, int> > >, std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > > > >’
/usr/include/c++/7/bits/unordered_map.h:103:66:   required from ‘class std::unordered_map<std::pair<int, int>, int>’
membug8.cc:4:54:   required from here
/usr/include/c++/7/bits/hashtable_policy.h:87:34: error: no match for call to ‘(const std::hash<std::pair<int, int> >) (const std::pair<int, int>&)’

that means that the key type has no hash function yet. You can use a std::map (which doesn’t require a hash function, but features slower lookup), or write your own hash function, using syntax like this:

namespace std {
     template <> struct hash<MY_TYPE> {
         size_t operator()(const MY_TYPE& x) const {
             return ... whatever ...;
         }
     };
}

Here is a pretty good hash function for pairs (based on the one in Boost). Do not ask your TF about it during section but feel free to ask us offline :)

namespace std {
    template <typename T, typename U> struct hash<pair<T, U>> {
        size_t operator()(const pair<T, U>& x) const {
            size_t h1 = std::hash<T>{}(x.first);
            size_t h2 = std::hash<U>{}(x.second);
            size_t k = 0xC6A4A7935BD1E995UL;
            h2 = ((h2 * k) >> 47) * k;
            return (h1 ^ h2) * k;
        }
    };
}

And here is a garbage hash function. You could use it as a placeholder if you like making your computer work hard.

namespace std {
     template <> struct hash<MY_TYPE> {
         size_t operator()(const MY_TYPE& x) const {
             return 0;
         }
     };
}

Exercise. The s01/uomap1.cc file is exactly like map1.cc, but uses std::unordered_map rather than std::map. Run ./uomap1; how does its output differ from ./map1, and why?

`std::set`, `std::unordered_set`

The std::set<T> and std::unordered_set<T> types represent sets of T objects. They are implemented like std::map and std::unordered_map, but since they don’t store values, their iterator types point to T objects rather than pairs and they do not support map[key] bracket notation.

Exercise. Look at s01/uoset1.cc and compare it to s01/uomap1.cc. How is it similar and how is it different?

Standard algorithms

The <algorithm> header defines useful algorithms that work on all kinds of C++ data structures, and even on C-style arrays. Here are some of the best. (The names first and last denote two iterators into the same container; they delimit a range of elements starting at first, and going up to, but not including, last. Most often you will pass container.begin() and container.end() for these arguments.)

std::sort(first, last): Sort the elements of the range. Element objects must be comparable using <.
std::sort(first, last, compare): Sort the elements of the range according to a compare function. compare(e1, e2) should return true iff e1 sorts below e2.
std::find(first, last, value): Return an iterator to the first element of the range that equals value, or last if no element is found.
std::find_if(first, last, predicate): Return an iterator to the first element of the range for which predicate(*it) returns true, or last if no element is found.
std::lower_bound(first, last, value): (The range between first and last must be sorted.) Return an iterator to the first position in the range that is greater than or equal to value, or last if all the elements of the range are less than value. Uses binary search, so it’s efficient if used on a vector or array.

Functions like std::find_if are often given lambda functions as arguments. Lambda functions bring functional programming techniques to C++. Some programming languages have nice syntax for lambda functions; here is a Javascript function that returns true iff its argument is even:

(x) => x % 2 === 0

C++’s syntax is weirder.

[&] (int x) { return x % 2 == 0; }

Here’s a use of that syntax to sort an array of ints by their distance from int center, so that elements e appear in increasing order by \left| e - \texttt{center} \right|.

void sort_by_distance_from_center(std::vector<int>& a, int center) {
    std::sort(a.begin(), a.end(), [&] (int x, int y) {
        return abs(x - center) < abs(y - center);
    });
}