This section covers important parts of the C++ standard library, especially container data structures. We also continue to explore memory and data representation.
How section works
We release section material via the CS 61 course site (where you’re reading
this now) and the cs61-sections repository (obtain a local copy with a
command like git clone git@github.com:cs61/cs61-sections
). TFs will work
through the section material at their own pace, stopping at exercises so
students can work through the material together. It’s best if you bring your
laptop. Solutions for most section problems are available here, but you will
learn more if you try to solve the problems yourself first.
Section works well if students ask questions, and sometimes TFs will take the section in an unexpected direction based on student requests. Please speak up! If you have questions from lecture, now’s the time to ask.
Students are expected to have digested all material from section handouts, even if your section didn’t finish.
C++’s Standard Template Library (STL)
C++ comes with a large library of useful data structures, including resizable
arrays (std::vector
), linked lists (std::list
), ordered search trees
(std::map
), hash tables (std::unordered_map
), and sets (std::set
and
std::unordered_set
). It also comes with a library of useful algorithms,
including std::sort
(sorting) and std::lower_bound
(binary searching). You
may notice these structures and algorithms in handout code, and you may want
to use these data structures yourself.
std::vector
(resizable array)
std::vector<T>
represents an array of objects of type T
that can
dynamically change size. Access into std::vector
is as fast as access into a
normal array, even though elements can be added and removed at runtime.
Here are some examples of basic std::vector
methods.
#include <vector> // include declaration of `std::vector`
#include <cstdio>
#include <cassert>
int main() {
// Create a vector containing specific elements
std::vector<int> a = { 1, 2, 3, 4 };
// Create a vector containing 12 copies of the `long` value `61`
std::vector<long> a2(12, 61);
int e2 = a[2]; // get element
assert(a[2] == 3);
a[2] = 4; // set element
assert(a[2] == 4);
a.size(); // return number of elements
assert(a.size() == 4);
a.push_back(5); // add element to the end
assert(a.size() == 5);
assert(a[4] == 5);
a.back(); // return last element (must not be empty)
assert(a.back() == 5);
a.pop_back(); // remove last element (but do not return it)
assert(a.size() == 4);
bool e = a.empty(); // return true iff `size() == 0`
assert(!e);
// Print vector contents
printf("a: [");
for (size_t i = 0; i != a.size(); ++i) {
printf("%s%d", i ? ", " : "", a[i]);
}
printf("]\n");
a.clear(); // erase all elements
assert(a.empty());
printf("Done!\n");
}
Go to the cs61-sections/s01
directory, run make
, and then run ./vector1
to check that the assertions all succeed.
Exercise. Write code that creates an empty vector of
long
s calleda
.
Exercise. Write code that creates a vector containing 5 copies of the C string
"HI"
. Try to do this at least three different ways. Use the compiler to check for syntax errors—for instance, put your code invector2.cc
and runmake vector2
.
Learn more:
- Tutorial on
std::vector
from learncpp.com- Tutorial on
std::vector
from riptutorial.com- Reference material (comprehensive but scary): cplusplus.com, cppreference.com
Common methods on containers (reference)
Many C++ container types have methods with similar names. Here’s what they do.
container
constructor- A “constructor” in C++ is a function that initializes an object. (This
terminology is common to other object-oriented languages, such as Java.)
Containers typically support several constructors that take different
parameters. The empty constructor, which is called for a declaration like
std::vector<T> a
orstd::vector<T> a{}
with no arguments, leaves the container empty. An “initializer list,” such asstd::vector<T> a = {x, y, z}
orstd::vector<T> a{x, y, z}
, fills the container with the given elements. container.size()
- Return the number of elements in the container. The return type is usually
size_t
. container.empty()
- Return true iff the container is empty (has
size() == 0
). container.clear()
- Remove all elements from the container, leaving it
empty()
. container[index]
- Return a modifiable reference to the element of the container at position
index
. “Modifiable reference” means the element value can be either read (as ine = v[i]
) or written (as inv[i] = e
). This method is only supported on containers where access by index is efficient, such as vectors, maps, and hash tables (but not linked lists). container.back()
- Return the last element in the container. This method is only supported on containers that represent a sequence, such as vectors and linked lists (but not maps and hash tables).
container.push_back(value)
- Add a new element to the end of the container, containing a copy of
value
. container.pop_back()
- Remove the last element in the container.
Iterators
C++ containers and algorithms rely on the iterator abstraction. An iterator indicates a current position in the container; the value at that position can be found by “dereferencing” the iterator as if it were a pointer. While many C++ container methods have close analogues in other programming languages, C++ iterators are their own thing. They are quite powerful and efficient.
Here’s the usual idiom for looping over the elements of a vector. (Iterators
have long, complicated type names, so we usually use auto
to avoid giving
the whole type name: auto it = a.begin();
)
std::vector<int> a = ...;
for (auto it = a.begin(); it != a.end(); ++it) {
printf("%d\n", *it);
}
Because iterator methods have similar names for all containers, the exact same loop will work for a linked list or set, and related loops will work for maps and hash tables!
C++ also supports a “for-each” loop, written this way:
for (auto& elt : a) {
printf("%d\n", elt);
}
But the for-each loop compiles into the iterator version behind the scenes.
What does the
&
inauto& elt
mean? The&
means thatelt
should be initialized by reference. As a result, in the body of the loop,elt
refers to the actual elements of the vector, not copies of those elements. To understand the difference, consider this:std::vector<int> a{1, 2, 3}; for (auto elt : a) { // `elt` is initialized as a copy of each element in turn elt *= 2; // modifies copies of the elements; the modifications are lost } assert(a == std::vector<int>{1, 2, 3}); // vector elements unchanged for (auto& elt : a) { // `elt` is initialized as a REFERENCE (name, or alias) to each element in term elt *= 2; // modifies actual elements } assert(a == std::vector<int>{2, 4, 6}); // vector elements doubled
References are common in advanced C++. Anything you can do with references can be done through pointers instead, but references are safer than pointers and more like the calling conventions in other languages.
The most important iterator methods are:
container.begin()
- Return an iterator pointing to the first element in the container.
container.end()
- Return an iterator pointing “one past the end” of the container. (This is
analogous to array sizes: the size of an array is “one past” the last valid
index.)
container.end()
must never be dereferenced. If a container is empty,container.begin() == container.end()
. ++iterator
- Move the iterator to point at the next element of the container. Note that
you must not call
++it
whenit == container.end()
. --iterator
- Move the iterator to point at the previous element of the container. You
must not call
--it
whenit == container.begin()
.
Exercise. Write a loop that prints every other member of a vector of
int
s nameda
(so indexes 0, 2, 4, and so forth), but use iterators rather than indexes. This is trickier than it might appear!
Home Exercise. Complete this function:
/** Return an iterator that points at the first occurrence of `value` in `a`. * For example, if `a` holds values [1, 2, 3], then `find_first(a, 1)` * should return `a.begin()`. */ std::vector<int>::iterator find_first(std::vector<int>& a, int value) { ... }
You should decide what to return if
value
is not present ina
!
Iterator computations (reference)
Since std::vector
’s underlying array representation allows efficient access
at any index, vector iterators can efficiently be moved around by arbitrary
distances. Such iterators are called random-access iterators.
To move a random-access iterator, you use syntax that looks like arithmetic.
iterator + count
- Return an iterator that points
count
positions forward relative toiterator
. For example, ifit
points at the 7th element of a vector, thenit + 5
points at the 12th. iterator - count
- Return an iterator that points
count
positions backward relative toiterator
. it += count
- Move
it
forwardcount
positions. it -= count
- Move
it
backwardcount
positions. it1 - it2
- (
it1
andit2
must point into the same vector.) Return the number of positions fromit2
toit1
. For instance,(it + 5) - it == 5
, andv.end() - v.begin() == v.size()
.
Modifying vectors
The following methods can be used to modify vectors. Many of them also apply to other containers.
container.insert(iterator, value)
- Inserts a copy of
value
into the container immediately beforeiterator
, shifting*iterator
and all later elements up.container.insert(container.end(), value)
does the same thing ascontainer.push_back(value)
. Returns an iterator pointing at the newly inserted element. container.insert(iterator, {value1, value2, value3, ...})
- Adds the
value
s to the container beforeiterator
. container.erase(iterator)
- Removes the element pointed to by
iterator
, shifting all later elements down. Returns an iterator pointing at the first non-removed element, orend()
if there is no such element; the input iterator becomes invalid. container.erase(first, last)
- (
first
andlast
are iterators into the container.) Removes the elements fromfirst
up to, but not including,last
, shifting all later elements down.container.erase(container.begin(), container.end())
does the same thing ascontainer.clear()
. vector.resize(size)
- Resize the vector to contain
size
elements. Ifsize
is bigger than the current size, adds new elements as required.vector.resize(0)
does the same thing asvector.clear()
.
Growing a vector can force reallocation of the underlying storage for
elements. When this happens, all previously-existing iterators into the
vector are invalidated (they point into garbage memory and dereferencing
them is illegal). This is why it‘s often important to use the return value
from vector.insert()
or vector.erase()
.
Question. Fill in the
?
s. Check your answers by modifyingvectorq.cc
and runningmake vectorq && ./vectorq
.std::vector<int> v = { 1, 2, 3, 4, 5 }; assert(v.size() == ?); assert(v[3] == ?); v.push_back(6); assert(v.size() == ?); assert(v.back() == ?); v.erase(v.begin(), v.begin() + 1); assert(v.size() == ?); assert(v.back() == ?); assert(v[0] == ?); v.insert(v.begin() + 1, 10); assert(v.size() == ?); assert(v[0] == ?); assert(v[1] == ?);
Exercise. Complete this function:
/** Insert `value` into `a`, preserving the sort order. For example, * if `a` holds [1, 3, 4], then after `insert_sorted(a, 2)`, `a` should * hold [1, 2, 3, 4]. `a` must be sorted on input. */ void insert_sorted(std::vector<int>& a, int value) { ... }
Exercise. Write a set of tests for your
insert_sorted
function. Here’s an example test to get you started:void test_insert_sorted() { std::vector<int> a{}; insert_sorted(a, 1); std::vector<int> expected = {1}; assert(a == expected); // vectors have the same elements in the same order }
When writing tests, think about situations that might be difficult to get right (edge cases). What are some edge cases for
insert_sorted
? Another great way to write tests is to generate tests randomly! Computers are very fast, and can run a ton of tests in very little time. This process is sometimes called fuzzing.
Exercise. Complete this function, which should remove any elements of
v
that are even.void remove_even_elements(std::vector<int> &v) { ... }
Warning: This is tricky because the vector’s
erase
operation invalidates iterators!
Advanced Home Exercise. Write a version of
insert_sorted
that is efficient on large vectors. For example, you might use binary search to find the insertion position.
Vector representation
Exercise. A vector can change size as the program runs. In which segment do you think the elements of the vector are most likely to be stored?
Exercise. Use
hexdump_object
to print out the representation for a (non-empty) vector object. Also print out the representations of a vector iterator (it
) and a vector element (*it
orv[idx]
), for all positions in the vector; and print out the representation ofv.end()
. What conclusions can you draw? How is a vector represented in memory?
Correspondences with other programming languages
C++ | Python | Javascript | Java | PHP |
---|---|---|---|---|
std::vector<T> |
list |
Array |
ArrayList<T> |
array |
v.size() |
len(l) |
a.length |
al.size() |
count($a) |
v.empty() |
not l |
a.length === 0 |
al.isEmpty() |
empty($a) |
v.clear() |
l.clear() |
a.length = 0 |
al.clear() |
$a = [] |
v[idx] |
l[idx] |
a[idx] |
al.get(idx) |
$a[$idx] |
v.back() |
l[-1] |
a[a.length - 1] |
al.get(al.size() - 1) |
$a[count($a) - 1] |
v.push_back(e) |
l.append(e) |
a.push(e) |
al.add(e) |
array_push($a, $e) or $a[] = $e |
v.pop_back() |
l.pop() |
a.pop() |
al.remove(al.size() - 1) |
array_pop($a) |
v.insert(it, value) |
l.insert(idx, value) |
a.splice(idx, 0, value) |
al.add(idx, value) |
array_splice($a, $idx, 0, $value) |
v.erase(it1, it2) |
del l[idx1:idx2] |
a.splice(idx1, idx2 - idx1) |
al.removeRange(idx1, idx2) |
array_splice($a, $idx1, $idx2 - $idx1) |
std::list
(doubly-linked list)
The std::list
type is another sequence structure that implements a
doubly-linked list. Its methods resemble std::vector
’s, but because indexing
into a linked list at a specific position is not efficient, it does not
support the index notation l[idx]
—you have to use iterators to find a list
element.
Reference material (comprehensive but scary): cplusplus.com, cppreference.com
std::map
(ordered search tree)
std::map<KEY, VALUE>
is a search data structure that maps KEY
s to
VALUE
s, or, more precisely, objects of KEY
type to objects of VALUE
type. For example, std::map<std::string, int>
maps std::string
(C++
strings) to integers, and could be used to track the number of times different
words appear in a text. Objects of KEY
type must be comparable, and
std::map
stores its arguments in key order: if you iterate over the contents
of the map, the results are returned in increasing order by key.
The contents of std::map
are key–value pairs, implemented as
std::pair<const KEY, VALUE>
objects. std::pair
is like a Python tuple: it
has exactly two components, first
and second
, that can have different
types.
Since std::map
indexes by key, rather than position, its insert
method
looks a little different than std::vector
’s. It also offers efficient find
and count
methods for finding or counting elements by key.
map.insert({key, value})
- Insert a key-value pair into the map. (In this context, the
{key, value}
syntax creates astd::pair
.) Does nothing ifkey
is already present in the map. Returns a pairstd::pair<iterator, bool>
, where theiterator
component is an iterator pointing at the entry forkey
, and thebool
is true iff the element was inserted. map[key] = value
map.insert_or_update({key, value})
- Set
key
tovalue
in the map. Unlikemap.insert({key, value})
, these operations always install the new value. map.count(key)
- Return the number of map elements that with key
key
. Forstd::map
, this is always 0 or 1. map.find(key)
- Return an iterator pointing at the map element with key
key
, ormap.end()
if there is no such element. map.erase(key)
- Remove the element with key
key
, if there is one. Returns the number of elements removed, which is 0 or 1. (Note thatstd::map
also supportserase
methods that take iterator arguments.)
Note that unlike with vector
, changing the size of a map
will not
invalidate all iterators into the map
.
#include <map>
#include <string>
#include <cstdio>
#include <cassert>
int main() {
// Map strings to integers
std::map<std::string, int> m;
// Insert without overwriting (leaves map unchanged if key present)
m.insert({"one", 1});
// Number of keys in map
assert(m.size() == 1);
assert(!m.empty());
printf("m.size = %zu\n", m.size());
// Test if key is in map
size_t exists = m.count("one");
assert(exists == 1);
exists = m.count("two");
assert(exists == 0);
// Find matching element; returns `m.end()` if not found
auto it0 = m.find("one");
assert(it0 != m.end());
// Iterator points to a key-value pair: `first` is the key, `second` the value
assert(it0->first == "one");
assert(it0->second == 1);
printf("m.find(\"one\") -> %d\n", it0->second);
auto it1 = m.find("two");
assert(it1 == m.end());
// Array syntax inserts or modifies
m["one"] = 61; // Insert into map (with overwrite semantics)
assert(m["one"] == 61);
// But beware; array syntax inserts a default if not found!
int two_value = m["two"];
assert(two_value == 0);
assert(m.size() == 2);
assert(m.find("two") != m.end());
// Remove key
m.erase("two");
assert(m.size() == 1);
// Iterate in sorted order
m.insert({"two", 2});
m.insert({"three", 3});
m.insert({"four", 4});
m.insert({"five", 5});
for (auto it = m.begin(); it != m.end(); ++it) {
// `it->first` is the key, `it->second` the value
// (`it->first.c_str()` transforms a C++ string to printf form)
printf("Found %s -> %d\n", it->first.c_str(), it->second);
}
printf("Done!\n");
}
Go to the cs61-sections/s01
directory, run make
, and then run ./map1
to check that the assertions all succeed.
Exercise. Fill in the
?
s. Check your answers by modifyingmapq.cc
and runningmake mapq && ./mapq
.std::map<int, int> m; m.insert({1, 2}); assert(m[1] == ?); assert(m.count(1) == ?); assert(m.count(2) == ?); assert(m.size() == ?); int x = m[2]; assert(x == ?); assert(m.size() == ?); m.insert({1, 3}); assert(m[1] == ?); m.erase(1); assert(m.size() == ?); m.insert({1, 3}); assert(m[1] == ?);
Home Exercise. Use
hexdump_object
to print out the representation for a (non-empty) map object. Also print out the representations of a map iterator (it
) and element (*it
), for all positions in the map; and print out the representation ofm.end()
. What conclusions can you draw? How is a map represented in memory? (Maps are more complicated than vectors, so don’t knock yourself out—just try to make some observations.)
Learn more:
- Tutorial on
std::map
from riptutorial.com- Reference material (comprehensive but scary): cplusplus.com, cppreference.com
std::unordered_map
(hash table)
The std::unordered_map<KEY, VALUE>
data structure also maps keys to values,
but it’s a hash table, not an ordered map, so it can be faster when order is
not required. (That is, lookup can be take O(1) time
in the number of elements, rather than O(log N) time
like std::map
.) std::unordered_map
has the same methods as std::map
, but
instead of just requiring comparison on KEY
s, it also requires a hash
function. If you see a horrible error like
error: static_assert failed "the specified hash does not meet the Hash requirements"
or (EUUUUUGGGGGHHHHHHHHH—this kind of error is one of the very worst aspects of C++)
/usr/include/c++/7/bits/hashtable_policy.h: In instantiation of ‘struct std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > >’:
/usr/include/c++/7/type_traits:143:12: required from ‘struct std::__and_<std::__is_fast_hash<std::hash<std::pair<int, int> > >, std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > > >’
/usr/include/c++/7/type_traits:154:31: required from ‘struct std::__not_<std::__and_<std::__is_fast_hash<std::hash<std::pair<int, int> > >, std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > > > >’
/usr/include/c++/7/bits/unordered_map.h:103:66: required from ‘class std::unordered_map<std::pair<int, int>, int>’
membug8.cc:4:54: required from here
/usr/include/c++/7/bits/hashtable_policy.h:87:34: error: no match for call to ‘(const std::hash<std::pair<int, int> >) (const std::pair<int, int>&)’
that means that the key type has no hash function yet. You can use a
std::map
(which doesn’t require a hash function, but features slower
lookup), or write your own hash function, using syntax like this:
namespace std {
template <> struct hash<MY_TYPE> {
size_t operator()(const MY_TYPE& x) const {
return ... whatever ...;
}
};
}
Here is a pretty good hash function for pairs (based on the one in Boost). Do not ask your TF about it during section but feel free to ask us offline :)
namespace std {
template <typename T, typename U> struct hash<pair<T, U>> {
size_t operator()(const pair<T, U>& x) const {
size_t h1 = std::hash<T>{}(x.first);
size_t h2 = std::hash<U>{}(x.second);
size_t k = 0xC6A4A7935BD1E995UL;
h2 = ((h2 * k) >> 47) * k;
return (h1 ^ h2) * k;
}
};
}
And here is a garbage hash function. You could use it as a placeholder if you like making your computer work hard.
namespace std {
template <> struct hash<MY_TYPE> {
size_t operator()(const MY_TYPE& x) const {
return 0;
}
};
}
Exercise. The
s01/uomap1.cc
file is exactly likemap1.cc
, but usesstd::unordered_map
rather thanstd::map
. Run./uomap1
; how does its output differ from./map1
, and why?
std::set
, std::unordered_set
The std::set<T>
and std::unordered_set<T>
types represent sets of T
objects. They are implemented like std::map
and std::unordered_map
, but
since they don’t store values, their iterator types point to T
objects
rather than pairs and they do not support map[key]
bracket notation.
Exercise. Look at
s01/uoset1.cc
and compare it tos01/uomap1.cc
. How is it similar and how is it different?
Standard algorithms
The <algorithm>
header defines useful algorithms that work on all kinds of
C++ data structures, and even on C-style arrays. Here are some of the best.
(The names first
and last
denote two iterators into the same container;
they delimit a range of elements starting at first
, and going up to, but not
including, last
. Most often you will pass container.begin()
and
container.end()
for these arguments.)
std::sort(first, last)
- Sort the elements of the range. Element objects must be comparable using
<
. std::sort(first, last, compare)
- Sort the elements of the range according to a
compare
function.compare(e1, e2)
should return true iffe1
sorts belowe2
. std::find(first, last, value)
- Return an iterator to the first element of the range that equals
value
, orlast
if no element is found. std::find_if(first, last, predicate)
- Return an iterator to the first element of the range for which
predicate(*it)
returns true, orlast
if no element is found. std::lower_bound(first, last, value)
- (The range between
first
andlast
must be sorted.) Return an iterator to the first position in the range that is greater than or equal tovalue
, orlast
if all the elements of the range are less thanvalue
. Uses binary search, so it’s efficient if used on a vector or array.
Functions like std::find_if
are often given lambda functions as
arguments. Lambda functions bring functional programming techniques to C++.
Some programming languages have nice syntax for lambda functions; here is a
Javascript function that returns true iff its argument is even:
(x) => x % 2 === 0
C++’s syntax is weirder.
[&] (int x) { return x % 2 == 0; }
Here’s a use of that syntax to sort an array of int
s by their distance from
int center
, so that elements e appear in
increasing order by \left| e - \texttt{center}
\right|.
void sort_by_distance_from_center(std::vector<int>& a, int center) {
std::sort(a.begin(), a.end(), [&] (int x, int y) {
return abs(x - center) < abs(y - center);
});
}