This section is about memory: what’s stored in memory, how to explore memory, and how dynamic memory works. It also covers aspects of the C++ library’s standard data structures that you may find useful for the problem set.
Section material is handed out in the cs61/cs61-sections GitHub repository.
Objects
An object in C and C++ is a region of memory that contains a value, such as the integer 12. (Specifically, the C++ standard defines “object” as “a region of data storage in the execution environment, the contents of which can represent values”.) The C abstract machine requires that distinct objects that exist simultaneously must occupy distinct memory.
Question. The following user code, in
s01/objects.cc
, defines five objects. What are they?#include <cstdlib> #include <cstdio> char global_ch = 65; const char const_global_ch = 66; int main() { char local_ch = 67; char* allocated_ch = (char*) malloc(sizeof(char)); *allocated_ch = 68; printf("Hello\n"); }
Thanks to pointers, references, and array notation, one C++ object can have many different names.
Question. Extend the following program to use different names for
x
.#include <cstdlib> #include <cstdio> int main() { int x = 61; printf("x = %d\n", x); // Change the following lines to use different names for `x`. Each line // should print the same *object*, but using a different *name*. // You may add additional declarations, such as `int* y = &x;`. // Create as many names as you can! printf("Via name 1: %d\n", x); printf("Via name 2: %d\n", x); printf("Via name 3: %d\n", x); }
Not all C++ expressions correspond to objects. For instance, some expressions are pure values, things that have no separate storage of their own.
Question. Which of the following data expressions correspond to objects? Assume that this code goes in
s01/names.cc
.
x
x + 1
&x
61
- The string
"x = %d\n"
Types, sizes, and addresses
Every object has a type. C++ is a statically typed language, so each object’s type is defined by the programmer, explicitly, when the object is declared. The type isn’t stored in memory1: the compiler simply knows what type each object has.
Every object also has a size and an alignment. These are positive
numbers that describe how objects are laid out in memory. An object’s size and
alignment depend only on its type: every object of the same type has the same
size and alignment. Inside a C++ program, you can find an object’s size with
sizeof(OBJECT)
and you can find its alignment with alignof(OBJECT)
. These
language constructs also apply to types. They return numbers of type size_t
,
which is a kind of unsigned integer.
And every object has an address. On most machines, an address is a number
of type uintptr_t
. The bytes of storage that contain the object’s value
begin at its address, and continue up to the address plus the size. To obtain
an object’s address, you first use the &
operator to create a pointer to the
object, and then cast the resulting expression to type uintptr_t
, as
follows:
int x = 61; // `x` is an object
int* ptr = &x; // `ptr` is an object whose value is a pointer to `x`
uintptr_t addr = (uintptr_t) ptr; // `addr` is an object whose value is the address of `x`
// These are equivalent ways of computing `addr`:
uintptr_t addr = (uintptr_t) &x;
uintptr_t addr = reinterpret_cast<uintptr_t>(&x); // most verbose, but best C++ style
A pointer can be printed with printf
specifier %p
. This prints the
pointer’s address value as a hexadecimal number like 0xabcde
. A size or
alignment can be printed with printf
specifier %zu
.
Question. Modify
s01/objects.cc
to print the addresses of all five objects. Which object has the smallest address? Which one has the largest?
Question. Modify
s01/objects.cc
to print the sizes and alignments of all five objects. Which has the largest size? Does there appear to be a relationship between size and alignment?
C++’s Standard Template Library (STL)
C++ comes with a large library of useful data structures, including growable
arrays (std::vector
), linked lists (std::list
), ordered search trees
(std::map
), and hash tables (std::unordered_map
). It also comes with a
library of useful algorithms, including std::sort
(sorting) and
std::lower_bound
(binary searching). You may notice these structures and
algorithms in handout code, and you may want to use these data structures
yourself.
STL vector
(growable array)
std::vector<T>
is a growable array abstraction that holds objects of type
T
. Access into std::vector
is as fast as access into a normal array, but
unlike a normal array, it’s possible to add and remove elements dynamically at
runtime.
#include <vector>
#include <cstdio>
#include <cassert>
int main() {
// A vector of integers
std::vector<int> my_vec = { 1, 2, 3, 4 };
int element_2 = my_vec[2]; // Reading from vector
assert(element_2 == 3);
my_vec[3] = 4; // Writing to vector
assert(my_vec[3] == 4);
// The vector `[]` operator is like array dereference: the caller must
// check bounds. But there’s another call that always checks bounds.
element_2 = my_vec.at(2);
assert(element_2 == 3);
my_vec.at(3) = 61;
assert(my_vec[3] == 61);
my_vec.size(); // Return number of elements
assert(my_vec.size() == 4);
my_vec.push_back(5); // Adds element to the end of the vector
assert(my_vec.size() == 5);
assert(my_vec[4] == 5);
my_vec.back(); // Return the last element of the vector (must not be empty)
assert(my_vec.back() == 5);
my_vec.pop_back(); // Remove element at the end
assert(my_vec.size() == 4);
my_vec.empty(); // Return true iff `size() == 0`
assert(!my_vec.empty());
my_vec.clear(); // Erases all elements
assert(my_vec.empty());
printf("Done!\n");
}
Iterators
STL containers and algorithms rely on the iterator abstraction. An iterator indicates a current position in the container. In a vector, iterators are like pointers into arrays.
The most important iterator methods are container.begin()
, which returns an
iterator that points to the “beginning” of the container (in a vector, this is
the first element), and container.end()
, which returns an iterator that
points to the “end” of the container and also represents absent elements (in a
vector, this points one past the last element).
These codes behave the same:
int my_array[4] = { 1, 2, 3, 4 };
// first iterate using an index
for (int i = 0; i != 4; ++i) {
printf("%d\n", my_array[i]);
}
// that’s equivalent to iterate using a pointer into the array,
// thanks to pointer arithmetic (to be discussed this week)!
for (int* a = my_array; a != &my_array[4] /* one past end */; ++a) {
printf("%d\n", *a);
}
// the C++ vector version looks like the “pointer arithmetic”
// version, and can be just as fast, but safer.
std::vector<int> my_vec = { 1, 2, 3, 4 };
for (auto it = my_vec.begin(); it != my_vec.end(); ++it) {
printf("%d\n", *it);
}
C++’s “for-each” loops also use iterators behind the scenes.
for (auto& a : my_vec) {
printf("%d\n", a);
}
// which behaves like:
for (auto __iter = my_vec.begin(); __iter != my_vec.end(); ++__iter) {
auto& a = *__iter; // obtain reference to container element
printf("%d\n", a);
}
Some other common iterator methods:
Lists and vectors:
container.insert(ITERATOR, VALUE)
adds a copy ofVALUE
to the container, immediately after theITERATOR
position. It returns an iterator that points at the new value.container.erase(ITERATOR)
removes the element pointed to byITERATOR
. It returns an iterator pointing to the “next” element afterITERATOR
.container.erase(FIRST, LAST)
removes all elements of the container starting atFIRST
(an iterator) and going up to, but not including,LAST
(another iterator). It returns an iterator pointing to the new location ofLAST
. (This might be different fromLAST
if the erase operation moves other elements around, as it will for a vector.)
Exercises
Question. Fill in the
?
s! You may modifystl0.cc
to find the answers through experiment.std::vector<int> v = { 1, 2, 3, 4, 5 }; v.size() == ? v[3] == ? v.push_back(6); v.size() == ? v.back() == ? v.erase(v.begin(), v.begin() + 1); v.size() == ? v.back() == ? v[0] == ? v.insert(v.begin() + 1, 10); v.size() == ? v[0] == ? v[1] == ?
If i
is an iterator that points at an element of a vector, then you can
obtain an actual pointer to the element with the trick &*i
. (Note that
this trick doesn’t work for v.end()
, which doesn’t actually point at an
element.)
Question. Use this trick to run experiments that print the address of every element of a vector. How do the addresses relate? How do they relate to the address of the vector itself?
Question. For a vector
v
, how doessizeof(v)
relate tov.size()
? Which of these values changes as the vector grows and shrinks? Can you use this information to guess at how the vector’s data is represented in memory?
STL map
(ordered search tree)
std::map<KEY, VALUE>
is a search data structure that maps KEY
s to
VALUE
s, or, more precisely, objects of KEY
type to objects of VALUE
type. For example, std::map<std::string, unsigned>
maps std::string
(C++
strings) to unsigned numbers, and could be used to track the number of times
different words appear in a text. Objects of KEY
type must be comparable.
The contents of std::map
are key–value pairs. Unlike vectors and lists,
std::map
has a find
method, which can find a key quickly using a binary
search algorithm. std:🗺:find
returns an iterator:
auto it = my_map.find(2);
if (it != my_map.end()) {
// Then `2` is present in the map as a key.
assert(it->first == 2);
// And we can access the value.
printf("%s\n", it->second.c_str());
} else {
// it == my_map.end() -- key is not found
}
As usual begin()
and end()
can iterate over all the elements of the map.
#include <map>
#include <string>
#include <cstdio>
#include <cassert>
int main() {
// Map strings to integers
std::map<std::string, int> my_map;
// Insert without overwriting (leaves map unchanged if key present)
// Argument is a {key, value} pair
my_map.insert({"one", 1});
// Number of keys in map
assert(my_map.size() == 1);
assert(!my_map.empty());
// Test if key is in map
size_t exists = my_map.count("one");
assert(exists == 1);
exists = my_map.count("two");
assert(exists == 0);
// Find matching element; returns `my_map.end()` if not found
auto it0 = my_map.find("one");
assert(it0 != my_map.end());
// Iterator points to a {key, value} pair
assert(it0->first == "one");
assert(it0->second == 1);
auto it1 = my_map.find("two");
assert(it1 == my_map.end());
// Array syntax is a convenient way to insert or modify
my_map["one"] = 61; // Insert into map (with overwrite semantics)
assert(my_map["one"] == 61);
// But beware; array syntax inserts a default if not found!
assert(my_map["two"] == 0);
assert(my_map.size() == 2);
assert(my_map.find("two") != my_map.end());
// Remove key
my_map.erase("two");
assert(my_map.size() == 1);
// Iterate in sorted order
my_map.insert({"two", 2});
my_map.insert({"three", 3});
my_map.insert({"four", 4});
my_map.insert({"five", 5});
for (auto it = my_map.begin(); it != my_map.end(); ++it) {
// `it->first` is the key, `it->second` the value
// (`it->first.c_str()` transforms a C++ string to printf form)
printf("Found %s -> %d\n", it->first.c_str(), it->second);
}
printf("Done!\n");
}
Question. Fill in the
?
s! You may uses01/stl1.cc
to find the answers through experiment.std::map<int, int> m; m.insert({1, 2}); m[1] == ? m.count(1) == ? m.count(2) == ? m.size() == ? int x = m[2]; x == ? m.size() == ? m.insert({1, 3}); m[1] == ? m.erase(1); m.size() == ? m.insert({1, 3}); m[1] == ?
STL unordered_map
(hash table)
The std::unordered_map<KEY, VALUE>
data structure also maps keys to values,
but it’s a hash table, not an ordered map, so it can be somewhat faster when
order is not required. It uses the same methods as std::map
, but instead of
just requiring comparison on KEY
s, it also requires a hash function. That
means that if you see a horrible error like
error: static_assert failed "the specified hash does not meet the Hash requirements"
or (EUUUUUGGGGGHHHHHHHHH—this kind of error is one of the very worst aspects of C++)
/usr/include/c++/7/bits/hashtable_policy.h: In instantiation of ‘struct std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > >’:
/usr/include/c++/7/type_traits:143:12: required from ‘struct std::__and_<std::__is_fast_hash<std::hash<std::pair<int, int> > >, std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > > >’
/usr/include/c++/7/type_traits:154:31: required from ‘struct std::__not_<std::__and_<std::__is_fast_hash<std::hash<std::pair<int, int> > >, std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > > > >’
/usr/include/c++/7/bits/unordered_map.h:103:66: required from ‘class std::unordered_map<std::pair<int, int>, int>’
membug8.cc:4:54: required from here
/usr/include/c++/7/bits/hashtable_policy.h:87:34: error: no match for call to ‘(const std::hash<std::pair<int, int> >) (const std::pair<int, int>&)’
that means that the key type has no hash function yet. You can use a
std::map
(which doesn’t require a hash function, but features slower
lookup), or write your own hash function, using syntax like this:
namespace std {
template <> struct hash<MY_TYPE> {
size_t operator()(const MY_TYPE& x) const {
return ... whatever ...;
}
};
}
Here is a pretty good hash function for pairs (based on the one in Boost). Do not ask your TF about it during section but feel free to ask us offline :)
namespace std {
template <typename T, typename U> struct hash<pair<T, U>> {
size_t operator()(const pair<T, U>& x) const {
size_t h1 = std::hash<T>{}(x.first);
size_t h2 = std::hash<U>{}(x.second);
size_t k = 0xC6A4A7935BD1E995UL;
h2 = ((h2 * k) >> 47) * k;
return (h1 ^ h2) * k;
}
};
}
And here is a garbage hash function. You could use it as a placeholder if you like making your computer work hard.
namespace std {
template <> struct hash<MY_TYPE> {
size_t operator()(const MY_TYPE& x) const {
return 0;
}
};
}
Question. Run
./stl2
. How does its output differ from./stl1
?
Addresses and performance
In the first lecture, we ran several programs that created sorted collections
of integers. One set of interesting experiments involved the testinsert0
program, which used a linked list data structure. We saw that:
Inserting integers in ascending order (
./testinsert0 -u N
) was slow for largeN
. We argued that this was because every insertion had to traverse the entire list to find the insertion point.Inserting integers in descending order (
./testinsert0 -d N
) was fast. We argued that this was because the insertion didn’t have to traverse anything to find the insertion point.Inserting integers is random order (
./testinsert0 -r N
) was very slow for largeN
—slower even than ascending order, despite requiring less traversal than ascending order!
Can you use address printing to develop a potential explanation?
Question. Modify
s01/testinsert0.cc
to compute and print the number of distinct traversal steps made during the insertion. (Hint: You’ll modify the “lambda function”—the funky thing starting with[&] (int x)
that’s called once per traversal.)
Question. Modify
s01/testinsert0.cc
to print out the addresses of (the first 100) integers in the list. Run experiments with the three flags (-u
,-d
,-r
). Does this help explain why-r
is slower than-d
at largeN
, even though-r
makes fewer traversal steps?
Greetings!
It is extremely illegal to access memory that doesn’t belong to a live object. In fact, it’s so illegal that the resulting behavior is undefined. Different kinds of illegal accesses have different names; for example:
- Use-after-free: Using an object that has already been destroyed (by
free
, by the C++ equivalentdelete
, or by the object going out of scope). - Wild read: Reading memory that doesn’t belong to any object.
- Wild write: Writing memory that doesn’t belong to any object.
- Out-of-bounds access: Reading or writing memory beyond the bounds of an array.
- Double free: Freeing dynamically-allocated memory that has already been freed.
- Invalid free: Passing a pointer to
free
(ordelete
) that was not allocated bymalloc
(ornew
).
On that note, here’s a short program. Try running it see what it does.
#include <cstdio>
#include <cstdlib>
void greet() {
char buf[16];
printf("Hello! What is your name?\n");
scanf("%s", buf);
printf("Nice to meet you, %s!\n", buf);
}
int main() {
greet();
return 0;
}
Question. Can you provide an input that crashes this program?
Question. Why did it crash, and what are the implications?
Question. How can you make this program safe? (Hint:
man scanf
!)
Memory bugs
Question. The
cs61-sections/s01
directory has several files namedmembug?.cc
. What memory bugs are present in each file? Before running, predict the effect of the memory bug; then run the program to observe its effects.
membug1.cc
:membug2.cc
:membug3.cc
:membug4.cc
:membug5.cc
:membug6.cc
:membug7.cc
:membug8.cc
:
- The compiler does store in memory the types of certain objects that require advanced object-oriented features. Specifically, it stores the types of objects of polymorphic class type, which are the C++ types that support dynamic dispatch (or, in C++ terms, virtual functions). You can learn more about these kinds of objects in CS 152 and CS 153. [return]