Section 1: Objects and C++

This section is about memory: what’s stored in memory, how to explore memory, and how dynamic memory works. It also covers aspects of the C++ library’s standard data structures that you may find useful for the problem set.

Section material is handed out in the cs61/cs61-sections GitHub repository.

Objects

An object in C and C++ is a region of memory that contains a value, such as the integer 12. (Specifically, the C++ standard defines “object” as “a region of data storage in the execution environment, the contents of which can represent values”.) The C abstract machine requires that distinct objects that exist simultaneously must occupy distinct memory.

Question. The following user code, in s01/objects.cc, defines five objects. What are they?

s01/objects.cc

#include <cstdlib>
#include <cstdio>

char global_ch = 65;
const char const_global_ch = 66;

int main() {
    char local_ch = 67;
    char* allocated_ch = (char*) malloc(sizeof(char));
    *allocated_ch = 68;

    printf("Hello\n");
}

Thanks to pointers, references, and array notation, one C++ object can have many different names.

Question. Extend the following program to use different names for x.

s01/names.cc

#include <cstdlib>
#include <cstdio>

int main() {
    int x = 61;
    printf("x = %d\n", x);

    // Change the following lines to use different names for `x`. Each line
    // should print the same *object*, but using a different *name*.
    // You may add additional declarations, such as `int* y = &x;`.
    // Create as many names as you can!
    printf("Via name 1: %d\n", x);
    printf("Via name 2: %d\n", x);
    printf("Via name 3: %d\n", x);
}

Not all C++ expressions correspond to objects. For instance, some expressions are pure values, things that have no separate storage of their own.

Question. Which of the following data expressions correspond to objects? Assume that this code goes in s01/names.cc.

x

x + 1

&x

61

The string "x = %d\n"

Types, sizes, and addresses

Every object has a type. C++ is a statically typed language, so each object’s type is defined by the programmer, explicitly, when the object is declared. The type isn’t stored in memory¹: the compiler simply knows what type each object has.

Every object also has a size and an alignment. These are positive numbers that describe how objects are laid out in memory. An object’s size and alignment depend only on its type: every object of the same type has the same size and alignment. Inside a C++ program, you can find an object’s size with sizeof(OBJECT) and you can find its alignment with alignof(OBJECT). These language constructs also apply to types. They return numbers of type size_t, which is a kind of unsigned integer.

And every object has an address. On most machines, an address is a number of type uintptr_t. The bytes of storage that contain the object’s value begin at its address, and continue up to the address plus the size. To obtain an object’s address, you first use the & operator to create a pointer to the object, and then cast the resulting expression to type uintptr_t, as follows:

int x = 61;                        // `x` is an object
int* ptr = &x;                     // `ptr` is an object whose value is a pointer to `x`
uintptr_t addr = (uintptr_t) ptr;  // `addr` is an object whose value is the address of `x`
// These are equivalent ways of computing `addr`:
uintptr_t addr = (uintptr_t) &x;
uintptr_t addr = reinterpret_cast<uintptr_t>(&x);   // most verbose, but best C++ style

A pointer can be printed with printf specifier %p. This prints the pointer’s address value as a hexadecimal number like 0xabcde. A size or alignment can be printed with printf specifier %zu.

Question. Modify s01/objects.cc to print the addresses of all five objects. Which object has the smallest address? Which one has the largest?

Question. Modify s01/objects.cc to print the sizes and alignments of all five objects. Which has the largest size? Does there appear to be a relationship between size and alignment?

C++’s Standard Template Library (STL)

C++ comes with a large library of useful data structures, including growable arrays (std::vector), linked lists (std::list), ordered search trees (std::map), and hash tables (std::unordered_map). It also comes with a library of useful algorithms, including std::sort (sorting) and std::lower_bound (binary searching). You may notice these structures and algorithms in handout code, and you may want to use these data structures yourself.

STL `vector` (growable array)

std::vector<T> is a growable array abstraction that holds objects of type T. Access into std::vector is as fast as access into a normal array, but unlike a normal array, it’s possible to add and remove elements dynamically at runtime.

s01/stl0.cc

#include <vector>
#include <cstdio>
#include <cassert>

int main() {
    // A vector of integers
    std::vector<int> my_vec = { 1, 2, 3, 4 };

    int element_2 = my_vec[2]; // Reading from vector
    assert(element_2 == 3);

    my_vec[3] = 4;             // Writing to vector
    assert(my_vec[3] == 4);

    // The vector `[]` operator is like array dereference: the caller must
    // check bounds. But there’s another call that always checks bounds.
    element_2 = my_vec.at(2);
    assert(element_2 == 3);
    my_vec.at(3) = 61;
    assert(my_vec[3] == 61);

    my_vec.size();         // Return number of elements
    assert(my_vec.size() == 4);

    my_vec.push_back(5);   // Adds element to the end of the vector
    assert(my_vec.size() == 5);
    assert(my_vec[4] == 5);

    my_vec.back();         // Return the last element of the vector (must not be empty)
    assert(my_vec.back() == 5);

    my_vec.pop_back();     // Remove element at the end
    assert(my_vec.size() == 4);

    my_vec.empty();        // Return true iff `size() == 0`
    assert(!my_vec.empty());

    my_vec.clear();        // Erases all elements
    assert(my_vec.empty());

    printf("Done!\n");
}

Iterators

STL containers and algorithms rely on the iterator abstraction. An iterator indicates a current position in the container. In a vector, iterators are like pointers into arrays.

The most important iterator methods are container.begin(), which returns an iterator that points to the “beginning” of the container (in a vector, this is the first element), and container.end(), which returns an iterator that points to the “end” of the container and also represents absent elements (in a vector, this points one past the last element).

These codes behave the same:

int my_array[4] = { 1, 2, 3, 4 };
// first iterate using an index
for (int i = 0; i != 4; ++i) {
    printf("%d\n", my_array[i]);
}

// that’s equivalent to iterate using a pointer into the array,
// thanks to pointer arithmetic (to be discussed this week)!
for (int* a = my_array; a != &my_array[4] /* one past end */; ++a) {
    printf("%d\n", *a);
}

// the C++ vector version looks like the “pointer arithmetic”
// version, and can be just as fast, but safer.
std::vector<int> my_vec = { 1, 2, 3, 4 };
for (auto it = my_vec.begin(); it != my_vec.end(); ++it) {
    printf("%d\n", *it);
}

C++’s “for-each” loops also use iterators behind the scenes.

for (auto& a : my_vec) {
    printf("%d\n", a);
}

// which behaves like:
for (auto __iter = my_vec.begin(); __iter != my_vec.end(); ++__iter) {
    auto& a = *__iter;     // obtain reference to container element
    printf("%d\n", a);
}

Some other common iterator methods:

Lists and vectors: container.insert(ITERATOR, VALUE) adds a copy of VALUE to the container, immediately after the ITERATOR position. It returns an iterator that points at the new value.
container.erase(ITERATOR) removes the element pointed to by ITERATOR. It returns an iterator pointing to the “next” element after ITERATOR.
container.erase(FIRST, LAST) removes all elements of the container starting at FIRST (an iterator) and going up to, but not including, LAST (another iterator). It returns an iterator pointing to the new location of LAST. (This might be different from LAST if the erase operation moves other elements around, as it will for a vector.)

Exercises

Question. Fill in the ?s! You may modify stl0.cc to find the answers through experiment.

std::vector<int> v = { 1, 2, 3, 4, 5 };

v.size() == ?
v[3] == ?

v.push_back(6);

v.size() == ?
v.back() == ?

v.erase(v.begin(), v.begin() + 1);

v.size() == ?
v.back() == ?
v[0] == ?

v.insert(v.begin() + 1, 10);
v.size() == ?
v[0] == ?
v[1] == ?

If i is an iterator that points at an element of a vector, then you can obtain an actual pointer to the element with the trick &*i. (Note that this trick doesn’t work for v.end(), which doesn’t actually point at an element.)

Question. Use this trick to run experiments that print the address of every element of a vector. How do the addresses relate? How do they relate to the address of the vector itself?

Question. For a vector v, how does sizeof(v) relate to v.size()? Which of these values changes as the vector grows and shrinks? Can you use this information to guess at how the vector’s data is represented in memory?

STL `map` (ordered search tree)

std::map<KEY, VALUE> is a search data structure that maps KEYs to VALUEs, or, more precisely, objects of KEY type to objects of VALUE type. For example, std::map<std::string, unsigned> maps std::string (C++ strings) to unsigned numbers, and could be used to track the number of times different words appear in a text. Objects of KEY type must be comparable.

The contents of std::map are key–value pairs. Unlike vectors and lists, std::map has a find method, which can find a key quickly using a binary search algorithm. std:🗺:find returns an iterator:

auto it = my_map.find(2);
if (it != my_map.end()) {
    // Then `2` is present in the map as a key.
    assert(it->first == 2);
    // And we can access the value.
    printf("%s\n", it->second.c_str());
} else {
    // it == my_map.end() -- key is not found
}

As usual begin() and end() can iterate over all the elements of the map.

s01/stl1.cc

#include <map>
#include <string>
#include <cstdio>
#include <cassert>

int main() {
    // Map strings to integers
    std::map<std::string, int> my_map;

    // Insert without overwriting (leaves map unchanged if key present)
    // Argument is a {key, value} pair
    my_map.insert({"one", 1});

    // Number of keys in map
    assert(my_map.size() == 1);
    assert(!my_map.empty());

    // Test if key is in map
    size_t exists = my_map.count("one");
    assert(exists == 1);

    exists = my_map.count("two");
    assert(exists == 0);

    // Find matching element; returns `my_map.end()` if not found
    auto it0 = my_map.find("one");
    assert(it0 != my_map.end());
    // Iterator points to a {key, value} pair
    assert(it0->first == "one");
    assert(it0->second == 1);

    auto it1 = my_map.find("two");
    assert(it1 == my_map.end());

    // Array syntax is a convenient way to insert or modify
    my_map["one"] = 61;               // Insert into map (with overwrite semantics)
    assert(my_map["one"] == 61);
    // But beware; array syntax inserts a default if not found!
    assert(my_map["two"] == 0);
    assert(my_map.size() == 2);
    assert(my_map.find("two") != my_map.end());

    // Remove key
    my_map.erase("two");
    assert(my_map.size() == 1);

    // Iterate in sorted order
    my_map.insert({"two", 2});
    my_map.insert({"three", 3});
    my_map.insert({"four", 4});
    my_map.insert({"five", 5});
    for (auto it = my_map.begin(); it != my_map.end(); ++it) {
        // `it->first` is the key, `it->second` the value
        // (`it->first.c_str()` transforms a C++ string to printf form)
        printf("Found %s -> %d\n", it->first.c_str(), it->second);
    }

    printf("Done!\n");
}

Question. Fill in the ?s! You may use s01/stl1.cc to find the answers through experiment.

std::map<int, int> m;

m.insert({1, 2});
m[1] == ?
m.count(1) == ?
m.count(2) == ?
m.size() == ?

int x = m[2];
x == ?
m.size() == ?

m.insert({1, 3});
m[1] == ?

m.erase(1);
m.size() == ?
m.insert({1, 3});
m[1] == ?

STL `unordered_map` (hash table)

The std::unordered_map<KEY, VALUE> data structure also maps keys to values, but it’s a hash table, not an ordered map, so it can be somewhat faster when order is not required. It uses the same methods as std::map, but instead of just requiring comparison on KEYs, it also requires a hash function. That means that if you see a horrible error like

error: static_assert failed "the specified hash does not meet the Hash requirements"

or (EUUUUUGGGGGHHHHHHHHH—this kind of error is one of the very worst aspects of C++)

/usr/include/c++/7/bits/hashtable_policy.h: In instantiation of ‘struct std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > >’:
/usr/include/c++/7/type_traits:143:12:   required from ‘struct std::__and_<std::__is_fast_hash<std::hash<std::pair<int, int> > >, std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > > >’
/usr/include/c++/7/type_traits:154:31:   required from ‘struct std::__not_<std::__and_<std::__is_fast_hash<std::hash<std::pair<int, int> > >, std::__detail::__is_noexcept_hash<std::pair<int, int>, std::hash<std::pair<int, int> > > > >’
/usr/include/c++/7/bits/unordered_map.h:103:66:   required from ‘class std::unordered_map<std::pair<int, int>, int>’
membug8.cc:4:54:   required from here
/usr/include/c++/7/bits/hashtable_policy.h:87:34: error: no match for call to ‘(const std::hash<std::pair<int, int> >) (const std::pair<int, int>&)’

that means that the key type has no hash function yet. You can use a std::map (which doesn’t require a hash function, but features slower lookup), or write your own hash function, using syntax like this:

namespace std {
     template <> struct hash<MY_TYPE> {
         size_t operator()(const MY_TYPE& x) const {
             return ... whatever ...;
         }
     };
}

Here is a pretty good hash function for pairs (based on the one in Boost). Do not ask your TF about it during section but feel free to ask us offline :)

namespace std {
    template <typename T, typename U> struct hash<pair<T, U>> {
        size_t operator()(const pair<T, U>& x) const {
            size_t h1 = std::hash<T>{}(x.first);
            size_t h2 = std::hash<U>{}(x.second);
            size_t k = 0xC6A4A7935BD1E995UL;
            h2 = ((h2 * k) >> 47) * k;
            return (h1 ^ h2) * k;
        }
    };
}

And here is a garbage hash function. You could use it as a placeholder if you like making your computer work hard.

namespace std {
     template <> struct hash<MY_TYPE> {
         size_t operator()(const MY_TYPE& x) const {
             return 0;
         }
     };
}

Question. Run ./stl2. How does its output differ from ./stl1?

Addresses and performance

In the first lecture, we ran several programs that created sorted collections of integers. One set of interesting experiments involved the testinsert0 program, which used a linked list data structure. We saw that:

Inserting integers in ascending order (./testinsert0 -u N) was slow for large N. We argued that this was because every insertion had to traverse the entire list to find the insertion point.
Inserting integers in descending order (./testinsert0 -d N) was fast. We argued that this was because the insertion didn’t have to traverse anything to find the insertion point.
Inserting integers is random order (./testinsert0 -r N) was very slow for large N—slower even than ascending order, despite requiring less traversal than ascending order!

Can you use address printing to develop a potential explanation?

Question. Modify s01/testinsert0.cc to compute and print the number of distinct traversal steps made during the insertion. (Hint: You’ll modify the “lambda function”—the funky thing starting with [&] (int x) that’s called once per traversal.)

Question. Modify s01/testinsert0.cc to print out the addresses of (the first 100) integers in the list. Run experiments with the three flags (-u, -d, -r). Does this help explain why -r is slower than -d at large N, even though -r makes fewer traversal steps?

Greetings!

It is extremely illegal to access memory that doesn’t belong to a live object. In fact, it’s so illegal that the resulting behavior is undefined. Different kinds of illegal accesses have different names; for example:

Use-after-free: Using an object that has already been destroyed (by free, by the C++ equivalent delete, or by the object going out of scope).
Wild read: Reading memory that doesn’t belong to any object.
Wild write: Writing memory that doesn’t belong to any object.
Out-of-bounds access: Reading or writing memory beyond the bounds of an array.
Double free: Freeing dynamically-allocated memory that has already been freed.
Invalid free: Passing a pointer to free (or delete) that was not allocated by malloc (or new).

On that note, here’s a short program. Try running it see what it does.

s01/greet.cc

#include <cstdio>
#include <cstdlib>

void greet() {
    char buf[16];

    printf("Hello! What is your name?\n");
    scanf("%s", buf);
    printf("Nice to meet you, %s!\n", buf);
}

int main() {
    greet();
    return 0;
}

Question. Can you provide an input that crashes this program?

Question. Why did it crash, and what are the implications?

Question. How can you make this program safe? (Hint: man scanf!)

Memory bugs

Question. The cs61-sections/s01 directory has several files named membug?.cc. What memory bugs are present in each file? Before running, predict the effect of the memory bug; then run the program to observe its effects.

membug1.cc:

membug2.cc:

membug3.cc:

membug4.cc:

membug5.cc:

membug6.cc:

membug7.cc:

membug8.cc:

The compiler does store in memory the types of certain objects that require advanced object-oriented features. Specifically, it stores the types of objects of polymorphic class type, which are the C++ types that support dynamic dispatch (or, in C++ terms, virtual functions). You can learn more about these kinds of objects in CS 152 and CS 153. ^[return]