Problem set 0 – CS 61 2021

The goal of this ungraded problem set is to introduce you to course infrastructure, and to warm up your skills in the programming language we use, C++, and the kinds of programs we’ll write. The programming is not intended to be difficult, but don’t settle for sloppy results. The course aims to teach you robust, secure, and understandable coding, as well as fast and correct coding.

Solutions are provided, and you should look at them. Try to work through the problems yourself first, but don’t worry if you get stuck and peek at the solution. Understanding the solution is already a great step. The problems get progressively harder. If you get through all of the problems you are definitely ready for the course; if you get through part 4, you are very likely ready for the course.

Overview

This course teaches you the fundamentals of computer systems and systems programming, using C++ as its primary implementation language. C++ is a popular language for low-level systems coding, and unlike C, its predecessor, it comes with an extensive data structure library and high-level abstraction facilities.

Though the course uses C++, it is about systems programming. We don’t dive deep into the language, we do not use its object-oriented features, and we attempt to avoid its most difficult byways.

Some familiarity with C++’s data structures and libraries will be useful. You can get this from free resources on the web, although a textbook and reference is better. Some web resources, such as reference pages, contain an almost overwhelming amount of information. Don’t despair! A good way to use reference pages is to scan down for example code, which can be more concise and clear than the English-language reference material.

C++ tutorials from LearnCpp.com
C++ tutorial from cplusplus.com
C++ tutorial from w3schools.com

LearnCpp.com has an extensive set of tutorials. The cplusplus.com one is a good short tutorial.
C++ reference from cplusplus.com
C++ reference from cppreference.com

As of 2021, cplusplus.com’s text is more approachable, though its layout is uglier and its technical material slightly out of date. Eddie prefers cppreference.com.

It will also be useful to have some familiarity with Unix shell syntax. You should know what standard input and standard output mean, and how to perform simple redirections. Specifically:

A simple shell command like prog1 runs prog1 in the current context: its standard input and standard output are inherited from the shell, which usually means that its input reads from the keyboard and its output is printed to the terminal.
Typing prog1 < IN redirects prog1’s standard input to come from a file. In this instance, prog1 will observe the contents of the IN file when reading from its standard input.
Similarly, prog1 > OUT redirects prog1’s standard output to go to a file. If the file OUT already exists, its contents are overwritten.
prog1 < IN > OUT performs both redirections.
prog1 | prog2 (pronounced “prog1 pipe to prog2”) redirects prog2’s standard input to read from prog1’s standard output. Thus, all the bytes written by prog1 are read by prog2 in order.

Part 0. Course setup and Git

Objective: Prepare a CS 61 environment and get familiar with the terminal.

Follow the GitHub setup instructions.
Follow the Docker setup instructions.
Create a file in your cs61-psets/pset0 directory called hello.txt that contains one line, the word “awesome”. You can use your favorite text editor, or use an editor included in the CS 61 Docker (Nano, Emacs, or vi).

Check your work by entering the Docker environment and running the command make test-hello. This should print something like this:
```
cs61-user@04feeb3aadd2:~/cs61-psets/pset0$ make test-hello
✅ Cool! hello.txt exists and has the right contents.
```

The git status command will show you that hello.txt is not yet part of your official repository:

cs61-user@9ba17834d87b:~/cs61-psets/pset0$ git status
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    hello.txt

no changes added to commit (use "git add" and/or "git commit -a")

Add the hello.txt file to your repository and push it to GitHub (that is, upload it to the shared GitHub repository). Verify that it has arrived on GitHub by looking at the web page for your repository (its link will resemble “https://github.com/cs61/cs61-f21-psets-YOURUSERNAME”).

Solution

If you had trouble finding the command to add hello.txt to the repository, check out the helpful information git status printed out :)

$ git add hello.txt
$ git commit -a
... enter message ...
[main 1f6643c] your message here
 1 file changed, 1 insertion(+)
 create mode 100644 pset0/hello.txt
$ git push
Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (7/7), 979 bytes | 979.00 KiB/s, done.
Total 7 (delta 4), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (4/4), completed with 3 local objects.
To github.com:cs61/cs61-f21-psets-YOURUSERNAME
   f93f0de..1f6643c  main -> main

Part 1. Invoke the compiler

Objective: Use the C++ compiler and make to build a program.

Create a C++ source file exiter.cc containing the following code.

#include <cstdlib>

int main() {
    exit(0);
}

QUESTION. What would this program do?

Then compile the program and run it. Do this two ways.

First, call the compiler directly and run the program on the terminal. Here’s how:
```
$ c++ -std=gnu++2a -Wall -g -O3 exiter.cc -o exiter
$ ./exiter
```
The first command invokes the compiler, asking it to compile your source file (exiter.cc) into an executable (a program that can be run, here called exiter). The flags supplied to the compiler are:
- -std=gnu++2a selects a recent version of the C++ standard.
- -Wall asks the compiler to warn you about possible mistakes in your code. You should always make sure that your code compiles without warnings.
- -g adds debugging information to the compiler’s output.
- -O3 asks the compiler to optimize its code. This makes the code run faster, but can make the result harder to debug and can slow down compilation.
The compiler exits silently if can complete its work without encountering an error or warning, so be excited if there’s no output! If there is an error, you’ll need to fix that error and invoke the compiler again.

The second command (./exiter) invokes the compiled executable, which, as advertised, does nothing.
Second, use our provided makefile to build the program. Here’s how:
```
$ rm -f exiter
$ make exiter
  COMPILE exiter.cc
  LINK exiter
$ ./exiter
```
The first command, rm -f exiter, removes the old executable if you made one. The second command, make exiter, builds the new executable from rules in the provided GNUmakefile.

Why?

Makefiles and the make program are often used to build software. A makefile defines a set of recipes for building programs and other targets; the make program reads these recipes and builds targets according to their rules. Crucially, make analyzes which sources change from run to run and uses this information to speed up the build process. We provide makefiles for all problem sets and lecture codes in this course, and these makefiles have recipes for invoking the compiler, so you will not generally need to invoke the compiler yourself. But it’s super useful to know how to do so!

Our makefiles often hide the arguments of the programs they run. This is to help focus your attention on any warnings or errors. If you want to see the arguments, supply V=1 as an argument to make. For instance:

$ make clean
$ make V=1 exiter
g++  -std=gnu++2a -Wall -Wextra -Wshadow -g  -fsanitize=undefined -MD -MF .deps/exiter.d -MP -O3 -o exiter.o -c exiter.cc
g++ -std=gnu++2a -Wall -Wextra -Wshadow -g  -fsanitize=undefined  -O3 exiter.o -o exiter

As you can see, the makefile supplies more arguments than the simple compiler invocation! For fun, why not try to figure out what those arguments do? (-Wextra and -Wshadow especially.)

Unlike the direct invocation in step 1, our makefile runs the compiler twice to produce an executable. The first invocation, COMPILE exiter.cc, produces an intermediate file called an object file, exiter.o; the second invocation, LINK exiter, uses the object file to produce an executable. This is unnecessary for a single-file program like exiter.cc, but useful for programs generated from multiple source files.

Part 2. Say hello

Objective: Recall C++’s output facility.

Create a program hello that prints hello, kitty to the terminal (the standard output) and then exits. That is, write a C++ source file hello.cc, then compile it into a hello executable that behaves like this:

$ make hello
$ ./hello
hello, kitty
$

It’s important that the “hello” message appears on its own line—this output is wrong:

cs61-user@caddb9e3ab14:~/cs61-psets/pset0$ ./hello
hello, kittycs61-user@caddb9e3ab14:~/cs61-psets/pset0$

Hints:

Start with the code from exiter.cc.
Use the fprintf library function.
To get access to fprintf, you will need to #include <cstdio>.

Solution
#include <cstdio>
#include <cstdlib>

int main() {
    fprintf(stdout, "hello, kitty\n");
    exit(0);
}
Note the \n at the end of the format string: that’s what starts a new line. In C and C++, you must explicitly say when to end a line (as distinct from Python’s print statement).

Part 3. Count bytes

Objective: Recall more of C++’s input/output facilities.

Now let’s start really programming. Write a program bc61 that counts the number of bytes in its standard input and prints the result to standard output. Examples:

$ make bc61
$ echo foo | ./bc61
       4
$ ./bc61 < /dev/null
       0
$ yes | head -c 1000 | ./bc61
    1000
$ ./hello | ./bc61
      13

Use the fgetc and fprintf library functions for C-style I/O. Make sure you test your bc61 program with different inputs.

Because your bc61 program waits for input, running bc61 on its own will appear to hang the terminal. This can be confusing!
$ ./bc61
[...... nothing happens here forever!!! .......]
When this happens, press Control-C to kill bc61 without waiting for it to complete its work. Alternately, type stuff, press Return to start a new line, and then press Control-D to pass bc61 some input. (Pressing Control-D at the beginning of a line informs the terminal that bc61’s input has ended; if bc61 attempts to read more characters, it will receive an end-of-file indication. Pressing Control-D anywhere but at the beginning of a blank line will simply release whatever you've typed on the current line so that bc61 can read it.)

Solution
#include <cstdio>

int main() {
    unsigned long n = 0;
    while (fgetc(stdin) != EOF) {
        ++n;
    }
    fprintf(stdout, "%8lu\n", n);
}
We used an unsigned long to keep track of the number of characters. Unsigned numbers cannot be negative, and are often used to count things. The fprintf specifier for unsigned long is lu, where l means long and u means unsigned. It’s required that fprintf specifiers match with the actual arguments, so the compiler would produce a warning if you left the l off. (Try it to see!)

Part 4. Count words and lines (harder)

Objective: Reason through a nontrivial specification for an input/output program.

Write a program wc61 that counts words and lines as well as bytes. A word is a sequence of one or more non-whitespace characters, and a line is zero or more bytes terminated by the newline character \n. The printout should give lines, words, and bytes in that order. Examples:

$ echo foo | ./wc61
       1       1       4
$ ./wc61 < /dev/null
       0       0       0
$ yes | head -c 1000 | ./wc61
     500     500    1000
$ yes Hello | head -c 1000 | ./wc61
     166     167    1000
$ ./hello | ./wc61
       1       2      13
$ echo "  hi  " | ./wc61
       1       1       7

Use the isspace library function, which is declared in the <cctype> header, to detect whitespace.

Though this seems like a simple specification, getting simple things right on all input cases is always hard, and it may take you a while to get the answer right! Make sure you create your own test cases; the echo " WORDS " | ./wc61 pattern is quite useful.

Solution
#include <cstdio>
#include <cctype>

int main() {
    unsigned long nc = 0, nw = 0, nl = 0;
    bool in_spaces = true;
    while (true) {
        int ch = fgetc(stdin);
        if (ch == EOF) {
            break;
        }
        ++nc;

        bool this_space = isspace((unsigned char) ch);   // `ch` would be OK too
        if (in_spaces && !this_space) {
            ++nw;
        }
        in_spaces = this_space;

        if (ch == '\n') {
            ++nl;
        }
    }
    fprintf(stdout, "%8lu %7lu %7lu\n", nl, nw, nc);
}
Some notes:
You must #include <cctype> to gain access to isspace. Reference material for a library function will usually tell you what to include for that library function; for instance, in a manual page:
SYNOPSIS
     #include <ctype.h>

     int
     isspace(int c);
Or on https://cppreference.com, look for the “Defined in header” note.
The C and C++ languages have unfortunate features that can trip up an inexperienced programmer, and that an experienced programmer will treat defensively. Here, we deal with a design flaw in <cctype> functions like isspace. Even though their argument type is int, these functions can crash or cause a security hole if the actual argument has an unusual value like -2. (“The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.”) Because we’ve been bitten by this flaw, we often cast the argument of isspace to unsigned char if it isn’t an unsigned char already. We hope the class will introduce you to these kinds of defensive patterns.

The fprintf format "%8lu %7lu %7lu\n" is better than the more-obvious "%8lu%8lu%8lu\n" because it ensures there’s at least one space between the counts, even when standard input has more than 9999999 words or bytes.

Part 5. Implement a library function (hard)

Objective: Understand a library function’s specification, reason about special cases, and practice pointer notation.

Implement a function mystrstr that has exactly the same behavior as strstr.

This driver program strstr61.cc may be useful for testing:

#include <cstring>
#include <cassert>
#include <cstdio>

char* mystrstr(const char* s1, const char* s2) {
    // Your code here
    return nullptr;
}

int main(int argc, char* argv[]) {
    assert(argc == 3);
    printf("strstr(\"%s\", \"%s\") = %p\n",
           argv[1], argv[2], strstr(argv[1], argv[2]));
    printf("mystrstr(\"%s\", \"%s\") = %p\n",
           argv[1], argv[2], mystrstr(argv[1], argv[2]));
    assert(strstr(argv[1], argv[2]) == mystrstr(argv[1], argv[2]));
}

Examples:

$ make strstr61
$ ./strstr61 "hello, world" "world"
strstr("hello, world", "world") = 0x7ffee2c97bca
mystrstr("hello, world", "world") = 0x7ffee2c97bca
$ ./strstr61 "nfsnafndkanfkan" ""
strstr("nfsnafndkanfkan", "") = 0x7ffee579bbcb
mystrstr("nfsnafndkanfkan", "") = 0x7ffee579bbcb

No matter what arguments you supply, the strstr and mystrstr lines should print the same result.

First, implement mystrstr using array notation (brackets []). Make sure you read the specification for strstr carefully so you catch all special cases; do not call any library functions.

Solution

char* mystrstr(const char* s1, const char* s2) {
    // loop over `s1` (C strings end at the character with value 0)
    for (size_t i = 0; s1[i] != 0; ++i) {
        // loop over `s2` until `s2` ends or differs from `s1`
        size_t j = 0;
        while (s2[j] != 0 && s2[j] == s1[i + j]) {
            ++j;
        }
        // success if we got to the end of `s2`
        if (!s2[j]) {
            return (char*) &s1[i];
        }
    }
    // special case
    if (!s2[0]) {
        return (char*) s1;
    }
    return nullptr;
}

This function would almost meet the specification without the last “special case” if statement. Can you figure out what arguments require that special case?

Second, implement mystrstr exclusively using pointer operations: do not use brackets at any point.

Solution

char* mystrstr(const char* s1, const char* s2) {
    // loop over `s1`
    while (*s1) {
        // loop over `s2` until `s2` ends or differs from `s1`
        const char* s1try = s1;
        const char* s2try = s2;
        while (*s2try && *s2try == *s1try) {
            ++s2try;
            ++s1try;
        }
        // success if we got to the end of `s2`
        if (!*s2try) {
            return (char*) s1;
        }
        ++s1;
    }
    // special case
    if (!*s2) {
        return (char*) s1;
    }
    return nullptr;
}

Part 6. Sort arguments (hard)

Objective: Handle C strings, program arguments, and more complex library functions. Implement algorithms described in pseudocode (such as insertion sort).

Write a program sortargs61 that prints out its command-line arguments, one per line, in dictionary order (also called lexicographic order). Examples:

$ ./sortargs61 a b c
a
b
c
$ ./sortargs61 e d c b a
a
b
c
d
e
$ ./sortargs61 A a A a A a
A
A
A
a
a
a
$ ./sortargs61
$ ./sortargs61 hello, world, this is a program
a
hello,
is
program
this
world,

A C or C++ program’s command-line arguments are passed in an array called argv that is passed to main. Up til now, your main functions have taken no arguments; sortargs61’s main function should take two arguments, as follows:
int main(int argc, char* argv[])
argc is the number of arguments, and argv[I] is the Ith argument represented as a C string. Note that the argv[0] argument is always the name of the program and should generally not be considered a true argument. (Tutorial)

First, use a minimal subset of the C library: use strcmp and an O(n^2) algorithm such as insertion sort or selection sort. (Recall that strcmp(x, y) returns an integer that is less than zero when x < y in lexicographic order, equal to zero when x and y have the same characters in the same order, and greater than zero when x > y in lexicographic order.)

Solution

This solution doesn’t sort the arguments in place; instead, it uses selection sort to repeatedly select and print the smallest argument, removing it after printing.

#include <cstring>
#include <cstdio>

int main(int argc, char* argv[]) {
    while (argc > 1) {
        // find the smallest argument
        int smallest = 1;
        for (int i = 2; i < argc; ++i) {
            if (strcmp(argv[i], argv[smallest]) < 0) {
                smallest = i;
            }
        }

        // print it
        fprintf(stdout, "%s\n", argv[smallest]);

        // remove it from the argument set
        argv[smallest] = argv[argc - 1];
        --argc;
    }
}

This solution sorts the arguments in place using insertion sort.

#include <cstring>
#include <cstdio>

int main(int argc, char* argv[]) {
    // insertion sort
    for (int i = 2; i < argc; ++i) {
        char* x = argv[i];
        int j = i - 1;
        while (j >= 0 && strcmp(argv[j], x) > 0) {
            argv[j + 1] = argv[j];
            --j;
        }
        argv[j + 1] = x;
    }

    for (int i = 1; i < argc; ++i) {
        fprintf(stdout, "%s\n", argv[i]);
    }
}

(More advanced) Now do it using C++ library features: use std::string (the C++ library type for automatically-managed strings), std::vector (the C++ library type for automatically-managed, dynamically-sized arrays), std::sort (the C++ library’s sort algorithm), and C++-style basic input/output. (Those links are to portions of a C++ tutorial, so do follow them if you’re new to C++.)

Solution

#include <string>
#include <vector>
#include <algorithm>
#include <iostream>

int main(int argc, char** argv) {
    // create a vector of the arguments as C++ strings
    std::vector<std::string> args;
    for (int i = 1; i < argc; ++i) {
        args.push_back(argv[i]);
    }

    // sort the vector’s contents
    std::sort(args.begin(), args.end());

    // print the vector’s contents
    for (size_t i = 0; i != args.size(); ++i) {
        std::cout << args[i] << '\n';
    }
}

Concise solution using advanced C++ features

More concise code can be written by using more advanced features, such as auto and range initialization. You don’t need to understand these features for this class—we will explain them in lecture as required—but they do make C++ more pleasant to program.
#include <string>
#include <vector>
#include <algorithm>
#include <iostream>

int main(int argc, char** argv) {
    std::vector<std::string> args(&argv[1], &argv[argc]); // range initialization
    std::sort(args.begin(), args.end());
    for (auto& s : args) { // range `for`
        std::cout << s << '\n';
    }
}

Going further

Here are some other things you should be able to do in C++:

Dynamically allocate and free an object with new and delete.
Dynamically allocate and free an array of objects with new[] and delete[].
Define a structure type.
Use the C library’s string functions correctly, including memory management.
Write a function that takes three arguments, a size and two void* pointers to objects, and exchanges the values of the two objects (each of which have the given size).

Recommended reference: C for Java Programmers

This pset was created for CS61.

Problem set 0: Warmup

Overview

Part 0. Course setup and Git

Solution

Part 1. Invoke the compiler

Why?

Part 2. Say hello

Solution

Part 3. Count bytes

Solution

Part 4. Count words and lines (harder)

Solution

Part 5. Implement a library function (hard)

Solution

Solution

Part 6. Sort arguments (hard)

Solution

Solution

Concise solution using advanced C++ features

Going further