Problem set 0 – CS 61 2020

The goal of this ungraded problem set is to warm up your skills in the programming language we use, C++, and the kinds of programs we’ll write. The programming is not intended to be difficult, but don’t settle for sloppy results. The course aims to teach you robust, secure, and understandable coding, as well as fast and correct coding. Solutions are provided, and you should look at them, but work through the problems yourself first.

Overview

This course teaches you the fundamentals of computer systems and systems programming, using C++ as its primary implementation language. C++ is a popular language for low-level systems coding, and unlike C, its predecessor, it comes with an extensive data structure library and high-level abstraction facilities.

Though the course uses C++, it is about systems programming. We don’t dive deep into the language, we do not use its object-oriented features, and we attempt to avoid its most difficult byways.

Some familiarity with C++’s data structures and libraries will be useful. You can get this from free resources on the web, although a textbook and reference is better. Some web resources are almost overwhelmingly detailed, but don’t despair! It often works to scroll through reference pages for example code, which can be more concise and clear than the English-language reference material.

C++ tutorials from LearnCpp.com
C++ tutorial from cplusplus.com
C++ tutorial from w3schools.com

LearnCpp.com has an extensive set of tutorials. The cplusplus.com one is a good short tutorial.
C++ reference from cplusplus.com
C++ reference from cppreference.com

As of 2019, cplusplus.com’s text is more approachable, though its layout is uglier and its technical material slightly out of date. Eddie prefers cppreference.com.

It will also be useful to have some familiarity with Unix shell syntax. You should know what standard input and standard output mean, and how to perform simple redirections. Specifically:

A simple shell command like prog1 runs prog1 in the current context: its standard input and standard output are inherited from the shell, which usually means that its input reads from the keyboard and its output is printed to the terminal.
Typing prog1 < IN redirects prog1’s standard input to come from a file. In this instance, prog1 will observe the contents of the IN file when reading from its standard input.
Similarly, prog1 > OUT redirects prog1’s standard output to go to a file. If the file OUT already exists, its contents are overwritten.
prog1 < IN > OUT performs both redirections.
prog1 | prog2 (pronounced “prog1 pipe to prog2”) redirects prog2’s standard input to read from prog1’s standard output. Thus, all the bytes written by prog1 are read by prog2 in order.

Part 1. Byte counting

Write a program wc61 that counts the number of bytes in its standard input and prints the result to standard output. Examples:

$ echo foo | ./wc61
       4
$ ./wc61 < /dev/null
       0
$ yes | head -c 1000 | ./wc61
    1000

Use the fgetc and fprintf library functions for C-style I/O.

How to write, compile, and run a simple C++ program

For most of our problem sets, our handout code will come with a Makefile that builds your code for you. But for pset 0, you should compile and run your code yourself, using command line tools or your favorite IDE (Xcode, Visual Studio, Atom, Eclipse, etc.). Here are instructions for command line tools that should work on Mac OS X or Linux, once you have installed a C++ compiler.
Write all your code in a file called wc61.cc.
Compile this code into an executable—a program that can be run—by invoking the C++ compiler:
$ c++ -std=gnu++1z -Wall -g -O3 wc61.cc -o wc61
c++ is the name of our C++ compiler (yours might be clang++ or g++, but they all take similar arguments). Here’s what the arguments mean: -std=gnu++1z tells the compiler to use C++17, the current version of the C++ language standard. -Wall turns on a large set of warnings, so that the compiler tells you when it thinks you might be making a mistake; it’s very important to pay attention to these warnings and fix them. -g tells the compiler to include debugging support in the executable output. -O3 tells it to perform code optimizations as it compiles. wc61.cc is the name of the source file. And -o wc61 tells the compiler to store its output—the runnable executable—in a file called wc61 in the current directory.

The compiler works silently unless it finds a warning or error, so you should be excited if it exits without any visible output!
Once the compilation succeeds (i.e., produces wc61 without warnings), test your code by running ./wc61 at the shell prompt. The initial ./ tells the shell to run the wc61 executable located in the current directory; for security reasons, the shell normally ignores programs lying around in the current directory.

Running ./wc61 with no arguments will appear to do nothing! The shell starts the ./wc61 program, which reads from standard input, which is connected to your keyboard:
$ ./wc61
[...... nothing happens here forever!!! .......]
This sometimes still confuses us after 30+ years of programming. Press Control-C to kill the running program without waiting for it to complete its work, or type some characters, press Return, and then press Control-D to run it on some input. If you press Control-D anywhere but at the beginning of a blank line, you have simply released whatever you've typed on the current line so that wc61 can read it. Typing a Return also sends what you've typed to wc61. Typing Control-D when there's no typing waiting to be sent to wc61 tells the shell that it can alert wc61 that there is no more input coming.

Solution
#include <cstdio>

int main() {
    unsigned long n = 0;
    while (true) {
        if (fgetc(stdin) == EOF) {
            break;
        }
        ++n;
    }
    fprintf(stdout, "%8lu\n", n);
}
Some notes:

In C++, #include <cstdio> provides access to the C standard I/O library. (In C, this would be written #include <stdio.h>, but #include <cstdio> is more idiomatic in C++.)

We used an unsigned long to keep track of the number of characters. Unsigned numbers cannot be negative, and are often used to count things. The fprintf specifier for unsigned long is lu, where l means long and u means unsigned. It’s required that fprintf specifiers match with the actual arguments, so the compiler would produce a warning if you left the l off. (Try it to see!)

Part 2. Word and line counting

Change wc61 to count words and lines as well as bytes. A word is a sequence of one or more non-whitespace characters, and a line is zero or more bytes terminated by the newline character \n. The printout should give lines, words, and bytes in that order. Examples:

$ echo foo | ./wc61
       1       1       4
$ ./wc61 < /dev/null
       0       0       0
$ yes | head -c 1000 | ./wc61
     500     500    1000
$ yes Hello | head -c 1000 | ./wc61
     166     167    1000

Use the isspace library function to detect whitespace.

Solution
#include <cstdio>
#include <cctype>

int main() {
    unsigned long nc = 0, nw = 0, nl = 0;
    bool inspace = true;
    while (true) {
        int ch = fgetc(stdin);
        if (ch == EOF) {
            break;
        }
        ++nc;

        bool thisspace = isspace((unsigned char) ch);
        if (inspace && !thisspace) {
            ++nw;
        }
        inspace = thisspace;

        if (ch == '\n') {
            ++nl;
        }
    }
    fprintf(stdout, "%8lu %7lu %7lu\n", nl, nw, nc);
}
Some notes:
You must #include <cctype> to gain access to isspace. Reference material for a library function will usually tell you what to include for that library function; for instance, in a manual page:
SYNOPSIS
 #include <ctype.h>

 int
 isspace(int c);
Or on cppreference.com, look for the “Defined in header” note.
The C and C++ languages have unfortunate features that can trip up an inexperienced programmer, and that an experienced programmer will treat defensively. Here, we deal with a design flaw in <cctype> functions like isspace. Even though their argument type is int, these functions can crash or cause a security hole if the actual argument has an unusual value like -2. (“The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.”) Because we’ve been bitten by this flaw, we always cast the argument of isspace to unsigned char if it isn’t an unsigned char already. We hope the class will introduce you to these kinds of defensive patterns.

The fprintf format "%8lu %7lu %7lu\n" is better than the more-obvious "%8lu%8lu%8lu\n" because it ensures there’s at least one space between the counts, even when standard input has more than 9999999 words or bytes.

Part 3. File concatenation

Write a program cat61 that behaves like simple Unix cat: it should read each file named on the command line in order, printing their contents to the standard output.

When run with no arguments, cat61 should read from standard input; and an argument that equals a single dash - means “read from standard input here.” Many cat implementations support a large variety of flag arguments, but yours doesn’t need to.

Use the fopen, fclose, fread, and fwrite library functions for C-style I/O. Read and write more than one character at a time.

It’s important when programming to handle errors, such as incorrect file names and errors when reading or writing. We don’t specify exactly what you must do, but we expect that your program will not crash on any input. Document the error handling you do so that others who read or use your code know about the decisions you made. (You may want to run experiments to see how the Unix cat utility handles errors; its choices are probably good to emulate.)

#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <cerrno>

static void transfer(const char* filename) {
    // open input file
    FILE* in;
    if (strcmp(filename, "-") == 0) {
        in = stdin;
    } else {
        in = fopen(filename, "r");
    }
    if (!in) {
        fprintf(stderr, "%s: %s\n", filename, strerror(errno));
        exit(EXIT_FAILURE);
    }

    // transfer data until end-of-file or error
    while (!feof(in) && !ferror(in) && !ferror(stdout)) {
        char buf[BUFSIZ];
        size_t nr = fread(buf, 1, BUFSIZ, in);
        (void) fwrite(buf, 1, nr, stdout);
    }

    // exit on error
    if (ferror(in) || ferror(stdout)) {
        fprintf(stderr, "%s\n", strerror(errno));
        exit(EXIT_FAILURE);
    }

    // close input file if required
    if (in != stdin) {
        fclose(in);
    }
}

int main(int argc, char** argv) {
    if (argc == 1) {
        transfer("-");
    }
    for (int i = 1; i < argc; ++i) {
        transfer(argv[i]);
    }
}
We exit the program with an error message as soon as an error is encountered. (exit(EXIT_FAILURE), which is the same as exit(1), exits the program with the conventionally-agreed “error status” of 1.) The strerror function and errno variable are used to print a meaningful error message; for example, ./cat61 /fdasnkjfnakjndfas prints /fdasnkjfnakjndfas: No such file or directory.

The (void) cast on fwrite is worth noting. fwrite returns the number of elements written, which “may be less than nitems if a write error is encountered.” It’s generally dangerous to ignore return values when they can indicate errors. However, this return value is safe to ignore since the loop checks ferror(stdout) explicitly. The (void) cast tells readers—and the compiler!—that we’re ignoring the return value on purpose.

The BUFSIZ constant is defined in <cstdio> as a sensible size for I/O buffers. Typical values for BUFSIZ are 8192 and 1024.

Part 4. Argument sorting

Write a program sortargs61 that prints out its command-line arguments, one per line, in sorted order. Examples:

$ ./sortargs61 a b c
a
b
c
$ ./sortargs61 e d c b a
a
b
c
d
e
$ ./sortargs61 A a A a A a
A
A
A
a
a
a
$ ./sortargs61
$ ./sortargs61 hello, world, this is a program
a
hello,
is
program
this
world,

First, do it using a minimal subset of the C library: use strcmp and an $O(n^2)$ algorithm.

Solution

#include <cstring>
#include <cstdio>

int main(int argc, char** argv) {
    while (argc > 1) {
        // find the smallest argument
        int smallest = 1;
        for (int i = 2; i < argc; ++i) {
            if (strcmp(argv[i], argv[smallest]) < 0) {
                smallest = i;
            }
        }

        // print it
        fprintf(stdout, "%s\n", argv[smallest]);

        // remove it from the argument set
        argv[smallest] = argv[argc - 1];
        --argc;
    }
}

Then, do it using C++ library features: use std::string (the C++ library type for automatically-managed strings), std::vector (the C++ library type for automatically-managed, dynamically-sized arrays), std::sort (the C++ library’s sort algorithm), and C++-style basic input/output. (Those links are to portions of a C++ tutorial, so do follow them if you’re new to C++.)

Solution

#include <string>
#include <vector>
#include <algorithm>
#include <iostream>

int main(int argc, char** argv) {
    // create a vector of the argument strings
    // (argv[0] holds the program name, which doesn’t count)
    std::vector<std::string> args;
    for (int i = 1; i < argc; ++i) {
        args.push_back(argv[i]);
    }

    // sort the vector’s contents
    std::sort(args.begin(), args.end());

    // print the vector’s contents
    for (size_t i = 0; i != args.size(); ++i) {
        std::cout << args[i] << '\n';
    }
}

Concise solution using advanced C++ features

More concise code can be written by using more advanced features, such as auto and range initialization. You don’t need to understand these features for this class—we will explain them in lecture as required—but they do make C++ more pleasant to program.
#include <string>
#include <vector>
#include <algorithm>
#include <iostream>

int main(int argc, char** argv) {
    std::vector<std::string> args(&argv[1], &argv[argc]); // range initialization
    std::sort(args.begin(), args.end());
    for (auto& s : args) { // range `for`
        std::cout << s << '\n';
    }
}

Part 5. `strstr`

Implement a function mystrstr that has exactly the same behavior as strstr.

This driver program may be useful for testing:

#include <cstring>
#include <cassert>
#include <cstdio>

char* mystrstr(const char* s1, const char* s2) {
    // Your code here
    return nullptr;
}

int main(int argc, char** argv) {
    assert(argc == 3);
    printf("strstr(\"%s\", \"%s\") = %p\n",
           argv[1], argv[2], strstr(argv[1], argv[2]));
    printf("mystrstr(\"%s\", \"%s\") = %p\n",
           argv[1], argv[2], mystrstr(argv[1], argv[2]));
    assert(strstr(argv[1], argv[2]) == mystrstr(argv[1], argv[2]));
}

Examples:

$ ./strstr61 "hello, world" "world"
strstr("hello, world", "world") = 0x7ffee2c97bca
mystrstr("hello, world", "world") = 0x7ffee2c97bca
$ ./strstr61 "nfsnafndkanfkan" ""
strstr("nfsnafndkanfkan", "") = 0x7ffee579bbcb
mystrstr("nfsnafndkanfkan", "") = 0x7ffee579bbcb

(The hexadecimal numbers will differ from run to run; we’ll find out why in Unit 1.)

First, implement mystrstr using array notation (brackets []). Make sure you read the specification for strstr carefully so you catch all special cases; do not call any library functions.

Solution

char* mystrstr(const char* s1, const char* s2) {
    // loop over `s1` (C strings end at the character with value 0)
    for (size_t i = 0; s1[i]; ++i) {
        // loop over `s2` until `s2` ends or differs from `s1`
        size_t j = 0;
        while (s2[j] && s2[j] == s1[i + j]) {
            ++j;
        }
        // success if we got to the end of `s2`
        if (!s2[j]) {
            return (char*) &s1[i];
        }
    }
    // special case
    if (!s2[0]) {
        return (char*) s1;
    }
    return nullptr;
}

This function would almost meet the specification without the last “special case” if statement. Can you figure out what arguments require that special case?

Second, implement mystrstr exclusively using pointer operations: do not use brackets at any point. (The purpose of this exercise is to re-familiarize you with C/C++ pointer notation.)

Solution

char* mystrstr(const char* s1, const char* s2) {
    // loop over `s1`
    while (*s1) {
        // loop over `s2` until `s2` ends or differs from `s1`
        const char* s1try = s1;
        const char* s2try = s2;
        while (*s2try && *s2try == *s1try) {
            ++s2try;
            ++s1try;
        }
        // success if we got to the end of `s2`
        if (!*s2try) {
            return (char*) s1;
        }
        ++s1;
    }
    // special case
    if (!*s2) {
        return (char*) s1;
    }
    return nullptr;
}

Going further

Here are some other things you should be able to do in C++:

Write the “Hello, World” program.
Write a function.
Dynamically allocate and free an object with new and delete.
Dynamically allocate and free an array of objects with new[] and delete[].
Define a structure type.
Use the C library’s string functions correctly, including memory management.
Write a function that takes three arguments, a size and two void* pointers to objects, and exchanges the values of the two objects (each of which have the given size).

Recommended reference: C for Java Programmers

This pset was created for CS61.

Problem set 0: Warmup

Overview

Part 1. Byte counting

How to write, compile, and run a simple C++ program

Solution

Part 2. Word and line counting

Solution

Part 3. File concatenation

Part 4. Argument sorting

Solution

Solution

Concise solution using advanced C++ features

Part 5. strstr

Solution

Solution

Going further

Part 5. `strstr`