The goal of this ungraded problem set is to warm up your skills in the programming language we use, C++, and the kinds of programs we’ll write. The programming is not intended to be difficult, but don’t settle for sloppy results. The course aims to teach you robust, secure, and understandable coding, as well as fast and correct coding. Solutions are provided, and you should look at them, but work through the problems yourself first.
Overview
This course teaches you the fundamentals of computer systems and systems programming, using C++ as its primary implementation language. C++ is a popular language for low-level systems coding, and unlike C, its predecessor, it comes with an extensive data structure library and high-level abstraction facilities.
Though the course uses C++, it is about systems programming. We don’t dive deep into the language, we do not use its object-oriented features, and we attempt to avoid its most difficult byways.
Some familiarity with C++’s data structures and libraries will be useful. You can get this from free resources on the web, although a textbook and reference is better. Some web resources are almost overwhelmingly detailed, but don’t despair! It often works to scroll through reference pages for example code, which can be more concise and clear than the English-language reference material.
C++ tutorial from w3schools.com
LearnCpp.com has an extensive set of tutorials. The cplusplus.com one is a good short tutorial.
C++ reference from cppreference.com
As of 2019, cplusplus.com’s text is more approachable, though its layout is uglier and its technical material slightly out of date. Eddie prefers cppreference.com.
It will also be useful to have some familiarity with Unix shell syntax. You should know what standard input and standard output mean, and how to perform simple redirections. Specifically:
A simple shell command like
prog1
runsprog1
in the current context: its standard input and standard output are inherited from the shell, which usually means that its input reads from the keyboard and its output is printed to the terminal.Typing
prog1 < IN
redirectsprog1
’s standard input to come from a file. In this instance,prog1
will observe the contents of theIN
file when reading from its standard input.Similarly,
prog1 > OUT
redirectsprog1
’s standard output to go to a file. If the fileOUT
already exists, its contents are overwritten.prog1 < IN > OUT
performs both redirections.prog1 | prog2
(pronounced “prog1
pipe toprog2
”) redirectsprog2
’s standard input to read fromprog1
’s standard output. Thus, all the bytes written byprog1
are read byprog2
in order.
Part 1. Byte counting
Write a program wc61
that counts the number of bytes in its standard
input and prints the result to standard output. Examples:
$ echo foo | ./wc61
4
$ ./wc61 < /dev/null
0
$ yes | head -c 1000 | ./wc61
1000
Use the fgetc
and fprintf
library functions for
C-style I/O.
How to write, compile, and run a simple C++ program
For most of our problem sets, our handout code will come with a Makefile that builds your code for you. But for pset 0, you should compile and run your code yourself, using command line tools or your favorite IDE (Xcode, Visual Studio, Atom, Eclipse, etc.). Here are instructions for command line tools that should work on Mac OS X or Linux, once you have installed a C++ compiler.
Write all your code in a file called
wc61.cc
.Compile this code into an executable—a program that can be run—by invoking the C++ compiler:
$ c++ -std=gnu++1z -Wall -g -O3 wc61.cc -o wc61
c++
is the name of our C++ compiler (yours might beclang++
org++
, but they all take similar arguments). Here’s what the arguments mean:-std=gnu++1z
tells the compiler to use C++17, the current version of the C++ language standard.-Wall
turns on a large set of warnings, so that the compiler tells you when it thinks you might be making a mistake; it’s very important to pay attention to these warnings and fix them.-g
tells the compiler to include debugging support in the executable output.-O3
tells it to perform code optimizations as it compiles.wc61.cc
is the name of the source file. And-o wc61
tells the compiler to store its output—the runnable executable—in a file calledwc61
in the current directory.The compiler works silently unless it finds a warning or error, so you should be excited if it exits without any visible output!
Once the compilation succeeds (i.e., produces
wc61
without warnings), test your code by running./wc61
at the shell prompt. The initial./
tells the shell to run thewc61
executable located in the current directory; for security reasons, the shell normally ignores programs lying around in the current directory.Running
./wc61
with no arguments will appear to do nothing! The shell starts the./wc61
program, which reads from standard input, which is connected to your keyboard:$ ./wc61 [...... nothing happens here forever!!! .......]
This sometimes still confuses us after 30+ years of programming. Press Control-C to kill the running program without waiting for it to complete its work, or type some characters, press Return, and then press Control-D to run it on some input. If you press Control-D anywhere but at the beginning of a blank line, you have simply released whatever you've typed on the current line so that
wc61
can read it. Typing a Return also sends what you've typed towc61
. Typing Control-D when there's no typing waiting to be sent towc61
tells the shell that it can alertwc61
that there is no more input coming.
Solution
#include <cstdio> int main() { unsigned long n = 0; while (true) { if (fgetc(stdin) == EOF) { break; } ++n; } fprintf(stdout, "%8lu\n", n); }
Some notes:
- In C++,
#include <cstdio>
provides access to the C standard I/O library. (In C, this would be written#include <stdio.h>
, but#include <cstdio>
is more idiomatic in C++.)- We used an
unsigned long
to keep track of the number of characters. Unsigned numbers cannot be negative, and are often used to count things. Thefprintf
specifier forunsigned long
islu
, wherel
means long andu
means unsigned. It’s required thatfprintf
specifiers match with the actual arguments, so the compiler would produce a warning if you left thel
off. (Try it to see!)
Part 2. Word and line counting
Change wc61
to count words and lines as well as bytes. A word is a sequence
of one or more non-whitespace characters, and a line is zero or more bytes
terminated by the newline character \n
. The printout should give lines,
words, and bytes in that order. Examples:
$ echo foo | ./wc61
1 1 4
$ ./wc61 < /dev/null
0 0 0
$ yes | head -c 1000 | ./wc61
500 500 1000
$ yes Hello | head -c 1000 | ./wc61
166 167 1000
Use the isspace
library function to detect whitespace.
Solution
#include <cstdio> #include <cctype> int main() { unsigned long nc = 0, nw = 0, nl = 0; bool inspace = true; while (true) { int ch = fgetc(stdin); if (ch == EOF) { break; } ++nc; bool thisspace = isspace((unsigned char) ch); if (inspace && !thisspace) { ++nw; } inspace = thisspace; if (ch == '\n') { ++nl; } } fprintf(stdout, "%8lu %7lu %7lu\n", nl, nw, nc); }
Some notes:
You must
#include <cctype>
to gain access toisspace
. Reference material for a library function will usually tell you what to include for that library function; for instance, in a manual page:SYNOPSIS #include <ctype.h> int isspace(int c);
Or on cppreference.com, look for the “Defined in header” note.
The C and C++ languages have unfortunate features that can trip up an inexperienced programmer, and that an experienced programmer will treat defensively. Here, we deal with a design flaw in
<cctype>
functions likeisspace
. Even though their argument type isint
, these functions can crash or cause a security hole if the actual argument has an unusual value like -2. (“The behavior is undefined if the value ofch
is not representable asunsigned char
and is not equal toEOF
.”) Because we’ve been bitten by this flaw, we always cast the argument ofisspace
tounsigned char
if it isn’t anunsigned char
already. We hope the class will introduce you to these kinds of defensive patterns.The
fprintf
format"%8lu %7lu %7lu\n"
is better than the more-obvious"%8lu%8lu%8lu\n"
because it ensures there’s at least one space between the counts, even when standard input has more than 9999999 words or bytes.
Part 3. File concatenation
Write a program cat61
that behaves like simple Unix
cat
: it should read each
file named on the command line in order, printing their contents to the
standard output.
When run with no arguments, cat61
should read from standard input; and an
argument that equals a single dash -
means “read from standard input here.”
Many cat
implementations support a large variety of flag arguments, but
yours doesn’t need to.
Use the fopen
, fclose
, fread
, and
fwrite
library functions for C-style I/O. Read and write more than
one character at a time.
It’s important when programming to handle errors, such as incorrect file names
and errors when reading or writing. We don’t specify exactly what you must
do, but we expect that your program will not crash on any input.
Document the error handling you do so that others who read or use your
code know about the decisions you made.
(You may want to run experiments to see how the
Unix cat
utility handles errors; its choices are probably good to emulate.)
#include <cstdio> #include <cstring> #include <cstdlib> #include <cerrno> static void transfer(const char* filename) { // open input file FILE* in; if (strcmp(filename, "-") == 0) { in = stdin; } else { in = fopen(filename, "r"); } if (!in) { fprintf(stderr, "%s: %s\n", filename, strerror(errno)); exit(EXIT_FAILURE); } // transfer data until end-of-file or error while (!feof(in) && !ferror(in) && !ferror(stdout)) { char buf[BUFSIZ]; size_t nr = fread(buf, 1, BUFSIZ, in); (void) fwrite(buf, 1, nr, stdout); } // exit on error if (ferror(in) || ferror(stdout)) { fprintf(stderr, "%s\n", strerror(errno)); exit(EXIT_FAILURE); } // close input file if required if (in != stdin) { fclose(in); } } int main(int argc, char** argv) { if (argc == 1) { transfer("-"); } for (int i = 1; i < argc; ++i) { transfer(argv[i]); } }
We exit the program with an error message as soon as an error is encountered. (
exit(EXIT_FAILURE)
, which is the same asexit(1)
, exits the program with the conventionally-agreed “error status” of 1.) Thestrerror
function anderrno
variable are used to print a meaningful error message; for example,./cat61 /fdasnkjfnakjndfas
prints/fdasnkjfnakjndfas: No such file or directory
.The
(void)
cast onfwrite
is worth noting.fwrite
returns the number of elements written, which “may be less thannitems
if a write error is encountered.” It’s generally dangerous to ignore return values when they can indicate errors. However, this return value is safe to ignore since the loop checksferror(stdout)
explicitly. The(void)
cast tells readers—and the compiler!—that we’re ignoring the return value on purpose.The
BUFSIZ
constant is defined in<cstdio>
as a sensible size for I/O buffers. Typical values forBUFSIZ
are 8192 and 1024.
Part 4. Argument sorting
Write a program sortargs61
that prints out its
command-line arguments,
one per line, in sorted order. Examples:
$ ./sortargs61 a b c
a
b
c
$ ./sortargs61 e d c b a
a
b
c
d
e
$ ./sortargs61 A a A a A a
A
A
A
a
a
a
$ ./sortargs61
$ ./sortargs61 hello, world, this is a program
a
hello,
is
program
this
world,
First, do it using a minimal subset of the C library: use
strcmp
and an
\(O(n^2)\) algorithm.
Solution
#include <cstring> #include <cstdio> int main(int argc, char** argv) { while (argc > 1) { // find the smallest argument int smallest = 1; for (int i = 2; i < argc; ++i) { if (strcmp(argv[i], argv[smallest]) < 0) { smallest = i; } } // print it fprintf(stdout, "%s\n", argv[smallest]); // remove it from the argument set argv[smallest] = argv[argc - 1]; --argc; } }
Then, do it using C++ library features: use
std::string
(the C++ library type for automatically-managed strings),
std::vector
(the C++ library type for automatically-managed, dynamically-sized arrays),
std::sort
(the C++ library’s sort algorithm),
and C++-style basic
input/output. (Those links are to portions of a C++
tutorial, so do follow them if you’re new to C++.)
Solution
#include <string> #include <vector> #include <algorithm> #include <iostream> int main(int argc, char** argv) { // create a vector of the argument strings // (argv[0] holds the program name, which doesn’t count) std::vector<std::string> args; for (int i = 1; i < argc; ++i) { args.push_back(argv[i]); } // sort the vector’s contents std::sort(args.begin(), args.end()); // print the vector’s contents for (size_t i = 0; i != args.size(); ++i) { std::cout << args[i] << '\n'; } }
Concise solution using advanced C++ features
More concise code can be written by using more advanced features, such as
auto
and range initialization. You don’t need to understand these features for this class—we will explain them in lecture as required—but they do make C++ more pleasant to program.#include <string> #include <vector> #include <algorithm> #include <iostream> int main(int argc, char** argv) { std::vector<std::string> args(&argv[1], &argv[argc]); // range initialization std::sort(args.begin(), args.end()); for (auto& s : args) { // range `for` std::cout << s << '\n'; } }
Part 5. strstr
Implement a function mystrstr
that has exactly the same behavior as
strstr
.
This driver program may be useful for testing:
#include <cstring>
#include <cassert>
#include <cstdio>
char* mystrstr(const char* s1, const char* s2) {
// Your code here
return nullptr;
}
int main(int argc, char** argv) {
assert(argc == 3);
printf("strstr(\"%s\", \"%s\") = %p\n",
argv[1], argv[2], strstr(argv[1], argv[2]));
printf("mystrstr(\"%s\", \"%s\") = %p\n",
argv[1], argv[2], mystrstr(argv[1], argv[2]));
assert(strstr(argv[1], argv[2]) == mystrstr(argv[1], argv[2]));
}
Examples:
$ ./strstr61 "hello, world" "world"
strstr("hello, world", "world") = 0x7ffee2c97bca
mystrstr("hello, world", "world") = 0x7ffee2c97bca
$ ./strstr61 "nfsnafndkanfkan" ""
strstr("nfsnafndkanfkan", "") = 0x7ffee579bbcb
mystrstr("nfsnafndkanfkan", "") = 0x7ffee579bbcb
(The hexadecimal numbers will differ from run to run; we’ll find out why in Unit 1.)
First, implement mystrstr
using array notation (brackets []
). Make
sure you read the specification for strstr
carefully so you catch
all special cases; do not call any library functions.
Solution
char* mystrstr(const char* s1, const char* s2) { // loop over `s1` (C strings end at the character with value 0) for (size_t i = 0; s1[i]; ++i) { // loop over `s2` until `s2` ends or differs from `s1` size_t j = 0; while (s2[j] && s2[j] == s1[i + j]) { ++j; } // success if we got to the end of `s2` if (!s2[j]) { return (char*) &s1[i]; } } // special case if (!s2[0]) { return (char*) s1; } return nullptr; }
This function would almost meet the specification without the last “special case” if statement. Can you figure out what arguments require that special case?
Second, implement mystrstr
exclusively using pointer operations: do not
use brackets at any point. (The purpose of this exercise is to re-familiarize
you with C/C++ pointer notation.)
Solution
char* mystrstr(const char* s1, const char* s2) { // loop over `s1` while (*s1) { // loop over `s2` until `s2` ends or differs from `s1` const char* s1try = s1; const char* s2try = s2; while (*s2try && *s2try == *s1try) { ++s2try; ++s1try; } // success if we got to the end of `s2` if (!*s2try) { return (char*) s1; } ++s1; } // special case if (!*s2) { return (char*) s1; } return nullptr; }
Going further
Here are some other things you should be able to do in C++:
- Write the “Hello, World” program.
- Write a function.
- Dynamically allocate and free an object with
new
anddelete
. - Dynamically allocate and free an array of objects with
new[]
anddelete[]
. - Define a structure type.
- Use the C library’s string functions correctly, including memory management.
- Write a function that takes three arguments, a size and two
void*
pointers to objects, and exchanges the values of the two objects (each of which have the given size).
Recommended reference: C for Java Programmers
This pset was created for CS61.