Section 1: Pointers, sizes, and alignment

Skills

Sizes of fundamental C types
C layout rules
C alignment
Simple string parsing
sizeof(T), __alignof__(T)
Pointer arithmetic

Section Repo

By now, you should have your VM and GitHub account set up for the course. If you're having infrastructure issues, please see http://cs61.seas.harvard.edu/wiki/2017/Infrastructure or contact course staff.

Today you'll be adding access to the section repo. In your VM, open a terminal and enter

git clone <https://github.com/cs61/cs61-sections.git>

to fetch the section materials.

If you need a reference for git commands, please see http://cs61.seas.harvard.edu/wiki/2017/Git.

Background

Types, Sizing, and Alignment

You may have heard the phrase "data is bits + context." In C, one of the ways that we give context to a sequence of bits is providing a type to each object.

Every type in C has an associated size (in bytes). In order to make pointer arithmetic and hardware design simpler and more efficient, types in C are also "aligned" to certain addresses in memory. For example, a variable of type int cannot start on an arbitrary address. Rather, it must begin at an address divisible by (on x86-64) 4.

The specific size and alignment of types in C depend on a machine's architecture, but for this course, we will assume the x86-64 architecture unless otherwise noted. On this architecture, the following sizes and alignments hold true for the fundamental types of C, where size and alignment are provided in bytes.


Type	Size	Alignment
`(unsigned) char`	1	1
`(unsigned) short`	2	2
`(unsigned) int`	4	4
`(unsigned) long`	8	8
`float`	4	4
`double`	8	8
`T\*`	8	8

Note that all pointers are of size 8 bytes, regardless of what data type they point to, meaning

sizeof(int\*) == sizeof(double\*) == sizeof(void\*) == ...

The above introduces the sizeof() operator in C. As an example, to calculate the size of a variable float foo in C, you can either use sizeof(foo) or sizeof(float). Either is acceptable. Some prefer using the variable name (as opposed to using its type in the sizeof() statement) so that they don't have to go back and change sizeof() statements if they change the type of foo. Others prefer explicitly writing the type for clarity.

We can similarly use __alignof__() with either a type or a variable name to determine the alignment of a type. Note thatsizeof(T) is always a multiple of __alignof__(T).

Pointer Arithmetic

Pointers can explicitly access specific memory addresses on the machine. It is a unique and powerful aspect of the C language that pointers also behave like indexes into an array. For example, given the following memory layout


Address	Value
0x601060UL	16
0x601064UL	5
0x601068UL	-3
0x60106CUL	9

along with int\* ptr = (int\*) 0x601060UL and sizeof(int) == 4 on x86-64, then for addresses we have the following

ptr + 0 == &ptr[0] == ptr == (int\*) 0x601060UL
ptr + 1 == &ptr[1] == (int\*) 0x601064UL
ptr + 2 == &ptr[2] == (int\*) 0x601068UL
ptr + 3 == &ptr[3] == (int\*) 0x60106CUL

and for values we have

ptr[0] == \*ptr == 16
ptr[1] == \*(ptr + 1) == 5
ptr[2] == \*(ptr + 2) == -3
ptr[3] == \*(ptr + 3) == 9

The UL suffix, which means “this integer has type unsigned long”, will avoid warnings on some compilers.

Assignment

In the cs61-sections/s01 directory, you’ll find a program skeleton called c-sizer.c. Complete this program.

c-sizer parses C type specifications from strings. The exercise has two parts. The first part, the spec_size function, calculates the size of a specified types, as a compiler for x86-64 would compute. The second part, the spec_print function, prints a piece of memory according to the given specification.

Part 1: C sizer

A type specification is built from these characters:


`c`	`char` (or `signed char`, `unsigned char`)
`s`	`short` (or `unsigned short`)
`i`	`int` (or `unsigned int`)
`l`	`long` (or `unsigned long`)
`z`	`size_t`
`f`	`float`
`d`	`double`
`p`	a pointer type

A specification with more than one character describes the C struct with those components. For instance, this struct:

struct x {
    char a;
    char b;
    int c;
    char d;
    float e;
    struct x* next;
};

would have the specification "ccicfp". And since this program:

#include <stdio.h>
struct x {
    char a;
    char b;
    int c;
    char d;
    float e;
    struct x* next;
};
int main(int argc, char* argv[]) {
    printf("%zu\n", sizeof(struct x));
}

prints 24 when compiled and run on an x86-64 machine, running ./c-sizer ccicfp should also print 24:

$ make
$ ./c-sizer ccicfp
     24 ccicfp

What does %zu mean? man printf, or search the web for %zu!

To complete this exercise you’ll need to modify the spec_size function as directed. Your function may return 0 if spec is invalid (or it may crash, or do anything else).

You should implement this function using computer arithmetic and your knowledge of C data representation. You can test your knowledge using the C compiler and programs such as the one above. Try also using __alignof__(TYPE).

Part 2: C printer

To practice using pointer arithmetic in combination with size and alignment conventions, you will write a function spec_print(spec, ptr) that takes a struct specification (as above), and a pointer to that struct, and prints its elements. For each member in the struct, the function should print the address of that member (as a hex pointer), the type of that member, and the value of that member.

For example:

struct { char c; int d; } x = { 'A', 24 };
spec_print("ci", &x);

Should print (though addresses may vary):

7ffffffabc char A
7ffffffac0 int 24

If you support unions (see below), print all possible values. For this part of the assignment, it may be helpful to consult man printf to find format specifiers for different types.

Recommended workflow

You should be able to implement and test the basic c-sizer functionality in section. If you have time, move on to an advanced feature.

Trouble getting started? Use the strategy of divide and conquer: break the problem into smaller pieces and iterate. We’ve observed that students often try to solve a problem set or exercise all at once, when it is often easier and more fun to iterate—first design and implement a solution for a small piece of the problem, then test your solution, then go back and solve a little bit more, building on what you’ve already got. For this exercise, we might:

Make c-sizer work for single-character specifications.
Then, make it work for multi-character specifications where all the characters are the same (e.g., ccccc, i, iiiiiii).
Then, make it work for two-character specifications.
Then, solve the general problem.
Next, progress to c-printer using the system you created for the sizer. You may be able to reuse much of your code for finding the sizes and alignment of variables.

Advanced features

Try out the self-check comprehension questions. You’d also get good practice from completing one or more of these advanced features, either in section if you have time or later. Nested structs are particularly instructive. This work is optional.

Support nested structs, which are set off by braces. For example, "c{ci}" declares a struct with two components, a char and a nested struct with a char and an int. You might declare this struct like so: struct x { char c1; struct { char c2; int i; } nested_struct; };
Support unions, which are set off by angle brackets. For example, "<ci>" declares a union with two components, a char and an int.
Support arrays, which are defined by a repeat count after a spec component. For example, "c20" is an array of 20 chars, and "{ci}20" is an array of {ci} structures.
Although you should be able to complete the assignment using computer arithmetic, it is also instructive to complete the assignment using the C compiler as a helper process. Change spec_size (or make a copy of spec_size) to compute its result using the C compiler. It should (1) write a simple C program to a temporary file, (2) run the C compiler on that file, (3) run the corresponding executable, (4) parse its output, and (5) return the result. This will teach you lots of useful things about running helper programs from C. Some interesting library functions (try reading their manuals): mkstemp, popen/pclose, system, unlink (to clean up). gcc’s -x c option may be of use.

Self-check comprehension questions

What is the size for "c" (the size printed for that specification)?
What fundamental types have the same size?
What is the size for "c{ci}" (where {ci} is a nested struct; see above)?
What is the size for "cci"?
Why do the sizes for "c{ci}" and "cci" differ?
What is the size for "<ci>" (where <ci> is a union; see above)?
Why do the sizes for "ci" and "<ci>" differ?
What can you say about the alignments of "XXXX" and "<XXXX>", for all XXXX?

Type in the C program above that prints the size for the specification "ccicfp" and save it in a file, such as mytest.c. Now change the %zu (in the printf line near the end) to %u and compile mytest.c again. What was reported and why?

C Hints

Q1. Is this the right way to iterate over the input string?

for (int i = 0; i < strlen(spec); ++i) {
    switch (spec[i]) {
        // More stuff here
    }
}

This works, but it’s not idiomatic—i.e., not the way most such loops are written in C. The serious flaw: unless the compiler is smart, the loop will compute strlen(spec) once per character, and in C, strlen takes time proportional to the length of the string. This means the loop takes at least O(N²) computation complexity where N is the length of the string, rather than O(N).

In C, you’ll frequently see such loops written using character pointers. The string ends when the character at the pointer equals '\0', a null character. So:

for (const char* cp = spec; *cp != '\0'; ++cp) {
    switch (*cp) {
        // More stuff here
    }
}

Or, equivalently:

for (const char* cp = spec; *cp; ++cp) {
    switch (*cp) {
        // More stuff here
    }
}

But this is reasonable too, and it’s easier to read if you’re more used to languages like Java where strings are not character arrays.

int len = strlen(spec);
for (int i = 0; i < len; ++i) {
    switch (spec[i]) {
        // More stuff here
    }
}

(The version with \*cp might still be faster than the version with strlen, although both versions have the same O(N) minimum computational complexity. Why?)

Q2. Is it OK to iterate over the specification string more than once?

While it is acceptable, it's unnecessary -- see if you can do the calculations as you go.

Q3. I calculate the largest alignment of any field and then multiply that by the number of characters in the specification. Is that correct?

No, that doesn’t quite work. Try testing with strings "csi" and "ccsi". These should have the same size, 8.

Q4. Should I assume an x86 or x86_64 system?

In this class, we assume x86_64. However, you should attempt to implement the assignment such that you do not have to hardcode the sizes of different data types.