Section 1: Pointers, sizes, and alignment
Skills
- Sizes of fundamental C types
- C layout rules
- C alignment
- Simple string parsing
sizeof(T)
,__alignof__(T)
- Pointer arithmetic
Section Repo
By now, you should have your VM and GitHub account set up for the course. If you're having infrastructure issues, please see http://cs61.seas.harvard.edu/wiki/2018/Infrastructure or contact course staff.
Today you'll be adding access to the section repo. In your VM, open a terminal and enter
git clone <https://github.com/cs61/cs61-sections.git>
to fetch the section materials.
If you need a reference for git commands, please see http://cs61.seas.harvard.edu/wiki/2018/Git.
Background
Types, Sizing, and Alignment
You may have heard the phrase "data is bits + context." In C, one of the ways that we give context to a sequence of bits is providing a type to each object.
Every type in C has an associated size (in bytes). In order to make
pointer arithmetic and hardware design simpler and more efficient, types
in C are also "aligned" to certain addresses in memory. For example, a
variable of type int
cannot start on an arbitrary address. Rather,
it must begin at an address divisible by (on x86-64) 4.
The specific size and alignment of types in C depend on a machine's architecture, but for this course, we will assume the x86-64 architecture unless otherwise noted. On this architecture, the following sizes and alignments hold true for the fundamental types of C, where size and alignment are provided in bytes.
Type | Size | Alignment |
(unsigned) char |
1 | 1 |
(unsigned) short |
2 | 2 |
(unsigned) int |
4 | 4 |
(unsigned) long |
8 | 8 |
float |
4 | 4 |
double |
8 | 8 |
T\* |
8 | 8 |
Note that all pointers are of size 8 bytes, regardless of what data type they point to, meaning
sizeof(int\*) == sizeof(double\*) == sizeof(void\*) == ...
The above introduces the sizeof()
operator in C. As an example, to
calculate the size of a variable float foo
in C, you can either use
sizeof(foo)
or sizeof(float)
. Either is acceptable. Some prefer
using the variable name (as opposed to using its type in the
sizeof()
statement) so that they don't have to go back and change
sizeof()
statements if they change the type of foo
. Others
prefer explicitly writing the type for clarity.
We can similarly use __alignof__()
with either a type or a
variable name to determine the alignment of a type. Note
thatsizeof(T)
is always a multiple of __alignof__(T)
.
Pointer Arithmetic
Pointers can explicitly access specific memory addresses on the machine. It is a unique and powerful aspect of the C language that pointers also behave like indexes into an array. For example, given the following memory layout
Address | Value |
0x601060UL | 16 |
0x601064UL | 5 |
0x601068UL | -3 |
0x60106CUL | 9 |
along with int\* ptr = (int\*) 0x601060UL
and sizeof(int) == 4
on x86-64, then for addresses we have the following
ptr + 0 == &ptr[0] == ptr == (int\*) 0x601060UL
ptr + 1 == &ptr[1] == (int\*) 0x601064UL
ptr + 2 == &ptr[2] == (int\*) 0x601068UL
ptr + 3 == &ptr[3] == (int\*) 0x60106CUL
and for values we have
ptr[0] == \*ptr == 16
ptr[1] == \*(ptr + 1) == 5
ptr[2] == \*(ptr + 2) == -3
ptr[3] == \*(ptr + 3) == 9
The UL
suffix, which means “this integer has type unsigned long
”, will avoid warnings on some compilers.
See Also
Lecture 2 notes, some slides from last year, section 3.8.2 of CS:APP3e
Assignment
In the cs61-sections/s01
directory, you’ll find a program skeleton
called c-sizer.c
. Complete this program.
c-sizer
parses C type specifications from strings. The exercise
has two parts. The first part, the spec_size
function, calculates
the size of a specified types, as a compiler for x86-64 would compute.
The second part, the spec_print
function, prints a piece of memory
according to the given specification.
Part 1: C sizer
A type specification is built from these characters:
c |
char (or signed char , unsigned char ) |
s |
short (or unsigned short ) |
i |
int (or unsigned int ) |
l |
long (or unsigned long ) |
z |
size_t |
f |
float |
d |
double |
p |
a pointer type |
A specification with more than one character describes the C struct
with those components. For instance, this struct:
struct x {
char a;
char b;
int c;
char d;
float e;
struct x* next;
};
would have the specification "ccicfp"
. And since this program:
#include <stdio.h>
struct x {
char a;
char b;
int c;
char d;
float e;
struct x* next;
};
int main(int argc, char* argv[]) {
printf("%zu\n", sizeof(struct x));
}
prints 24 when compiled and run on an x86-64 machine, running
./c-sizer ccicfp
should also print 24:
$ make
$ ./c-sizer ccicfp
24 ccicfp
What does %zu
mean? man printf
, or search the web for %zu
!
To complete this exercise you’ll need to modify the spec_size
function as directed. Your function may return 0 if spec
is invalid
(or it may crash, or do anything else).
You should implement this function using computer arithmetic and your
knowledge of C data representation. You can test your knowledge using
the C compiler and programs such as the one above. Try also using
__alignof__(TYPE)
.
Part 2: C printer
To practice using pointer arithmetic in combination with size and
alignment conventions, you will write a function spec_print(spec, ptr)
that takes a struct specification (as above), and a pointer to
that struct, and prints its elements. For each member in the struct, the
function should print the address of that member (as a hex pointer), the
type of that member, and the value of that member.
For example:
struct { char c; int d; } x = { 'A', 24 };
spec_print("ci", &x);
Should print (though addresses may vary):
7ffffffabc char A
7ffffffac0 int 24
If you support unions (see below), print all possible values. For this
part of the assignment, it may be helpful to consult man printf
to
find format specifiers for different types.
Recommended workflow
You should be able to implement and test the basic c-sizer
functionality in section. If you have time, move on to an advanced
feature.
Trouble getting started? Use the strategy of divide and conquer: break the problem into smaller pieces and iterate. We’ve observed that students often try to solve a problem set or exercise all at once, when it is often easier and more fun to iterate—first design and implement a solution for a small piece of the problem, then test your solution, then go back and solve a little bit more, building on what you’ve already got. For this exercise, we might:
- Make
c-sizer
work for single-character specifications. - Then, make it work for multi-character specifications where all the
characters are the same (e.g.,
ccccc
,i
,iiiiiii
). - Then, make it work for two-character specifications.
- Then, solve the general problem.
- Next, progress to
c-printer
using the system you created for the sizer. You may be able to reuse much of your code for finding the sizes and alignment of variables.
Advanced features
Try out the self-check comprehension questions. You’d also get good practice from completing one or more of these advanced features, either in section if you have time or later. Nested structs are particularly instructive. This work is optional.
- Support nested structs, which are set off by braces. For example,
"c{ci}"
declares a struct with two components, achar
and a nested struct with achar
and anint
. You might declare this struct like so: struct x { char c1; struct { char c2; int i; } nested_struct; }; - Support unions, which are set off by angle brackets. For example,
"<ci>"
declares a union with two components, achar
and anint
. - Support arrays, which are defined by a repeat count after a spec
component. For example,
"c20"
is an array of 20char
s, and"{ci}20"
is an array of{ci}
structures. - Although you should be able to complete the assignment using
computer arithmetic, it is also instructive to complete the
assignment using the C compiler as a helper process. Change
spec_size
(or make a copy ofspec_size
) to compute its result using the C compiler. It should (1) write a simple C program to a temporary file, (2) run the C compiler on that file, (3) run the corresponding executable, (4) parse its output, and (5) return the result. This will teach you lots of useful things about running helper programs from C. Some interesting library functions (try reading their manuals):mkstemp
,popen
/pclose
,system
,unlink
(to clean up).gcc
’s-x c
option may be of use.
Self-check comprehension questions
- What is the size for
"c"
(the size printed for that specification)? - What fundamental types have the same size?
- What is the size for
"c{ci}"
(where{ci}
is a nested struct; see above)? - What is the size for
"cci"
? - Why do the sizes for
"c{ci}"
and"cci"
differ? - What is the size for
"<ci>"
(where<ci>
is a union; see above)? - Why do the sizes for
"ci"
and"<ci>"
differ? - What can you say about the alignments of
"XXXX"
and"<XXXX>"
, for allXXXX
?
- Type in the C program above that prints the size for the specification
"ccicfp"
and save it in a file, such asmytest.c
. Now change the%zu
(in theprintf
line near the end) to%u
and compilemytest.c
again. What was reported and why?
C Hints
Q1. Is this the right way to iterate over the input string?
for (int i = 0; i < strlen(spec); ++i) {
switch (spec[i]) {
// More stuff here
}
}
This works, but it’s not idiomatic—i.e., not the way most such loops are
written in C. The serious flaw: unless the compiler is smart, the loop
will compute strlen(spec)
once per character, and in C, strlen
takes time proportional to the length of the string. This means the loop
takes at least O(N2) computation complexity where N is
the length of the string, rather than O(N).
In C, you’ll frequently see such loops written using character pointers.
The string ends when the character at the pointer equals '\0'
, a null
character. So:
for (const char* cp = spec; *cp != '\0'; ++cp) {
switch (*cp) {
// More stuff here
}
}
Or, equivalently:
for (const char* cp = spec; *cp; ++cp) {
switch (*cp) {
// More stuff here
}
}
But this is reasonable too, and it’s easier to read if you’re more used to languages like Java where strings are not character arrays.
int len = strlen(spec);
for (int i = 0; i < len; ++i) {
switch (spec[i]) {
// More stuff here
}
}
(The version with \*cp
might still be faster than the version with
strlen
, although both versions have the same O(N) minimum
computational complexity. Why?)
Q2. Is it OK to iterate over the specification string more than once?
While it is acceptable, it's unnecessary -- see if you can do the calculations as you go.
Q3. I calculate the largest alignment of any field and then multiply that by the number of characters in the specification. Is that correct?
No, that doesn’t quite work. Try testing with strings "csi"
and
"ccsi"
. These should have the same size, 8.
Q4. Should I assume an x86 or x86_64 system?
In this class, we assume x86_64. However, you should attempt to implement the assignment such that you do not have to hardcode the sizes of different data types.