C sizer
In the cs61-exercises/fundamentals2x
directory, you’ll find a
program skeleton called c-sizer.c
. Complete this program.
c-sizer
should parse C type specifications from strings and output
the sizes of the corresponding types, as a compiler for x86-64 would
compute.
A type specification is built from these characters:
c |
char (or signed char , unsigned char ) |
s |
short (or unsigned short ) |
i |
int (or unsigned int ) |
l |
long (or unsigned long ) |
f |
float |
d |
double |
p |
a pointer type |
A specification with more than one character describes the C struct
with those components. For instance, this struct:
struct x {
char a;
char b;
int c;
char d;
float e;
struct x* next;
};
would have the specification "ccicfp"
. And since this program:
#include <stdio.h>
struct x {
char a;
char b;
int c;
char d;
float e;
struct x* next;
};
int main(int argc, char* argv[]) {
printf("%zu\n", sizeof(struct x));
}
prints 24 when compiled and run on an x86-64 machine, running
./c-sizer ccicfp
should also print 24:
$ ./c-sizer ccicfp
24 ccicfp
What’s %zu
mean? man printf
, or search the web for %zu
!
To complete this exercise you’ll need to modify the spec_size
function as directed. Your function may return 0 if spec
is invalid
(or it may crash, or do anything else).
You should implement this function using computer arithmetic and your
knowledge of C data representation. You can test your knowledge using
the C compiler and programs such as the one above. Try also using
__alignof__(TYPE)
.
Not sure how to run the C compiler? Try looking at the GNUmakefile
in the fundamentals-2
directory. It should be easy for you to modify
GNUmakefile
so that make mytest
builds a mytest
executable
from a file called mytest.c
.
Skills
- Sizes of fundamental C types
- C layout rules
- C alignment
- Simple string parsing
sizeof(T)
,__alignof__(T)
- C library functions for interacting with helper programs
In-class work
In 45 minutes, you should be able to implement and test the basic
c-sizer
functionality. If you have time, move on to an advanced
feature.
Trouble getting started? Use the strategy of divide and conquer: break the problem into smaller pieces and iterate. We’ve observed that students often try to solve a problem set or exercise all at once, when it is often easier and more fun to iterate—first design and implement a solution for a small piece of the problem, then test your solution, then go back and solve a little bit more, building on what you’ve already got. For this exercise, we might:
- Make
c-sizer
work for single-character specifications. - Then, make it work for multi-character specifications where all the
characters are the same (e.g.,
ccccc
,i
,iiiiiii
). - Then, make it work for two-character specifications.
- Then, solve the general problem.
Advanced features for after-class work
Try out the self-check comprehension questions. You’d also get good practice from completing one or more of these advanced features. Nested structs are particularly instructive. This work is optional.
- Support nested structs, which are set off by braces. For example,
"c{ci}"
declares a struct with two components, achar
and a nested struct with achar
and anint
. You might declare this struct like so: struct x { char c1; struct { char c2; int i; } nested_struct; }; - Support unions, which are set off by angle brackets. For example,
"<ci>"
declares a union with two components, achar
and anint
. - Support arrays, which are defined by a repeat count after a spec
component. For example,
"c20"
is an array of 20char
s, and"{ci}20"
is an array of{ci}
structures. - Although you should be able to complete the assignment using
computer arithmetic, it is also instructive to complete the
assignment using the C compiler as a helper process. Change
spec_size
(or make a copy ofspec_size
) to compute its result using the C compiler. It should (1) write a simple C program to a temporary file, (2) run the C compiler on that file, (3) run the corresponding executable, (4) parse its output, and (5) return the result. This will teach you lots of useful things about running helper programs from C. Some interesting library functions (try reading their manuals):mkstemp
,popen
/pclose
,system
,unlink
(to clean up).gcc
’s-x c
option may be of use.
Self-check comprehension questions
- What is the size for
"c"
(the size printed for that specification)? - What fundamental types have the same size?
- What is the size for
"c{ci}"
(where{ci}
is a nested struct; see above)? - What is the size for
"cci"
? - Why do the sizes for
"c{ci}"
and"cci"
differ? - What is the size for
"<ci>"
(where<ci>
is a union; see above)? - Why do the sizes for
"ci"
and"<ci>"
differ? - What can you say about the alignments of
"XXXX"
and"<XXXX>"
, for allXXXX
?
- Type in the C program above that prints the size for the specification
"ccicfp"
and save it in a file, such asmytest.c
. Now change the%zu
(in theprintf
line near the end) to%u
and compilemytest.c
again. What was reported and why?
FAQ
Q1. Is this the right way to iterate over the input string?
for (int i = 0; i < strlen(spec); ++i) {
switch (spec[i])
// More stuff here
}
This works, but it’s not idiomatic—i.e., not the way most such loops are
written in C. The serious flaw: unless the compiler is very smart, the
loop will compute strlen(spec)
once per character, and in C,
strlen
takes time proportional to the length of the string. This
means the loop takes at least O(N2) computation
complexity where N is the length of the string, rather than O(N).
In C, you’ll frequently see such loops written using character pointers.
The string ends when the character at the pointer equals '\0'
, a null
character. So:
for (const char* cp = spec; *cp != '\0'; ++cp) {
switch (*cp)
// More stuff here
}
Or, equivalently:
for (const char* cp = spec; *cp; ++cp) {
switch (*cp)
// More stuff here
}
But this is reasonable too, and it’s easier to read if you’re more used to languages like Java where strings are not character arrays.
int len = strlen(spec);
for (int i = 0; i < len; ++i) {
switch (spec[i])
// More stuff here
}
(The version with \*cp
might still be faster than the version with
strlen
, although both versions have the same O(N) minimum
computational complexity. Why?)
Q2. Is it OK to iterate over the specification string more than once?
While it is acceptable, it's unnecessary -- see if you can do the calculations as you go.
Q3. I calculate the largest alignment of any field and then multiply that by the number of characters in the specification. Is that correct?
No, that doesn’t quite work. Try testing with strings "csi"
and
"ccsi"
. These should have the same size, 8.
Solution video
Subjects include alignment rules, strlen
vs. string iteration (Q1
above), command line &&
.