2016/Fundamentals2X

From CS61
Jump to: navigation, search

C sizer

In the cs61-exercises/fundamentals2x directory, you’ll find a program skeleton called c-sizer.c. Complete this program.

c-sizer should parse C type specifications from strings and output the sizes of the corresponding types, as a compiler for x86-64 would compute.

A type specification is built from these characters:

c char (or signed char, unsigned char)
s short (or unsigned short)
i int (or unsigned int)
l long (or unsigned long)
f float
d double
p a pointer type

A specification with more than one character describes the C struct with those components. For instance, this struct:

struct x {
    char a;
    char b;
    int c;
    char d;
    float e;
    struct x* next;
};

would have the specification "ccicfp". And since this program:

#include <stdio.h>
struct x {
    char a;
    char b;
    int c;
    char d;
    float e;
    struct x* next;
};
int main(int argc, char* argv[]) {
    printf("%zu\n", sizeof(struct x));
}

prints 24 when compiled and run on an x86-64 machine, running ./c-sizer ccicfp should also print 24:

$ ./c-sizer ccicfp
     24 ccicfp
What’s %zu mean? man printf, or search the web for %zu!

To complete this exercise you’ll need to modify the spec_size function as directed. Your function may return 0 if spec is invalid (or it may crash, or do anything else).

You should implement this function using computer arithmetic and your knowledge of C data representation. You can test your knowledge using the C compiler and programs such as the one above. Try also using __alignof__(TYPE).

Not sure how to run the C compiler? Try looking at the GNUmakefile in the fundamentals-2 directory. It should be easy for you to modify GNUmakefile so that make mytest builds a mytest executable from a file called mytest.c.

Skills

  • Sizes of fundamental C types
  • C layout rules
  • C alignment
  • Simple string parsing
  • sizeof(T), __alignof__(T)
  • C library functions for interacting with helper programs

In-class work

In 45 minutes, you should be able to implement and test the basic c-sizer functionality. If you have time, move on to an advanced feature.

Trouble getting started? Use the strategy of divide and conquer: break the problem into smaller pieces and iterate. We’ve observed that students often try to solve a problem set or exercise all at once, when it is often easier and more fun to iterate—first design and implement a solution for a small piece of the problem, then test your solution, then go back and solve a little bit more, building on what you’ve already got. For this exercise, we might:

  1. Make c-sizer work for single-character specifications.
  2. Then, make it work for multi-character specifications where all the characters are the same (e.g., ccccc, i, iiiiiii).
  3. Then, make it work for two-character specifications.
  4. Then, solve the general problem.

Advanced features for after-class work

Try out the self-check comprehension questions. You’d also get good practice from completing one or more of these advanced features. Nested structs are particularly instructive. This work is optional.

  1. Support nested structs, which are set off by braces. For example, "c{ci}" declares a struct with two components, a char and a nested struct with a char and an int. You might declare this struct like so:
    struct x {
        char c1;
        struct {
            char c2;
            int i;
        } nested_struct;
    };
  2. Support unions, which are set off by angle brackets. For example, "<ci>" declares a union with two components, a char and an int.
  3. Support arrays, which are defined by a repeat count after a spec component. For example, "c20" is an array of 20 chars, and "{ci}20" is an array of {ci} structures.
  4. Although you should be able to complete the assignment using computer arithmetic, it is also instructive to complete the assignment using the C compiler as a helper process. Change spec_size (or make a copy of spec_size) to compute its result using the C compiler. It should (1) write a simple C program to a temporary file, (2) run the C compiler on that file, (3) run the corresponding executable, (4) parse its output, and (5) return the result. This will teach you lots of useful things about running helper programs from C. Some interesting library functions (try reading their manuals): mkstemp, popen/pclose, system, unlink (to clean up). gcc’s -x c option may be of use.

Self-check comprehension questions

  • What is the size for "c" (the size printed for that specification)?
  • What fundamental types have the same size?
  • What is the size for "c{ci}" (where {ci} is a nested struct; see above)?
  • What is the size for "cci"?
  • Why do the sizes for "c{ci}" and "cci" differ?
  • What is the size for "<ci>" (where <ci> is a union; see above)?
  • Why do the sizes for "ci" and "<ci>" differ?
  • What can you say about the alignments of "XXXX" and "<XXXX>", for all XXXX?
  • Type in the C program above that prints the size for the specification "ccicfp" and save it in a file, such as mytest.c. Now change the %zu (in the printf line near the end) to %u and compile mytest.c again. What was reported and why?

FAQ

Q1. Is this the right way to iterate over the input string?

for (int i = 0; i < strlen(spec); ++i) {
    switch (spec[i])
        // More stuff here
}

This works, but it’s not idiomatic—i.e., not the way most such loops are written in C. The serious flaw: unless the compiler is very smart, the loop will compute strlen(spec) once per character, and in C, strlen takes time proportional to the length of the string. This means the loop takes at least O(N2) computation complexity where N is the length of the string, rather than O(N).

In C, you’ll frequently see such loops written using character pointers. The string ends when the character at the pointer equals '\0', a null character. So:

for (const char* cp = spec; *cp != '\0'; ++cp) {
    switch (*cp)
        // More stuff here
}

Or, equivalently:

for (const char* cp = spec; *cp; ++cp) {
    switch (*cp)
        // More stuff here
}

But this is reasonable too, and it’s easier to read if you’re more used to languages like Java where strings are not character arrays.

int len = strlen(spec);
for (int i = 0; i < len; ++i) {
    switch (spec[i])
        // More stuff here
}

(The version with *cp might still be faster than the version with strlen, although both versions have the same O(N) minimum computational complexity. Why?)

Q2. Is it OK to iterate over the specification string more than once?

While it is acceptable, it's unnecessary -- see if you can do the calculations as you go.

Q3. I calculate the largest alignment of any field and then multiply that by the number of characters in the specification. Is that correct?

No, that doesn’t quite work. Try testing with strings "csi" and "ccsi". These should have the same size, 8.

Solution video

Subjects include alignment rules, strlen vs. string iteration (Q1 above), command line &&.