Data Representation 9/10/15

Learning Objectives

Convert between binary, decimal, and hexadecimal
Explain how both fundamental and derived data types are laid out in memory
Use pointer arithmetic correctly
Write code that places things in memory how you want them

Warmup: Converting between number bases

1. Write the following decimal numbers in binary.

16
32
64
128
256
512
1024
2048
4096
8192

2. Now convert each of the binary numbers in the previous exercise to hexadecimal.

Note: You should learn to recognize these numbers. Some people know larger powers of two, but these should be second nature to you.

3. One megabyte is 2 to what power?

4. One gigabyte is 2 to what power?

5. One terabyte is 2 to what power?

Note: You'll want to have these memorized as well.

6. Write 1 MB, 1 GB, and 1 TB in hexadecimal.

In the following exercises, numbers prefaced with 0x are in hexadecimal and numbers without preface are decimal. Compute the following:

7. 0x80480004 + 12

8. 0x804810FC + 8

9. 0xbfff801B + 16

10. 0xbfff8234 - 40

Examining data in memory

What is the largest value an unsigned char can hold?

In the cs61-exercises/l03 directory, you will find the source code and makefile for today's coding. Examine the program charsize.c and then build it. Explain why hexdump printed out what it did.

Next, predict what will happen when you change the number on line 5 to 256.

Now, try it.

Explain what happened and why.

Now, let's look a little more closely at the output of hexdump. Take a look at the program charval. Predict what will happen when you run it and then run it.

Discussion question: At the level of abstraction of the machine, can it distinguish between the number 0x47 and the ASCII character "G"?

If the machine can distinguish between them, how does it do it? If the machine cannot distinguish between them, then how do we ascribe meaning to the values in a program?

Next, let's examine an array in memory. Read, build, and run the program carray.c. How many bytes were printed out? Why?

Can you determine what the addresses are of each of the array elements? Use gdb to confirm that you got the right addresses.

Next, compare carray.c to carray2.c. Before you make and run carray2, predict what it will print out. After you run it, explain why it printed out what it did.

Now, examine the program iarray. How many bytes will it print out? What do you suppose the addresses of the array elements will be?

Build and run the program. Did it behave as you expected it to? Run it in gdb and examine the addresses of the array elements.

Write me down

We will check these answers at the end of class.

Assume that A is an array of type T. Write an expression that tells me how to compute the address of the i^th element of A.

Pointer Arithmetic

The compiler knows the types of the variables it manipulates and even more importantly, it knows their size. As a result adding and subtracting numbers to pointers is a bit like magic. Read and run the program parith.c.

Take note of the difference between the values of the addresses in each pair.

Now, add a printf statement to print the difference between the two pointers. For example, for the character case, add the line:

printf("&carray[5] - carray = %zu\n", &carray[5] - carray);

Add code to compute and print the same difference for the other two cases.

What do you notice? Can you explain why you get the results you got?

The type uintptr_t represents a value guaranteed to be large enough to hold a pointer, but treats the pointers as integers. Write a function that takes in two pointers to adjacent elements in an array and computes the sizeof the elements in the array. In other words, if I call your function(&iarray[0], &iarray[1]) you should tell me that integers are four bytes. But if I call your function(&darray[0], &darray[1]), you should tell me that doubles are 8 bytes (you might not know the name of the types).

Write me down

If P is a pointer to a type T and i is an index, what is the difference between &P[i] and P + i?

Next, answer all the problems in ExerciseP and then check your answers here. If you are still having difficulty, review this.

Laying out items in memory

Now, let's dig a bit more deeply into precisely how values larger than a byte are laid out in memory. Examine the source code to iexperiment.c, and then build and run it. The value stored in i, 0x89ABCDEF, occupies four bytes; what are the addresses of those four bytes? Which address contains the most significant digits of the value?

A machine uses a big endian representation if the most significant bits of a multi-byte value reside in the first byte of the value (i.e., the one with the lowest address). A machine uses a little endian representation if the least significant bits of a multi-byte value reside in the byte with the first byte of the value (i.e., the one with the lowest address).

Write me down

Is the CS50 appliance a big endian or a little endian machine?

Next, look at the program layout.c.

Explain the printout.

Then edit the program, placing the s structure into the midst of the character array and assigning some distinct values to the fields. What does the hexdump reveal this time? Was there anything unusual? Can you come up with an explanation for what you observed? You will find it informative to print out the size of the structure (rather than the size of the character array).

Now experiment with the contents and ordering of the structure s; try placing a variety of different types in the structure and rearrange the fields in different orders. In particular, create fields of different sizes and see what happens when you rearrange them. Experiment enough so that you can derive a set of rules or how structures are allocated in memory. Write your rules down.

Next, do the same thing you did experimenting with struct s to experiment with the u union.

Write me down

Based on what you observed, state how you believe structs and unions are laid out in C.

Summing Up

The machine does not ascribe any interpretation to the values it stores; it doesn't know if something is intended to be an integer, a character, part of a structure, etc. It is the programming language constructs that give data its meaning.

Please take a minute to fill out the post-class survey.