CS 61 2015

Principles

C pointer arithmetic behavior follows logically from several principles.

Array layout

Arrays in C are laid out in memory using contiguous allocation: element n+1 is placed in memory immediately after element n.

The sizeof operator gives an object’s size in bytes, and that size is the same whether or not the object is an array element. Let array be an array of objects of type T (e.g., T array[100];), and assume element 0 of the array is located at address ((uintptr_t) X). Then array element n is located at address ((uintptr_t) X) + n*sizeof(T).
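As a quick check of the layout rule, here is a minimal sketch that walks an array and confirms that each element sits at X + n*sizeof(T). The struct standing in for T and the function name layout_is_contiguous are hypothetical. Converting pointers to uintptr_t this way assumes a flat address space, as on typical desktop platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical element type standing in for T; any object type works.
typedef struct {
    int x;
    double y;
} T;

// Returns 1 if each element a[i] of a[0..n) sits exactly
// i * sizeof(T) bytes after element 0, and 0 otherwise.
int layout_is_contiguous(const T* a, size_t n) {
    uintptr_t x = (uintptr_t) &a[0];    // address of element 0
    for (size_t i = 0; i < n; ++i) {
        if ((uintptr_t) &a[i] != x + i * sizeof(T)) {
            return 0;
        }
    }
    return 1;
}
```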

Pointer–array equivalence

C pointers can behave like array positions in expressions. Given the following declarations:

T* p;                   // pointer to type T
T array[100];           // array of T objects

You’d expect this to work, and it does:

p = &array[0];          // p now points to the first element of the array
*p = 3;                 // assigns 3 to array[0]
assert(array[0] == 3);  // OK

But C also lets you use array notation on the pointer. For instance:

p[0] = 4;               // assigns 4 to array[0]
assert(array[0] == 4);  // OK
                        // Note: When p is a pointer, p[0] and *p ALWAYS mean the same thing.
p[1] = 5;               // assigns 5 to array[1]
assert(array[1] == 5);  // OK
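Putting the snippets above together, this sketch runs them with T assumed to be int. It also checks the rule behind the note: the C standard defines p[n] as *(p + n), which is why p[0] and *p always agree. The function name pointer_array_demo is hypothetical.

```c
#include <assert.h>

// A runnable version of the assignments above, assuming T is int.
// Returns the final value of array[1].
int pointer_array_demo(void) {
    int array[100];
    int* p = &array[0];    // p points to the first element

    *p = 3;                // stores 3 in array[0]
    assert(array[0] == 3);

    p[0] = 4;              // p[0] is defined as *(p + 0), i.e. *p
    assert(array[0] == 4);
    assert(p[0] == *p);

    p[1] = 5;              // p[1] is defined as *(p + 1)
    assert(array[1] == 5);
    assert(p[1] == *(p + 1));

    return array[1];
}
```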

This works even in the middle of the array.

p = &array[5];          // p now points to the sixth element of the array
p[0] = 10;
assert(array[5] == 10); // OK
p[-1] = 9;
assert(array[4] == 9);  // OK
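A pointer into the middle of an array is handy when a function should see a window around one element. The helper below, sum_with_neighbors, is a hypothetical example; it assumes the caller passes a pointer that has a valid element on each side.

```c
#include <assert.h>

// Hypothetical helper: returns the element p points to plus its two
// neighbors. Assumes p[-1] and p[1] are both inside the same array.
int sum_with_neighbors(const int* p) {
    return p[-1] + p[0] + p[1];
}
```

Called as sum_with_neighbors(&array[0]), it would read array[-1], which lies outside the array and is undefined behavior.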

In fact, array variables in expressions behave like pointers. The assignment p = array is valid, and has the same meaning as p = &array[0]:

p = array;
assert(p == &array[0]);  // OK

The equivalence isn’t total. A C pointer behaves like an array position, not an array. You can’t assign an array variable from a pointer:

array = p;      // error (here T is int): incompatible types when assigning to type ‘int[100]’ from type ‘int *’
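One visible consequence of the partial equivalence: when an array is passed to a function, the function receives only a pointer to element 0, so sizeof inside the callee measures the pointer, not the array. The two helper functions here are hypothetical illustrations, assuming int elements.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical callee: even when a caller writes size_seen_by_callee(array),
// the argument is converted to &array[0], so p is just a pointer.
size_t size_seen_by_callee(int* p) {
    return sizeof(p);        // size of the pointer itself (commonly 8 on 64-bit platforms)
}

// Hypothetical owner: sizeof applied to the array variable itself
// measures every element.
size_t size_seen_by_owner(void) {
    int array[100];
    return sizeof(array);    // 100 * sizeof(int)
}
```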

Array position arithmetic

C allows you to compare pointers into the same array. The results are what you’d expect.

assert(&array[5] > &array[4]);      // 5 > 4, so &array[5] > &array[4]
assert(&array[0] == &array[0]);
assert(&array[9] < &array[10]);

Now, since &array[5] > &array[4], the laws of arithmetic say that &array[5] - &array[4] > 0. This is true in C. Subtraction on array positions is equivalent to subtraction on the corresponding array indexes. Thus:

ptrdiff_t i;
i = &array[5] - &array[4];
assert(i == 1);                   // == 5 - 4
i = &array[0] - &array[0];
assert(i == 0);                   // == 0 - 0
i = &array[9] - &array[10];
assert(i == -1);                  // == 9 - 10

(See C Patterns for more on ptrdiff_t.)
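Pointer subtraction is a natural way to turn a position back into an index. In this minimal sketch, a hypothetical search function find_int returns a position, and a hypothetical index_of recovers the index by subtracting the position of element 0. Returning &a[n] for “not found” is a design choice borrowed from the one-past-the-end convention: forming and comparing that position is allowed, but dereferencing it is not.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical linear search: returns the position of the first element
// of a[0..n) equal to value, or &a[n] (one past the end) if none matches.
const int* find_int(const int* a, size_t n, int value) {
    for (size_t i = 0; i < n; ++i) {
        if (a[i] == value) {
            return &a[i];
        }
    }
    return &a[n];    // may be formed and compared, never dereferenced
}

// Converts that position back to an index by subtracting positions.
ptrdiff_t index_of(const int* a, size_t n, int value) {
    return find_int(a, n, value) - &a[0];    // == n when value is absent
}
```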

Now, since &array[5] - &array[4] == 1, we should expect that &array[4] + 1 == &array[5]. This is true too.

assert(&array[0] + 6 == &array[6]);
assert(6 + &array[0] == &array[6]);
assert(&array[4] - 1 == &array[3]);
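The three assertions above generalize to every index. This sketch, assuming an int array of a hypothetical length ARRAY_LEN, checks them for each i from 0 through ARRAY_LEN, including the one-past-the-end position, and also checks that subtraction undoes addition.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN 100    // hypothetical array length

// Returns 1 if, for every i in 0..ARRAY_LEN (ARRAY_LEN itself being the
// one-past-the-end position), position arithmetic matches indexing.
int addition_matches_indexing(void) {
    int array[ARRAY_LEN];    // only addresses are used, never values
    for (ptrdiff_t i = 0; i <= ARRAY_LEN; ++i) {
        if (&array[0] + i != &array[i]) return 0;                 // p + n
        if (i + &array[0] != &array[i]) return 0;                 // n + p
        if (i > 0 && &array[i] - 1 != &array[i - 1]) return 0;   // p - n
        if (&array[i] - &array[0] != i) return 0;                 // subtraction undoes addition
    }
    return 1;
}
```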

Putting it together

Since C pointers behave like array positions, it must logically follow that C pointer arithmetic behaves like array position arithmetic. And it does.

p = &array[0];
T* q = &array[4];
assert(q - p == 4);
assert(q == p + 4);
assert((q + 1) - array == 5);
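The usual payoff is a loop that walks a pointer instead of an index. In this sketch, the hypothetical function sum_ints stops at the one-past-the-end position a + n, and end - a recovers the element count.

```c
#include <assert.h>
#include <stddef.h>

// Sums a[0..n) by walking a pointer from the first element up to the
// one-past-the-end position a + n; ++p is shorthand for p = p + 1.
long sum_ints(const int* a, size_t n) {
    const int* end = a + n;
    assert(end - a == (ptrdiff_t) n);    // position difference == element count
    long total = 0;
    for (const int* p = a; p != end; ++p) {
        total += *p;
    }
    return total;
}
```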

The `sizeof` wrinkle

These principles fit together quite logically, but one consequence endlessly surprises students: Calculations on C array positions and pointers usually return different results from calculations on addresses. The reason is contiguous layout: positions count elements, addresses count bytes, and each element occupies sizeof(T) bytes.

For instance, consider this:

int iarr[10];                  // Assume the compiler puts this at address 0x1000.
printf("%td\n", &iarr[2] - &iarr[1]);
                               // Prints “1”

int* p1 = &iarr[1];
int* p2 = &iarr[2];
printf("%p %p\n", (void*) p1, (void*) p2);     // Prints “0x1004 0x1008”
printf("%td\n", p2 - p1);      // Prints “1”

Every step in this sequence is logical. Array position arithmetic is the same as array index arithmetic. Array elements are laid out contiguously in memory, and ints are 4 bytes big here (sizeof(int) == 4), so the pointer to element 2 is four bytes after the pointer to element 1. Pointers behave like array positions. But admittedly the result is a little funky. We subtract 0x1008 - 0x1004 and get 1, not 4! The compiler has divided the difference in addresses, namely 4, by sizeof(int), which is also 4, to get 1. This will become second nature; until it does, it still makes sense if you go step by step.
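To make the division concrete, this sketch repeats the calculation by hand: it subtracts the raw addresses as integers and divides by sizeof(int). It models the arithmetic, not the compiler's actual generated code, and it assumes both pointers point into the same int array with p2 at or after p1 (otherwise the unsigned subtraction wraps around). The name int_distance_by_hand is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Sketch of the arithmetic behind p2 - p1 for int pointers: take the
// byte distance between the raw addresses, then divide by sizeof(int).
// Assumes p1 and p2 point into the same int array and p2 >= p1.
ptrdiff_t int_distance_by_hand(const int* p1, const int* p2) {
    uintptr_t bytes = (uintptr_t) p2 - (uintptr_t) p1;
    return (ptrdiff_t) (bytes / sizeof(int));
}
```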

This also means that the result of a pointer arithmetic expression depends heavily on the types of the pointer arguments.

printf("%td\n", (char*) p2 - (char*) p1);     // Prints “4”: 4/sizeof(char) == 4/1 == 4
printf("%td\n", (short*) p2 - (short*) p1);   // Prints “2”: 4/sizeof(short) == 4/2 == 2
printf("%td\n", (double*) p2 - (double*) p1); // Probably prints “0”: 4/sizeof(double) == 4/8, which truncates to 0
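Casting to char* is the reliable way to see the byte size of one step, since sizeof(char) is 1 by definition. This sketch measures how far p + 1 moves for three pointer types. The function names are hypothetical, and the exact sizes (commonly 2, 4, and 8) depend on the platform.

```c
#include <assert.h>
#include <stddef.h>

// Each function measures, in bytes, one step of pointer arithmetic
// (p + 1) for its pointer type. Measuring via char* works because
// sizeof(char) == 1, so char* differences are byte differences.
ptrdiff_t short_step(const short* p) {
    return (const char*) (p + 1) - (const char*) p;
}

ptrdiff_t int_step(const int* p) {
    return (const char*) (p + 1) - (const char*) p;
}

ptrdiff_t double_step(const double* p) {
    return (const char*) (p + 1) - (const char*) p;
}
```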

Rule of thumb

We find the following discipline useful for avoiding pointer errors.

Use sizeof in pointer arithmetic expressions only when the pointer has type char* or unsigned char*.
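The rule of thumb describes generic code such as the C library’s qsort, which sees the array only as raw bytes and is told the element size. A minimal sketch of that pattern follows; element_at is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper in the style of generic C code such as qsort:
// the element type is unknown, so the caller passes its size.
// The arithmetic is done on a char*, so adding i * elem_size
// advances exactly that many bytes.
void* element_at(void* base, size_t i, size_t elem_size) {
    return (char*) base + i * elem_size;
}
```

With a typed pointer the same formula would be wrong: &d[0] + 2 * sizeof(double) means &d[16] on a platform where sizeof(double) is 8, far past the end of a four-element array, because the compiler already scales by sizeof(double).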