Lecture 5

Signed integer undefined behavior

Unsigned integers in C have relatively few undefined behaviors. (There are some; for instance, x / 0 is undefined.) But signed integers are another story. The compiler may assume that:

No integer is ever divided by zero.
No addition operation on signed integers ever overflows. This means that the compiler may assume, when it sees an expression x + y where x and y are signed, that the mathematical result of adding x and y is the same as the computer arithmetic result of adding x and y.
No subtraction or negation operation on signed integers ever overflows.
No multiplication operation on signed integers ever overflows.
No division or remainder operation on signed integers ever overflows. This means, for example, that INT_MIN / -1 is undefined behavior on two’s-complement machines.

Consider the program

int x = 1;
int y = 2;
int z = x + y;

This program has no overflow: the mathematical meaning of x + y is 1 + 2 = 3; and, since the number 3 can be represented as an int, that number is also the computer arithmetic result.

This program, though, has a signed integer overflow.

int x = 0x10000000;
int y = 0x7FFFFFFF;
int z = x + y;

The mathematical result of adding 0x10000000 = 2²⁸ and 0x7FFFFFFF = 2³¹–1 is 0x8FFFFFFF, which cannot be represented in a signed 32-bit int.

Now, performing this addition on an x86-64 machine may produce the result -0x70000001, which has the same memory representation as 0x8FFFFFFF. Executing x86-64 CPU instructions that add those numbers will always produce the result -0x700000001. But undefined behavior is not about what the machine actually does, it is about what the compiler may assume. And here, the compiler may assume that the signed addition never overflows. If your C code contains signed integer overflows, you are in for a world of confusion.

The assertion that wasn’t

For example:

int check_signed_increment(int x) {
    assert(x + 1 > x);
    return x + 1;
}
int main(int argc, char** argv) {
    int x = strtol(argv[1], NULL, 0);
    int x_incr = checked_signed_increment(x);
    printf("%d + 1 = %d\n", x, x_incr);
}

Compile this program without optimization, and run it as ./ubexplore 0x7FFFFFFF, and you get an expected result: the assertion in checked_signed_increment fails, because 0x7FFFFFFF + 1 in 32-bit two’s-complement computer arithmetic is 0x80000000, the smallest negative number representable in 32 bits.

But compile it with optimization and you get a different story: the assertion does not fail—implying that x_incr \> x—but then the program prints 2147483647 + 1 = -2147483648—showing that x \> x_incr!

Looking at the compiled instructions of these programs shows what’s happening. When optimization is on, the compiler is takes advantage of undefined behavior assumptions, and removes the assertion. It’s perfectly within its rights to do so! In signed computer arithmetic, x + 1 \> x is true unless the addition overflows. But signed integer overflow is undefined behavior in C, and undefined behavior in C is not allowed to happen. So in every allowed case, x + 1 \> x is true.

Loop optimization

Undefined behavior assumptions let compilers produce faster code. It’s super convenient for an optimizer to make the assumption that x + 1 \> x. And, in fact, if you turn off these kinds of assumptions, real code can run a lot slower: according to reports, SPEC CPU 2006 runs about 30% slower with integer overflow checks enabled.

Why so expensive? Real examples aren’t easy to come by, but consider this loop:

for (int i = n1; i <= n2; ++i) {
    printf("loop sees %d\n", i);
}

This loop would always terminate if addition were mathematical. (Every integer is finite, and in mathematical addition, adding 1 will never make i smaller.) If we run this loop with optimization and the bounds n1 = 0x7FFFFFFE, n2 = 0x7FFFFFFF on an x86-64 machine, we see:

loop sees 2147483646
loop sees 2147483647

OK, that’s the right answer. But it’s also impossible! No 32-bit integer is larger than 2147483647 (INT_MAX), so every integer i passes the test i \<= n2. The loop cannot have stopped, yet it stopped.

Running without optimization, on the other hand, we see:

loop sees 2147483646
loop sees 2147483647
loop sees -2147483648
loop sees -2147483647
loop sees -2147483646
loop sees -2147483645
...

and it runs forever.

The answer to this mystery lies in the compiler’s treatment of the loop index. The difference is subtle, but important. Here’s how the increment and test are implemented in the unoptimized version:

`  # loop index is held in 32-bit register `%eax`, `n2` on the stack at `-0x18(%rbp)` `
 40087e:       ...
`  400881:       3b 45 e8           cmp    -0x18(%rbp),%eax   # compare `i` with `n2` `
`  400884:       0f 8f 25 00 00 00  jg     4008af <main+0xaf> # if `i > n2`, exit loop `
 ... call printf ...
 4008a4:       83 c0 01           add    $0x1,%eax          # increment loop index
 4008a7:       ...
 4008aa:       e9 cf ff ff ff     jmpq   40087e <main+0x7e> # run loop again

The unoptimized version follows the plain meaning of the code; the test happens at the start of the loop, and the increment at the end, just before the test.

But in the optimized version:

`  # loop index is held in 32-bit register `%ebx`, `n2` in 32-bit register `%r14d` `
 400880:       ff c3              inc    %ebx               # increment loop index
 ... call printf ...
`  400890:       44 39 f3           cmp    %r14d,%ebx         # compare `i` with `n2` `
`  400893:       7c eb              jl     400880 <main+0x40> # if `i < n2`, run loop again `

The compiler has switched the order of the increment and the test. The increment happens at the top of the loop; then printf is called; and then the compiler checks whether i \< n2, rather than the i \<= n2 that the source code implies. These transformations make the code smaller (the optimized loop has 21 bytes of instruction rather than 49 bytes) and somewhat faster—but they are only possible because the compiler assumes that i + 1 \> i for all i!

If we had run other loops, we could also see other benefits to undefined behavior reasoning. For example, on x86-64, arithmetic on 64-bit registers can be faster than arithmetic on the 32-bit versions of those registers. Undefined behavior assumptions allow the use of 64-bit registers to represent 32-bit signed quantities; since the compiler can assume overflow never occurs, it can also assume that all arithmetic will have the same result in both register sizes.

Avoiding undefined behavior

It is still possible in C to compute with signed integers the way the CPU would, with overflow and no undefined behavior. You can compile the whole program with the -fwrapv flag, which means “signed integer overflow wraps around” (produces the same result as in two’s-complement arithmetic). This flag makes some undefined behaviors have defined results, but makes code slower. (For example, “simple code with loops and integer arithmetic is 15-20% slower in Rust”, because Rust lacks undefined behavior! Adding the -fwrapv flag to C “kills most of its advantage.”)

Or you can perform the computation you care about in unsigned arithmetic, which has fewer UBs, and cast the result back to a signed type, like this:

int x = 0x10000000;
int y = 0x7FFFFFFF;
int z = (int) ((unsigned) x + (unsigned) y);

That works on every compiler that matters. We would recommend in general to use a combination of unsigned arithmetic, explicit casts, and testing with sanitizers, since this mix catches problems and the documentation provided by casts is useful for readers.

Undefined behavior and shifts

The bitwise operations are generally free of undefined behaviors, except for shifts. Shifts get pretty nasty. You’d think that, given int x, the expression x \<\< 32 would equal 0: you shift 32 zero bits on the right, pushing the original value of x entirely off the end. But in fact that’s undefined behavior. Shift expressions, such as x \<\< i and x \>\> i, cause undefined behavior when i \< 0 and when i \>= W, where W is the width in bits of the type x.

Space Overhead of Memory Allocation

In order to measure space overhead of the system allocator, we can compare the performance of a program with N single byte allocations with that of a program with a 1 of N byte allocation. We can look at the returned pointers from malloc and use pointer arithmetic to calculate the total space used by each allocation scheme. The /usr/bin/time program already has this functionality. The function is called as follows:

"/usr/bin/time ./myprogram arg1 arg2 ..."

This function returns a value called maxresident that tells you the amount of allocated memory in kilobytes. Using these tools we conclude that the system allocator allocates 32 bytes for every single byte allocation. This is a factor of 32 overhead.

The operating system communicates with user programs through special escape hatches called system calls. The strace program allows you to trace system calls. You can strace the program like so:

"strace ./myprogram arg1 arg2 ..."

Using the strace command to trace a memory allocator allows us to see that memory allocation calls the brk() and mmap() system call. The mmap() system call only appears in the strace output when large allocations are made.

The SANITIZE option adds about 1.5x space overhead because it incorporates various checks on your program's memory management. The sanitizer must balance space and time overhead. Sacrificing more space would make the sanitizer run faster and vice versa. Sanitizer overhead is important for maintaining computer security in important programs like web browsers.

Standard I/O streams

STDIN, STDOUT, and STDERR are the I/O streams. STDERR and STDOUT will both output to your screen. However, redirecting either STDERR or STDOUT will change the output on your screen because they are two different output outlets. These I/O streams are associated with special numbers. STDIN=0, STDOUT=1, STDERR=2

Shell commands

wc : Takes input from STDIN and outputs three numbers (x lines, y words, z characters).

<: You can reroute the input to come from a file. "wc < myprogram.c" will return the count of the contents of myprogram.c. This redirects STDIN.

>: You can also store the return value of wc in another file. "wc < myprogram.c > x" will store the return output from the wc program in the x file. This redirects STDOUT. The command "2>&1" redirects STDERR to STDOUT. Since strace outputs to STDERR, calling strace with this redirection argument allows us to store the output of strace in some file x and call wc on that file to count the number of system calls.

|: This is a pipe. This symbol tells the shell that the left side of the pipe will serve as STDIN for the command on the right.

In class we used "strace echo foo 2>&1 | wc -l" to count the number of system calls that program foo executes.

References

Undefined Behavior in 2017
AddressSanitizer: A Fast Address Sanity Checker—a research paper describing how AddressSanitizer works; see also its related work for earlier references
Understanding Integer Overflow in C/C++