Lecture 5
Signed integer undefined behavior
Unsigned integers in C have relatively few undefined behaviors. (There are some; for instance, `x / 0` is undefined.) But signed integers are another story. The compiler may assume that:
- No integer is ever divided by zero.
- No addition operation on signed integers ever overflows. This means that the compiler may assume, when it sees an expression `x + y` where `x` and `y` are signed, that the mathematical result of adding `x` and `y` is the same as the computer arithmetic result of adding `x` and `y`.
- No subtraction or negation operation on signed integers ever overflows.
- No multiplication operation on signed integers ever overflows.
- No division or remainder operation on signed integers ever overflows. This means, for example, that `INT_MIN / -1` is undefined behavior on two’s-complement machines.
Consider the program

```c
int x = 1;
int y = 2;
int z = x + y;
```

This program has no overflow: the mathematical meaning of `x + y` is 1 + 2 = 3; and, since the number 3 can be represented as an `int`, that number is also the computer arithmetic result.
This program, though, has a signed integer overflow:

```c
int x = 0x10000000;
int y = 0x7FFFFFFF;
int z = x + y;
```

The mathematical result of adding 0x10000000 = 2^28 and 0x7FFFFFFF = 2^31 − 1 is 0x8FFFFFFF, which cannot be represented in a signed 32-bit `int`.
Now, performing this addition on an x86-64 machine may produce the result -0x70000001, which has the same memory representation as 0x8FFFFFFF. Executing x86-64 CPU instructions that add those numbers will always produce the result -0x70000001. But undefined behavior is not about what the machine actually does; it is about what the compiler may assume. And here, the compiler may assume that the signed addition never overflows. If your C code contains signed integer overflows, you are in for a world of confusion.
The assertion that wasn’t
For example:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int checked_signed_increment(int x) {
    assert(x + 1 > x);
    return x + 1;
}

int main(int argc, char** argv) {
    int x = strtol(argv[1], NULL, 0);
    int x_incr = checked_signed_increment(x);
    printf("%d + 1 = %d\n", x, x_incr);
}
```
Compile this program without optimization, run it as `./ubexplore 0x7FFFFFFF`, and you get an expected result: the assertion in `checked_signed_increment` fails, because `0x7FFFFFFF + 1` in 32-bit two’s-complement computer arithmetic is `0x80000000`, the smallest negative number representable in 32 bits.

But compile it with optimization and you get a different story: the assertion does not fail—implying that `x_incr > x`—but then the program prints `2147483647 + 1 = -2147483648`—showing that `x > x_incr`!
Looking at the compiled instructions of these programs shows what’s happening. When optimization is on, the compiler takes advantage of undefined behavior assumptions and removes the assertion. It’s perfectly within its rights to do so! In signed computer arithmetic, `x + 1 > x` is true unless the addition overflows. But signed integer overflow is undefined behavior in C, and undefined behavior in C is not allowed to happen. So in every allowed case, `x + 1 > x` is true.
Loop optimization
Undefined behavior assumptions let compilers produce faster code. It’s super convenient for an optimizer to make the assumption that `x + 1 > x`. And, in fact, if you turn off these kinds of assumptions, real code can run a lot slower: according to reports, SPEC CPU 2006 runs about 30% slower with integer overflow checks enabled.

Why so expensive? Real examples aren’t easy to come by, but consider this loop:

```c
for (int i = n1; i <= n2; ++i) {
    printf("loop sees %d\n", i);
}
```
This loop would always terminate if addition were mathematical. (Every integer is finite, and in mathematical addition, adding 1 will never make `i` smaller.) If we run this loop with optimization and the bounds `n1 = 0x7FFFFFFE`, `n2 = 0x7FFFFFFF` on an x86-64 machine, we see:

```
loop sees 2147483646
loop sees 2147483647
```

OK, that’s the right answer. But it’s also impossible! No 32-bit integer is larger than 2147483647 (`INT_MAX`), so every integer `i` passes the test `i <= n2`. The loop cannot have stopped, yet it stopped.
Running without optimization, on the other hand, we see:

```
loop sees 2147483646
loop sees 2147483647
loop sees -2147483648
loop sees -2147483647
loop sees -2147483646
loop sees -2147483645
...
```

and it runs forever.
The answer to this mystery lies in the compiler’s treatment of the loop index. The difference is subtle, but important. Here’s how the increment and test are implemented in the unoptimized version:
```
  # loop index is held in 32-bit register %eax, n2 on the stack at -0x18(%rbp)
  40087e: ...
  400881: 3b 45 e8              cmp    -0x18(%rbp),%eax    # compare i with n2
  400884: 0f 8f 25 00 00 00     jg     4008af <main+0xaf>  # if i > n2, exit loop
  ... call printf ...
  4008a4: 83 c0 01              add    $0x1,%eax           # increment loop index
  4008a7: ...
  4008aa: e9 cf ff ff ff        jmpq   40087e <main+0x7e>  # run loop again
```
The unoptimized version follows the plain meaning of the code; the test happens at the start of the loop, and the increment at the end, just before the test.
But in the optimized version:

```
  # loop index is held in 32-bit register %ebx, n2 in 32-bit register %r14d
  400880: ff c3                 inc    %ebx                # increment loop index
  ... call printf ...
  400890: 44 39 f3              cmp    %r14d,%ebx          # compare i with n2
  400893: 7c eb                 jl     400880 <main+0x40>  # if i < n2, run loop again
```
The compiler has switched the order of the increment and the test. The increment happens at the top of the loop; then `printf` is called; and then the compiler checks whether `i < n2`, rather than the `i <= n2` that the source code implies. These transformations make the code smaller (the optimized loop has 21 bytes of instruction rather than 49 bytes) and somewhat faster—but they are only possible because the compiler assumes that `i + 1 > i` for all `i`!
If we had run other loops, we could also see other benefits to undefined behavior reasoning. For example, on x86-64, arithmetic on 64-bit registers can be faster than arithmetic on the 32-bit versions of those registers. Undefined behavior assumptions allow the use of 64-bit registers to represent 32-bit signed quantities; since the compiler can assume overflow never occurs, it can also assume that all arithmetic will have the same result in both register sizes.
Avoiding undefined behavior
It is still possible in C to compute with signed integers the way the CPU would, with overflow and no undefined behavior. You can compile the whole program with the `-fwrapv` flag, which means “signed integer overflow wraps around” (produces the same result as in two’s-complement arithmetic). This flag makes some undefined behaviors have defined results, but makes code slower. (For example, “simple code with loops and integer arithmetic is 15-20% slower in Rust”, because Rust lacks undefined behavior! Adding the `-fwrapv` flag to C “kills most of its advantage.”)
Or you can perform the computation you care about in unsigned arithmetic, which has fewer undefined behaviors, and cast the result back to a signed type, like this:

```c
int x = 0x10000000;
int y = 0x7FFFFFFF;
int z = (int) ((unsigned) x + (unsigned) y);
```

That works on every compiler that matters. In general, we recommend a combination of unsigned arithmetic, explicit casts, and testing with sanitizers: this mix catches problems, and the casts usefully document intent for readers.
Undefined behavior and shifts
The bitwise operations are generally free of undefined behaviors, except for shifts. Shifts get pretty nasty. You’d think that, given `int x`, the expression `x << 32` would equal `0`: you shift in 32 zero bits on the right, pushing the original value of `x` entirely off the end. But in fact that’s undefined behavior. Shift expressions, such as `x << i` and `x >> i`, cause undefined behavior when `i < 0` and when `i >= W`, where `W` is the width in bits of the type of `x`.
Space Overhead of Memory Allocation
To measure the space overhead of the system allocator, we can compare a program that makes N single-byte allocations with a program that makes one N-byte allocation. We can look at the pointers returned by `malloc` and use pointer arithmetic to calculate the total space used by each allocation scheme. The `/usr/bin/time` program already has this functionality; it is invoked as follows:

```
/usr/bin/time ./myprogram arg1 arg2 ...
```

It reports a value called `maxresident`, the maximum amount of resident memory in kilobytes. Using these tools, we conclude that the system allocator uses 32 bytes for every single-byte allocation. That is a factor of 32 overhead.
The operating system communicates with user programs through special escape hatches called system calls. The `strace` program allows you to trace system calls:

```
strace ./myprogram arg1 arg2 ...
```

Using `strace` on a program that allocates memory shows that memory allocation uses the `brk()` and `mmap()` system calls. The `mmap()` system call appears in the `strace` output only when large allocations are made.
The `SANITIZE` option adds about 1.5x space overhead because it incorporates various checks on your program's memory management. The sanitizer must balance space and time overhead: sacrificing more space can make the sanitizer run faster, and vice versa. Sanitizer overhead matters for maintaining computer security in important programs like web browsers.
Standard I/O streams
STDIN, STDOUT, and STDERR are the standard I/O streams. STDOUT and STDERR both normally print to your screen. However, because they are two distinct output streams, redirecting one of them changes what appears on your screen independently of the other. These streams are associated with file descriptor numbers: STDIN = 0, STDOUT = 1, STDERR = 2.
Shell commands
- `wc`: Takes input from STDIN and outputs three numbers (x lines, y words, z characters).
- `<`: Reroutes input to come from a file. `wc < myprogram.c` prints the counts for the contents of myprogram.c. This redirects STDIN.
- `>`: Stores a command's output in a file. `wc < myprogram.c > x` stores the output of the `wc` program in the file `x`. This redirects STDOUT. The notation `2>&1` redirects STDERR to STDOUT. Since `strace` writes its output to STDERR, calling `strace` with this redirection allows us to store the output of `strace` in some file `x` and call `wc` on that file to count the number of system calls.
- `|`: This is a pipe. It tells the shell that the output of the command on the left side of the pipe will serve as STDIN for the command on the right.

In class we used `strace echo foo 2>&1 | wc -l` to count the number of system calls that `echo foo` executes.
References
- Undefined Behavior in 2017
- AddressSanitizer: A Fast Address Sanity Checker—a research paper describing how AddressSanitizer works; see also its related work for earlier references
- Understanding Integer Overflow in C/C++