# Data representation 5: Undefined behavior and bits

## Overview

We explore undefined behavior in more depth and discuss bitwise arithmetic.

## Undefined behavior and abstract machine

• A specification imposes requirements on an implementation and on the user of that implementation
• C and C++ have a specification—the language standard
• The user’s job: Write a program that obeys the specification’s rules
• The user writes primarily for the abstract machine defined by the standard
• The compiler’s job: Translate any program that obeys the specification’s rules into an executable that performs the corresponding computation on the actual hardware
• The compiler must understand the abstract machine and the real machine
• The compiler may assume the user did their job

## Broad-brush model of programmer psychology

People want their code to:

1. Work relatively correctly on most inputs
2. Run fast
3. Work exactly correctly on all inputs

## Why?

• If code is wrong on most inputs, its speed doesn’t matter
• If code is too slow or expensive to use, its correctness doesn’t matter
• Sometimes exact correctness is difficult to define
• Of course you should aim for correctness and well-defined specifications though!

## Compiler quality

• People want their code to go fast
• Compiler writers are people and they get excited about producing faster code
• Compilers take advantage of many opportunities to speed up code by removing redundancy

## The as-if rule

• “conforming implementations are required to emulate (only) the observable behavior of the abstract machine”
• For instance, data written to files
• If the compiler can figure out that a function call is redundant, it can eliminate it!
• If the compiler can figure out that a computation is redundant, it can eliminate it!
• Including dynamic memory allocation!
• If the compiler can figure out a cheaper way to perform a computation, it can implement it!
• Examples: datarep5/opt1.cc, datarep5/opt2.cc

## Undefined behavior

• The C and C++ standards define certain errors as undefined behavior
• If a program executes undefined behavior:

Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended. (ref)

• Undefined behavior is not the same as implementation-defined behavior
• Implementation-defined behavior: Computing sizeof(int)
• Undefined behavior: Dereferencing a null pointer

## How undefined behavior and optimizing compilers collide

• Compilers may assume that their input programs never execute undefined behavior
• And they may optimize based on this assumption

## Example 1: Signed integer overflow

int x_incr = x + 1;
assert(x_incr > x);
printf("%d > %d - assertion passed\n", x_incr, x);


## Example 2: Null pointer dereference

static void __devexit agnx_pci_remove (struct pci_dev *pdev)
{
struct ieee80211_hw *dev = pci_get_drvdata(pdev);
struct agnx_priv *priv = dev->priv;

if (!dev) return;


## Example 3: Signed integer overflow in a loop

long sum = 0;
for (int i = n1; i <= n2; ++i) {
sum += f(i);
}
return sum;


## Example 4: Out-of-bounds access

int awesome[4] = {0, 1, 2, 3};

bool is_awesome(int v) {
for (int i = 0; i <= 4; ++i) {
if (awesome[i] == v)
return true;
}
return false;
}


## More examples of undefined behaviors

• Dividing an integer by 0
• Using an uninitialized variable
• Violating the live object rule
• Modifying a const object
• Comparing pointers to unrelated objects using < <= > >=
• Accessing an invalid iterator
• Accessing an object using a pointer of incompatible type
• Executing an infinite loop that doesn’t have any observable behavior (!)

## What to do?

• Sanitize
• Our makefiles allow you to compile with make SAN=1
• This turns on sanitizers, which are efficient checkers that catch many undefined behaviors
• Other powerful tools: valgrind, tis-interpreter, etc.

## Bitwise operators

• A bitwise operator is one that operates on the bits of its input(s) independently
• C and C++ bitwise operators:
• ~ (unary): complement
• &: bitwise and
• |: bitwise or
• ^: bitwise exclusive or
• <<: left shift
• >>: right shift

## Truth tables

~ output
0 1
1 0
& 0 1
0 0 0
1 0 1
| 0 1
0 0 1
1 1 1
^ 0 1
0 0 1
1 1 0

## Beware precedence

• C did not pick the right precedence for bitwise operators
• x & 1 != 0 is parsed as x & (1 != 0)
• Most uses of bitwise operators require parentheses

## Some mathematical properties

• &, |, and ^ are commutative: (a&b)==(b&a)
• &, |, and ^ are associative: (a&(b&c))==((a&b)&c)
• & and | are distributive: ((a&b)|(a&c)) == (a&(b|c)), ((a|b)&(a|c)) == (a|(b&c))
• & distributes over ^: ((a&b)^(a&c)) == (a&(b^c))
• 0 is the identity for | and ^: (x|0)==x, (x^0)==x
• -1 (=~0) is the identity for &: (x&-1)==x
• 0 is the absorbing or annihilating element for &: (x&0)==0
• -1 is the absorbing element for |: (x|-1)==-1

## Left and right shift

• a << i: Move the bits in a left by i places, shifting in 0s
• Like a = a * 2i
• a >> i: Move the bits in a right by i places
• If a is unsigned, shifting in 0s
• If a is signed, shifting in copies of the sign bit
• Like a = a / 2i rounded down
• On many computers, left and right shift are much faster than multiplication and (especially) division
• i must be nonnegative and less than 8 * sizeof(a)

## Aside: Sized integer types

• Normal C++ integers, like int, long, etc., have variable size
• The standard also defines a set of sized types in #include <cstdint>
• uint8_t: Unsigned 8-bit integer
• int32_t: Signed 16-bit integer
• Also 16, 64
• uintptr_t: Unsigned integer big enough to represent any pointer
• ptrdiff_t: Signed integer big enough to represent any difference between pointers
• size_t: Unsigned integer big enough to hold the size of any representable object
• ssize_t: Signed version of size_t

## Fun with bits

• Can you define unary negation -x using other operators (bitwise and +)?
• What does (x & 1) == 0 mean?
• What does x & 15 equal?
• What can you learn about x from the value x & (x - 1)?
• What does x & -x equal?

## Fun with bits (2)

• Can you define unary negation -x using other operators (bitwise and +)?
• -x~x + 1
• What does (x & 1) == 0 mean?
• It tests whether x is even
• What does x & 15 equal?
• x % 16
• In general, (x & (2i - 1))x mod 2i, but is much faster to compute
• What can you learn about x from the value x & (x - 1)?
• Whether x is a power of two. (x & (x - 1)) == 0 iff x is 0 or a power of 2
• What does x & -x equal?
• The value of the lowest 1 bit in x
• (5 & -5) == 1; (10 & -10) == 2; (8 & -8) == 8
• (This also equals the gcd (greatest common divisor) of x and -x!)

## Using bits

• Bitwise operators allow us to efficiently represent small sets
• A set is a collection of elements, each of which might be present or absent
• An integer is a collection of bits, each of which might be 1 or 0
• Set operations, such as union, intersection, and membership-test, correspond to bitwise operations

## Bitset example

• Consider sets taken from the universe {e, f, g, h}
• Assign each element m a distinct bit value ℬ(m) (which must be a power of two)
• For example, ℬ(e) = 1, ℬ(f) = 2, ℬ(g) = 4, ℬ(h) = 8
• A set of elements is represented as the sum of the ℬ values
• ℬ({}) = 0
• ℬ({e, g}) = ℬ(e) + ℬ(g) = 5
• ℬ({e, f, g, h}) = 15
• Union and intersection map to | and &
• ℬ(X ∪ Y) = (ℬ(X) | ℬ(Y))
• ℬ(X ∩ Y) = (ℬ(X) & ℬ(Y))
• For example, {e, f} ∪ {e, h} = {e, f, h}, and indeed (3 | 9) == 11
• Set difference maps to & with ~
• ℬ(X ∖ Y) = (ℬ(X) & ~ℬ(Y))
• Membership testing maps to &
• m ∈ X ⟺ (ℬ(m) & ℬ(X)) != 0

## WTF corner

(You don’t need to understand this, but it is fun!)

unsigned long wtf(unsigned long x) {
unsigned long y = x & -x;
unsigned long c = x + y;
return (((x ^ c) >> 2) / y) | c;
}