Assembly 4: Buffer overflows

The storage1/attackme.cc file contains a particularly dangerous kind of undefined behavior, a buffer overflow. In a buffer overflow, an untrusted input is transferred into a buffer, such as an array of characters on the stack, without checking to see whether the buffer is big enough for the untrusted input.

Historically buffer overflows have caused some of the worst, and most consequential, C and C++ security holes known. They are particularly bad when the untrusted input derives from the network: then breaking into a network can be as simple as sending a packet.

The buffer overflow in attackme.cc derives from a checksum function. Our simple checksum takes in a pointer to the buffer, then copies that buffer to a local variable, buf, and processes the copy. Unfortunately, checksum doesn’t verify that the input fits its buffer, and if the user supplies too big an argument, checksum’s buffer will overflow!

Buffer overflows cause undefined behavior—it’s illegal to access memory outside any object. Undefined behavior is always bad, but some is worse than others; and this particular buffer overflow allows the attacker to completely own the victim program, causing it to execute any shell command!

The attack works as follows.

The function’s entry sequence allocates local variable space with subq $112, %rsp. Examining the assembly we can infer that buf is stored at this %rsp.
The function’s return address is stored at 112(%rsp), which is buf.c + 112.
Any input arg with strlen(arg) > 99 will overflow the buffer. (The 100th character is the null character that ends the string). But args with 100 to 111 characters won’t cause problems, because the remaining 12 bytes of local variable space are unused (they’re present to ensure stack frame alignment).
args of 112 or more characters, however, will run into the stack slot reserved for checksum’s return address! An attacker can put any value they want in that location, as long as the value contains no null characters (since checksum’s buffer copy stops when it encounters the end of the input string).
Overflowing the return address slot causes harm when checksum returns. Examining the state during the exit sequence, we see that the immediately previous instructions load %rdi with buf and call finish_checksum. But what a coincidence—finish_checksum does not modify %rdi! Thus, when checksum returns, %rdi will be loaded with the address of buf.
Our attacker therefore supplies a carefully-chosen string that overwrites checksum’s return address with the address of the run_shell function. This will cause run_shell to run the programs defined in the initial portion of the string!

Thus, the attacker has taken control of the victim by returning to an unexpected library function, with carefully-chosen arguments. This attack is called a return-to-libc attack.

A version of checksum’s copy-to-aligned-buffer technique is actually useful in practice, but real code would use a smaller buffer, processed one slice at a time.

Defending against return-to-libc attacks and buffer overflows

Return-to-libc attacks used to be pretty trivial to find and execute, but recent years have seen large improvements in the robustness of typical C and C++ programs to attack. Here are just a few of the defenses we had to disable for the simple attackme attack to work.

Register arguments: In older architectures, function arguments were all passed on the stack. In those architectures attackers could easily determine not only the function returned to, but also its arguments (a longer buffer overflow would overwrite the stack region interpreted by the “return-callee” as arguments). attackme only works because finish_checksum happens not to modify its argument.

Modern operating systems and architectures support position-independent code and position-independent executables. These features make programs work independent of specific addresses for functions, and the operating system can put a program’s functions at different addresses each time the program runs. When a program is position-independent, the address of the attacker’s desired target function can’t be predicted, so the attacker has to get lucky. (The x86-64 designers were smart to add %rip-relative addressing, since that’s what enables efficient position-independent executables!)

Finally, modern compilers use a technique called stack protection or stack canaries to detect buffer overflows and stop retq from returning to a corrupted address. This is a super useful technique.

It’s called a “canary” in reference to the phrase “canary in a coal mine”.

The operating system reserves a special tiny memory segment when the program starts running. This tiny memory segment is used for special purposes, such as thread-local storage. Although this segment can be addressed normally—it exists in memory—the operating system also ensures it can be accessed through special instruction formats like this:
```
movq %fs:0x28, %rax
```
The %fs: prefix tells the processor that 0x28 (a memory offset) is to be measured relative to the start of the thread-local storage segment.
A specific memory word in thread-local storage is reserved for a canary value. The operating system sets this value to a random word when starting the program.

The compiler adds code to function entry and exit sequences that use this value. Specifically, something like this is added to entry:

movq %fs:0x28, REG    # load true canary to register
pushq REG             # push canary to stack

And something like this is added to exit:

popq REG              # pop canary from stack
xorq %fs:0x28, REG    # xor with true canary
jne fail              # fail if they differ

where the fail branch calls a library-provided function such as __stack_chk_fail.

The pushed canary is located between the function’s buffers (its local variables) and the function’s return address. Given that position, any buffer overflow that reaches the return address will also overwrite the canary! And the attacker can’t easily guess the canary or overwrite the true canary’s memory location, since both are random.

If the stack canary and the true canary don’t match, the function can infer that it executed some undefined behavior. Maybe the return address was corrupted and maybe it wasn’t; either way, executing undefined behavior means the program is broken, and it is safe to exit immediately.

These techniques, and others like them, are incredibly useful, important, fun, and good to understand. But they don’t make C programming safe: attackers are persistent and creative. (Check out return-oriented programming.) Memory-unsafe languages like C and C++ make programming inherently risky; only programs with no undefined behavior are safe. (And they’re only safe if the underlying libraries and operating system have no undefined behavior either!) Good C and C++ programmers need a healthy respect for—one might say fear of—their tools. Code cleanly, use standard libraries where possible, and test extensively under sanitizers to catch bugs like buffer overflows.

Or listen to cranky people like Trevor Jim, who says that “C is bad and you should feel bad if you don’t say it is bad” and also “Legacy C/C++ code is a nuclear waste nightmare that will make you WannaCry”.

On checksums

A checksum is a short, usually fixed-length summary of a data buffer. Checksums have two important properties:

If two buffers contain the same data (the same bytes in the same order), then their checksums must equal. (This property is mandatory.)
If two buffers contain different data, then their checksums should not be equal. (This property is optional: it is OK for distinct data to have equal checksums, though in a good checksum function, this should be rare.)

Checksums have many uses, but they are often used to detect errors in data transcription. For example, most network technologies use checksums to detect link errors (like bursts of noise). To send data over a network, the sender first computes a checksum of the data. Then it transmits both data and checksum over the possibly-noisy network channel. The receiver computes its own checksum of the received data, and compares that with the sender’s checksum. If the checksums match, that’s a good sign that the data was transmitted correctly. If the checksums don’t match, then the data can’t be trusted and the receiver throws it away.

Most checksum functions map from a large domain to a smaller range—for instance, from the set of all variable-length buffers to the set of 64-bit numbers. Such functions must sometimes map unequal data to equal checksums, so some errors must go undetected; but a good checksum function detects common corruption errors all, or almost all, the time. For example, some checksum functions can always detect a single flipped bit in the buffer.

The requirements on checksums are the same as the requirements for hash codes: equal buffers have equal checksums (hash codes), and distinct buffers should have distinct checksums (hash codes). A good hash code can be a good checksum and vice versa.

The most important characteristics of a checksum function are speed and strength. Fast checksums are inexpensive to compute, but easy to break, meaning that many errors in the data aren’t detected by the checksum function. Widely-used fast checksum functions include the IPv4/TCP/UDP checksum and cyclic redundancy checks; our checksum function is a lot like the IPv4 checksum. Strong checksums, also known as cryptographic hash functions, are hard to break. Ideally they are infeasible to break, meaning that no one knows a procedure for finding two buffers that have the same checksum. Widely-used strong checksum functions include SHA-256, SHA-512, and SHA-3. There are also many formerly-strong checksum functions, such as SHA-1, that have been broken—meaning that procedures are known for finding two buffers with the same checksum—but are still stronger in practice than fast checksums.

References: Checksums on Wikipedia; CS 225 explores some related theory.