Ancient CS 61 Content Warning!!!!!1!!!
This is not the current version of the class.
This site was automatically translated from a wiki. The translation may have introduced mistakes (and the content might have been wrong to begin with).

Section 3: StackSmash

Skills

Memory layout

We have already gone over memory layout in lecture. This diagram may help you to better understand it.

File:Screen_Shot_2017-09-14_at_4.10.56_PM.png

Remember that the stack (on most architectures) grows down, meaning the addresses farther into the stack are smaller than than those towards the base. The opposite is true for the heap, which grows up towards the stack. Also remember that local variables—automatic-lifetime data—go in the stack, and dynamically allocated memory goes in the heap.

Return addresses

The stack stores automatic-lifetime data, such as a running function’s local variables and scratch space. The part of the stack that belongs to a function execution is called that function’s stack frame.

Part of the compiler’s job is to lay out a function’s stack frame. It analyzes the function and decides where local variables are stored.

But a stack frame stores some control information about the function’s execution, as well as the function’s data. Specifically, each stack frame contains a return address: the address of the instruction that should run after the function completes its work.

For instance, consider this function:

int main() {
    int x = func() + 3 * func();
    ... printf("%d\n", x);
}

The compiler implements main using a sequence of simple instructions. For instance, it might do something like this:

1: call func()
2: store result of func() in temporary variable t1
3: call func() again
4: store result of func() in temporary variable t2
5: set t2 := 3 * t2
6: set t1 := t1 + t2
7: store t1 in x
8: ...

Each time func() returns, main should pick up where it left off. Thus, the first time func() is called, it should return to instruction #2, but the second time, it should return to instruction #4. But func() itself is the same in both cases! We can see that func() must return to an instruction defined at runtime, rather than a single instruction whose address is known in advance.

On x86-64 (and many modern processors), the return address is stored on the stack. The x86-64 callq instruction (which implements steps #1 and #3) pushes the return address onto the stack as part of its operation. Note that the return address for a function’s stack frame points to an instruction in its caller, so here func’s return addresses are in main, not in func itself.

Since the return address is on the stack, we can use gdb or cheat (by accessing out-of-bounds portions of the stack) to examine or even modify the return address. Hee hee.

Tools

Useful gdb commands

gdb is a useful tool that allows the programmers to step through the execution of the program and examine the memory at each point. There are many options and commands within gdb that allow one to accomplish specific tasks and fulfills different needs. Here are some commands to get you started. You can use either the full name of the command (e.g., run) or the shortcut (e.g., r) in parentheses.

gdb cheatsheet: http://darkdust.net/files/GDB%20Cheat%20Sheet.pdf

Valgrind

gdb runs your program as usual, but you can set breakpoints which will give gdb control so you can examine the state of memory, and gdb will take control if the program segmentation faults. valgrind does not run your program as usual, instead it emulates your program. Your program will now run much slower, but

valgrind` will catch some memory errors much sooner than `gdb

would, and it will even report some nasal demons that have no visible effect on most executions. You should fix all nasal demons because most of them are "time bombs" that an attacker could trigger later using a specially-crafted input. We will see examples later in this section.

valgrind provides a set of tools that perform debugging, profiling, or other similar tasks that help you improve your programs. We have seen memcheck tool in class, which is a memory debugging tool used primary for Linux and Mac. It simulates an executable and tracks memory errors (e.g., uninitialized variables, memory leaks, and buffer overflows). valgrind does not detect static (stack) memory errors, but only the ones that occur dynamically during runtime.

Usage

valgrind --tool=memcheck --leak-check=full 

Example Output

==9388== Syscall param exit_group(status) contains uninitialised byte(s) 
==9388== at 0x455F7364: _Exit (in /lib/libc-2.14.90.so) 
==9388== Use --track-origins=yes to see where uninitialised values come from
==9892== LEAK SUMMARY:
==9892== definitely lost: 64 bytes in 4 blocks
==9736== Invalid write of size 4 
==9736== at 0x8048435: main 
==9736== total heap usage: 1 allocs, 1 frees, 16 bytes 
==9736== All heap blocks were freed -- no leaks are possible 
==9736== ERROR SUMMARY: 1 errors from 1 contexts

Check out http://valgrind.org/docs/manual/valgrind_manual.pdf for other useful tools in valgrind!

objdump

objdump displays information about object file (usually files with extension .o). If used as a disassembler, it works like disas in gdb.

Usage:objdump -d <PROGRAM_NAME>(d means disassemble)

Assignment

Pull the latest version of the sections repository by entering in the terminal

git pull

The cs61-section/s03 directory contains a few programs that we will use to investigate representations of objects in memory and assembly language instructions. We will also learn about one of the most common types of security vulnerabilities: the buffer overflow, and what programs that are vulnerable to it might look like.

A buffer overflow is a type of attack in which someone takes advantage of C's memory layout in order to write past the end of a buffer, allowing them to make the computer execute malicious code. At the beginning of each function call, a return address is pushed onto the stack. Once the machine hits a perceived return address, it will begin to execute the next instructions it sees. An attacker must "simply" write past the end of a buffer (many functions in C, like gets(), have no boundary checking) and overwrite the return address to the address in memory where the malicious code is stored. Usually, the attacker has put this code in the buffer itself, but sometimes they will try to use existing code to construct an attack. The malicious code is frequently a system call to open a shell, which when executed will allow the attacker to potentially steal information or cause more trouble.

This section material requires Linux (although you can do the factorial portion on any x86-64 machine). Before running later parts, run sudo apt install gifsicle.

factorial.c (Part 1)

In this exercise, open factorial.c file and change the factorial(int n) function to print out not only the content of the buffer buf but also the return address of this function.

To do this exercise, you need to:

factorial.c (Part 2)

Now that you know the place where the factorial(int n) function stores its return address, your job is to modify the return address. Your goal is to add code to the end of factorial—without changing anything else!—so that running ./factorial prints HIJACKED! 24 instead of 5! == 120.

To do this exercise, you need to:

StackSmash (Part 3)

Stacksmash reads from the standard input. If you just run it, it will sit there, waiting for you to type something. So type something and see what it does! As an example:

echo Hi | ./stacksmash

Now try the following commands:

yes | tr -d '\n' | head -c 120 > y120.txt  //see section 'Helpful Unix Utility Programs'
./stacksmash < y120.txt
./stacksmash.safe < y120.txt

What is the difference between the output of ./stacksmash and ./stacksmash.safe?

Now, let's try different input file lengths to both programs. Try, for example, file lengths 99 and 119 (make yxxx.txt files as before).

Are the outputs different among various input file lengths? Why?

AttackMe (Part 4)

Now we proceed to this grand challenge.

First, examine the code in attackme.c and understand what it does. Then, run ./attackme program. This program has a secret. If you pass it a carefully-crafted input you can trigger it.

Can you figure out the trigger input using gdb, nm (see below section), objdump, printf, and whatever else?

Second, how would you change ./attackme so it became unattackable?

Helpful Unix Utility Programs

yes
tr -d '\n'

tr -d <STRING> deletes (-d) from the input all characters in <STRING>. trunderstands backslash escapes. Thusyes | tr -d '\n' removes all newlines ('\n'characters) from the output ofyes. The result is an endless stream of y` characters.

head
head -n 1000 //Print the first 1000 lines of the input.
head -c 1000 //Print the first 1000 characters of the input.

So, yes \| tr -d '\n' \| head -c 124 produces a file comprising 124 'y' characters with no terminating newline.

printf `<FORMAT>` <ARGS...>

It can be convenient for printing special characters. For instance, try printf '\x36\x31\n'.

nm 

This is called the program's symbol table. It includes functions and global variables defined by <PROGRAM>, as well as functions and global variables defined by system libraries and only referenced by <PROGRAM>. Here's some of its output:

U __libc_start_main@@GLIBC_2.0
08048567 T main
U printf@@GLIBC_2.0
U puts@@GLIBC_2.0
08048540 T read_line

The first column is the link address: where that symbol is located at run time. The second column defines the symbol type:

You can also use nm on individual object files.

Comprehension Questions