Section 3: StackSmash
Skills
- Learn how to use tools like
gdb
(GNU Debugger) - Extract instructions from memory
- Learn about simple cyber attacks
Memory layout
We have already gone over memory layout in lecture. This diagram may help you to better understand it.
Remember that the stack (on most architectures) grows down, meaning the addresses farther into the stack are smaller than than those towards the base. The opposite is true for the heap, which grows up towards the stack. Also remember that local variables—automatic-lifetime data—go in the stack, and dynamically allocated memory goes in the heap.
Return addresses
The stack stores automatic-lifetime data, such as a running function’s local variables and scratch space. The part of the stack that belongs to a function execution is called that function’s stack frame.
Part of the compiler’s job is to lay out a function’s stack frame. It analyzes the function and decides where local variables are stored.
But a stack frame stores some control information about the function’s execution, as well as the function’s data. Specifically, each stack frame contains a return address: the address of the instruction that should run after the function completes its work.
For instance, consider this function:
int main() {
int x = func() + 3 * func();
... printf("%d\n", x);
}
The compiler implements main
using a sequence of simple
instructions. For instance, it might do something like this:
1: call func()
2: store result of func() in temporary variable t1
3: call func() again
4: store result of func() in temporary variable t2
5: set t2 := 3 * t2
6: set t1 := t1 + t2
7: store t1 in x
8: ...
Each time func()
returns, main
should pick up where it left off.
Thus, the first time func()
is called, it should return to
instruction #2, but the second time, it should return to instruction
#4. But func()
itself is the same in both cases! We can see that
func()
must return to an instruction defined at runtime, rather
than a single instruction whose address is known in advance.
On x86-64 (and many modern processors), the return address is stored on
the stack. The x86-64 callq
instruction (which implements steps #1
and #3) pushes the return address onto the stack as part of its
operation. Note that the return address for a function’s stack frame
points to an instruction in its caller, so here func
’s return
addresses are in main
, not in func
itself.
Since the return address is on the stack, we can use gdb
or cheat
(by accessing out-of-bounds portions of the stack) to examine or even
modify the return address. Hee hee.
Tools
Useful gdb commands
gdb
is a useful tool that allows the programmers to step through the
execution of the program and examine the memory at each point. There are
many options and commands within gdb
that allow one to accomplish
specific tasks and fulfills different needs. Here are some commands to
get you started. You can use either the full name of the command (e.g.,
run
) or the shortcut (e.g., r
) in parentheses.
run
(r
): Executes file passed as command line argument togdb
backtrace
(bt
): Prints the call stackframe <NUMBER>
: Examines the stack frame at<NUMBER>
given by backtraceup <NUMBER>
: Moves up the call stack<NUMBER>
framesdown <NUMBER>
: Moves down the call stack<NUMBER>
framesframe
is absolute whileup
anddown
move relative to the current frame
break
(b
): Pauses execution when a particular point in the code is reachedbreak <FILENAME>:<LINE NUMBER>
break <FUNCTION NAME>
break <FILENAME>:<FUNCTION NAME>
step
(s
): Steps into a line of code (enters function calls)next
(n
): Steps to next line of code (does not enter function calls)continue
(c
): Resumes execution of the program but stops at the breakpoints; useful for passing over irrelevant code or finishing loopsadvance <NUMBER>
(adv
): Steps to a specific line of code (does not enter function calls); useful to break out of loopsfinish
: Runs until the current function returns then break and print out the return valueexamine
(x
): Examines memory- Syntax:
x/\<# of Elements\><Type of Elements><Additional Size Info>
- Simple Example:
x/d 0x123
: Prints memory at address0x123
as an integer - Complicated Example:
x/20xg 0x345
: Prints memory at address0x345
as 20 hexadecimal "giant" words (8 bytes)
- Syntax:
print <something>
(p
): Prints content of a variable/memory location/registerdisplay <something>
(disp
) - Just likeprint
, but gives the information on each step through the program; useful for keeping track of the value of something throughout execution.set var <VARIABLE_NAME>=<VALUE>
: Sets a value in the executing program; useful for keeping track of values- Example:
set var addr=0x123
- Example:
thread <THREAD_NUMBER>
: Switches between threads in a given program (we will cover threads later in the semester)q
- Quitsgdb
disassemble <FUNCTION_NAME>
ordisassemble <START_ADDR>,<END_ADDR>
ordisassemble <START_ADDR>,+<LENGTH>
(disas
): Prints assembly instructions within a function name, or within specific addressesCtrl + X 2
: Switch into the TUI, which allows you to view assembly and source code alongside the debug consoleCtrl + X 1
: Similar, but gives you only one viewCtrl + X O
: Switch the focus between views
gdb
cheatsheet: http://darkdust.net/files/GDB%20Cheat%20Sheet.pdf
Valgrind
gdb
runs your program as usual, but you can set breakpoints which
will give gdb
control so you can examine the state of memory, and
gdb
will take control if the program segmentation faults.
valgrind
does not run your program as usual, instead it
emulates your program. Your program will now run much slower, but
valgrind` will catch some memory errors much sooner than `gdb
would, and it will even report some nasal demons that have no visible effect on most executions. You should fix all nasal demons because most of them are "time bombs" that an attacker could trigger later using a specially-crafted input. We will see examples later in this section.
valgrind
provides a set of tools that perform debugging, profiling,
or other similar tasks that help you improve your programs. We have seen
memcheck
tool in class, which is a memory debugging tool used
primary for Linux and Mac. It simulates an executable and tracks memory
errors (e.g., uninitialized variables, memory leaks, and buffer
overflows). valgrind
does not detect static (stack) memory
errors, but only the ones that occur dynamically during runtime.
Usage
valgrind --tool=memcheck --leak-check=full
Example Output
- Uninitialized Variables:
==9388== Syscall param exit_group(status) contains uninitialised byte(s)
==9388== at 0x455F7364: _Exit (in /lib/libc-2.14.90.so)
==9388== Use --track-origins=yes to see where uninitialised values come from
- Memory leaks:
==9892== LEAK SUMMARY:
==9892== definitely lost: 64 bytes in 4 blocks
- Buffer Overflows/Invalid Writes:
==9736== Invalid write of size 4
==9736== at 0x8048435: main
==9736== total heap usage: 1 allocs, 1 frees, 16 bytes
==9736== All heap blocks were freed -- no leaks are possible
==9736== ERROR SUMMARY: 1 errors from 1 contexts
Check out http://valgrind.org/docs/manual/valgrind_manual.pdf for
other useful tools in valgrind
!
objdump
objdump
displays information about object file (usually files with
extension .o
). If used as a disassembler, it works like disas
in
gdb
.
Usage:objdump -d <PROGRAM_NAME>
(d
means disassemble)
Assignment
Pull the latest version of the sections repository by entering in the terminal
git pull
The cs61-section/s03
directory contains a few programs that we will
use to investigate representations of objects in memory and assembly
language instructions. We will also learn about one of the most common
types of security vulnerabilities: the buffer overflow, and what
programs that are vulnerable to it might look like.
A buffer overflow is a type of attack in which someone takes advantage
of C's memory layout in order to write past the end of a buffer,
allowing them to make the computer execute malicious code. At the
beginning of each function call, a return address is pushed onto the
stack. Once the machine hits a perceived return address, it will begin
to execute the next instructions it sees. An attacker must "simply"
write past the end of a buffer (many functions in C, like gets()
,
have no boundary checking) and overwrite the return address to the
address in memory where the malicious code is stored. Usually, the
attacker has put this code in the buffer itself, but sometimes they will
try to use existing code to construct an attack. The malicious code is
frequently a system call to open a shell, which when executed will allow
the attacker to potentially steal information or cause more trouble.
This section material requires Linux (although you can do the
factorial
portion on any x86-64 machine). Before running later
parts, run sudo apt install gifsicle
.
factorial.c (Part 1)
In this exercise, open factorial.c
file and change the
factorial(int n)
function to print out not only the content of the
buffer buf
but also the return address of this function.
To do this exercise, you need to:
- Have a clear picture of the memory layout in the stack
- Understand the concept of return addresses
- Know how to use
gdb
factorial.c (Part 2)
Now that you know the place where the factorial(int n)
function
stores its return address, your job is to modify the return address.
Your goal is to add code to the end of factorial
—without changing
anything else!—so that running ./factorial
prints HIJACKED! 24
instead of 5! == 120
.
To do this exercise, you need to:
- Know the exact place where
factorial(int n)
stores its return address - Know pointer arithmetic
- Know how to use
gdb
andobjdump
StackSmash (Part 3)
Stacksmash reads from the standard input. If you just run it, it will sit there, waiting for you to type something. So type something and see what it does! As an example:
echo Hi | ./stacksmash
Now try the following commands:
yes | tr -d '\n' | head -c 120 > y120.txt //see section 'Helpful Unix Utility Programs'
./stacksmash < y120.txt
./stacksmash.safe < y120.txt
What is the difference between the output of ./stacksmash
and
./stacksmash.safe
?
Now, let's try different input file lengths to both programs. Try, for
example, file lengths 99 and 119 (make yxxx.txt
files as before).
Are the outputs different among various input file lengths? Why?
AttackMe (Part 4)
Now we proceed to this grand challenge.
First, examine the code in attackme.c
and understand what it does.
Then, run ./attackme
program. This program has a secret. If you
pass it a carefully-crafted input you can trigger it.
Can you figure out the trigger input using gdb
, nm
(see below
section), objdump
, printf
, and whatever else?
Second, how would you change ./attackme
so it became unattackable?
Helpful Unix Utility Programs
- Print an endless stream of "y" lines.
yes
- Character-level (really byte-level) translator.
tr -d '\n'
tr -d <STRING>
deletes (-d
) from the input all characters in
<STRING>.
trunderstands backslash escapes. Thus
yes | tr -d
'\n' removes all newlines (
'\n'characters) from the output of
yes. The result is an endless stream of
y` characters.
- Print the first 10 lines of the input.
head
head -n 1000 //Print the first 1000 lines of the input.
head -c 1000 //Print the first 1000 characters of the input.
So, yes \| tr -d '\n' \| head -c 124
produces a file comprising 124
'y' characters with no terminating newline.
- A command-line version of the familiar
printf()
function
printf `<FORMAT>` <ARGS...>
It can be convenient for printing special characters. For instance, try
printf '\x36\x31\n'
.
- Print out a list of all the names defined by
<PROGRAM>
:
nm
This is called the program's symbol table. It includes functions and
global variables defined by <PROGRAM>
, as well as functions and
global variables defined by system libraries and only referenced by
<PROGRAM>
. Here's some of its output:
U __libc_start_main@@GLIBC_2.0
08048567 T main
U printf@@GLIBC_2.0
U puts@@GLIBC_2.0
08048540 T read_line
The first column is the link address: where that symbol is located at run time. The second column defines the symbol type:
T
means a text symbol (read-only global data, such as a function definition)D
means a data symbol (read/write global data)U
means undefined.- There are many more.
You can also use nm
on individual object files.
Comprehension Questions
- Notice from the makefile that the
.unsafe
extension is the same source file as the .c file, except that it has been compiled without boundary checking. What difference does this make during a buffer overflow? What additional information do you get from the "safe" version? - How might one use
gdb
's disassemble function orobjdump
to make a buffer overflow attack more strategic? - What sort of strategies have we learned so far for checking when memory has been overwritten? Which do you think might be useful for checking if a buffer overflow has taken place?