Assembler: Learning to Read
Learning Objectives
- Convert C programs into assembly
- Use gdb and objdump to examine the assembly underlying a C function/program. (Note: objdump is a GNU tool, so you need to use gcc instead of clang to compile files if you want to use objdump.)
- Read simple assembly
- Find function parameters in assembly
- Have a good cheat sheet that should help you defuse the bomb.
Getting Started
Pull today's exercise code from the cs61-exercises repository; we'll
be working in the asm1x directory. We strongly encourage you to use
the appliance today -- if you use your laptop, you are likely to get
assembly code that looks quite different from what we expect, and the
questions we ask may be difficult to answer.
Developing your Assembly Language Cheat Sheet
Over the course of today's and Thursday's exercises, you will develop a
cheat sheet, which you may find quite helpful as you work on defusing
your bomb. While you work through the following, have a copy of the file
cheatsheet.txt open so you can fill it in.
Registering registers in your brain
A lot of the actual computation that happens in your programs will take place in registers. Why? Because registers are fast! So, let's use GDB to help us remember the register layout of the x86-64 architecture.
In the cs61-exercises/asm1x directory, you'll find a simple program
called main. Build it (using make main), and then start gdb on
your main program, and set a breakpoint at main. (We are assuming that
after completing assignment 1, you know how to do these things. If you
do not, ask a table-mate or raise your hand -- you really want to know
how to use gdb by now.) Now, run the program, and when you come to the
breakpoint, display all the registers with the command info registers
(info r for short). For today, we're concerned only with the
general-purpose registers, which are those that appear before rip,
although you may find eflags useful later on.
Fill in parts A1–A5 of your cheat sheet.
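Each general-purpose register can also be named at narrower widths (for example %rax, %eax, %ax, and %al all refer to the same register). The following is only a rough C model of how those names overlap when read, assuming a little-endian machine such as x86-64; the union and the helper names are made up for illustration and are not part of the exercise code. Unlike this model, writing a 32-bit register name on real hardware also clears the upper 32 bits.

```c
#include <stdint.h>

/* Illustrative model only: members overlap the way the names
   %rax, %eax, %ax, and %al overlap on little-endian x86-64. */
union reg_model {
    uint64_t r64;  /* like %rax: all 64 bits */
    uint32_t e32;  /* like %eax: low 32 bits */
    uint16_t x16;  /* like %ax:  low 16 bits */
    uint8_t  l8;   /* like %al:  low 8 bits  */
};

/* Hypothetical helper: read the low 32 bits of a 64-bit value. */
uint32_t low32(uint64_t value) {
    union reg_model r;
    r.r64 = value;
    return r.e32;
}

/* Hypothetical helper: read the low 8 bits of a 64-bit value. */
uint8_t low8(uint64_t value) {
    union reg_model r;
    r.r64 = value;
    return r.l8;
}
```

In gdb you can make the same comparison directly with print $rax, print $eax, and print $al.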
Next, let's examine the code at main by asking gdb to disassemble it
for us: disas(semble). Just by reading the code, answer the following
questions:
1. Are arguments to main passed the same way as they are to any other function?
2. What line led you to answer the previous question?
3. Where in the address space do you suppose that the function
print_nargs resides?
Passing arguments to functions
Next, disassemble the function print_nargs. (There are two ways to do
this; see if you can figure out either!) The mov instructions that you
see in the code here do what you might expect: they move data into/out
of registers (they also move data into/out of memory, but we'll get to
that later). In our assembly syntax, they move from operand1 into
operand2.
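To see the source-then-destination order in isolation, here is a minimal sketch using a made-up function (answer is not part of the exercise code). The assembly in the comment is what gcc typically emits at -O1 on x86-64 Linux; your output may differ with other compilers or flags.

```c
/* Hypothetical function, used only to show mov operand order. */
int answer(void) {
    /* Typical gcc -O1 output on x86-64 Linux:
           movl $42, %eax    # source ($42) first, destination (%eax) second
           ret
    */
    return 42;
}
```

Compile it with gcc -O1 -S and read the resulting .s file to check.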
You'll notice that the code does not call a function named printf, but
instead is calling something named __printf_chk. How many arguments do
you think it takes in this case? If you cannot remember how arguments
are passed, we've provided the same files that we used in lecture to
help you with this. They are in arg1.c through arg7.c. Build the
assembly for these (just type make). Now, use the assembly produced
to fill in questions B.1-B.7 on your cheat sheet. You may find it
helpful to keep your cheat sheet handy during the next several classes
so you can finish filling it out.
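If you want to run your own experiment alongside the provided files, one possible approach is a set of functions that each return exactly one of their parameters, so the generated assembly shows where that parameter lives. The functions below are hypothetical stand-ins; the contents of arg1.c through arg7.c may look quite different.

```c
/* Hypothetical probes: each returns one parameter, so the compiled
   code shows where that parameter is read from. */
long pick_first(long a, long b, long c) {
    (void)b; (void)c;   /* unused on purpose */
    return a;
}

long pick_third(long a, long b, long c) {
    (void)a; (void)b;
    return c;
}

long pick_seventh(long a, long b, long c, long d,
                  long e, long f, long g) {
    (void)a; (void)b; (void)c; (void)d; (void)e; (void)f;
    return g;
}
```

Compile with gcc -O1 -S and compare where each function reads its return value from; the seventh parameter is the interesting one.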
4. How many arguments are being passed to __printf_chk?
Let's look at those arguments in a bit more detail. Look at the 3rd instruction of the function. It should look like this:
   mov      $0x400628, %esi
That first value looks suspiciously like an address -- how can we find
out what's at that address? We use the x (examine) command to look
at memory by address:
   (gdb) x/40c 0x400628
(The c says that we want to print out the contents of the address as
characters; you can print things out in lots of different ways; check
out the documentation for details.)
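As a rough picture of what examining memory as characters means, the sketch below walks a range of bytes and renders each one as a character. The function name format_chars is hypothetical, and the output is a simplification: gdb prints both the numeric value and the character for each byte, while this version substitutes '.' for anything non-printable.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: render n bytes starting at addr as characters
   into out, which must have room for n + 1 bytes. Non-printable bytes
   become '.'. Returns the number of bytes rendered. */
size_t format_chars(const void *addr, size_t n, char *out) {
    const unsigned char *bytes = addr;
    for (size_t i = 0; i < n; i++) {
        out[i] = isprint(bytes[i]) ? (char)bytes[i] : '.';
    }
    out[n] = '\0';
    return n;
}
```

Passing the address of a string literal, such as a printf format string, would show its characters followed by '.' for the newline and the terminating zero byte.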
5. What data is stored at the address referenced in that 3rd instruction?
C types in Assembly
In the file sub.c, you'll find a simple function that takes two
arguments, arg1 and arg2, and computes arg1 - arg2.
See how many different functions you can write that produce exactly
the same assembly code. Spend no more than five minutes on this.
Experiment with different parameter types and different ways of
instructing the compiler to produce code that subtracts two numbers. (If
you create files named things like sub1.c, sub2.c, etc, then typing
make will produce assembly for them.)
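As a possible starting point, here is one pair you might try. It assumes sub.c uses a 64-bit integer type, which may not match the actual file; the function names are made up. Two's-complement subtraction uses the same instruction for signed and unsigned operands of the same width, so these two are good candidates for identical assembly.

```c
/* Hypothetical variants; the parameter types in sub.c may differ. */
long sub_signed(long arg1, long arg2) {
    return arg1 - arg2;
}

unsigned long sub_unsigned(unsigned long arg1, unsigned long arg2) {
    return arg1 - arg2;   /* wraps modulo 2^64 instead of going negative */
}
```

Compare the two .s files (for example with diff) and then try other widths, pointer types, or expressions such as arg1 + -arg2.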
Clever Compilers and Multiplication Operations
Take a look at the file mul.c.
6. Based on what you've seen so far in class, predict what assembly code will be generated for this function.
Now, make the .s file.
7. What is the new instruction you see in the code? Can you figure out what it's doing? If you get stuck, Google the instruction and x86-64 or assembly and see what you can find.
Once you've figured out how this program is working, predict what will
happen if you change the number 16 in mul.c to 128. Produce the
assembly code and see if you are right.
Next, change the type in mul.c from long to int. Guess what the
assembly should look like, and then check your guess.
8. How did the assembly change when you changed the type in C?
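Once you have a hypothesis for question 7, you can test it in C. The sketch below assumes the function in mul.c multiplies a long by 16, which is inferred from the text above rather than copied from the file; both function names are made up. It compares the multiplication against a left shift by 4, which computes the same result because 16 is 2 to the 4th power.

```c
/* Hypothetical stand-in for the function in mul.c. */
long mul16(long x) {
    return x * 16;
}

/* The same computation as a shift. The shift is done on an unsigned
   value because left-shifting a negative signed value is undefined in C;
   the conversion back to long wraps on gcc and clang. */
long shift4(long x) {
    return (long)((unsigned long)x << 4);
}
```

Compiling both with gcc -O1 -S lets you check whether the compiler treats them the same way.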
Check your knowledge of Assembly
For each of the files mystery1.S, mystery2.S, mystery3.S, and mystery4.S, see if you can write C code to generate identical assembly! If you encounter instructions whose meaning you do not understand, try googling! Mystery4 uses several features that we have not yet gone over, so if you don't understand it, don't worry, but if you get that far, have some fun and see if you can figure it out.
Summing Up
- You can read assembly for simple programs!
- You know how assembly instructions express arithmetic, logical, and shift operators
- You know how arguments are passed in assembly language
Please complete this short survey.
FAQ
Q1: Why are some of the functions bracketed by:
  subq $8, %rsp
  ...
  addq $8, %rsp
A1: As you may recall from last Thursday's lecture, local space for functions is allocated on the stack. The subtraction provides 8 bytes of space for the function to use and the addition returns that stack space. In the code examples you examined, this adjustment appears when the function calls another function: the call instruction pushes the return address onto the stack, and the extra 8 bytes keep the stack pointer 16-byte aligned at the call, as the x86-64 calling convention requires. We will go into this in much more depth next Tuesday.
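To reproduce the pattern yourself, a minimal sketch is a function that calls another function and then does more work with the result, so the compiler cannot replace the call with a plain jump. The function below is hypothetical and not part of the exercise code; with gcc -O1 on x86-64 Linux its body is typically bracketed by the subq/addq pair shown in the question, though other compilers or flags may differ.

```c
#include <stdio.h>

/* Hypothetical example: calls printf, then uses its return value. */
int print_and_count(const char *msg) {
    /* The call pushes a return address onto the stack. */
    int written = printf("%s\n", msg);
    /* Work after the call prevents a tail-call jump. */
    return written + 1;
}
```

Compare its assembly with that of a function that makes no calls at all, which usually has no stack adjustment.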
Solution Walkthrough
The last three slides cover the mystery functions, which are probably the most useful!