Assembler: Learning to Read

Learning Objectives

Convert C programs into assembly
Use gdb and objdump to examine the assembly underlying a C function/program. (Note: objdump is a GNU tool, so compile with gcc instead of clang if you want to use objdump.)
Read simple assembly
Find function parameters in assembly
Build a good cheat sheet that should help you defuse the bomb.

Getting Started

Pull today's exercise code from the cs61-exercises repository; we'll be working in the asm1x directory. We strongly encourage you to use the appliance today. If you use your laptop, you are likely to get assembly code that looks quite different from what we expect, and the observations and questions below may be difficult to answer.

Developing your Assembly Language Cheat Sheet

Over the course of today's and Thursday's exercises, you will develop a cheat sheet, which you may find quite helpful as you work on defusing your bomb. While you work through the following, have a copy of the file cheatsheet.txt open so you can fill it in.

Registering registers in your brain

A lot of the actual computation that happens in your programs will take place in registers. Why? Because registers are fast! So, let's use GDB to help us remember the register layout of the x86-64 architecture.

In the cs61-exercises/asm1x directory, you'll find a simple program called main. Build it (using make main), start gdb on it, and set a breakpoint at main. (We are assuming that after completing assignment 1, you know how to do these things. If you do not, ask a table-mate or raise your hand; you really want to know how to use gdb by now.) Now run the program, and when you reach the breakpoint, display the contents of all the registers with the command info registers (info r for short). For today, we're concerned only with the general-purpose registers, which are those that appear before rip, although you may find eflags useful.

Fill in parts A1–A5 of your cheat sheet.

Next, let's examine the code at main by asking gdb to disassemble it for us with the disassemble command (disas for short). Just by reading the code, answer the following questions:

1. Are arguments to main passed the same way that they are into any other function?

2. What line led you to answer the previous question?

3. Where in the address space do you suppose that the function print_nargs resides?

Passing arguments to functions

Next, disassemble the function print_nargs. (There are two ways to do this; see if you can figure out either!) The mov instructions that you see in the code here do what you might expect: they move data into/out of registers (they also move data into/out of memory, but we'll get to that later). In our assembly syntax, they move from operand1 into operand2; that is, the source comes first and the destination second.

You'll notice that the code does not call a function named printf, but instead calls something named __printf_chk. How many arguments do you think it takes in this case? If you cannot remember how arguments are passed, we've provided the same files that we used in lecture to help you with this. They are arg1.c through arg7.c. Build the assembly for these (just type make). Now, use the assembly produced to fill in questions B.1-B.7 on your cheat sheet. You may find it helpful to keep your cheat sheet handy during the next several classes so you can finish filling it out.
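If you want one more file to experiment with alongside arg1.c through arg7.c, the following is a minimal sketch. The file name args_demo.c and the functions sum3 and sum7 are hypothetical and are not part of the exercise repository. Compile it to assembly and compare where each parameter is read from.

```c
/* args_demo.c: hypothetical file, not part of cs61-exercises.
 * Build the assembly with:  gcc -O1 -S args_demo.c
 * then read args_demo.s to see where each parameter lives. */

/* Three integer arguments. */
long sum3(long a, long b, long c) {
    return a + b + c;
}

/* Seven integer arguments. On x86-64 Linux, look for the one
 * parameter that is read differently from the other six. */
long sum7(long a, long b, long c, long d, long e, long f, long g) {
    return a + b + c + d + e + f + g;
}
```

Using a distinct parameter name for each position makes it easier to match each add instruction in the .s file to a line on your cheat sheet.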

4. How many arguments are being passed to __printf_chk?

Let's look at those arguments in a bit more detail. Look at the 3rd instruction of the function. It should look like this:

   mov      $0x400628, %esi

That first value looks suspiciously like an address. How can we find out what's at that address? We use the x (examine) command to look at memory by address:

   (gdb) x/40c 0x400628

(The 40 says how many units to display, and the c says that we want to print the contents at that address as characters. You can print things out in lots of different ways; check out the gdb documentation for details.)
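To get a feel for what a character dump of memory looks like, here is a small sketch in C that prints the bytes behind an address one character at a time. The names demo_string, demo_address, and examine_chars are made up for illustration, and the string is not the one used in the exercise.

```c
#include <stdio.h>

/* Hypothetical data; the exercise program stores something else. */
static const char demo_string[] = "hello, gdb\n";

/* Return the address where the demo bytes live. */
const char *demo_address(void) {
    return demo_string;
}

/* Print n bytes starting at p: address, numeric value, and the
 * character itself (or '.' if it is not printable ASCII).
 * Returns the number of bytes printed. */
int examine_chars(const char *p, int n) {
    for (int i = 0; i < n; i++) {
        unsigned char c = (unsigned char) p[i];
        printf("%p: %3u '%c'\n", (void *) (p + i), (unsigned) c,
               (c >= 32 && c < 127) ? c : '.');
    }
    return n;
}
```

Calling examine_chars(demo_address(), 12) walks the 11 characters of the string plus its terminating zero byte, which is roughly the view gdb gives you with x/12c on the same address.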

5. What data is stored at the address referenced in that 3rd instruction?

C types in Assembly

In the file sub.c, you'll find a simple function that takes two arguments, arg1 and arg2, and computes arg1 - arg2.

See how many different functions you can write that produce exactly the same assembly code. Spend no more than five minutes on this. Experiment with different parameter types and different ways of instructing the compiler to produce code that subtracts two numbers. (If you create files named things like sub1.c, sub2.c, etc, then typing make will produce assembly for them.)
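As a starting point, here are a few candidates to try. This is only a sketch: the function names are hypothetical, the parameter types in the real sub.c may differ, and whether each candidate produces identical assembly is for you to check. Put each function in its own file (sub1.c, sub2.c, and so on) so that make builds the assembly for it.

```c
/* Candidate variations on "arg1 - arg2"; names are hypothetical. */

/* Signed 64-bit operands. */
long sub_long(long arg1, long arg2) {
    return arg1 - arg2;
}

/* Unsigned 64-bit operands. */
unsigned long sub_unsigned(unsigned long arg1, unsigned long arg2) {
    return arg1 - arg2;
}

/* Subtraction written as addition of a negated value. */
long sub_negate(long arg1, long arg2) {
    return arg1 + -arg2;
}

/* Pointer arithmetic on a byte-sized element type. */
char *sub_pointer(char *arg1, long arg2) {
    return arg1 - arg2;
}
```

Each variation changes either the types or the way the subtraction is spelled in C, which are the two knobs the exercise asks you to experiment with.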

Clever Compilers and Multiplication Operations

Take a look at the file mul.c.

6. Based on what you've seen so far in class, predict what assembly code will be generated for this function.

Now, make the .s file.

7. What is the new instruction you see in the code? Can you figure out what it's doing? If you get stuck, Google the instruction and x86-64 or assembly and see what you can find.

Once you've figured out how this program is working, predict what will happen if you change the number 16 in mul.c to 128. Produce the assembly code and see if you are right.

Next, change the type in mul.c from long to int. Guess what the assembly will look like, and then check your guess.

8. How did the assembly change when you changed the type in C?
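Once you have answered the questions above, you may want to compare a few spellings of the same computation side by side. The sketch below assumes mul.c multiplies its argument by 16, as the text describes; the function names are hypothetical and the real file may look different. Because 16 is 2 to the 4th power, multiplying by 16 and shifting left by 4 give the same value whenever the result does not overflow.

```c
/* mul_demo.c: hypothetical file for comparing generated assembly.
 * Build with:  gcc -O1 -S mul_demo.c */

/* Multiply a 64-bit value by 16. */
long mul16(long x) {
    return x * 16;
}

/* The same value written as a shift; in C this is only well
 * defined for non-negative x whose result does not overflow. */
long shift4(long x) {
    return x << 4;
}

/* The 32-bit version, for comparing register names. */
int mul16_int(int x) {
    return x * 16;
}
```

Comparing the three functions in the .s file shows both what the compiler does with a power-of-two multiply and how the operand size shows up in the instructions.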

Check your knowledge of Assembly

For each of the files mystery1.S, mystery2.S, mystery3.S, and mystery4.S, see if you can write C code that generates identical assembly! If you encounter instructions whose meaning you do not understand, try googling! Mystery4 uses several features that we have not yet gone over, so if you don't understand it, don't worry, but if you get that far, have some fun and see if you can figure it out.

Summing Up

You can read assembly for simple programs!
You know how assembly instructions express arithmetic, logical, and shift operators.
You know how arguments are passed in assembly language.

Please complete this short survey.

FAQ

Q1: Why are some of the functions bracketed by:

  subq $8, %rsp
  ...
  addq $8, %rsp

A1: As you may recall from last Thursday's lecture, local space for functions is allocated on the stack. The subtraction reserves 8 bytes of stack space for the function to use, and the addition releases that space. In the code examples you examined, this space appeared when the function called another function, because the call uses the stack to store the address to which the called function should return. We will go into this in much more depth next Tuesday.
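To see the difference for yourself, you could compile a pair of functions like the ones below and compare them. This is a sketch with hypothetical names (leaf and caller); the exact instructions depend on your compiler and optimization level, so treat any stack adjustment you see as something to observe, not a guarantee.

```c
#include <stdio.h>

/* A leaf function: it calls nothing else. */
long leaf(long x) {
    return x + 1;
}

/* A non-leaf function: it calls printf before returning, so
 * look for stack adjustments around the call in its assembly. */
long caller(long x) {
    printf("%ld\n", x);
    return x + 1;
}
```

Building this with gcc -O1 -S and reading the two functions next to each other is a quick way to check which one, if either, is bracketed by the subq and addq instructions.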

Solution Walkthrough

The last three slides cover the mystery functions, which are probably the most useful!

Exercise Walkthrough Video