Overview
We introduce data movement and arithmetic instructions on x86-64.
Full lecture notes on assembly — Textbook readings
Machine code and assembly
- A computer processor reads instructions from memory
- The instructions tell the processor what to do
- Instructions have a byte representation (machine code)
- And a textual representation (assembly language)
How machine code is executed: simple model
- The processor (CPU—Central Processing Unit) reads instructions from memory
- It decodes each instruction and performs the corresponding operation before going to the next
- It executes instructions sequentially unless redirected explicitly by an instruction (a branch instruction, like “goto”)
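The simple model above can be sketched as a loop over a toy instruction set. This is only an illustration: the Op, Instr, and run names and the three operations are invented for this sketch and are not x86-64 instructions. It shows sequential execution and an explicit redirect by a jump.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction set, invented for this sketch (not x86-64).
enum class Op : std::uint8_t { Add, Jump, Halt };

struct Instr {
    Op op;
    std::int64_t arg;   // amount to add, or index of the jump target
};

// Runs `program` from instruction 0 and returns the accumulator at Halt
// (or when execution runs off the end of the program).
std::int64_t run(const std::vector<Instr>& program) {
    std::int64_t acc = 0;
    std::size_t pc = 0;                 // index of the next instruction
    while (pc < program.size()) {
        const Instr& in = program[pc];  // fetch
        switch (in.op) {                // decode and perform the operation
        case Op::Add:
            acc += in.arg;
            ++pc;                       // continue with the next instruction
            break;
        case Op::Jump:
            pc = static_cast<std::size_t>(in.arg);  // explicit redirect, like goto
            break;
        case Op::Halt:
            return acc;
        }
    }
    return acc;
}
```

For example, in the program Add 1, Jump 3, Add 100, Add 10, Halt, the jump skips Add 100. A real processor does the same with bytes in memory and an instruction pointer register in place of pc.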
How machine code is generated: simple model
How machine code is generated: assembler model
How machine code is generated: linking
Assembly example
0000000000401210 <add>:
401210: 8d 04 3e leal (%rsi,%rdi), %eax
401213: c3 retq
401214: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
40121e: 66 90 nop
- Left: Address or offset at which code appears
- Middle: Machine code representation
- Right: Assembly language
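One C++ function that could plausibly produce the listing above is shown here. The original source is not given, so the signature and parameter names are assumptions. Under the System V x86-64 calling convention the first two integer arguments arrive in %edi and %esi and the result is returned in %eax, so leal (%rsi,%rdi), %eax computes the sum in a single instruction.

```cpp
// A guess at the source behind the <add> listing.
// The signature and parameter names are assumptions.
int add(int a, int b) {
    return a + b;
}
```

The exact instructions depend on the compiler and optimization level, so a different build of the same function may look different.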
Assembly flavors
- Compiler generated (make FILE.s, gcc -S)
  - Includes symbolic names
  - Includes labels and directives (e.g., ## %bb.0:, .LFB0:)
  - Does not include machine code or offsets
- Read from object file (objdump -d file.o)
  - Offsets can be weird because linker hasn’t set them yet
  - (For example, library function calls may show up as callq 31 <add+0x31> rather than callq 401090 <open@plt>)
- Read from executable (objdump -d exefile, gdb)
  - Has final offsets, has fewer symbolic names
  - Has garbage at the end of functions (any idea why?)
Reading assembly
- Dive in and make assumptions!
- Assembly can make some intuitive sense, given some basic concepts
- Want to see how a function is compiled?
  - Make a file x.cc in a cs61-lectures/asm1 directory
  - Run make x.s
  - Look at x.s
- Confused by an instruction? Look it up in our notes or more broadly
- Or even in the Intel x86-64 manual
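As a starting point for the steps above, here is a small file that could serve as x.cc. The function name sum3 and its body are hypothetical, not taken from the course materials. Declaring the function extern "C" turns off C++ name mangling, so the label in x.s is plain sum3 rather than a mangled name such as _Z4sum3iii, which makes it easier to find.

```cpp
// x.cc: a hypothetical example file; the function name and body are assumptions.
// extern "C" disables C++ name mangling, so the assembly label is just sum3.
extern "C" int sum3(int a, int b, int c) {
    return a + b + c;
}
```

After make x.s, search x.s for the sum3: label and read the instructions that follow it.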
Simple functions
.file "f00.cc"
.text
.globl _Z1fv
.type _Z1fv, @function
_Z1fv:
ret
.size _Z1fv, .-_Z1fv
.ident "GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
.section .note.GNU-stack,"",@progbits
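A source file that could plausibly produce this assembly is sketched here. The label _Z1fv is the mangled name of a C++ function f that takes no arguments: under the name-mangling scheme GCC uses, _Z starts a mangled name, 1f is the one-character name f, and v marks an empty parameter list. The return type is not encoded in the name, so void is a guess, and the empty body is an assumption based on the function consisting of a single ret.

```cpp
// A guess at f00.cc: a function named f with no parameters that does nothing.
// The void return type and the empty body are assumptions.
void f() {
}
```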
Directives, labels, instructions
- This is compiler-generated assembly
- Comprises directives, labels, and instructions
- Directive: an instruction to the assembler; controls aspects of the output that aren’t machine code
  - .file: What the source file was
  - .text: Which segment should store the generated instructions
  - .globl, .type: Information for the linker about the function
- Label: marks the next instruction, making it referenceable by other instructions and files
  - _Z1fv:
- Instruction: assembly language
  - ret
f00.s, f01.s
- In the body of this lecture, we look at assembly files generated by the compiler and try to reason through what the source files might be!
ret
- Three classes of instruction
- Arithmetic: perform computations on values
- Data movement: move data to and from primary memory
- Control flow: change the instruction sequence
- ret returns from the current function
  - It’s a control flow instruction
f02.s, f03.s
mov
- The mov instruction is a data movement instruction
- Format: mov SRC, DST
  - movl $100, %eax
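To connect movl $100, %eax to source code: integer return values are passed back in %eax (the low half of %rax) under the System V x86-64 calling convention, so a function that returns the constant 100 typically compiles to that mov followed by ret. The function below is a guess at such a source file, not the actual lecture file, and the name f is an assumption.

```cpp
// A guess at a source file whose body compiles to
//     movl $100, %eax
//     ret
// The function name and return type are assumptions.
int f() {
    return 100;
}
```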
Registers
- Registers comprise the fastest kind of memory available to the CPU
- Machines have tons of memory but few registers
- x86-64 has just 14 general-purpose registers (16 counting %rsp and %rbp, which usually manage the stack)!
- Each 64 bits wide
- Registers have no addresses
  - They have names like %rax
  - They cannot be dereferenced using a numeric address or pointer
  - Their format and layout are not prescribed by the C++ memory model
Register slices
- Although registers are 64 bits wide, the data we handle is often smaller
- Names are provided for slices of each register
  - %rax: the entire register (bits 0–63)
  - %eax: the lowest 32 bits (bits 0–31)
  - %ax: the lowest 16 bits (bits 0–15)
  - %al: the lowest 8 bits (bits 0–7)
  - %ah: bits 8–15
- Instructions must match sizes
- In compiler-generated assembly, an instruction suffix indicates size
  - movl $100, %eax: move the 32-bit number 100 into the 32-bit register %eax
    - (This sets bits 32–63 of %rax to zero)
  - movq $100, %rax: move the 64-bit number 100 into the 64-bit register %rax
  - movl $100, %rax: syntax error (the l suffix does not match the 64-bit register)
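The slice names and the effect of movl and movq can be modeled with ordinary integer arithmetic. This is a sketch only: the Reg struct and its member names are hypothetical, chosen to mirror the register names, and a real register is of course not a C++ object.

```cpp
#include <cstdint>

// Models one 64-bit register as a plain integer. The struct and its member
// names are hypothetical, chosen to mirror the x86-64 slice names.
struct Reg {
    std::uint64_t rax = 0;                                                     // bits 0-63

    std::uint32_t eax() const { return static_cast<std::uint32_t>(rax); }      // bits 0-31
    std::uint16_t ax() const { return static_cast<std::uint16_t>(rax); }       // bits 0-15
    std::uint8_t al() const { return static_cast<std::uint8_t>(rax); }         // bits 0-7
    std::uint8_t ah() const { return static_cast<std::uint8_t>(rax >> 8); }    // bits 8-15

    // movl: a 32-bit write; bits 32-63 become zero.
    void movl(std::uint32_t v) { rax = v; }
    // movq: a 64-bit write; replaces the whole register.
    void movq(std::uint64_t v) { rax = v; }
};
```

For example, after movq with 0x1122334455667788, eax() is 0x55667788, ax() is 0x7788, al() is 0x88, and ah() is 0x77. A following movl with 100 leaves rax equal to 100, with the upper half cleared.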