Overview
We introduce data movement and arithmetic instructions on x86-64.
Full lecture notes on assembly — Textbook readings
Machine code and assembly
- A computer processor reads instructions from memory
- The instructions tell the processor what to do
- Instructions have a byte representation (machine code)
- And a textual representation (assembly language)
How machine code is executed: simple model
- The processor (CPU—Central Processing Unit) reads instructions from memory
- It decodes each instruction and performs the corresponding operation before going to the next
- It executes instructions sequentially unless redirected explicitly by an instruction (a branch instruction—like “goto”)
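The simple model above can be sketched as a loop in C++. This is only an illustration: the three-instruction set (`AddImm`, `JumpIfNonzero`, `Halt`) and the single accumulator register are invented for the sketch and are not x86-64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction set, invented for this sketch (not x86-64).
enum class Op : std::uint8_t { AddImm, JumpIfNonzero, Halt };

struct Instr {
    Op op;
    std::int64_t arg;  // immediate for AddImm, target index for JumpIfNonzero
};

// Run `program` on a machine with one accumulator register.
// Returns the accumulator's value when Halt is reached or the program ends.
std::int64_t run_toy_cpu(const std::vector<Instr>& program, std::int64_t acc) {
    std::size_t pc = 0;  // program counter: index of the next instruction
    while (pc < program.size()) {
        const Instr& in = program[pc];  // fetch
        ++pc;                           // default: continue with the next instruction
        switch (in.op) {                // decode and execute
        case Op::AddImm:
            acc += in.arg;
            break;
        case Op::JumpIfNonzero:
            // A branch: the only way to leave sequential order.
            assert(in.arg >= 0 && static_cast<std::size_t>(in.arg) <= program.size());
            if (acc != 0) {
                pc = static_cast<std::size_t>(in.arg);
            }
            break;
        case Op::Halt:
            return acc;
        }
    }
    return acc;
}
```

A program such as `{AddImm -1, JumpIfNonzero 0, Halt}` counts the accumulator down to zero: the jump keeps redirecting execution to index 0 until the accumulator reaches 0.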
How machine code is generated: simple model
How machine code is generated: assembler model
How machine code is generated: linking
Assembly example
0000000000401210 <add>:
  401210: 8d 04 3e                      leal    (%rsi,%rdi), %eax
  401213: c3                            retq
  401214: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:(%rax,%rax)
  40121e: 66 90                         nop
- Left: Address or offset at which code appears
- Middle: Machine code representation
- Right: Assembly language
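A source function that could plausibly compile to the listing above is sketched here. The listing does not show its source, so the argument types and the `extern "C"` linkage (guessed from the unmangled symbol name `add`) are assumptions.

```cpp
#include <cassert>

// Assumed source, not taken from the lecture. Under the System V x86-64
// calling convention the first two integer arguments arrive in %edi and
// %esi and the result is returned in %eax, which fits the single lea.
extern "C" unsigned add(unsigned a, unsigned b) {
    return a + b;  // unsigned addition wraps modulo 2^32, like the 32-bit lea
}
```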
Assembly flavors
- Compiler generated
- make FILE.s, gcc -S
- Includes symbolic names
- Includes labels and directives (e.g., ## %bb.0:, .LFB0:)
- Does not include machine code or offsets
 
- Read from object file
- objdump -d file.o
- Offsets can be weird because linker hasn’t set them yet
- (For example, library function calls may show up as callq 31 <add+0x31> rather than callq 401090 <open@plt>)
 
- Read from executable
- objdump -d exefile, gdb
- Has final offsets, has fewer symbolic names
- Has garbage at the end of functions (any idea why?)
 
Reading assembly
- Dive in and make assumptions! Assembly makes some sense
- Confused by an instruction? Look it up in our notes or more broadly
- Or even in the Intel x86-64 manual
 
Simple functions
    .file   "f00.cc"
    .text
    .globl  _Z1fv
    .type   _Z1fv, @function
_Z1fv:
    ret
    .size   _Z1fv, .-_Z1fv
    .ident  "GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
    .section    .note.GNU-stack,"",@progbits
Directives, labels, instructions
- This is compiler-generated assembly
- Comprises directives, labels, and instructions
- Directive: an instruction to the assembler; controls aspects of the output that aren’t machine code
- .file: What the source file was
- .text: Which segment should store the generated instructions
- .globl, .type: Information for the linker about the function
 
- Label: marks the next instruction, making it referenceable by other instructions and files
- _Z1fv:
 
- Instruction: a line of assembly language that the assembler translates into machine code
- ret
 
f00.s, f01.s
- In the body of this lecture, we look at assembly files generated by the compiler and try to reason through what the source files might be!
ret
- Three classes of instruction
- Arithmetic: perform computations on values
- Data movement: move data between registers and to and from primary memory
- Control flow: change the instruction sequence
 
- ret returns from the current function
- It’s a control flow instruction
f02.s, f03.s
mov
- The mov instruction is a data movement instruction
- Format: mov SRC, DST
- movl $100, %eax
Registers
- Registers comprise the fastest kind of memory available to the CPU
- Machines have tons of memory but few registers
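- x86-64 has just 16 general-purpose registers (14 if you set aside the stack pointer %rsp and the frame pointer %rbp, which usually have dedicated roles)!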
- Each 64 bits wide
 
- Registers have no addresses
- They have names like %rax
- They cannot be dereferenced using a numeric address or pointer
- Their format and layout are not prescribed by the C++ memory model
 
Register slices
- Although registers are 64 bits wide, the data we handle is often smaller
- Names are provided for slices of each register
- %rax: the entire register (bits 0–63)
- %eax: the lowest 32 bits (bits 0–31)
- %ax: the lowest 16 bits (bits 0–15)
- %al: the lowest 8 bits (bits 0–7)
- %ah: bits 8–15
 
- Instructions must match sizes
- In compiler-generated assembly, an instruction suffix indicates size
- movl $100, %eax: move the 32-bit number 100 into the 32-bit register %eax
- (This also sets bits 32–63 of %rax to zero)
- movq $100, %rax: move the 64-bit number 100 into the 64-bit register %rax
- movl $100, %rax: rejected by the assembler (the l suffix does not match the 64-bit register %rax)
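The slice names and size suffixes can be modeled in C++ as views of one 64-bit value. This is a sketch, not a description of real hardware. The struct and method names are made up for illustration, and the `movw` behavior (16-bit writes leave the upper bits unchanged) is x86-64 behavior that the list above does not state.

```cpp
#include <cassert>
#include <cstdint>

// Model of one 64-bit register such as %rax and its named slices.
struct Reg64 {
    std::uint64_t bits = 0;

    std::uint32_t e() const { return static_cast<std::uint32_t>(bits); }      // like %eax: bits 0-31
    std::uint16_t x() const { return static_cast<std::uint16_t>(bits); }      // like %ax: bits 0-15
    std::uint8_t l() const { return static_cast<std::uint8_t>(bits); }        // like %al: bits 0-7
    std::uint8_t h() const { return static_cast<std::uint8_t>(bits >> 8); }   // like %ah: bits 8-15

    // Like movq: replaces all 64 bits.
    void movq(std::uint64_t v) { bits = v; }
    // Like movl: writes bits 0-31 and zeroes bits 32-63.
    void movl(std::uint32_t v) { bits = v; }
    // Like movw: writes bits 0-15 and leaves bits 16-63 unchanged.
    void movw(std::uint16_t v) { bits = (bits & ~std::uint64_t{0xFFFF}) | v; }
};
```

After `movq(0x1122334455667788)`, the slices read `e() == 0x55667788`, `x() == 0x7788`, `l() == 0x88`, and `h() == 0x77`. A following `movl(100)` leaves the whole register equal to 100.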
 
f04.s, f05.s, f06.s, f07.s
Data operands and address modes
- $X: an immediate value (a constant)
- %X: a register value
- a(%rip): a global symbol
- (%X): an indirect reference (dereferencing a “pointer”)
- 8(%X): an offset indirect reference (dereferencing a structure or array)
- N(R) means dereference memory at address R + N
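Each operand form corresponds to a familiar C++ expression. The sketch below pairs them up. `Pair`, `global_counter`, and the function names are hypothetical, and the comments describe what a compiler would typically emit on x86-64 Linux rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type for this sketch.
struct Pair {
    long first;   // at offset 0
    long second;  // at offset 8 on x86-64 Linux, where long is 8 bytes
};

long global_counter = 7;  // typically addressed as global_counter(%rip)

long immediate() { return 100; }                      // $100: constant inside the instruction
long from_global() { return global_counter; }         // a(%rip): load from a global symbol
long deref(const long* p) { return *p; }              // (%X): load from the address in a register
long second_of(const Pair* p) { return p->second; }   // 8(%X): load from address p + 8

// What N(R) means, spelled out: read a long from address R + N.
long load_at(const void* r, std::ptrdiff_t n) {
    long value;
    std::memcpy(&value, static_cast<const char*>(r) + n, sizeof value);
    return value;
}
```

With `Pair p{11, 22}`, both `second_of(&p)` and `load_at(&p, offsetof(Pair, second))` read the same 22: the field access is an offset added to a base address.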
 
f08.s
Arithmetic (computation) instructions
- OP SRC, DST means DST := DST OP SRC
- xorl %eax, %eax means %eax := %eax ^ %eax
- Which means…
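The two-operand form reads like a C++ compound assignment on the destination. In the sketch below the function names borrow the instruction mnemonics for readability; they are ordinary C++ functions, not intrinsics. Because any value XORed with itself is zero, the `xorl` example is a compact way to zero a register.

```cpp
#include <cassert>
#include <cstdint>

// addl SRC, DST  ~  dst += src
void addl(std::uint32_t src, std::uint32_t& dst) { dst += src; }
// subl SRC, DST  ~  dst -= src (note the order: DST - SRC)
void subl(std::uint32_t src, std::uint32_t& dst) { dst -= src; }
// xorl SRC, DST  ~  dst ^= src
void xorl(std::uint32_t src, std::uint32_t& dst) { dst ^= src; }
```

Calling `xorl(eax, eax)` on any `eax` leaves it equal to 0.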
f09.s, f10.s
f11.s
Moving into register slices
- mov[SIGN][SRCSIZE][DSTSIZE]
- SIGN is z (extend with zeros) or s (extend with sign bit)
- SRCSIZE/DSTSIZE is b (byte), w (short), l (int), or q (long)
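In C++ terms, these instructions are what an integer conversion from a narrower to a wider type typically becomes: unsigned sources extend with zeros, signed sources extend with the sign bit. A minimal sketch, with functions named after the instructions they model:

```cpp
#include <cassert>
#include <cstdint>

// movzbl: zero-extend a byte (b) to 32 bits (l).
std::uint32_t movzbl(std::uint8_t src) { return src; }
// movsbl: sign-extend a byte (b) to 32 bits (l).
std::int32_t movsbl(std::int8_t src) { return src; }
// movswq: sign-extend a 16-bit value (w) to 64 bits (q).
std::int64_t movswq(std::int16_t src) { return src; }
```

The byte 0xFF becomes 255 under `movzbl` but -1 (bit pattern 0xFFFFFFFF) under `movsbl`.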
 
f12.s
f13.s, f14.s
f15.s, f16.s, f17.s
More data formats
- (%X,%Y,Z): an array indirect reference
- Dereference memory at %X + %Y * Z (Z is a constant scale, not a register)
 
- Full format: offset(base,index,scale)
- Address: offset + base + index * scale
- offset must be a constant
- scale must be 1, 2, 4, or 8
- Default offset, base, and index are 0; default scale is 1
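The full operand format is a single formula, sketched below as a C++ function. The name `effective_address` and the parameter types are choices made for this illustration. The arithmetic is done modulo 2^64, as address arithmetic on x86-64 is.

```cpp
#include <cassert>
#include <cstdint>

// Address named by offset(base,index,scale).
std::uint64_t effective_address(std::int64_t offset, std::uint64_t base,
                                std::uint64_t index, unsigned scale) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return static_cast<std::uint64_t>(offset) + base + index * scale;
}
```

The simpler forms fall out of the defaults: `(%X)` is `effective_address(0, X, 0, 1)` and `8(%X)` is `effective_address(8, X, 0, 1)`.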
 
f18.s
The lea instruction
- lea stands for Load Effective Address
- It performs an address computation, but does not dereference
- Often used by the compiler as a parsimonious alternative to array arithmetic
- leal (%rdi,%rsi,8), %eax leaves the same value in %eax as the three-instruction sequence:
- movl %esi, %eax; shll $3, %eax; addl %edi, %eax
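The equivalence of the single lea and the three-instruction sequence can be checked with a C++ model of each. Both functions are sketches of the register arithmetic, with parameters named after the registers they stand for. They model only the value left in %eax, not condition flags.

```cpp
#include <cassert>
#include <cstdint>

// leal (%rdi,%rsi,8), %eax: one address computation, truncated to 32 bits.
std::uint32_t with_lea(std::uint64_t rdi, std::uint64_t rsi) {
    return static_cast<std::uint32_t>(rdi + rsi * 8);
}

// movl %esi, %eax; shll $3, %eax; addl %edi, %eax: three 32-bit steps.
std::uint32_t with_shift_add(std::uint64_t rdi, std::uint64_t rsi) {
    std::uint32_t eax = static_cast<std::uint32_t>(rsi);  // movl %esi, %eax
    eax <<= 3;                                            // shll $3, %eax
    eax += static_cast<std::uint32_t>(rdi);               // addl %edi, %eax
    return eax;
}
```

The two agree for every input because truncating to 32 bits commutes with addition and multiplication. A source expression such as `a + 8 * b` on 32-bit integers is one plausible origin of this instruction, though that is a guess.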