Overview
We finish our discussion of assembly address modes and arithmetic instructions, then discuss calling conventions and control flow in machine code.
Sidebar: Type-safe linkage and mangled names
- The mangled (symbol) name of a C++ function encodes the types of its arguments
- This makes linking C++ code safer (a call with mismatched argument types fails to link) and supports overloading (functions with the same name but different behavior based on argument types)
- Example: f(int) ⟶ _Z1fi
  - _Z: this is a mangled name
  - 1: the function name is 1 character long
  - f: the actual function name
  - i: the first argument is int
- To demangle, try c++filt MANGLEDNAME
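The same decoding can be done from inside a C++ program. GCC and Clang (with libstdc++ or libc++abi) provide a demangler as abi::__cxa_demangle in the non-standard header <cxxabi.h>. The sketch below assumes one of those toolchains; the helper name demangle is hypothetical, and the two overloads of f show why distinct mangled names are needed at all.

```cpp
#include <cassert>
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang-specific, not standard C++)
#include <string>

// Hypothetical helper: turn a mangled symbol such as "_Z1fi" back into a
// readable signature. Returns the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so the caller must free it.
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);
    return result;
}

// Overloads share a source name but get distinct mangled names,
// so the linker can tell them apart.
int f(int x) { return x + 1; }          // symbol _Z1fi
double f(double x) { return x * 2.0; }  // symbol _Z1fd
```

For example, demangle("_Z3addii") yields "add(int, int)": 3 is the name length, add the name, and ii two int arguments.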
Data operands and address modes
- $X: an immediate value (a constant)
- %X: a register value
- a(%rip): a global symbol
- (%X): an indirect reference (dereferencing a “pointer”)
- 8(%X): an offset indirect reference (dereferencing a structure or array)
- N(R) means dereference memory at address R + N
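To connect each operand form to source code, here is a small C++ function whose comments name the address mode an x86-64 compiler would typically use for each access at -O1 or higher. The names point, counter, and sum_operands are hypothetical, and the exact instructions depend on the compiler, flags, and whether the code is position-independent (a global may then be reached through the GOT instead of directly via %rip).

```cpp
#include <cassert>

// Hypothetical struct: x sits at offset 0, y at offset 8.
struct point {
    long x;
    long y;
};

long counter = 100;  // a global: typically accessed as counter(%rip)

long sum_operands(long a, long* p, point* s) {
    long r = 5;    // the constant 5: immediate operand $5
    r += a;        // a arrives in %rdi: register operand
    r += *p;       // p arrives in %rsi; *p is an indirect reference (%rsi)
    r += s->y;     // s arrives in %rdx; s->y lives 8 bytes in: 8(%rdx)
    r += counter;  // global symbol: counter(%rip)
    return r;
}
```

Compiling this with -S and reading the .s file is a quick way to check which forms your compiler actually picks.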
Arithmetic instructions
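- General format: OP SRC, DST
  - This means DST := DST OP SRC
- Examples:
  - addl %eax, %ebx means %ebx := %ebx + %eax
  - subq %rdi, %r9 means %r9 := %r9 - %rdi (the q suffix matches the 64-bit registers; the 32-bit form would be subl %edi, %r9d)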
f08.s
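Because the destination is both an input and the output, operand order matters for non-commutative operations such as subtraction. The following C++ model is a sketch: the function names addl and subl are hypothetical, they ignore the condition flags the real instructions set, and they use 32-bit unsigned values so results wrap modulo 2^32 just as the l-suffixed instructions do.

```cpp
#include <cassert>
#include <cstdint>

// OP SRC, DST means DST := DST OP SRC.
// The l suffix means 32-bit operands, so results wrap modulo 2^32.
void addl(std::uint32_t src, std::uint32_t& dst) { dst = dst + src; }
void subl(std::uint32_t src, std::uint32_t& dst) { dst = dst - src; }
```

Starting from ebx = 10, addl(3, ebx) leaves 13, and a following subl(20, ebx) computes 13 - 20, which wraps to 0xFFFFFFF9 (the 32-bit pattern for -7).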
Arithmetic (computation) instructions
- xorl %eax, %eax means %eax := %eax ^ %eax
  - Which means…
f09.s, f10.s, f11.s
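Since x ^ x is 0 for every bit pattern, xorl %eax, %eax clears %eax. On x86-64 there is a second effect worth modelling: any instruction that writes a 32-bit register such as %eax zero-extends the result into the full 64-bit register (%rax). The sketch below models that rule; write_eax and xorl_eax_eax are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Writing a 32-bit register slice on x86-64 zero-extends the result
// into the full 64-bit register: the upper 32 bits become 0.
void write_eax(std::uint64_t& rax, std::uint32_t value) {
    rax = value;
}

// xorl %eax, %eax  ==>  %eax := %eax ^ %eax
void xorl_eax_eax(std::uint64_t& rax) {
    std::uint32_t eax = static_cast<std::uint32_t>(rax);  // low 32 bits
    write_eax(rax, eax ^ eax);                            // x ^ x == 0
}
```

So after this instruction the whole of %rax is zero, not just its low half.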
Moving into register slices
- mov[SIGN][SRCSIZE][DSTSIZE]
  - SIGN is z (extend with zeros) or s (extend with the sign bit)
  - SRCSIZE/DSTSIZE is b (byte), w (short), l (int), or q (long)
f12.s, f13.s, f14.s, f15.s, f16.s, f17.s
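The same byte can widen to very different values depending on SIGN. These C++ conversions mirror three of the forms (movzbl, movsbl, movslq); the function names simply reuse the mnemonics for illustration and are otherwise hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// movzbl: byte -> int, upper bits filled with zeros
std::uint32_t movzbl(std::uint8_t b) { return b; }

// movsbl: byte -> int, upper bits filled with copies of the sign bit.
// Converting 0xFF to int8_t gives -1: guaranteed since C++20, and what
// GCC and Clang have always done.
std::int32_t movsbl(std::uint8_t b) { return static_cast<std::int8_t>(b); }

// movslq: int -> long, sign-extend 32 bits to 64 bits
std::int64_t movslq(std::int32_t x) { return x; }
```

The byte 0xFF becomes 255 under zero extension but -1 (0xFFFFFFFF) under sign extension, while 0x7F becomes 127 either way because its sign bit is 0.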
More data formats
- (%X,%Y,Z): an array indirect reference
  - Dereference memory at %X + %Y * Z
- Full format: offset(base,index,scale)
  - Dereference memory at offset + base + index * scale
  - offset must be a constant; scale must be 1, 2, 4, or 8
  - Default offset, base, and index are 0; default scale is 1
f18.s
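The address arithmetic of offset(base,index,scale) is easy to check in C++. A minimal sketch, with a hypothetical helper effective_address: for an int array a, the element a[i] typically compiles to an operand like (%rdi,%rsi,4), where %rdi holds a, %rsi holds i, and 4 is sizeof(int).

```cpp
#include <cassert>
#include <cstdint>

// Effective address of offset(base,index,scale):
//   offset + base + index * scale   (computed modulo 2^64)
std::uint64_t effective_address(std::int64_t offset, std::uint64_t base,
                                std::uint64_t index, std::uint64_t scale) {
    // The hardware encoding only allows these four scale factors.
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return static_cast<std::uint64_t>(offset) + base + index * scale;
}
```

For instance, 8(%rbx,%rcx,4) with %rbx = 1000 and %rcx = 3 refers to address 8 + 1000 + 3 * 4 = 1020.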
The lea instruction
- lea stands for Load Effective Address
  - It performs an address computation, but does not dereference memory
- Often used by the compiler as a parsimonious alternative to a sequence of arithmetic instructions
  - leal (%rdi,%rsi,8), %eax computes the same value as movl %esi, %eax; shll $3, %eax; addl %edi, %eax
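The equivalence can be checked numerically. This sketch models both sequences in C++ with hypothetical function names; it leaves out one real difference: lea does not modify the condition flags, while shll and addl do.

```cpp
#include <cassert>
#include <cstdint>

// leal (%rdi,%rsi,8), %eax: compute %rdi + %rsi * 8 without touching
// memory, then keep the low 32 bits in %eax.
std::uint32_t via_lea(std::uint64_t rdi, std::uint64_t rsi) {
    return static_cast<std::uint32_t>(rdi + rsi * 8);
}

// movl %esi, %eax; shll $3, %eax; addl %edi, %eax
std::uint32_t via_shift_add(std::uint64_t rdi, std::uint64_t rsi) {
    std::uint32_t edi = static_cast<std::uint32_t>(rdi);
    std::uint32_t eax = static_cast<std::uint32_t>(rsi);  // movl %esi, %eax
    eax <<= 3;                                            // shll $3, %eax
    eax += edi;                                           // addl %edi, %eax
    return eax;
}
```

Both give 61 for rdi = 5, rsi = 7, and they still agree when the 64-bit intermediate overflows, because only the low 32 bits survive in either case.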
Calling convention
- Some aspects of machine code are fixed by the processor manufacturer
  - Intel decided 0xc3 is the representation of ret
- Some aspects of machine code are set by agreement among compiler and operating system developers
  - Intel did not decide which register holds return values
  - We call this agreement the calling convention, since it governs function calls
  - Think Geneva Convention, not Comic Convention
- Different conventions can exist for the same processor (e.g., Unix vs. Windows)
- Only code that follows the same convention can safely interact
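As a concrete case of two conventions on one processor, the integer/pointer argument registers differ between the System V x86-64 ABI (used by Linux, macOS, and the BSDs) and the Windows x64 convention, although both return integers in %rax. The table-driven sketch below covers only integer/pointer arguments; the names convention and arg_register are hypothetical.

```cpp
#include <cassert>
#include <string_view>

enum class convention { sysv, windows };

// Register holding integer argument number i (0-based), or an empty
// string_view if that argument is passed on the stack instead.
std::string_view arg_register(convention c, int i) {
    static constexpr std::string_view sysv[] = {"%rdi", "%rsi", "%rdx",
                                                "%rcx", "%r8",  "%r9"};
    static constexpr std::string_view windows[] = {"%rcx", "%rdx", "%r8", "%r9"};
    if (c == convention::sysv) {
        return (i >= 0 && i < 6) ? sysv[i] : std::string_view{};
    }
    return (i >= 0 && i < 4) ? windows[i] : std::string_view{};
}
```

A function compiled for one convention and called using the other would read its first argument from the wrong register, which is why mixing conventions is unsafe.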
Elements of a calling convention
- Function arguments
- Function return values
- Local variable storage
- Stack alignment
- Memory and processor state when the program begins
Let’s explore: cc01.cc–cc03.cc
Arguments
- Argument registers are %rdi, %rsi, %rdx, %rcx, %r8, %r9, in that order
  - Arguments beyond the sixth are passed on the stack
- Large objects are passed in up to 2 registers if they fit (at most 16 bytes), on the stack otherwise
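The following hypothetical functions exercise each of these rules. The comments describe where the System V x86-64 convention places each argument; compiling with -O1 -S and reading the .s file is a good way to confirm this on a particular toolchain.

```cpp
#include <cassert>

struct pair16 { long a; long b; };            // 16 bytes: fits in 2 registers
struct triple24 { long a; long b; long c; };  // 24 bytes: too big, passed on the stack

// a1..a6 arrive in %rdi, %rsi, %rdx, %rcx, %r8, %r9; a7 and a8 on the stack
long eight_args(long a1, long a2, long a3, long a4,
                long a5, long a6, long a7, long a8) {
    return a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
}

// p.a arrives in %rdi and p.b in %rsi
long sum_pair(pair16 p) { return p.a + p.b; }

// t is copied onto the stack; no argument register is used for it
long sum_triple(triple24 t) { return t.a + t.b + t.c; }
```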
Return values
- Return register is %rax
- Large objects are returned in %rax + %rdx if they fit (at most 16 bytes); otherwise the caller passes, as a hidden first argument (in %rdi), a pointer to space for the return value
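Return values follow the same size split. In this sketch (hypothetical struct and function names, System V x86-64 assumed), the 16-byte struct comes back in registers, while the 24-byte one is written through the hidden pointer the caller supplies.

```cpp
#include <cassert>

struct pair16 { long a; long b; };
struct triple24 { long a; long b; long c; };

// Returned in registers: r.a in %rax, r.b in %rdx
pair16 make_pair16(long a, long b) { return pair16{a, b}; }

// Too big for %rax + %rdx: the caller reserves space and passes its address
// as a hidden first argument in %rdi, so a, b, c shift to %rsi, %rdx, %rcx.
// The callee also returns that address in %rax.
triple24 make_triple24(long a, long b, long c) { return triple24{a, b, c}; }
```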