Assembly 2
Compilers and linkers
The process of changing C code into executable instructions has multiple stages, and involves many file formats (some with subtle differences) and many interesting programs working together.
Overall, the process works like this:
Input file | | Translator | | Output file | Purpose
---|---|---|---|---|---
`.c` | → | Compiler | → | `.s` | Generate optimized instructions for functions
`.s` | → | Assembler | → | `.o` | Turn instructions into bytes and relocation entries
`.o` (+ `.a`, `.so`) | → | Linker | → | executable file (no suffix on Unix, `.exe` on Windows) | Combine functions, resolve addresses
executable | → | Loader | → | running program | Load program into memory
In more detail:
- `.c`: C source files (and `.h`, headers). C source files define the meaning of the program, and are written in terms of the C abstract machine. Each source file is processed by the…
- Compiler: The compiler translates C source code into assembly code, which is a textual format for machine instructions. The compiler must understand the C abstract machine and the semantics of the target architecture (for us, x86-64). However, it does not (necessarily) need to understand exactly how machine instructions are represented in terms of bytes: that is the job of the assembler. Compilers are rather epic programs; they are where most optimization takes place.
- `.s`: The compiler’s output is an assembly file, with a name like `*.s`.
- Assembler: The assembler translates assembly code into instructions, which it places in object files. Assemblers are generally very simple translators (although one could imagine an optimizing assembler).
- `.o`: The assembler’s output is an object file, with a name like `*.o`. Object files store many kinds of data, including instructions, read-only and read/write global variables (i.e., static-lifetime data), and debugging information. (CS:APP3e §7.3 describes object files in more depth.) However, the object files are not yet complete; for example, instructions in object files have not been assigned final addresses. So object file instructions often contain some “holes” (places that will eventually receive addresses). Another region of the object file contains relocation entries (CS:APP3e §7.7) that explain how the holes should be filled.
- `.a`, `.so`: Object files can be collected into units called library files, with names like `*.a` or `*.so`. The most basic library is the C library, also called `libc`, which is automatically integrated into every C program.
- Linker: The linker collects a number of object files and libraries and combines them into an executable (also known as an executable object file), which contains actual instructions that run on the machine. The linker collects all the object files and libraries together. It assigns addresses for all static-lifetime objects and code. (This cannot be done in advance because the compiler, which processes source files one at a time, doesn’t know what other objects or libraries will be added.) It processes relocation entries in the object files and libraries and puts these computed addresses in all the appropriate places. Then it emits the final executable. Linkers used to be pretty simple, but modern linkers also perform some optimizations that are only possible when all object files are available.
- Executable: On Windows, executables generally have an `.exe` suffix, but on Unix-derived systems, executables have no suffix. An executable is a file, not a running program. The file format of an executable is designed for fast load, since we generally want programs to start running quickly.
- Loader: Finally, the loader is a part of the operating system that turns the executable into a running program, which we call a process. The loader takes the instructions and copies them into memory. In some cases, such as shared libraries, the loader performs some tasks that resemble linking.
Often intermediate stages in this process are hidden. For example, the compiler usually runs the assembler automatically; to generate an assembly file rather than an object file, you pass the compiler the `-S` option.
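To make the stages concrete, here is a minimal sketch of a single translation unit. The file name `counter.c` and the names `shared_total`, `calls`, `add_to_total`, and `call_count` are hypothetical, chosen only for illustration, and the commands in the comment assume a Unix-like system whose compiler driver is invoked as `cc`.

```c
/* counter.c: a hypothetical translation unit (all names are illustrative).
 *
 * Assuming a Unix-like compiler driver named cc:
 *   cc -O2 -S counter.c    stop after compiling; writes counter.s
 *   cc -O2 -c counter.c    compile and assemble; writes counter.o
 * Linking counter.o into an executable would also require an object
 * file that defines main.
 */

/* Static-lifetime data: the linker, not the compiler, picks its final address. */
int shared_total = 0;

/* Also static-lifetime, but visible only inside this file. */
static int calls = 0;

/* The accesses to shared_total and calls need addresses that are not
 * known until link time, so the object file is expected to carry
 * relocation entries for them. */
int add_to_total(int n) {
    ++calls;
    shared_total += n;
    return shared_total;
}

/* Reports how many times add_to_total has run. */
int call_count(void) {
    return calls;
}
```

If the definition of `shared_total` lived in a different source file and this file only declared it `extern`, the object file would record it as an undefined symbol for the linker to resolve.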
Examining objects
We can distinguish different kinds of assembly by looking at their formats. For example, here is a file `x.c`:
#include <stdio.h>
int my_global = 2;
int main(void) {
return my_global;
}
Here’s a part of the corresponding `x.s` file, generated by the compiler (`clang -O2 -S x.c`):
main: # @main
.cfi_startproc
# BB#0:
movl my_global(%rip), %eax
retq
Note that this file has no addresses at all: the assembler will assign temporary addresses and generate relocation entries.
We can assemble the file into an object file (`clang -O2 -c x.s`) and then disassemble it to examine the corresponding instructions. Here’s what `objdump -d` reports:
0000000000000000 <main>:
0: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 6 <main+0x6>
6: c3 retq
Now there are addresses associated with each instruction (e.g., `0:` and `6:`), but those addresses are placeholders: when we actually create an executable, `main` will not be located at address 0. Also, note that the address of `my_global` has totally disappeared. The `mov` instruction accesses address `0x0(%rip)`. The `0x0` is a placeholder that will be filled in later by the linker. We can examine the corresponding relocation entries with `objdump -r`, which reports:
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000002 R_X86_64_PC32 my_global-0x0000000000000004
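This tells the linker that the result of computing `my_global - 0x4 - P`, where `P` is the final address of the location being patched, should be stored at offset 2 into this object file’s text section; that is, it should be placed into the last four bytes of the `mov` instruction. (While `main` sits at its placeholder address 0, `P` is 2.)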
We can then run the linker (`clang -O2 x.o -o x`). Here’s part of what `objdump -d` reports for the resulting executable `x`:
00000000004004e0 <main>:
4004e0: 8b 05 4a 0b 20 00 mov 0x200b4a(%rip),%eax # 601030 <my_global>
4004e6: c3 retq
A final address has been assigned to `main` and another has been assigned to `my_global`. The instructions have been moved to the correct place, so we see their correct addresses. And the correct offset has been inserted into the instruction stream so that the `mov` instruction refers to the address of `my_global`.
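The arithmetic behind that offset can be checked by hand. The following is a minimal sketch, not the linker’s actual code: the helper names `pc32_value` and `rip_target` are hypothetical, and the calculation assumes the usual `R_X86_64_PC32` rule of symbol address plus addend minus the address of the patched field.

```c
#include <stdint.h>

/* Value stored for an R_X86_64_PC32 relocation:
 * symbol address + addend - address of the patched field,
 * truncated to the 32 bits that fit in the instruction. */
uint32_t pc32_value(uint64_t symbol, int64_t addend, uint64_t place) {
    return (uint32_t)(symbol + (uint64_t)addend - place);
}

/* Address that a %rip-relative operand refers to: %rip holds the
 * address of the next instruction, and the 32-bit displacement is
 * sign-extended before it is added. */
uint64_t rip_target(uint64_t next_instruction, uint32_t displacement) {
    return next_instruction + (uint64_t)(int64_t)(int32_t)displacement;
}
```

With the values from the listings above, the patched field sits at `0x4004e0 + 2 = 0x4004e2`, so `pc32_value(0x601030, -4, 0x4004e2)` gives `0x200b4a`, the displacement in the linked `mov`. Going the other way, `rip_target(0x4004e6, 0x200b4a)` gives `0x601030`, the address of `my_global`. The addend of -4 accounts for the four displacement bytes that lie between the patched field and the next instruction.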
Calling convention
The calling convention is a set of rules that defines how functions interact. You can see the calling convention as a set of constraints. Different operating systems and architectures have different calling conventions, but all compilers for the same OS generally agree on a calling convention (this allows their object files to work together).
Every function must obey the calling convention. But how it obeys the calling convention can differ, depending on the function.
The basic conventions (for x86-64 Linux):
- Return address: At function entry, the stack pointer `%rsp` points at the function’s return address.
- Stack alignment: At function entry, the stack pointer must equal a multiple of 16 plus 8. That is, it must be 8 bytes off of 16-byte alignment. (Since the `callq` instruction modifies the stack by pushing the return address, this means that when `callq` is executed, `%rsp` must be truly 16-byte aligned.)
- Parameters: At function entry, the function’s first 6 integer and pointer arguments are passed in registers `%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, and `%r9`, in that order. 4-byte and smaller values use the lower 32 bits of the corresponding registers; the upper 32 bits can be anything and must be ignored. The 7th and further parameters are stored in the stack, starting immediately after the return address, and with each parameter size rounded up to at least 8 bytes. Thus, in a function with 8 integer parameters, the 7th and 8th parameters are stored at `8(initial-%rsp)` and `16(initial-%rsp)`, respectively.
- Return value: At function exit, the function’s return value is in the `%rax` register. (4-byte and smaller values use the lower 32 bits, `%eax`; the upper 32 bits are ignored.)
- Callee-saved registers: At function exit, the following registers must have the same values as they did at function entry: `%rsp`, `%rbp`, `%rbx`, `%r12`, `%r13`, `%r14`, `%r15`. The other registers can be arbitrarily modified.
- Caller-saved registers: At function exit, caller-saved registers are not required to have the same values as they did at function entry. These registers are used to hold temporary values. If the caller wishes to preserve these values, they must be pushed onto the stack.
- Stack usage: The function may add space to the stack for its own use; the initial `%rsp` marks a boundary between available space, which has smaller addresses than the initial `%rsp`, and space reserved for the caller, which has larger addresses than the initial `%rsp`. The function must not access or modify caller-reserved space (larger addresses than the initial `%rsp`), with two exceptions: A function may access or modify its stack parameters (as when it has more than 6 arguments), and it may access or modify objects whose addresses are publicly visible (as when its caller passes it a pointer to a local variable). The function may reserve additional space by changing the current `%rsp` (e.g., by executing a `push` or `subq $56, %rsp`), and it may use as scratch space the 128 bytes just below the current `%rsp`, at smaller addresses (e.g., by storing a temporary at `-8(%rsp)`).
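As an illustration of the parameter and return-value rules, here is a minimal sketch with two hypothetical functions, `sum8` and `add_ints`. The comments describe where the convention places each argument when the function is actually called; an optimizing compiler may inline a call, in which case no argument passing happens at all.

```c
/* Hypothetical function with eight integer parameters.
 * Under the x86-64 Linux convention, at function entry:
 *   a: %rdi   b: %rsi   c: %rdx   d: %rcx   e: %r8   f: %r9
 *   g: 8(%rsp)    (just after the return address)
 *   h: 16(%rsp)
 * The result is returned in %rax. */
long sum8(long a, long b, long c, long d,
          long e, long f, long g, long h) {
    return a + b + c + d + e + f + g + h;
}

/* int arguments occupy only the low 32 bits of their registers
 * (%edi and %esi here), and the int result is returned in %eax. */
int add_ints(int x, int y) {
    return x + y;
}
```

Compiling a file like this with `-S` and reading the resulting assembly is one way to check the register assignments listed in the comments.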
These conventions have some consequences. For example, if a function may modify `%rbp`, it will save the initial `%rbp` at function entry and restore it at function exit, often with instructions like:
pushq %rbp
...
popq %rbp
retq
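The push in that sequence also interacts with the stack alignment rule. The sketch below models `%rsp` as a plain integer to show the arithmetic; the helper names and the sample address are hypothetical, and nothing here touches the real stack.

```c
#include <stdint.h>

/* At function entry, %rsp must be a multiple of 16 plus 8. */
int valid_entry_rsp(uint64_t rsp) {
    return rsp % 16 == 8;
}

/* When callq executes, %rsp must be a multiple of 16. */
int valid_call_rsp(uint64_t rsp) {
    return rsp % 16 == 0;
}

/* pushq moves %rsp down by 8 bytes before storing the value. */
uint64_t rsp_after_pushq(uint64_t rsp) {
    return rsp - 8;
}

/* subq $n, %rsp reserves n more bytes of stack. */
uint64_t rsp_after_subq(uint64_t rsp, uint64_t n) {
    return rsp - n;
}
```

Starting from a sample entry value such as `0x7fffffffe008`, a single `pushq %rbp` leaves `%rsp` at `0x7fffffffe000`, which is 16-byte aligned and therefore suitable for a further `callq`. Reserving 16 more bytes keeps that alignment, while reserving only 8 would break it.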
A large function might run through the following stages.
- At entry, the function will `pushq %rbp`.
- Then it will `push` any other callee-saved registers it uses.
- Then it will allocate any additional required stack space with `subq $N, %rsp`.
- Inside the function, local variables are referenced with names such as `8(%rsp)`. The positive offset is because `%rsp` points at the top of the stack, so it has the smallest address. However, simple functions may use scratch space for local variables (e.g., `-8(%rsp)`).
- At exit, the function will un-allocate its stack space with `addq $N, %rsp`.
- Then it will `pop` any callee-saved registers pushed earlier, in reverse order.
- Then it will `popq %rbp`. At this point the initial `%rsp` has been restored.
- Then it will execute `retq`, which returns.
But these stages aren’t strictly required; only the conventions are required. So if a function doesn’t call another function, or doesn’t have any local variables, it may not execute `subq $N, %rsp`. If a function doesn’t modify `%rbp`, it may not push the original `%rbp`. And so forth.
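The contrast can be seen in C source. The following sketch uses hypothetical functions; the comments describe what a compiler may do, not what any particular compiler is guaranteed to emit, and the result depends heavily on optimization level.

```c
#include <stddef.h>

/* Leaf function with no addressable locals: a compiler may keep
 * everything in registers and emit no pushq or subq at all. */
long twice(long x) {
    return x + x;
}

/* Fills buf[0..n) with twice(0), twice(1), and so on. */
void fill_doubles(long *buf, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        buf[i] = twice((long)i);
    }
}

/* Non-leaf function with a local array whose address is passed to
 * another function: the array must live in memory, so a compiler will
 * typically reserve stack space for it (for example with subq) and
 * release that space before retq. */
long sum_of_doubles(void) {
    long values[8];
    fill_doubles(values, 8);
    long total = 0;
    for (size_t i = 0; i < 8; ++i) {
        total += values[i];
    }
    return total;
}
```

Generating assembly for a file like this at different optimization levels shows how much of the prologue and epilogue survives in each case.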
The full conventions go into far more detail, and explain how objects such as large structures are passed or returned. (Briefly, small structures, such as `struct point { int x, y; }`, are passed in one or more registers; large structures are passed on the stack.)
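A minimal sketch of both cases follows. `struct point` comes from the text; `struct big` and the two functions are hypothetical, and the comments assume the x86-64 Linux convention described above.

```c
/* Small structure from the text: 8 bytes in total, so a by-value
 * argument fits in a single integer register. */
struct point { int x, y; };

int point_sum(struct point p) {
    return p.x + p.y;
}

/* Hypothetical large structure: 64 bytes, too big for registers, so
 * the caller copies a by-value argument onto the stack. */
struct big { long v[8]; };

long big_sum(struct big b) {
    long total = 0;
    for (int i = 0; i < 8; ++i) {
        total += b.v[i];
    }
    return total;
}
```

Because both functions take their arguments by value, changes made to `p` or `b` inside them would not be visible to the caller, regardless of whether the copy travels in a register or on the stack.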