Assembly 2

Compilers and linkers

The process of changing C code into executable instructions has multiple stages, and involves many file formats (some with subtle differences) and many interesting programs working together.

Overall, the process works like this:

Input file		Translator		Output file	Purpose
`.c`	→	Compiler	→	`.s`	Generate optimized instructions for functions
`.s`	→	Assembler	→	`.o`	Turn instructions into bytes and relocation entries
`.o` (+ `.a`, `.so`)	→	Linker	→	executable file (no suffix on unix, .exe on windows)	Combine functions, resolve addresses
executable	→	Loader	→	running program	Load program into memory

In more detail:

.c: C source files (and .h, headers). C source files define the meaning of the program, and are written in terms of the C abstract machine. Each source file is processed by the…
Compiler. The compiler translates C source code into assembly code, which is a textual format for machine instructions. The compiler must understand the C abstract machine and the semantics of the target architecture (for us, x86-64). However, it does not (necessarily) need to understand exactly how machine instructions are represented in terms of bytes: that is the job of the assembler. Compilers are rather epic programs; they are where most optimization takes place.
.s: The compiler’s output is an assembly file, with a name like \*.s.
Assembler. The assembler translates assembly code into instructions, which it places in object files. Assemblers are generally very simple translators (although one could imagine an optimizing assembler).
.o: The assembler’s output is an object file, with a name like \*.o. Object files store many kinds of data, including instructions, read-only and read/write global variables (i.e., static-lifetime data), and debugging information. (CS:APP3e §7.3 describes object files in more depth.) However, the object files are not yet complete; for example, instructions in object files have not been assigned final addresses. So object file instructions often contain some “holes”—places that will eventually receive addresses. Another region of the object file contains relocation entries (CS:APP3e §7.7) that explain how the holes should be filled.
.a, .so: Object files can be collected into units called library files, with names like \*.a or \*.so. The most basic library is the C library, also called libc, which is automatically integrated into every C program.
Linker: The linker collects a number of object files and libraries and combines them into an executable (also known as an executable object file), which contains actual instructions that run on the machine. It assigns addresses for all static-lifetime objects and code. (This cannot be done in advance because the compiler, which processes source files one at a time, doesn’t know what other objects or libraries will be added.) It then processes the relocation entries in the object files and libraries and puts the computed addresses in all the appropriate places. Finally, it emits the executable. Linkers used to be pretty simple, but modern linkers also perform some optimizations that are only possible when all object files are available.
Executable: On Windows, executables generally have an .exe suffix, but on Unix-derived systems, executables have no suffix. An executable is a file, not a running program. The file format of an executable is designed for fast load, since we generally want programs to start running quickly.
Loader: Finally, the loader is a part of the operating system that turns the executable into a running program, which we call a process. The loader takes the instructions and copies them into memory. In some cases, such as shared libraries, the loader performs some tasks that resemble linking.

Often intermediate stages in this process are hidden. For example, the compiler usually runs the assembler automatically; to generate an assembly file rather than an object file, you pass the compiler the -S option.

Examining objects

We can distinguish different kinds of assembly by looking at their formats. For example, here is a file x.c:

#include <stdio.h>
int my_global = 2;
int main(void) {
    return my_global;
}

Here’s a part of the corresponding x.s file, generated by the compiler (clang -O2 -S x.c):

main:                                   # @main
        .cfi_startproc
# BB#0:
        movl    my_global(%rip), %eax
        retq

Note that this file has no addresses at all: the assembler will assign temporary addresses and generate relocation entries.

We can assemble the file into an object file (clang -O2 -c x.s) and then disassemble it to examine the corresponding instructions. Here’s what objdump -d reports:

0000000000000000 <main>:
   0:  8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 6 <main+0x6>
   6:  c3                      retq

Now there are addresses associated with each instruction (e.g., 0: and 6:), but those addresses are placeholders—when we actually create an executable, main will not be located at address 0. Also, note that the address of my_global has totally disappeared. The mov instruction accesses address 0x0(%rip). The 0x0 is a placeholder that will be filled in later by the linker. We can examine the corresponding relocation entries with objdump -r, which reports:

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000002 R_X86_64_PC32     my_global-0x0000000000000004

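This tells the linker that the result of computing my_global - 0x4 - P, where P is the final address of the hole itself (the location at offset 2 in this file's text section), should be stored at offset 2 into the text section; that is, it should be placed into the last four bytes of the mov instruction. The -0x4 compensates for the fact that %rip-relative offsets are measured from the end of the instruction, which here is 4 bytes past the start of the hole.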

We can then run the linker (clang -O2 x.o -o x). Here’s part of what objdump -d reports for the resulting executable x:

00000000004004e0 <main>:
  4004e0:  8b 05 4a 0b 20 00       mov    0x200b4a(%rip),%eax        # 601030 <my_global>
  4004e6:  c3                      retq

A final address has been assigned to main and another has been assigned to my_global. The instructions have been moved to the correct place, so we see their correct addresses. And the correct offset has been inserted into the instruction stream so that the mov instruction refers to the address of my_global.
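The linker's arithmetic for this example can be reproduced by hand. The sketch below assumes the usual rule for an R_X86_64_PC32 relocation: the stored value is the symbol's address plus the addend minus the address of the hole. The function names pc32_value and patch_hole are hypothetical helpers written for illustration, not part of any linker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value stored for an R_X86_64_PC32 relocation: S + A - P, where S is the
   symbol's final address, A is the addend from the relocation entry, and
   P is the final address of the hole being filled. */
uint32_t pc32_value(uint64_t sym_addr, int64_t addend, uint64_t hole_addr) {
    return (uint32_t) (sym_addr + (uint64_t) addend - hole_addr);
}

/* Store a 32-bit value into `code` at `offset`, least significant byte
   first (x86-64 is little-endian). */
void patch_hole(unsigned char* code, size_t offset, uint32_t value) {
    assert(code != NULL);
    for (int i = 0; i < 4; ++i) {
        code[offset + i] = (unsigned char) (value >> (8 * i));
    }
}
```

With the addresses from the listings above (my_global at 0x601030, main at 0x4004e0, so the hole is at 0x4004e2, and an addend of -4), pc32_value yields 0x200b4a. Patching that into the object file's bytes 8b 05 00 00 00 00 at offset 2 gives 8b 05 4a 0b 20 00, which matches the executable's disassembly.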

Calling convention

The calling convention is a set of rules that defines how functions interact. You can see the calling convention as a set of constraints. Different operating systems and architectures have different calling conventions, but all compilers for the same OS generally agree on a calling convention (this allows their object files to work together).

Every function must obey the calling convention. But how it obeys the calling convention can differ, depending on the function.

The basic conventions (for x86-64 Linux):

Return address: At function entry, the stack pointer %rsp points at the function’s return address.
Stack alignment: At function entry, the stack pointer must equal a multiple of 16 plus 8. That is, it must be 8 bytes off of 16-byte alignment. (Since the callq instruction modifies the stack by pushing the return address, this means that when callq is executed, %rsp must be truly 16-byte aligned.)
Parameters: At function entry, the function’s first 6 integer and pointer arguments are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9, in that order. 4-byte and smaller values use the lower 32 bits of the corresponding registers; the upper 32 bits can be anything and must be ignored. The 7th and further parameters are stored on the stack, starting immediately after the return address, and with each parameter size rounded up to at least 8 bytes. Thus, in a function with 8 integer parameters, the 7th and 8th parameters are stored at 8(initial-%rsp) and 16(initial-%rsp), respectively.
Return value: At function exit, the function’s return value is in the %rax register. (4-byte and smaller values use the lower 32 bits, %eax; the upper 32 bits are ignored.)
Callee-saved registers: At function exit, the following registers must have the same values as they did at function entry: %rsp, %rbp, %rbx, %r12, %r13, %r14, %r15. The other registers can be arbitrarily modified.
Caller-saved registers: At function exit, caller-saved registers are not required to have the same values as they did at function entry. These registers are used to hold temporary values. If the caller wishes to preserve these values across a call, the caller must save them itself, for example by pushing them onto the stack before the call.
Stack usage: The function may add space to the stack for its own use; the initial %rsp marks a boundary between available space, which has smaller addresses than initial %rsp, and space reserved for the caller, which has larger addresses than the initial %rsp. The function must not access or modify caller-reserved space (larger addresses than the initial %rsp), with two exceptions: A function may access or modify its stack parameters (as when it has more than 6 arguments), and it may access or modify objects whose addresses are publicly visible (as when its caller passes it a pointer to a local variable). The function may reserve additional space by changing the current %rsp (e.g., by executing a push or subq \$56, %rsp), and it may use as scratch space the 128 bytes just below the current %rsp, at smaller addresses (e.g., by storing a temporary at -8(%rsp)).
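The parameter and alignment rules above can be written down as small lookup functions. This is a simplified sketch that assumes every argument is an integer or pointer of at most 8 bytes (no floating-point or structure arguments); the names arg_register, stack_arg_offset, and entry_rsp_ok are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Registers that carry the first six integer or pointer arguments, in order. */
static const char* const arg_registers[6] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};

/* Return the register holding 1-based argument `n`, or NULL if that
   argument is passed on the stack instead. */
const char* arg_register(int n) {
    assert(n >= 1);
    return n <= 6 ? arg_registers[n - 1] : NULL;
}

/* Return the byte offset from the entry %rsp of 1-based argument `n`
   (n >= 7). Offset 0 holds the return address, and each stack argument
   occupies 8 bytes under this sketch's assumptions. */
long stack_arg_offset(int n) {
    assert(n >= 7);
    return 8L * (n - 6);
}

/* Return nonzero if `rsp` is a legal stack pointer at function entry:
   a multiple of 16 plus 8. */
int entry_rsp_ok(unsigned long rsp) {
    return rsp % 16 == 8;
}
```

For a function with 8 integer parameters, stack_arg_offset(7) and stack_arg_offset(8) give 8 and 16, the same offsets as 8(initial-%rsp) and 16(initial-%rsp) in the text.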

These conventions have some consequences. For example, if a function may modify %rbp, it will save the initial %rbp at function entry and restore it at function exit, often with instructions like:

pushq %rbp
...
popq %rbp
retq

A large function might run through the following stages.

At entry, the function will pushq %rbp.
Then it will push any other callee-saved registers it uses.
Then it will allocate any additional required stack space with subq \$N, %rsp.
Inside the function, local variables are referenced with names such as 8(%rsp). The positive offset is because %rsp points at the top of the stack, so it has the smallest address. However, simple functions may use scratch space for local variables (e.g., -8(%rsp)).
At exit, the function will un-allocate its stack space with addq \$N, %rsp.
Then it will pop any callee-saved registers pushed earlier, in reverse order.
Then it will popq %rbp. At this point the initial %rsp has been restored.
Then it will execute retq, which returns.

But these stages aren’t strictly required; only the conventions are required. So if a function doesn’t call another function, or doesn’t have any local variables, it may not execute subq \$N, %rsp. If a function doesn’t modify %rbp, it may not push the original %rbp. And so forth.
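The alignment rule constrains how much stack space a prologue may reserve when the function goes on to call another function: %rsp must be a multiple of 16 when callq executes. The sketch below models %rsp modulo 16 through a prologue made of pushq instructions followed by subq \$N, %rsp. The function names are hypothetical, and the model ignores any other stack adjustments a real function might make.

```c
#include <assert.h>

/* Return %rsp modulo 16 after a prologue that executes `pushes` pushq
   instructions and then subtracts `local_bytes` from %rsp. At function
   entry %rsp is 8 modulo 16, and each pushq subtracts 8. */
unsigned rsp_mod16_after_prologue(unsigned pushes, unsigned long local_bytes) {
    unsigned long moved = 8UL * pushes + local_bytes;
    return (unsigned) ((8 + 16 - moved % 16) % 16);
}

/* Return the smallest N >= `needed` such that, after `pushes` pushq
   instructions and subq $N, %rsp is 16-byte aligned and the function
   can legally execute callq. */
unsigned long aligned_local_bytes(unsigned pushes, unsigned long needed) {
    unsigned long n = needed;
    while (rsp_mod16_after_prologue(pushes, n) != 0) {
        ++n;   /* at most 15 iterations */
    }
    return n;
}
```

Under this model, a function that pushes nothing and needs 50 bytes of locals would reserve 56 bytes, while one that first executes pushq %rbp would reserve 64. A function that makes no calls is free to skip this padding.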

The full conventions go into far more detail, and explain how objects such as large structures are passed or returned. (Briefly, small structures, such as struct point { int x, y; }, are passed in one or more registers; large structures are passed on the stack.)
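As an illustration, the two cases look identical in C source; only the generated assembly differs. In this sketch, struct point comes from the text, while struct big, point_add, and big_sum are hypothetical names introduced for the example. Compiling with -S should show, on x86-64 Linux, the point arguments arriving in registers and the larger structure being read from the stack, though the exact instructions depend on the compiler and optimization level.

```c
#include <assert.h>

struct point { int x, y; };

/* struct point is 8 bytes, small enough for the convention to carry each
   argument, and the return value, in a register. */
struct point point_add(struct point a, struct point b) {
    struct point r = { a.x + b.x, a.y + b.y };
    return r;
}

/* Hypothetical 64-byte structure: too large for registers, so a by-value
   argument of this type is copied onto the stack by the caller. */
struct big { long v[8]; };

long big_sum(struct big b) {
    long sum = 0;
    for (int i = 0; i < 8; ++i) {
        sum += b.v[i];
    }
    return sum;
}
```

Either way, the callee receives its own copy of the structure, so changes it makes to a or b are not visible to the caller.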
