
Lecture 2 thoughts

Very large numbers

The largest unsigned number representable directly on a machine with $2^{31}$ 8-bit bytes (that is, $8 \times 2^{31} = 2^{34}$ bits) is $(2^8)^{2^{31}} - 1 = 2^{8 \times 2^{31}} - 1 = 2^{2^{34}} - 1$. That's an enormous number, far larger than the current estimate of the number of subatomic particles in the observable universe, which is roughly $2^{266}$ (less than $2^{2^9} = 2^{512}$).

Of course, far larger numbers can be represented indirectly. Here's one, in four Unicode characters (8 bytes in UTF-8): 2↑↑6 ¹

Code as data

What does 8b 44 24 08 03 44 24 04 c3 (the bytes generated by the compiler for int sum(int a, int b) { return a + b; }) mean?

We can find this out in many ways. Here are a few.

objdump

The objdump utility is a Swiss Army knife for examining binaries. It has way too many options (as you can tell when I give the wrong option live), but here are a few useful ones.

-s: This flag prints the full contents of every section of an object file or executable.

An executable, also called a binary, is a program: the final output of the compilation process. The operating system can run an executable. An object file is intermediate output of the compiler. You can't run an object file on its own; it must first be linked with other object files to form an executable. More on all this soon.

Here's the output of objdump -s sumfunction.o:

kohler@ubuntu:~/cs61-lectures/l02$ objdump -s sumfunction.o

sumfunction.o:     file format elf32-i386

Contents of section .text:
 0000 8b442408 03442404 c3                 .D$..D$..
Contents of section .debug_info:
 0000 5e000000 02000000 00000401 00000000  ^...............
 0010 010c0000 001a0000 00000000 00090000  ................
 0020 00000000 00020173 756d0001 01015a00  .......sum....Z.
 0030 00000000 00000900 00000274 045a0000  ...........t.Z..
 0040 00036100 01015a00 00000291 00036200  ..a...Z.......b.
 0050 01015a00 00000291 04000404 05696e74  ..Z..........int
 0060 0000                                 ..
Contents of section .debug_abbrev:
 0000 01110125 0e130b03 0e1b0e11 01120110  ...%............
 0010 06000002 2e013f0c 03083a0b 3b0b270c  ......?...:.;.'.
 0020 49131101 1201400a 01130000 03050003  I.....@.........
 0030 083a0b3b 0b491302 0a000004 24000b0b  .:.;.I......$...
 0040 3e0b0308 000000                      >......
Contents of section .debug_aranges:
 0000 1c000000 02000000 00000400 00000000  ................
 0010 00000000 09000000 00000000 00000000  ................
Contents of section .debug_line:
 0000 3a000000 02002400 00000101 fb0e0d00  :.....$.........
 0010 01010101 00000001 00000100 73756d66  ............sumf
 0020 756e6374 696f6e2e 63000000 00000005  unction.c.......
 0030 02000000 0001014b 4b020100 0101      .......KK.....
Contents of section .debug_str:
 0000 474e5520 4320342e 362e3300 73756d66  GNU C 4.6.3.sumf
 0010 756e6374 696f6e2e 63002f68 6f6d652f  unction.c./home/
 0020 6b6f686c 65722f63 7336312d 6c656374  kohler/cs61-lect
 0030 75726573 2f6c3032 00                 ures/l02.
Contents of section .comment:
 0000 00474343 3a202855 62756e74 752f4c69  .GCC: (Ubuntu/Li
 0010 6e61726f 20342e36 2e332d31 7562756e  naro 4.6.3-1ubun
 0020 74753529 20342e36 2e3300             tu5) 4.6.3.
Contents of section .eh_frame:
 0000 14000000 00000000 017a5200 017c0801  .........zR..|..
 0010 1b0c0404 88010000 10000000 1c000000  ................
 0020 00000000 09000000 00000000           ............

The first column gives hexadecimal byte offsets within the section. The next four columns show 16 bytes' worth of data in hexadecimal, four bytes per column. The final, 16-character-wide column shows the same data as text, with non-printable bytes shown as '.'.

Check out all the debugging information in the .debug_* sections. We can also use -j to ask objdump for output from just one section, such as .text, the section that holds the code.

kohler@ubuntu:~/cs61-lectures/l02$ objdump -s -j .text sumfunction.o

sumfunction.o:     file format elf32-i386

Contents of section .text:
 0000 8b442408 03442404 c3                 .D$..D$..

-d: This flag disassembles the code, printing assembly that could have generated that code. Here's the output of objdump -d sumfunction.o:

kohler@ubuntu:~/cs61-lectures/l02$ objdump -d sumfunction.o

sumfunction.o:     file format elf32-i386


Disassembly of section .text:

00000000 <sum>:
   0:   8b 44 24 08             mov    0x8(%esp),%eax
   4:   03 44 24 04             add    0x4(%esp),%eax
   8:   c3                      ret

-S: This flag is like -d, but also intersperses source lines with the corresponding assembly. It requires the source code to have been compiled with the -g debugging flag. It works best on unoptimized code, and the output can get confusing, but it is still a very useful tool for your toolbox. Here's the output of objdump -S sumfunction.o:

kohler@ubuntu:~/cs61-lectures/l02$ objdump -S sumfunction.o

sumfunction.o:     file format elf32-i386


Disassembly of section .text:

00000000 <sum>:
int sum(int a, int b) {
   0:   8b 44 24 08             mov    0x8(%esp),%eax
    return a + b;
   4:   03 44 24 04             add    0x4(%esp),%eax
}
   8:   c3                      ret

-t: This flag doesn't print data; it prints the symbol table contained in an object file or executable. The symbol table says which functions and global variables are defined and gives their addresses. Here's the output of objdump -t sumfunction.o:

kohler@ubuntu:~/cs61-lectures/l02$ objdump -t sumfunction.o

sumfunction.o:     file format elf32-i386

SYMBOL TABLE:
00000000 l    df *ABS*  00000000 sumfunction.c
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l    d  .debug_info    00000000 .debug_info
00000000 l    d  .debug_abbrev  00000000 .debug_abbrev
00000000 l    d  .debug_aranges 00000000 .debug_aranges
00000000 l    d  .debug_line    00000000 .debug_line
00000000 l    d  .debug_str 00000000 .debug_str
00000000 l    d  .note.GNU-stack    00000000 .note.GNU-stack
00000000 l    d  .eh_frame  00000000 .eh_frame
00000000 l    d  .comment   00000000 .comment
00000000 g     F .text  00000009 sum

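The columns are the symbol's value (for sum, its address), a group of flag characters (l for local, g for global, d for debugging, f for file, F for function), the section, the size, and the name. For example, sum is a global (g) function (F) defined in the .text section, with size 9 bytes (00000009 in hex). But wait a second: sum has an address of 0?! This is because sumfunction.o is an object file. In an object file, a symbol's value is its offset within its section, and sum is the first (and only) thing in .text. The final address is not assigned until the linker generates the executable. Like so: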

kohler@ubuntu:~/cs61-lectures/l02$ objdump -t mysum

mysum:     file format elf32-i386

SYMBOL TABLE:
08048154 l    d  .interp    00000000              .interp
08048168 l    d  .note.ABI-tag  00000000              .note.ABI-tag
08048188 l    d  .note.gnu.build-id 00000000              .note.gnu.build-id
080481ac l    d  .gnu.hash  00000000              .gnu.hash
...blah blah blah...

kohler@ubuntu:~/cs61-lectures/l02$ objdump -t mysum | grep sum
mysum:     file format elf32-i386
00000000 l    df *ABS*  00000000              mysum.c
00000000 l    df *ABS*  00000000              sumfunction.c
080484e0 g     F .text  00000009              sum

There's the true address, 0x080484e0.

nm: This program produces output like that of objdump -t, but more compact and easier to follow. Here's the output of nm mysum:

kohler@ubuntu:~/cs61-lectures/l02$ nm mysum
08049f28 d _DYNAMIC
08049ff4 d _GLOBAL_OFFSET_TABLE_
080485bc R _IO_stdin_used
         w _Jv_RegisterClasses
08049f18 d __CTOR_END__
08049f14 d __CTOR_LIST__
08049f20 D __DTOR_END__
08049f1c d __DTOR_LIST__
080486f0 r __FRAME_END__
08049f24 d __JCR_END__
08049f24 d __JCR_LIST__
0804a018 A __bss_start
0804a010 D __data_start
08048570 t __do_global_ctors_aux
08048450 t __do_global_dtors_aux
0804a014 D __dso_handle
         w __gmon_start__
08048562 T __i686.get_pc_thunk.bx
08049f14 d __init_array_end
08049f14 d __init_array_start
08048560 T __libc_csu_fini
080484f0 T __libc_csu_init
         U __libc_start_main@@GLIBC_2.0
         U __printf_chk@@GLIBC_2.3.4
0804a018 A _edata
0804a020 A _end
0804859c T _fini
080485b8 R _fp_hw
080482f8 T _init
08048420 T _start
0804a018 b completed.6159
0804a010 W data_start
0804a01c b dtor_idx.6161
080484b0 t frame_dummy
08048380 T main
         U strtol@@GLIBC_2.0
080484e0 T sum

Q. WHAT ARE ALL THOSE SYMBOLS?! A. Don't worry if tools like nm produce output you don't fully understand. Skim the output, looking for what you need. For instance, here you can see that this program probably calls strtol: there's a reference to strtol@@GLIBC_2.0. As indeed it does. The U next to it means the symbol is undefined in mysum: the program uses strtol, but the C library defines it. We'll understand more of these symbols as time goes on. Judicious Googling (or Binging) will enlighten you too.

gcc -S

The compiler normally generates object files, but if we give it the -S flag, it will generate textual assembly output instead. Here's the output of gcc -S -O2 sumfunction.c (which GCC places by default in sumfunction.s):

        .file   "sumfunction.c"
        .text
        .p2align 4,,15
        .globl  sum
        .type   sum, @function
sum:
.LFB0:
        .cfi_startproc
        movl    8(%esp), %eax
        addl    4(%esp), %eax
        ret
        .cfi_endproc
.LFE0:
        .size   sum, .-sum
        .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
        .section        .note.GNU-stack,"",@progbits

The output is very sensitive to compilation flags. This output doesn't have debugging information. Add the -g flag and you get a ton more:

        .file   "sumfunction.c"
        .text
.Ltext0:
        .p2align 4,,15
        .globl  sum
        .type   sum, @function
sum:
.LFB0:
        .file 1 "sumfunction.c"
        .loc 1 1 0
        .cfi_startproc
.LVL0:
        .loc 1 1 0
        movl    8(%esp), %eax
        .loc 1 2 0
        addl    4(%esp), %eax
        .loc 1 3 0
        ret
        .cfi_endproc
.LFE0:
        .size   sum, .-sum
.Letext0:
        .section        .debug_info,"",@progbits
.Ldebug_info0:
        .long   0x5e
        .value  0x2
        .long   .Ldebug_abbrev0
        .byte   0x4
        .uleb128 0x1
......... etc. ...........

Leave off -O2 and GCC falls back to its default, -O0 (no optimization), generating worse code with more instructions:

        .file   "sumfunction.c"
        .text
        .globl  sum
        .type   sum, @function
sum:
.LFB0:
        .cfi_startproc
        pushl   %ebp
        .cfi_def_cfa_offset 8
        .cfi_offset 5, -8
        movl    %esp, %ebp
        .cfi_def_cfa_register 5
        movl    12(%ebp), %eax
        movl    8(%ebp), %edx
        addl    %edx, %eax
        popl    %ebp
        .cfi_def_cfa 4, 4
        .cfi_restore 5
        ret
        .cfi_endproc
.LFE0:
        .size   sum, .-sum
        .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
        .section        .note.GNU-stack,"",@progbits

Other disassemblers

Of course, there are lots of other disassemblers out there. Try ODA, the Online Disassembler.

And you could even write your own! (It's not necessarily that hard.)

Notes

¹ Google "Knuth's arrow notation"!