Lecture 2 thoughts
Very large numbers
The largest unsigned number representable directly on a machine with $2^{31}$ 8-bit bytes is $(2^8)^{2^{31}} - 1$, which is the same as $2^{8 \times 2^{31}} - 1 = 2^{2^{34}} - 1$. An enormous number, far larger than the current estimate of the number of subatomic particles in the observable universe (which is roughly $2^{266}$, or less than $2^{2^9}$).
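The exponent arithmetic above can be checked mechanically. Here is a minimal sketch in C; the helper names memory_bits and log2_floor are made up for this illustration and are not part of the lecture code.

```c
#include <assert.h>
#include <stdint.h>

// Total number of bits in a memory of `nbytes` 8-bit bytes.
// (Hypothetical helper, for illustration only.)
uint64_t memory_bits(uint64_t nbytes) {
    assert(nbytes <= UINT64_MAX / 8);  // the multiplication must not overflow
    return nbytes * 8;
}

// Floor of the base-2 logarithm of x; x must be nonzero.
unsigned log2_floor(uint64_t x) {
    assert(x != 0);
    unsigned n = 0;
    while (x >>= 1) {
        ++n;
    }
    return n;
}
```

Calling memory_bits with $2^{31}$ bytes gives $2^{34}$ bits, and log2_floor of that result is 34. An unsigned number stored in $2^{34}$ bits can be as large as $2^{2^{34}} - 1$; the particle estimate needs only about 266 bits.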
Of course far larger numbers can be represented indirectly. Here's one, in four Unicode characters (8 bytes): 2↑↑6 [1]
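To get a feel for how fast the arrow notation grows, here is a small sketch that evaluates 2↑↑n for the few heights that fit in a 64-bit integer. The function name tower_of_twos is an invention for this example.

```c
#include <assert.h>
#include <stdint.h>

// Compute 2↑↑height, a power tower of `height` twos, using the usual
// convention that 2↑↑0 = 1. Only heights 0 through 4 fit in 64 bits:
// 2↑↑4 = 2^16 = 65536, but 2↑↑5 = 2^65536 is far too large.
uint64_t tower_of_twos(unsigned height) {
    assert(height <= 4);
    uint64_t value = 1;
    for (unsigned i = 0; i < height; ++i) {
        value = (uint64_t) 1 << value;  // 2 raised to the previous tower
    }
    return value;
}
```

The values are 2, 4, 16, and 65536 for heights 1 through 4. Continuing by hand, 2↑↑5 is $2^{65536}$ and 2↑↑6 is $2^{2^{65536}}$, so writing 2↑↑6 out in binary would take more than $2^{65536}$ bits, hopelessly beyond the $2^{34}$ bits of memory considered above.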
Code as data
What does 8b 44 24 08 03 44 24 04 c3 (the bytes generated by the compiler for int sum(int a, int b) { return a + b; }) mean?
We can find this out in many ways. Here are a few.
objdump
The objdump utility is a Swiss army knife for binary examination. The program has way too many options (as you can tell when I give the wrong option live). But here are a couple of useful ones.
-s prints the full contents of an executable or object file.
An executable, also called a binary, is a program: the final output of the compiler. The operating system can run an executable. An object file is an intermediate output of the compiler. You can't run an object file on its own; it must first be linked with other object files to form an executable. More on all this soon.
Here's the output of objdump -s sumfunction.o:
kohler@ubuntu:~/cs61-lectures/l02$ objdump -s sumfunction.o
sumfunction.o: file format elf32-i386
Contents of section .text:
0000 8b442408 03442404 c3 .D$..D$..
Contents of section .debug_info:
0000 5e000000 02000000 00000401 00000000 ^...............
0010 010c0000 001a0000 00000000 00090000 ................
0020 00000000 00020173 756d0001 01015a00 .......sum....Z.
0030 00000000 00000900 00000274 045a0000 ...........t.Z..
0040 00036100 01015a00 00000291 00036200 ..a...Z.......b.
0050 01015a00 00000291 04000404 05696e74 ..Z..........int
0060 0000 ..
Contents of section .debug_abbrev:
0000 01110125 0e130b03 0e1b0e11 01120110 ...%............
0010 06000002 2e013f0c 03083a0b 3b0b270c ......?...:.;.'.
0020 49131101 1201400a 01130000 03050003 I.....@.........
0030 083a0b3b 0b491302 0a000004 24000b0b .:.;.I......$...
0040 3e0b0308 000000 >......
Contents of section .debug_aranges:
0000 1c000000 02000000 00000400 00000000 ................
0010 00000000 09000000 00000000 00000000 ................
Contents of section .debug_line:
0000 3a000000 02002400 00000101 fb0e0d00 :.....$.........
0010 01010101 00000001 00000100 73756d66 ............sumf
0020 756e6374 696f6e2e 63000000 00000005 unction.c.......
0030 02000000 0001014b 4b020100 0101 .......KK.....
Contents of section .debug_str:
0000 474e5520 4320342e 362e3300 73756d66 GNU C 4.6.3.sumf
0010 756e6374 696f6e2e 63002f68 6f6d652f unction.c./home/
0020 6b6f686c 65722f63 7336312d 6c656374 kohler/cs61-lect
0030 75726573 2f6c3032 00 ures/l02.
Contents of section .comment:
0000 00474343 3a202855 62756e74 752f4c69 .GCC: (Ubuntu/Li
0010 6e61726f 20342e36 2e332d31 7562756e naro 4.6.3-1ubun
0020 74753529 20342e36 2e3300 tu5) 4.6.3.
Contents of section .eh_frame:
0000 14000000 00000000 017a5200 017c0801 .........zR..|..
0010 1b0c0404 88010000 10000000 1c000000 ................
0020 00000000 09000000 00000000 ............
The first column gives hexadecimal byte offsets. The next four columns
show 16 bytes' worth of data in hexadecimal, four bytes per column. The
final, 16-character-wide column shows the same data as text, with
non-printable characters printed as '.'.
Check out all the debugging information. We can also ask objdump to
generate output only for the code, which is stored in the "section"
named .text.
kohler@ubuntu:~/cs61-lectures/l02$ objdump -s -j .text sumfunction.o
sumfunction.o: file format elf32-i386
Contents of section .text:
0000 8b442408 03442404 c3 .D$..D$..
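The dump format is simple enough to reproduce. The following is a rough sketch of a formatter for one line in the style of objdump -s, assuming the layout described above (offset, four hexadecimal columns, text column). The name format_dump_line and the DUMP_LINE_MAX constant are made up for this example, and the exact padding may differ from what the real objdump prints.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define DUMP_LINE_MAX 80  // comfortably larger than the longest line produced

// Format up to 16 bytes of `data` as one dump line: the offset in hex,
// four columns of four bytes each in hex, then the bytes as text.
// Non-printable bytes appear as '.' in the text column.
void format_dump_line(char out[DUMP_LINE_MAX], unsigned offset,
                      const unsigned char *data, size_t n) {
    if (n > 16) {
        n = 16;  // one line never shows more than 16 bytes
    }
    char *p = out;
    p += sprintf(p, " %04x ", offset);
    for (size_t i = 0; i < 16; ++i) {
        if (i < n) {
            p += sprintf(p, "%02x", data[i]);
        } else {
            p += sprintf(p, "  ");  // pad a short final line
        }
        if (i % 4 == 3) {
            *p++ = ' ';  // column separator after every fourth byte
        }
    }
    *p++ = ' ';
    for (size_t i = 0; i < n; ++i) {
        *p++ = isprint(data[i]) ? (char) data[i] : '.';
    }
    *p = '\0';
}
```

Given the nine bytes of sum at offset 0, the line starts with " 0000 8b442408 03442404 c3" and ends with the text ".D$..D$..": 0x44 is the letter D and 0x24 is the dollar sign, while the other bytes are not printable.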
-d: This flag disassembles the code, printing assembly that could
have generated that code. Here's the output of
objdump -d sumfunction.o:
kohler@ubuntu:~/cs61-lectures/l02$ objdump -d sumfunction.o
sumfunction.o: file format elf32-i386
Disassembly of section .text:
00000000 <sum>:
0: 8b 44 24 08 mov 0x8(%esp),%eax
4: 03 44 24 04 add 0x4(%esp),%eax
8: c3 ret
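To see what those three instructions do, it can help to act them out in C. The sketch below models the stack as a small array, assuming the usual 32-bit x86 calling convention: on entry to sum, the word at 0(%esp) is the return address, 4(%esp) holds a, and 8(%esp) holds b. The function simulate_sum is an illustration written for these notes, not something the compiler or objdump produces.

```c
#include <stdint.h>

// Act out the three instructions of sum on a toy stack.
// stack[i] models the 32-bit word at byte offset 4*i from %esp.
int32_t simulate_sum(int32_t a, int32_t b) {
    uint32_t stack[3];
    stack[0] = 0;             // 0(%esp): return address (dummy value here)
    stack[1] = (uint32_t) a;  // 4(%esp): first argument
    stack[2] = (uint32_t) b;  // 8(%esp): second argument

    uint32_t eax;
    eax = stack[8 / 4];       // mov 0x8(%esp),%eax   load b into %eax
    eax += stack[4 / 4];      // add 0x4(%esp),%eax   add a to %eax
    return (int32_t) eax;     // ret                  result is left in %eax
}
```

For example, simulate_sum(2, 3) returns 5. The arithmetic is done on unsigned 32-bit values so that wraparound behaves the way the hardware add does.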
-S: This flag is like -d, but also intersperses source lines
with the assembly that corresponds. This depends on the source code
having been compiled with the -g debugging flag. It works best on
unoptimized code, and the output can get confusing, but is still a very
useful tool for your toolbox. Here's the output of
objdump -S sumfunction.o:
kohler@ubuntu:~/cs61-lectures/l02$ objdump -S sumfunction.o
sumfunction.o: file format elf32-i386
Disassembly of section .text:
00000000 <sum>:
int sum(int a, int b) {
0: 8b 44 24 08 mov 0x8(%esp),%eax
return a + b;
4: 03 44 24 04 add 0x4(%esp),%eax
}
8: c3 ret
-t: This flag doesn't print data, it just prints the symbol
table contained in an object file or executable. The symbol table says
what functions and global variables were defined and gives their
addresses. Here's the output of objdump -t sumfunction.o:
kohler@ubuntu:~/cs61-lectures/l02$ objdump -t sumfunction.o
sumfunction.o: file format elf32-i386
SYMBOL TABLE:
00000000 l df *ABS* 00000000 sumfunction.c
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 l d .debug_info 00000000 .debug_info
00000000 l d .debug_abbrev 00000000 .debug_abbrev
00000000 l d .debug_aranges 00000000 .debug_aranges
00000000 l d .debug_line 00000000 .debug_line
00000000 l d .debug_str 00000000 .debug_str
00000000 l d .note.GNU-stack 00000000 .note.GNU-stack
00000000 l d .eh_frame 00000000 .eh_frame
00000000 l d .comment 00000000 .comment
00000000 g F .text 00000009 sum
The columns are address, flags, section, size, and name. For
example, sum is defined in the .text section and has size 00000009
(9 bytes). But wait a second, sum has an address of 0!? This is
because sumfunction.o is an object file; the final address is not
assigned until the linker generates the executable. Like so:
kohler@ubuntu:~/cs61-lectures/l02$ objdump -t mysum
mysum: file format elf32-i386
SYMBOL TABLE:
08048154 l d .interp 00000000 .interp
08048168 l d .note.ABI-tag 00000000 .note.ABI-tag
08048188 l d .note.gnu.build-id 00000000 .note.gnu.build-id
080481ac l d .gnu.hash 00000000 .gnu.hash
...blah blah blah...
kohler@ubuntu:~/cs61-lectures/l02$ objdump -t mysum | grep sum
mysum: file format elf32-i386
00000000 l df *ABS* 00000000 mysum.c
00000000 l df *ABS* 00000000 sumfunction.c
080484e0 g F .text 00000009 sum
There's the true address, 0x080484e0.
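If you want to pull these fields out programmatically, the lines are easy to split. Here is a sketch of a parser for one symbol-table line. It assumes whitespace-separated fields, with the address first and the section, size, and name last; whatever sits in between is treated as flag characters. The struct symbol type and the parse_symbol_line function are hypothetical names for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// One parsed line of `objdump -t` output (hypothetical type).
struct symbol {
    unsigned long address;
    char flags[16];     // flag characters with spaces removed, e.g. "gF"
    char section[64];
    unsigned long size;
    char name[128];
};

// Parse a line such as "080484e0 g F .text 00000009 sum".
// Returns 1 on success and 0 if the line does not look like a symbol.
int parse_symbol_line(const char *line, struct symbol *sym) {
    assert(line != NULL && sym != NULL);
    char buf[512];
    if (strlen(line) >= sizeof buf) {
        return 0;
    }
    strcpy(buf, line);

    // Split on whitespace: address, flags..., section, size, name.
    char *tok[8];
    int ntok = 0;
    for (char *t = strtok(buf, " \t\n"); t != NULL; t = strtok(NULL, " \t\n")) {
        if (ntok == 8) {
            return 0;
        }
        tok[ntok++] = t;
    }
    if (ntok < 4) {
        return 0;
    }

    char *end;
    sym->address = strtoul(tok[0], &end, 16);
    if (*end != '\0') {
        return 0;
    }
    sym->size = strtoul(tok[ntok - 2], &end, 16);
    if (*end != '\0') {
        return 0;
    }
    sym->flags[0] = '\0';
    for (int i = 1; i < ntok - 3; ++i) {
        strncat(sym->flags, tok[i], sizeof sym->flags - strlen(sym->flags) - 1);
    }
    snprintf(sym->section, sizeof sym->section, "%s", tok[ntok - 3]);
    snprintf(sym->name, sizeof sym->name, "%s", tok[ntok - 1]);
    return 1;
}
```

Parsing the line for sum in mysum yields address 0x080484e0, flags "gF", section ".text", size 9, and name "sum". Working from both ends of the line keeps the sketch independent of how many flag characters are present.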
nm: This program produces output like that of objdump -t, but more
compact and easier to follow. Here's the output of nm mysum:
kohler@ubuntu:~/cs61-lectures/l02$ nm mysum
08049f28 d _DYNAMIC
08049ff4 d _GLOBAL_OFFSET_TABLE_
080485bc R _IO_stdin_used
w _Jv_RegisterClasses
08049f18 d __CTOR_END__
08049f14 d __CTOR_LIST__
08049f20 D __DTOR_END__
08049f1c d __DTOR_LIST__
080486f0 r __FRAME_END__
08049f24 d __JCR_END__
08049f24 d __JCR_LIST__
0804a018 A __bss_start
0804a010 D __data_start
08048570 t __do_global_ctors_aux
08048450 t __do_global_dtors_aux
0804a014 D __dso_handle
w __gmon_start__
08048562 T __i686.get_pc_thunk.bx
08049f14 d __init_array_end
08049f14 d __init_array_start
08048560 T __libc_csu_fini
080484f0 T __libc_csu_init
U __libc_start_main@@GLIBC_2.0
U __printf_chk@@GLIBC_2.3.4
0804a018 A _edata
0804a020 A _end
0804859c T _fini
080485b8 R _fp_hw
080482f8 T _init
08048420 T _start
0804a018 b completed.6159
0804a010 W data_start
0804a01c b dtor_idx.6161
080484b0 t frame_dummy
08048380 T main
U strtol@@GLIBC_2.0
080484e0 T sum
Q. WHAT ARE ALL THOSE SYMBOLS?!
A. You should not necessarily worry about tools like nm producing
output that you don't fully understand. Skim the output, looking for
what you need. For instance, here, you can see that this program
probably calls strtol -- there's a reference to strtol@@GLIBC_2.0.
As indeed it does. We'll understand some more of these symbols as time
goes on. Judicious Googling (or Binging) will enlighten you too.
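Where does an undefined reference like that come from? Any call to a function that the file itself does not define. The source of mysum is not shown in these notes, so the following is only a guess at the kind of code that would leave a strtol reference behind; sum_strings is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>

// The lecture's sum function, repeated so this sketch stands alone.
int sum(int a, int b) {
    return a + b;
}

// Hypothetical helper, not taken from the lecture code:
// parse two decimal strings and add the results.
int sum_strings(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    // strtol is defined in the C library, not in this file, so the
    // compiled object file can only record an undefined reference to
    // it and leave the rest to the linker.
    return sum((int) strtol(a, NULL, 10), (int) strtol(b, NULL, 10));
}
```

In nm output for code like this, sum would be listed with a T (defined in the text section) and strtol with a U (undefined here, to be supplied by a library).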
gcc -S
The compiler normally generates object files, but if we give it the
-S flag, it will generate textual assembly output instead. Here's
the output of gcc -S -O2 sumfunction.c (which GCC places by default in
sumfunction.s):
.file "sumfunction.c"
.text
.p2align 4,,15
.globl sum
.type sum, @function
sum:
.LFB0:
.cfi_startproc
movl 8(%esp), %eax
addl 4(%esp), %eax
ret
.cfi_endproc
.LFE0:
.size sum, .-sum
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
The output is very sensitive to compilation flags. This output doesn't
have debugging information. Add the -g flag and you get a ton more:
.file "sumfunction.c"
.text
.Ltext0:
.p2align 4,,15
.globl sum
.type sum, @function
sum:
.LFB0:
.file 1 "sumfunction.c"
.loc 1 1 0
.cfi_startproc
.LVL0:
.loc 1 1 0
movl 8(%esp), %eax
.loc 1 2 0
addl 4(%esp), %eax
.loc 1 3 0
ret
.cfi_endproc
.LFE0:
.size sum, .-sum
.Letext0:
.section .debug_info,"",@progbits
.Ldebug_info0:
.long 0x5e
.value 0x2
.long .Ldebug_abbrev0
.byte 0x4
.uleb128 0x1
......... etc. ...........
Leave off -O2 and GCC generates worse code, involving more instructions:
.file "sumfunction.c"
.text
.globl sum
.type sum, @function
sum:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
movl 12(%ebp), %eax
movl 8(%ebp), %edx
addl %edx, %eax
popl %ebp
.cfi_def_cfa 4, 4
.cfi_restore 5
ret
.cfi_endproc
.LFE0:
.size sum, .-sum
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
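Why do the argument offsets change from 4 and 8 to 8 and 12? The unoptimized code first pushes %ebp, which moves the stack pointer down by 4 bytes before the arguments are read. Here is a sketch that acts out the unoptimized instructions on a toy stack, assuming the same entry layout as before (return address at 0(%esp), a at 4(%esp), b at 8(%esp)). The function name and the placeholder register values are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

// Act out the unoptimized sum on a toy stack.
// stack[i] models the 32-bit word at byte offset 4*i in a tiny memory.
int32_t simulate_sum_unoptimized(int32_t a, int32_t b) {
    uint32_t stack[4];
    uint32_t esp = 4;                    // on entry, %esp points at stack[1]
    stack[1] = 0;                        // return address (dummy value here)
    stack[2] = (uint32_t) a;             // first argument, 4 bytes above %esp
    stack[3] = (uint32_t) b;             // second argument, 8 bytes above %esp

    const uint32_t caller_ebp = 0xdeadbeef;  // placeholder for the caller's %ebp
    uint32_t ebp = caller_ebp;
    uint32_t eax, edx;

    esp -= 4; stack[esp / 4] = ebp;      // pushl %ebp
    ebp = esp;                           // movl %esp, %ebp
    eax = stack[(ebp + 12) / 4];         // movl 12(%ebp), %eax   (b)
    edx = stack[(ebp + 8) / 4];          // movl 8(%ebp), %edx    (a)
    eax += edx;                          // addl %edx, %eax
    ebp = stack[esp / 4]; esp += 4;      // popl %ebp

    assert(ebp == caller_ebp && esp == 4);  // registers restored for the caller
    return (int32_t) eax;                // ret
}
```

The result is the same as for the optimized code; the extra instructions only save, set up, and restore the frame pointer.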
Other disassemblers
Of course, there are lots of other disassemblers out there. Try ODA, the Online Disassembler.
And you could even write your own! (It's not necessarily that hard.)
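As a taste of what that involves, here is a toy disassembler that understands just enough x86 to decode the nine bytes of sum. It is a sketch under heavy assumptions: it recognizes only ret (c3) and the mov (8b) and add (03) forms that read a 32-bit value from memory into a register, and only when the memory operand uses a SIB byte with a base register, no index register, and an 8-bit displacement. The function name toy_disassemble is invented for this example.

```c
#include <stddef.h>
#include <stdio.h>

// 32-bit register names, indexed by their 3-bit encoding.
static const char *const reg32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

// Decode one instruction from `code` (at most `n` bytes) into `out`.
// Returns the instruction's length in bytes, or 0 if it is not one of
// the few forms this toy understands.
size_t toy_disassemble(const unsigned char *code, size_t n,
                       char *out, size_t outsz) {
    if (n >= 1 && code[0] == 0xc3) {
        snprintf(out, outsz, "ret");
        return 1;
    }
    if (n >= 4 && (code[0] == 0x8b || code[0] == 0x03)) {
        unsigned modrm = code[1];
        unsigned sib = code[2];
        unsigned mod = modrm >> 6;        // addressing mode
        unsigned reg = (modrm >> 3) & 7;  // destination register
        unsigned rm = modrm & 7;          // 4 means "a SIB byte follows"
        unsigned index = (sib >> 3) & 7;  // 4 means "no index register"
        unsigned base = sib & 7;          // base register
        if (mod == 1 && rm == 4 && index == 4) {
            // The displacement is a signed 8-bit value.
            int disp = code[3] >= 0x80 ? (int) code[3] - 0x100 : (int) code[3];
            snprintf(out, outsz, "%s %s0x%x(%%%s),%%%s",
                     code[0] == 0x8b ? "mov" : "add",
                     disp < 0 ? "-" : "",
                     (unsigned) (disp < 0 ? -disp : disp),
                     reg32[base], reg32[reg]);
            return 4;
        }
    }
    return 0;
}
```

Fed the bytes 8b 44 24 08 03 44 24 04 c3, successive calls produce "mov 0x8(%esp),%eax", "add 0x4(%esp),%eax", and "ret", consuming 4, 4, and 1 bytes. A real disassembler has to cover many more opcodes, prefixes, and addressing modes, but the structure is the same: read a byte, look it up, and decode the operand bytes that follow.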
Notes
[1] Google "Knuth's arrow notation"!