Assembly exercises

Exercises not as directly relevant to this year’s class are marked with ⚠️.

ASM-1. Assembly properties

Here’s some assembly produced by compiling a C program. (The instructions are labeled with byte offsets for your convenience.)

             .globl  f
             .align  16, 0x90
             .type   f,@function
f:
   0:        movl    $1, %r8d
   6:        jmp     b <f+0xb>

   8:        incl    %r8d

   b:        movl    %r8d, %ecx
   e:        imull   %ecx, %ecx
  11:        movl    $1, %edx

  16:        movl    %edx, %edi
  18:        imull   %edi, %edi
  1b:        movl    $1, %esi

  20:        movl    %esi, %eax
  22:        imull   %eax, %eax
  25:        addl    %edi, %eax
  27:        cmpl    %ecx, %eax
  29:        je      40 <f+0x40>
  2b:        cmpl    %edx, %esi
  2d:        leal    1(%rsi), %eax
  30:        movl    %eax, %esi
  32:        jl      20 <f+0x20>
  34:        cmpl    %r8d, %edx
  37:        leal    1(%rdx), %eax
  3a:        movl    %eax, %edx
  3c:        jl      16 <f+0x16>
  3e:        jmp     8 <f+0x8>

  40:        pushq   %rax
  41:        movl    $.L.str, %edi
  46:        xorl    %eax, %eax
  48:        callq   printf
  4d:        movl    $1, %eax
  52:        popq    %rcx
  53:        retq

             .type   .L.str,@object
.L.str:
             .asciz  "%d %d\n"
             .size   .L.str, 7

QUESTION ASM-1A. How many arguments might this function take? List all that apply.

0
1
2
3 or more

QUESTION ASM-1B. What might this function return? List all that apply.

0
1
−1
Its first argument, whatever that argument is
A square number other than 0 or 1
None of the above

QUESTION ASM-1C. Which callee-saved registers does this function save and restore? List all that apply.

%rax
%rbx
%rcx
%rdx
%rbp
%rsi
%rdi
None of the above

QUESTION ASM-1D. This function handles signed integers. If we changed the C source to use unsigned integers instead, which instructions would change? List all that apply.

movl
imull
addl
cmpl
je
jl
popq
None of the above

QUESTION ASM-1E. What might this function print? List all that apply.

0 0
1 1
3 4
4 5
6 8
None of the above

ASM-2. Disassembly

Here is some x86 assembly code.

f:
         movl a, %eax
         movl b, %edx
         andl $255, %edx
         subl %edx, %eax
         movl %eax, a
         retq

QUESTION ASM-2A. Write valid C code that could have compiled into this assembly (i.e., write a C definition of function f), given the global variable declarations “extern unsigned a, b;.” Your C code should compile without warnings.

void f() {
    a -= b & 255;
}
Or see below for more possibilities.

Some notes:

The function does not use any of its arguments (it modifies %rdx, for example, before computing with its entry-time value), so it is likely to take zero parameters.

The function both reads and modifies a, and reads b without modifying it.

Remember that an operation like OP SRC, DST in assembly means DST := DST OP SRC.

QUESTION ASM-2B. Write different valid, warning-free C code that could have compiled into that assembly. This version should contain different operators than your first version. (For extra credit, use only one operator.)

QUESTION ASM-2C. Again, write different valid, warning-free C code that could have compiled into that assembly. In this version, f should have a different type than in your first version.

ASM-3. Assembly and Data Structures

For each code sample below, indicate the most likely type of the data being accessed. (If multiple types are equally likely, just pick one.)

QUESTION ASM-3A. movzbl %al, %eax

QUESTION ASM-3B. movl -28(%rbp), %edx

QUESTION ASM-3C. movsbl -32(%rbp), %eax

QUESTION ASM-3D. movzwl -30(%rbp), %eax

For each code sample below, indicate the most likely data structure being accessed (assume that g_var is a global variable). Be as specific as possible.

QUESTION ASM-3E. movzwl 6(%rdx,%rax,8), %eax

QUESTION ASM-3F. movl (%rdx,%rax,4), %eax

QUESTION ASM-3G.

movzbl 4(%rax), %eax
movsbl %al, %eax

For the remaining questions, indicate for what values of the register contents will the jump be taken.

QUESTION ASM-3H.

xorl %eax, %eax
jge LABEL

QUESTION ASM-3I.

testb $1, %eax
jne LABEL

QUESTION ASM-3J.

cmpl %edx, %eax
jl LABEL

ASM-4. Assembly control flow

The next four questions pertain to the following four code samples.

f1:
       subq    $8, %rsp
       call    callfunc
       movl    %eax, %edx
       leal    1(%rax,%rax,2), %eax
       testb   $1, %dl
       jne     .L3
       movl    %edx, %eax
       shrl    $31, %eax
       addl    %edx, %eax
       sarl    %eax
.L3:
       addq    $8, %rsp
       ret

f2:
       pushq   %rbx
       xorl    %ebx, %ebx
.L3:
       movl    %ebx, %edi
       addl    $1, %ebx
       call    callfunc
       cmpl    $10, %ebx
       jne     .L3
       popq    %rbx
       ret

f3:
       subq    $8, %rsp
       call    callfunc
       subl    $97, %eax
       cmpb    $4, %al
       ja      .L2
       movzbl  %al, %eax
       jmp     *.L4(,%rax,8)
.L4:
       .quad   .L3
       .quad   .L9
       .quad   .L6
       .quad   .L7
       .quad   .L8
.L3:
       movl    $42, %edx
       jmp     .L5
.L6:
       movl    $4096, %edx
       jmp     .L5
.L7:
       movl    $52, %edx
       jmp     .L5
.L8:
       movl    $6440, %edx
       jmp     .L5
.L2:
       movl    $0, %edx
       jmp     .L5
.L9:
       movl    $61, %edx
.L5:
       movl    $.LC0, %esi
       movl    $1, %edi
       movl    $0, %eax
       call    __printf_chk
       addq    $8, %rsp
       ret
.LC0:
       .string "Sum = %d\n"

f4:
       subq    $40, %rsp
       movl    $1, (%rsp)
       movl    $0, 16(%rsp)
.L2:
       leaq    16(%rsp), %rsi
       movq    %rsp, %rdi
       call    callfunc
       movl    16(%rsp), %eax
       cmpl    %eax, (%rsp)
       jne     .L2
       addq    $40, %rsp
       ret

Now answer the following questions. Pick the most likely sample; you will use each sample exactly once.

QUESTION ASM-4A. Which sample contains a for loop?

QUESTION ASM-4B. Which sample contains a switch statement?

QUESTION ASM-4C. Which sample contains only an if/else construct?

QUESTION ASM-4D. Which sample contains a while loop?

ASM-5. Calling conventions: 6186

University Professor Helen Vendler is designing a poetic new processor, the 6186. Can you reverse-engineer some aspects of the 6186’s calling convention from its assembly?

Here’s a function:

int f(int* a, unsigned b) {
    extern int g(int x);
    int index = g(a[2*b + 1]);
    return a[index + b];
}

And here’s that function compiled into 6186 instructions.

f:
    sub $24, %rsp
    movq %ra, (%rsp)
    mov %rb, %rx
    shl $1, %rx
    add $1, %rx
    movl (%ra, %rx, 4), %ra
    call g
    add %rb, %rr
    movq (%rsp), %ra
    movl (%ra, %rr, 4), %ra
    mov %ra, %rr
    add $24, %rsp
    ret

6186 assembly syntax is based on x86-64 assembly, and like the x86-64, 6186 registers are 64 bits wide. However, the 6186 has a different set of registers. There are just five general-purpose registers, %ra, %rb, %rr, %rx, and %ry. (“[W]hen she tries to be deadly serious she is speaking under…constraint”.) The example also features the stack pointer register, %rsp.

Give brief explanations if unsure.

QUESTION ASM-5A. Which register holds function return values?

QUESTION ASM-5B. What is sizeof(int) on the 6186?

QUESTION ASM-5C. Which general-purpose register(s) must be callee-saved?

QUESTION ASM-5D. Which general-purpose register(s) must be caller-saved?

QUESTION ASM-5E. Which general-purpose register(s) might be callee-saved or caller-saved (you can’t tell which)?

QUESTION ASM-5F. Assuming the compiler makes function stack frames as small as possible given the calling convention, what is the alignment of stack frames?

QUESTION ASM-5G. Assuming that the 6186 supports the same addressing modes as the x86-64, write a single instruction that has the same effect on %ra as these three instructions:

shl $1, %rx
add $1, %rx
movl (%ra,%rx,4), %ra

ASM-6. Data structure assembly

Here are four assembly functions, f1 through f4.

f1:
        testl   %esi, %esi
        jle     LBB0_3
        incl    %esi
LBB0_2:
        movq    8(%rdi), %rdi
        decl    %esi
        cmpl    $1, %esi
        jg      LBB0_2
LBB0_3:
        movl    (%rdi), %eax
        retq

f2:
        pushq   %rbp
        movq    %rsp, %rbp
        movslq  %esi, %rax
        movq    (%rdi,%rax,8), %rcx
        movl    (%rcx,%rax,4), %eax
        popq    %rbp
        retq

f3:
        testl   %esi, %esi
        jle     LBB2_3
        incl    %esi
LBB2_2:
        movl    %edx, %eax
        andl    $1, %eax
        movq    8(%rdi,%rax,8), %rdi
        sarl    %edx
        decl    %esi
        cmpl    $1, %esi
        jg      LBB2_2
LBB2_3:
        movl    (%rdi), %eax
        retq

f4:
        movslq  %esi, %rax
        movl    (%rdi,%rax,4), %eax
        retq

QUESTION ASM-6A. Each function returns a value loaded from some data structure. Which function uses which data structure?

Array
Array of pointers to arrays
Linked list
Binary tree

The f4 function uses an array. f2 uses an array of pointers to arrays. f1 uses a linked list, and f3 uses a binary tree.

The best way to answer a question like this is probably not to figure out from first principles what each function corresponds to, but rather to make an educated guess and confirm it with evidence.

For example, you might guess that the shortest function corresponds to the simplest data structure, and the longest function to the most complex data structure. That would give you f4=array, f3=binary tree—which is correct! But making a guess isn’t enough; you should test the guess.

Does f4 make sense as dealing with an array? Yes! The (%rdi,%rax,4) address format is classic for arrays; it corresponds to a load of the 4-byte value (because of the destination register, %eax) stored at %rdi + %rax * 4. Since %rax is initialized from (the least-significant byte of) %rsi, this function might have been written as:
int f4(int* array, signed char idx) {
    return array[idx];
}
Does f3 make sense as dealing with a binary tree? Yes! First off, binary trees are recursive data structures, and the function contains a loop. Second, consider these instructions following LBB2_2:
andl $1, %eax
movq 8(%rdi,%rax,8), %rdi
The first instruction computes %eax := %eax & 1; after it, %rax can be only 0 or 1. The second instruction uses %rax as an index into an array. An array whose index always 0 or 1 is likely to be an array of two elements, such as the children of a binary tree node!

More confirmation is available if you want it. The movq instruction above loads 8 bytes of memory from a block starting near %rdi. Importantly, it loads that memory into %rdi itself. Given the rest of the loop, we can tell that the %rdi register appears to point to a recursive data structure where each “node” contains at least 2 pointers to child “nodes”—a binary tree.
struct treenode {
    int value;
    treenode* child[2];
};
int f3(treenode* n, int count, int path) {
    while (count > 0) {
        n = n->child[path & 1];
        path >>= 1;
        --count;
    }
    return n->value;
}
What about f1 and f2? Using similar arguments, you can see that f1 is the linked list (it involves a recursive data structure) and f2 the array of pointers to arrays (its code features two array dereferences).

QUESTION ASM-6B. The array data structure is an array of type T. Considering the code for the function that manipulates the array, which of the following types are likely possibilities for T? List all that apply.

char
int
unsigned long
unsigned long long
char*
None of the above

ASM-7. Where’s Waldo?

In the following questions, we give you C code and a portion of the assembly generated by some compiler for that code. (Sometimes we blank out a part of the assembly.) The C code contains a variable, constant, or function called waldo, and a point in the assembly is marked with asterisks ***. Your job is to find Waldo: write an assembly expression or constant that holds the value of waldo at the marked point. We’ve done the first one for you.

NON-QUESTION: Where’s Waldo?

int identity(int waldo) {
    return waldo;
}

00000000004007f6 <identity>:
  4007f6:       55                      push   %rbp
  4007f7:       48 89 e5                mov    %rsp,%rbp
  4007fa:       89 7d fc                mov    %edi,-0x4(%rbp)
  4007fd:       8b 45 fc                mov    -0x4(%rbp),%eax
           ***
  400800:       5d                      pop    %rbp
  400801:       c3                      retq

ANSWER: %edi, -0x4(%rbp), %eax, and %rax all hold the value of waldo at the marked point, so any of them is a valid answer. If the asterisks came before the first instruction, only %edi would work.

QUESTION ASM-7A: Where’s Waldo?

int f1(int a, int b, int waldo, int d) {
    if (a > b) {
        return waldo;
    } else {
        return d;
    }
}

0000000000400802 <f1>:
           ***
  400802:       55                      push   %rbp
  400803:       48 89 e5                mov    %rsp,%rbp
  400806:       89 7d fc                mov    %edi,-0x4(%rbp)
  400809:       89 75 f8                mov    %esi,-0x8(%rbp)
  40080c:       89 55 f4                mov    %edx,-0xc(%rbp)
  40080f:       89 4d f0                mov    %ecx,-0x10(%rbp)
  400812:       8b 45 fc                mov    -0x4(%rbp),%eax
  400815:       3b 45 f8                cmp    -0x8(%rbp),%eax
  400818:       7e 05                   jle    40081f <f1+0x1d>
  40081a:       8b 45 f4                mov    -0xc(%rbp),%eax
  40081d:       eb 03                   jmp    400822 <f1+0x20>
  40081f:       8b 45 f0                mov    -0x10(%rbp),%eax
  400822:       5d                      pop    %rbp
  400823:       c3                      retq

QUESTION ASM-7B: Where’s Waldo?

int int_array_get(int* a, int waldo) {
    int x = a[waldo];
    return x;
}

00000000004007d9 <int_array_get>:
INSTRUCTIONS OMITTED 
          ***
 4007dc:       8b 04 b7                mov    (%rdi,%rsi,4),%eax
 4007df:       c3                      retq

QUESTION ASM-7C: Where’s Waldo?

int matrix_get(int** matrix, int row, int col) {
    int* waldo = matrix[row];
    return waldo[col];
}

00000000004007e0 <matrix_get>:
 4007e0:       48 63 f6                movslq %esi,%rsi
 4007e3:       48 63 d2                movslq %edx,%rdx
          ***
 4007e6:       ?? ?? ?? ??             mov    ??,%rax
 4007ea:       8b 04 90                mov    (%rax,%rdx,4),%eax
 4007ed:       c3                      retq

QUESTION ASM-7D: Where’s Waldo?

int f5(int x) {
    extern int waldo(int);
    return waldo(x * 45);
}

0000000000400be0 <f5>:
          ***
 400be0:       6b ff 2d                imul   $0x2d,%edi,%edi
 400be3:       eb eb                   jmp    400bd0

QUESTION ASM-7E: Where’s Waldo?

int factorial(int waldo) {
    if (waldo < 2) {
        return 1;
    } else {
        return waldo * factorial(waldo - 1);
    }
}

0000000000400910 <factorial>:
     400910:       83 ff 01                cmp    $0x1,%edi
     400913:       b8 01 00 00 00          mov    $0x1,%eax
     400918:       7e 13                   jle    .L2 <factorial+0x1b>
     40091a:       [6 bytes of padding (a no-op instruction)]
.L1:          ***
     400920:       0f af c7                imul   %edi,%eax
     400923:       83 ef 01                sub    $0x1,%edi
     400926:       83 ff 01                cmp    $0x1,%edi
     400929:       75 f5                   jne    .L1 <factorial+0x10>
.L2: 40092b:       f3 c3                   repz retq

QUESTION ASM-7F: Where’s Waldo?

⚠️ This question currently uses 32-bit assembly.

int binary_search(const char* needle, const char** haystack, unsigned sz) {
    unsigned waldo = 0, r = sz;
    while (waldo < r) {
        unsigned m = waldo + ((r - waldo) >> 1);
        if (strcmp(needle, haystack[m]) < 0) {
            r = m;
        } else if (strcmp(needle, haystack[m]) == 0) {
            waldo = r = m;
        } else {
            waldo = m + 1;
        }
    }
    return waldo;
}

80484ab <binary_search>:
     INSTRUCTIONS OMITTED
.L1: 80484c3:       89 fe                   mov    %edi,%esi
     80484c5:       29 de                   sub    %ebx,%esi
     80484c7:       d1 ee                   shr    %esi
     80484c9:       01 de                   add    %ebx,%esi
     80484cb:       8b 44 b5 00             mov    0x0(%ebp,%esi,4),%eax
     80484cf:       89 44 24 04             mov    %eax,0x4(%esp)
     80484d3:       8b 44 24 30             mov    0x30(%esp),%eax
     80484d7:       89 04 24                mov    %eax,(%esp)
     80484da:       e8 11 fe ff ff          call   80482f0 <strcmp@plt>
     80484df:       85 c0                   test   %eax,%eax
     80484e1:       78 09                   js     .L2 <binary_search+0x41>
     80484e3:       85 c0                   test   %eax,%eax
     80484e5:       74 13                   je     80484fa <binary_search+0x4f>
               ***
     80484e7:       8d 5e 01                lea    0x1(%esi),%ebx
     80484ea:       eb 02                   jmp    .L3 <binary_search+0x43>
.L2: 80484ec:       89 f7                   mov    %esi,%edi
.L3: 80484ee:       39 df                   cmp    %ebx,%edi
     80484f0:       77 d1                   ja     .L1 <binary_search+0x18>
     INSTRUCTIONS OMITTED

In the remaining questions, you are given assembly compiled from one of the above functions by a different compiler, or at a different optimization level. Your goal is to figure out what C code corresponds to the given assembly.

QUESTION ASM-7G:

⚠️ This question currently uses 32-bit assembly.

804851d <waldo>:
804851d:       55                      push   %ebp
804851e:       89 e5                   mov    %esp,%ebp
8048520:       83 ec 18                sub    $0x18,%esp
8048523:       83 7d 08 01             cmpl   $0x1,0x8(%ebp)
8048527:       7f 07                   jg     8048530
8048529:       b8 01 00 00 00          mov    $0x1,%eax
804852e:       eb 10                   jmp    8048540
8048530:       8b 45 08                mov    0x8(%ebp),%eax
8048533:       48                      dec    %eax
8048534:       89 04 24                mov    %eax,(%esp)
8048537:       e8 e1 ff ff ff          call   804851d
804853c:       0f af 45 08             imul   0x8(%ebp),%eax
8048540:       c9                      leave
8048541:       c3                      ret

What’s Waldo? List one.

f1
f5
matrix_get
permutation_compare
factorial
binary_search

QUESTION ASM-7H:

⚠️ This question currently uses 32-bit assembly.

8048425 <waldo>:
8048425:       55                      push   %ebp
8048426:       89 e5                   mov    %esp,%ebp
8048428:       8b 45 08                mov    0x8(%ebp),%eax
804842b:       3b 45 0c                cmp    0xc(%ebp),%eax
804842e:       7e 05                   jle    8048435 <waldo+0x10>
8048430:       8b 45 10                mov    0x10(%ebp),%eax
8048433:       eb 03                   jmp    8048438 <waldo+0x13>
8048435:       8b 45 14                mov    0x14(%ebp),%eax
8048438:       5d                      pop    %ebp
8048439:       c3                      ret

What’s Waldo? List one.

f1
f5
matrix_get
permutation_compare
factorial
binary_search

QUESTION ASM-7I:

00000000004008b4 <waldo>:
 4008b4:       55                      push   %rbp
 4008b5:       48 89 e5                mov    %rsp,%rbp
 4008b8:       48 83 ec 10             sub    $0x10,%rsp
 4008bc:       89 7d fc                mov    %edi,-0x4(%rbp)
 4008bf:       8b 45 fc                mov    -0x4(%rbp),%eax
 4008c2:       6b c0 2d                imul   $0x2d,%eax,%eax
 4008c5:       89 c7                   mov    %eax,%edi
 4008c7:       e8 9e 05 00 00          callq  400e6a
 4008cc:       c9                      leaveq 
 4008cd:       c3                      retq

What’s Waldo? List one.

f1
f5
matrix_get
permutation_compare
factorial
binary_search

ASM-9. Disassembly II

The questions in this section concern a function called ensmallen, which has the following assembly.

ensmallen:
 1.          movzbl  (%rsi),  %edx
 2.          testb   %dl, %dl
 3.          movb    %dl, (%rdi)
 4.          jne     .L22
 5.          jmp     .L23
 6.  .L18:
 7.          addq    $1, %rsi
 8.  .L22:
 9.          movzbl  (%rsi), %eax
10.          cmpb    %dl, %al
11.          je      .L18
12.          addq    $1, %rdi
13.          testb   %al, %al
14.          movb    %al, (%rdi)
15.          je      .L23
16.          movl    %eax, %edx
17.          jmp     .L22
18.  .L23:
19.          retq

QUESTION ASM-9A. How many arguments is this function likely to take? Give line numbers that helped you determine an answer.

QUESTION ASM-9B. Are the argument(s) pointers? Give line numbers that helped you determine an answer.

QUESTION ASM-9C. What type(s) are the argument(s) likely to have? Give line numbers that helped you determine an answer.

QUESTION ASM-9D. Write a likely signature for the function. Use return type void.

QUESTION ASM-9E. Write an alternate likely signature for the function, different from your last answer. Again, use return type void.

QUESTION ASM-9F. Which callee-saved registers does this function use? Give line numbers that helped you determine an answer.

QUESTION ASM-9G. The function has an “input” and an “output”. Give an “input” that would cause the CPU to jump from line 5 to label .L23, and describe what is placed in the “output” for that “input”.

QUESTION ASM-9H. Give an “input” for which the corresponding “output” is not a copy of the “input”. Your answer must differ from the previous answer.

QUESTION ASM-9I. Write C code corresponding to this function. Make it as compact as you can.

void ensmallen(char* dst, const char* src) {
    while ((*dst = *src)) {
        while (*dst == *src)
            ++src;
        ++dst;
    }
}

Or, a little less compactly:

void ensmallen(char* dst, const char* src) {
    while (*src) {
        *dst = *src;
        while (*src == *dst)
            ++src;
        ++dst;
    }
    *dst = 0;
}

ASM-10. Machine programming

Intel really messed up this time. They’ve released a processor, the Fartium Core Trio, where every instruction is broken except the ones on this list.


1.	`cmpq %rdi, %rsi`
2.	`decq %rsi`
3.	`incq %rax`
4.	`je L1`
5.	`jl L2`
6.	`jmp L3`
7.	`movl (%rdi,%rax,4), %edi`
8.	`retq`
9.	`xchgq %rax, %rdi`
10.	`xorq %rax, %rax`

(In case you forgot, xchgq swaps two values—here, the values in two registers—without modifying condition codes.)

“So what if it’s buggy,” says Andy Grove; “it can still run programs.” For instance, he argues convincingly that this function:

void do_nothing() {
}

is implemented correctly by this Fartium instruction sequence:

retq

Your job is to implement more complex functions using only Fartium instructions. Your implementations must have the same semantics as the C functions, but may perform much worse than one might expect. You may leave off arguments and write instruction numbers (#1–10) or instruction names. Indicate where labels L1–L3 point (if you need them). Assume that the Fartium Core Trio uses the normal x86-64 calling convention.

QUESTION ASM-10A.

int return_zero() {
    return 0;
}

QUESTION ASM-10B.

int identity(int a) {
    return a;
}

QUESTION ASM-10C.

void infinite_loop() {
    while (1) {
        /* do nothing */
    }
}

QUESTION ASM-10D.

struct point {
    int x;
    int y;
    int z;
};

int extract_z(point* p) {
    return p->z;
}

So much for the easy ones. Now complete one out of the following parts, or more than one for extra credit.

QUESTION ASM-10E.

long add(long a, long b) {
    return a + b;
}

     xorq %rax, %rax         # %rax := 0
     xchgq %rax, %rdi        # now %rax == a and %rdi == 0
 L3: cmpq %rdi, %rsi         # compare %rsi and %rdi (which is 0)
     je L1                   # "if %rsi == 0 goto L1"
     incl %rax               # ++%rax
     decl %rsi               # --%rsi
     jmp L3
 L1: retq

The loop at L3 executes b times, incrementing %eax each time. Here’s morally equivalent C++:

long add(long a, long b) {
    while (b != 0) {
        ++a;
        --b;
    }
    return a;
}

QUESTION ASM-10F.

int array_dereference(int* a, long i) {
    return a[i];
}

     xorq %rax, %rax           # %rax := 0
 L3: xchgq %rax, %rdi
     cmpl %rdi, %rsi
     xchgq %rax, %rdi
     je L1                     # "if %rax == i goto L1"
     incq %rax                 # ++%rax
     jmp L3
 L1: movl (%rdi,%rax,4), %edi  # %edi := a[i]
     xchgq %rax, %rdi
     ret

ASM-12. Assembly and data structures

Consider the following assembly function.

func:
        xorl    %eax, %eax
        cmpb    $0, (%rdi)
        je      .L27
.L26:
        addq    $1, %rdi
        addl    $1, %eax
        cmpb    $0, (%rdi)
        jne     .L26
.L27:
        retq

QUESTION ASM-12A. How many parameters does this function appear to have?

QUESTION ASM-12B. What do you suppose the type of that parameter is?

QUESTION ASM-12C. Write C++ code that corresponds to it.

ASM-13. Assembly language

Consider the following four assembly functions.

# Code Sample 1
       movq    %rdi, %rax
       testq   %rdx, %rdx
       je      .L2
       addq    %rdi, %rdx
       movq    %rdi, %rcx
.L3:
       addq    $1, %rcx
       movb    %sil, -1(%rcx)
       cmpq    %rdx, %rcx
       jne     .L3
.L2:
       rep ret

# Code Sample 2
       movq    %rdi, %rax
       testq   %rdx, %rdx
       je      .L2
       addq    %rdi, %rdx
       movq    %rdi, %rcx
.L3:
       addq    $1, %rcx
       addq    $1, %rsi
       movzbl  -1(%rsi), %r8d
       movb    %r8b, -1(%rcx)
       cmpq    %rdx, %rcx
       jne     .L3
.L2:
       rep ret

# Code Sample 3
      movb    (%rsi), %al
      testb   %al, %al
      je      .L3
      incq    %rsi
.L2:
      movb    %al, (%rdi)
      incq    %rdi
      movb    (%rsi), %al
      incq    %rsi
      testb   %al, %al
      jne     .L2
.L3:
      movq    %rdi, %rax
      ret

# Code Sample 4
       testq   %rdx, %rdx
       je      .L3
       movq    %rdi, %rax
.L2:
       movb    %sil, (%rax)
       incq    %rax
       decq    %rdx
       jne     .L2
.L3:
       movq    %rdi, %rax
       ret

(Note: The %sil register is the lowest-order byte of register %rsi, just as %al is the lowest-order byte of %rax and %r8b is the lowest-order byte of %r8.)

QUESTION ASM-13A. Which two of the assembly functions perform the exact same task?

QUESTION ASM-13B. What is that task? You can describe it briefly, or give the name of the corresponding C library function.

QUESTION ASM-13C. Explain how the other two functions differ from each other.

ASM-14. Golden Baron

A very rich person has designed a new x86-64-based computer, the Golden Baron Supercomputer 9000, and is paying you handsomely to write a C compiler for it. There’s just one problem. This person, like many very rich people, is dumb, and on their computer, odd-numbered memory addresses don’t work for data. When data is loaded into a general-purpose register from an odd-numbered address, the value read is zero. For example, consider the following instructions:

movl $0x01020304, a(%rip)
movl a(%rip), %eax

(where the address of a is even). Executed on true x86-64, %rax will hold the value 0x01020304; on Golden Baron, %rax will hold 0x00020004.

But it is still possible to write a correct C compiler for this ungodly hardware—you just have to work around the bad memory with code. You plan to use two bytes of Golden Baron memory for every one byte of normal x86-64 memory. For instance, an array int a[2] = {1, 0x0a0b0c0d}; would be stored in 16 bytes of memory, like so:

01 00 00 00 00 00 00 00 0d 00 0c 00 0b 00 0a 00

Pointer arithmetic, and moving multi-byte values to and from registers, must account for the zero bytes that alternate with meaningful bytes. So to read the correct value for a[2], the compiler must arrange to read the bytes at addresses A+8, A+10, A+12, and A+14, where A is the address of the first byte in a.

QUESTION ASM-14A. What should printf("%zu\n", sizeof(char)) print on Golden Baron?

QUESTION ASM-14B. This function

int f(signed char* c, size_t i) {
    return c[i];
}

can compile to two instructions on x86-64, including retq. It can also compile to two instructions on Golden Baron. (We’re assuming that memory used for Golden Baron instructions works normally.) What are those instructions?

QUESTION ASM-14C. This function

int g(int* a, size_t i) {
    return a[i];
}

can compile to two instructions on x86-64, but Golden Baron requires more work. Write the Golden Baron translation of this function in x86-64 assembly language. For partial credit, write C code that, executed on x86-64, would return the correct value from a Golden Baron-formatted array.

QUESTION ASM-14D. The Golden Baron’s x86-64 processor actually supports a secret instruction, swizzleq SRC, REG, which rearranges the nybbles (the hexadecimal digits—the aligned 4-bit slices) of the destination register REG based on the source argument SRC. Here’s some examples. Assuming that %rax holds the value 0x0123456789ABCDEF, the following swizzleq instructions leave the indicated results in %rax:

swizzleq $0, %rax: %rax gets 0xFFFF'FFFF'FFFF'FFFF.

The contents of nybble 0 [bits 0-3, inclusive], are repeated into every nybble.
swizzleq $0xFEDCBA9876543210, %rax: %rax gets 0x0123'4567'89AB'CDEF.

Each nybble is mapped to its current value: nybble 0 [bits 0-3] is placed in nybble 0 [bits 0-3], nybble 1 in nybble 1, and so forth.
swizzleq $0x0123456701234567, %rax: %rax gets 0xFEDC'BA98'FEDC'BA98.

Nybble 0 [bits 0-3] is placed in nybbles 7 and 15 [bits 28-31 and 60-63]; nybble 1 [bits 4-7] is placed in nybbles 6 and 14 [bits 24-27 and 56-59]; etc.
swizzleq $0xEFCDAB8967452301, %rax: %rax gets 0x1032'5476'98BA'DCFE.

The nybbles of each byte are exchanged.

Use swizzleq to shorten your answer for Part C.

ASM-15. Instruction behavior

QUESTION ASM-15A. Name three different x86-64 instructions that always modify the stack pointer, no matter their arguments (instruction names only; suffixes don’t count, so movl and movq are the same instruction name).

QUESTION ASM-15B. Name three different x86-64 instructions that sometimes modify the stack pointer, depending on their arguments.

QUESTION ASM-15C. Name three different x86-64 instructions that never modify the stack pointer, no matter their arguments.

QUESTION ASM-15D. List three different instructions, including arguments, that if placed immediately before a retq instruction that ends a function, will never change the function’s behavior. The instructions should have different names. No funny business: assume the function was compiled from valid C, that relative jumps are fixed up, and that, for example, it doesn’t access its own code.

ASM-16. Calling convention

The following questions concern valid C++ functions compiled using the normal x86-64 calling convention. True or false?

QUESTION ASM-16A. If the function’s instructions do not save and restore any registers, then the C++ function did not call any other function.

QUESTION ASM-16B. If the function’s instructions do not change the stack pointer, then the function’s instructions do not contain a call instruction.

QUESTION ASM-16C. If the function’s instructions do not change the stack pointer, then the C++ function did not call any other function. Explain your answer briefly.

QUESTION ASM-16D. If the function’s instructions do not modify the %rax register, then the C++ function must return void.

QUESTION ASM-16E. If the function’s instructions store a local variable on the stack, then that variable’s address will be less than the function’s initial stack pointer.

ASM-17. Assembly

Here are three x86-64 assembly functions that were compiled from C.

f1:
    xorl    %eax, %eax
L2:
    movsbq  (%rdi), %rdx
    subq    $48, %rdx
    cmpq    $9, %rdx
    ja      L5
    imulq   $10, %rax, %rax
    incq    %rdi
    addq    %rdx, %rax
    jmp     L2
L5:
    ret

f2:
    movq    %rdi, %rax
L7:
    cmpb    $0, (%rax)
    je      L9
    incq    %rax
    jmp     L7
L9:
    cmpq    %rax, %rdi
    jnb     L11
    decq    %rax
    movb    (%rdi), %cl
    incq    %rdi
    movb    (%rax), %dl
    movb    %cl, (%rax)
    movb    %dl, -1(%rdi)
    jmp     L9
L11:
    ret

f3:
    xorl    %eax, %eax
L13:
    cmpq    %rax, %rdx
    je      L15
    movb    (%rdi,%rax), %cl
    movb    (%rsi,%rax), %r8b
    movb    %r8b, (%rdi,%rax)
    movb    %cl, (%rsi,%rax)
    incq    %rax
    jmp     L13
L15:
    ret

(Note: imulq $10, %rax, %rax means %rax *= 10.)

QUESTION ASM-17A. How many arguments does each function most likely take?

QUESTION ASM-17B. Which functions modify at least one caller-saved register? List all that apply or write “none”.

QUESTION ASM-17C. Which functions never modify memory? List all that apply or write “none”.

QUESTION ASM-17D. Write a signature for each function, giving a likely type for each argument and a likely return type. (You may give a void return type if you think the function doesn’t return a useful value.)

_______ f1(___________________________________)


_______ f2(___________________________________)


_______ f3(___________________________________)

QUESTION ASM-17E. One of these functions swaps the contents of two memory regions. Which one?

QUESTION ASM-17F. What is the value of %rax in f2 the first time L9 is reached? Write a C expression in terms of f2’s argument or arguments; you may use standard library functions.

QUESTION ASM-17G. Give arguments for each function that would result in the function returning without writing to memory or causing a fault.

f1(_____________________)


f2(_____________________)


f3(_____________________)

QUESTION ASM-17H. Complete this function so that it returns the number 6161. For full credit, use only calls to f1, f2, and f3. For partial credit, do something simpler.

int magic() {
    char s1[] = "Shaka kaSenzangakhona became King of the Zulu Kingdom in 1816";
    char s2[] = "Dingane kaSenzangakhona succeeded Shaka in 1828";
    char s3[] = "1661 in the Gregorian calendar is 3994 in the Korean calendar";




}

ASM-18. Kernel assembly

Helen found the following assembly code while hacking a WeensyOS-like kernel. She would like to figure out what the function is doing.

__Z3vmlPmm:
        pushq   %rbp
        movq    %rsp, %rbp
        movq    %rsi, %rax

        shrq    $36, %rax
        andl    $4088, %eax
        movq    (%rdi,%rax), %rax
        testb   $1, %al
        je      error

        andq    $-4096, %rax
        movq    %rsi, %rcx
        shrq    $27, %rcx
        andl    $4088, %ecx
        movq    (%rax,%rcx), %rax
        testb   $1, %al
        je      error

        andq    $-4096, %rax
        movq    %rsi, %rcx
        shrq    $18, %rcx
        andl    $4088, %ecx
        movq    (%rax,%rcx), %rax
        testb   $1, %al
        je      error

        andq    $-4096, %rax
        movq    %rsi, %rcx
        shrq    $9, %rcx
        andl    $4088, %ecx
        movq    (%rax,%rcx), %rax
        testb   $1, %al
        je      error

        andq    $-4096, %rax
        andl    $4095, %esi
        orq     %rax, %rsi
done:
        movq    %rsi, %rax
        popq    %rbp
        retq

error:
        xorl    %esi, %esi
        jmp     done

Hexadecimal reference: 4095 == 0xFFF; 4096 == 0x1000; -4096 == 0xFFFFFFFFFFFFF000; 4088 == 0xFF8.

QUESTION ASM-18A. How many arguments does the function take, assuming that there are no unused arguments?

QUESTION ASM-18B. What does this function return if it encounters a situation that leads to “error”?

Helen realizes that the compiler performed some optimizations that made the code somewhat obscure. For example, these instructions contain constants not present in the C++ source:

        shrq    $36, %rax
        andl    $4088, %eax
        movq    (%rdi,%rax), %rax

QUESTION ASM-18C. Which of the following snippets is equivalent to the snippet above? List all that apply.

1.      shrq    $33, %rax
        andl    $511, %eax
        movq    (%rdi,%rax,8), %rax

2.      shrq    $39, %rax
        andl    $511, %eax
        movq    (%rdi,%rax,8), %rax

3.      shrq    $39, %rax
        andl    $511, %eax
        movq    (%rdi,%rax,4), %rax

4.      shrq    $33, %rax
        andl    $511, %eax
        movq    (%rdi,%rax,4), %rax

QUESTION ASM-18D. If rdi and rax were C++ variables corresponding to %rdi and %rax, what would be their likely types?

QUESTION ASM-18E. Please write one line of C++ code, using variables rdi and rax, that could compile into the 3 lines of assembly code we are currently investigating.

QUESTION ASM-18F. How many bits of this function’s second argument are used?

QUESTION ASM-18G. Helen notes that the code contains four copies of a pattern of instructions with different constants. After connecting these constants with CS 61 concepts, she’s able to tell what the function does without deciphering every line of assembly.

Use a single concise sentence to describe the high-level functionality of this function.

ASM-19. Calling convention and optimizations

QUESTION ASM-19A. Some aspects of the x86-64 calling convention are conventional, meaning that different operating systems could easily make different choices. Others are more fundamental in that they are built in to the x86-64 instruction set. Which of the following aspects of the calling convention are truly conventional (different operating systems could easily make different choices)? List all that apply.

The stack grows down
%rsp is a callee-saved register
%rbx is a callee-saved register
%rbp is a callee-saved register
The first argument register is %rdi
The second argument register is %rsi
The return register is %rax

QUESTION ASM-19B. We saw the compiler sometimes place loop variables in callee-saved registers. For example, here:

extern unsigned g(unsigned i);

unsigned f(unsigned n) {
    unsigned sum = 0;
    for (unsigned i = 0; i != n; ++i) {
        sum += g(i);
    }
    return sum;
}

the i, n, and sum objects were stored in callee-saved registers %rbx, %r12, and %rbp, respectively. Describe a situation when this choice has a meaningful chance of boosting the overall performance of f.

QUESTION ASM-19C. Write two substantively different C++ functions that could compile to the same assembly on x86-64. (“Substantively different” means you can’t just change the function’s name and types.)

Some examples:

int f1a() {
    return 0;
}
int f1b() {
    return 1 - 1;
}

int f2a(int a, int b) {
    if (a > 0) {
        return f2a(a - 1, b + a);
    } else {
        return b;
    }
}
int f2b(int a, int b) {
    while (a > 0) {
        b += a;
        a -= 1;
    }
    return b;
}

QUESTION ASM-19D. Which of the following properties of a function could you reliably infer from compiler-generated assembly? List all that apply; assume the function’s name is not provided, and that the function does not call any other functions. Explain briefly if unsure.

The types of its arguments
Whether it can dereference a pointer argument
Whether it can use at least one of its arguments
Whether it has unused arguments

ASM-20. Bignums

x86-64 processors have built-in support for integers of up to 64 bits. For larger integers, we must turn to software. This simple “bignum” type represents a 128-bit unsigned integer.

struct bignum {
    unsigned long x, y;
};

void hex_print_bignum(bignum n) {
    if (n.y == 0) {
        printf("0x%lx", n.x);
    } else {
        printf("0x%lx%016lx", n.y, n.x);
    }
}

If bignum bn represents 2⁶⁵, then hex_print_bignum(bn) will print 0x20000000000000000 (that’s a 2 followed by 16 zeros).

QUESTION ASM-20A. Does bignum use a big-endian representation, a little-endian representation, or a mix?

QUESTION ASM-20B. Complete the following function, which returns true iff a > b.

bool bignum_greater(bignum a, bignum b) {


}

QUESTION ASM-20C. Complete the following function, which returns the sum a + b. (You may use the fact that if x and y are unsigned integers using computer arithmetic, then the addition z = x + y overflows iff z < x || z < y.)

bignum bignum_add(bignum a, bignum b) {


}

bignum bignum_add(bignum a, bignum b) {
    bignum ret = { a.x + b.x, a.y + b.y };
    if (ret.x < a.x || ret.x < b.x) {
        ret.y += 1;
    }
    return ret;
}
Note that the question had a bug in it in some versions, where the overflow condition was given as “z < x && z < y”. Sorry! We didn’t take off points for that error.

QUESTION ASM-20D. Complete the same function in x86-64 assembly. Thanks to the calling convention, this function behaves as if the signature were as follows:

bignum* bignum_add(bignum* ret, unsigned long a_x, unsigned long a_y,
                   unsigned long b_x, unsigned long b_y);
// Stores result in `*ret` and returns `ret`.

You may translate your C++ code directly, or you may use the x86-64 CF carry flag and associated instructions to simplify the translation.

bignum_add:

bignum_add:
    movq    %rsi, %eax
    addq    %rcx, %eax
    movq    %eax, (%rdi)
    jnc     .L1
    xorq    %eax, %eax
    addq    $1, %eax
    jmp     .L2
.L1:
    xorq    %eax, %eax
.L2:
    addq    %rdx, %eax
    addq    %r8, %eax
    movq    %eax, $8(%rdi)
    movq    %rdi, %eax
    retq

ASM-21. Disassembly

Consider the following function in assembly:

_Z3gcdjj:
        movl    %esi, %eax
        cmpl    %esi, %edi
        jne     .L3
        jmp     .L2
.L8:
        subl    %eax, %edi
        cmpl    %edi, %eax
        je      .L2
.L3:
        cmpl    %edi, %eax
        jb      .L8
        subl    %edi, %eax
        cmpl    %edi, %eax
        jne     .L3
.L2:
        ret

QUESTION ASM-21A. True or False:

The function likely takes 2 arguments.
The function accesses 4-byte integer values in primary memory.
The function likely works with signed integers.
There is likely a loop in the function.
The register %eax likely contains a loop index (a value updated by a fixed amount on each loop iteration).

QUESTION ASM-21B. Give a input to this function such that it never returns.

QUESTION ASM-21C. How would you modify the function, in assembly, such that it eventually returns for all inputs?

QUESTION ASM-21D. Give one set of arguments for which the function terminates, and the corresponding return value.

ASM-22. Buffer overflows

QUESTION ASM-22A. Give argument(s) to function _Z4buf1PKc, including types and values, that would cause a buffer overflow that overwrote at least part of the function’s return address; or explain briefly why this is impossible.

_Z4buf1PKc:
        subq    $40, %rsp
        movq    %rdi, %rsi
        movq    %rsp, %rdi
        call    strcpy@PLT
        movq    %rsp, %rax
        addq    $40, %rsp
        ret

QUESTION ASM-22B. Which object in function _Z4buf2m is referenced using %rip-relative addressing?

_Z4buf2m:
        subq    $24, %rsp
        movq    %rdi, %rdx
        leaq    16(%rsp), %rdi
        leaq    .LC0(%rip), %rsi
        movl    $0, %eax
        call    sprintf@PLT
        movl    $0, %eax
.L2:
        cmpb    $0, 16(%rsp,%rax)
        je      .L3
        incl    %eax
        jmp     .L2
.L3:
        addq    $24, %rsp
        ret
.LC0:
        .string "%zu"

QUESTION ASM-22C. Give argument(s) to function _Z4buf2m, including types and values, that would cause a buffer overflow that overwrote at least part of the function’s return address; or explain briefly why this is impossible.

QUESTION ASM-22D. It is possible to change two lines of assembly in function _Z4buf2m so that the revised function performs the same task, but never causes buffer overflow. Describe how.

ASM-23. Get–set interfaces

The following questions concern get–set interfaces. These are map-like data structures which support two operations:

void set(container, key, value)
Change container’s value for key to value.

value_type get(container, key)
Returns container’s current value for key. This is the most recent value set by set(container, key, value) or, if there is no previous set, the natural default for the value type (0, nullptr, empty string—whatever’s appropriate).

Here’s the assembly definition of xxx1, which is either get or set for a data structure.

_Z4xxx1...:
    movq    (%rdi,%rsi,8), %rax
    ret

QUESTION ASM-23A. Is this get or set?

QUESTION ASM-23B. What kind of data structure is this?

QUESTION ASM-23C. What can you tell about the type of key arguments for this data structure?

QUESTION ASM-23D. What can you tell about the type of value arguments?

QUESTION ASM-23E. Write a C++ version of the other operation for the same data structure, including a function signature and any type definitions required.

Here’s the assembly definition of get2, which is get for a different data structure.

_Z4get2...:
.L28:
    testq   %rdi, %rdi
    je  .L30
    movq    (%rdi), %rax
    cmpq    %rsi, %rax
    je  .L35
    cmpq    %rax, %rsi
    jnb .L27
    movq    16(%rdi), %rdi
    jmp .L28
.L27:
    movq    24(%rdi), %rdi
    jmp .L28
.L30:
    xorl    %eax, %eax
    ret
.L35:
    movq    8(%rdi), %rax
    ret

QUESTION ASM-23F. What value is returned if the key parameter is not present in the data structure?

QUESTION ASM-23G. Write a struct definition for the data structure being accessed by this code.

The first argument to get2 is the container type, so let’s consider the properties of the memory stored starting at %rdi.

The eight-byte value at (%rdi) is compared against %rsi, the second argument, which is a key. We can therefore assume the first member of the data structure is an eight-byte key.

The eight-byte value at 8(%rdi) is returned in one return path. Since get functions return values, we can assume the second member of the data structure is an eight-byte value.

The eight-byte value at 16(%rdi) is loaded into %rdi in one remaining path, while the eight-byte value at 24(%rdi) is loaded into %rdi in the other. In each case, the code loops back to .L28, the start of the function, which must continue searching for the key. The third and fourth members of the data structure must be pointers to additional nodes in this data structure, which must be recursive.

This gives us:
struct container_node {
    unsigned long key;      // or other eight-byte type
    unsigned long value;
    container_node* left;
    container_node* right;
};

QUESTION ASM-23H. What kind of data structure is being accessed by this code?