Assembly Language Basics

Learning Objectives

By the end of this reading, you will be able to:

  • Understand the relationship between assembly, machine code, and high-level languages
  • Read and write basic assembly language programs
  • Understand registers, instructions, and addressing modes
  • Implement control flow (conditionals, loops) in assembly
  • Understand the stack and function calling conventions
  • Translate simple C code to assembly
  • Debug and analyze compiled code
  • Appreciate the role of compilers and optimization

Introduction

Assembly language is a low-level programming language with a strong correspondence between instructions and machine code. Each assembly instruction typically corresponds to one machine code instruction.

Why learn assembly?

  • Understand what the CPU actually does
  • Debug optimized or compiled code
  • Write performance-critical code
  • Understand security vulnerabilities
  • Bridge the gap between hardware and software
  • Appreciate what compilers do for you

The Language Hierarchy

┌────────────────────────────────────┐
│   High-Level Language (C, Python)  │ ← Abstraction, portability
└──────────────┬─────────────────────┘
               │ Compiler/Interpreter
               ▼
┌────────────────────────────────────┐
│   Assembly Language (asm)          │ ← Human-readable mnemonics
└──────────────┬─────────────────────┘
               │ Assembler
               ▼
┌────────────────────────────────────┐
│   Machine Code (binary)            │ ← CPU executes this
└────────────────────────────────────┘

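Example (assuming a is in rax, b in rbx, c in rcx):
C:        a = b + c;
Assembly: mov  rax, rbx        ; x86-64 add takes two operands,
          add  rax, rcx        ; so copy b first, then add c
Machine:  48 89 D8             ; mov rax, rbx (hex bytes)
          48 01 C8             ; add rax, rcx (hex bytes)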
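
To make the assembly-to-machine-code step concrete, here is a small Python sketch that picks apart the three bytes 48 01 C8, which is how an assembler encodes the x86-64 instruction add rax, rcx. The helper name decode_add_rr and the register table are illustrative only, and the sketch handles just this one register-to-register form, not the general instruction format.

```python
# Register numbers as they appear in the ModRM byte (low 8 registers only).
REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_add_rr(code: bytes) -> str:
    """Decode 'REX.W 01 /r' (add r/m64, r64) when both operands are registers."""
    rex, opcode, modrm = code
    if rex != 0x48 or opcode != 0x01:
        raise ValueError("not a 64-bit 'add r/m64, r64' encoding")
    mod = modrm >> 6             # 0b11 means the r/m field names a register
    reg = (modrm >> 3) & 0b111   # source register
    rm = modrm & 0b111           # destination register
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    return f"add {REG_NAMES[rm]}, {REG_NAMES[reg]}"

print(decode_add_rr(bytes([0x48, 0x01, 0xC8])))   # add rax, rcx
```

The same three fields (mod, reg, r/m) appear in most x86-64 instructions, which is part of why the encoding is variable-length.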

Choosing an Assembly Language

Different CPUs have different instruction sets. Common architectures:

x86-64 (Intel/AMD)

  • CISC architecture
  • Variable-length instructions
  • Used in most desktop/laptop computers
  • Large, complex instruction set; many instructions can operate directly on memory

ARM

  • RISC architecture
  • Fixed-length instructions (usually)
  • Used in mobile devices, Apple Silicon
  • Simple and efficient

MIPS

  • RISC architecture
  • Educational favorite
  • Clean, simple design

RISC-V

  • Modern RISC instruction set published as an open, royalty-free standard
  • Growing in popularity

This reading uses x86-64 as the primary example (with RISC concepts explained for comparison).

x86-64 Registers

Registers are the CPU's fastest storage locations.

General-Purpose Registers (64-bit)

┌─────────────────────────────────────────────────────┐
│                       RAX                           │ 64-bit
├───────────────────────────┬─────────────────────────┤
│                           │         EAX             │ 32-bit
│                           ├─────────────┬───────────┤
│                           │             │    AX     │ 16-bit
│                           │             ├─────┬─────┤
│                           │             │ AH  │ AL  │ 8-bit
└───────────────────────────┴─────────────┴─────┴─────┘

Register Names (64-bit / 32-bit / 16-bit / 8-bit):

RAX / EAX / AX / AL    - Accumulator (arithmetic operations)
RBX / EBX / BX / BL    - Base (memory addressing)
RCX / ECX / CX / CL    - Counter (loop counters)
RDX / EDX / DX / DL    - Data (I/O operations, arithmetic)
RSI / ESI / SI / SIL   - Source Index (string/memory operations)
RDI / EDI / DI / DIL   - Destination Index (string/memory operations)
RBP / EBP / BP / BPL   - Base Pointer (stack frame base)
RSP / ESP / SP / SPL   - Stack Pointer (top of stack)
R8-R15                 - Additional general-purpose (64-bit mode)
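
The register names above are different-sized views of the same storage. The Python sketch below models that with bit masks; the function names are made up for illustration. It also shows one rule worth knowing early: writing a 32-bit name such as EAX clears the upper 32 bits of the full register, while writing an 8-bit or 16-bit name leaves the other bits untouched.

```python
MASK64 = (1 << 64) - 1

def views(rax: int) -> dict:
    """Return the value seen through each name of the same 64-bit register."""
    rax &= MASK64
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,   # low 32 bits
        "ax": rax & 0xFFFF,        # low 16 bits
        "ah": (rax >> 8) & 0xFF,   # bits 8..15
        "al": rax & 0xFF,          # low 8 bits
    }

def write_al(rax: int, value: int) -> int:
    """Writing AL replaces the low byte and keeps the upper 56 bits."""
    return ((rax & MASK64) & ~0xFF) | (value & 0xFF)

def write_eax(rax: int, value: int) -> int:
    """Writing EAX zero-extends: the upper 32 bits of RAX become 0."""
    return value & 0xFFFFFFFF

rax = 0x1122334455667788
print({name: hex(v) for name, v in views(rax).items()})
print(hex(write_al(rax, 0xFF)))    # 0x11223344556677ff
print(hex(write_eax(rax, 0x1)))    # 0x1
```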

Special Registers

RIP    - Instruction Pointer (program counter)
RFLAGS - Status and control flags

Flags Register (low 12 bits of RFLAGS):

Bit: 11 10  9  8  7  6  5  4  3  2  1  0
    ┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
    │OF│DF│IF│TF│SF│ZF│  │AF│  │PF│  │CF│
    └──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘

CF (bit 0)  - Carry Flag
PF (bit 2)  - Parity Flag
AF (bit 4)  - Auxiliary Carry Flag
ZF (bit 6)  - Zero Flag
SF (bit 7)  - Sign Flag
TF (bit 8)  - Trap Flag
IF (bit 9)  - Interrupt Enable
DF (bit 10) - Direction Flag
OF (bit 11) - Overflow Flag
(Blank cells are reserved bits.)
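
As a rough illustration of how the four most commonly used status flags get their values, the Python sketch below models a 64-bit add and reports CF, ZF, SF, and OF. The function name add_flags is hypothetical; the flag rules are the standard ones (carry out of bit 63, zero result, sign bit of the result, and signed overflow when two operands of the same sign give a result of the opposite sign).

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63

def add_flags(a: int, b: int):
    """Model 'add a, b' on 64-bit values; return (result, flags)."""
    a &= MASK64
    b &= MASK64
    full = a + b                  # unbounded Python integer
    result = full & MASK64        # what actually lands in the register
    flags = {
        "CF": int(full > MASK64),                         # unsigned overflow
        "ZF": int(result == 0),
        "SF": int(bool(result & SIGN)),
        "OF": int(bool(~(a ^ b) & (a ^ result) & SIGN)),  # signed overflow
    }
    return result, flags

print(add_flags(MASK64, 1))        # wraps to 0: CF and ZF set
print(add_flags((1 << 63) - 1, 1)) # largest positive + 1: SF and OF set
```

CF matters when the operands are treated as unsigned and OF when they are treated as signed; the add instruction itself computes both and leaves the interpretation to the code that follows.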

Basic Assembly Syntax

Intel vs AT&T Syntax

Two common syntaxes for x86 assembly:

Intel Syntax (we'll use this):

mov rax, 5          ; Destination, Source
add rax, rbx        ; rax = rax + rbx

AT&T Syntax:

movq $5, %rax       ; Source, Destination
addq %rbx, %rax     ; rax = rax + rbx

Instruction Format

label: mnemonic operands        ; comment

example:
    mov rax, 10                 ; Load 10 into rax
loop_start:
    add rax, rbx                ; Add rbx to rax

Data Types and Sizes

BYTE   - 8 bits   (1 byte)
WORD   - 16 bits  (2 bytes)
DWORD  - 32 bits  (4 bytes)
QWORD  - 64 bits  (8 bytes)

Data Movement Instructions

MOV - Move Data

; Immediate to register
mov rax, 42                     ; rax = 42

; Register to register
mov rbx, rax                    ; rbx = rax

; Memory to register
mov rax, [rbx]                  ; rax = memory[rbx]

; Register to memory
mov [rbx], rax                  ; memory[rbx] = rax

; Immediate to memory
mov QWORD PTR [rbx], 100        ; memory[rbx] = 100

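Note: mov cannot move directly from memory to memory; use a register as an intermediate. Also note that QWORD PTR is the MASM/GNU assembler (Intel mode) spelling of the size prefix; NASM writes the same instruction as mov qword [rbx], 100.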

PUSH/POP - Stack Operations

push rax                        ; Push rax onto stack
                                ; rsp = rsp - 8
                                ; memory[rsp] = rax

pop rbx                         ; Pop from stack into rbx
                                ; rbx = memory[rsp]
                                ; rsp = rsp + 8

LEA - Load Effective Address

lea rax, [rbx + rcx*4 + 8]     ; rax = rbx + rcx*4 + 8
                                ; (computes address without accessing memory)

Arithmetic Instructions

Basic Arithmetic

add rax, rbx                    ; rax = rax + rbx
sub rax, rbx                    ; rax = rax - rbx
inc rax                         ; rax = rax + 1
dec rax                         ; rax = rax - 1
neg rax                         ; rax = -rax

; Immediate values
add rax, 10                     ; rax = rax + 10
sub rbx, 5                      ; rbx = rbx - 5

Multiplication and Division

; Unsigned multiplication
mul rbx                         ; rdx:rax = rax * rbx
                                ; (128-bit result in rdx:rax)

; Signed multiplication
imul rbx                        ; rdx:rax = rax * rbx (signed)
imul rax, rbx                   ; rax = rax * rbx (result in rax only)
imul rax, rbx, 10               ; rax = rbx * 10

; Unsigned division
div rbx                         ; rax = rdx:rax / rbx (quotient)
                                ; rdx = rdx:rax % rbx (remainder)

; Signed division
idiv rbx                        ; rax = rdx:rax / rbx (quotient)
                                ; rdx = rdx:rax % rbx (remainder)

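Division Setup:

; Unsigned: divide rax by rbx
xor rdx, rdx                    ; Clear rdx (high 64 bits of the dividend)
div rbx                         ; rax = quotient, rdx = remainder

; Signed: divide rax by rbx
cqo                             ; Sign-extend rax into rdx:rax
idiv rbx                        ; rax = quotient, rdx = remainder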
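
The reason rdx has to be prepared before a division is that the dividend is the full 128-bit value rdx:rax. The Python sketch below models unsigned mul and div under that rule; the function names and return order are illustrative. It raises an exception where the real CPU would raise a divide error, namely on division by zero or when the quotient does not fit in 64 bits.

```python
MASK64 = (1 << 64) - 1

def mul(rax: int, operand: int):
    """Model 'mul operand': return (rdx, rax) holding the 128-bit product."""
    product = (rax & MASK64) * (operand & MASK64)
    return product >> 64, product & MASK64

def div(rdx: int, rax: int, operand: int):
    """Model 'div operand': return (quotient in rax, remainder in rdx)."""
    if operand == 0:
        raise ZeroDivisionError("divide error")
    dividend = ((rdx & MASK64) << 64) | (rax & MASK64)
    quotient, remainder = divmod(dividend, operand)
    if quotient > MASK64:
        raise OverflowError("quotient does not fit in rax (divide error)")
    return quotient, remainder

print(mul(1 << 63, 4))     # (2, 0): the product spills into rdx
print(div(0, 100, 7))      # (14, 2)
```

Calling div(5, 0, 2) in this model raises OverflowError, which mirrors what happens when stale data is left in rdx before a div.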

Logical and Bitwise Instructions

and rax, rbx                    ; rax = rax & rbx
or  rax, rbx                    ; rax = rax | rbx
xor rax, rbx                    ; rax = rax ^ rbx
not rax                         ; rax = ~rax

; Useful idiom: zero a register
xor rax, rax                    ; rax = 0 (shorter encoding than mov rax, 0; also modifies flags)

; Bit shifts
shl rax, 2                      ; rax = rax << 2 (logical left)
shr rax, 2                      ; rax = rax >> 2 (logical right)
sal rax, 2                      ; rax = rax << 2 (arithmetic left)
sar rax, 2                      ; rax = rax >> 2 (arithmetic right, preserves sign)

; Bit rotation
rol rax, 3                      ; Rotate left 3 bits
ror rax, 3                      ; Rotate right 3 bits
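
The difference between logical and arithmetic right shifts, and between shifts and rotates, is easiest to see on a negative value. The following Python sketch models the 64-bit versions; the helper names simply mirror the mnemonics, and the shift count is masked to 6 bits as the hardware does for 64-bit operands.

```python
MASK64 = (1 << 64) - 1

def shl(x, n):
    return (x << (n & 63)) & MASK64

def shr(x, n):
    """Logical right shift: zeros enter from the left."""
    return (x & MASK64) >> (n & 63)

def sar(x, n):
    """Arithmetic right shift: copies of the sign bit enter from the left."""
    x &= MASK64
    if x >> 63:
        x -= 1 << 64              # reinterpret the bit pattern as signed
    return (x >> (n & 63)) & MASK64

def rol(x, n):
    """Rotate left: bits leaving the top re-enter at the bottom."""
    n &= 63
    x &= MASK64
    return ((x << n) | (x >> (64 - n))) & MASK64

def ror(x, n):
    return rol(x, 64 - (n & 63))

minus8 = -8 & MASK64
print(hex(shr(minus8, 2)))   # 0x3ffffffffffffffe (large positive value)
print(hex(sar(minus8, 2)))   # 0xfffffffffffffffe (still negative: -2)
print(hex(rol(0x8000000000000001, 1)))   # 0x3
```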

Addressing Modes

How to specify operands:

1. Immediate

mov rax, 42                     ; Load constant 42

2. Register

mov rax, rbx                    ; Copy from rbx

3. Direct Memory

mov rax, [0x1000]               ; rax = memory[0x1000]
mov rax, [myvar]                ; rax = memory[myvar]

4. Register Indirect

mov rax, [rbx]                  ; rax = memory[rbx]

5. Base + Offset

mov rax, [rbx + 8]              ; rax = memory[rbx + 8]

6. Indexed

mov rax, [rbx + rcx*4]          ; rax = memory[rbx + rcx*4]
                                ; Scale factor: 1, 2, 4, or 8

7. Base + Index + Offset

mov rax, [rbx + rcx*4 + 8]      ; rax = memory[rbx + rcx*4 + 8]

Common Use Cases:

; Array access: array[i]
mov rax, [array_base + rcx*8]   ; rcx = index, 8 = sizeof(element)

; Structure field: struct.field
mov rax, [rbx + 16]             ; rbx = struct base, 16 = field offset

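; Array of structs: array[i].field
; (32 is not a valid scale factor, so compute i * sizeof(struct) first)
mov rdx, rcx                    ; rdx = i
shl rdx, 5                      ; rdx = i * 32 (32 = sizeof(struct))
mov rax, [rbx + rdx + 16]       ; rbx = array base, 16 = field offset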
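
All of the memory addressing modes above are special cases of one formula: base + index*scale + displacement, where the scale must be 1, 2, 4, or 8. A minimal Python sketch of that calculation follows; the function names are hypothetical. The second helper shows the extra multiply needed when an element size, such as a 32-byte struct, is not a legal scale.

```python
MASK64 = (1 << 64) - 1

def effective_address(base=0, index=0, scale=1, disp=0):
    """Compute base + index*scale + disp as the CPU's address unit would."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK64

def struct_field_address(array_base, i, struct_size, field_offset):
    """array[i].field: scale the index separately, then use base + offset."""
    byte_offset = i * struct_size        # done with shl or imul in assembly
    return effective_address(base=array_base, index=byte_offset, disp=field_offset)

print(hex(effective_address(base=0x1000, index=3, scale=4, disp=8)))  # 0x1014
print(hex(struct_field_address(0x2000, 2, 32, 16)))                   # 0x2050
```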

Control Flow

Comparison

cmp rax, rbx                    ; Compare rax and rbx (computes rax - rbx)
                                ; Sets flags, doesn't store result

test rax, rbx                   ; Computes rax & rbx
                                ; Sets flags, doesn't store result

; Common idiom: test if zero
test rax, rax                   ; Is rax zero?

Unconditional Jump

jmp label                       ; Jump to label

Conditional Jumps

Based on comparison results:

je  label                       ; Jump if equal (ZF=1)
jne label                       ; Jump if not equal (ZF=0)
jz  label                       ; Jump if zero (same as je)
jnz label                       ; Jump if not zero (same as jne)

; Signed comparisons
jg  label                       ; Jump if greater (signed)
jge label                       ; Jump if greater or equal
jl  label                       ; Jump if less (signed)
jle label                       ; Jump if less or equal

; Unsigned comparisons
ja  label                       ; Jump if above (unsigned >)
jae label                       ; Jump if above or equal (unsigned >=)
jb  label                       ; Jump if below (unsigned <)
jbe label                       ; Jump if below or equal (unsigned <=)

; Other flags
jc  label                       ; Jump if carry (CF=1)
jo  label                       ; Jump if overflow (OF=1)
js  label                       ; Jump if sign (SF=1, negative)
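
The signed and unsigned jumps differ only in which flags they inspect after the same cmp. As a sketch of that idea, the Python below computes the flags for cmp a, b on 64-bit values and evaluates the standard flag condition for each jump. The names cmp_flags, CONDITIONS, and taken are illustrative.

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63

def cmp_flags(a, b):
    """Flags produced by 'cmp a, b' (computes a - b, discards the result)."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    return {
        "ZF": a == b,
        "CF": a < b,                                  # unsigned borrow
        "SF": bool(result & SIGN),
        "OF": bool((a ^ b) & (a ^ result) & SIGN),    # signed overflow
    }

CONDITIONS = {
    "je":  lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jl":  lambda f: f["SF"] != f["OF"],
    "jge": lambda f: f["SF"] == f["OF"],
    "jg":  lambda f: not f["ZF"] and f["SF"] == f["OF"],
    "jle": lambda f: f["ZF"] or f["SF"] != f["OF"],
    "jb":  lambda f: f["CF"],
    "jae": lambda f: not f["CF"],
    "ja":  lambda f: not f["CF"] and not f["ZF"],
    "jbe": lambda f: f["CF"] or f["ZF"],
}

def taken(jump, a, b):
    """Would 'cmp a, b' followed by the given jump branch?"""
    return bool(CONDITIONS[jump](cmp_flags(a, b)))

# The bit pattern for -1 is "less than" 1 when signed, "above" 1 when unsigned.
print(taken("jl", -1, 1), taken("jb", -1, 1), taken("ja", -1, 1))
```

Picking the wrong family (for example jb on signed values) is a common source of bugs, because both versions assemble and run without complaint.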

Examples: If-Else

C Code:

if (a > b) {
    c = a;
} else {
    c = b;
}

Assembly:

    mov rax, [a]                ; Load a
    mov rbx, [b]                ; Load b
    cmp rax, rbx                ; Compare a and b
    jle else_block              ; If a <= b, goto else

    ; Then block (a > b)
    mov [c], rax                ; c = a
    jmp end_if                  ; Skip else block

else_block:
    mov [c], rbx                ; c = b

end_if:
    ; Continue...

Examples: While Loop

C Code:

int sum = 0;
int i = 0;
while (i < 10) {
    sum += i;
    i++;
}

Assembly:

    xor rax, rax                ; sum = 0
    xor rbx, rbx                ; i = 0

loop_start:
    cmp rbx, 10                 ; Compare i with 10
    jge loop_end                ; If i >= 10, exit loop

    add rax, rbx                ; sum += i
    inc rbx                     ; i++
    jmp loop_start              ; Repeat

loop_end:
    ; rax contains sum

Examples: For Loop

C Code:

int sum = 0;
for (int i = 0; i < 10; i++) {
    sum += i;
}

Assembly:

    xor rax, rax                ; sum = 0
    xor rcx, rcx                ; i = 0

for_loop:
    cmp rcx, 10                 ; Compare i with 10
    jge for_end                 ; If i >= 10, exit

    add rax, rcx                ; sum += i
    inc rcx                     ; i++
    jmp for_loop                ; Continue loop

for_end:
    ; rax contains sum
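
Note how both loop translations test the inverted condition (exit when i >= 10) at the top and jump back at the bottom. The Python sketch below mirrors the for-loop listing line by line, with variables named after the registers, to show that structure in a language without goto.

```python
def sum_below_ten():
    rax = 0                  # xor rax, rax    ; sum = 0
    rcx = 0                  # xor rcx, rcx    ; i = 0
    while True:              # for_loop:
        if rcx >= 10:        #   cmp rcx, 10 / jge for_end
            break
        rax += rcx           #   add rax, rcx
        rcx += 1             #   inc rcx
                             #   jmp for_loop
    return rax               # for_end:

print(sum_below_ten())       # 45
```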

The Stack

The stack is a LIFO (Last In, First Out) data structure used for:

  • Function call management
  • Local variables
  • Temporary storage

Stack Growth

Important: On x86-64, the stack grows downward (toward lower addresses).

High Addresses
    ↑
    │
┌───┴────┐ ← Old RSP
│  Data  │
├────────┤
│  Data  │
├────────┤ ← Current RSP (after push)
│        │
│        │
└────────┘
    ↓
Low Addresses

Stack Operations

; Push (add to stack)
push rax
; Equivalent to:
;   sub rsp, 8
;   mov [rsp], rax

; Pop (remove from stack)
pop rbx
; Equivalent to:
;   mov rbx, [rsp]
;   add rsp, 8
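
The push and pop equivalences above can be sketched as a toy Python model in which memory is a dictionary keyed by address. The class name StackModel and the starting address 0x1000 are arbitrary choices for illustration.

```python
MASK64 = (1 << 64) - 1

class StackModel:
    """Toy model of push/pop on a downward-growing stack."""

    def __init__(self, rsp=0x1000):
        self.rsp = rsp
        self.memory = {}                          # address -> 64-bit value

    def push(self, value):
        self.rsp -= 8                             # sub rsp, 8
        self.memory[self.rsp] = value & MASK64    # mov [rsp], value

    def pop(self):
        value = self.memory[self.rsp]             # mov reg, [rsp]
        self.rsp += 8                             # add rsp, 8
        return value

stack = StackModel()
stack.push(0xAAAA)
stack.push(0xBBBB)
print(hex(stack.rsp))      # 0xff0: two pushes moved rsp down by 16
print(hex(stack.pop()))    # 0xbbbb: last in, first out
```

As on real hardware, pop in this model does not erase the old value; it only moves rsp, so the data stays in memory until something overwrites it.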

Functions and Calling Conventions

x86-64 System V Calling Convention (Linux/macOS)

Parameter Passing:

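First 6 integer/pointer arguments: RDI, RSI, RDX, RCX, R8, R9
Additional arguments:              Stack (pushed right to left)
Floating-point arguments:          XMM0-XMM7
Return value:                      RAX (XMM0 for floating point)
Stack alignment:                   RSP must be 16-byte aligned at each call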

Preserved Registers (callee must save/restore):

RBX, RBP, RSP, R12-R15

Scratch Registers (caller must save if needed):

RAX, RCX, RDX, RSI, RDI, R8-R11
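
As a rough sketch of the parameter-passing rule, the Python function below assigns a list of integer arguments to registers and to the stack. It covers integer and pointer arguments only (no floating point, structs, or alignment padding), and the name place_int_args is hypothetical.

```python
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def place_int_args(args):
    """Return (register assignments, order in which the rest are pushed)."""
    in_registers = dict(zip(INT_ARG_REGS, args))   # first six go in registers
    push_order = list(reversed(args[6:]))          # right to left: last argument first
    return in_registers, push_order

regs, pushes = place_int_args([10, 20, 30, 40, 50, 60, 70, 80])
print(regs)     # {'rdi': 10, 'rsi': 20, 'rdx': 30, 'rcx': 40, 'r8': 50, 'r9': 60}
print(pushes)   # [80, 70]
```

Because the last argument is pushed first, the seventh argument ends up at the lowest address, directly above the return address that call pushes.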

Function Prologue and Epilogue

Function Structure:

my_function:
    ; Prologue (setup)
    push rbp                    ; Save old base pointer
    mov rbp, rsp                ; Set new base pointer
    sub rsp, 32                 ; Allocate 32 bytes for locals

    ; Function body
    ; ...

    ; Epilogue (cleanup)
    mov rsp, rbp                ; Restore stack pointer
    pop rbp                     ; Restore base pointer
    ret                         ; Return

Calling a Function

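; Call: result = add_numbers(5, 10)

    mov rdi, 5                  ; First argument
    mov rsi, 10                 ; Second argument
    call add_numbers            ; Call function
    ; Return value in rax

; Function definition
; (named add_numbers because add is an instruction mnemonic, which
;  assemblers such as NASM do not accept as a plain label)
add_numbers:
    push rbp
    mov rbp, rsp

    mov rax, rdi                ; rax = first argument
    add rax, rsi                ; rax += second argument

    pop rbp
    ret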

Stack Frame

High Addresses
┌─────────────────┐
│  Arguments      │
│  (if > 6)       │
├─────────────────┤
│  Return Address │ ← Pushed by call instruction
├─────────────────┤
│  Old RBP        │ ← Pushed by function prologue
├─────────────────┤ ← RBP points here
│  Local Var 1    │
├─────────────────┤
│  Local Var 2    │
├─────────────────┤
│  Local Var 3    │
├─────────────────┤ ← RSP points here
│                 │
Low Addresses

Example: Factorial Function

C Code:

int factorial(int n) {
    if (n <= 1)
        return 1;
    else
        return n * factorial(n - 1);
}

Assembly:

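factorial:
    push rbp
    mov rbp, rsp

    ; Check base case: n <= 1 (n is a 32-bit int, so use edi, not rdi)
    cmp edi, 1
    jg recursive_case

base_case:
    mov eax, 1                  ; return 1
    pop rbp
    ret

recursive_case:
    sub rsp, 16                 ; Reserve a local slot; keeps rsp 16-byte aligned
    mov [rbp-4], edi            ; Save n across the call
    dec edi                     ; n - 1
    call factorial              ; eax = factorial(n-1)
    imul eax, [rbp-4]           ; eax = n * factorial(n-1)

    mov rsp, rbp                ; Release the local slot
    pop rbp
    ret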
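
The recursive listing above saves n before each call and multiplies by it after the call returns. To make the role of the stack visible, the Python sketch below performs the same computation with an explicit list standing in for the hardware stack. It illustrates the control flow only; it is not a translation of every instruction.

```python
def factorial(n):
    stack = []
    # Descent: each recursive call saves its own n, then recurses on n - 1.
    while n > 1:
        stack.append(n)        # save n for use after the call returns
        n -= 1
    result = 1                 # base case: n <= 1 returns 1
    # Unwinding: each return restores the saved n and multiplies.
    while stack:
        result *= stack.pop()  # result = n * factorial(n - 1)
    return result

print(factorial(5))            # 120
```

The list reaches its maximum length at the base case, which corresponds to the deepest point of the call stack in the assembly version.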

Complete Program Example

Simple Addition Program

; sum.asm - Add two numbers (NASM syntax, Linux x86-64)
; Build: nasm -f elf64 sum.asm -o sum.o && ld sum.o -o sum

section .data
    num1 dq 15                  ; 64-bit integer
    num2 dq 27
    msg db "Result: ", 0        ; String (null-terminated)

section .bss
    result resq 1               ; Reserve 1 qword for result

section .text
    global _start

_start:
    ; Load numbers
    mov rax, [num1]
    mov rbx, [num2]

    ; Add
    add rax, rbx

    ; Store result
    mov [result], rax

    ; Exit program (Linux syscall)
    mov rax, 60                 ; syscall: exit
    xor rdi, rdi                ; status: 0
    syscall

Array Sum Example

C Code:

int sum_array(int *arr, int len) {
    int sum = 0;
    for (int i = 0; i < len; i++) {
        sum += arr[i];
    }
    return sum;
}

Assembly:

sum_array:
    push rbp
    mov rbp, rsp

    xor eax, eax                ; sum = 0
    xor ecx, ecx                ; i = 0 (writing ecx also clears the upper half of rcx)

loop_start:
    cmp ecx, esi                ; Compare i with len (both are 32-bit ints)
    jge loop_end                ; If i >= len, exit

    add eax, [rdi + rcx*4]      ; sum += arr[i] (4-byte int)
    inc ecx                     ; i++
    jmp loop_start

loop_end:
    pop rbp
    ret                         ; Return sum in eax

Common Patterns and Idioms

1. Clear a Register

xor rax, rax                    ; Shorter encoding than mov rax, 0 (also modifies flags)

2. Multiply/Divide by Power of 2

; Multiply by 8
shl rax, 3                      ; rax *= 2^3 = 8

; Divide by 4 (unsigned)
shr rax, 2                      ; rax /= 2^2 = 4

; Divide by 4 (signed)
sar rax, 2                      ; Preserves sign, but rounds toward negative
                                ; infinity: -7 becomes -2, while C's -7 / 4 is -1
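
For negative values, an arithmetic shift and C-style division disagree whenever the division is inexact, because the shift rounds down and C truncates toward zero. The Python sketch below demonstrates the mismatch and one common correction: add 2^n - 1 to negative inputs before shifting. The function names are hypothetical, and the inputs are assumed to fit in a signed 64-bit register.

```python
def sar(x, n):
    """Arithmetic right shift; Python's >> already behaves this way on ints."""
    return x >> n

def c_divide(x, d):
    """Integer division that truncates toward zero, as C does."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q

def signed_div_pow2(x, n):
    """Divide by 2**n with C semantics using a bias plus an arithmetic shift."""
    bias = (x >> 63) & ((1 << n) - 1)   # 2**n - 1 if x is negative, else 0
    return (x + bias) >> n

print(sar(-7, 2), c_divide(-7, 4), signed_div_pow2(-7, 2))   # -2 -1 -1
```

Optimizing compilers generally emit some variant of this bias-then-shift sequence when dividing a signed integer by a constant power of two.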

3. Swap Two Registers (without temp)

xor rax, rbx
xor rbx, rax
xor rax, rbx                    ; rax and rbx swapped
                                ; (xchg rax, rbx does the same in one instruction)
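
The XOR swap works because x ^ y ^ y == x. A short Python sketch, using a dictionary as a stand-in register file, shows the three steps and also the trick's one pitfall: if both operands name the same register, the first XOR zeroes it and the value is lost.

```python
def xor_swap(regs, x, y):
    """Swap regs[x] and regs[y] in place using three XORs."""
    regs[x] ^= regs[y]     # x = x0 ^ y0
    regs[y] ^= regs[x]     # y = y0 ^ (x0 ^ y0) = x0
    regs[x] ^= regs[y]     # x = (x0 ^ y0) ^ x0 = y0

regs = {"rax": 5, "rbx": 9}
xor_swap(regs, "rax", "rbx")
print(regs)                # {'rax': 9, 'rbx': 5}

xor_swap(regs, "rax", "rax")
print(regs["rax"])         # 0: swapping a register with itself destroys it
```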

4. Conditional Move (avoid branching)

; Modern alternative to conditional jump
cmp rax, rbx
cmovg rax, rcx                  ; if (rax > rbx) rax = rcx

5. Check if Number is Even

test rax, 1                     ; Check lowest bit
jz is_even                      ; If zero, number is even

Reading Compiler Output

Compiling with GCC

# Compile to assembly (GCC emits AT&T syntax by default)
gcc -S -O0 program.c            # No optimization
gcc -S -O2 program.c            # Optimized
gcc -S -O2 -masm=intel program.c  # Optimized, Intel syntax

# Compile to object file
gcc -c program.c

# Disassemble object file
objdump -d program.o

# View with Intel syntax
objdump -d -M intel program.o

Example: Compiler Optimization

C Code:

int add(int a, int b) {
    return a + b;
}

Assembly (no optimization):

add:
    push rbp
    mov rbp, rsp
    mov DWORD PTR [rbp-4], edi
    mov DWORD PTR [rbp-8], esi
    mov edx, DWORD PTR [rbp-4]
    mov eax, DWORD PTR [rbp-8]
    add eax, edx
    pop rbp
    ret

Assembly (optimized):

add:
    lea eax, [rdi+rsi]
    ret

Debugging Assembly

GDB (GNU Debugger)

# Compile with debug symbols
gcc -g program.c -o program

# Run in GDB
gdb program

# GDB commands:
set disassembly-flavor intel    # Show Intel syntax (GDB defaults to AT&T)
break main                      # Set breakpoint (use _start for a standalone assembly program)
run                             # Start program
stepi                           # Step one instruction
info registers                  # Show all registers
print $rax                      # Print rax value
x/10x $rsp                      # Examine stack (10 hex values)
disassemble                     # Show assembly around current instruction

Common Debugging Tasks

1. Examine register values:

(gdb) info registers
(gdb) print/x $rax              # Print in hex
(gdb) print/d $rbx              # Print in decimal

2. Examine memory:

(gdb) x/4xg $rsp                # 4 qwords in hex
(gdb) x/s $rdi                  # String
(gdb) x/10i $rip                # 10 instructions

3. Set breakpoints:

(gdb) break *0x401234           # Break at address
(gdb) break my_function         # Break at function

Inline Assembly in C

GCC Inline Assembly

int add(int a, int b) {
    int result = a;
    asm("addl %1, %0"           // result += b (AT&T operand order)
        : "+r" (result)         // Read-write operand
        : "r" (b)               // Input
        : "cc"                  // Flags are modified
    );
    return result;
}

Volatile (don't optimize):

asm volatile("nop");            // No operation (don't remove)

RISC Comparison: MIPS Example

For contrast, here's MIPS (classic RISC):

x86-64 vs MIPS:

Operation         x86-64              MIPS
─────────────────────────────────────────────────
Add               add rax, rbx        add $t0, $t1, $t2
Load              mov rax, [rbx]      lw $t0, 0($t1)
Store             mov [rbx], rax      sw $t0, 0($t1)
Branch if equal   je label            beq $t0, $t1, label
Jump              jmp label           j label

MIPS Characteristics:

  • Fixed 32-bit instruction length
  • Load/store architecture (only load/store access memory)
  • 3-operand format: add $dest, $src1, $src2
  • Simpler, more regular

Programming Exercises

Python: Assembly Simulator

class SimpleAssembler:
    def __init__(self):
        self.registers = {f'r{i}': 0 for i in range(8)}
        self.flags = {'Z': 0, 'N': 0, 'C': 0}
        self.memory = [0] * 1024
        self.pc = 0

    def execute(self, instruction):
        """Execute a single instruction"""
        parts = instruction.split()
        opcode = parts[0]

        if opcode == 'mov':
            dest, src = parts[1].rstrip(','), parts[2]
            self.registers[dest] = self._get_value(src)

        elif opcode == 'add':
            dest, src = parts[1].rstrip(','), parts[2]
            result = self.registers[dest] + self._get_value(src)
            self.registers[dest] = result & 0xFFFFFFFF
            self._update_flags(result)

        elif opcode == 'sub':
            dest, src = parts[1].rstrip(','), parts[2]
            result = self.registers[dest] - self._get_value(src)
            self.registers[dest] = result & 0xFFFFFFFF
            self._update_flags(result)

        elif opcode == 'cmp':
            reg1, reg2 = parts[1].rstrip(','), parts[2]
            result = self.registers[reg1] - self._get_value(reg2)
            self._update_flags(result)

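    def _get_value(self, operand):
        """Get a 32-bit value from a register or an immediate"""
        if operand in self.registers:
            return self.registers[operand]
        return int(operand) & 0xFFFFFFFF    # Immediates wrap to 32 bits

    def _update_flags(self, result):
        """Update status flags from the unmasked result of add/sub/cmp"""
        masked = result & 0xFFFFFFFF
        self.flags['Z'] = 1 if masked == 0 else 0
        self.flags['N'] = (masked >> 31) & 1              # Sign bit of the 32-bit result
        self.flags['C'] = 1 if result != masked else 0    # Carry out (add) or borrow (sub/cmp)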

    def show_state(self):
        """Display current state"""
        print("Registers:", {k: v for k, v in self.registers.items() if v != 0})
        print("Flags:", self.flags)

# Example usage
asm = SimpleAssembler()
program = [
    "mov r0, 10",
    "mov r1, 20",
    "add r0, r1",
]

for instruction in program:
    print(f"Executing: {instruction}")
    asm.execute(instruction)
    asm.show_state()
    print()

Exercises

Basic Exercises

  1. Simple Instructions

    • Write assembly to swap values of rax and rbx (using a temp register)
    • Calculate: (a + b) * c using assembly instructions
    • Load value from memory address in rbx, add 10, store back
  2. Control Flow

    • Write assembly for: if (x > 5) x = x * 2;
    • Implement a loop that sums 1+2+3+...+10
    • Write assembly for: max = (a > b) ? a : b;
  3. Addressing Modes

    • Access 5th element of integer array (base in rbx, index in rcx)
    • Access field at offset 16 in a structure (base in rdi)
    • Calculate address: array[i].field where field is at offset 8

Intermediate Exercises

  1. Function Implementation

    • Write assembly function: int square(int n) { return n * n; }
    • Implement: int max3(int a, int b, int c) that returns largest
    • Write function to reverse a string in place
  2. Array Operations

    • Implement: int sum_array(int *arr, int len)
    • Write: void fill_array(int *arr, int len, int value)
    • Implement: int find_max(int *arr, int len)
  3. Translation

    • Convert this C to assembly: for(i=0; i<n; i++) sum += arr[i];
    • Translate: while (x > 0) { sum += x; x--; }
    • Convert: if (a > b && c < d) result = 1; else result = 0;

Advanced Exercises

  1. Recursion

    • Implement recursive fibonacci: fib(n) = fib(n-1) + fib(n-2)
    • Write recursive function to compute power: pow(base, exp)
    • Implement recursive array sum
  2. Optimization Analysis

    • Compare compiled output with -O0 vs -O2 for simple function
    • Identify what optimizations the compiler applied
    • Hand-optimize a loop to reduce instruction count
  3. Stack Manipulation

    • Write function with 10 local variables (use stack)
    • Implement function that calls another function with 8 arguments
    • Write function that uses both preserved and scratch registers
  4. Advanced Patterns

    • Implement switch statement with jump table
    • Write inline assembly for atomic compare-and-swap
    • Create assembly for a simple state machine

Summary

In this reading, we explored assembly language programming:

Key Concepts:

  • Assembly language bridges high-level code and machine code
  • Registers provide fast CPU storage with specific purposes
  • Addressing modes offer flexible ways to access memory
  • Control flow implements conditionals and loops
  • Stack manages function calls and local variables
  • Calling conventions define how functions communicate

Important Skills:

  • Reading and writing basic assembly code
  • Understanding instruction encoding and execution
  • Translating between C and assembly
  • Debugging at the assembly level
  • Recognizing compiler optimizations

Why This Matters:

  • Understand what your code actually does on the CPU
  • Debug performance issues and optimize critical code
  • Understand security vulnerabilities (buffer overflows, etc.)
  • Appreciate the complexity compilers handle
  • Foundation for reverse engineering and systems programming

Further Reading

  • "Programming from the Ground Up" by Jonathan Bartlett
  • Intel Software Developer Manuals (full reference)
  • "Computer Systems: A Programmer's Perspective" by Bryant & O'Hallaron
  • RISC-V and ARM assembly for comparison

Module Complete!

Congratulations! You've completed Module 5: Computer Architecture. You now understand:

  1. Number systems and binary representation
  2. Logic gates and digital circuits
  3. CPU architecture and operation
  4. Memory hierarchy and caching
  5. Assembly language programming

This foundation will serve you well in systems programming, performance optimization, embedded systems, and understanding how computers work at the lowest levels.

Next Steps:

  • Build a simple CPU simulator
  • Write performance-critical code in assembly
  • Explore computer organization in more depth
  • Study operating systems to see how software manages hardware

Module 5: Computer Architecture | Reading 5 of 5
