Assembly Language Basics
Learning Objectives
By the end of this reading, you will be able to:
- Understand the relationship between assembly, machine code, and high-level languages
- Read and write basic assembly language programs
- Understand registers, instructions, and addressing modes
- Implement control flow (conditionals, loops) in assembly
- Understand the stack and function calling conventions
- Translate simple C code to assembly
- Debug and analyze compiled code
- Appreciate the role of compilers and optimization
Introduction
Assembly language is a low-level programming language with a strong correspondence between instructions and machine code. Each assembly instruction typically corresponds to one machine code instruction.
Why learn assembly?
- Understand what the CPU actually does
- Debug optimized or compiled code
- Write performance-critical code
- Understand security vulnerabilities
- Bridge the gap between hardware and software
- Appreciate what compilers do for you
The Language Hierarchy
┌────────────────────────────────────┐
│ High-Level Language (C, Python) │ ← Abstraction, portability
└──────────────┬─────────────────────┘
│ Compiler/Interpreter
▼
┌────────────────────────────────────┐
│ Assembly Language (asm) │ ← Human-readable mnemonics
└──────────────┬─────────────────────┘
│ Assembler
▼
┌────────────────────────────────────┐
│ Machine Code (binary) │ ← CPU executes this
└────────────────────────────────────┘
Example:
C: a = b + c;
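Assembly: mov rax, rbx ; rax = b (assuming a in rax, b in rbx, c in rcx)
add rax, rcx ; rax = b + c (x86-64 add takes two operands, not three)
Machine: 48 89 D8 48 01 C8 (hex, three bytes per instruction)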
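To make the assembler step a little more concrete, here is a minimal Python sketch of how a register-to-register instruction could be turned into bytes. It covers only two opcodes and the eight original registers, and the helper name encode_reg_reg is invented for this illustration; a real assembler handles far more cases.

```python
# Register numbers used in x86-64 encodings (low eight registers only).
REG = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
       "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}

# Opcodes for the "r/m64, r64" forms of two instructions.
OPCODE = {"mov": 0x89, "add": 0x01}


def encode_reg_reg(mnemonic, dst, src):
    """Encode `mnemonic dst, src` for two 64-bit registers (hypothetical helper)."""
    rex_w = 0x48                               # REX prefix with W=1: 64-bit operands
    modrm = 0xC0 | (REG[src] << 3) | REG[dst]  # mod=11 (register), reg=src, rm=dst
    return bytes([rex_w, OPCODE[mnemonic], modrm])


# a = b + c, assuming a lives in rax, b in rbx, and c in rcx
code = encode_reg_reg("mov", "rax", "rbx") + encode_reg_reg("add", "rax", "rcx")
print(code.hex(" "))  # 48 89 d8 48 01 c8
```

Each instruction here is three bytes: a prefix, an opcode, and one byte naming the two registers.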
Choosing an Assembly Language
Different CPUs have different instruction sets. Common architectures:
x86-64 (Intel/AMD)
- CISC architecture
- Variable-length instructions
- Used in most desktop/laptop computers
- Complex but powerful
ARM
- RISC architecture
- Fixed-length instructions (usually)
- Used in mobile devices, Apple Silicon
- Simple and efficient
MIPS
- RISC architecture
- Educational favorite
- Clean, simple design
RISC-V
- Modern open-source RISC
- Growing in popularity
This reading uses x86-64 as the primary example (with RISC concepts explained for comparison).
x86-64 Registers
Registers are the CPU's fastest storage locations.
General-Purpose Registers (64-bit)
┌─────────────────────────────────────────────────────┐
│ RAX │ 64-bit
├───────────────────────────┬─────────────────────────┤
│ │ EAX │ 32-bit
│ ├─────────────┬───────────┤
│ │ │ AX │ 16-bit
│ │ ├─────┬─────┤
│ │ │ AH │ AL │ 8-bit
└───────────────────────────┴─────────────┴─────┴─────┘
Register Names (64-bit / 32-bit / 16-bit / 8-bit):
RAX / EAX / AX / AL - Accumulator (arithmetic operations)
RBX / EBX / BX / BL - Base (memory addressing)
RCX / ECX / CX / CL - Counter (loop counters)
RDX / EDX / DX / DL - Data (I/O operations, arithmetic)
RSI / ESI / SI / SIL - Source Index (string/memory operations)
RDI / EDI / DI / DIL - Destination Index (string/memory operations)
RBP / EBP / BP / BPL - Base Pointer (stack frame base)
RSP / ESP / SP / SPL - Stack Pointer (top of stack)
R8-R15 - Additional general-purpose (64-bit mode)
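The smaller register names are views onto the same 64-bit register, not separate storage. The Python sketch below models that with plain bit masks; the function names are made up for this example. It also shows one rule that often surprises beginners: writing a 32-bit register clears the upper half of the 64-bit register, while writing a 16-bit register leaves the upper bits alone.

```python
MASK64 = (1 << 64) - 1


def views(rax):
    """Return the sub-register views of a 64-bit RAX value."""
    return {
        "rax": rax & MASK64,
        "eax": rax & 0xFFFFFFFF,   # low 32 bits
        "ax": rax & 0xFFFF,        # low 16 bits
        "ah": (rax >> 8) & 0xFF,   # bits 8-15
        "al": rax & 0xFF,          # low 8 bits
    }


def write_eax(rax, value):
    """Writing a 32-bit register zero-extends into the full 64-bit register."""
    return value & 0xFFFFFFFF


def write_ax(rax, value):
    """Writing a 16-bit register keeps the upper 48 bits unchanged."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)


v = views(0x1122334455667788)
print({name: hex(value) for name, value in v.items()})
```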
Special Registers
RIP - Instruction Pointer (program counter)
RFLAGS - Status flags
Flags Register (selected bits of RFLAGS, highest to lowest):
Bit 11 - OF - Overflow Flag
Bit 10 - DF - Direction Flag
Bit 9 - IF - Interrupt Enable Flag
Bit 8 - TF - Trap Flag
Bit 7 - SF - Sign Flag
Bit 6 - ZF - Zero Flag
Bit 0 - CF - Carry Flag
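The four arithmetic flags are easiest to understand by computing them by hand. The following Python sketch models how a 64-bit add would set CF, ZF, SF, and OF. The function name add_with_flags is an assumption for this example, and the model ignores the other flags.

```python
BITS = 64
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def add_with_flags(a, b):
    """Add two 64-bit values; return (result, flags) as `add` would set them."""
    a &= MASK
    b &= MASK
    full = a + b
    result = full & MASK
    flags = {
        "CF": int(full > MASK),          # carry out of bit 63 (unsigned overflow)
        "ZF": int(result == 0),          # result is zero
        "SF": int(bool(result & SIGN)),  # copy of the result's top bit
        # Signed overflow: both operands share a sign that the result lacks
        "OF": int(bool(~(a ^ b) & (a ^ result) & SIGN)),
    }
    return result, flags


print(add_with_flags(MASK, 1))      # unsigned wraparound: CF and ZF set
print(add_with_flags(SIGN - 1, 1))  # largest signed value + 1: OF and SF set
```

Note that CF and OF answer different questions: CF reports overflow when the operands are read as unsigned, OF when they are read as signed.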
Basic Assembly Syntax
Intel vs AT&T Syntax
Two common syntaxes for x86 assembly:
Intel Syntax (we'll use this):
mov rax, 5 ; Destination, Source
add rax, rbx ; rax = rax + rbx
AT&T Syntax:
movq $5, %rax # Source, Destination (AT&T comments start with #)
addq %rbx, %rax # rax = rax + rbx
Instruction Format
label: mnemonic operands ; comment
example:
mov rax, 10 ; Load 10 into rax
loop_start:
add rax, rbx ; Add rbx to rax
Data Types and Sizes
BYTE - 8 bits (1 byte)
WORD - 16 bits (2 bytes)
DWORD - 32 bits (4 bytes)
QWORD - 64 bits (8 bytes)
Data Movement Instructions
MOV - Move Data
; Immediate to register
mov rax, 42 ; rax = 42
; Register to register
mov rbx, rax ; rbx = rax
; Memory to register
mov rax, [rbx] ; rax = memory[rbx]
; Register to memory
mov [rbx], rax ; memory[rbx] = rax
; Immediate to memory
mov QWORD PTR [rbx], 100 ; memory[rbx] = 100
Note: Cannot move directly from memory to memory. Use a register as intermediate.
PUSH/POP - Stack Operations
push rax ; Push rax onto stack
; rsp = rsp - 8
; memory[rsp] = rax
pop rbx ; Pop from stack into rbx
; rbx = memory[rsp]
; rsp = rsp + 8
LEA - Load Effective Address
lea rax, [rbx + rcx*4 + 8] ; rax = rbx + rcx*4 + 8
; (computes address without accessing memory)
Arithmetic Instructions
Basic Arithmetic
add rax, rbx ; rax = rax + rbx
sub rax, rbx ; rax = rax - rbx
inc rax ; rax = rax + 1
dec rax ; rax = rax - 1
neg rax ; rax = -rax
; Immediate values
add rax, 10 ; rax = rax + 10
sub rbx, 5 ; rbx = rbx - 5
Multiplication and Division
; Unsigned multiplication
mul rbx ; rdx:rax = rax * rbx
; (128-bit result in rdx:rax)
; Signed multiplication
imul rbx ; rdx:rax = rax * rbx (signed)
imul rax, rbx ; rax = rax * rbx (result in rax only)
imul rax, rbx, 10 ; rax = rbx * 10
; Unsigned division
div rbx ; rax = rdx:rax / rbx (quotient)
; rdx = rdx:rax % rbx (remainder)
; Signed division
idiv rbx ; rax = rdx:rax / rbx (quotient)
; rdx = rdx:rax % rbx (remainder)
Division Setup:
; To divide rax by rbx (unsigned):
xor rdx, rdx ; Clear rdx (high 64 bits of the dividend)
div rbx ; rax = quotient, rdx = remainder
; For signed division, use cqo (sign-extend rax into rdx) before idiv
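The rdx:rax register pair is the part of mul and div that most often causes bugs, so here is a small Python model of the unsigned forms. The function names are chosen for this sketch only. It also shows why rdx must be cleared first: if the quotient does not fit in 64 bits, the CPU raises a divide error, which the model represents as an exception.

```python
MASK64 = (1 << 64) - 1


def mul(rax, operand):
    """Unsigned `mul operand`: return (rdx, rax) holding the 128-bit product."""
    product = (rax & MASK64) * (operand & MASK64)
    return product >> 64, product & MASK64


def div(rdx, rax, operand):
    """Unsigned `div operand`: return (rax, rdx) = (quotient, remainder)."""
    if operand == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    dividend = (rdx << 64) | rax
    quotient, remainder = divmod(dividend, operand)
    if quotient > MASK64:
        # The hardware faults when the quotient does not fit in rax
        raise OverflowError("divide error: quotient does not fit in 64 bits")
    return quotient, remainder


print(mul(1 << 63, 4))  # (2, 0): the product spills into rdx
print(div(0, 17, 5))    # (3, 2)
```

Leaving stale data in rdx makes the dividend enormous, which is why forgetting the xor rdx, rdx step typically shows up as a crash rather than a wrong answer.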
Logical and Bitwise Instructions
and rax, rbx ; rax = rax & rbx
or rax, rbx ; rax = rax | rbx
xor rax, rbx ; rax = rax ^ rbx
not rax ; rax = ~rax
; Useful idiom: zero a register
xor rax, rax ; rax = 0 (shorter encoding than mov rax, 0; also modifies flags)
; Bit shifts
shl rax, 2 ; rax = rax << 2 (logical left)
shr rax, 2 ; rax = rax >> 2 (logical right)
sal rax, 2 ; rax = rax << 2 (arithmetic left; same operation as shl)
sar rax, 2 ; rax = rax >> 2 (arithmetic right, preserves sign)
; Bit rotation
rol rax, 3 ; Rotate left 3 bits
ror rax, 3 ; Rotate right 3 bits
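The difference between the shift and rotate variants is clearest on a value with its top bit set. This Python sketch models the five operations on 64-bit values; it assumes a shift count between 0 and 63 and does not model the flags.

```python
BITS = 64
MASK = (1 << BITS) - 1


def shl(x, n):
    """Logical left shift: bits shifted past bit 63 are lost."""
    return (x << n) & MASK


def shr(x, n):
    """Logical right shift: zeros enter from the left."""
    return (x & MASK) >> n


def sar(x, n):
    """Arithmetic right shift: copies of the sign bit enter from the left."""
    x &= MASK
    if x >> (BITS - 1):
        x -= 1 << BITS  # reinterpret the bit pattern as a negative number
    return (x >> n) & MASK


def rol(x, n):
    """Rotate left: bits leaving the top re-enter at the bottom."""
    x &= MASK
    n %= BITS
    return ((x << n) | (x >> (BITS - n))) & MASK


def ror(x, n):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    x &= MASK
    n %= BITS
    return ((x >> n) | (x << (BITS - n))) & MASK


minus_8 = MASK - 7  # bit pattern of -8 in 64-bit two's complement
print(hex(shr(minus_8, 2)))  # 0x3ffffffffffffffe (large positive number)
print(hex(sar(minus_8, 2)))  # 0xfffffffffffffffe (bit pattern of -2)
```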
Addressing Modes
How to specify operands:
1. Immediate
mov rax, 42 ; Load constant 42
2. Register
mov rax, rbx ; Copy from rbx
3. Direct Memory
mov rax, [0x1000] ; rax = memory[0x1000]
mov rax, [myvar] ; rax = memory[myvar]
4. Register Indirect
mov rax, [rbx] ; rax = memory[rbx]
5. Base + Offset
mov rax, [rbx + 8] ; rax = memory[rbx + 8]
6. Indexed
mov rax, [rbx + rcx*4] ; rax = memory[rbx + rcx*4]
; Scale factor: 1, 2, 4, or 8
7. Base + Index + Offset
mov rax, [rbx + rcx*4 + 8] ; rax = memory[rbx + rcx*4 + 8]
Common Use Cases:
; Array access: array[i]
mov rax, [array_base + rcx*8] ; rcx = index, 8 = sizeof(element)
; Structure field: struct.field
mov rax, [rbx + 16] ; rbx = struct base, 16 = field offset
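; Array of structs: array[i].field
; The scale factor must be 1, 2, 4, or 8, so scale the index by sizeof(struct) first
mov rdx, rcx ; rdx = i
shl rdx, 5 ; rdx = i * 32 (32 = sizeof(struct))
mov rax, [rbx + rdx + 16] ; rbx = array base, 16 = field offset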
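All of the memory addressing modes reduce to one formula: base + index * scale + displacement, where the scale is limited to 1, 2, 4, or 8. The Python sketch below computes that formula and rejects illegal scales. The function names are invented for this example. For element sizes that are not a legal scale, such as a 32-byte struct, the index has to be multiplied in a separate instruction first.

```python
MASK64 = (1 << 64) - 1


def effective_address(base=0, index=0, scale=1, disp=0):
    """Compute base + index*scale + disp the way the address unit would."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK64


def struct_array_field(base, i, struct_size, field_offset):
    """Address of array[i].field for an arbitrary struct size."""
    scaled_index = i * struct_size  # done by a separate imul or shl in real code
    return effective_address(base=base, index=scaled_index, scale=1,
                             disp=field_offset)


print(hex(effective_address(base=0x1000, index=3, scale=4, disp=8)))  # 0x1014
print(hex(struct_array_field(0x2000, 2, 32, 16)))                     # 0x2050
```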
Control Flow
Comparison
cmp rax, rbx ; Compare rax and rbx (computes rax - rbx)
; Sets flags, doesn't store result
test rax, rbx ; Computes rax & rbx
; Sets flags, doesn't store result
; Common idiom: test if zero
test rax, rax ; Is rax zero?
Unconditional Jump
jmp label ; Jump to label
Conditional Jumps
Based on comparison results:
je label ; Jump if equal (ZF=1)
jne label ; Jump if not equal (ZF=0)
jz label ; Jump if zero (same as je)
jnz label ; Jump if not zero (same as jne)
; Signed comparisons
jg label ; Jump if greater (signed >)
jge label ; Jump if greater or equal (signed >=)
jl label ; Jump if less (signed <)
jle label ; Jump if less or equal (signed <=)
; Unsigned comparisons
ja label ; Jump if above (unsigned >)
jae label ; Jump if above or equal (unsigned >=)
jb label ; Jump if below (unsigned <)
jbe label ; Jump if below or equal (unsigned <=)
; Other flags
jc label ; Jump if carry (CF=1)
jo label ; Jump if overflow (OF=1)
js label ; Jump if sign (SF=1, negative)
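Each conditional jump is just a test on the flags left behind by cmp. The Python sketch below spells out the flag conditions for the jumps listed above, so the signed and unsigned families can be compared on the same inputs. The names cmp_flags, CONDITIONS, and taken are assumptions made for this illustration.

```python
BITS = 64
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def cmp_flags(a, b):
    """Flags produced by `cmp a, b` (computes a - b, discards the result)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "ZF": result == 0,
        "SF": bool(result & SIGN),
        "CF": a < b,                                # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & SIGN),  # signed overflow
    }


CONDITIONS = {
    "je":  lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jg":  lambda f: not f["ZF"] and f["SF"] == f["OF"],
    "jge": lambda f: f["SF"] == f["OF"],
    "jl":  lambda f: f["SF"] != f["OF"],
    "jle": lambda f: f["ZF"] or f["SF"] != f["OF"],
    "ja":  lambda f: not f["CF"] and not f["ZF"],
    "jae": lambda f: not f["CF"],
    "jb":  lambda f: f["CF"],
    "jbe": lambda f: f["CF"] or f["ZF"],
}


def taken(jump, a, b):
    """Would `jump` be taken right after `cmp a, b`?"""
    return bool(CONDITIONS[jump](cmp_flags(a, b)))


minus_1 = MASK  # the same bits mean -1 (signed) or 2**64 - 1 (unsigned)
print(taken("jl", minus_1, 1))  # True: -1 < 1 when read as signed
print(taken("ja", minus_1, 1))  # True: 2**64 - 1 > 1 when read as unsigned
```

The last two lines show why picking the wrong jump family is a classic bug: the same comparison gives opposite answers depending on how the bits are interpreted.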
Examples: If-Else
C Code:
if (a > b) {
c = a;
} else {
c = b;
}
Assembly:
mov rax, [a] ; Load a
mov rbx, [b] ; Load b
cmp rax, rbx ; Compare a and b
jle else_block ; If a <= b, goto else
; Then block (a > b)
mov [c], rax ; c = a
jmp end_if ; Skip else block
else_block:
mov [c], rbx ; c = b
end_if:
; Continue...
Examples: While Loop
C Code:
int sum = 0;
int i = 0;
while (i < 10) {
sum += i;
i++;
}
Assembly:
xor rax, rax ; sum = 0
xor rbx, rbx ; i = 0
loop_start:
cmp rbx, 10 ; Compare i with 10
jge loop_end ; If i >= 10, exit loop
add rax, rbx ; sum += i
inc rbx ; i++
jmp loop_start ; Repeat
loop_end:
; rax contains sum
Examples: For Loop
C Code:
int sum = 0;
for (int i = 0; i < 10; i++) {
sum += i;
}
Assembly:
xor rax, rax ; sum = 0
xor rcx, rcx ; i = 0
for_loop:
cmp rcx, 10 ; Compare i with 10
jge for_end ; If i >= 10, exit
add rax, rcx ; sum += i
inc rcx ; i++
jmp for_loop ; Continue loop
for_end:
; rax contains sum
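To see how labels, comparisons, and jumps cooperate, it can help to step through the loop mechanically. The Python sketch below is a deliberately tiny interpreter that understands only the six instructions used in the for loop above. It is an illustration under simplifying assumptions: registers are unbounded Python integers, and the flags are reduced to a single remembered difference.

```python
def run(program, regs):
    """Run a list of 'label:' and 'op operands' lines; return the registers."""
    labels = {line.rstrip(":"): i
              for i, line in enumerate(program) if line.endswith(":")}
    pc = 0
    last_cmp = 0  # stands in for the flags: remembers lhs - rhs

    def value(token):
        return regs[token] if token in regs else int(token)

    while pc < len(program):
        line = program[pc]
        pc += 1  # like RIP, pc already points at the next instruction
        if line.endswith(":"):
            continue  # labels mark positions; they are not instructions
        op, *args = line.replace(",", " ").split()
        if op == "xor":
            regs[args[0]] ^= value(args[1])
        elif op == "add":
            regs[args[0]] += value(args[1])
        elif op == "inc":
            regs[args[0]] += 1
        elif op == "cmp":
            last_cmp = value(args[0]) - value(args[1])
        elif op == "jge":
            if last_cmp >= 0:
                pc = labels[args[0]]
        elif op == "jmp":
            pc = labels[args[0]]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs


program = [
    "xor rax, rax",
    "xor rcx, rcx",
    "for_loop:",
    "cmp rcx, 10",
    "jge for_end",
    "add rax, rcx",
    "inc rcx",
    "jmp for_loop",
    "for_end:",
]
result = run(program, {"rax": 123, "rcx": 456})  # start with junk values
print(result)  # {'rax': 45, 'rcx': 10}
```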
The Stack
The stack is a LIFO (Last In, First Out) data structure used for:
- Function call management
- Local variables
- Temporary storage
Stack Growth
Important: On x86-64, the stack grows downward (toward lower addresses).
High Addresses
↑
│
┌───┴────┐ ← Old RSP
│ Data │
├────────┤
│ Data │
├────────┤ ← Current RSP (after push)
│ │
│ │
└────────┘
↓
Low Addresses
Stack Operations
; Push (add to stack)
push rax
; Equivalent to:
; sub rsp, 8
; mov [rsp], rax
; Pop (remove from stack)
pop rbx
; Equivalent to:
; mov rbx, [rsp]
; add rsp, 8
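The push and pop equivalences above can be modeled directly. The Python sketch below treats memory as a byte array and rsp as an index into it. The class name Stack and its default size are arbitrary choices for this example, and it performs no bounds checking.

```python
import struct


class Stack:
    """Toy model of the x86-64 stack: a bytearray plus an rsp index."""

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.rsp = size  # empty stack: rsp starts at the highest address

    def push(self, value):
        self.rsp -= 8                                          # sub rsp, 8
        struct.pack_into("<Q", self.memory, self.rsp, value)   # mov [rsp], value

    def pop(self):
        (value,) = struct.unpack_from("<Q", self.memory, self.rsp)  # mov value, [rsp]
        self.rsp += 8                                          # add rsp, 8
        return value


stack = Stack()
stack.push(0x1111)
stack.push(0x2222)
print(stack.rsp)         # 48: two pushes moved rsp down by 16 bytes
print(hex(stack.pop()))  # 0x2222: last in, first out
```

Note that pop does not erase anything: the old value stays in memory below rsp until a later push overwrites it.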
Functions and Calling Conventions
x86-64 System V Calling Convention (Linux/macOS)
Parameter Passing:
First 6 integer arguments: RDI, RSI, RDX, RCX, R8, R9
Additional arguments: Stack (right to left)
Return value: RAX
Floating-point arguments: XMM0-XMM7 (floating-point return value: XMM0)
Preserved Registers (callee must save/restore):
RBX, RBP, R12-R15
Scratch Registers (caller must save if needed):
RAX, RCX, RDX, RSI, RDI, R8-R11
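As a quick way to check where each argument ends up, the rule for integer arguments can be written as a short function. This Python sketch handles integer and pointer arguments only, and the function name is hypothetical. Stack locations are given as seen by the callee on entry, when [rsp] holds the return address.

```python
INTEGER_ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def assign_integer_args(args):
    """Map integer argument names to registers, then to stack slots."""
    locations = {}
    for position, arg in enumerate(args):
        if position < len(INTEGER_ARG_REGISTERS):
            locations[arg] = INTEGER_ARG_REGISTERS[position]
        else:
            # Stack arguments are pushed right to left, so the 7th argument
            # sits just above the return address, at [rsp + 8] on entry.
            offset = 8 * (position - len(INTEGER_ARG_REGISTERS) + 1)
            locations[arg] = f"[rsp + {offset}]"
    return locations


for name, place in assign_integer_args(list("abcdefgh")).items():
    print(name, "->", place)
```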
Function Prologue and Epilogue
Function Structure:
my_function:
; Prologue (setup)
push rbp ; Save old base pointer
mov rbp, rsp ; Set new base pointer
sub rsp, 32 ; Allocate 32 bytes for locals
; Function body
; ...
; Epilogue (cleanup)
mov rsp, rbp ; Restore stack pointer
pop rbp ; Restore base pointer
ret ; Return
Calling a Function
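; Call: result = add_numbers(5, 10)
; (add is also an instruction mnemonic, so the label uses a distinct name
; to avoid a clash in assemblers such as NASM)
mov rdi, 5 ; First argument
mov rsi, 10 ; Second argument
call add_numbers ; Call function
; Return value in rax
; Function definition
add_numbers:
push rbp
mov rbp, rsp
mov rax, rdi ; rax = first argument
add rax, rsi ; rax += second argument
pop rbp
ret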
Stack Frame
High Addresses
┌─────────────────┐
│ Arguments │
│ (if > 6) │
├─────────────────┤
│ Return Address │ ← Pushed by call instruction
├─────────────────┤
│ Old RBP │ ← Pushed by function prologue
├─────────────────┤ ← RBP points here
│ Local Var 1 │
├─────────────────┤
│ Local Var 2 │
├─────────────────┤
│ Local Var 3 │
├─────────────────┤ ← RSP points here
│ │
Low Addresses
Example: Factorial Function
C Code:
int factorial(int n) {
if (n <= 1)
return 1;
else
return n * factorial(n - 1);
}
Assembly:
factorial:
push rbp
mov rbp, rsp
; Check base case: n <= 1
cmp rdi, 1
jg recursive_case
base_case:
mov rax, 1 ; return 1
pop rbp
ret
recursive_case:
push rdi ; Save n
dec rdi ; n - 1
call factorial ; factorial(n-1)
pop rdi ; Restore n
imul rax, rdi ; rax = n * factorial(n-1)
pop rbp
ret
Complete Program Example
Simple Addition Program
; sum.asm - Add two numbers
section .data
num1 dq 15 ; 64-bit integer
num2 dq 27
msg db "Result: ", 0 ; String (null-terminated)
section .bss
result resq 1 ; Reserve 1 qword for result
section .text
global _start
_start:
; Load numbers
mov rax, [num1]
mov rbx, [num2]
; Add
add rax, rbx
; Store result
mov [result], rax
; Exit program (Linux syscall)
mov rax, 60 ; syscall: exit
xor rdi, rdi ; status: 0
syscall
Array Sum Example
C Code:
int sum_array(int *arr, int len) {
int sum = 0;
for (int i = 0; i < len; i++) {
sum += arr[i];
}
return sum;
}
Assembly:
sum_array:
push rbp
mov rbp, rsp
xor eax, eax ; sum = 0 (writing eax also clears the upper half of rax)
xor ecx, ecx ; i = 0 (likewise clears the upper half of rcx)
loop_start:
cmp ecx, esi ; Compare i with len (both are 32-bit ints)
jge loop_end ; If i >= len, exit
add eax, [rdi + rcx*4] ; sum += arr[i] (4-byte int)
inc ecx ; i++
jmp loop_start
loop_end:
pop rbp
ret ; Return sum in eax
Common Patterns and Idioms
1. Clear a Register
xor rax, rax ; Shorter encoding than mov rax, 0 (note: modifies flags)
2. Multiply/Divide by Power of 2
; Multiply by 8
shl rax, 3 ; rax *= 2^3 = 8
; Divide by 4 (unsigned)
shr rax, 2 ; rax /= 2^2 = 4
; Divide by 4 (signed)
sar rax, 2 ; Preserves sign; rounds toward negative infinity (C's / rounds toward zero)
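One caveat with the signed shift: an arithmetic right shift rounds toward negative infinity, while integer division in C truncates toward zero, so the two disagree for negative values that are not exact multiples. The Python sketch below shows the mismatch and the usual correction, which adds a bias of 2^k - 1 to negative inputs before shifting. Python's >> on integers is already an arithmetic shift, so it stands in for sar here; the helper names are invented for this example.

```python
def sar(n, k):
    """Arithmetic right shift (Python's >> behaves this way for ints)."""
    return n >> k


def c_divide(n, d):
    """Integer division as C defines it: truncate toward zero."""
    q = abs(n) // abs(d)
    return q if (n < 0) == (d < 0) else -q


def sar_divide(n, k):
    """Divide by 2**k with C semantics: bias negative inputs, then shift."""
    bias = (1 << k) - 1 if n < 0 else 0
    return (n + bias) >> k


print(sar(-7, 2))         # -2: the plain shift rounds down
print(c_divide(-7, 4))    # -1: C truncates toward zero
print(sar_divide(-7, 2))  # -1: the biased shift matches C
```

This is why optimized compiler output for a signed division by a power of two usually contains a few extra instructions around the shift.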
3. Swap Two Registers (without temp)
xor rax, rbx
xor rbx, rax
xor rax, rbx ; rax and rbx swapped
4. Conditional Move (avoid branching)
; Modern alternative to conditional jump
cmp rax, rbx
cmovg rax, rcx ; if (rax > rbx) rax = rcx
5. Check if Number is Even
test rax, 1 ; Check lowest bit
jz is_even ; If zero, number is even
Reading Compiler Output
Compiling with GCC
# Compile to assembly
gcc -S -O0 program.c # No optimization (AT&T syntax by default)
gcc -S -O2 program.c # Optimized
gcc -S -O2 -masm=intel program.c # Optimized, Intel syntax
# Compile to object file
gcc -c program.c
# Disassemble object file
objdump -d program.o
# View with Intel syntax
objdump -d -M intel program.o
Example: Compiler Optimization
C Code:
int add(int a, int b) {
return a + b;
}
Assembly (no optimization):
add:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov edx, DWORD PTR [rbp-4]
mov eax, DWORD PTR [rbp-8]
add eax, edx
pop rbp
ret
Assembly (optimized):
add:
lea eax, [rdi+rsi]
ret
Debugging Assembly
GDB (GNU Debugger)
# Compile with debug symbols
gcc -g program.c -o program
# Run in GDB
gdb program
# GDB commands:
break main # Set breakpoint (use _start for a standalone assembly program)
run # Start program
stepi # Step one instruction
info registers # Show all registers
print $rax # Print rax value
x/10x $rsp # Examine stack (10 hex values)
disassemble # Show assembly around current instruction
Common Debugging Tasks
1. Examine register values:
(gdb) info registers
(gdb) print/x $rax # Print in hex
(gdb) print/d $rbx # Print in decimal
2. Examine memory:
(gdb) x/4xg $rsp # 4 qwords in hex
(gdb) x/s $rdi # String
(gdb) x/10i $rip # 10 instructions
3. Set breakpoints:
(gdb) break *0x401234 # Break at address
(gdb) break my_function # Break at function
Inline Assembly in C
GCC Inline Assembly
int add(int a, int b) {
int result;
asm("addl %2, %0" // AT&T syntax: result += b
: "=r" (result) // Output
: "0" (a), "r" (b) // Inputs: "0" places a in the same register as result
: "cc" // Clobbers: addl modifies the flags
);
return result;
}
Volatile (don't optimize):
asm volatile("nop"); // No operation (don't remove)
RISC Comparison: MIPS Example
For contrast, here's MIPS (classic RISC):
x86-64 vs MIPS:
Operation          x86-64             MIPS
─────────────────────────────────────────────────────────
Add                add rax, rbx       add $t0, $t1, $t2
Load               mov rax, [rbx]     lw $t0, 0($t1)
Store              mov [rbx], rax     sw $t0, 0($t1)
Branch if equal    je label           beq $t0, $t1, label
Jump               jmp label          j label
MIPS Characteristics:
- Fixed 32-bit instruction length
- Load/store architecture (only load/store access memory)
- 3-operand format: add $dest, $src1, $src2
- Simpler, more regular
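The fixed 32-bit format makes MIPS instructions easy to encode by hand. As a rough sketch, the register-register (R-type) layout packs six fields into one word: opcode (6 bits), rs (5), rt (5), rd (5), shamt (5), and funct (6). The Python below encodes the add from the table above; the helper name and the abbreviated register table are assumptions for this example.

```python
# A few MIPS register numbers (the full table has 32 entries).
MIPS_REGISTERS = {"$zero": 0, "$t0": 8, "$t1": 9, "$t2": 10}

ADD_FUNCT = 0x20  # funct field that selects `add` among R-type instructions


def encode_r_type(rd, rs, rt, funct, shamt=0, opcode=0):
    """Pack the R-type fields: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


# add $t0, $t1, $t2  ->  rd = $t0, rs = $t1, rt = $t2
word = encode_r_type(rd=MIPS_REGISTERS["$t0"],
                     rs=MIPS_REGISTERS["$t1"],
                     rt=MIPS_REGISTERS["$t2"],
                     funct=ADD_FUNCT)
print(f"{word:08x}")  # 012a4020
```

Every instruction is exactly one such word, which is what lets a RISC decoder find instruction boundaries without examining the bytes first.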
Programming Exercises
Python: Assembly Simulator
class SimpleAssembler:
    """A tiny simulator for a made-up machine with eight 32-bit registers."""

    def __init__(self):
        self.registers = {f'r{i}': 0 for i in range(8)}
        self.flags = {'Z': 0, 'N': 0, 'C': 0}
        self.memory = [0] * 1024
        self.pc = 0

    def execute(self, instruction):
        """Execute a single instruction"""
        parts = instruction.split()
        opcode = parts[0]
        if opcode == 'mov':
            dest, src = parts[1].rstrip(','), parts[2]
            self.registers[dest] = self._get_value(src)
        elif opcode == 'add':
            dest, src = parts[1].rstrip(','), parts[2]
            result = self.registers[dest] + self._get_value(src)
            self.registers[dest] = result & 0xFFFFFFFF
            self._update_flags(result)
        elif opcode == 'sub':
            dest, src = parts[1].rstrip(','), parts[2]
            result = self.registers[dest] - self._get_value(src)
            self.registers[dest] = result & 0xFFFFFFFF
            self._update_flags(result)
        elif opcode == 'cmp':
            reg1, reg2 = parts[1].rstrip(','), parts[2]
            result = self.registers[reg1] - self._get_value(reg2)
            self._update_flags(result)
        else:
            raise ValueError(f"Unknown instruction: {opcode}")

    def _get_value(self, operand):
        """Get value from register or immediate, as an unsigned 32-bit number"""
        if operand in self.registers:
            return self.registers[operand]
        return int(operand) & 0xFFFFFFFF

    def _update_flags(self, result):
        """Update status flags from the unmasked result of an add or subtract"""
        masked = result & 0xFFFFFFFF
        self.flags['Z'] = 1 if masked == 0 else 0
        self.flags['N'] = (masked >> 31) & 1  # Sign bit of the 32-bit result
        self.flags['C'] = 1 if result != masked else 0  # Carry out (add) or borrow (sub)

    def show_state(self):
        """Display current state"""
        print("Registers:", {k: v for k, v in self.registers.items() if v != 0})
        print("Flags:", self.flags)


# Example usage
asm = SimpleAssembler()
program = [
    "mov r0, 10",
    "mov r1, 20",
    "add r0, r1",
]
for instruction in program:
    print(f"Executing: {instruction}")
    asm.execute(instruction)
    asm.show_state()
    print()
Exercises
Basic Exercises
Simple Instructions
- Write assembly to swap values of rax and rbx (using a temp register)
- Calculate: (a + b) * c using assembly instructions
- Load value from memory address in rbx, add 10, store back
Control Flow
- Write assembly for: if (x > 5) x = x * 2;
- Implement a loop that sums 1+2+3+...+10
- Write assembly for: max = (a > b) ? a : b;
Addressing Modes
- Access 5th element of integer array (base in rbx, index in rcx)
- Access field at offset 16 in a structure (base in rdi)
- Calculate address: array[i].field where field is at offset 8
Intermediate Exercises
Function Implementation
- Write assembly function: int square(int n) { return n * n; }
- Implement: int max3(int a, int b, int c) that returns largest
- Write function to reverse a string in place
Array Operations
- Implement: int sum_array(int *arr, int len)
- Write: void fill_array(int *arr, int len, int value)
- Implement: int find_max(int *arr, int len)
Translation
- Convert this C to assembly: for(i=0; i<n; i++) sum += arr[i];
- Translate: while (x > 0) { sum += x; x--; }
- Convert: if (a > b && c < d) result = 1; else result = 0;
Advanced Exercises
Recursion
- Implement recursive fibonacci: fib(n) = fib(n-1) + fib(n-2)
- Write recursive function to compute power: pow(base, exp)
- Implement recursive array sum
Optimization Analysis
- Compare compiled output with -O0 vs -O2 for simple function
- Identify what optimizations the compiler applied
- Hand-optimize a loop to reduce instruction count
Stack Manipulation
- Write function with 10 local variables (use stack)
- Implement function that calls another function with 8 arguments
- Write function that uses both preserved and scratch registers
Advanced Patterns
- Implement switch statement with jump table
- Write inline assembly for atomic compare-and-swap
- Create assembly for a simple state machine
Summary
In this reading, we explored assembly language programming:
Key Concepts:
- Assembly language bridges high-level code and machine code
- Registers provide fast CPU storage with specific purposes
- Addressing modes offer flexible ways to access memory
- Control flow implements conditionals and loops
- Stack manages function calls and local variables
- Calling conventions define how functions communicate
Important Skills:
- Reading and writing basic assembly code
- Understanding instruction encoding and execution
- Translating between C and assembly
- Debugging at the assembly level
- Recognizing compiler optimizations
Why This Matters:
- Understand what your code actually does on the CPU
- Debug performance issues and optimize critical code
- Understand security vulnerabilities (buffer overflows, etc.)
- Appreciate the complexity compilers handle
- Foundation for reverse engineering and systems programming
Further Reading
- "Programming from the Ground Up" by Jonathan Bartlett
- Intel Software Developer Manuals (full reference)
- "Computer Systems: A Programmer's Perspective" by Bryant & O'Hallaron
- RISC-V and ARM assembly for comparison
Module Complete!
Congratulations! You've completed Module 5: Computer Architecture. You now understand:
- Number systems and binary representation
- Logic gates and digital circuits
- CPU architecture and operation
- Memory hierarchy and caching
- Assembly language programming
This foundation will serve you well in systems programming, performance optimization, embedded systems, and understanding how computers work at the lowest levels.
Next Steps:
- Build a simple CPU simulator
- Write performance-critical code in assembly
- Explore computer organization in more depth
- Study operating systems to see how software manages hardware
Module 5: Computer Architecture | Reading 5 of 5