CPU Architecture

Learning Objectives

By the end of this reading, you will be able to:

  • Understand the Von Neumann and Harvard architectures
  • Explain the fetch-decode-execute cycle
  • Identify the major components of a CPU: ALU, Control Unit, and Registers
  • Understand instruction set architecture (ISA) and machine code
  • Trace instruction execution through the CPU
  • Understand pipelining and its impact on performance
  • Recognize different CPU architectures (CISC vs RISC)

Introduction

The Central Processing Unit (CPU) is the "brain" of a computer. It executes instructions stored in memory, performing calculations, making decisions, and coordinating the entire system.

Understanding CPU architecture is crucial because:

  • It reveals how software ultimately executes on hardware
  • It helps you write more efficient code
  • It's fundamental to understanding computer performance
  • It bridges the gap between high-level languages and machine operation

Historical Context: Von Neumann Architecture

In 1945, John von Neumann proposed a computer architecture that became the foundation for modern computers.

Von Neumann Architecture Components

┌─────────────────────────────────────────────┐
│                                             │
│  ┌──────────────┐      ┌─────────────────┐ │
│  │              │      │                 │ │
│  │  Control     │◄────►│   Arithmetic    │ │
│  │  Unit        │      │   Logic Unit    │ │
│  │              │      │   (ALU)         │ │
│  └──────┬───────┘      └─────────────────┘ │
│         │                                   │
│         │              CPU                  │
│         │                                   │
│         ▼                                   │
│  ┌──────────────┐                          │
│  │  Registers   │                          │
│  └──────────────┘                          │
│                                             │
└─────────┬───────────────────────────────────┘
          │
          │ System Bus (Address, Data, Control)
          │
    ┌─────┴──────┐
    │            │
┌───▼──────┐  ┌──▼──────┐
│          │  │         │
│  Memory  │  │  Input/ │
│          │  │  Output │
│          │  │         │
└──────────┘  └─────────┘

Key Principles

  1. Stored-Program Concept: Both program instructions and data are stored in the same memory
  2. Sequential Execution: Instructions are fetched and executed one after another
  3. Binary Representation: All information is represented in binary
  4. Shared Bus: Single bus for both instructions and data (Von Neumann bottleneck)
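
The stored-program and sequential-execution principles can be illustrated in a few lines of Python. The tuple "opcodes" and memory layout below are invented for the sketch; the point is that instructions and data live in the same memory, and execution walks through it one address at a time.

```python
# Stored-program concept, minimal sketch: one memory holds both code and data.
memory = [
    ('LOAD', 0, 6),      # address 0: R0 <- Memory[6]
    ('LOAD', 1, 7),      # address 1: R1 <- Memory[7]
    ('ADD',  2, 0, 1),   # address 2: R2 <- R0 + R1
    ('HALT',),           # address 3: stop
    None, None,          # addresses 4-5: unused
    10, 20,              # addresses 6-7: data, in the SAME memory as code
]

regs, pc = [0] * 4, 0
while True:
    inst = memory[pc]    # fetch
    pc += 1              # sequential execution: advance to next address
    if inst[0] == 'HALT':
        break
    if inst[0] == 'LOAD':
        regs[inst[1]] = memory[inst[2]]
    if inst[0] == 'ADD':
        regs[inst[1]] = regs[inst[2]] + regs[inst[3]]

print(regs[2])  # 30
```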

Harvard Architecture

An alternative architecture that separates instruction and data memory:

         ┌─────────────┐
         │             │
         │     CPU     │
         │             │
         └──┬────────┬─┘
            │        │
    ┌───────┘        └────────┐
    │                         │
    │ Instruction Bus         │ Data Bus
    │                         │
┌───▼──────────┐      ┌───────▼─────┐
│              │      │             │
│ Instruction  │      │    Data     │
│   Memory     │      │   Memory    │
│              │      │             │
└──────────────┘      └─────────────┘

Advantages:

  • Can fetch an instruction and data simultaneously
  • Avoids the Von Neumann bottleneck

Harvard architecture is common in embedded systems and microcontrollers. Modern general-purpose CPUs typically use a modified Harvard architecture: separate L1 instruction and data caches, backed by a unified main memory.

Major CPU Components

1. Arithmetic Logic Unit (ALU)

The ALU performs arithmetic and logical operations.

Arithmetic Operations:

  • Addition, Subtraction
  • Multiplication, Division (sometimes separate unit)
  • Increment, Decrement

Logical Operations:

  • AND, OR, XOR, NOT
  • Bit shifting, rotation
  • Comparisons

ALU Structure:

       A (operand 1)    B (operand 2)
            │                │
            ▼                ▼
       ┌────────────────────────┐
       │                        │
       │        ALU             │
       │   ┌────────────┐       │
       │   │  Adder     │       │
       │   │  Logic     │       │  ◄─── Opcode (operation selector)
       │   │  Shifter   │       │
       │   │  etc.      │       │
       │   └────────────┘       │
       │                        │
       └───┬──────────────┬─────┘
           │              │
           ▼              ▼
        Result         Flags
                    (Z, C, N, V)

Flags (Status Register):

  • Zero (Z): Result is zero
  • Carry (C): Carry out from MSB
  • Negative (N): Result is negative
  • Overflow (V): Signed overflow occurred
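
As a sketch of how an ALU might derive these flags, here is an 8-bit addition in Python (the function name and flag encoding are illustrative, not taken from any real ISA):

```python
def alu_add8(a, b):
    """Add two 8-bit values and compute the Z, C, N, V status flags."""
    result = (a + b) & 0xFF                      # keep the low 8 bits
    flags = {
        'Z': int(result == 0),                   # result is zero
        'C': int(a + b > 0xFF),                  # carry out of the MSB
        'N': int(result & 0x80 != 0),            # sign bit set
        # Signed overflow: operands share a sign that the result lacks
        'V': int((~(a ^ b) & (a ^ result) & 0x80) != 0),
    }
    return result, flags

# 100 + 100 = 200, which overflows the signed 8-bit range (-128..127)
result, flags = alu_add8(100, 100)
print(result, flags)   # 200 {'Z': 0, 'C': 0, 'N': 1, 'V': 1}
```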

2. Control Unit (CU)

The Control Unit directs the operation of the processor.

Responsibilities:

  • Fetches instructions from memory
  • Decodes instructions
  • Generates control signals for other components
  • Coordinates timing using the clock

Control Signals:

Control Unit outputs:
├── Memory Read/Write
├── ALU operation select
├── Register select
├── Bus control
└── Interrupt handling

3. Registers

Registers are ultra-fast storage locations inside the CPU.

Common Registers:

┌──────────────────────────────────────────┐
│  General-Purpose Registers               │
│  ┌────┬────┬────┬────┬─────┬─────────┐  │
│  │ R0 │ R1 │ R2 │ R3 │ ... │ R15/16  │  │
│  └────┴────┴────┴────┴─────┴─────────┘  │
│                                          │
│  Special-Purpose Registers               │
│  ┌────────────────────────────────────┐ │
│  │ PC  - Program Counter              │ │
│  │ IR  - Instruction Register         │ │
│  │ MAR - Memory Address Register      │ │
│  │ MDR - Memory Data Register         │ │
│  │ SP  - Stack Pointer                │ │
│  │ SR  - Status Register (Flags)      │ │
│  └────────────────────────────────────┘ │
└──────────────────────────────────────────┘

Register Functions:

  • Program Counter (PC): Holds address of next instruction
  • Instruction Register (IR): Holds current instruction being executed
  • Memory Address Register (MAR): Holds memory address to read/write
  • Memory Data Register (MDR): Holds data read from or written to memory
  • Stack Pointer (SP): Points to top of stack
  • Accumulator: Stores intermediate results (some architectures)

The Fetch-Decode-Execute Cycle

The fundamental operating cycle of a CPU:

     ┌──────────┐
     │  FETCH   │ ← Retrieve instruction from memory
     └────┬─────┘
          │
          ▼
     ┌──────────┐
     │  DECODE  │ ← Interpret the instruction
     └────┬─────┘
          │
          ▼
     ┌──────────┐
     │ EXECUTE  │ ← Perform the operation
     └────┬─────┘
          │
          └──────► (repeat)

Detailed Cycle Steps

1. FETCH

1. Copy PC to MAR
2. Send memory read signal
3. Copy instruction from memory to MDR
4. Copy MDR to IR
5. Increment PC

2. DECODE

1. Control unit examines IR
2. Determines operation (opcode)
3. Identifies operands
4. Prepares control signals

3. EXECUTE

1. Activate appropriate functional units
2. Perform the operation (ALU, memory access, etc.)
3. Store result in register or memory
4. Update status flags if necessary

Example: ADD Instruction

Instruction: ADD R1, R2, R3 (R1 = R2 + R3)

Cycle:

┌─────────┬──────────────────────────────────────────┐
│ FETCH   │ 1. MAR ← PC (e.g., 0x1000)              │
│         │ 2. MDR ← Memory[MAR]                     │
│         │ 3. IR ← MDR (instruction loaded)         │
│         │ 4. PC ← PC + 4 (next instruction)        │
├─────────┼──────────────────────────────────────────┤
│ DECODE  │ 1. Opcode = ADD                          │
│         │ 2. Source1 = R2                          │
│         │ 3. Source2 = R3                          │
│         │ 4. Destination = R1                      │
├─────────┼──────────────────────────────────────────┤
│ EXECUTE │ 1. Read values from R2 and R3            │
│         │ 2. Send to ALU with ADD operation        │
│         │ 3. ALU computes result                   │
│         │ 4. Write result to R1                    │
│         │ 5. Update flags (Z, C, N, V)             │
└─────────┴──────────────────────────────────────────┘
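
The register transfers in the FETCH row above can be written out directly. This sketch assumes byte-addressed memory, 4-byte instructions, and an invented dictionary representation of the CPU's registers:

```python
# Minimal register-transfer sketch of the FETCH phase (names illustrative).
def fetch(cpu):
    cpu['MAR'] = cpu['PC']                   # 1. MAR <- PC
    cpu['MDR'] = cpu['memory'][cpu['MAR']]   # 2. MDR <- Memory[MAR]
    cpu['IR'] = cpu['MDR']                   # 3. IR  <- MDR
    cpu['PC'] += 4                           # 4. PC  <- PC + 4

cpu = {'PC': 0x1000, 'MAR': 0, 'MDR': 0, 'IR': 0,
       'memory': {0x1000: 0x1123}}           # ADD R1, R2, R3 encoded at 0x1000
fetch(cpu)
print(hex(cpu['IR']), hex(cpu['PC']))        # 0x1123 0x1004
```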

Instruction Set Architecture (ISA)

The ISA defines the interface between software and hardware.

Components of an Instruction

┌─────────┬──────────┬──────────┬──────────┐
│ Opcode  │ Operand1 │ Operand2 │ Operand3 │
└─────────┴──────────┴──────────┴──────────┘

Opcode:   What operation to perform
Operands: Where to get data / where to store result

Instruction Types

1. Data Movement

LOAD  R1, [address]   ; Load from memory
STORE R1, [address]   ; Store to memory
MOVE  R1, R2          ; Copy between registers

2. Arithmetic

ADD   R1, R2, R3      ; R1 = R2 + R3
SUB   R1, R2, R3      ; R1 = R2 - R3
MUL   R1, R2, R3      ; R1 = R2 × R3
DIV   R1, R2, R3      ; R1 = R2 ÷ R3
INC   R1              ; R1 = R1 + 1

3. Logical

AND   R1, R2, R3      ; R1 = R2 & R3
OR    R1, R2, R3      ; R1 = R2 | R3
XOR   R1, R2, R3      ; R1 = R2 ^ R3
NOT   R1, R2          ; R1 = ~R2

4. Shift/Rotate

SHL   R1, R2, 2       ; Shift left 2 bits
SHR   R1, R2, 2       ; Shift right 2 bits
ROL   R1, R2, 1       ; Rotate left 1 bit

5. Control Flow

JMP   label           ; Unconditional jump
JZ    label           ; Jump if zero
JNZ   label           ; Jump if not zero
CALL  function        ; Function call
RET                   ; Return from function

6. Comparison

CMP   R1, R2          ; Compare R1 and R2 (sets flags)
TEST  R1, R2          ; Bitwise AND without storing result

Addressing Modes

Different ways to specify operands:

1. Immediate Addressing

LOAD R1, #5          ; R1 = 5 (value is in instruction)

2. Direct/Absolute Addressing

LOAD R1, [1000]      ; R1 = Memory[1000]

3. Register Addressing

ADD R1, R2, R3       ; R1 = R2 + R3

4. Indirect Addressing

LOAD R1, [R2]        ; R1 = Memory[R2]

5. Indexed Addressing

LOAD R1, [R2 + 100]  ; R1 = Memory[R2 + 100]
                     ; (base + offset)

6. Base Plus Index Addressing

LOAD R1, [R2 + R3]   ; R1 = Memory[R2 + R3]
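
A quick way to see how the modes differ is to resolve each one against the same toy memory. All addresses and contents below are made up for illustration:

```python
# Resolving one operand under each addressing mode (sketch).
memory = {1000: 42, 1100: 7, 1200: 99}
regs = {'R2': 1000, 'R3': 200}

immediate = 5                                  # LOAD R1, #5
direct    = memory[1000]                       # LOAD R1, [1000]
indirect  = memory[regs['R2']]                 # LOAD R1, [R2]
indexed   = memory[regs['R2'] + 100]           # LOAD R1, [R2 + 100]
base_idx  = memory[regs['R2'] + regs['R3']]    # LOAD R1, [R2 + R3]

print(immediate, direct, indirect, indexed, base_idx)  # 5 42 42 7 99
```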

Machine Code and Encoding

Instructions are encoded as binary numbers.

Example: Simple 16-bit Instruction Format

┌────────┬────────┬────────┬────────┐
│ Opcode │   Rd   │   Rs   │   Rt   │
│ 4 bits │ 4 bits │ 4 bits │ 4 bits │
└────────┴────────┴────────┴────────┘

Example: ADD R1, R2, R3

Opcode: 0001 (ADD)
Rd:     0001 (R1 - destination)
Rs:     0010 (R2 - source 1)
Rt:     0011 (R3 - source 2)

Machine Code: 0001 0001 0010 0011 = 0x1123
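
Encoding and decoding this 16-bit format is just bit shifting and masking; a minimal Python sketch:

```python
def encode(opcode, rd, rs, rt):
    """Pack four 4-bit fields into one 16-bit instruction word."""
    return (opcode << 12) | (rd << 8) | (rs << 4) | rt

def decode(word):
    """Unpack a 16-bit instruction word into its four fields."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF

ADD = 0x1
word = encode(ADD, 1, 2, 3)     # ADD R1, R2, R3
print(hex(word))                # 0x1123
print(decode(word))             # (1, 1, 2, 3)
```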

Instruction Formats

Fixed-Length (RISC)

  • All instructions same size (e.g., 32 bits)
  • Simpler decoding
  • May waste bits for simple operations

Variable-Length (CISC)

  • Instructions can be different sizes
  • More efficient encoding
  • More complex decoding

CISC vs RISC

CISC (Complex Instruction Set Computer)

Examples: x86, x86-64

Characteristics:

  • Large instruction set (hundreds of instructions)
  • Variable-length instructions
  • Complex addressing modes
  • Instructions can take multiple cycles
  • Fewer instructions per program

Example CISC Instructions:

MULT [R1], [R2+100]  ; Memory[R1] = Memory[R1] * Memory[R2+100]
                     ; (single instruction does a lot)

Advantages:

  • Fewer instructions for complex operations
  • Smaller code size
  • Better use of memory bandwidth

Disadvantages:

  • Complex hardware
  • Harder to pipeline
  • Variable execution times

RISC (Reduced Instruction Set Computer)

Examples: ARM, MIPS, RISC-V

Characteristics:

  • Small instruction set (< 100 instructions)
  • Fixed-length instructions
  • Simple addressing modes
  • Most instructions execute in one cycle
  • Load/Store architecture (only LOAD/STORE access memory)

Example RISC Instructions:

LOAD  R1, [R2]       ; R1 = Memory[R2]
LOAD  R3, [R4+100]   ; R3 = Memory[R4+100]
MULT  R1, R1, R3     ; R1 = R1 * R3
STORE R1, [R2]       ; Memory[R2] = R1

; (same operation as CISC example, but 4 instructions)

Advantages:

  • Simpler hardware
  • Easier to pipeline
  • Predictable execution time
  • Lower power consumption

Disadvantages:

  • More instructions per program
  • Larger code size

Modern Trend: Hybrid Approach

Modern processors (even x86) use RISC-like execution internally:

  • Decode complex instructions into micro-operations
  • Execute micro-ops on RISC-like core
  • Get benefits of both approaches

CPU Performance

Clock Speed

The CPU clock synchronizes operations.

Clock Cycle:
     ┌──┐  ┌──┐  ┌──┐  ┌──┐
─────┘  └──┘  └──┘  └──┘  └─────

Frequency: Number of cycles per second (Hz)
Period:    Time per cycle (seconds)

1 GHz = 1,000,000,000 cycles/second
Period = 1 / frequency = 1 nanosecond (at 1 GHz)

Clock Speed Examples:

  • 1 GHz = 1 billion cycles/second
  • 3.5 GHz = 3.5 billion cycles/second

Note: Higher clock speed ≠ always faster (architecture matters!)

Cycles Per Instruction (CPI)

Different instructions take different numbers of cycles:

Simple instruction (ADD):    1 cycle
Memory load (LOAD):          3-5 cycles
Division (DIV):              10-40 cycles

Performance Metrics

Execution Time:

Time = (Instructions × CPI × Clock Period)

Or:
Time = (Instructions × CPI) / Clock Frequency

Example:

Program: 1,000,000 instructions
Average CPI: 2
Clock: 2 GHz

Time = (1,000,000 × 2) / (2 × 10⁹)
     = 2,000,000 / 2,000,000,000
     = 0.001 seconds = 1 millisecond

MIPS (Million Instructions Per Second):

MIPS = Clock Frequency (MHz) / CPI

Example:
2000 MHz / 2 CPI = 1000 MIPS
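
Both formulas are short enough to check in code. Note the units: the time formula takes the clock in Hz, while the MIPS formula takes it in MHz:

```python
def execution_time(instructions, cpi, clock_hz):
    """Time = (instruction count x CPI) / clock frequency (Hz)."""
    return instructions * cpi / clock_hz

def mips(clock_mhz, cpi):
    """MIPS = clock frequency (MHz) / CPI."""
    return clock_mhz / cpi

t = execution_time(1_000_000, 2, 2e9)   # the worked example above
print(t)             # 0.001 seconds = 1 millisecond
print(mips(2000, 2)) # 1000.0
```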

Pipelining

Pipelining overlaps instruction execution for better throughput.

Non-Pipelined Execution

Time: 1    2    3    4    5    6    7    8    9    10   11   12
      ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
Inst1 │ F  │ D  │ E  │    │    │    │    │    │    │    │    │    │
      ├────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┤
Inst2 │    │    │    │ F  │ D  │ E  │    │    │    │    │    │    │
      ├────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┤
Inst3 │    │    │    │    │    │    │ F  │ D  │ E  │    │    │    │
      ├────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┤
Inst4 │    │    │    │    │    │    │    │    │    │ F  │ D  │ E  │
      └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

F = Fetch, D = Decode, E = Execute
Total time: 12 cycles for 4 instructions

Pipelined Execution

Time: 1    2    3    4    5    6
      ┌────┬────┬────┬────┬────┬────┐
Inst1 │ F  │ D  │ E  │    │    │    │
      ├────┼────┼────┼────┼────┼────┤
Inst2 │    │ F  │ D  │ E  │    │    │
      ├────┼────┼────┼────┼────┼────┤
Inst3 │    │    │ F  │ D  │ E  │    │
      ├────┼────┼────┼────┼────┼────┤
Inst4 │    │    │    │ F  │ D  │ E  │
      └────┴────┴────┴────┴────┴────┘

Total time: 6 cycles for 4 instructions
Speedup: 12/6 = 2×

Ideal Speedup: For n stages, speedup approaches n for long programs.
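
This claim can be checked numerically. For a k-stage pipeline with one-cycle stages and no hazards, N instructions take k × N cycles unpipelined but only k + (N − 1) cycles pipelined: k cycles to fill the pipe, then one completion per cycle. A minimal sketch:

```python
def pipeline_speedup(stages, instructions):
    """Ideal speedup of a pipeline with one-cycle stages and no hazards."""
    non_pipelined = stages * instructions
    pipelined = stages + (instructions - 1)   # fill, then one finish per cycle
    return non_pipelined / pipelined

print(pipeline_speedup(3, 4))      # 2.0 (12 cycles vs 6)
print(pipeline_speedup(5, 1000))   # ~4.98: approaches 5 for long programs
```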

Pipeline Stages (5-stage classic RISC)

1. IF  - Instruction Fetch
2. ID  - Instruction Decode
3. EX  - Execute
4. MEM - Memory Access
5. WB  - Write Back

┌────┬────┬────┬────┬────┐
│ IF │ ID │ EX │MEM │ WB │  Instruction 1
└────┴────┴────┴────┴────┘
     ┌────┬────┬────┬────┬────┐
     │ IF │ ID │ EX │MEM │ WB │  Instruction 2
     └────┴────┴────┴────┴────┘
          ┌────┬────┬────┬────┬────┐
          │ IF │ ID │ EX │MEM │ WB │  Instruction 3
          └────┴────┴────┴────┴────┘

Pipeline Hazards

1. Structural Hazards

  • Resource conflict (e.g., memory access)
  • Solution: Separate instruction/data caches

2. Data Hazards

ADD R1, R2, R3   ; R1 = R2 + R3
SUB R4, R1, R5   ; R4 = R1 - R5  (needs R1 from previous!)
  • Solutions: Forwarding, stalling, reordering
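
A hazard like this can be detected mechanically by checking whether one instruction's destination register appears among the next instruction's source registers. The instruction representation below is invented for the sketch:

```python
# Detecting a read-after-write (RAW) hazard between adjacent instructions.
def has_raw_hazard(first, second):
    """True if `second` reads a register that `first` has not yet written."""
    dest, _ = first
    _, sources = second
    return dest in sources

add = ('R1', ('R2', 'R3'))   # ADD R1, R2, R3
sub = ('R4', ('R1', 'R5'))   # SUB R4, R1, R5
print(has_raw_hazard(add, sub))  # True: SUB reads R1 before ADD writes it back
```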

3. Control Hazards

BEQ R1, R2, label  ; Branch instruction
ADD R3, R4, R5     ; What if we branch? This shouldn't execute!
  • Solutions: Branch prediction, delayed branches

Multi-Core and Parallel Processing

Modern CPUs have multiple cores:

┌─────────────────────────────────────┐
│           CPU Package               │
│                                     │
│  ┌─────────┐      ┌─────────┐       │
│  │ Core 0  │      │ Core 1  │       │
│  │  ┌───┐  │      │  ┌───┐  │       │
│  │  │ALU│  │      │  │ALU│  │       │
│  │  ├───┤  │      │  ├───┤  │       │
│  │  │Reg│  │      │  │Reg│  │       │
│  │  └───┘  │      │  └───┘  │       │
│  │ L1 Cache│      │ L1 Cache│       │
│  └────┬────┘      └────┬────┘       │
│       │                │            │
│       └───────┬────────┘            │
│               │                     │
│          ┌────▼────┐                │
│          │ L2/L3   │                │
│          │ Cache   │                │
│          └─────────┘                │
└─────────────────────────────────────┘

Benefits:

  • True parallel execution
  • Better multitasking
  • Higher throughput

Challenges:

  • Synchronization
  • Cache coherency
  • Programming complexity
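
Theoretical multi-core speedup is governed by Amdahl's Law (used in the exercises below): if a fraction p of a program parallelizes perfectly across n cores, speedup is 1 / ((1 − p) + p/n). A short calculation shows how the serial fraction dominates:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's Law: overall speedup is limited by the serial fraction."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / cores)

print(amdahl_speedup(0.9, 4))     # ~3.08x on 4 cores
print(amdahl_speedup(0.9, 1000))  # ~9.9x: a 10% serial part caps speedup near 10x
```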

Practical Example: Tracing Execution

Simple Program

; Calculate: result = (a + b) - c
LOAD  R1, [a_addr]    ; R1 = a
LOAD  R2, [b_addr]    ; R2 = b
ADD   R3, R1, R2      ; R3 = a + b
LOAD  R4, [c_addr]    ; R4 = c
SUB   R5, R3, R4      ; R5 = (a+b) - c
STORE R5, [result]    ; result = R5

Detailed Trace

Initial State:

Memory[a_addr] = 10
Memory[b_addr] = 20
Memory[c_addr] = 5
PC = 0x1000

Instruction 1: LOAD R1, [a_addr]

FETCH:   IR ← Memory[PC], PC ← PC + 4
DECODE:  Opcode = LOAD, Dest = R1, Addr = a_addr
EXECUTE: MAR ← a_addr, MDR ← Memory[MAR], R1 ← MDR
Result:  R1 = 10

Instruction 2: LOAD R2, [b_addr]

Result: R2 = 20

Instruction 3: ADD R3, R1, R2

FETCH:   IR ← Memory[PC], PC ← PC + 4
DECODE:  Opcode = ADD, Dest = R3, Src1 = R1, Src2 = R2
EXECUTE: ALU ← R1 + R2, R3 ← ALU_Result
Result:  R3 = 30

Instruction 4: LOAD R4, [c_addr]

Result: R4 = 5

Instruction 5: SUB R5, R3, R4

EXECUTE: ALU ← R3 - R4, R5 ← ALU_Result
Result:  R5 = 25

Instruction 6: STORE R5, [result]

EXECUTE: MAR ← result, MDR ← R5, Memory[MAR] ← MDR
Result:  Memory[result] = 25

Programming Example

Python: Simple CPU Simulator

class SimpleCPU:
    def __init__(self):
        self.registers = [0] * 16  # 16 general-purpose registers (register fields are 4 bits)
        self.pc = 0               # Program counter
        self.ir = 0               # Instruction register
        self.flags = {            # Status flags
            'Z': 0,  # Zero
            'N': 0,  # Negative
            'C': 0,  # Carry
            'V': 0   # Overflow
        }
        self.memory = [0] * 1024  # 1KB memory
        self.running = True

    def fetch(self):
        """Fetch instruction from memory"""
        self.ir = self.memory[self.pc]
        self.pc += 1

    def decode_execute(self):
        """Decode and execute instruction"""
        opcode = (self.ir >> 12) & 0xF
        rd = (self.ir >> 8) & 0xF
        rs = (self.ir >> 4) & 0xF
        rt = self.ir & 0xF

        if opcode == 0x0:    # HALT
            self.running = False
        elif opcode == 0x1:  # ADD
            result = self.registers[rs] + self.registers[rt]
            self.registers[rd] = result & 0xFFFF
            self.update_flags(result)
        elif opcode == 0x2:  # SUB
            result = self.registers[rs] - self.registers[rt]
            self.registers[rd] = result & 0xFFFF
            self.update_flags(result)
        elif opcode == 0x3:  # LOAD immediate
            value = self.ir & 0xFF
            self.registers[rd] = value
        elif opcode == 0x4:  # STORE
            addr = self.registers[rs]
            self.memory[addr] = self.registers[rd]

    def update_flags(self, result):
        """Update Z and N from the result (C and V left as an exercise)"""
        self.flags['Z'] = 1 if (result & 0xFFFF) == 0 else 0
        self.flags['N'] = (result >> 15) & 1  # sign bit of the 16-bit result

    def run(self):
        """Run the fetch-decode-execute cycle"""
        cycle = 0
        while self.running and cycle < 1000:
            print(f"\n--- Cycle {cycle} ---")
            print(f"PC: {self.pc}")

            self.fetch()
            print(f"Fetched: {hex(self.ir)} from address {self.pc - 1}")

            self.decode_execute()
            print(f"Registers: {self.registers[:4]}")
            print(f"Flags: Z={self.flags['Z']}, N={self.flags['N']}")

            cycle += 1

# Example usage
cpu = SimpleCPU()

# Program: Add two numbers
# LOAD R0, #10
# LOAD R1, #20
# ADD R2, R0, R1
# HALT

cpu.memory[0] = 0x300A  # LOAD R0, #10
cpu.memory[1] = 0x3114  # LOAD R1, #20
cpu.memory[2] = 0x1201  # ADD R2, R0, R1
cpu.memory[3] = 0x0000  # HALT

cpu.run()
print(f"\nFinal result in R2: {cpu.registers[2]}")

Exercises

Basic Exercises

  1. Fetch-Decode-Execute

    • Trace the execution of: LOAD R1, #50; ADD R2, R1, R1
    • Identify which CPU component is active at each stage
    • Draw a timing diagram for the execution
  2. Instruction Encoding

    • Encode ADD R3, R1, R2 in a 16-bit format (4-bit opcode, 4-bit per register)
    • Decode: 0x2315 (opcode: 2=SUB, registers: 3,1,5)
    • Design a 3-bit opcode instruction set with 8 operations
  3. Registers

    • Why are registers faster than RAM?
    • What happens to PC after a jump instruction?
    • Trace PC values for a 5-instruction sequence with one jump

Intermediate Exercises

  1. Pipeline Analysis

    • Draw a pipeline diagram for 6 instructions (3-stage pipeline)
    • Calculate speedup vs. non-pipelined execution
    • Identify a data hazard in: ADD R1,R2,R3; SUB R4,R1,R5
  2. Performance Calculation

    • Program: 1M instructions, CPI=1.5, Clock=3GHz. Find execution time.
    • Compare: 2GHz RISC (CPI=1) vs. 3GHz CISC (CPI=3) for same program
    • Calculate MIPS rating for a 2.5GHz processor with CPI of 2
  3. ISA Design

    • Design addressing modes for: array access, function call, global variable
    • Compare instruction count: RISC vs CISC for a simple loop
    • Justify LOAD/STORE architecture advantages

Advanced Exercises

  1. Complex Tracing

    • Trace execution with branch: if (a>b) c=a else c=b
    • Show pipeline diagram including a branch misprediction
    • Calculate performance impact of branch misprediction
  2. Architecture Comparison

    • Design both CISC and RISC instruction sets for: array[i] = array[i] + 5
    • Compare code size, instruction count, and complexity
    • Analyze trade-offs for embedded vs desktop systems
  3. Multi-Core

    • Design a task distribution strategy for 4 cores
    • Identify where cache coherency problems might occur
    • Calculate theoretical speedup (consider Amdahl's Law)
  4. CPU Design

    • Design a minimal ISA (8 instructions) for basic computation
    • Create a state machine for a 3-stage pipeline
    • Estimate logic gate count for a simple ALU (4-bit, ADD/SUB/AND/OR)

Summary

In this reading, we explored the architecture and operation of the CPU:

Key Concepts:

  • Von Neumann architecture provides the foundation for modern computers
  • CPU components: ALU (computation), Control Unit (coordination), Registers (fast storage)
  • Fetch-Decode-Execute cycle is the fundamental operation loop
  • ISA defines the interface between hardware and software
  • Pipelining improves throughput by overlapping instruction execution
  • CISC vs RISC represent different design philosophies

Important Skills:

  • Tracing instruction execution through the CPU
  • Understanding how instructions are encoded
  • Analyzing performance based on clock speed, CPI, and instruction count
  • Recognizing pipeline hazards and their solutions

Why This Matters:

  • Foundation for understanding how programs actually execute
  • Essential for writing performance-critical code
  • Helps debug low-level issues
  • Critical for systems programming and compiler design

Further Reading

  • "Computer Organization and Design" by Patterson & Hennessy
  • Study specific architectures: x86-64, ARM, RISC-V
  • Superscalar execution and out-of-order processing
  • Branch prediction algorithms

Next Steps

Now that you understand how the CPU processes instructions, you're ready to learn about the memory hierarchy that stores the data and instructions the CPU needs.

Next Reading: 04-memory.md - Memory Hierarchy


Module 5: Computer Architecture | Reading 3 of 5