Parsing & Abstract Syntax Trees

The parser is the second stage of our interpreter. It takes the flat stream of tokens from the lexer and builds a tree structure that represents the program's structure - the Abstract Syntax Tree (AST).

What is Parsing?

Parsing transforms tokens into a hierarchical structure:

Tokens:  [Let, Identifier("x"), Equal, Number(5), Plus, Number(3)]

AST:
LetStmt {
    name: "x",
    value: BinaryExpr {
        left: Number(5),
        op: Plus,
        right: Number(3)
    }
}

Why Trees?

Programs have nested structure:

if x > 5 {           // Outer if statement
    if y < 10 {      // Nested if statement
        print x + y; // Nested expression
    }
}

A tree naturally represents this nesting.

Defining the AST

Add to src/ast.rs:

// Expression nodes
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    // Literals
    Number(f64),
    String(String),
    Bool(bool),
    Nil,

    // Variable reference
    Identifier(String),

    // Unary operation: -x, not x
    Unary {
        op: UnaryOp,
        expr: Box<Expr>,
    },

    // Binary operation: x + y, x == y
    Binary {
        left: Box<Expr>,
        op: BinaryOp,
        right: Box<Expr>,
    },

    // Grouping: (expr)
    Grouping(Box<Expr>),

    // Assignment: x = value
    Assign {
        name: String,
        value: Box<Expr>,
    },

    // Function call: func(arg1, arg2)
    Call {
        callee: Box<Expr>,
        args: Vec<Expr>,
    },

    // List indexing: list[index]
    Index {
        object: Box<Expr>,
        index: Box<Expr>,
    },

    // List literal: [1, 2, 3]
    List(Vec<Expr>),

    // Anonymous function: fn(x) { return x * 2; }
    Lambda {
        params: Vec<String>,
        body: Vec<Stmt>,
    },
}

// Statement nodes
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    // Expression statement: expr;
    Expression(Expr),

    // Variable declaration: let x = expr;
    Let {
        name: String,
        value: Expr,
    },

    // Print statement: print expr;
    Print(Expr),

    // Block: { stmt1; stmt2; }
    Block(Vec<Stmt>),

    // If statement: if expr { block } else { block }
    If {
        condition: Expr,
        then_branch: Box<Stmt>,
        else_branch: Option<Box<Stmt>>,
    },

    // While loop: while expr { block }
    While {
        condition: Expr,
        body: Box<Stmt>,
    },

    // For loop: for item in iterable { block }
    For {
        variable: String,
        iterable: Expr,
        body: Box<Stmt>,
    },

    // Function declaration: fn name(params) { body }
    Function {
        name: String,
        params: Vec<String>,
        body: Vec<Stmt>,
    },

    // Return statement: return expr;
    Return(Option<Expr>),
}

// Operators
#[derive(Debug, Clone, PartialEq)]
pub enum UnaryOp {
    Minus,  // -
    Not,    // not
}

#[derive(Debug, Clone, PartialEq)]
pub enum BinaryOp {
    // Arithmetic
    Add,      // +
    Subtract, // -
    Multiply, // *
    Divide,   // /
    Modulo,   // %

    // Comparison
    Equal,        // ==
    NotEqual,     // !=
    Less,         // <
    LessEqual,    // <=
    Greater,      // >
    GreaterEqual, // >=

    // Logical
    And, // and
    Or,  // or
}

Recursive Descent Parsing

We'll use recursive descent parsing - each grammar rule becomes a function.

Operator Precedence

Operators have different priorities:

PrecedenceOperatorsExample
1 (lowest)ora or b
2anda and b
3== !=a == b
4< > <= >=a < b
5+ -a + b
6* / %a * b
7 (highest)- not-a, not a

Expression 2 + 3 * 4 parses as 2 + (3 * 4), not (2 + 3) * 4.

The Parser Structure

Create src/parser.rs:

use crate::ast::*;

pub struct Parser {
    tokens: Vec<TokenInfo>,
    current: usize,
}

impl Parser {
    pub fn new(tokens: Vec<TokenInfo>) -> Self {
        Parser { tokens, current: 0 }
    }

    /// Parse the entire program
    pub fn parse(&mut self) -> Result<Vec<Stmt>, String> {
        let mut statements = Vec::new();

        while !self.is_at_end() {
            statements.push(self.parse_statement()?);
        }

        Ok(statements)
    }

    // ========== Statements ==========

    fn parse_statement(&mut self) -> Result<Stmt, String> {
        match &self.peek().token {
            Token::Let => self.parse_let(),
            Token::Print => self.parse_print(),
            Token::LeftBrace => self.parse_block(),
            Token::If => self.parse_if(),
            Token::While => self.parse_while(),
            Token::For => self.parse_for(),
            Token::Fn => self.parse_function(),
            Token::Return => self.parse_return(),
            _ => self.parse_expression_statement(),
        }
    }

    fn parse_let(&mut self) -> Result<Stmt, String> {
        self.consume(Token::Let, "Expected 'let'")?;

        let name = if let Token::Identifier(n) = &self.peek().token {
            let name = n.clone();
            self.advance();
            name
        } else {
            return Err(self.error("Expected identifier after 'let'"));
        };

        self.consume(Token::Equal, "Expected '=' after variable name")?;
        let value = self.parse_expression()?;
        self.consume(Token::Semicolon, "Expected ';' after value")?;

        Ok(Stmt::Let { name, value })
    }

    fn parse_print(&mut self) -> Result<Stmt, String> {
        self.consume(Token::Print, "Expected 'print'")?;
        let expr = self.parse_expression()?;
        self.consume(Token::Semicolon, "Expected ';' after expression")?;
        Ok(Stmt::Print(expr))
    }

    fn parse_block(&mut self) -> Result<Stmt, String> {
        self.consume(Token::LeftBrace, "Expected '{'")?;
        let mut statements = Vec::new();

        while !self.check(&Token::RightBrace) && !self.is_at_end() {
            statements.push(self.parse_statement()?);
        }

        self.consume(Token::RightBrace, "Expected '}' after block")?;
        Ok(Stmt::Block(statements))
    }

    fn parse_if(&mut self) -> Result<Stmt, String> {
        self.consume(Token::If, "Expected 'if'")?;
        let condition = self.parse_expression()?;
        let then_branch = Box::new(self.parse_block()?);

        let else_branch = if self.matches(&Token::Else) {
            Some(Box::new(if self.check(&Token::If) {
                self.parse_if()?
            } else {
                self.parse_block()?
            }))
        } else {
            None
        };

        Ok(Stmt::If {
            condition,
            then_branch,
            else_branch,
        })
    }

    fn parse_while(&mut self) -> Result<Stmt, String> {
        self.consume(Token::While, "Expected 'while'")?;
        let condition = self.parse_expression()?;
        let body = Box::new(self.parse_block()?);
        Ok(Stmt::While { condition, body })
    }

    fn parse_for(&mut self) -> Result<Stmt, String> {
        self.consume(Token::For, "Expected 'for'")?;

        let variable = if let Token::Identifier(n) = &self.peek().token {
            let name = n.clone();
            self.advance();
            name
        } else {
            return Err(self.error("Expected identifier after 'for'"));
        };

        self.consume(Token::In, "Expected 'in' after loop variable")?;
        let iterable = self.parse_expression()?;
        let body = Box::new(self.parse_block()?);

        Ok(Stmt::For {
            variable,
            iterable,
            body,
        })
    }

    fn parse_function(&mut self) -> Result<Stmt, String> {
        self.consume(Token::Fn, "Expected 'fn'")?;

        let name = if let Token::Identifier(n) = &self.peek().token {
            let name = n.clone();
            self.advance();
            name
        } else {
            return Err(self.error("Expected function name"));
        };

        self.consume(Token::LeftParen, "Expected '(' after function name")?;

        let mut params = Vec::new();
        if !self.check(&Token::RightParen) {
            loop {
                if let Token::Identifier(p) = &self.peek().token {
                    params.push(p.clone());
                    self.advance();
                } else {
                    return Err(self.error("Expected parameter name"));
                }

                if !self.matches(&Token::Comma) {
                    break;
                }
            }
        }

        self.consume(Token::RightParen, "Expected ')' after parameters")?;
        self.consume(Token::LeftBrace, "Expected '{' before function body")?;

        let mut body = Vec::new();
        while !self.check(&Token::RightBrace) && !self.is_at_end() {
            body.push(self.parse_statement()?);
        }

        self.consume(Token::RightBrace, "Expected '}' after function body")?;

        Ok(Stmt::Function { name, params, body })
    }

    fn parse_return(&mut self) -> Result<Stmt, String> {
        self.consume(Token::Return, "Expected 'return'")?;

        let value = if self.check(&Token::Semicolon) {
            None
        } else {
            Some(self.parse_expression()?)
        };

        self.consume(Token::Semicolon, "Expected ';' after return value")?;
        Ok(Stmt::Return(value))
    }

    fn parse_expression_statement(&mut self) -> Result<Stmt, String> {
        let expr = self.parse_expression()?;
        self.consume(Token::Semicolon, "Expected ';' after expression")?;
        Ok(Stmt::Expression(expr))
    }

    // ========== Expressions (with precedence) ==========

    fn parse_expression(&mut self) -> Result<Expr, String> {
        self.parse_assignment()
    }

    fn parse_assignment(&mut self) -> Result<Expr, String> {
        let expr = self.parse_or()?;

        if self.matches(&Token::Equal) {
            if let Expr::Identifier(name) = expr {
                let value = Box::new(self.parse_assignment()?);
                return Ok(Expr::Assign { name, value });
            } else {
                return Err(self.error("Invalid assignment target"));
            }
        }

        Ok(expr)
    }

    fn parse_or(&mut self) -> Result<Expr, String> {
        let mut expr = self.parse_and()?;

        while self.matches(&Token::Or) {
            let right = Box::new(self.parse_and()?);
            expr = Expr::Binary {
                left: Box::new(expr),
                op: BinaryOp::Or,
                right,
            };
        }

        Ok(expr)
    }

    fn parse_and(&mut self) -> Result<Expr, String> {
        let mut expr = self.parse_equality()?;

        while self.matches(&Token::And) {
            let right = Box::new(self.parse_equality()?);
            expr = Expr::Binary {
                left: Box::new(expr),
                op: BinaryOp::And,
                right,
            };
        }

        Ok(expr)
    }

    fn parse_equality(&mut self) -> Result<Expr, String> {
        let mut expr = self.parse_comparison()?;

        while let Some(op) = self.match_tokens(&[Token::EqualEqual, Token::BangEqual]) {
            let binary_op = match op {
                Token::EqualEqual => BinaryOp::Equal,
                Token::BangEqual => BinaryOp::NotEqual,
                _ => unreachable!(),
            };
            let right = Box::new(self.parse_comparison()?);
            expr = Expr::Binary {
                left: Box::new(expr),
                op: binary_op,
                right,
            };
        }

        Ok(expr)
    }

    fn parse_comparison(&mut self) -> Result<Expr, String> {
        let mut expr = self.parse_term()?;

        while let Some(op) = self.match_tokens(&[
            Token::Less,
            Token::LessEqual,
            Token::Greater,
            Token::GreaterEqual,
        ]) {
            let binary_op = match op {
                Token::Less => BinaryOp::Less,
                Token::LessEqual => BinaryOp::LessEqual,
                Token::Greater => BinaryOp::Greater,
                Token::GreaterEqual => BinaryOp::GreaterEqual,
                _ => unreachable!(),
            };
            let right = Box::new(self.parse_term()?);
            expr = Expr::Binary {
                left: Box::new(expr),
                op: binary_op,
                right,
            };
        }

        Ok(expr)
    }

    fn parse_term(&mut self) -> Result<Expr, String> {
        let mut expr = self.parse_factor()?;

        while let Some(op) = self.match_tokens(&[Token::Plus, Token::Minus]) {
            let binary_op = match op {
                Token::Plus => BinaryOp::Add,
                Token::Minus => BinaryOp::Subtract,
                _ => unreachable!(),
            };
            let right = Box::new(self.parse_factor()?);
            expr = Expr::Binary {
                left: Box::new(expr),
                op: binary_op,
                right,
            };
        }

        Ok(expr)
    }

    fn parse_factor(&mut self) -> Result<Expr, String> {
        let mut expr = self.parse_unary()?;

        while let Some(op) = self.match_tokens(&[Token::Star, Token::Slash, Token::Percent]) {
            let binary_op = match op {
                Token::Star => BinaryOp::Multiply,
                Token::Slash => BinaryOp::Divide,
                Token::Percent => BinaryOp::Modulo,
                _ => unreachable!(),
            };
            let right = Box::new(self.parse_unary()?);
            expr = Expr::Binary {
                left: Box::new(expr),
                op: binary_op,
                right,
            };
        }

        Ok(expr)
    }

    fn parse_unary(&mut self) -> Result<Expr, String> {
        if let Some(op) = self.match_tokens(&[Token::Minus, Token::Not, Token::Bang]) {
            let unary_op = match op {
                Token::Minus => UnaryOp::Minus,
                Token::Not | Token::Bang => UnaryOp::Not,
                _ => unreachable!(),
            };
            let expr = Box::new(self.parse_unary()?);
            return Ok(Expr::Unary { op: unary_op, expr });
        }

        self.parse_call()
    }

    fn parse_call(&mut self) -> Result<Expr, String> {
        let mut expr = self.parse_primary()?;

        loop {
            if self.matches(&Token::LeftParen) {
                expr = self.finish_call(expr)?;
            } else if self.matches(&Token::LeftBracket) {
                let index = Box::new(self.parse_expression()?);
                self.consume(Token::RightBracket, "Expected ']' after index")?;
                expr = Expr::Index {
                    object: Box::new(expr),
                    index,
                };
            } else {
                break;
            }
        }

        Ok(expr)
    }

    fn finish_call(&mut self, callee: Expr) -> Result<Expr, String> {
        let mut args = Vec::new();

        if !self.check(&Token::RightParen) {
            loop {
                args.push(self.parse_expression()?);
                if !self.matches(&Token::Comma) {
                    break;
                }
            }
        }

        self.consume(Token::RightParen, "Expected ')' after arguments")?;

        Ok(Expr::Call {
            callee: Box::new(callee),
            args,
        })
    }

    fn parse_primary(&mut self) -> Result<Expr, String> {
        match &self.peek().token.clone() {
            Token::True => {
                self.advance();
                Ok(Expr::Bool(true))
            }
            Token::False => {
                self.advance();
                Ok(Expr::Bool(false))
            }
            Token::Nil => {
                self.advance();
                Ok(Expr::Nil)
            }
            Token::Number(n) => {
                let num = *n;
                self.advance();
                Ok(Expr::Number(num))
            }
            Token::String(s) => {
                let str = s.clone();
                self.advance();
                Ok(Expr::String(str))
            }
            Token::Identifier(name) => {
                let id = name.clone();
                self.advance();
                Ok(Expr::Identifier(id))
            }
            Token::LeftParen => {
                self.advance();
                let expr = self.parse_expression()?;
                self.consume(Token::RightParen, "Expected ')' after expression")?;
                Ok(Expr::Grouping(Box::new(expr)))
            }
            Token::LeftBracket => {
                self.advance();
                let mut elements = Vec::new();

                if !self.check(&Token::RightBracket) {
                    loop {
                        elements.push(self.parse_expression()?);
                        if !self.matches(&Token::Comma) {
                            break;
                        }
                    }
                }

                self.consume(Token::RightBracket, "Expected ']' after list elements")?;
                Ok(Expr::List(elements))
            }
            Token::Fn => {
                self.advance();
                self.consume(Token::LeftParen, "Expected '(' after 'fn'")?;

                let mut params = Vec::new();
                if !self.check(&Token::RightParen) {
                    loop {
                        if let Token::Identifier(p) = &self.peek().token {
                            params.push(p.clone());
                            self.advance();
                        } else {
                            return Err(self.error("Expected parameter name"));
                        }
                        if !self.matches(&Token::Comma) {
                            break;
                        }
                    }
                }

                self.consume(Token::RightParen, "Expected ')' after parameters")?;
                self.consume(Token::LeftBrace, "Expected '{' before function body")?;

                let mut body = Vec::new();
                while !self.check(&Token::RightBrace) && !self.is_at_end() {
                    body.push(self.parse_statement()?);
                }

                self.consume(Token::RightBrace, "Expected '}' after function body")?;

                Ok(Expr::Lambda { params, body })
            }
            _ => Err(self.error("Expected expression")),
        }
    }

    // ========== Helper methods ==========

    fn matches(&mut self, token: &Token) -> bool {
        if self.check(token) {
            self.advance();
            true
        } else {
            false
        }
    }

    fn match_tokens(&mut self, tokens: &[Token]) -> Option<Token> {
        for token in tokens {
            if self.check(token) {
                let matched = self.peek().token.clone();
                self.advance();
                return Some(matched);
            }
        }
        None
    }

    fn check(&self, token: &Token) -> bool {
        if self.is_at_end() {
            return false;
        }
        std::mem::discriminant(&self.peek().token) == std::mem::discriminant(token)
    }

    fn advance(&mut self) {
        if !self.is_at_end() {
            self.current += 1;
        }
    }

    fn is_at_end(&self) -> bool {
        matches!(self.peek().token, Token::Eof)
    }

    fn peek(&self) -> &TokenInfo {
        &self.tokens[self.current]
    }

    fn consume(&mut self, token: Token, message: &str) -> Result<(), String> {
        if self.check(&token) {
            self.advance();
            Ok(())
        } else {
            Err(self.error(message))
        }
    }

    fn error(&self, message: &str) -> String {
        let token_info = self.peek();
        format!(
            "{} at line {}, column {}",
            message, token_info.line, token_info.column
        )
    }
}

How the Parser Works

Precedence Climbing

Each precedence level has its own function:

parse_expression()     → assignment (=)
parse_assignment()     → or
parse_or()             → and
parse_and()            → equality (==, !=)
parse_equality()       → comparison (<, >, <=, >=)
parse_comparison()     → term (+, -)
parse_term()           → factor (*, /, %)
parse_factor()         → unary (-, not)
parse_unary()          → call (function calls, indexing)
parse_call()           → primary (literals, identifiers)
parse_primary()

Example: Parsing 2 + 3 * 4

  1. parse_expression()parse_assignment()parse_or() → ... → parse_term()
  2. parse_term() sees 2, then +
  3. Recursively calls parse_factor() for right side
  4. parse_factor() sees 3, then *
  5. Recursively calls parse_unary() for 4
  6. Builds: Binary { left: 2, op: Add, right: Binary { left: 3, op: Multiply, right: 4 } }

Result: 2 + (3 * 4)

Testing the Parser

Update src/main.rs:

mod ast;
mod lexer;
mod parser;

use lexer::Lexer;
use parser::Parser;

fn main() {
    let source = r#"
        let x = 5 + 3 * 2;
        
        fn factorial(n) {
            if n <= 1 {
                return 1;
            }
            return n * factorial(n - 1);
        }
        
        print factorial(5);
    "#;

    // Lexing
    let mut lexer = Lexer::new(source.to_string());
    let tokens = match lexer.tokenize() {
        Ok(t) => t,
        Err(e) => {
            eprintln!("Lexer error: {}", e);
            return;
        }
    };

    // Parsing
    let mut parser = Parser::new(tokens);
    match parser.parse() {
        Ok(statements) => {
            println!("AST:");
            for stmt in statements {
                println!("{:#?}", stmt);
            }
        }
        Err(e) => {
            eprintln!("Parser error: {}", e);
        }
    }
}

Run it:

cargo run

Error Recovery

The parser detects syntax errors:

let x = ;  // Error: Expected expression at line 1, column 9
fn { }     // Error: Expected function name at line 1, column 4
if x       // Error: Expected '{' after condition at line 1, column 5

Common Pitfalls

1. Wrong Precedence Order

// ❌ Wrong: term before factor
fn parse_factor() { ... }
fn parse_term() { 
    let expr = parse_factor();  // Should be parse_unary()
    ...
}

// ✅ Correct: follow precedence hierarchy
fn parse_term() {
    let expr = parse_factor();
    ...
}
fn parse_factor() {
    let expr = parse_unary();
    ...
}

2. Forgetting to Box Recursive Types

// ❌ Won't compile: infinite size
pub enum Expr {
    Binary { left: Expr, right: Expr }  // Error!
}

// ✅ Correct: use Box
pub enum Expr {
    Binary { left: Box<Expr>, right: Box<Expr> }
}

3. Not Consuming Tokens

// ❌ Wrong: infinite loop
while self.check(&Token::Plus) {
    // forgot to advance!
}

// ✅ Correct
while self.matches(&Token::Plus) {  // matches() advances
    ...
}

Parser Summary

InputOutput
TokensAST (statements and expressions)
[Let, Identifier("x"), Equal, Number(5)]Stmt::Let { name: "x", value: Expr::Number(5) }

What's Next?

We have an AST! Now we need to execute it - the interpreter walks the tree and performs the operations.

→ Continue to Chapter 4: Interpreter

Quick Quiz

What AST does x = 2 + 3 produce?

Stmt::Expression(
    Expr::Assign {
        name: "x",
        value: Box::new(
            Expr::Binary {
                left: Box::new(Expr::Number(2)),
                op: BinaryOp::Add,
                right: Box::new(Expr::Number(3)),
            }
        ),
    }
)