Parsing & Abstract Syntax Trees
The parser is the second stage of our interpreter. It takes the flat stream of tokens from the lexer and builds a tree structure that represents the program's structure - the Abstract Syntax Tree (AST).
What is Parsing?
Parsing transforms tokens into a hierarchical structure:
Tokens: [Let, Identifier("x"), Equal, Number(5), Plus, Number(3)]
AST:
LetStmt {
name: "x",
value: BinaryExpr {
left: Number(5),
op: Plus,
right: Number(3)
}
}
Why Trees?
Programs have nested structure:
if x > 5 { // Outer if statement
if y < 10 { // Nested if statement
print x + y; // Nested expression
}
}
A tree naturally represents this nesting.
Defining the AST
Add to src/ast.rs:
// Expression nodes
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
// Literals
Number(f64),
String(String),
Bool(bool),
Nil,
// Variable reference
Identifier(String),
// Unary operation: -x, not x
Unary {
op: UnaryOp,
expr: Box<Expr>,
},
// Binary operation: x + y, x == y
Binary {
left: Box<Expr>,
op: BinaryOp,
right: Box<Expr>,
},
// Grouping: (expr)
Grouping(Box<Expr>),
// Assignment: x = value
Assign {
name: String,
value: Box<Expr>,
},
// Function call: func(arg1, arg2)
Call {
callee: Box<Expr>,
args: Vec<Expr>,
},
// List indexing: list[index]
Index {
object: Box<Expr>,
index: Box<Expr>,
},
// List literal: [1, 2, 3]
List(Vec<Expr>),
// Anonymous function: fn(x) { return x * 2; }
Lambda {
params: Vec<String>,
body: Vec<Stmt>,
},
}
// Statement nodes
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
// Expression statement: expr;
Expression(Expr),
// Variable declaration: let x = expr;
Let {
name: String,
value: Expr,
},
// Print statement: print expr;
Print(Expr),
// Block: { stmt1; stmt2; }
Block(Vec<Stmt>),
// If statement: if expr { block } else { block }
If {
condition: Expr,
then_branch: Box<Stmt>,
else_branch: Option<Box<Stmt>>,
},
// While loop: while expr { block }
While {
condition: Expr,
body: Box<Stmt>,
},
// For loop: for item in iterable { block }
For {
variable: String,
iterable: Expr,
body: Box<Stmt>,
},
// Function declaration: fn name(params) { body }
Function {
name: String,
params: Vec<String>,
body: Vec<Stmt>,
},
// Return statement: return expr;
Return(Option<Expr>),
}
// Operators
#[derive(Debug, Clone, PartialEq)]
pub enum UnaryOp {
Minus, // -
Not, // not
}
#[derive(Debug, Clone, PartialEq)]
pub enum BinaryOp {
// Arithmetic
Add, // +
Subtract, // -
Multiply, // *
Divide, // /
Modulo, // %
// Comparison
Equal, // ==
NotEqual, // !=
Less, // <
LessEqual, // <=
Greater, // >
GreaterEqual, // >=
// Logical
And, // and
Or, // or
}
Recursive Descent Parsing
We'll use recursive descent parsing - each grammar rule becomes a function.
Operator Precedence
Operators have different priorities:
| Precedence | Operators | Example |
|---|---|---|
| 1 (lowest) | or | a or b |
| 2 | and | a and b |
| 3 | == != | a == b |
| 4 | < > <= >= | a < b |
| 5 | + - | a + b |
| 6 | * / % | a * b |
| 7 (highest) | - not | -a, not a |
Expression 2 + 3 * 4 parses as 2 + (3 * 4), not (2 + 3) * 4.
The Parser Structure
Create src/parser.rs:
use crate::ast::*;
pub struct Parser {
tokens: Vec<TokenInfo>,
current: usize,
}
impl Parser {
pub fn new(tokens: Vec<TokenInfo>) -> Self {
Parser { tokens, current: 0 }
}
/// Parse the entire program
pub fn parse(&mut self) -> Result<Vec<Stmt>, String> {
let mut statements = Vec::new();
while !self.is_at_end() {
statements.push(self.parse_statement()?);
}
Ok(statements)
}
// ========== Statements ==========
fn parse_statement(&mut self) -> Result<Stmt, String> {
match &self.peek().token {
Token::Let => self.parse_let(),
Token::Print => self.parse_print(),
Token::LeftBrace => self.parse_block(),
Token::If => self.parse_if(),
Token::While => self.parse_while(),
Token::For => self.parse_for(),
Token::Fn => self.parse_function(),
Token::Return => self.parse_return(),
_ => self.parse_expression_statement(),
}
}
fn parse_let(&mut self) -> Result<Stmt, String> {
self.consume(Token::Let, "Expected 'let'")?;
let name = if let Token::Identifier(n) = &self.peek().token {
let name = n.clone();
self.advance();
name
} else {
return Err(self.error("Expected identifier after 'let'"));
};
self.consume(Token::Equal, "Expected '=' after variable name")?;
let value = self.parse_expression()?;
self.consume(Token::Semicolon, "Expected ';' after value")?;
Ok(Stmt::Let { name, value })
}
fn parse_print(&mut self) -> Result<Stmt, String> {
self.consume(Token::Print, "Expected 'print'")?;
let expr = self.parse_expression()?;
self.consume(Token::Semicolon, "Expected ';' after expression")?;
Ok(Stmt::Print(expr))
}
fn parse_block(&mut self) -> Result<Stmt, String> {
self.consume(Token::LeftBrace, "Expected '{'")?;
let mut statements = Vec::new();
while !self.check(&Token::RightBrace) && !self.is_at_end() {
statements.push(self.parse_statement()?);
}
self.consume(Token::RightBrace, "Expected '}' after block")?;
Ok(Stmt::Block(statements))
}
fn parse_if(&mut self) -> Result<Stmt, String> {
self.consume(Token::If, "Expected 'if'")?;
let condition = self.parse_expression()?;
let then_branch = Box::new(self.parse_block()?);
let else_branch = if self.matches(&Token::Else) {
Some(Box::new(if self.check(&Token::If) {
self.parse_if()?
} else {
self.parse_block()?
}))
} else {
None
};
Ok(Stmt::If {
condition,
then_branch,
else_branch,
})
}
fn parse_while(&mut self) -> Result<Stmt, String> {
self.consume(Token::While, "Expected 'while'")?;
let condition = self.parse_expression()?;
let body = Box::new(self.parse_block()?);
Ok(Stmt::While { condition, body })
}
fn parse_for(&mut self) -> Result<Stmt, String> {
self.consume(Token::For, "Expected 'for'")?;
let variable = if let Token::Identifier(n) = &self.peek().token {
let name = n.clone();
self.advance();
name
} else {
return Err(self.error("Expected identifier after 'for'"));
};
self.consume(Token::In, "Expected 'in' after loop variable")?;
let iterable = self.parse_expression()?;
let body = Box::new(self.parse_block()?);
Ok(Stmt::For {
variable,
iterable,
body,
})
}
fn parse_function(&mut self) -> Result<Stmt, String> {
self.consume(Token::Fn, "Expected 'fn'")?;
let name = if let Token::Identifier(n) = &self.peek().token {
let name = n.clone();
self.advance();
name
} else {
return Err(self.error("Expected function name"));
};
self.consume(Token::LeftParen, "Expected '(' after function name")?;
let mut params = Vec::new();
if !self.check(&Token::RightParen) {
loop {
if let Token::Identifier(p) = &self.peek().token {
params.push(p.clone());
self.advance();
} else {
return Err(self.error("Expected parameter name"));
}
if !self.matches(&Token::Comma) {
break;
}
}
}
self.consume(Token::RightParen, "Expected ')' after parameters")?;
self.consume(Token::LeftBrace, "Expected '{' before function body")?;
let mut body = Vec::new();
while !self.check(&Token::RightBrace) && !self.is_at_end() {
body.push(self.parse_statement()?);
}
self.consume(Token::RightBrace, "Expected '}' after function body")?;
Ok(Stmt::Function { name, params, body })
}
fn parse_return(&mut self) -> Result<Stmt, String> {
self.consume(Token::Return, "Expected 'return'")?;
let value = if self.check(&Token::Semicolon) {
None
} else {
Some(self.parse_expression()?)
};
self.consume(Token::Semicolon, "Expected ';' after return value")?;
Ok(Stmt::Return(value))
}
fn parse_expression_statement(&mut self) -> Result<Stmt, String> {
let expr = self.parse_expression()?;
self.consume(Token::Semicolon, "Expected ';' after expression")?;
Ok(Stmt::Expression(expr))
}
// ========== Expressions (with precedence) ==========
fn parse_expression(&mut self) -> Result<Expr, String> {
self.parse_assignment()
}
fn parse_assignment(&mut self) -> Result<Expr, String> {
let expr = self.parse_or()?;
if self.matches(&Token::Equal) {
if let Expr::Identifier(name) = expr {
let value = Box::new(self.parse_assignment()?);
return Ok(Expr::Assign { name, value });
} else {
return Err(self.error("Invalid assignment target"));
}
}
Ok(expr)
}
fn parse_or(&mut self) -> Result<Expr, String> {
let mut expr = self.parse_and()?;
while self.matches(&Token::Or) {
let right = Box::new(self.parse_and()?);
expr = Expr::Binary {
left: Box::new(expr),
op: BinaryOp::Or,
right,
};
}
Ok(expr)
}
fn parse_and(&mut self) -> Result<Expr, String> {
let mut expr = self.parse_equality()?;
while self.matches(&Token::And) {
let right = Box::new(self.parse_equality()?);
expr = Expr::Binary {
left: Box::new(expr),
op: BinaryOp::And,
right,
};
}
Ok(expr)
}
fn parse_equality(&mut self) -> Result<Expr, String> {
let mut expr = self.parse_comparison()?;
while let Some(op) = self.match_tokens(&[Token::EqualEqual, Token::BangEqual]) {
let binary_op = match op {
Token::EqualEqual => BinaryOp::Equal,
Token::BangEqual => BinaryOp::NotEqual,
_ => unreachable!(),
};
let right = Box::new(self.parse_comparison()?);
expr = Expr::Binary {
left: Box::new(expr),
op: binary_op,
right,
};
}
Ok(expr)
}
fn parse_comparison(&mut self) -> Result<Expr, String> {
let mut expr = self.parse_term()?;
while let Some(op) = self.match_tokens(&[
Token::Less,
Token::LessEqual,
Token::Greater,
Token::GreaterEqual,
]) {
let binary_op = match op {
Token::Less => BinaryOp::Less,
Token::LessEqual => BinaryOp::LessEqual,
Token::Greater => BinaryOp::Greater,
Token::GreaterEqual => BinaryOp::GreaterEqual,
_ => unreachable!(),
};
let right = Box::new(self.parse_term()?);
expr = Expr::Binary {
left: Box::new(expr),
op: binary_op,
right,
};
}
Ok(expr)
}
fn parse_term(&mut self) -> Result<Expr, String> {
let mut expr = self.parse_factor()?;
while let Some(op) = self.match_tokens(&[Token::Plus, Token::Minus]) {
let binary_op = match op {
Token::Plus => BinaryOp::Add,
Token::Minus => BinaryOp::Subtract,
_ => unreachable!(),
};
let right = Box::new(self.parse_factor()?);
expr = Expr::Binary {
left: Box::new(expr),
op: binary_op,
right,
};
}
Ok(expr)
}
fn parse_factor(&mut self) -> Result<Expr, String> {
let mut expr = self.parse_unary()?;
while let Some(op) = self.match_tokens(&[Token::Star, Token::Slash, Token::Percent]) {
let binary_op = match op {
Token::Star => BinaryOp::Multiply,
Token::Slash => BinaryOp::Divide,
Token::Percent => BinaryOp::Modulo,
_ => unreachable!(),
};
let right = Box::new(self.parse_unary()?);
expr = Expr::Binary {
left: Box::new(expr),
op: binary_op,
right,
};
}
Ok(expr)
}
fn parse_unary(&mut self) -> Result<Expr, String> {
if let Some(op) = self.match_tokens(&[Token::Minus, Token::Not, Token::Bang]) {
let unary_op = match op {
Token::Minus => UnaryOp::Minus,
Token::Not | Token::Bang => UnaryOp::Not,
_ => unreachable!(),
};
let expr = Box::new(self.parse_unary()?);
return Ok(Expr::Unary { op: unary_op, expr });
}
self.parse_call()
}
fn parse_call(&mut self) -> Result<Expr, String> {
let mut expr = self.parse_primary()?;
loop {
if self.matches(&Token::LeftParen) {
expr = self.finish_call(expr)?;
} else if self.matches(&Token::LeftBracket) {
let index = Box::new(self.parse_expression()?);
self.consume(Token::RightBracket, "Expected ']' after index")?;
expr = Expr::Index {
object: Box::new(expr),
index,
};
} else {
break;
}
}
Ok(expr)
}
fn finish_call(&mut self, callee: Expr) -> Result<Expr, String> {
let mut args = Vec::new();
if !self.check(&Token::RightParen) {
loop {
args.push(self.parse_expression()?);
if !self.matches(&Token::Comma) {
break;
}
}
}
self.consume(Token::RightParen, "Expected ')' after arguments")?;
Ok(Expr::Call {
callee: Box::new(callee),
args,
})
}
fn parse_primary(&mut self) -> Result<Expr, String> {
match &self.peek().token.clone() {
Token::True => {
self.advance();
Ok(Expr::Bool(true))
}
Token::False => {
self.advance();
Ok(Expr::Bool(false))
}
Token::Nil => {
self.advance();
Ok(Expr::Nil)
}
Token::Number(n) => {
let num = *n;
self.advance();
Ok(Expr::Number(num))
}
Token::String(s) => {
let str = s.clone();
self.advance();
Ok(Expr::String(str))
}
Token::Identifier(name) => {
let id = name.clone();
self.advance();
Ok(Expr::Identifier(id))
}
Token::LeftParen => {
self.advance();
let expr = self.parse_expression()?;
self.consume(Token::RightParen, "Expected ')' after expression")?;
Ok(Expr::Grouping(Box::new(expr)))
}
Token::LeftBracket => {
self.advance();
let mut elements = Vec::new();
if !self.check(&Token::RightBracket) {
loop {
elements.push(self.parse_expression()?);
if !self.matches(&Token::Comma) {
break;
}
}
}
self.consume(Token::RightBracket, "Expected ']' after list elements")?;
Ok(Expr::List(elements))
}
Token::Fn => {
self.advance();
self.consume(Token::LeftParen, "Expected '(' after 'fn'")?;
let mut params = Vec::new();
if !self.check(&Token::RightParen) {
loop {
if let Token::Identifier(p) = &self.peek().token {
params.push(p.clone());
self.advance();
} else {
return Err(self.error("Expected parameter name"));
}
if !self.matches(&Token::Comma) {
break;
}
}
}
self.consume(Token::RightParen, "Expected ')' after parameters")?;
self.consume(Token::LeftBrace, "Expected '{' before function body")?;
let mut body = Vec::new();
while !self.check(&Token::RightBrace) && !self.is_at_end() {
body.push(self.parse_statement()?);
}
self.consume(Token::RightBrace, "Expected '}' after function body")?;
Ok(Expr::Lambda { params, body })
}
_ => Err(self.error("Expected expression")),
}
}
// ========== Helper methods ==========
fn matches(&mut self, token: &Token) -> bool {
if self.check(token) {
self.advance();
true
} else {
false
}
}
fn match_tokens(&mut self, tokens: &[Token]) -> Option<Token> {
for token in tokens {
if self.check(token) {
let matched = self.peek().token.clone();
self.advance();
return Some(matched);
}
}
None
}
fn check(&self, token: &Token) -> bool {
if self.is_at_end() {
return false;
}
std::mem::discriminant(&self.peek().token) == std::mem::discriminant(token)
}
fn advance(&mut self) {
if !self.is_at_end() {
self.current += 1;
}
}
fn is_at_end(&self) -> bool {
matches!(self.peek().token, Token::Eof)
}
fn peek(&self) -> &TokenInfo {
&self.tokens[self.current]
}
fn consume(&mut self, token: Token, message: &str) -> Result<(), String> {
if self.check(&token) {
self.advance();
Ok(())
} else {
Err(self.error(message))
}
}
fn error(&self, message: &str) -> String {
let token_info = self.peek();
format!(
"{} at line {}, column {}",
message, token_info.line, token_info.column
)
}
}
How the Parser Works
Precedence Climbing
Each precedence level has its own function:
parse_expression() → assignment (=)
parse_assignment() → or
parse_or() → and
parse_and() → equality (==, !=)
parse_equality() → comparison (<, >, <=, >=)
parse_comparison() → term (+, -)
parse_term() → factor (*, /, %)
parse_factor() → unary (-, not)
parse_unary() → call (function calls, indexing)
parse_call() → primary (literals, identifiers)
parse_primary()
Example: Parsing 2 + 3 * 4
parse_expression()→parse_assignment()→parse_or()→ ... →parse_term()parse_term()sees2, then+- Recursively calls
parse_factor()for right side parse_factor()sees3, then*- Recursively calls
parse_unary()for4 - Builds:
Binary { left: 2, op: Add, right: Binary { left: 3, op: Multiply, right: 4 } }
Result: 2 + (3 * 4) ✅
Testing the Parser
Update src/main.rs:
mod ast;
mod lexer;
mod parser;
use lexer::Lexer;
use parser::Parser;
fn main() {
let source = r#"
let x = 5 + 3 * 2;
fn factorial(n) {
if n <= 1 {
return 1;
}
return n * factorial(n - 1);
}
print factorial(5);
"#;
// Lexing
let mut lexer = Lexer::new(source.to_string());
let tokens = match lexer.tokenize() {
Ok(t) => t,
Err(e) => {
eprintln!("Lexer error: {}", e);
return;
}
};
// Parsing
let mut parser = Parser::new(tokens);
match parser.parse() {
Ok(statements) => {
println!("AST:");
for stmt in statements {
println!("{:#?}", stmt);
}
}
Err(e) => {
eprintln!("Parser error: {}", e);
}
}
}
Run it:
cargo run
Error Recovery
The parser detects syntax errors:
let x = ; // Error: Expected expression at line 1, column 9
fn { } // Error: Expected function name at line 1, column 4
if x // Error: Expected '{' after condition at line 1, column 5
Common Pitfalls
1. Wrong Precedence Order
// ❌ Wrong: term before factor
fn parse_factor() { ... }
fn parse_term() {
let expr = parse_factor(); // Should be parse_unary()
...
}
// ✅ Correct: follow precedence hierarchy
fn parse_term() {
let expr = parse_factor();
...
}
fn parse_factor() {
let expr = parse_unary();
...
}
2. Forgetting to Box Recursive Types
// ❌ Won't compile: infinite size
pub enum Expr {
Binary { left: Expr, right: Expr } // Error!
}
// ✅ Correct: use Box
pub enum Expr {
Binary { left: Box<Expr>, right: Box<Expr> }
}
3. Not Consuming Tokens
// ❌ Wrong: infinite loop
while self.check(&Token::Plus) {
// forgot to advance!
}
// ✅ Correct
while self.matches(&Token::Plus) { // matches() advances
...
}
Parser Summary
| Input | Output |
|---|---|
| Tokens | AST (statements and expressions) |
[Let, Identifier("x"), Equal, Number(5)] | Stmt::Let { name: "x", value: Expr::Number(5) } |
What's Next?
We have an AST! Now we need to execute it - the interpreter walks the tree and performs the operations.
→ Continue to Chapter 4: Interpreter
Quick Quiz
What AST does x = 2 + 3 produce?
Stmt::Expression(
Expr::Assign {
name: "x",
value: Box::new(
Expr::Binary {
left: Box::new(Expr::Number(2)),
op: BinaryOp::Add,
right: Box::new(Expr::Number(3)),
}
),
}
)