Language Design & Architecture | Building a Programming Language

Before writing any code, we need to design our language. What will it look like? What features will it have? How will the interpreter work?

What Makes a Programming Language?

A programming language has three main components:

Syntax: The rules for what code looks like (grammar)
Semantics: The meaning of code (what it does)
Runtime: The environment where code executes

The Interpreter Pipeline

Our Lux interpreter follows this flow:

Source Code (text file or REPL input)
         ↓
    LEXER (src/lexer.rs)
    - Breaks text into tokens
    - "let x = 5" → [LET, IDENTIFIER("x"), EQUAL, NUMBER(5)]
         ↓
   PARSER (src/parser.rs)
    - Builds Abstract Syntax Tree (AST)
    - Tokens → Tree structure representing program structure
         ↓
  INTERPRETER (src/interpreter.rs)
    - Walks the AST
    - Executes statements, evaluates expressions
    - Uses Environment for variable storage
         ↓
    Output / Side Effects

Interpreter vs Compiler vs JIT

Aspect	Interpreter	Compiler	JIT Compiler
Execution	Executes source directly	Produces machine code first	Compiles during runtime
Startup	Instant	Compile time required	Slight delay first run
Performance	Slower (10-100x)	Fastest	Fast (near-native)
Memory	Moderate	Low (no runtime)	Higher (code + data)
Errors	Runtime only	Compile-time + runtime	Runtime only
Examples	Python, Ruby, PHP	C, C++, Rust, Go	JavaScript (V8), Java (JVM)
Best For	Scripting, rapid dev	System software	Web browsers, VMs

We are building a tree-walking interpreter, the simplest and most educational approach.

Lux Language Design Decisions

1. Dynamically Typed

let x = 5;        // x is a number
x = "hello";      // now x is a string, this is OK

Why? Simpler to implement, no type checker needed. Good for scripting.

Alternative: Static typing (like Rust, requires type inference or annotations)

2. Expression-Based

Almost everything is an expression that returns a value:

let x = if true { 5 } else { 10 };  // if returns a value
let y = { let a = 1; let b = 2; a + b };  // block returns last expression

3. First-Class Functions

Functions are values that can be passed around:

fn add(a, b) { return a + b; }
let operation = add;  // assign function to variable
let result = operation(2, 3);  // call through variable

4. Closures

Functions capture variables from their environment:

fn outer() {
    let x = 10;
    fn inner() {
        return x;  // captures x from outer
    }
    return inner;
}

5. C-Like Syntax with Modern Touches

Familiar to developers coming from JavaScript, C, Java, etc:

// C-style blocks and keywords
if condition {
    // ...
} else {
    // ...
}

// Modern: no semicolons required (optional)
// Modern: fn keyword (like Rust)
// Modern: let for variables (like JavaScript/Rust)

The Complete Lux Language Specification

Keywords

let      - variable declaration
fn       - function declaration
if       - conditional
else     - alternative branch
while    - loop
for      - iteration loop
return   - return from function
true     - boolean literal
false    - boolean literal
nil      - null/none value
print    - built-in print statement
and      - logical AND
or       - logical OR
not      - logical NOT

Operators

Arithmetic: +  -  *  /  %
Comparison: ==  !=  <  >  <=  >=
Logical:    and  or  not
Assignment: =
Grouping:   ( )

Data Types

42              // Number (f64 internally)
3.14159
"hello"         // String
true, false     // Boolean
nil             // Nil (null/none)
[1, 2, 3]       // List
fn() { }        // Function

Grammar Overview

program        → statement* EOF ;

statement      → exprStmt
               | letStmt
               | printStmt
               | blockStmt
               | ifStmt
               | whileStmt
               | forStmt
               | returnStmt
               | fnDecl ;

letStmt        → "let" IDENTIFIER "=" expression ";" ;
printStmt      → "print" expression ";" ;
blockStmt      → "{" statement* "}" ;
ifStmt         → "if" expression blockStmt ( "else" blockStmt )? ;
whileStmt      → "while" expression blockStmt ;
forStmt        → "for" IDENTIFIER "in" expression blockStmt ;
returnStmt     → "return" expression? ";" ;
fnDecl         → "fn" IDENTIFIER "(" parameters? ")" blockStmt ;
exprStmt       → expression ";" ;

expression     → assignment ;
assignment     → IDENTIFIER "=" assignment | logic_or ;
logic_or       → logic_and ( "or" logic_and )* ;
logic_and      → equality ( "and" equality )* ;
equality       → comparison ( ( "==" | "!=" ) comparison )* ;
comparison     → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term           → factor ( ( "+" | "-" ) factor )* ;
factor         → unary ( ( "*" | "/" | "%" ) unary )* ;
unary          → ( "not" | "-" ) unary | call ;
call           → primary ( "(" arguments? ")" | "[" expression "]" )* ;
primary        → NUMBER | STRING | "true" | "false" | "nil"
               | IDENTIFIER | "(" expression ")"
               | "[" arguments? "]"
               | "fn" "(" parameters? ")" blockStmt ;

Sample Lux Programs

Hello World

print "Hello, World!";

Variables and Arithmetic

let a = 10;
let b = 20;
let sum = a + b;
print "Sum: " + str(sum);

Functions

fn greet(name) {
    return "Hello, " + name + "!";
}

print greet("Alice");
print greet("Bob");

Conditionals

fn max(a, b) {
    if a > b {
        return a;
    } else {
        return b;
    }
}

print max(10, 5);

Loops

// While loop
let i = 0;
while i < 5 {
    print i;
    i = i + 1;
}

// For loop
let numbers = [1, 2, 3, 4, 5];
for n in numbers {
    print n * n;
}

Recursion

fn factorial(n) {
    if n <= 1 {
        return 1;
    }
    return n * factorial(n - 1);
}

print factorial(5);  // 120

Closures

fn make_adder(x) {
    fn adder(y) {
        return x + y;
    }
    return adder;
}

let add5 = make_adder(5);
print add5(10);  // 15
print add5(20);  // 25

Higher-Order Functions

fn apply_twice(f, x) {
    return f(f(x));
}

fn double(x) {
    return x * 2;
}

print apply_twice(double, 5);  // 20

Lists

let items = [1, 2, 3];
push(items, 4);
push(items, 5);

for item in items {
    print item;
}

print "Length: " + str(len(items));

Project Setup

Create the Rust project:

# Create the project
cargo new lux
cd lux

The initial project structure looks like this:

lux/
├── Cargo.toml
└── src/
    ├── main.rs          # Entry point (REPL and file runner)
    ├── lexer.rs         # Tokenization
    ├── parser.rs        # AST construction
    ├── ast.rs           # AST node definitions
    ├── interpreter.rs   # Evaluation engine
    └── environment.rs   # Variable storage and scoping

Initial Cargo.toml

[package]
name = "lux"
version = "0.1.0"
edition = "2021"

[dependencies]
# Add dependencies as needed

Initial src/main.rs

fn main() {
    println!("Lux Programming Language");
    println!("Coming soon...");
}

Test that it works:

cargo run

You should see:

Lux Programming Language
Coming soon...

Architecture Overview

Module Responsibilities

main.rs
- CLI argument parsing
- REPL loop
- File reading and execution
- Error display
lexer.rs
- Character-by-character scanning
- Token generation
- Position tracking (line/column)
- Comment and whitespace handling
ast.rs
- Token enum (keyword, operator, literal types)
- Expr enum (all expression types)
- Stmt enum (all statement types)
- Helper types
parser.rs
- Recursive descent parsing
- Operator precedence handling
- AST construction
- Syntax error detection
interpreter.rs
- Value enum (runtime values)
- Expression evaluation
- Statement execution
- Built-in function implementation
environment.rs
- Variable storage
- Scope management
- Variable lookup and assignment

Why These Choices?

Why Rust?

Memory safety: No crashes from bad pointers
Performance: Fast even for a tree-walker
Enums and pattern matching: Perfect for AST representation
Error handling: Result/Option types for clean error propagation
Learning: Rust teaches you to think carefully about data ownership

Why Tree-Walking Interpreter?

Simplest to understand: Direct mapping from AST to execution
Easy to debug: Can print the tree and trace execution
Quick to implement: Complete interpreter in ~1000 lines
Foundation for more: Can optimize later (bytecode, JIT)

Why Dynamic Typing?

Simpler implementation: No type checker needed
Faster prototyping: Get interpreter working quickly
Familiar to many: JavaScript, Python, Ruby all dynamic
Good for scripting: Flexibility in small programs

Trade-off: Runtime errors vs compile-time safety. We handle this with good error messages.

Development Workflow

As we build Lux, follow this pattern:

Design: Plan the feature (syntax, semantics)
Test: Write test cases first
Implement: Add code incrementally
Test: Verify it works
Refine: Clean up and optimize

Common Pitfalls to Avoid

Forgetting position tracking: Always track line/column for error messages
Inconsistent precedence: Use a precedence table, test thoroughly
Scope bugs: Be careful with environment parent chains
Memory leaks: Use Rc/RefCell appropriately for cycles
Poor error messages: Always include source location and context

Quick Reference

Project Commands

# Run REPL
cargo run

# Run a file
cargo run -- script.lux

# Run tests
cargo test

# Run with debugging
RUST_LOG=debug cargo run

# Build release version
cargo build --release
./target/release/lux

Key Concepts

Lexer: Text → Tokens
Parser: Tokens → AST
Interpreter: AST → Execution
Environment: Variable storage with scoping
Value: Runtime representation of data

Next Steps

Continue to 02-lexer.md to build the first component of the pipeline: the lexer that turns source text into tokens.