Context-Free Grammars and Pushdown Automata

Introduction

Context-free grammars (CFGs) are more powerful than regular expressions and finite automata. They can describe the syntax of most programming languages, including nested structures like balanced parentheses, matching if-then-else statements, and recursive data structures. CFGs form the theoretical foundation of compiler design and parsing algorithms.

Learning Objectives

By the end of this reading, you will be able to:

  • Define and construct context-free grammars
  • Understand derivations, parse trees, and the language generated by a CFG
  • Identify and resolve ambiguous grammars
  • Convert CFGs to Chomsky Normal Form
  • Understand pushdown automata (PDAs) as recognizers for context-free languages
  • Prove the equivalence of CFGs and PDAs
  • Apply the pumping lemma for context-free languages
  • Recognize the limitations of context-free languages

Context-Free Grammars: Formal Definition

Basic Definitions

A context-free grammar (CFG) is a 4-tuple G = (V, Σ, R, S) where:

  1. V is a finite set of variables (also called nonterminals)
  2. Σ is a finite set of terminals (the alphabet), disjoint from V
  3. R is a finite set of production rules of the form A → w where A ∈ V and w ∈ (V ∪ Σ)*
  4. S ∈ V is the start variable

Notation:

  • Variables: Usually uppercase letters A, B, C, S
  • Terminals: Usually lowercase letters a, b, c or digits
  • Strings of terminals: w, x, y, z
  • Strings of variables and terminals: α, β, γ

Derivations

A derivation is a sequence of substitutions using production rules.

We write α ⇒ β if we can derive β from α in one step by applying a single production rule.

We write α ⇒* β if we can derive β from α in zero or more steps.

Notation:

  • α ⇒ β: derives in one step
  • α ⇒* β: derives in zero or more steps
  • α ⇒+ β: derives in one or more steps

Types of derivations:

  • Leftmost derivation: Always replace the leftmost variable
  • Rightmost derivation: Always replace the rightmost variable

Language Generated by a Grammar

The language generated by grammar G is:

L(G) = {w ∈ Σ* : S ⇒* w}

A language L is context-free if there exists a CFG G such that L = L(G).

Example 1: Balanced Parentheses

Design a CFG for the language of balanced parentheses.

Grammar:

  • V = {S}
  • Σ = {(, )}
  • S = S (start variable)
  • R:
    • S → (S)
    • S → SS
    • S → ε

Derivations:

Example: Generate "(())()"

S ⇒ SS          (use S → SS)
  ⇒ (S)S        (use S → (S) on first S)
  ⇒ ((S))S      (use S → (S))
  ⇒ (())S       (use S → ε)
  ⇒ (())(S)     (use S → (S))
  ⇒ (())()      (use S → ε)

The same leftmost derivation, written on one line:

S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
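This derivation can be replayed mechanically. The sketch below (Python; the helper name `leftmost_step` and the hard-coded list of right-hand sides are illustrative choices, not part of the formalism) rewrites the leftmost S at each step:

```python
def leftmost_step(form, replacement, variable="S"):
    """Replace the leftmost occurrence of `variable` in the sentential form."""
    i = form.index(variable)
    return form[:i] + replacement + form[i + 1:]

form = "S"
# The right-hand sides used at each step of the derivation above:
for rhs in ["SS", "(S)", "(S)", "", "(S)", ""]:
    form = leftmost_step(form, rhs)
    print(form)
# The final sentential form is "(())()"
```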

Proof that L(G) = {balanced parentheses}:

  • (⊆) By induction on derivation length, every string derived is balanced
  • (⊇) By induction on string structure, every balanced string can be derived

Example 2: L = {0ⁿ1ⁿ : n ≥ 0}

Grammar:

  • V = {S}
  • Σ = {0, 1}
  • R:
    • S → 0S1
    • S → ε

Derivation of "000111":

S ⇒ 0S1 ⇒ 00S11 ⇒ 000S111 ⇒ 000111

This language is context-free but not regular (proved via pumping lemma in previous reading).

Example 3: Arithmetic Expressions

Design a CFG for simple arithmetic expressions with +, *, and parentheses.

Grammar:

  • V = {E, T, F} (Expression, Term, Factor)
  • Σ = {+, *, (, ), a} (a represents variables/numbers)
  • Start: E
  • R:
    • E → E + T | T
    • T → T * F | F
    • F → (E) | a

Derivation of "a + a * a":

E ⇒ E + T
  ⇒ T + T
  ⇒ F + T
  ⇒ a + T
  ⇒ a + T * F
  ⇒ a + F * F
  ⇒ a + a * F
  ⇒ a + a * a

This grammar captures operator precedence: * binds tighter than +.
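One way to see how the E/T/F grammar encodes precedence is to run it. Below is a minimal recursive-descent parser sketch for this grammar, with the left recursion replaced by iteration (the standard transformation, which also yields left associativity); the token 'a' stands for any operand, as in the grammar:

```python
def parse(s):
    """Parse E → E+T | T;  T → T*F | F;  F → (E) | a  into a nested-tuple AST."""
    pos = 0
    def peek():
        return s[pos] if pos < len(s) else None
    def eat(c):
        nonlocal pos
        assert peek() == c, f"expected {c!r} at position {pos}"
        pos += 1
    def expr():                       # E → T (+ T)*
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())
        return node
    def term():                       # T → F (* F)*
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node
    def factor():                     # F → (E) | a
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        eat("a")
        return "a"
    tree = expr()
    assert pos == len(s), "trailing input"
    return tree

print(parse("a+a*a"))   # ('+', 'a', ('*', 'a', 'a')) — the * groups tighter
```

Because T (the * level) sits below E (the + level), the product is built first and becomes a subtree of the sum, exactly mirroring the precedence argument in the text.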

Example 4: Programming Language Constructs

If-then-else statements:

S → if E then S
S → if E then S else S
S → other
E → condition

While loops:

S → while E do S
S → begin S end
S → S; S
S → a := b

Parse Trees

A parse tree is a graphical representation of a derivation.

Structure:

  • Root: Start variable S
  • Internal nodes: Variables
  • Leaves: Terminals (reading left-to-right gives the derived string)
  • Parent-child relationship: If A → α is used, A is parent and symbols in α are children

Example 5: Parse Tree for "a + a * a"

Using the arithmetic grammar from Example 3:

        E
    ┌───┼───┐
    E   +   T
    |   ┌───┼───┐
    T   T   *   F
    |   |       |
    F   F       a
    |   |
    a   a

Reading leaves left-to-right: a + a * a

Relationship Between Derivations and Parse Trees

  • Each parse tree corresponds to exactly one leftmost derivation and one rightmost derivation (and possibly many others)
  • All of these derivations represent the same syntactic structure
  • Parse trees capture the structural meaning, while derivations show the process

Theorem: A string w has more than one parse tree if and only if it has more than one leftmost derivation.

Ambiguity

Definition

A CFG is ambiguous if there exists a string with two or more distinct parse trees (equivalently, two or more distinct leftmost derivations).

A CFG is unambiguous if every string has at most one parse tree.

Example 6: Ambiguous Grammar

Consider this grammar for arithmetic expressions:

E → E + E
E → E * E
E → (E)
E → a

The string "a + a * a" has (at least) two parse trees:

Parse Tree 1 (+ at root):

        E
    ┌───┼───┐
    E   +   E
    |   ┌───┼───┐
    a   E   *   E
        |       |
        a       a

Parse Tree 2 (* at root):

        E
    ┌───┼───┐
    E   *   E
┌───┼───┐   |
E   +   E   a
|       |
a       a

These represent different interpretations: (a+a)*a vs. a+(a*a).

Resolving Ambiguity

Strategy 1: Rewrite the grammar (as in Example 3)

  • Use separate variables for different precedence levels
  • Encode associativity in recursive structure

Strategy 2: Add disambiguating rules (used in parser generators)

  • Specify operator precedence: * before +
  • Specify associativity: left-to-right
  • Example (yacc/bison): %left '+' %left '*'

Strategy 3: Modify the language

  • Sometimes ambiguity is inherent in the language
  • Example: "dangling else" problem

The Dangling Else Problem

Consider:

S → if E then S
S → if E then S else S
S → other

The string "if E₁ then if E₂ then S₁ else S₂" has two parse trees:

Parse 1: else matches second if

           S
           |
    ┌──────┼──────┬──────┬─────┐
   if     E₁    then     S
                         |
                ┌────────┼──────┬──────┬──────┬─────┐
               if       E₂    then    S₁    else    S₂

Parse 2: else matches first if

                S
                |
    ┌───────────┼───────┬──────┬──────┬──────┬─────┐
   if          E₁     then     S     else    S₂
                               |
                        ┌──────┼──────┬──────┬─────┐
                       if     E₂    then    S₁

Standard resolution: "else matches the nearest unmatched if" (Parse 1)

This can be encoded in the grammar:

S → Matched | Unmatched
Matched → if E then Matched else Matched | other
Unmatched → if E then S | if E then Matched else Unmatched

Inherently Ambiguous Languages

Some context-free languages have no unambiguous grammar.

Example: L = {aⁿbⁿcᵐdᵐ : n,m ≥ 1} ∪ {aⁿbᵐcᵐdⁿ : n,m ≥ 1}

Every CFG for this language is ambiguous. The string a²b²c²d² belongs to both parts, and any grammar must derive it in fundamentally different ways.

Chomsky Normal Form

Definition

A CFG is in Chomsky Normal Form (CNF) if every production rule is of the form:

  • A → BC (where B, C are variables, not the start variable)
  • A → a (where a is a terminal)
  • S → ε (only if ε ∈ L(G), and S doesn't appear on the right side of any rule)

Importance

  • Simplifies parsing algorithms: CYK algorithm runs in O(n³) for CNF grammars
  • Theoretical tool: Many proofs are easier with CNF
  • Normal form theorem: Every CFG can be converted to CNF

Conversion to CNF

Algorithm:

  1. Add a new start variable: Create S₀ → S so the start variable never appears on the right side of any rule
  2. Eliminate ε-rules: Remove rules of the form A → ε (for A ≠ S₀)
    • First find the nullable variables: those A with A ⇒* ε
    • For each rule B → αAβ with A nullable, add B → αβ, covering every combination of nullable occurrences
  3. Eliminate unit rules: Remove rules of the form A → B
    • For each pair of rules A → B and B → α, add A → α
    • Repeat until no unit rules remain
  4. Convert long rules: Replace each A → u₁u₂...uₖ (k ≥ 3) with:
    • A → u₁A₁
    • A₁ → u₂A₂
    • ...
    • Aₖ₋₂ → uₖ₋₁uₖ
  5. Convert terminals in long rules: In every rule whose right side has length ≥ 2, replace each terminal uᵢ with a fresh variable Uᵢ and add:
    • Uᵢ → uᵢ
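Step 2 depends on knowing which variables are nullable, and that set can be computed by a simple fixpoint iteration. The sketch below uses a dict-of-tuples grammar encoding of my own choosing; the rules shown are the ones from Example 7:

```python
def nullable_variables(rules):
    """Fixpoint computation of the nullable set: all A with A ⇒* ε.
    `rules` maps each variable to a list of right-hand sides (tuples of symbols)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in rules.items():
            if var in nullable:
                continue
            for body in bodies:
                # A body is nullable if every symbol in it is a nullable variable;
                # the empty body () passes trivially.
                if all(sym in nullable for sym in body):
                    nullable.add(var)
                    changed = True
                    break
    return nullable

rules = {"S": [("A", "S", "A"), ("a", "B")],
         "A": [("B",), ("S",)],
         "B": [("b",), ()]}
print(nullable_variables(rules))   # B (directly) and A (via A → B); S is not nullable
```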

Example 7: Converting to CNF

Original grammar:

S → ASA | aB
A → B | S
B → b | ε

Step 1: Add new start variable

S₀ → S
S → ASA | aB
A → B | S
B → b | ε

Step 2: Eliminate ε-rules (B is nullable)

S₀ → S
S → ASA | aB | a    (added S → a since B is nullable)
A → B | S | ε       (A becomes nullable)
B → b

Now A is nullable too:

S₀ → S
S → ASA | AS | SA | S | aB | a
A → B | S
B → b

Step 3: Eliminate unit rules (S₀ → S, S → S, A → B, A → S)

S₀ → ASA | AS | SA | aB | a
S → ASA | AS | SA | aB | a
A → b | ASA | AS | SA | aB | a
B → b

(Note that b reaches A via the unit rule A → B, but not S or S₀, since S has no rule deriving b directly.)

Step 4: Convert long rules (ASA has length 3)

S₀ → AX₁ | AS | SA | aB | a  where X₁ → SA
S → AX₁ | AS | SA | aB | a
A → b | AX₁ | AS | SA | aB | a
X₁ → SA
B → b

Step 5: Convert terminal rules (like aB)

S₀ → AX₁ | AS | SA | UB | a  where U → a
S → AX₁ | AS | SA | UB | a
A → b | AX₁ | AS | SA | UB | a
X₁ → SA
B → b
U → a

Final CNF:

S₀ → AX₁ | AS | SA | UB | a
S → AX₁ | AS | SA | UB | a
A → b | AX₁ | AS | SA | UB | a
X₁ → SA
B → b
U → a

Pushdown Automata

Motivation

Regular languages are recognized by finite automata, which have only a fixed, finite amount of memory. Context-free languages need more memory, but not arbitrary memory: a stack suffices.

A pushdown automaton (PDA) is an NFA augmented with a stack.

Formal Definition

A pushdown automaton is a 6-tuple P = (Q, Σ, Γ, δ, q₀, F) where:

  1. Q is a finite set of states
  2. Σ is the input alphabet
  3. Γ is the stack alphabet
  4. δ: Q × Σ_ε × Γ_ε → P(Q × Γ_ε) is the transition function
  5. q₀ ∈ Q is the start state
  6. F ⊆ Q is the set of accept states

Notation: Σ_ε = Σ ∪ {ε} and Γ_ε = Γ ∪ {ε}

Transition function: δ(q, a, X) = {(p₁, γ₁), (p₂, γ₂), ...} means:

  • In state q
  • Reading input symbol a (or ε)
  • With X on top of stack (or ε to ignore stack)
  • Can transition to state pᵢ and push γᵢ onto stack

Stack operations:

  • Push X: δ(q, a, ε) = {(p, X)} or δ(q, a, Y) = {(p, XY)}
  • Pop X: δ(q, a, X) = {(p, ε)}
  • No-op: δ(q, a, X) = {(p, X)}

Acceptance

There are two standard conventions for when a PDA accepts a string w:

  1. By final state: After reading all of w, the PDA is in an accept state (the stack contents don't matter)
  2. By empty stack: After reading all of w, the stack is empty (the state doesn't matter)

These two notions are equivalent in power: a PDA accepting by one convention can be converted into one accepting by the other.

Example 8: PDA for {0ⁿ1ⁿ : n ≥ 0}

Informal description:

  1. Push each 0 onto the stack
  2. Pop a 0 for each 1 read
  3. Accept if stack becomes empty

Formal specification:

  • Q = {q₀, q₁, q₂, q₃}
  • Σ = {0, 1}
  • Γ = {0, $} ($ is stack bottom marker)
  • Start: q₀
  • Accept: {q₃}

Transitions:

δ(q₀, ε, ε) = {(q₁, $)}       // Push $ onto empty stack
δ(q₁, 0, ε) = {(q₁, 0)}       // Push 0 for each 0 read
δ(q₁, 1, 0) = {(q₂, ε)}       // First 1: switch to popping mode
δ(q₂, 1, 0) = {(q₂, ε)}       // Pop 0 for each further 1 read
δ(q₁, ε, $) = {(q₃, ε)}       // Accept the empty string
δ(q₂, ε, $) = {(q₃, ε)}       // Accept once every 0 is matched

The separate popping state q₂ is essential: if a single state both pushed and popped, the PDA would also accept interleaved strings like 0101.

State diagram:

        ε, ε → $          1, 0 → ε          ε, $ → ε
   ┌────┐         ┌────┐            ┌────┐            ┌────┐
→  │ q₀ │────────►│ q₁ │───────────►│ q₂ │───────────►│(q₃)│
   └────┘         └────┘            └────┘            └────┘
                 ↺ 0, ε → 0        ↺ 1, 0 → ε

(q₁ also has an edge labeled ε, $ → ε directly to q₃, so the empty string is accepted.)

Trace for "0011":

Configuration: (state, unread input, stack)
(q₀, 0011, ε)   ⊢  (q₁, 0011, $)
                ⊢  (q₁, 011, 0$)
                ⊢  (q₁, 11, 00$)
                ⊢  (q₂, 1, 0$)
                ⊢  (q₂, ε, $)
                ⊢  (q₃, ε, ε)  ✓ Accept
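A PDA's nondeterministic computation can be simulated by searching over configurations. The sketch below is one way to do this (breadth-first search with a rough stack-depth bound; the dict encoding of δ and the state names are my own conventions). The transition table realizes the push-then-pop strategy for {0ⁿ1ⁿ} with a separate popping state, so interleaved strings like 0101 are rejected:

```python
from collections import deque

def pda_accepts(delta, accept, w, start="q0"):
    """BFS over PDA configurations (state, input_position, stack).
    `delta` maps (state, input_symbol_or_'', stack_top_or_'') to a set of
    (new_state, push_string); the push string's first character is the new top."""
    seen = set()
    queue = deque([(start, 0, "")])
    while queue:
        state, i, stack = queue.popleft()
        if (state, i, stack) in seen or len(stack) > len(w) + 2:
            continue                      # prune revisits and runaway stacks
        seen.add((state, i, stack))
        if i == len(w) and state in accept:
            return True                   # acceptance by final state
        for (q, a, x), moves in delta.items():
            if q != state:
                continue
            if a and (i >= len(w) or w[i] != a):
                continue                  # input symbol must match (ε always does)
            if x and not stack.startswith(x):
                continue                  # required stack top must match
            for (p, push) in moves:
                queue.append((p, i + (1 if a else 0), push + stack[len(x):]))
    return False

# PDA for {0^n 1^n}: push 0s in q1, pop 1s in q2, accept in q3.
delta = {("q0", "", ""): {("q1", "$")},
         ("q1", "0", ""): {("q1", "0")},
         ("q1", "1", "0"): {("q2", "")},
         ("q2", "1", "0"): {("q2", "")},
         ("q1", "", "$"): {("q3", "")},
         ("q2", "", "$"): {("q3", "")}}
print(pda_accepts(delta, {"q3"}, "0011"), pda_accepts(delta, {"q3"}, "0101"))
# True False
```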

Example 9: PDA for Balanced Parentheses

Strategy:

  • Push '(' onto stack when seen
  • Pop from stack when ')' is seen
  • Accept if stack is empty at end

Transitions:

δ(q₀, ε, ε) = {(q₁, $)}         // Initialize stack
δ(q₁, '(', ε) = {(q₁, '(')}     // Push (
δ(q₁, ')', '(') = {(q₁, ε)}     // Pop ( when seeing )
δ(q₁, ε, $) = {(q₂, ε)}         // Accept when done

Example 10: PDA for {wcwᴿ : w ∈ {a,b}*}

This language consists of strings with a 'c' in the middle, where the right half is the reverse of the left half.

Strategy:

  • Push symbols onto stack until 'c' is seen
  • Then pop and match with remaining input
  • Accept if stack becomes empty

Transitions:

δ(q₀, ε, ε) = {(q₁, $)}           // Initialize
δ(q₁, a, ε) = {(q₁, a)}           // Push a's and b's
δ(q₁, b, ε) = {(q₁, b)}
δ(q₁, c, ε) = {(q₂, ε)}           // Switch to matching mode
δ(q₂, a, a) = {(q₂, ε)}           // Pop and match
δ(q₂, b, b) = {(q₂, ε)}
δ(q₂, ε, $) = {(q₃, ε)}           // Accept

Trace for "abcba":

(q₀, abcba, ε)  ⊢  (q₁, abcba, $)
                ⊢  (q₁, bcba, a$)
                ⊢  (q₁, cba, ba$)
                ⊢  (q₂, ba, ba$)
                ⊢  (q₂, a, a$)
                ⊢  (q₂, ε, $)
                ⊢  (q₃, ε, ε)  ✓

Equivalence of CFGs and PDAs

Theorem: CFG ↔ PDA Equivalence

Theorem: A language is context-free if and only if some PDA recognizes it.

Proof outline:

  1. CFG → PDA: Construct PDA that simulates leftmost derivations
  2. PDA → CFG: Construct CFG that simulates PDA computation

CFG to PDA Construction

Given CFG G = (V, Σ, R, S), construct PDA P:

Idea: Use stack to store sentential form, nondeterministically apply productions.

Algorithm:

  1. Push $ then S onto stack
  2. Repeat:
    • If top of stack is variable A, nondeterministically choose A → w and replace A with w
    • If top of stack is terminal a, read a from input and pop stack
    • If top of stack is $, accept

Formal construction:

  • States: {q_start, q_loop, q_accept}
  • δ(q_start, ε, ε) = {(q_loop, S$)}
  • For each rule A → w: δ(q_loop, ε, A) = {(q_loop, w)}
  • For each terminal a: δ(q_loop, a, a) = {(q_loop, ε)}
  • δ(q_loop, ε, $) = {(q_accept, ε)}
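The formal construction above is mechanical enough to code directly. The sketch below builds the transition relation as a Python dict; the (state, push_string) encoding and the state names are illustrative choices, not part of the construction:

```python
def cfg_to_pda(rules, start="S"):
    """Build the top-down PDA's transition relation from a CFG.
    `rules` maps each variable to a list of right-hand-side strings;
    `delta` maps (state, input, stack_top) to a set of (state, push_string)."""
    delta = {}
    def add(key, value):
        delta.setdefault(key, set()).add(value)
    # Push the start variable above the bottom marker $.
    add(("q_start", "", ""), ("q_loop", start + "$"))
    terminals = set()
    for var, bodies in rules.items():
        for body in bodies:
            # Rule A → w becomes an ε-move replacing A on the stack with w.
            add(("q_loop", "", var), ("q_loop", body))
            terminals |= {s for s in body if s not in rules}
    for a in terminals:
        # Matching a terminal consumes one input symbol and pops it.
        add(("q_loop", a, a), ("q_loop", ""))
    add(("q_loop", "", "$"), ("q_accept", ""))
    return delta

delta = cfg_to_pda({"S": ["0S1", ""]})      # grammar for {0^n 1^n}
print(delta[("q_loop", "", "S")])           # both productions become ε-moves
```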

PDA to CFG Construction

Given PDA P, construct CFG G such that L(G) = L(P).

Idea: Variable A_pq generates exactly the strings that can take the PDA from state p with an empty stack to state q with an empty stack.

Construction (sketch):

  • First modify P so that it has a single accept state, empties its stack before accepting, and either pushes or pops exactly one symbol on each move
  • For each pair p, q ∈ Q, create a variable A_pq
  • Start variable: A_{q₀, q_accept}
  • Productions: A_pq → A_pr A_rq for every r ∈ Q; A_pq → a A_rs b whenever a move from p reading a pushes some X going to r, and a move from s reading b pops that X going to q; and A_pp → ε for every p

This construction is complex; see the textbook for full details.

Pumping Lemma for Context-Free Languages

The Pumping Lemma

Not all languages are context-free. The pumping lemma for CFLs helps prove non-context-freeness.

Theorem (Pumping Lemma for CFLs):

If L is a context-free language, then there exists p ≥ 1 (pumping length) such that every string s ∈ L with |s| ≥ p can be divided into five parts, s = uvxyz, satisfying:

  1. |vxy| ≤ p
  2. |vy| > 0 (at least one of v or y is non-empty)
  3. For all i ≥ 0, uvⁱxyⁱz ∈ L

Intuition: Parse trees for long strings must have a repeated variable on some path, creating a "pumpable" region.

Proof of Pumping Lemma

Proof sketch:

  • Let G be a CFG in CNF for L with |V| variables
  • Let p = 2^|V|
  • Any string s with |s| ≥ p has a parse tree of height > |V|
  • By pigeonhole principle, some variable R repeats on a path from root to leaf
  • This creates two subtrees rooted at R, one inside the other
  • The inner subtree generates x, the region between generates v and y
  • These can be pumped independently

Example 11: L = {aⁿbⁿcⁿ : n ≥ 0} is Not Context-Free

Proof: Assume L is context-free with pumping length p.

Choose: s = aᵖbᵖcᵖ ∈ L (clearly |s| = 3p ≥ p)

Consider any division s = uvxyz where |vxy| ≤ p and |vy| > 0:

Case 1: vxy contains only one type of symbol (all a's, all b's, or all c's)

  • Then pumping (i=2) increases count of only one symbol
  • Result: unequal numbers → uv²xy²z ∉ L

Case 2: vxy contains two types of symbols

  • vxy spans at most p symbols, so it can't contain all three types
  • Subcase 2a: vxy contains a's and b's but no c's
    • Pumping changes ratio of a's and b's to c's
    • Result: uv²xy²z ∉ L
  • Subcase 2b: vxy contains b's and c's but no a's
    • Similar reasoning
    • Result: uv²xy²z ∉ L
  • Subcase 2c: vxy contains a's and c's
    • This is impossible since a's and c's are not adjacent in aᵖbᵖcᵖ

All cases lead to contradiction, so L is not context-free. ∎
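The case analysis can be checked by brute force for a small instance. The script below tries every decomposition s = uvxyz of a²b²c² with |vy| > 0 and verifies that pumping with i = 2 always leaves the language (for this particular string, even the |vxy| ≤ p bound is unnecessary):

```python
def in_L(s):
    """Membership in {a^n b^n c^n : n >= 0}."""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n

s = "aabbcc"
n = len(s)
witnesses = []                      # decompositions that survive pumping, if any
for i in range(n + 1):              # cut points i <= j <= k <= m give s = uvxyz
    for j in range(i, n + 1):
        for k in range(j, n + 1):
            for m in range(k, n + 1):
                u, v, x, y, z = s[:i], s[i:j], s[j:k], s[k:m], s[m:]
                if not v + y:
                    continue        # the lemma requires |vy| > 0
                if in_L(u + v * 2 + x + y * 2 + z):
                    witnesses.append((u, v, x, y, z))
print(witnesses)   # [] — no decomposition of aabbcc survives pumping with i = 2
```

Every decomposition fails for one of the reasons in the proof: a multi-symbol v or y breaks the a…b…c ordering, and single-symbol pieces unbalance the counts.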

Example 12: L = {ww : w ∈ {a,b}*} is Not Context-Free

Proof: Assume L is context-free with pumping length p.

Choose: s = aᵖbᵖaᵖbᵖ ∈ L

Consider any division s = uvxyz where |vxy| ≤ p and |vy| > 0:

Since |vxy| ≤ p, the substring vxy cannot span both halves of s.

Case 1: vxy is entirely in the first half (first aᵖbᵖ)

  • Pumping up (i=2) makes first half longer than second half
  • Result: uv²xy²z ∉ L

Case 2: vxy is entirely in the second half

  • Similar reasoning
  • Result: uv²xy²z ∉ L

Case 3: vxy spans the boundary between the two halves

  • vxy can span at most p symbols around the boundary
  • The middle of s is ...bᵖaᵖ...
  • If vxy is in this region, pumping changes the balance
  • Result: uv²xy²z ∉ L

Therefore L is not context-free. ∎

Closure Properties of Context-Free Languages

Positive Closure Properties

Context-free languages are closed under:

  1. Union: L₁ ∪ L₂

    • Proof: Rename variables so the two grammars share none, then add S → S₁ | S₂ where S₁ generates L₁ and S₂ generates L₂
  2. Concatenation: L₁L₂

    • Proof: S → S₁S₂
  3. Kleene Star: L₁*

    • Proof: S → S₁S | ε
  4. Reversal: L^R

    • Proof: Reverse the right-hand side of every production
  5. Homomorphism: h(L)

    • Replace each terminal a with h(a) in the grammar
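The union construction can be sketched in code: rename the two grammars' variables apart, then add the new start rule. The dict-of-tuples grammar encoding and the _1/_2 renaming suffixes are illustrative choices:

```python
def union_grammar(g1, g2):
    """Closure under union: rename variables apart, then add S → S1 | S2.
    A grammar is (rules_dict, start); the dict's keys are its variables."""
    def rename(rules, start, suffix):
        table = {v: v + suffix for v in rules}
        new = {table[v]: [tuple(table.get(s, s) for s in body) for body in bodies]
               for v, bodies in rules.items()}
        return new, table[start]
    r1, s1 = rename(*g1, "_1")
    r2, s2 = rename(*g2, "_2")
    # Fresh start variable derives either renamed start symbol.
    rules = {"S": [(s1,), (s2,)], **r1, **r2}
    return rules, "S"

g_anbn = ({"S": [("0", "S", "1"), ()]}, "S")               # {0^n 1^n}
g_par  = ({"S": [("(", "S", ")"), ("S", "S"), ()]}, "S")   # balanced parentheses
rules, start = union_grammar(g_anbn, g_par)
print(rules["S"])   # [('S_1',), ('S_2',)]
```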

Negative Closure Properties

Context-free languages are NOT closed under:

  1. Intersection: L₁ ∩ L₂

    • Counterexample: L₁ = {aⁿbⁿcᵐ : n,m ≥ 0} and L₂ = {aᵐbⁿcⁿ : n,m ≥ 0} are context-free
    • But L₁ ∩ L₂ = {aⁿbⁿcⁿ : n ≥ 0} is not context-free
  2. Complement: L̄

    • Follows from non-closure under intersection (De Morgan's laws)

However: If L is context-free and R is regular, then L ∩ R is context-free.

  • Proof: Simulate PDA for L and DFA for R simultaneously (product construction)

Applications of Context-Free Grammars

1. Programming Language Syntax

Most programming language syntax is context-free:

program → statement*
statement → assignment | if-stmt | while-stmt | block
assignment → identifier = expression ;
if-stmt → if ( expression ) statement
       | if ( expression ) statement else statement
while-stmt → while ( expression ) statement
block → { statement* }
expression → term | expression + term | expression - term
term → factor | term * factor | term / factor
factor → identifier | number | ( expression )

2. Parsing Algorithms

Top-down parsing (LL):

  • Recursive descent
  • Predictive parsing
  • Works for LL(k) grammars

Bottom-up parsing (LR):

  • Shift-reduce
  • LR(k), SLR, LALR parsers
  • More powerful, handles more grammars

CYK Algorithm: O(n³) parsing for CNF grammars
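A sketch of CYK, using the same dict-of-tuples grammar encoding as earlier examples: the table maps each span (i, j) to the set of variables deriving w[i:j]. The CNF grammar below for {0ⁿ1ⁿ : n ≥ 1} is one hand-built example, not canonical:

```python
def cyk(rules, start, w):
    """CYK membership test for a CNF grammar. `rules` maps each variable to its
    bodies: 1-tuples (terminal,) or 2-tuples (variable, variable)."""
    n = len(w)
    if n == 0:
        return () in rules.get(start, [])   # ε only via an S → ε rule
    T = {}
    for i, a in enumerate(w):               # length-1 spans from terminal rules
        T[(i, i + 1)] = {A for A, bodies in rules.items() if (a,) in bodies}
    for length in range(2, n + 1):          # longer spans from binary rules
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):       # split point: w[i:k] and w[k:j]
                for A, bodies in rules.items():
                    for body in bodies:
                        if (len(body) == 2 and body[0] in T[(i, k)]
                                and body[1] in T[(k, j)]):
                            cell.add(A)
            T[(i, j)] = cell
    return start in T[(0, n)]

# CNF grammar for {0^n 1^n : n >= 1}: S → ZT | ZO, T → SO, Z → 0, O → 1
rules = {"S": [("Z", "T"), ("Z", "O")], "T": [("S", "O")],
         "Z": [("0",)], "O": [("1",)]}
print(cyk(rules, "S", "000111"), cyk(rules, "S", "0011"), cyk(rules, "S", "0101"))
# True True False
```

The three nested loops over span length, start position, and split point give the O(n³) bound quoted above (times a grammar-size factor).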

3. XML and HTML Structure

element → <tag> content </tag> | <tag/>
content → element* | text

4. Natural Language Processing

Context-free grammars model phrase structure:

S → NP VP
NP → Det N | Det Adj N
VP → V NP | V NP PP
PP → P NP

Decidability for Context-Free Languages

Decidable Problems

  1. Membership: Given CFG G and string w, is w ∈ L(G)?

    • Yes, using CYK algorithm: O(n³)
  2. Emptiness: Is L(G) = ∅?

    • Yes: Check if start variable can derive any terminal string
  3. Finiteness: Is |L(G)| < ∞?

    • Yes: Check for cycles in derivation graph
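The emptiness check can be sketched as a fixpoint over "generating" variables (those that derive some terminal string): L(G) is empty iff the start variable is not generating. Each pass adds at least one variable, so this naive version runs in roughly O(|V| · |G|) time. The dict-of-tuples encoding is an assumption of this sketch:

```python
def is_empty(rules, start):
    """Decide whether L(G) = ∅ by computing the generating variables."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in rules.items():
            if var in generating:
                continue
            for body in bodies:
                # A body derives a terminal string if every symbol in it is
                # either a terminal or an already-generating variable.
                if all(s in generating or s not in rules for s in body):
                    generating.add(var)
                    changed = True
                    break
    return start not in generating

print(is_empty({"S": [("a", "S")]}, "S"))             # True: S never terminates
print(is_empty({"S": [("a", "S"), ("b",)]}, "S"))     # False: S derives b
```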

Undecidable Problems

  1. Equivalence: Given CFGs G₁ and G₂, is L(G₁) = L(G₂)?

    • Undecidable
  2. Inclusion: Is L(G₁) ⊆ L(G₂)?

    • Undecidable
  3. Ambiguity: Is G ambiguous?

    • Undecidable
  4. Intersection emptiness: Is L(G₁) ∩ L(G₂) = ∅?

    • Undecidable

These are in stark contrast to regular languages, where all analogous problems are decidable.

Exercises

Basic Exercises

  1. CFG Construction: Design context-free grammars for:

    • a) L = {aⁿb²ⁿ : n ≥ 0}
    • b) Palindromes over {a, b}
    • c) L = {aⁱbʲcᵏ : i = j or j = k}
  2. Derivations: For the grammar S → aSb | SS | ε, give:

    • a) A leftmost derivation of "aabbab"
    • b) A rightmost derivation of "aabbab"
    • c) A parse tree for "aabbab"
  3. Ambiguity Detection: Determine if this grammar is ambiguous:

    S → aS | Sa | a
    

    If ambiguous, show two parse trees for some string.

Intermediate Exercises

  1. CNF Conversion: Convert this grammar to Chomsky Normal Form:

    S → AB | C
    A → aA | ε
    B → bB | b
    C → cC | c
    
  2. PDA Construction: Design a PDA for:

    • a) L = {aⁱbʲcᵏ : i = k}
    • b) L = {w ∈ {a,b}* : #a(w) = #b(w)}
    • c) Palindromes over {a, b}
  3. Pumping Lemma: Use the pumping lemma to prove these languages are not context-free:

    • a) L = {aⁿbⁿcⁿdⁿ : n ≥ 0}
    • b) L = {aⁱbʲcᵏ : i < j < k}
    • c) L = {ww : w ∈ {a,b,c}*}

Advanced Exercises

  1. Inherent Ambiguity: Prove that every CFG for L = {aⁿbⁿcᵐdᵐ : n,m ≥ 1} ∪ {aⁿbᵐcᵐdⁿ : n,m ≥ 1} must be ambiguous. (Hint: Consider strings in both languages.)

  2. Closure Proof: Prove that if L₁ is context-free and L₂ is regular, then L₁ ∩ L₂ is context-free. (Hint: Product construction for PDA and DFA.)

  3. PDA to CFG: Convert this PDA to a CFG:

    Q = {q₀, q₁}, Σ = {a, b}, Γ = {X, Z₀}, Start: q₀, Accept: {q₁}
    δ(q₀, a, Z₀) = {(q₀, XZ₀)}
    δ(q₀, a, X) = {(q₀, XX)}
    δ(q₀, b, X) = {(q₁, ε)}
    δ(q₁, b, X) = {(q₁, ε)}
    δ(q₁, ε, Z₀) = {(q₁, ε)}
    
  4. Ogden's Lemma: Research Ogden's Lemma (a stronger version of the pumping lemma in which positions of the string can be "marked") and use it to prove that L = {aⁱbʲcᵏ : i = j or j = k} is inherently ambiguous.

  5. Grammar Design: Design an unambiguous CFG for arithmetic expressions with operators +, -, *, / and parentheses, where:

    • * and / have higher precedence than + and -
    • All operators are left-associative
    • Prove your grammar is unambiguous
  6. Decidability: Prove that the emptiness problem for CFGs is decidable. Design an algorithm and analyze its time complexity.

Summary

In this reading, we explored context-free grammars and pushdown automata:

  1. Context-Free Grammars provide a generative formalism for languages with recursive structure, using production rules

  2. Parse Trees represent the syntactic structure of derivations and reveal ambiguity

  3. Ambiguity occurs when a string has multiple parse trees; some languages are inherently ambiguous

  4. Chomsky Normal Form is a restricted form where every production is A → BC or A → a; every CFG can be converted to CNF

  5. Pushdown Automata are NFAs augmented with a stack, providing an operational view of context-free languages

  6. Equivalence: CFGs and PDAs recognize exactly the context-free languages

  7. Pumping Lemma for CFLs provides a tool to prove languages are not context-free

  8. Closure Properties: CFLs are closed under union, concatenation, star, but not intersection or complement

  9. Applications: Programming language syntax, parsing, XML structure

  10. Decidability: Membership is decidable (O(n³)), but equivalence and ambiguity are undecidable

Key Insights:

  • Context-free languages extend regular languages by adding stack memory
  • Parse trees make structure explicit and reveal ambiguity
  • Not all naturally arising languages (like {aⁿbⁿcⁿ}) are context-free
  • Some problems decidable for regular languages become undecidable for CFLs

Limitations:

  • Cannot express "equal numbers of a's, b's, and c's"
  • Cannot enforce non-local dependencies (context-sensitive features)
  • Cannot solve the general ambiguity problem

Next Steps

In the next reading, we'll explore Turing Machines and Computability, which remove all memory limitations and reach the theoretical limits of computation:

  • Turing machine model and variants
  • Church-Turing thesis
  • Decidability and the Halting Problem
  • Complexity classes P and NP
  • Reductions and NP-completeness


References and Further Reading

  1. Sipser, Michael. "Introduction to the Theory of Computation" (Chapter 2)
  2. Hopcroft, John E., et al. "Introduction to Automata Theory" (Chapters 5-7)
  3. Aho, Alfred V., et al. "Compilers: Principles, Techniques, and Tools" (Dragon Book)
  4. Chomsky, Noam. "Three models for the description of language" (1956)
  5. Cocke, Kasami, and Younger. Independent discoveries of the CYK algorithm (1960s)