Persistence: Checkpointers and Threads

This chapter covers checkpointers: how a graph remembers its state across invocations, resumes from interruptions, and lets you inspect what happened.

The Problem

The graphs we've written so far are stateless between invocations. Every invoke starts with whatever state you pass in. For a one-shot tool, fine. For a conversational agent, every turn is a fresh mind.

# Turn 1
graph.invoke({"messages": [HumanMessage("Hi, my name is Ada.")]})

# Turn 2: starts from scratch, no memory of turn 1
graph.invoke({"messages": [HumanMessage("What's my name?")]})
# Agent: "I don't know your name."

Checkpointers fix this. A checkpointer persists state at every step. Subsequent invocations can resume from the last saved state.

Checkpointer Options

There are three built-in options, distinguished by where they store state.

MemorySaver (In-Process)

State lives in a Python dict. Gone when the process dies.

from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
graph = builder.compile(checkpointer=memory)

Useful for: tests, notebooks, prototyping.

SqliteSaver (File)

State lives in a SQLite file. Survives process restarts.

from langgraph.checkpoint.sqlite import SqliteSaver

with SqliteSaver.from_conn_string("./graph.db") as memory:
    graph = builder.compile(checkpointer=memory)
    graph.invoke(...)

Useful for: local development, simple production, single-node deployments.

PostgresSaver (Production)

State lives in Postgres. Survives, scales, supports concurrent access.

from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = "postgresql://user:pass@localhost:5432/langgraph"

with PostgresSaver.from_conn_string(DB_URI) as memory:
    memory.setup()   # one-time: create tables
    graph = builder.compile(checkpointer=memory)

Useful for: production, multiple processes, durable conversations.

Async variants exist: AsyncSqliteSaver and AsyncPostgresSaver. Use them when your graph runs async (ainvoke / astream).

Threads

A thread is a conversation ID. All invocations with the same thread ID share state.

config = {"configurable": {"thread_id": "user-42"}}

# Turn 1
graph.invoke(
    {"messages": [HumanMessage("Hi, my name is Ada.")]},
    config=config,
)

# Turn 2, same thread
graph.invoke(
    {"messages": [HumanMessage("What's my name?")]},
    config=config,
)
# Agent: "Your name is Ada."

Every conversation gets its own thread ID. User-based? Use user_id as thread_id. Chat-session-based? Use a session ID. Short-lived task? Any UUID.
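Whichever convention you pick, the mechanical part is identical: build the config dict with your chosen ID. A standard-library sketch (the names are illustrative):

```python
import uuid

def thread_config(thread_id: str) -> dict:
    """Build the config dict a checkpointed graph expects."""
    return {"configurable": {"thread_id": thread_id}}

per_user = thread_config("user-42")          # user-scoped, long-lived memory
per_session = thread_config("session-abc")   # one thread per chat session
per_task = thread_config(str(uuid.uuid4()))  # throwaway thread for a one-off task
```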

The Invocation Shape

With a checkpointer, invoke works slightly differently.

  • Initial invoke: pass the full initial state.
  • Subsequent invokes on the same thread: pass only the new input. LangGraph merges it with checkpointed state.

# Turn 1 (initial)
graph.invoke(
    {"messages": [HumanMessage("Hi")]},
    config=config,
)

# Turn 2 (append to existing state, no need to resend previous messages)
graph.invoke(
    {"messages": [HumanMessage("What did I just say?")]},
    config=config,
)

Because messages uses the add_messages reducer, the new HumanMessage is appended to whatever's in state, not replacing it.

Inspecting State

Checkpointers let you read the current state and the history.

get_state

state = graph.get_state(config=config)
print(state.values)   # current state dict
print(state.next)     # tuple of next nodes to run (empty if ended)
print(state.tasks)    # pending tasks (for interrupts)
print(state.config)   # config with current checkpoint ID

get_state_history

for checkpoint in graph.get_state_history(config=config):
    print(checkpoint.config["configurable"]["checkpoint_id"], checkpoint.values)

Every step of every invocation on this thread, newest first. Useful for debugging: "what happened in turn 3 again?"

Time Travel

Every checkpoint has an ID, so you can resume from a specific checkpoint instead of the latest.

# Find a historical checkpoint
for cp in graph.get_state_history(config):
    if some_condition(cp.values):
        target = cp
        break

# Resume as if that checkpoint were current
graph.invoke(None, config=target.config)

Invoking with None means "continue from that checkpoint without new input". You can also pass new input, which is merged through the reducers as usual.

Use cases: debugging (rewind to the moment before the bug), speculative replays (fork from a checkpoint, try a different path), user "undo" features.

update_state: Editing State Manually

You can modify state without running the graph.

graph.update_state(
    config,
    {"messages": [HumanMessage("actually, I meant Tokyo not London")]},
)

This appends a message (via the reducer) and creates a new checkpoint. Subsequent invoke starts from the updated state.

Useful with human-in-the-loop (Chapter 6): user edits the draft, you update state, graph continues.

Keeping It Simple

Most real agents need persistence. The minimum viable setup:

from langgraph.checkpoint.sqlite import SqliteSaver

with SqliteSaver.from_conn_string("./state.db") as memory:
    graph = builder.compile(checkpointer=memory)

    config = {"configurable": {"thread_id": "user-42"}}

    # First turn
    out1 = graph.invoke({"messages": [HumanMessage("Hi, I'm Ada.")]}, config=config)
    print(out1["messages"][-1].content)

    # Second turn, same thread
    out2 = graph.invoke({"messages": [HumanMessage("What's my name?")]}, config=config)
    print(out2["messages"][-1].content)
    # "Your name is Ada."

Three lines of infrastructure, full memory across turns.

Production Considerations

State Size

Every checkpoint serializes the full state to the backend. If state grows without bound (e.g. an ever-growing message history), the database grows with it and serialization slows down.

Strategies:

  • Message trimming. Keep only the last N messages, or prune older messages via a node that calls trim_messages.
  • Summarization. Periodically summarize old messages into a shorter summary; store the summary in state and drop the originals.
  • External storage for large blobs. Don't put 10MB of fetched content into state. Store it somewhere else and keep a reference.
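The trimming idea can be sketched without any LangChain machinery. Here messages are plain dicts; in a real graph, a node would return this bounded list (or call trim_messages) so each checkpoint stays small:

```python
def trim_history(messages: list, keep_last: int = 10) -> list:
    """Keep a leading system message (if any) plus the last `keep_last` messages."""
    if messages and messages[0].get("role") == "system":
        return messages[:1] + messages[1:][-keep_last:]
    return messages[-keep_last:]
```

Summarization follows the same shape: instead of discarding the dropped messages outright, a node replaces them with one synthetic summary message.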

Cleanup

Old threads pile up. Checkpointers don't auto-expire. Write a cleanup script:

-- Postgres (table and column names depend on your saver's schema; verify before running)
DELETE FROM checkpoints WHERE created_at < NOW() - INTERVAL '30 days';

Or use thread-level TTLs in application code (track last-accessed; purge idle threads).
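The application-level TTL can be as simple as a last-accessed map. A sketch (deleting the expired threads' checkpoints still happens against the backend):

```python
import time

class ThreadTTLTracker:
    """Track last-accessed time per thread and report which threads are idle."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.last_seen = {}

    def touch(self, thread_id, now=None):
        """Call on every invoke for the thread."""
        self.last_seen[thread_id] = time.time() if now is None else now

    def expired(self, now=None):
        """Thread IDs idle longer than the TTL; purge their checkpoints."""
        now = time.time() if now is None else now
        return [t for t, seen in self.last_seen.items() if now - seen > self.ttl]
```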

Concurrent Writes

Two processes invoking on the same thread simultaneously can conflict. PostgresSaver uses transactions to prevent corruption. Your app logic should prevent concurrent writes per thread (e.g. a queue, or a per-thread lock).

Security

State may contain sensitive data: user messages, tool outputs, PII. Encrypt the Postgres volume, restrict access, consider redacting before persisting.
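Redaction can run in the node that writes state, before anything reaches the checkpointer. A hypothetical regex-based scrubber (the patterns here are illustrative, not exhaustive):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Replace obvious PII patterns before the text is persisted."""
    return SSN.sub("[ssn]", EMAIL.sub("[email]", text))
```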

Common Pitfalls

Forgetting the config on subsequent calls. Without thread_id, LangGraph has no idea which thread to resume. Always pass it.

Using MemorySaver in production. It's in-process: each worker holds its own copy of state, so requests routed to different workers see different history, and a restart wipes everything.

Unbounded state growth. Every message stays forever. Add trimming or summarization from day one.

Treating the checkpoint as a source of truth. The DB is your source of truth. Checkpoints are the agent's runtime memory. Don't query the checkpoints from application code for business data; store that elsewhere.

Not calling setup() on first use for SQL-backed savers. The first invoke errors because the tables don't exist. Run memory.setup() once after constructing the saver.

Next Steps

Continue to 06-human-in-the-loop.md to pause a graph for human input.