Best Practices: State Design and Anti-Patterns
This chapter collects the habits and anti-patterns that separate a LangGraph codebase you can maintain from one you'd rewrite.
State Design
State is the most important design decision in any LangGraph application. Get it right and everything composes; get it wrong and nothing works smoothly.
Keep It Flat
Deeply nested state (state["user"]["profile"]["settings"]["theme"]) is a pain to reduce, update, and inspect. Prefer flat state with clear top-level keys.
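A minimal sketch of the difference — the state classes and field names here are illustrative, not from any particular codebase:

```python
from typing import TypedDict

# Nested: every update and every reducer has to tunnel through layers.
class NestedState(TypedDict):
    user: dict  # {"profile": {"settings": {"theme": ...}}}

# Flat: each concern is a top-level key a node can read and update directly.
class AgentState(TypedDict):
    user_id: str
    theme: str
    messages: list

def set_theme(state: AgentState) -> dict:
    # With flat state, a partial update is a one-key dict.
    return {"theme": "dark"}
```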
Use Typed Dicts
Use TypedDict for simple cases and Pydantic for state that validates external input. Never plain dicts: you lose editor support, type checking, and any guarantee the shape is right.
Use Reducers Deliberately
Default reducer is replace. For lists, use Annotated[list, add] or add_messages. For dicts, write a merge reducer if needed. Think about each field's reducer when you add it.
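A sketch of what deliberate reducer choices look like in a state schema. `operator.add` is the standard append-to-list reducer; `merge_dicts` is a hypothetical custom reducer for dict fields:

```python
import operator
from typing import Annotated, TypedDict

def merge_dicts(left: dict, right: dict) -> dict:
    # Hypothetical dict reducer: later updates win on key conflicts.
    return {**left, **right}

class State(TypedDict):
    # No annotation: each update replaces the value (the default).
    topic: str
    # Annotated reducer: list updates are appended, not replaced.
    steps: Annotated[list, operator.add]
    # Custom reducer: dict updates are merged key by key.
    scratch: Annotated[dict, merge_dicts]
```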
Minimize State
Anything a node can recompute from other state doesn't belong in state. Anything purely transient (a loop counter for a single invocation) may still belong in state so it survives interrupts, but question each field before you add it.
Large blobs (embeddings, fetched documents over a few KB) don't go in state. Store them externally (a vector DB, S3) and keep references in state.
Separate Input and Output
If your graph produces a clearly structured result, consider a dedicated output key. That makes it easy for callers to consume the result without wading through working state.
Node Design
One Job per Node
A node that calls an LLM, parses its output, writes to a DB, and decides the next step does too much. Split: LLM call, parse, write, route.
When nodes do one thing:
- They're easier to test.
- They're easier to replace.
- Their errors are easier to understand.
Node Purity
A node is a function from state to partial state. Treat it that way:
- No global side effects if you can avoid them (external API calls are necessary, but keep them contained).
- No mutation of the input state.
- Deterministic given the same state and external environment.
Pure-ish nodes are easier to reason about and to replay.
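A sketch of a pure-ish node — hypothetical names, no LLM involved — that reads state and returns a partial update without mutating its input:

```python
def summarize_node(state: dict) -> dict:
    # Reads state, computes, returns a partial update.
    # No mutation of `state`, no writes to globals.
    notes = state["notes"]
    return {"summary": " | ".join(notes[:3])}

state = {"notes": ["a", "b", "c", "d"]}
update = summarize_node(state)
assert update == {"summary": "a | b | c"}
assert state == {"notes": ["a", "b", "c", "d"]}  # input untouched
```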
Error Handling
Three choices when a node can fail:
- Let it propagate. The graph fails. Use when a failure means the whole invocation should fail.
- Catch and write to state. Put {"error": "..."} in state; a subsequent node decides what to do. Use when you want the agent to see the error and react.
- Retry in the node. With exponential backoff. Use for transient failures like network hiccups.
Pick per node. Don't swallow errors silently; either propagate them or encode them as state for downstream handling.
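The second and third choices can be sketched together; `run_search` here is a stand-in for a real external call:

```python
import time

def run_search(query: str) -> list:
    # Stand-in for a real external call; a real one may raise ConnectionError.
    return [f"result for {query}"]

def fetch_with_retry(fetch, retries: int = 3, base_delay: float = 0.1):
    # Retry transient failures with exponential backoff; re-raise on the last attempt.
    for attempt in range(retries):
        try:
            return fetch()
        except ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

def search_node(state: dict) -> dict:
    try:
        return {"results": fetch_with_retry(lambda: run_search(state["query"]))}
    except Exception as exc:
        # Encode the failure in state; a downstream node decides what to do.
        return {"error": str(exc)}
```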
Testing
Unit Test Each Node
Nodes are functions; call them with state dicts and assert on outputs. No LLM needed if you mock it.
from langchain_core.messages import AIMessage

def test_route_to_tools():
    state = {"messages": [AIMessage(content="", tool_calls=[{"name": "x", "args": {}, "id": "1"}])]}
    assert route(state) == "tools"
Integration Test the Graph
With a real checkpointer (SQLite in a temp file) and possibly a mocked LLM, call invoke and assert on the final state.
def test_research_flow():
    graph = build_graph()
    result = graph.invoke({"topic": "dogs"}, config={"configurable": {"thread_id": "t1"}})
    assert "summary" in result
    assert len(result["summary"]) > 0
Record-Replay with Fixtures
For LLM-heavy tests, record responses once and replay in tests. LangSmith datasets, VCR, or a custom fixture adapter all work.
Eval Suites for Behavior
Unit tests verify structure. Evals verify behavior. Build a dataset of (input, expected-ish output) pairs, scored by string match, semantic similarity, or another LLM. Run it before any merge that touches prompts.
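A minimal eval harness, assuming a callable agent and an exact-match scorer (a real suite would swap in semantic similarity or an LLM judge):

```python
def exact_match(expected: str, actual: str) -> float:
    # Simplest scorer: 1.0 on a case-insensitive exact match, else 0.0.
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

def run_evals(dataset, agent, scorer=exact_match) -> float:
    # dataset: list of (input, expected-ish output) pairs; returns the mean score.
    scores = [scorer(expected, agent(question)) for question, expected in dataset]
    return sum(scores) / len(scores)
```

Gate merges on the score not regressing, not on it being perfect.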
Prompts
Version Prompts Explicitly
Prompts are code. Version them, review them, don't inline 500-character templates in a function.
AGENT_PROMPT_V2 = """You are a helpful assistant..."""
Better: prompts in separate files (.txt or .md) loaded at startup.
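A sketch of loading prompts from files; the `prompts/` directory layout and file names are assumptions:

```python
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=None)
def load_prompt(prompt_dir: str, name: str) -> str:
    # Read the prompt file once at first use; later calls hit the cache.
    return (Path(prompt_dir) / f"{name}.md").read_text()

# agent_prompt = load_prompt("prompts", "agent_v2")  # hypothetical layout
```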
Test Prompt Changes with Evals
A new prompt might improve the common case and break an edge case. Evals catch this.
Log Prompts in LangSmith
Every change to a prompt changes the observed agent behavior. Traces let you compare.
Graph Organization
Build Functions, Not Scripts
Put graph construction in a function:
def build_agent(llm=None, tools=None):
    llm = llm or ChatAnthropic(model="claude-sonnet-4-5")
    tools = tools or default_tools
    # ... build ...
    return graph
This makes it testable (pass a mock LLM), parameterizable (swap models), and reusable.
Compile Once
compile() is expensive. Do it once at startup, not per request. The compiled graph is thread-safe; reuse it.
Keep Files Short
One main graph per file. Subgraphs in their own files. Node functions grouped by concern. If your agent.py is over 500 lines, split it.
Security and Privacy
Don't Trust Tool Arguments
The LLM generates tool arguments. It can generate anything. Validate:
@tool
def read_file(path: str) -> str:
    """Read a file. Path must be under ./allowed/."""
    if ".." in path or not path.startswith("./allowed/"):
        raise ValueError("Invalid path")
    with open(path) as f:
        return f.read()
Especially for tools that touch the filesystem, make HTTP calls, or execute code.
Scope What the LLM Can Do
Don't give the agent a tool that runs arbitrary shell commands unless you have sandboxing. Prefer many narrow tools over one broad tool.
Redact Before Logging
If state contains PII, redact before writing to LangSmith or your own logs. Either at the log layer or at the state boundary.
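A sketch of redaction at the state boundary, with two illustrative regex patterns (real PII coverage needs far more than this):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    # Replace obvious PII patterns before the text reaches any log sink.
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)
```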
Rate Limit Tools
A runaway agent calling a paid API 1000 times in a loop is an expensive bug. Wrap paid tools in a rate limiter.
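A sliding-window limiter sketch; `call_paid_api` is a hypothetical stand-in for the paid call:

```python
import time
from collections import deque

class RateLimiter:
    # Sliding window: at most `max_calls` per `period` seconds; blocks otherwise.
    def __init__(self, max_calls: int, period: float):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()

    def acquire(self):
        now = time.monotonic()
        while self.calls and now - self.calls[0] > self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            time.sleep(self.period - (now - self.calls[0]))
        self.calls.append(time.monotonic())

limiter = RateLimiter(max_calls=10, period=60.0)

def paid_search(query: str) -> str:
    limiter.acquire()  # blocks instead of letting a loop burn the budget
    return call_paid_api(query)  # hypothetical paid API call
```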
Cost Control
Cap the Loop
Always. Recursion limit plus your own guard. An agent that can't end is an agent that will spend all your money.
Trim Message History
Long-running conversations grow. Every message in state goes into every LLM call. Trim old messages, summarize, or both.
from langchain_core.messages import trim_messages
def trimmer(state: State) -> State:
    trimmed = trim_messages(
        state["messages"],
        max_tokens=4000,
        strategy="last",
        token_counter=llm,
    )
    return {"messages": trimmed}
Choose the Right Model
Claude Opus is expensive. For many agent tasks, Sonnet or Haiku is cheaper and fast enough. Use Opus for hard reasoning, Sonnet for everyday work, Haiku for classification and routing.
Monitor Per-Request Cost
Log token counts per invocation. Flag outliers. A request using 10x the normal tokens is a prompt injection, a runaway loop, or a user who pasted a book.
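A sketch of the flagging side, assuming you already log one total token count per invocation (the logging pipeline itself is not shown):

```python
def flag_outliers(token_counts: list, factor: float = 10.0) -> list:
    # Flag invocations that used more than `factor` times the median token count.
    counts = sorted(token_counts)
    median = counts[len(counts) // 2]
    return [c for c in token_counts if c > factor * median]
```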
Anti-Patterns
Giant State Object
20 keys, all touched by every node, no clear ownership. Refactor: either split into subgraphs with their own state, or separate concerns into distinct top-level keys.
Magical Routing
Conditional edge functions that inspect 5 keys and return one of 10 destinations. Break into explicit intermediate nodes; the routing reads like documentation.
LLM Calls Outside Nodes
Calling the LLM in a helper outside the graph breaks tracing. Wrap LLM calls in nodes.
Stateful Globals
memory_store = {}  # Don't.

def node(state):
    memory_store[state["user"]] = state["data"]
Use the graph's state, or an external store (a DB) that every worker can see.
One Node That Does Everything
The classic "call LLM, execute tools, write DB, respond" mega-node. Hard to debug, hard to test, hard to stream. Split.
No Persistence for Conversation
"It's in memory for the session." Your session is one HTTP request. Add a checkpointer.
Ignoring Tool Errors
Silent try/except in tools that returns "success". The LLM now thinks the call worked and acts on garbage. Return errors honestly.
Hardcoded Thread IDs
thread_id = "default" for every request. All users share one conversation. Scope per user.
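A small helper makes the fix hard to get wrong; the names here are illustrative:

```python
from typing import Optional

def thread_config(user_id: str, session_id: Optional[str] = None) -> dict:
    # One thread per user, or per (user, session); never a shared "default".
    thread_id = f"{user_id}:{session_id}" if session_id else user_id
    return {"configurable": {"thread_id": thread_id}}
```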
The One-Page Checklist
Before shipping any LangGraph service:
- State is typed (TypedDict or Pydantic).
- Lists use reducers (add_messages or Annotated[list, add]).
- Every loop has a clear exit condition and a recursion cap.
- Persistent checkpointer (not MemorySaver).
- Thread IDs scoped per user or session.
- LLM errors and tool errors are handled, not swallowed.
- LangSmith tracing is on (at least in staging).
- Rate limits on endpoints and on paid tools.
- Auth on the HTTP layer.
- Cost monitoring and alerts.
Most of these are the difference between a demo and a product.
Where to Go From Here
You have the primitives, the patterns, the deployment path, and the habits. The next level is depth:
- LangGraph docs: the canonical reference, including prebuilt agents and integrations.
- LangChain Academy: structured courses covering LangGraph in more depth than a tutorial can.
- Awesome LangGraph lists: curated examples, templates, and production case studies on GitHub.
- Read production repos. Open-source agents on GitHub show how patterns from this tutorial play out in real codebases.
- Build something small. A Slack bot, a research agent, a support triage system. Ship it to a handful of real users. Iterate from there.
The tutorial is the map. The territory is what you'll discover by running agents that real people use.