Building Multi-Step AI Agents with LangGraph

LangGraph has become one of the leading frameworks for building AI agents that manage complex, cyclic workflows requiring memory and self-correction. By structuring your agent as a stateful graph, you can move beyond simple linear prompts to create autonomous systems that reliably execute multi-turn tasks — ones that loop, branch based on tool output, recover from failures, and persist their progress across hours or even days of work.
This post covers LangGraph from its conceptual foundations through to production deployment. You will learn how to design a robust state schema, implement self-correcting retry logic, build multi-agent collaboration patterns, and serve your agent via a production-grade API — with working Python code throughout.
Prerequisites
Before diving in, you should be comfortable with Python 3.11+ and have a basic familiarity with LangChain’s core abstractions (LLMs, tools, prompts). The examples below use the following package versions, which were current at time of writing:
pip install langgraph==0.3 langchain-openai==0.2 langchain-core==0.3 pydantic==2.7

You will also need an OPENAI_API_KEY in your environment (or swap in any LangChain-compatible LLM). All code is tested against Python 3.11 and 3.12.
What Is LangGraph and Why It Replaced Chains
For most of LangChain’s early history, the primary abstraction for chaining LLM calls together was the Chain — a linear pipeline where output from one step flowed into the next. For simple, predictable tasks this worked fine. But the moment you introduced a tool that could fail, an output that needed validation, or a step that required retrying with different parameters, the sequential model collapsed. Chains have no native concept of looping back, branching on a condition, or routing to a different step based on what a tool returned. They are directed acyclic graphs (DAGs) in disguise, and DAGs cannot express the retry logic that makes agents actually reliable in the real world.
LangGraph solves this by treating the agent as a proper stateful graph with support for cyclic edges. A cycle is simply a directed edge that points to a node that has already been visited — and that one primitive unlocks retry logic, self-correction loops, approval workflows, and multi-turn conversation management. Rather than a pipeline, you are building a state machine whose transitions are driven by the agent’s own outputs and tool results. This is the key conceptual shift: your LLM is no longer just generating text at the end of a chain; it is making routing decisions that control the execution flow of the entire program.
The core vocabulary of LangGraph is small but precise. A StateGraph is the container that holds everything. Nodes are ordinary Python functions (or async functions) that receive the current state, do some work — call an LLM, invoke a tool, validate a result — and return a partial state update. Edges are the transitions between nodes; they can be unconditional (always go from A to B) or conditional (call a function that inspects the state and returns the name of the next node to visit). The State object itself is a typed dictionary that flows through every node in the graph, accumulating updates as it goes. Here is the pattern in its simplest form:
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
import operator

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]  # append-only list
    task_complete: bool

def my_node(state: AgentState) -> dict:
    # do work, return partial state update
    return {"task_complete": True}

graph = StateGraph(AgentState)
graph.add_node("worker", my_node)
graph.set_entry_point("worker")
graph.add_edge("worker", END)
app = graph.compile()
result = app.invoke({"messages": [], "task_complete": False})

This is enough to run a single-node graph. Everything else — memory, self-correction, multi-agent patterns — is built on this foundation.
LangGraph vs. the Alternatives in 2026
Before committing to LangGraph, it is worth understanding where it sits in a crowded ecosystem. The table below compares the four most popular agent frameworks as of early 2026:
| Framework | Paradigm | Cyclic Graphs | Built-in Persistence | Multi-Agent | Best For |
|---|---|---|---|---|---|
| LangGraph | Stateful graph / state machine | Yes (first-class) | Yes (SQLite, Postgres, Redis) | Yes (via subgraphs) | Complex, production agents requiring reliability |
| CrewAI | Role-based crews | Limited | No (bring your own) | Yes (role abstraction) | Teams of specialized agents, rapid prototyping |
| AutoGen | Conversational agents | Via orchestration | Limited | Yes (agent conversations) | Research, multi-agent dialogue tasks |
| OpenAI Assistants API | Managed, cloud-hosted | No (managed by API) | Yes (cloud-managed threads) | Limited | Simplest path to a working assistant, OpenAI-only |
LangGraph is the right choice when you need fine-grained control over agent behavior, reliable persistence, and the ability to deploy self-hosted. If you are building a quick proof-of-concept with role-playing agents, CrewAI may get you there faster. But for production systems where you need to reason about every state transition and handle failures gracefully, LangGraph’s explicit graph structure is a major advantage — you can read the graph definition and know exactly what the agent will do in every situation.
Designing Your State Schema
The State object is the single most important design decision in any LangGraph agent. Everything the agent knows at any moment in time lives in state: the conversation history, intermediate tool results, error counts, flags, and any domain-specific data your application needs. Getting this design right before you write a single node prevents the most common category of bugs in multi-step agents — state that grows without bound, fields that have ambiguous semantics, and schemas that are impossible to migrate when requirements change.
The first question to answer is what belongs in state at all. A rule of thumb: if a node needs to read a value to make a decision, or write a value for a downstream node to use, it belongs in state. If data is only needed within a single node, keep it as a local variable. Avoid the temptation to treat state as a general-purpose scratchpad — every field you add increases the surface area for bugs and makes the context window cost of serializing state larger.
For most agents, the state schema looks something like this:
from typing import TypedDict, Annotated, Optional
from langgraph.graph.message import add_messages

class ResearchAgentState(TypedDict):
    # The conversation history — LangGraph's add_messages reducer
    # appends new messages rather than replacing the whole list
    messages: Annotated[list, add_messages]
    # URLs discovered during research, to be scraped
    urls_to_scrape: list[str]
    # Scraped content ready for summarization
    scraped_content: list[dict]
    # The final synthesized answer
    final_answer: Optional[str]
    # Error tracking for the self-correction loop
    error_count: int
    last_error: Optional[str]
    # Termination signal
    task_complete: bool

Notice the Annotated[list, add_messages] pattern on the messages field. LangGraph uses reducers to merge partial state updates from nodes. Without a reducer, returning {"messages": [new_message]} from a node would replace the entire messages list. The add_messages reducer appends instead, which is almost always what you want for conversation history. You can write custom reducers for any field — for example, a reducer that caps a list at N items to prevent unbounded growth.
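To make that last idea concrete, here is a minimal sketch of such a capping reducer (the capped_append name and the field it guards are illustrative, not part of LangGraph's API):

```python
from typing import Annotated, TypedDict

def capped_append(existing: list, update: list) -> list:
    """Reducer: append new items, then keep only the 20 most recent."""
    return (existing + update)[-20:]

class CappedState(TypedDict):
    # Tool outputs accumulate, but never beyond 20 entries
    tool_outputs: Annotated[list, capped_append]
```

LangGraph calls the reducer with the field's current value and the node's partial update, so the cap is enforced on every state merge.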
TypedDict vs. Pydantic for State Validation
TypedDict is the default and it is fast — Python does not actually enforce the types at runtime, which means invalid state can silently propagate through your graph. During development, consider using a Pydantic model instead:
from pydantic import BaseModel, Field, field_validator
from typing import Optional

class StrictAgentState(BaseModel):
    messages: list = Field(default_factory=list)
    error_count: int = Field(default=0, ge=0, le=10)
    task_complete: bool = False
    final_answer: Optional[str] = None

    @field_validator("error_count")
    @classmethod
    def error_count_non_negative(cls, v: int) -> int:
        if v < 0:
            raise ValueError("error_count cannot be negative")
        return v

Pydantic validation runs every time a node returns a state update, so you get immediate, clear error messages when a node returns a field with the wrong type or an out-of-range value. The trade-off is a small runtime overhead and the need to configure LangGraph to use your Pydantic model, but for non-trivial agents this overhead is negligible compared to the cost of LLM calls.
Avoiding State Bloat
One failure mode worth addressing explicitly: agents that embed large data directly in state. If your agent scrapes ten web pages, do not store the full HTML in the state object. Instead, write the content to a temporary file or external store (Redis, S3) and store only the key or path in state. This keeps state serializable in milliseconds, prevents context window overflow when state is serialized into prompts, and makes checkpointing cheap. The state object should contain references to data, not the data itself, whenever documents exceed a few kilobytes.
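As a sketch of the reference-passing idea (the SCRATCH_DIR location and the helper names are illustrative, not LangGraph APIs):

```python
import hashlib
import tempfile
from pathlib import Path

SCRATCH_DIR = Path(tempfile.gettempdir()) / "agent_scratch"

def store_scraped_page(url: str, html: str) -> str:
    """Write large content to disk and return a small key for state."""
    SCRATCH_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()[:16]
    (SCRATCH_DIR / f"{key}.html").write_text(html)
    return key

def load_scraped_page(key: str) -> str:
    """Resolve a state reference back to the full content."""
    return (SCRATCH_DIR / f"{key}.html").read_text()
```

A scraper node would then return something like {"scraped_refs": [key]} rather than the HTML itself, and downstream nodes resolve the key only when they actually need the content.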
State Management and Long-Term Memory
One of the most powerful features of LangGraph — and the one that most clearly separates it from simpler agent frameworks — is its native support for persistent checkpointing. After every node execution, LangGraph can save the complete graph state to a durable store. If the agent crashes, is killed, or simply needs to pause for human approval, it can resume from the last checkpoint with no data loss.
LangGraph ships with two built-in checkpointers: SqliteSaver for development and single-machine deployments, and PostgresSaver for production multi-instance deployments. Adding persistence to any graph takes four lines:
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import StateGraph
# Create the checkpointer
checkpointer = SqliteSaver.from_conn_string("agent_state.db")
# Compile the graph with the checkpointer attached
app = graph.compile(checkpointer=checkpointer)
# Each invocation uses a thread_id to identify the conversation
config = {"configurable": {"thread_id": "user-session-42"}}
# First invocation — agent starts working
result = app.invoke({"messages": [], "task_complete": False}, config=config)
# If the process crashes here, the state is safe in SQLite
# Later: resume from the last checkpoint by invoking with the same thread_id
result = app.invoke(None, config=config)  # None resumes from checkpoint

The thread_id is the key concept: every distinct agent session gets its own thread, and LangGraph uses this ID to read and write checkpoints. A single deployed agent can manage thousands of concurrent sessions, each with fully independent state, through this mechanism.
Short-Term vs. Long-Term Memory
It helps to think about agent memory in two distinct tiers. Short-term memory is the messages list in state — the rolling conversation history that fits within the LLM’s context window and is available to every node in the current execution. It is fast, always available, and automatically managed by LangGraph’s checkpointing. But it has a hard upper limit set by the context window of your model, and it is scoped to a single thread (session).
Long-term memory is everything outside the context window: a vector database storing the agent’s notes and learnings across all sessions, a structured SQL database recording past decisions and their outcomes, or a Redis cache storing intermediate results too large for context. The agent accesses long-term memory explicitly through tool calls or dedicated retrieval nodes. Designing which information lives in each tier is one of the most impactful architecture decisions for production agents.
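One way to wire the long-term tier into a graph is a dedicated retrieval node that runs before the agent answers. The sketch below assumes a LangChain-style vector store exposing similarity_search; the store itself and the long_term_notes field are illustrative:

```python
from functools import partial

def retrieve_memory_node(state: dict, *, vector_store) -> dict:
    """Pull the most relevant cross-session notes into state."""
    query = state["messages"][-1].content
    # LangChain vector stores expose similarity_search(query, k=...)
    docs = vector_store.similarity_search(query, k=3)
    return {"long_term_notes": [doc.page_content for doc in docs]}

# Bind the store when registering the node:
# graph.add_node("recall", partial(retrieve_memory_node, vector_store=my_store))
```

The node returns only the retrieved snippets, so the cross-session store stays outside state and only the relevant slice enters the context window.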
Human-in-the-Loop Checkpoints
Not every action an agent takes should be autonomous. Sending an email, deploying code to production, or making a financial transaction are all irreversible actions where a human should have the final say. LangGraph supports this through interrupt points — checkpoints where the graph explicitly pauses and waits for external input before proceeding:
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.sqlite import SqliteSaver
def draft_email_node(state):
    # Agent drafts an email
    draft = call_llm_to_draft(state["messages"])
    return {"draft_email": draft}

def send_email_node(state):
    # This node only runs after human approval
    send_email(state["draft_email"])
    return {"task_complete": True}

graph = StateGraph(AgentState)
graph.add_node("draft", draft_email_node)
graph.add_node("send", send_email_node)
graph.add_edge("draft", "send")
graph.set_entry_point("draft")
graph.add_edge("send", END)

checkpointer = SqliteSaver.from_conn_string("state.db")

# interrupt_before causes the graph to pause BEFORE entering "send"
app = graph.compile(checkpointer=checkpointer, interrupt_before=["send"])

config = {"configurable": {"thread_id": "email-task-1"}}

# Graph runs "draft" then pauses — human reviews state["draft_email"]
app.invoke(initial_state, config=config)

# After human approves, resume — "send" now executes
app.invoke(None, config=config)

This pattern is essential for any agent operating in a high-stakes domain. The agent does all the cognitive work; the human makes the final call before irreversible side effects occur.
Advanced Error Handling: The Self-Correction Loop
Production agents fail — constantly. APIs time out. LLMs return JSON that does not parse. Web scrapers encounter rate limits. Code execution tools hit sandbox restrictions. The difference between a brittle demo and a reliable production agent is how gracefully it handles these inevitable failures. LangGraph’s conditional edges are the mechanism that makes self-correction possible.
The foundational error-handling pattern is the retry edge with a counter: add an error_count field to your state, increment it in the node that handles failures, and use a conditional edge to route either back to the failing node (for another attempt) or to a graceful termination node (after N attempts). Without the counter, a persistent failure causes an infinite loop; with it, the agent degrades gracefully.
from typing import TypedDict, Annotated, Optional
from langgraph.graph import StateGraph, END
import operator
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    tool_result: Optional[str]
    error_count: int
    last_error: Optional[str]
    task_complete: bool

MAX_RETRIES = 3

def call_api_node(state: AgentState) -> dict:
    try:
        result = call_external_api(state["messages"][-1].content)
        return {"tool_result": result, "error_count": 0}
    except Exception as e:
        return {
            "tool_result": None,
            "error_count": state["error_count"] + 1,
            "last_error": str(e),
        }

def route_after_api(state: AgentState) -> str:
    if state["tool_result"] is not None:
        return "process_result"
    elif state["error_count"] >= MAX_RETRIES:
        return "handle_failure"
    else:
        return "call_api"  # Loop back for retry

graph = StateGraph(AgentState)
graph.add_node("call_api", call_api_node)
graph.add_node("process_result", process_result_node)
graph.add_node("handle_failure", handle_failure_node)
graph.set_entry_point("call_api")
graph.add_conditional_edges("call_api", route_after_api)
graph.add_edge("process_result", END)
graph.add_edge("handle_failure", END)
app = graph.compile()

The Reflexion Pattern: LLM-Guided Self-Correction
Simple retry loops work for transient failures, but some failures require the agent to actually change its strategy. The Reflexion pattern routes the agent through a dedicated critique node after a failure, where the LLM analyzes what went wrong and proposes a revised approach before the next attempt:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
llm = ChatOpenAI(model="gpt-4o", temperature=0)
def critique_node(state: AgentState) -> dict:
    """
    After a failure, ask the LLM to reflect on what went wrong
    and generate a revised plan before the next attempt.
    """
    critique_prompt = f"""
    The previous attempt failed with this error:
    {state['last_error']}

    The original request was:
    {state['messages'][0].content}

    Analyze what went wrong and provide a revised approach.
    Be specific about what you will do differently.
    """
    response = llm.invoke([
        SystemMessage(content="You are a debugging assistant."),
        HumanMessage(content=critique_prompt),
    ])
    # The critique is appended to messages, so the next attempt
    # can see what the LLM learned from the failure
    return {
        "messages": [response],
        "error_count": state["error_count"],  # preserve count
    }

# In the graph: failure routes to critique, critique routes back to attempt
graph.add_node("critique", critique_node)
graph.add_edge("critique", "call_api")  # retry after critique

This pattern is remarkably effective for code generation agents — when generated code fails its tests, routing through a critique node that reads the test output and error message before regenerating the code typically produces a correct solution within two or three iterations.
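To complete the wiring, the conditional router from the retry example can send recoverable failures through the critique node rather than straight back to the tool. A sketch, reusing the state fields defined earlier:

```python
MAX_RETRIES = 3

def route_with_reflection(state: dict) -> str:
    """Successes move forward; exhausted retries fail gracefully;
    everything else detours through the critique node."""
    if state["tool_result"] is not None:
        return "process_result"
    elif state["error_count"] >= MAX_RETRIES:
        return "handle_failure"
    return "critique"  # reflect first; critique -> call_api retries

# graph.add_conditional_edges("call_api", route_with_reflection)
```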
Structured Output Validation
Beyond transient API failures, one of the most common failure modes is an LLM returning text that does not conform to the expected JSON structure. Catching this at the node boundary — before the malformed output propagates to downstream nodes — is critical:
from pydantic import BaseModel, ValidationError
from langchain_openai import ChatOpenAI
class SearchQuery(BaseModel):
    query: str
    num_results: int
    filter_domain: str | None = None

llm = ChatOpenAI(model="gpt-4o")
structured_llm = llm.with_structured_output(SearchQuery)

def generate_search_query_node(state: AgentState) -> dict:
    try:
        query = structured_llm.invoke(state["messages"])
        # query is guaranteed to be a valid SearchQuery instance
        return {"tool_result": query.model_dump()}
    except ValidationError as e:
        return {
            "error_count": state["error_count"] + 1,
            "last_error": f"Structured output validation failed: {e}",
        }

Using with_structured_output with a Pydantic model moves validation out of your application logic and into the LangChain integration layer. The LLM is instructed to produce JSON conforming to the schema, and any deviation raises a ValidationError that your error-handling edge can route around.
Multi-Agent Collaboration Patterns
Individual agents are powerful, but the most capable LangGraph deployments use multiple specialized agents working in concert — each optimized for a narrow task, supervised by an orchestrator that decomposes work and aggregates results. LangGraph supports this natively through its subgraph mechanism, where one StateGraph can invoke another as a node.
The Supervisor-Worker Pattern
The most broadly applicable multi-agent pattern is the supervisor-worker architecture. A supervisor agent receives a high-level task, decomposes it into subtasks, delegates each subtask to a specialized worker agent, and synthesizes the results. The supervisor does not do the domain-specific work itself — its job is planning, delegation, and synthesis.
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, Literal
import operator
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# --- Worker Agents (simplified) ---
class WorkerState(TypedDict):
    task: str
    result: str

def research_worker(state: WorkerState) -> dict:
    response = llm.invoke([HumanMessage(content=f"Research: {state['task']}")])
    return {"result": response.content}

def coder_worker(state: WorkerState) -> dict:
    response = llm.invoke([HumanMessage(content=f"Write Python code for: {state['task']}")])
    return {"result": response.content}

research_graph = StateGraph(WorkerState)
research_graph.add_node("research", research_worker)
research_graph.set_entry_point("research")
research_graph.add_edge("research", END)
research_app = research_graph.compile()

coder_graph = StateGraph(WorkerState)
coder_graph.add_node("code", coder_worker)
coder_graph.set_entry_point("code")
coder_graph.add_edge("code", END)
coder_app = coder_graph.compile()

# --- Supervisor ---
class SupervisorState(TypedDict):
    original_task: str
    subtasks: list[dict]  # [{"type": "research"|"code", "task": str}]
    results: Annotated[list, operator.add]
    final_answer: str

def plan_node(state: SupervisorState) -> dict:
    """Decompose the task into subtasks."""
    plan_prompt = f"""
    Decompose this task into subtasks. Return a JSON list where each item
    has "type" (either "research" or "code") and "task" (the subtask description).

    Task: {state['original_task']}
    """
    # In practice, use structured output here
    response = llm.invoke([HumanMessage(content=plan_prompt)])
    subtasks = parse_subtasks(response.content)
    return {"subtasks": subtasks}

def delegate_node(state: SupervisorState) -> dict:
    """Execute all subtasks and collect results."""
    results = []
    for subtask in state["subtasks"]:
        worker = research_app if subtask["type"] == "research" else coder_app
        output = worker.invoke({"task": subtask["task"], "result": ""})
        results.append({"type": subtask["type"], "result": output["result"]})
    return {"results": results}

def synthesize_node(state: SupervisorState) -> dict:
    """Combine worker results into a final answer."""
    synthesis_prompt = f"""
    Original task: {state['original_task']}

    Worker results:
    {state['results']}

    Synthesize these into a comprehensive final answer.
    """
    response = llm.invoke([HumanMessage(content=synthesis_prompt)])
    return {"final_answer": response.content}

supervisor = StateGraph(SupervisorState)
supervisor.add_node("plan", plan_node)
supervisor.add_node("delegate", delegate_node)
supervisor.add_node("synthesize", synthesize_node)
supervisor.set_entry_point("plan")
supervisor.add_edge("plan", "delegate")
supervisor.add_edge("delegate", "synthesize")
supervisor.add_edge("synthesize", END)
supervisor_app = supervisor.compile()

The Coder-Reviewer Loop
A specialized and highly practical variant of multi-agent collaboration is the coder-reviewer loop: one agent writes code, a second agent executes it in a sandboxed environment and reviews the output, and if the tests fail, the result routes back to the coder with the test failure report attached. This mimics the TDD (Test-Driven Development) workflow at the agent level and produces dramatically more reliable code than single-pass generation.
def coder_node(state):
    """Generate or revise Python code based on the task and any prior failures."""
    context = (
        f"Prior attempt failed:\n{state['last_error']}"
        if state.get("last_error") else ""
    )
    prompt = f"Write Python code for: {state['task']}\n{context}"
    response = llm.invoke([HumanMessage(content=prompt)])
    code = extract_code_block(response.content)
    return {"generated_code": code}

def reviewer_node(state):
    """Execute the code in a sandbox and capture output or errors."""
    try:
        result = execute_in_sandbox(state["generated_code"], timeout=10)
        if result.tests_passed:
            return {"task_complete": True, "last_error": None}
        else:
            return {
                "error_count": state["error_count"] + 1,
                "last_error": result.test_output,
                "task_complete": False,
            }
    except TimeoutError:
        return {
            "error_count": state["error_count"] + 1,
            "last_error": "Code execution timed out (>10s)",
            "task_complete": False,
        }

def route_after_review(state) -> str:
    if state["task_complete"]:
        return END
    elif state["error_count"] >= MAX_RETRIES:
        return "handle_failure"
    return "coder"  # Loop back to coder with the error context

Avoiding Agent Storms
Multi-agent systems introduce a failure mode that single agents do not have: agent storms, where agents recursively delegate to each other, spawning an exponentially growing number of subtasks. Prevention requires explicit limits at the architecture level:
- Set a maximum recursion depth in the supervisor’s state schema (delegation_depth: int) and enforce it in the routing function.
- Rate limit inter-agent calls using a token bucket implemented as a shared counter in Redis.
- Log every agent decision to a structured event log, with the calling agent’s ID as a field. Without this, debugging a storm after the fact is nearly impossible.
- Design worker agents to be stateless where possible — workers that do not call other workers are immune to storm conditions.
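The first of those limits can be sketched directly as a routing function (the delegation_depth field and MAX_DELEGATION_DEPTH value are illustrative):

```python
MAX_DELEGATION_DEPTH = 3

def route_delegation(state: dict) -> str:
    """Stop spawning subtasks once the depth budget is spent."""
    if state["delegation_depth"] >= MAX_DELEGATION_DEPTH:
        return "synthesize"  # work with whatever results exist
    if state["subtasks"]:
        return "delegate"
    return "synthesize"

def bump_depth(state: dict) -> dict:
    # Each round of delegation increments the counter in state
    return {"delegation_depth": state["delegation_depth"] + 1}
```

Because the counter lives in state, it survives checkpointing: a resumed agent cannot forget how deep its delegation chain already is.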
Deploying LangGraph Agents to Production
The gap between a working agent in a notebook and a reliable agent in production is significant. Production introduces concurrent users, long-running tasks that span process restarts, observability requirements, and cost management — none of which appear during development.
Wrapping LangGraph in a FastAPI Service
The most portable self-hosted deployment wraps your compiled graph in an async FastAPI endpoint. LangGraph’s astream method lets you stream intermediate node outputs to the client in real time, which is important for long-running tasks where the user would otherwise wait silently for minutes:
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from langgraph.checkpoint.postgres import PostgresSaver
import json

app = FastAPI()

# Use PostgresSaver for production (supports concurrent access)
checkpointer = PostgresSaver.from_conn_string(
    "postgresql://user:pass@localhost/agentdb"
)
agent_app = build_agent_graph().compile(checkpointer=checkpointer)

class TaskRequest(BaseModel):
    task: str
    thread_id: str

@app.post("/run-agent")
async def run_agent(request: TaskRequest):
    config = {"configurable": {"thread_id": request.thread_id}}
    initial_state = {
        "messages": [{"role": "user", "content": request.task}],
        "error_count": 0,
        "task_complete": False,
    }

    async def event_stream():
        async for event in agent_app.astream(initial_state, config=config):
            # Stream each node's output as a server-sent event
            yield f"data: {json.dumps(event)}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")

@app.get("/agent-state/{thread_id}")
async def get_state(thread_id: str):
    config = {"configurable": {"thread_id": thread_id}}
    state = agent_app.get_state(config)
    return state.values

Observability with LangSmith
When an agent fails in production, you need to know which node it failed in, what the state looked like at that point, and what the LLM was prompted with. LangSmith provides this automatically for LangChain and LangGraph applications — every node execution, every LLM call, and every tool invocation is traced and stored:
import os
# Enable LangSmith tracing via environment variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
os.environ["LANGCHAIN_PROJECT"] = "my-production-agent"
# No code changes required — LangGraph integrates automatically

With tracing enabled, every production invocation generates a trace in the LangSmith UI that shows the full execution path through the graph, latency per node, and the exact inputs and outputs of every LLM call. For debugging production incidents, this is indispensable.
LangGraph Platform vs. Self-Hosting
LangGraph Platform (the managed offering, formerly known as LangGraph Cloud) handles persistence, scaling, and the REST API layer for you in exchange for a monthly fee. For teams without dedicated infrastructure engineers, it is a compelling option. For teams that need full control over data residency, self-hosting with the FastAPI pattern above and PostgresSaver is straightforward and adds no per-request cost beyond the LLM API calls themselves.
Cost Management for Multi-Step Agents
A single “task” processed by a multi-step agent can easily generate 20 to 50 individual LLM calls — the supervisor’s plan, each worker’s execution, critique nodes, retry attempts, and the final synthesis. At GPT-4o prices (approximately $15 per million output tokens as of early 2026), a complex task costs $0.30 to $1.50 per invocation. Before deploying at scale, instrument your agent to log token usage per node and per thread. Identify which nodes consume the most tokens and ask whether a smaller, cheaper model is sufficient for that node’s job. Routing simple classification decisions through gpt-4o-mini instead of gpt-4o can reduce overall costs by 40 to 60 percent with minimal quality impact.
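A lightweight way to start instrumenting is to tally the usage_metadata that recent LangChain chat models attach to their responses, keyed by node name (the token_log structure and record_usage helper are illustrative, not a LangChain API):

```python
from collections import defaultdict

# node name -> accumulated token counts
token_log: dict = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})

def record_usage(node_name: str, usage: dict) -> None:
    """Accumulate an LLM response's usage_metadata dict, e.g.
    {"input_tokens": 120, "output_tokens": 45, "total_tokens": 165}."""
    token_log[node_name]["input_tokens"] += usage.get("input_tokens", 0)
    token_log[node_name]["output_tokens"] += usage.get("output_tokens", 0)

# Inside a node, after each call:
# response = llm.invoke(prompt)
# record_usage("synthesize", response.usage_metadata or {})
```

Summing the log per thread at the end of a run gives the per-node cost breakdown that tells you which nodes are candidates for a cheaper model.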
Putting It All Together: A Complete Research Agent
Here is a condensed but complete agent that demonstrates all the patterns covered in this post — stateful graph, checkpointing, conditional routing with retry logic, and streaming deployment:
from typing import TypedDict, Annotated, Optional
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.checkpoint.sqlite import SqliteSaver
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage
from langchain_community.tools import DuckDuckGoSearchRun

llm = ChatOpenAI(model="gpt-4o", temperature=0)
search_tool = DuckDuckGoSearchRun()
MAX_RETRIES = 3

class ResearchState(TypedDict):
    messages: Annotated[list, add_messages]
    search_results: list[str]
    final_report: Optional[str]
    error_count: int
    last_error: Optional[str]
    task_complete: bool

def search_node(state: ResearchState) -> dict:
    """Perform a web search based on the latest user message."""
    query = state["messages"][-1].content
    try:
        results = search_tool.run(query)
        return {"search_results": [results], "error_count": 0, "last_error": None}
    except Exception as e:
        return {
            "search_results": [],
            "error_count": state["error_count"] + 1,
            "last_error": str(e),
        }

def synthesize_node(state: ResearchState) -> dict:
    """Synthesize search results into a structured report."""
    context = "\n\n".join(state["search_results"])
    prompt = f"""
    Based on these search results, write a concise, accurate research report.
    Address the original query: {state['messages'][0].content}

    Search results:
    {context}
    """
    response = llm.invoke([HumanMessage(content=prompt)])
    return {
        "final_report": response.content,
        "messages": [AIMessage(content=response.content)],
        "task_complete": True,
    }

def route_after_search(state: ResearchState) -> str:
    if state["search_results"]:
        return "synthesize"
    elif state["error_count"] >= MAX_RETRIES:
        return END
    return "search"  # Retry

# Build the graph
graph = StateGraph(ResearchState)
graph.add_node("search", search_node)
graph.add_node("synthesize", synthesize_node)
graph.set_entry_point("search")
graph.add_conditional_edges("search", route_after_search)
graph.add_edge("synthesize", END)

checkpointer = SqliteSaver.from_conn_string("research_agent.db")
research_agent = graph.compile(checkpointer=checkpointer)

# Run the agent
config = {"configurable": {"thread_id": "research-001"}}
initial_state = {
    "messages": [HumanMessage(content="What are the latest developments in quantum computing?")],
    "search_results": [],
    "final_report": None,
    "error_count": 0,
    "last_error": None,
    "task_complete": False,
}
result = research_agent.invoke(initial_state, config=config)
print(result["final_report"])

What to Build Next
The patterns in this post — stateful graphs, checkpointed persistence, self-correction loops, and multi-agent supervision — are the building blocks of the most capable autonomous systems being built today. Once you have a working single-agent loop, the natural progression is to add a vector database for long-term memory (so the agent learns from past tasks), connect it to real tools via the Model Context Protocol, and deploy it behind a streaming API that your frontend can consume in real time.
LangGraph’s explicitness is its greatest strength: unlike black-box agent frameworks, you can read the graph definition and know precisely what will happen in every situation. That predictability is what makes the difference between an agent that impresses in a demo and one that earns trust in production.