Self-Hosted AI Agent Frameworks in 2026: Local-First Compared

2026-06-09 9 minutes

Four distinct robots in a sealed glass workshop, each cabled to one central llama-stamped engine, with an eight-link reliability gauge fading at the end.

Contents

A self-hosted AI agent needs to run entirely on your own Ollama or vLLM with no OpenAI key. All four major frameworks claim that support, but only LangGraph and CrewAI wire to a local model with zero workarounds. AutoGen needs a client swap, and Flowise needs one base-URL field. The model, not the framework, is the real reliability ceiling.

Key Takeaways

All four run on Ollama, but only LangGraph and CrewAI need zero workarounds.
The small local model, not the framework, is what breaks tool calling.
Flowise is the only true no-code pick; LangGraph is the most code-heavy.
Most framework docs still assume an OpenAI key, so budget setup time.
Use Qwen3 or larger for agents; smaller models drop tool calls under load.

Why Local-First Fitness Is the Axis That Counts

Most “best agent framework” roundups assume you have an OpenAI key and a credit card. The first code sample spins up a hosted client, and the “swap to local” path is a footnote if it shows up at all. Self-hosters ask a sharper question about whether any of these run on their own box with no cloud call.

To answer that, I graded each one on local-first fitness, which I treat as three concrete things. The first is how much config it takes to point the tool at an Ollama, vLLM, or llama.cpp endpoint. The second is how reliably it calls tools with a model small enough to self-host. The third is whether the docs assume a hosted provider by default.

The third axis hides a trap. Per-call tool reliability stacks up across an agent loop. A 95% success rate per call over 8 steps lands at only about 66% end to end. According to PromptQuorum’s local tool-calling benchmarks , the Qwen3 series is the most stable local family for tool calls. The Ollama structured-outputs docs show that JSON mode is the main fix.

Here is what I saw on my own Ollama box. The wiring took the easy 20 minutes, but the real wall was the model. Small models that pass a single tool-call demo start to return prose instead of a tool_calls field once the loop runs five or more steps. A 7B model fell apart fast, while a Qwen3 30B-class model was the first that held up across a full run on my hardware. Wrapping calls with format="json" plus a Pydantic retry loop saved the runs that slipped.

This post also grades a second lens: the no-code to code spectrum. It runs from Flowise’s visual canvas to LangGraph’s explicit Python graph. Move along it and you trade flexibility and debugging power for speed.

Running an AI Agent Framework on Ollama With No OpenAI Key

This is the exact query a self-hoster types. The phrase “supports Ollama” hides a wide range of real effort. Here is the config friction per framework.

LangGraph : install langchain-ollama, instantiate ChatOllama(model="qwen3"), and pass it as the node’s model. No hosted client sits anywhere in the path. The LangChain Ollama integration docs cover Qwen3, Gemma3, GPT-OSS, and DeepSeek-R1.

from langchain_ollama import ChatOllama
llm = ChatOllama(model="qwen3", base_url="http://localhost:11434")
# pass llm into your LangGraph node

CrewAI : set llm="ollama/qwen3" using the LiteLLM-style provider prefix, or pass an LLM object with a local base_url. There is no LangChain dependency underneath. The CrewAI LLM connections docs walk through both paths.

from crewai import LLM
llm = LLM(model="ollama/qwen3", base_url="http://localhost:11434")

AutoGen : use the native OllamaChatCompletionClient from autogen-ext instead of the default OpenAI client. For models with no native Ollama client, you route through LiteLLM. The AutoGen models tutorial covers both. The friction is small but real. You must swap the default client by hand, because the default is hosted.

Flowise : drop a ChatOllama node on the canvas and set the base URL to http://host.docker.internal:11434. No API key field is required. The Flowise GitHub repo ships these local nodes out of the box, alongside LlamaCpp and HuggingFace nodes.

Flowise builds agents on a drag-and-drop canvas, where a local ChatOllama node needs only a base URL.

Image: Flowise GitHub repo

One note on production serving. All four can target a vLLM OpenAI-compatible endpoint. You override the base URL and use a dummy key. The override is a single line.

llm = ChatOpenAI(base_url="http://localhost:8000/v1", api_key="dummy", model="qwen3")

vLLM is the production-serving path while Ollama is the dev-layer path, per the aicompetence local-stack comparison . For a fuller breakdown of each local runtime , including where llama.cpp and LM Studio fit, the inference-engine genealogy explains why these two cover the two ends of the spectrum.

Local-First Scorecard

Framework	Runs on Ollama out of the box?	Local tool-calling reliability	OpenAI assumed in docs?	Code spectrum	Best for
LangGraph	Yes, native via `langchain-ollama`	Good with Qwen3-class; you control retries and grammar	Partly; examples mix hosted and local	Full code (Python graph)	Engineers who want explicit control and durable state
CrewAI	Yes, `ollama/<model>` provider string	Good; the LiteLLM layer adds some prompt overhead	Partly; quickstart shows hosted first	Mostly code (declarative Python)	Fast multi-agent crews without LangChain
AutoGen	Yes, but swap to `OllamaChatCompletionClient`	OK; needs careful client and model choice	Yes; defaults are OpenAI clients	Code (conversational Python)	Multi-agent conversation patterns and research
Flowise	Yes, `ChatOllama` node, set base URL	Depends entirely on the chosen model	No; local nodes are first-class	No-code (visual canvas)	Non-developers and rapid prototyping

The Capability Matrix

Before the verdicts, here is the base comparison every reader needs. One row per framework, so you can rule options out at a glance.

The programming model differs sharply. LangGraph is a graph of nodes and edges with explicit control flow. CrewAI is role-based crews of agents with tasks, plus event-driven Flows. AutoGen is conversational, so agents talk to each other. Flowise is a visual canvas of connected nodes.

The language choice shapes your team’s options. LangGraph supports Python and JS/TS. CrewAI is Python. AutoGen is Python, with a .NET track now folded into the Microsoft Agent Framework. Flowise runs as a Node.js app, so you build visually and write no code at all.

On licensing, all four are free to self-host. LangGraph, CrewAI, and AutoGen are MIT. AutoGen docs are CC-BY 4.0. Flowise is Apache 2.0. The Flowise site confirms no flow, user, or run caps when self-hosted.

State and persistence is where durability lives. LangGraph has first-class checkpointers (MemorySaver, SQLite, Postgres) for resumable runs. CrewAI offers @persist() plus SQLite checkpointing and an on-disk memory layer built on LanceDB, described in CrewAI’s cognitive memory writeup . AutoGen can save and load agent and team state. Flowise stores chat state and flows in its own database.

Human-in-the-loop support splits the field too. LangGraph’s interrupt() pauses the graph, saves state, and resumes on approval. That is the strongest HITL story here. CrewAI supports human input on tasks. AutoGen uses a UserProxyAgent for human turns. Flowise has human-input nodes in flows.

One status flag deserves attention. AutoGen entered maintenance mode in early 2026. Microsoft now points new builds at the Microsoft Agent Framework , which merges AutoGen with Semantic Kernel. AutoGen still gets community bug fixes. Still, self-hosters should weigh this signal before starting a new build on it.

Framework	Model	Language	License	State / persistence	Human-in-the-loop
LangGraph	Graph (nodes + edges)	Python, JS/TS	MIT	Checkpointers: memory, SQLite, Postgres	`interrupt()` pause and resume (strongest)
CrewAI	Role-based crews + Flows	Python	MIT	`@persist()`, SQLite checkpoint, LanceDB memory	Human input on tasks
AutoGen	Conversational multi-agent	Python (.NET via MAF)	MIT (docs CC-BY 4.0)	Save and load agent + team state	`UserProxyAgent` human turns
Flowise	Visual no-code canvas	None (Node.js app)	Apache 2.0	Built-in DB for flows and chat state	Human-input nodes

Control Versus Convenience: Where Each Framework Sits

The second lens maps each tool on the flexibility and debugging axis, then turns it into a who-it-is-for verdict. This is the opinionated payoff.

Spectrum diagram placing Flowise as no-code, CrewAI as declarative code, AutoGen as conversational code, and LangGraph as explicit graph code

LangGraph gives the most control with the steepest curve. You hand-build the state graph. That buys maximum flexibility and the best debugging story, since you can inspect and replay any node through checkpoints. The cost is a hard learning curve. The 2026 line shipped better interrupt() behavior and broad Python support. Verdict: pick it when correctness, durability, and step-level visibility beat speed to first demo.

CrewAI is fast, opinionated, and fairly code-light. Role-and-task abstractions get a crew running in a few dozen lines, and it stands alone with no LangChain underneath. CrewAI claims it runs up to 5.76x faster than LangGraph on some tasks per the CrewAI repo . Treat that vendor number with care. Verdict: best when you want agents that collaborate quickly and the role metaphor fits.

AutoGen is conversational and research-leaning. The agents-in-conversation model is elegant for brainstorming and research. However, the maintenance-mode status and OpenAI-first defaults make it a weaker bet for a new production build. Verdict: great for learning multi-agent patterns, but check the Microsoft Agent Framework before you commit new work.

Flowise is no-code, fastest to a working flow, and the hardest to version-control. The visual canvas is great for prototyping and for handing agent-building to non-developers, and local nodes are first-class. The tradeoff is weaker source control, harder debugging, and a ceiling on custom logic. Verdict: pick it when the builder is not a coder or when you need a demo today.

The spectrum runs Flowise (no-code), then CrewAI (declarative code), then AutoGen (conversational code), then LangGraph (graph code). Move right and you buy flexibility and production-readiness. Move left and you buy speed.

Observability and Debugging Your Self-Hosted Agent

A self-hosted agent you cannot see inside is a risk. Here is what each one gives you, with weight on what works without a paid cloud account.

LangGraph offers native tracing through LangSmith (hosted, free tier) plus full local state inspection through checkpoints, so you can replay a run node by node. It has the strongest debugging story of the four. CrewAI ships built-in tracing and works with open tools like Arize Phoenix for fully local traces, as shown in this CrewAI, Ollama, and Phoenix walkthrough .

AutoGen exposes logging and OpenTelemetry-style telemetry. The conversation transcript itself also reads as a trace of how the agents reasoned. Flowise shows a visual run view in the UI that walks node by node, which is the most beginner-friendly form here. Still, that view is shallower than code-level tracing when failures get complex.

One tip applies to all four. Wrap local tool calls with format="json" plus Pydantic validation and a retry loop. That catches and recovers from the structured-output failures that dominate local-model debugging, as the Instructor Ollama guide shows. On my own runs, this one pattern turned flaky loops into repeatable ones.