AI Web Search Backends: Who Owns, Who Rents

Only Google Gemini and Microsoft Copilot run on a search index their parent actually crawls. Anthropic Claude reportedly rents Brave Search, Mistral Le Chat rents Brave too, OpenAI ChatGPT rents Bing and supplements it with its own crawler, and Meta AI rents both Bing and Google. The non-obvious tell: Claude’s web_search tool exposes a literal BraveSearchParams field, and citation overlap with Brave runs around 86.7%.

Key Takeaways

  • Only Google and Microsoft own a true web-scale search index.
  • Claude and Mistral both reportedly run on the Brave Search API.
  • ChatGPT pulls from Bing, OpenAI’s own crawler, and publisher licensing deals.
  • IndexNow tells Bing about new pages but not Brave or Google.
  • Brave is now AI’s third search pole, alongside Google and Bing.

Only Five Companies Actually Crawl the Open Web

Before mapping each AI lab to its backend, the underlying constraint matters: the open web at scale is crawled by only five operators. Almost everything else marketed as a “search engine” is a reseller of one of those five indexes. The five are Google, Microsoft Bing, Yandex, Baidu, and Brave Search, with Mojeek sometimes counted as a niche sixth that maintains its own (much smaller) index.

Crawling the open web is a moat for boring infrastructure reasons. Storage costs run into petabytes once you keep historical snapshots. Freshness latency means re-crawling popular URLs on the order of minutes, not days. Anti-cloaking infrastructure has to detect when a page serves different HTML to a crawler than to a real browser. Link-graph computation rebuilds a directed multigraph of trillions of edges to compute authority signals. None of that can be bootstrapped by a 200-person AI startup, and the capex involved is measured in billions of dollars per year, not millions.
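To make the link-graph point concrete, here is a minimal sketch of one well-known authority signal, PageRank, computed by power iteration on a four-page toy graph. This is an illustration only: nothing in this post establishes which signals any of these engines actually compute, and the graph, damping factor, and iteration count are assumptions chosen for the example.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Toy PageRank by power iteration over a list of (source, target) links."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)  # duplicate links are kept (multigraph)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every page starts with the teleport share, then receives link shares.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            targets = out_links[src] or nodes  # dangling page: spread evenly
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: three pages link to "c", and "c" links back to "a".
edges = [("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
scores = pagerank(edges)
top_page = max(scores, key=scores.get)
print(top_page, round(scores[top_page], 3))
```

The toy version fits in memory and converges in a few dozen passes. The same computation over trillions of edges has to be sharded, recomputed as the crawl changes, and defended against link spam, which is where the infrastructure cost comes from.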

Brave’s place in this list is the part most readers miss. Brave Search is built on top of Tailcat, which Brave acquired in March 2021 from a team formerly at Cliqz, a privacy-search project funded by German publisher Hubert Burda Media. Tailcat is, per Brave’s own announcement, “built on top of a completely independent index” and explicitly does not rely on Bing or Google for its primary results. That independence is what made Brave viable as an AI backend in the first place. Brave also augments freshness through the Web Discovery Project, which collects anonymized telemetry from consenting Brave Browser users and feeds new URL discovery back into the index.

The takeaway for the rest of the post is that every “AI search” experience you have is one of these five indexes wearing different chrome. An AI lab without a $100M+/year search-infra budget has three realistic vendors to call: Bing, Brave, or (with caveats) a downstream reseller like SerpAPI. The vendor each lab picked tells you something real about its strategy. The picking decision (when to call search at all, which index to query, how to merge results) is where agentic RAG retrieval orchestration lives, and it shows up in every product covered below.

ChatGPT Runs on Bing, OAI-SearchBot, and Publisher Licensing

OpenAI’s web search stack is the most-documented in the field, and also the most layered. The 2023 product was simple: Microsoft and OpenAI launched the new Bing with ChatGPT built in on February 7, 2023. “Browse with Bing” was the explicit ChatGPT feature, and the dependency was acknowledged on both sides. In October 2024, OpenAI replaced the legacy browse mode with ChatGPT Search, which it described as relying on “third-party search providers” without naming any of them. The infrastructure overlap and the partnership history make Bing the dominant signal in that mix, even though OpenAI never restated it explicitly.

The second layer is OpenAI’s own crawler. OAI-SearchBot is documented on the OpenAI platform docs as the user agent that fetches public web content for ChatGPT Search. It is intentionally separate from GPTBot, which is the training-data crawler, and from ChatGPT-User, which is the on-demand fetcher invoked when a user asks ChatGPT to look at a specific URL. Third-party log analyses have put OpenAI’s crawl volume at roughly triple what it was in mid-2025, which is consistent with OpenAI building out a curated index of high-value sources rather than relying entirely on Bing’s link graph.
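If you want to see the three agents separately in your own access logs, a small classifier is enough. The sketch below matches on the three agent names given above; the full user-agent strings in the sample are simplified, hypothetical stand-ins, so check them against what your server actually logs.

```python
from collections import Counter

# Roles as described in the text above.
OPENAI_AGENTS = {
    "OAI-SearchBot": "search index crawl",
    "GPTBot": "training-data crawl",
    "ChatGPT-User": "on-demand fetch for a user",
}


def classify_user_agent(user_agent: str) -> str:
    """Map a raw User-Agent header to the OpenAI crawler role it belongs to."""
    lowered = user_agent.lower()
    for token, role in OPENAI_AGENTS.items():
        if token.lower() in lowered:
            return role
    return "other"


# Hypothetical, shortened user-agent strings for illustration only.
sample_user_agents = [
    "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
    "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

counts = Counter(classify_user_agent(ua) for ua in sample_user_agents)
for role, hits in counts.most_common():
    print(f"{hits:>3}  {role}")
```

Splitting the counts this way shows how much of the OpenAI traffic on a site is search crawl, how much is training crawl, and how much is a user asking ChatGPT to open a page.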

Figure: Microsoft and OpenAI's February 2023 announcement of the new Bing with ChatGPT built in. Image: Microsoft Blog

The third layer is publisher licensing. OpenAI has signed direct content deals with News Corp, Associated Press, Axel Springer, the Financial Times, Vox Media, Condé Nast, Reuters, Time, Le Monde, Hearst, Prisa, and The Atlantic, among others. Those deals bypass open-web crawling entirely: licensed content flows into ChatGPT through structured feeds and API integrations rather than through the bot stack. The practical effect is that ChatGPT’s answers are Bing-shaped at the URL graph layer, but seeded with structured content from premium publishers, which is what gives the assistant its noticeably higher citation density on news-adjacent queries compared to raw web search.

Claude and Mistral Le Chat Run on Brave Search

Anthropic launched Claude web search on March 20, 2025. The next day, TechCrunch reported that Anthropic appears to be using Brave to power those searches, based on inspecting the Claude tool-call internals. The smoking gun, surfaced by Simon Willison the same day, is that Claude’s web_search tool exposes a parameter literally named BraveSearchParams. Independent overlap analyses report roughly 86.7% citation match between Claude’s cited results and Brave’s top non-sponsored results, a percentage too high to be coincidence. It is also consistent with Anthropic adding Brave Search to its public subprocessor list around the same date.
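An overlap figure like that is simple to compute once you have two URL lists. The sketch below is one plausible way to do it, not the methodology behind the 86.7% number, which this post does not document. The URLs are made up, and the normalization rules (ignore scheme, a leading www., and trailing slashes) are assumptions.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def citation_overlap(cited, search_results) -> float:
    """Fraction of cited URLs that also appear in the search result list."""
    cited_set = {normalize(url) for url in cited}
    if not cited_set:
        return 0.0
    result_set = {normalize(url) for url in search_results}
    return len(cited_set & result_set) / len(cited_set)


# Hypothetical data: URLs an assistant cited vs. a search engine's top results.
cited_by_assistant = [
    "https://www.example.com/a/",
    "https://example.org/b",
    "https://example.net/c",
]
top_search_results = [
    "http://example.com/a",
    "https://example.org/b",
    "https://example.edu/d",
]

overlap = citation_overlap(cited_by_assistant, top_search_results)
print(f"{overlap:.1%} of cited URLs appear in the search results")
```

A real comparison would repeat this over many queries and report the aggregate, since any single query can overlap by chance.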

Anthropic has never publicly named the provider in its own product surfaces. The official Claude API docs for the web search tool describe behavior, encryption, citation handling, and rate limits, but stay silent on the underlying index. That silence is why this post tags the claim as reported, not confirmed. The technical evidence is as strong as it gets short of an official announcement, but it would be a category error to treat it as identical to Mistral’s case below.

Mistral Le Chat is the officially confirmed half of the Brave story. Mistral’s Le Chat product runs Brave-powered search, publicly acknowledged in February 2025 alongside the Black Forest Labs image partnership. Le Chat’s premium tier adds direct AFP and AP news licensing on top of the Brave layer, the same hybrid pattern OpenAI uses with Bing. Two of the four largest non-Big-Tech AI labs running on the same independent index is the structural fact that makes Brave Search relevant beyond its own consumer product: it has moved from privacy-curio search engine to load-bearing AI infrastructure.

Figure: Brave Search's consumer UI in October 2025, the same independent index that reportedly backs Claude and powers Mistral Le Chat behind the scenes. Image: Wikimedia Commons, CC-BY-SA 4.0

A note on Claude’s own crawlers, since they are routinely confused with the search backend. ClaudeBot, anthropic-ai, and Claude-User are documented user agents tied to training-data collection and on-demand URL fetches. They do not feed the live web_search tool. The training crawl and the live search index are different systems, and that distinction matters for SEO discussions: blocking ClaudeBot in robots.txt does not affect whether Claude finds your URL when a user asks it to search.
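The per-agent nature of robots.txt is easy to verify with Python's standard-library parser. In this sketch the robots.txt content is a made-up example and ExampleSearchCrawler is a hypothetical name standing in for whatever crawler feeds a search index; the point is only that a rule written for ClaudeBot applies to ClaudeBot and nothing else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the training crawler.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://example.com/post"
# "ExampleSearchCrawler" is a placeholder name, not a real user agent.
for agent in ("ClaudeBot", "Claude-User", "ExampleSearchCrawler"):
    verdict = "allowed" if parser.can_fetch(agent, page) else "blocked"
    print(f"{agent:<22} {verdict}")
```

Only the agent named in the Disallow group is blocked. A search index built by a different crawler is governed by the rules for that crawler's own user agent.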

Gemini and Copilot Are the Only AI Products That Own Their Search Index

Google Gemini and Microsoft Copilot are the two AI products that don’t have to negotiate with anyone for retrieval. Both run on a fully-owned, web-scale index built and maintained by the parent company. For Gemini, the architecture is documented as “Grounding with Google Search” on the Gemini API docs, and as Vertex AI grounding on the Google Cloud side. When grounding is enabled, the API returns a groundingMetadata object containing the queries the model executed, the chunks it pulled back, and the citation links surfaced to the user. Google bills grounded Gemini calls per search query the model decides to issue, a pricing model consistent with internal API access rather than third-party reseller fees.
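As a rough illustration of what that object gives you, the sketch below walks a hand-written sample response and pairs each supported text segment with its source URLs. The field names (webSearchQueries, groundingChunks, groundingSupports, groundingChunkIndices) follow my reading of the public Gemini API documentation and should be checked against the current reference; the sample payload itself is invented.

```python
def summarize_grounding(response: dict):
    """Return the issued queries and, per supported text segment, its source URLs."""
    meta = response["candidates"][0].get("groundingMetadata", {})
    uris = [chunk.get("web", {}).get("uri") for chunk in meta.get("groundingChunks", [])]

    citations = []
    for support in meta.get("groundingSupports", []):
        text = support.get("segment", {}).get("text", "")
        indices = support.get("groundingChunkIndices", [])
        sources = [uris[i] for i in indices if i < len(uris)]
        citations.append((text, sources))
    return meta.get("webSearchQueries", []), citations


# Invented sample shaped like a grounded response; not real API output.
sample_response = {
    "candidates": [{
        "content": {"parts": [{"text": "Only two labs own their index."}]},
        "groundingMetadata": {
            "webSearchQueries": ["which ai labs own a search index"],
            "groundingChunks": [
                {"web": {"uri": "https://example.com/a", "title": "Example A"}},
                {"web": {"uri": "https://example.org/b", "title": "Example B"}},
            ],
            "groundingSupports": [{
                "segment": {"startIndex": 0, "endIndex": 30,
                            "text": "Only two labs own their index."},
                "groundingChunkIndices": [0, 1],
            }],
        },
    }]
}

queries, citations = summarize_grounding(sample_response)
print(queries)
for text, sources in citations:
    print(text, "->", sources)
```

The useful part for site owners is the query list: it shows which searches the model chose to run, which is closer to a keyword report than anything the rented-index products expose.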

The retrieval layer behind Gemini grounding is the same retrieval layer behind AI Overviews and the broader Search Generative Experience. The Gemini API is essentially a thin wrapper over Google Search’s chunk-store and ranker, with the LLM-side glue handling query rewriting, multi-hop fan-out, and citation rendering. A direct consequence is that Gemini’s freshness and coverage move with Google Search itself: any URL Google has indexed is in scope, no extra IndexNow ping required.

Microsoft Copilot’s architecture is the Bing equivalent of the same setup. Copilot runs the Prometheus orchestrator on top of the Bing index, with GPT-4o (and now GPT-5-class successors) as the foundation model. Bing’s infrastructure is shared between consumer Copilot, Microsoft 365 Copilot, and the Bing Chat surface, which is why those products have nearly identical citation behavior on the same query. From a webmaster’s perspective, getting indexed by Bing is the same operation whether the consumer is reading Bing search results, asking Copilot in Edge, or hitting GPT-4o through ChatGPT (because ChatGPT, downstream, also reads from Bing).

Why no startup can replicate this: open-web crawling at Google or Bing scale needs petabytes of storage, real-time anti-spam, distributed link-graph indexing, and a decade of accumulated relevance signals. The capex floor is in the low billions of dollars annually. Even a well-funded AI lab with a $10B war chest has no rational reason to build that from scratch when Brave will rent it for a fraction of the cost.

Perplexity, Meta AI, and xAI Grok: Hybrid and Undisclosed

The remaining three players are messier to characterize. Perplexity started fully on Bing in 2022 and has been migrating off it ever since. The current stack is hybrid: Perplexity’s own crawler (user agents PerplexityBot and Perplexity-User) maintains a custom index of roughly 5 billion URLs, with Bing as the long-tail fallback for queries the in-house index cannot answer well. The hybrid model gives Perplexity room to differentiate on citation quality on its strongest verticals while still benefiting from Bing’s coverage on niche queries, but it also means the same query can hit very different retrieval paths depending on what the router decides. If you want a hands-on view of what that hybrid actually looks like, building a self-hosted AI search stack with SearXNG on your own hardware exposes the same routing problems Perplexity is solving at scale.
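The routing idea can be sketched in a few lines. To be clear, this is a hypothetical illustration of a primary-index-with-fallback pattern, not Perplexity's implementation: the Hit type, the thresholds, and both stub backends are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    score: float
    source: str


def route(query, primary, fallback, min_hits=3, min_score=0.5):
    """Use the primary index when it answers well, else top up from the fallback."""
    hits = [hit for hit in primary(query) if hit.score >= min_score]
    if len(hits) >= min_hits:
        return hits
    seen = {hit.url for hit in hits}
    extra = [hit for hit in fallback(query) if hit.url not in seen]
    return hits + extra


# Stub backends standing in for an in-house index and a rented one.
def in_house_index(query):
    if "popular" in query:
        return [Hit(f"https://example.com/{i}", 0.9, "in-house") for i in range(3)]
    return [Hit("https://example.com/0", 0.6, "in-house")]


def rented_index(query):
    return [
        Hit("https://example.com/0", 0.4, "rented"),
        Hit("https://example.net/x", 0.4, "rented"),
    ]


for query in ("popular topic", "niche topic"):
    sources = [hit.source for hit in route(query, in_house_index, rented_index)]
    print(query, "->", sources)
```

Even in this toy version, two similar queries can return results from different indexes, which is the provenance problem described above.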

Meta AI runs the only dual integration in the field. Meta announced its Bing partnership in September 2023, then added Google search results in April 2024, and the Meta AI assistant now routes per-query between the two backends based on the topic. Meta has also stated publicly that it is building its own search engine to reduce dependency on both vendors, though there is still no public evidence that an in-house Meta index is serving live traffic. The dual-rent setup is unusual and probably temporary; running two reseller contracts in parallel is expensive and complicates citation provenance.

xAI Grok is the least transparent of the major AI products. The official Grok web search docs confirm two distinct tools, web_search and x_search, and the Live Search guide describes “an agentic search component that fans out to web, X, news, RSS.” What the docs never name is the underlying web index provider. Every other lab in this post has at least a partnership announcement, a subprocessor disclosure, or a reverse-engineered parameter name. xAI has none of those. That opacity is itself information: it is the only major operator running fully undisclosed retrieval, which is relevant if you care about jurisdiction, content licensing, or bias-surface analysis. It also means there is no actionable channel for site owners who want their content to surface in Grok beyond standard SEO.

What Does This Mean for AI Search Visibility on Your Site?

If you run a site and want it cited by AI chatbots, the backend map decides which discovery channels actually do anything. The mapping is mechanical once you know who rents from whom.

IndexNow is the fastest discovery channel for everyone on the Bing side. A single POST to api.indexnow.org fans out to Bing, Yandex, Naver, Seznam, and Yep, the five participating engines. Google tested the protocol in 2021 but never joined, and remains absent years later; Brave has not joined either. So IndexNow accelerates ChatGPT (Bing-backed at the URL graph layer), Microsoft Copilot (Bing-native), Perplexity’s long-tail fallback (Bing), and Meta AI’s Bing path. In practice, Bing-side discovery typically reflects new URLs within minutes of a deploy, which is why even Google-first publishers tend to wire IndexNow on top of their normal pipeline.
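A minimal sketch of that POST, using only the standard library, looks like this. The JSON fields (host, key, keyLocation, urlList) and the /indexnow path follow the published IndexNow protocol as I understand it; the host, key, and URLs are placeholders, and the key file is assumed to be served from the site root. The example builds the request without sending it.

```python
import json
from urllib.request import Request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host: str, key: str, urls) -> Request:
    """Build (but do not send) an IndexNow bulk-submission request."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file is hosted at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Placeholder host, key, and URLs for illustration.
request = build_indexnow_request(
    host="example.com",
    key="0123456789abcdef",
    urls=["https://example.com/new-post", "https://example.com/updated-post"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To submit for real: urllib.request.urlopen(request) from the deploy pipeline.
```

Calling this from a deploy hook with only the URLs that changed keeps submissions small and avoids resubmitting the whole site on every build.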

What IndexNow does not accelerate: Claude (Brave’s own crawler discovers updates on its own schedule), Mistral Le Chat (same), Gemini (Google Search’s standard crawl-and-index pipeline, which IndexNow does not touch), and xAI Grok (undisclosed backend). For the Brave side specifically, the only webmaster surface is a single-URL submit form that fires and forgets. There is no console, no sitemap submission, no URL inspection, and no feedback on whether a submitted URL actually got indexed. Standard SEO hygiene plus the Web Discovery Project signal from Brave Browser users is the only lever, which means Brave-backed AI products discover your content organically or not at all.
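The split in the last two paragraphs reduces to a lookup table. The sketch below encodes the backend map exactly as this post describes it, including the reported-but-unconfirmed Claude entry and an empty set for Grok; it is a summary of the article's claims, not an independent data source.

```python
# Engines that receive IndexNow pings, per the list above.
INDEXNOW_ENGINES = {"Bing", "Yandex", "Naver", "Seznam", "Yep"}

# Third-party or parent-owned indexes behind each product, as described in this post.
BACKENDS = {
    "ChatGPT": {"Bing"},
    "Microsoft Copilot": {"Bing"},
    "Perplexity": {"Bing"},          # long-tail fallback behind its own index
    "Meta AI": {"Bing", "Google"},
    "Claude": {"Brave"},             # reported, not officially confirmed
    "Mistral Le Chat": {"Brave"},
    "Gemini": {"Google"},
    "xAI Grok": set(),               # undisclosed backend
}


def indexnow_helps(product: str) -> bool:
    """True if at least one of the product's backends receives IndexNow pings."""
    return bool(BACKENDS[product] & INDEXNOW_ENGINES)


covered = [name for name in BACKENDS if indexnow_helps(name)]
not_covered = [name for name in BACKENDS if not indexnow_helps(name)]
print("IndexNow reaches:", ", ".join(covered))
print("IndexNow does not reach:", ", ".join(not_covered))
```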

The practical playbook that falls out of all of this is short. Ship a sitemap.xml referenced from robots.txt so every crawler can self-discover. Wire IndexNow into the deploy pipeline to cover the Bing-family AI products. Use Google Search Console for Gemini and AI Overviews coverage, and Bing Webmaster Tools for the Bing side, because those are the only two backends that give you a real feedback loop. Accept that for Claude and Le Chat, you are in Brave’s hands.
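The first step of that playbook can be generated at build time. This is a minimal sketch assuming a flat list of page URLs and a sitemap served at /sitemap.xml; the URLs are placeholders, and a real sitemap would usually also carry lastmod dates.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Render a minimal sitemap.xml with one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_robots(sitemap_url: str) -> str:
    """Render a permissive robots.txt that points every crawler at the sitemap."""
    return f"User-agent: *\nAllow: /\n\nSitemap: {sitemap_url}\n"


# Placeholder URLs for illustration.
pages = ["https://example.com/", "https://example.com/new-post"]
sitemap_xml = build_sitemap(pages)
robots_txt = build_robots("https://example.com/sitemap.xml")
print(sitemap_xml)
print(robots_txt)
```

Because the Sitemap line in robots.txt is read by any crawler that fetches the file, it is the one discovery hint that reaches Brave-backed products as well as the Bing and Google families.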