
AI Web Search Backends: Who Owns, Who Rents

Only Google Gemini and Microsoft Copilot run on a search index that their parent company crawls itself. Anthropic Claude reportedly rents Brave Search, Mistral Le Chat rents Brave too, OpenAI ChatGPT rents Bing and supplements it with its own crawler, and Meta AI rents both Bing and Google. The key clue for Claude: its web_search tool exposes a literal BraveSearchParams field, and citation overlap with Brave runs about 86.7%.

Key Takeaways

  • Only Google and Microsoft own a web-scale search index.
  • Mistral runs on the Brave Search API by its own confirmation; Claude reportedly does too.
  • ChatGPT uses Bing, OpenAI’s own crawler, and publisher deals.
  • IndexNow helps Bing-backed AI products, not Brave or Google.
  • Brave now acts as AI’s third search pole beside Google and Bing.

Only Five Companies Actually Crawl the Open Web

Before mapping each AI lab to its backend, the key constraint is simple: only five operators crawl the open web at scale. Everything else sold as a “search engine” resells one of those indexes. The five are Google, Microsoft Bing, Yandex, Baidu, and Brave Search, with Mojeek as a much smaller niche sixth.

Crawling the open web is expensive for dull, practical reasons. Storage runs into petabytes once you keep old snapshots. Freshness means recrawling popular URLs within minutes, not days. Anti-cloaking systems must catch pages that show one HTML version to crawlers and another to real browsers. Link-graph systems rebuild huge graphs of links to score authority. A 200-person AI startup can’t spin that up, and the bill is in the billions each year, not the millions.

Brave’s place on that list is the part most people miss. Brave Search is built on Tailcat, which Brave bought in March 2021 from former Cliqz team members. Cliqz was a privacy-search project backed by German publisher Hubert Burda Media. In Brave’s own announcement, Tailcat was “built on top of a completely independent index” and did not rely on Bing or Google for primary results. That independence made Brave usable as an AI backend. Brave also improves freshness through the Web Discovery Project, which gathers anonymous signals from consenting Brave Browser users and feeds new URL discovery back into the index.

The rest of the post follows from that setup: every “AI search” product sits on one of these five indexes. An AI lab without a $100M+/year search budget has three real options: Bing, Brave, or a downstream reseller like SerpAPI. The vendor each lab picks shows part of its strategy. The harder choice is when to call search, which index to query, and how to merge the results. That is where agentic RAG retrieval orchestration shows up across the products below.

ChatGPT Runs on Bing, OAI-SearchBot, and Publisher Licensing

OpenAI’s web search stack is the best-documented in this field, and it also has the most layers. The 2023 product was simple: Microsoft and OpenAI launched the new Bing with ChatGPT built in on February 7, 2023. “Browse with Bing” was the named ChatGPT feature, and both sides said so. In October 2024, OpenAI replaced that mode with ChatGPT Search, saying it used “third-party search providers” without naming them. Given the shared infrastructure and the partnership history, Bing still looks like the main source in that mix, even though OpenAI stopped naming it.

The second layer is OpenAI’s own crawler. OAI-SearchBot is documented on the OpenAI platform docs as the user agent that fetches public web content for ChatGPT Search. It is separate from GPTBot, which gathers training data, and from ChatGPT-User, which fetches a specific URL when a user asks for it. Third-party log studies say OpenAI’s crawl volume was about triple its mid-2025 level. That fits a strategy where OpenAI builds a curated index of high-value sources instead of leaning only on Bing’s link graph.
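Because the three user agents are separate, a site can in principle set a different policy for each one. The sketch below uses Python’s standard `urllib.robotparser` to show how a robots.txt file would be read for each agent. The robots.txt content and the example.com URL are hypothetical, and the code only shows how the rules parse; whether a given crawler honors them is up to its operator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: opt out of training crawls, stay visible to search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def crawl_permissions(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per user agent, whether the given robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


permissions = crawl_permissions(
    ROBOTS_TXT,
    "https://example.com/post",  # placeholder URL
    ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
)
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

ChatGPT-User has no entry in this sample file, so the parser falls back to its default and treats the fetch as permitted.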

[Figure: Microsoft and OpenAI’s February 2023 announcement of the new Bing with ChatGPT built in. Image: Microsoft Blog]

The third layer is publisher licensing. OpenAI has signed direct content deals with News Corp, Associated Press, Axel Springer, the Financial Times, Vox Media, Condé Nast, Reuters, Time, Le Monde, Hearst, Prisa, and The Atlantic, among others. Those deals skip open-web crawling: licensed content reaches ChatGPT through structured feeds and API integrations. The result is a stack that still looks Bing-shaped at the URL graph layer, but it also includes structured material from premium publishers. That helps explain why ChatGPT shows denser citations on news-heavy queries than raw web search.

Anthropic launched Claude web search on March 20, 2025. One day later, TechCrunch reported that Anthropic appears to be using Brave based on Claude tool-call internals. Simon Willison surfaced the key clue that same day: Claude’s web_search tool exposes a parameter literally named BraveSearchParams. Separate overlap studies found about 86.7% citation match between Claude’s cited results and Brave’s top non-sponsored results. That is too close to ignore, and it lines up with Anthropic adding Brave Search to its public subprocessor list at about the same time.
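An overlap figure like that can be computed in several ways, and the methodology behind the 86.7% number is not described here. As a rough sketch of one plausible approach, the code below measures the share of an assistant’s cited URLs that also appear in a candidate backend’s result list. The normalization rules and all URLs are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    return f"{host}{path}"


def citation_overlap(cited: list[str], backend_results: list[str]) -> float:
    """Share of cited URLs that also appear in the backend's results."""
    cited_set = {normalize(u) for u in cited}
    if not cited_set:
        return 0.0
    backend_set = {normalize(u) for u in backend_results}
    return len(cited_set & backend_set) / len(cited_set)


# Toy data only; not taken from any real study.
cited_urls = [
    "https://www.example.com/a/",
    "https://example.org/b",
    "https://example.net/c",
]
backend_urls = [
    "https://example.com/a",
    "https://example.org/b?utm=1",
    "https://example.edu/d",
]
overlap = citation_overlap(cited_urls, backend_urls)
print(f"{overlap:.1%} of cited URLs appear in the backend results")
```

A real comparison would also need to fix the query set, the result depth, and how sponsored results are excluded, since each choice moves the number.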

Anthropic has never publicly named the provider in its own product surfaces. The official Claude API docs for the web search tool describe behavior, encryption, citation handling, and rate limits, but they do not name the underlying index. That is why this post tags the claim as reported, not confirmed. The evidence is as strong as it gets short of an official statement, but it is still not the same as Mistral’s case below.

Mistral Le Chat is the officially confirmed half of the Brave story. Mistral’s Le Chat product runs Brave-powered search, a link made public in February 2025 alongside the Black Forest Labs image partnership. Le Chat’s premium tier adds direct AFP and AP news licensing on top of the Brave layer, the same hybrid pattern OpenAI uses with Bing. Two of the four largest non-Big-Tech AI labs now run on the same independent index. That is the structural fact that makes Brave Search important beyond its own consumer product. It moved from privacy-curio search engine to load-bearing AI infrastructure.

[Figure: Brave Search’s consumer UI in October 2025, the same independent index that backs Claude and Mistral Le Chat behind the scenes. Image: Wikimedia Commons, CC-BY-SA 4.0]

A note on Claude’s own crawlers, since people often mix them up with the search backend. ClaudeBot, anthropic-ai, and Claude-User are documented user agents for training-data collection and on-demand URL fetches. They do not power the live web_search tool. Those are separate systems, so blocking ClaudeBot in robots.txt does not change whether Claude finds your URL when a user asks it to search.

Gemini and Copilot Are the Only AI Products That Own Their Search Index

Google Gemini and Microsoft Copilot are the only AI products that do not need to rent retrieval. Both sit on a fully owned, web-scale index built and run by the parent company. For Gemini, Google documents the setup as “Grounding with Google Search” on the Gemini API docs and as Vertex AI grounding on the Google Cloud side. When grounding is on, the API returns a groundingMetadata object with the queries the model ran, the chunks it pulled back, and the citation links shown to the user. Google charges grounded Gemini calls per search query, which fits internal Google Search access rather than a rented index.
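To make the groundingMetadata object concrete, here is a minimal sketch that pulls the search queries and cited sources out of a response that has already been decoded to a Python dict. The field names (`webSearchQueries`, `groundingChunks`, `web.uri`, `web.title`) follow the REST response shape as commonly documented, but treat them as assumptions and check the current Gemini API reference. The sample payload is invented for illustration.

```python
from typing import Any

# Invented sample shaped like a grounded response; not real API output.
SAMPLE_RESPONSE: dict[str, Any] = {
    "candidates": [
        {
            "content": {"parts": [{"text": "Example answer."}]},
            "groundingMetadata": {
                "webSearchQueries": ["example query"],
                "groundingChunks": [
                    {"web": {"uri": "https://example.com/a", "title": "Example A"}},
                    {"web": {"uri": "https://example.org/b", "title": "Example B"}},
                ],
            },
        }
    ]
}


def summarize_grounding(response: dict[str, Any]) -> dict[str, list]:
    """Extract the queries the model ran and the (title, uri) pairs it retrieved."""
    candidates = response.get("candidates") or []
    if not candidates:
        return {"queries": [], "sources": []}
    meta = candidates[0].get("groundingMetadata") or {}
    queries = list(meta.get("webSearchQueries", []))
    sources = [
        (chunk["web"].get("title", ""), chunk["web"]["uri"])
        for chunk in meta.get("groundingChunks", [])
        if "uri" in chunk.get("web", {})
    ]
    return {"queries": queries, "sources": sources}


summary = summarize_grounding(SAMPLE_RESPONSE)
print(summary["queries"])
for title, uri in summary["sources"]:
    print(f"{title}: {uri}")
```

Logging these fields over time is one way for a site owner to see which of their URLs a grounded model actually retrieves.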

That retrieval layer is the same one behind AI Overviews and the broader Search Generative Experience. The Gemini API is basically a thin wrapper over Google Search’s chunk store and ranker. The LLM layer handles query rewriting, multi-hop fan-out, and citation rendering. The practical effect is simple: Gemini’s freshness and coverage move with Google Search itself, so any URL Google has indexed is already in scope. No extra IndexNow ping changes that.

Microsoft Copilot is the Bing version of the same setup. Copilot runs the Prometheus orchestrator on top of the Bing index, with GPT-4o and now GPT-5-class successors as the model layer. Bing infrastructure is shared across consumer Copilot, Microsoft 365 Copilot, and Bing Chat. That is why those products cite in near-identical ways on the same query. For site owners, getting indexed by Bing covers Bing search, Copilot in Edge, and much of ChatGPT’s live web retrieval too.

No startup is going to copy this. Open-web crawling at Google or Bing scale needs petabytes of storage, real-time anti-spam systems, distributed link-graph indexing, and years of relevance signals. The capex floor sits in the low billions each year. Even a well-funded AI lab has little reason to build that from scratch when Brave will rent it for far less.

Perplexity, Meta AI, and xAI Grok: Hybrid and Undisclosed

The remaining three players are harder to pin down. Perplexity started fully on Bing in 2022 and has been moving away from it ever since. The current stack looks hybrid. Perplexity’s own crawlers, PerplexityBot and Perplexity-User, maintain a custom index of about 5 billion URLs. Bing stays as the long-tail fallback when the in-house index cannot answer well. That gives Perplexity more control over citations on its strongest topics while keeping Bing coverage on obscure queries. It also means the same question can hit very different retrieval paths depending on what the router picks. A hands-on self-hosted AI search stack with SearXNG shows the same routing tradeoffs at smaller scale.

Meta AI is the only dual integration in the field. Meta announced its Bing partnership in September 2023, then added Google search results in April 2024. Meta AI now routes between the two backends by topic. Meta has also said it is building its own search engine to cut that dependence. There is still no public sign that a Meta-owned index serves live traffic. Renting from two vendors at once is unusual, costly, and harder to audit for citation provenance.

xAI Grok is the least transparent of the major AI products. The official Grok web search docs confirm two distinct tools, web_search and x_search, and the Live Search guide says the system can fan out to web, X, news, and RSS. The docs never name the underlying web index provider. Every other lab in this post has at least a partnership announcement, a subprocessor disclosure, or a reverse-engineered parameter name. xAI has none of those. That opacity is useful information on its own: xAI is the only major operator with fully undisclosed retrieval. If you care about jurisdiction, content licensing, or bias-surface analysis, that missing detail is relevant. It also leaves site owners with no clear lever beyond standard SEO.

What Does This Mean for AI Search Visibility on Your Site?

If you run a site and want AI chatbots to cite it, the backend map tells you which discovery channels actually help. Once you know who rents from whom, the rest is mechanical.

IndexNow is the fastest discovery channel for anything on the Bing side. One POST to api.indexnow.org fans out to Bing, Yandex, Naver, Seznam, and Yep, the five participating engines. Google tested the protocol in 2021 but never joined, and remains absent years later. Brave has not joined either. So IndexNow can speed up discovery for ChatGPT at the URL-graph layer, Microsoft Copilot, Perplexity’s Bing fallback, and Meta AI’s Bing path. In practice, Bing often reflects new URLs within minutes, which is why many Google-first publishers still wire IndexNow into their deploy flow.
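As a sketch of what that one POST looks like, the code below builds an IndexNow submission with Python’s standard library. The `/indexnow` path and the JSON fields (`host`, `key`, `keyLocation`, `urlList`) follow the public IndexNow protocol as I understand it. The host, key, and URL are placeholders, and the key file location is an assumption about where you host your key.

```python
import json
from urllib.parse import urlsplit
from urllib.request import Request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host: str, key: str, urls: list[str]) -> Request:
    """Build (but do not send) an IndexNow bulk-submission request."""
    # The protocol expects every submitted URL to belong to the declared host.
    foreign = [u for u in urls if urlsplit(u).hostname != host]
    if foreign:
        raise ValueError(f"URLs outside {host}: {foreign}")
    payload = {
        "host": host,
        "key": key,
        # Assumed location: the key file served from the site root.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    return Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = build_indexnow_request(
    "example.com",            # placeholder host
    "hypothetical-key-123",   # placeholder key
    ["https://example.com/new-post"],
)
# To submit for real: urllib.request.urlopen(request, timeout=10)
print(request.get_method(), request.full_url)
```

Keeping the request builder separate from the network call makes it easy to run in a deploy pipeline and to test without hitting the endpoint.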

What IndexNow does not speed up is just as important: Claude, Mistral Le Chat, Gemini, and xAI Grok. Claude and Le Chat depend on Brave’s own crawl schedule. Gemini follows Google Search’s normal crawl-and-index pipeline, which IndexNow does not touch. Grok’s backend is still undisclosed. On the Brave side, site owners get only a single-URL submit form. There is no console, no sitemap submission, no URL inspection, and no feedback on whether a submitted URL got indexed. That leaves standard SEO plus Web Discovery Project signals from Brave Browser users as the only clear lever.

The practical playbook is short. Ship a sitemap.xml referenced from robots.txt so crawlers can find new pages. Wire IndexNow into the deploy pipeline for Bing-backed AI products. Use Google Search Console for Gemini and AI Overviews coverage, and Bing Webmaster Tools for the Bing side, because those are the only two backends with a real feedback loop. For Claude and Le Chat, you mostly wait for Brave.
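The first step of that playbook can be sketched in a few lines: generate a sitemap.xml and a robots.txt that points to it. This is a minimal illustration using the standard sitemaps.org namespace. The URLs, dates, and the permissive robots.txt policy are placeholders, not a recommendation for any particular site.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """Render (url, lastmod) pairs as a sitemap.xml document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_robots_txt(sitemap_url: str) -> str:
    """Render a permissive robots.txt that advertises the sitemap."""
    return f"User-agent: *\nAllow: /\n\nSitemap: {sitemap_url}\n"


# Placeholder pages and dates.
sitemap_xml = build_sitemap(
    [
        ("https://example.com/", "2025-01-01"),
        ("https://example.com/new-post", "2025-01-02"),
    ]
)
robots_txt = build_robots_txt("https://example.com/sitemap.xml")
print(sitemap_xml)
print(robots_txt)
```

Generating both files in the same build step keeps the sitemap reference from going stale when the site structure changes.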