Pinterest's MCP Deployment: 66,000 Monthly Invocations and 7,000 Engineering Hours Saved

Pinterest’s Model Context Protocol rollout hits 66,000 calls per month across 844 active users. It’s the most detailed public case study of MCP at scale. A central registry, two-layer auth, safety reviews, and human checkpoints set this apart from a prototype. The payoff: about 7,000 engineering hours saved each month.

The story comes from Pinterest’s engineering blog post in March 2026 and later coverage by InfoQ. For any team weighing MCP for production use, this rollout is a solid guide.

Cloud-Hosted Domain Servers, Not a Monolith

Pinterest chose many small, domain-specific MCP servers over one big stack. Each server holds a tight set of tools with its own access rules. Teams own their servers and define the tool surface. The platform handles deploys, scaling, and the rest of the service lifecycle.

Figure: The Model Context Protocol connects AI applications to multiple backend servers through dedicated client connections. (Image: Model Context Protocol)

Four servers are publicly documented:

MCP Server | Function | Scope
--- | --- | ---
Presto MCP | Engineers query data through natural language instead of writing SQL against internal dashboards | Restricted to Ads, Finance, and infrastructure teams
Spark MCP | Diagnoses job failures, summarizes logs into structured root-cause analyses | Available only in relevant support channels
Airflow MCP | Workflow orchestration for managing and monitoring data pipelines | Team-specific access
Knowledge MCP | Institutional knowledge access across internal documentation and debugging sources | Horizontal: supports all teams

The cloud-hosted setup is a deliberate choice. Engineers don’t run MCP servers on their laptops. A central setup gives Pinterest one place for logs, one set of safety rules, and the option to scale each server on its own. This tracks with the shift toward enterprise-ready MCP in 2026. Teams now move past local-first setups toward managed server fleets. Smaller shops still pull the opposite way, pairing these tool servers with agent frameworks that stay fully local-first so the model and orchestration never leave their own infrastructure.

There are likely more than four servers in production. The blog post names these four, but the registry pattern and shared deploy pipeline suggest the company has scaled past the public set.

The Central Registry: Source of Truth for Server Discovery

The registry turns a collection of separate MCP servers into a governed set. Without it, server sprawl creates the same shadow-IT problem that uncontrolled API growth causes in big companies. Individual developers hit a consumer version of the same trust gap with unvetted endpoints.

Pinterest’s registry is “the source of truth for which MCP servers are approved and how to connect to them.” Only registry-listed servers count as approved for production. The registry has two faces:

  • Web UI (for humans): Engineers browse to find servers, see who owns them, check support channels, and view review status.
  • API (for machines): AI clients look up servers, check them, and confirm user access before calling a tool. An agent can’t call a tool the user isn’t cleared to use.

This two-face design keeps rules the same across every client surface: LLM chat, IDE plugins, and team chat tools (likely Slack). If a server isn’t in the registry, no surface can call it.
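The lookup a client performs against the registry API can be sketched as follows. This is a minimal illustration, not Pinterest's implementation: the `RegistryEntry` fields, the `Registry.resolve` method, the server names, and the `example.internal` URLs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """One MCP server as a registry might record it (hypothetical fields)."""
    name: str
    owner_team: str
    url: str
    allowed_groups: frozenset  # an empty set means open to every team
    approved: bool = True


class Registry:
    """In-memory stand-in for the registry API (assumed interface)."""

    def __init__(self, entries):
        self._entries = {entry.name: entry for entry in entries}

    def resolve(self, server_name, user_groups):
        """Return the entry only if the server is approved and the user is cleared."""
        entry = self._entries.get(server_name)
        if entry is None or not entry.approved:
            raise LookupError(f"{server_name} is not an approved MCP server")
        if entry.allowed_groups and not (entry.allowed_groups & set(user_groups)):
            raise PermissionError(f"user is not cleared for {server_name}")
        return entry


# Hypothetical registry contents, loosely modeled on the servers named above.
registry = Registry([
    RegistryEntry("presto-mcp", "data-platform",
                  "https://mcp.example.internal/presto",
                  frozenset({"ads", "finance", "infra"})),
    RegistryEntry("knowledge-mcp", "dev-productivity",
                  "https://mcp.example.internal/knowledge",
                  frozenset()),
])
```

Because every client surface would go through the same `resolve` call before connecting, an unlisted server fails with `LookupError` and an uncleared user fails with `PermissionError`, whichever surface made the request.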

The registry pattern fills a gap the MCP 2026 roadmap calls out. There’s no standard way for a registry or crawler to learn what a server does without first connecting to it. The roadmap notes work on a shared metadata format served via .well-known endpoints. That would make Pinterest’s pattern easier to copy. For now, Pinterest built its own lookup layer ahead of the spec.

Two-Layer Authentication and Least-Privilege Enforcement

Pinterest’s safety model deserves the closest read from any team weighing MCP. The two-layer auth design splits human identity from service identity. Group gating keeps tool access from creeping.

Layer 1, end-user JWT flow: OAuth against Pinterest’s auth stack. An Envoy proxy maps user credentials to forwarded headers (X-Forwarded-User, X-Forwarded-Groups). Tools use a lightweight @authorize_tool decorator for per-tool checks. This layer drives every human-in-the-loop call.

Layer 2, SPIFFE-based mesh identity: For low-risk, read-only traffic between services with no human in the loop. SPIFFE (Secure Production Identity Framework for Everyone) gives each workload a cryptographic identity. This layer uses mesh IDs in place of human JWTs. It handles agent calls that don’t change data.
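One way to picture how the two layers divide the traffic is a small decision function. This is a sketch under stated assumptions: the `example.internal` trust domain, the function names, and the rule that a human identity always takes precedence are illustrative choices, not details from Pinterest's post. It checks only the shape of a SPIFFE ID; in a real mesh the ID would already have been verified from the workload's certificate.

```python
from urllib.parse import urlsplit

TRUSTED_DOMAIN = "example.internal"  # hypothetical SPIFFE trust domain


def is_trusted_workload(spiffe_id):
    """Check that an ID looks like spiffe://<trust-domain>/<workload-path>."""
    parts = urlsplit(spiffe_id)
    return (
        parts.scheme == "spiffe"
        and parts.netloc == TRUSTED_DOMAIN
        and parts.path not in ("", "/")
    )


def pick_identity(read_only, user, spiffe_id):
    """Return which layer authorizes the call: 'user' or 'mesh'."""
    if user:
        return "user"  # Layer 1: a human identity is present
    if read_only and spiffe_id and is_trusted_workload(spiffe_id):
        return "mesh"  # Layer 2: read-only service-to-service traffic
    raise PermissionError("writes and unidentified callers need a human identity")
```

Under these assumptions, a read-only call from `spiffe://example.internal/ns/agents/sa/spark-helper` resolves to the mesh layer, while the same workload attempting a write is rejected.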

Business-group gating: Servers like Presto MCP lock down access to set groups (Ads, Finance, infra), even though the server is in the registry. A wider surface doesn’t mean wider data access.

Human-in-the-loop controls: Agents send action plans through MCP tools. Humans say yes or no, often in batches, before they run. The system uses prompts to check risky moves before data changes. In short, the agent must pause and ask before it writes or overwrites. This mirrors the spec’s elicitation feature, which lets servers ask users for more input mid-call.
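The batched approval step can be sketched as a plan runner that pauses once for all data-changing actions. Everything here is hypothetical: `PlannedAction`, `run_plan`, and the `approve` and `execute` callbacks are names invented for the example, and a real deployment would route the approval through a chat or IDE prompt rather than a function call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedAction:
    """One step of an agent's plan (hypothetical structure)."""
    tool: str
    summary: str
    mutates_data: bool


def run_plan(actions, approve, execute):
    """Run read-only steps directly; ask for one batched approval before any write.

    `approve` receives the list of write actions and returns those the human accepted.
    `execute` performs a single action.
    """
    writes = [a for a in actions if a.mutates_data]
    approved = set(approve(writes)) if writes else set()
    results = []
    for action in actions:
        if action.mutates_data and action not in approved:
            results.append((action.tool, "skipped"))
            continue
        execute(action)
        results.append((action.tool, "ran"))
    return results
```

The design choice worth noting is that rejection is the default: a write that the human did not explicitly return from `approve` never reaches `execute`.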

Figure: Pinterest's two-layer MCP authentication model, showing the JWT-based human flow and the SPIFFE-based machine flow.

This isn’t a theoretical model. The @authorize_tool decorator means each tool method carries its own check. Access to one tool on a server doesn’t grant access to all tools on it.
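A per-tool check of this kind might look like the sketch below. Only the decorator name and the two header names come from the source; the decorator's signature, the comma-separated group format, and the example tools `run_query` and `list_tables` are assumptions for illustration.

```python
import functools


def authorize_tool(*required_groups):
    """Hypothetical per-tool check driven by the headers the proxy forwards."""
    required = set(required_groups)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(headers, *args, **kwargs):
            user = headers.get("X-Forwarded-User")
            # Assumption: groups arrive as a comma-separated string.
            raw_groups = headers.get("X-Forwarded-Groups", "")
            groups = {g.strip() for g in raw_groups.split(",") if g.strip()}
            if not user:
                raise PermissionError("no authenticated user on the request")
            if required and not (required & groups):
                raise PermissionError(f"{user} may not call {func.__name__}")
            return func(headers, *args, **kwargs)

        return wrapper

    return decorator


@authorize_tool("ads", "finance")
def run_query(headers, sql):
    """Restricted tool: only members of the listed groups may call it."""
    return f"{headers['X-Forwarded-User']} ran: {sql}"


@authorize_tool()
def list_tables(headers):
    """Open tool on the same server: any authenticated user may call it."""
    return ["events", "pins"]
```

With these two tools on one server, a user whose groups are `search` can call `list_tables` but gets a `PermissionError` from `run_query`, which is the per-tool separation the paragraph above describes.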

Every MCP server at Pinterest clears a set of review gates before it ships. This step is what separates a proof-of-concept from a real production setup. It’s the part most teams will need to copy in some form.

Required pre-ship reviews: Every server opens Security, Legal/Privacy, and (where it fits) GenAI review tickets. All of them must clear before the registry lists the server for live use.

The reviews cover distinct risk surfaces:

  • Security review looks at each tool’s attack surface, auth setup, leak risk, and fit with Pinterest’s in-house rules.
  • Legal/Privacy review checks data flow, PII exposure, user-data access, and the laws that bind each server’s domain (GDPR, CCPA, and so on).
  • GenAI review targets agent fit: prompt injection risks, output checks, hallucination control, and whether a tool’s output could be misused by an agent in odd ways.
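The gating rule described above, where a server is listed only once every required review has cleared, can be expressed as a small check. The field names (`reviews`, `needs_genai_review`), the status string `"approved"`, and the function name are hypothetical; the source does not describe how Pinterest tracks review tickets.

```python
ALWAYS_REQUIRED = ("security", "legal_privacy")


def listing_decision(server):
    """Return (can_list, pending_reviews) for a candidate MCP server.

    `server` is a dict with a `reviews` mapping of review name to status and a
    `needs_genai_review` flag (both hypothetical field names).
    """
    required = list(ALWAYS_REQUIRED)
    if server.get("needs_genai_review"):
        required.append("genai")
    reviews = server.get("reviews", {})
    pending = [name for name in required if reviews.get(name) != "approved"]
    return (not pending, pending)
```

For example, a server with Security and Legal/Privacy approved but its GenAI ticket still open would come back as `(False, ["genai"])` and stay out of the registry.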

Even after every review clears, some servers face extra limits. Spark MCP, for example, only shows up in relevant support channels, not org-wide. This adds a surface-level control on top of the auth layer. It’s a belt-and-suspenders setup: the server lives in the registry but only appears in specific contexts.

The GenAI review gate stands out. Most enterprise safety models still don’t cover AI risks like prompt injection or agent misuse. Pinterest baking this into the standard flow is a signal. As MCP grows, teams will need to extend their safety checklists to cover AI agent flows.

Metrics, Observability, and What 7,000 Hours Actually Means

Pinterest’s usage numbers need context. The 7,000-hours-saved figure is self-reported via user feedback, not an A/B test. The call count is more telling: it shows how use spread across the engineering org.

Usage metrics (January 2026):

Metric | Value
--- | ---
Monthly invocations | 66,000
Monthly active users | 844
Estimated hours saved per month | 7,000
Average invocations per user per month | ~78
Estimated time saved per invocation | ~6.4 minutes

Tool owners gauge savings through user feedback against the manual flow. The north-star metric is “time saved per call.” That’s a self-reported number based on before-and-after estimates, not a controlled test. It’s useful for direction and gives the order of magnitude. It is not a precise number.
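The two derived rows in the table follow directly from the three reported figures. The short calculation below reproduces them; the variable names are just labels for this example.

```python
# Reported figures from the usage table.
monthly_invocations = 66_000
monthly_active_users = 844
hours_saved_per_month = 7_000

# Derived figures.
invocations_per_user = monthly_invocations / monthly_active_users
minutes_saved_per_invocation = hours_saved_per_month * 60 / monthly_invocations

print(f"invocations per user per month: ~{invocations_per_user:.0f}")      # ~78
print(f"minutes saved per invocation: ~{minutes_saved_per_invocation:.1f}")  # ~6.4
```

Since the hours-saved input is self-reported, the per-invocation figure inherits the same uncertainty; the calculation only shows that the table is internally consistent.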

Pinterest had about 4,700 staff before recent layoffs. If roughly half work in engineering or close to it, that is about 2,350 people, and 844 monthly users come to roughly 36 percent of them. A larger engineering headcount would pull that share down toward 25 percent. That kind of pickup within a year of launch suggests the tools deliver real value: engineers adopt them on their own.

On the telemetry side, the system logs inputs and outputs, counts calls per server, traces errors, and tags impact. This lets Pinterest see which servers add the most value and which ones break.
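A minimal version of that telemetry is a wrapper that counts calls and errors per server and records inputs and outputs. This is a sketch, not Pinterest's logging stack: the `instrument` function, the record fields, and the in-memory `Counter` and list sinks are assumptions for illustration.

```python
import time
from collections import Counter

calls_per_server = Counter()
errors_per_server = Counter()


def instrument(server, tool_name, func, log):
    """Wrap a tool so each call is counted and logged (hypothetical helper).

    `log` is any list-like sink that receives one dict per call.
    """
    def wrapper(*args, **kwargs):
        calls_per_server[server] += 1
        started = time.perf_counter()
        record = {"server": server, "tool": tool_name,
                  "input": {"args": args, "kwargs": kwargs}}
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            errors_per_server[server] += 1
            record["error"] = repr(exc)
            raise
        else:
            record["output"] = result
            return result
        finally:
            # Runs on both success and failure, so every call is logged once.
            record["seconds"] = time.perf_counter() - started
            log.append(record)

    return wrapper
```

Comparing `calls_per_server` with `errors_per_server` gives the per-server view described above: which servers carry the most traffic and which ones fail most often.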

MCP tools live across many surfaces: internal LLM chat, IDE plugins, and team chat tools. Engineers hit MCP inside the workflow they already have, not in a new app. That’s a roll-out plan as much as a tech choice. Put the tools where engineers work, and adoption follows without forcing a change in habit.

The Infrastructure Investment Trade-Off

Pinterest built a shared deployment pipeline so teams “define their tools and the platform handles deployment and scaling.” The upfront platform cost is real. But it cuts repeat ops work for every new MCP server teams ship after that.

It’s a classic platform-team bet: spend big on shared infra once, then spread the cost across every team that uses it. The flip side is each team running their own MCP server. That stacks up ops work and leaves you with uneven safety across the org.

The MCP 2026 roadmap hints at spec changes that may cut this infra cost over time. Its priority areas include transport and scalability (servers that scale out more easily without holding state), governance (delegating reviews), and enterprise needs (audit logs, SSO login, gateway behavior). The New Stack called out enterprise readiness as one of four key areas, which matches the problems teams like Pinterest have already hit.

By the November 2025 revision, the MCP spec included Streamable HTTP transport, OAuth 2.1 with PKCE, .well-known URL discovery, and structured tool annotations. Pinterest’s setup predates some of those, so the team wrote custom answers to problems the protocol now addresses itself. Teams starting fresh in 2026 get these features for free.

What This Means for Other Organizations

Pinterest’s setup is useful as a reference design. But the full build takes a level of platform spend most teams don’t have. The lessons you can lift are more specific than “do what Pinterest did”:

  • Smaller, focused MCP servers are easier to secure, review, and scale than one monolith. It’s the microservices argument applied to MCP.
  • A central registry that doubles as a governance choke point stops unmanaged server sprawl. Without it, MCP at scale turns into a shadow-IT problem.
  • The two-layer auth model isn’t Pinterest-specific. Any team running MCP in production needs to split “a human asked for this” from “an agent is doing this on its own.”
  • Standard security reviews don’t cover AI agent tooling. Prompt injection, hallucination risk, and agent misuse need their own review criteria.
  • Self-reported time savings work fine for internal justification, but ship them with clear caveats about how they were measured.

Pinterest isn’t the only team running MCP at scale. Block and Bloomberg also run MCP in production. MCP server downloads grew from about 100,000 in November 2024 to over 8 million by April 2025. But Pinterest’s is the most detailed public write-up. That makes it the closest thing to a reference build that teams have today.