Gemini 3.5 Flash: 76% on Terminal-Bench, 4x Faster Output

Google released Gemini 3.5 Flash on May 19, 2026. The fast, lower-cost tier scored 76.2% on Terminal-Bench 2.1 and, by Google’s own measure, generates output about 4 times faster than other frontier models. Flash is available today across the Gemini app, Search, and the API. Gemini 3.5 Pro is confirmed for next month.

Key Takeaways

Gemini 3.5 Flash launched on May 19, 2026 and is free to use in the Gemini app and Google Search.
It scored 76.2% on Terminal-Bench 2.1, a test of finishing real terminal tasks end to end.
Google says Flash produces output about 4 times faster than rival frontier models.
The model is built for agents that run long, multi-step jobs and call tools.
Gemini 3.5 Pro, the larger sibling, is confirmed for next month.

What is Gemini 3.5 Flash?

Gemini 3.5 Flash is Google’s new fast, lower-cost tier of the Gemini 3.5 family. It was announced and made generally available on May 19, 2026, according to the Google announcement post. The “Flash” name has always meant a model tuned for speed and price.

What sets this release apart is the positioning. Google frames Flash as a model built for action, not only chat. In practice, it targets jobs where a model plans steps, calls tools, and pushes a task forward on its own.

The model is also built to run as many small subagents in parallel. Several copies of the model can work on one task together. Each copy takes a slice of the job and reports back. That fan-out pattern is common in agent tools, and a cheaper tier makes it affordable to run at scale.
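The fan-out pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: `run_subagent` and `fan_out` are hypothetical names, and the subagent is a stub that sleeps briefly where a real model call would go.

```python
import asyncio


async def run_subagent(slice_id: int, work: str) -> str:
    """Hypothetical subagent: a stand-in for one model call on one slice."""
    await asyncio.sleep(0.01)  # placeholder for model latency (assumption)
    return f"slice {slice_id}: handled {work!r}"


async def fan_out(task_slices: list[str]) -> list[str]:
    """Run one subagent per slice concurrently and collect reports in order."""
    jobs = [run_subagent(i, s) for i, s in enumerate(task_slices)]
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    reports = asyncio.run(fan_out(["parse logs", "run tests", "update docs"]))
    for line in reports:
        print(line)
```

Because the slices run concurrently, total time tracks the slowest slice, not the sum of all slices. That is why per-call speed and price matter so much for this pattern.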

Historically the Flash tier traded some peak reasoning for speed and cost. This release narrows that gap. Google describes the model in plain terms:

“3.5 Flash delivers frontier performance for agents and coding, excelling at complex long-horizon tasks that deliver real-world utility.”

Google Gemini team

Gemini 3.5 Flash Benchmarks: Terminal-Bench, GDPval, MCP Atlas

The benchmark numbers are the core of this news. Google published four headline scores for Gemini 3.5 Flash, plus a speed claim. Each test checks a different skill, such as terminal work, tool calls, or chart reading.

The table below lists the scores Google reported in its launch announcement, with a one-line note on what each test covers.

Benchmark	Gemini 3.5 Flash	What it measures
Terminal-Bench 2.1	76.2%	Finishing real command-line tasks end to end
GDPval-AA	1656 Elo	An economically grounded task suite on an Elo scale
MCP Atlas	83.6%	Multi-step tool calls over the Model Context Protocol
CharXiv Reasoning	84.2%	Multimodal reasoning over charts and figures
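For readers unfamiliar with Elo-style scores, the sketch below shows how a rating gap maps to an expected head-to-head win rate under the standard Elo formula. Two assumptions apply: the source does not say how GDPval-AA computes its ratings, so the standard formula is used here for intuition only, and the 1556 opponent rating is a made-up value, not a published score.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


flash_rating = 1656.0         # Google-reported GDPval-AA score for Flash
hypothetical_rival = 1556.0   # illustrative only, not a real model's score

win_rate = elo_expected_score(flash_rating, hypothetical_rival)
print(f"Expected score vs a model rated 100 points lower: {win_rate:.2f}")
```

Under this formula, a 100-point lead corresponds to an expected score of roughly 0.64, and equal ratings give exactly 0.5.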

Figure: Google's published benchmark comparison for the Gemini 3.5 family, showing Gemini 3.5 Flash scores against Claude and GPT models across multiple tests.

Image: Google

The most striking comparison is internal. Google says Flash beats Gemini 3.1 Pro, its own flagship from February 2026, on the coding and agent suite. On Terminal-Bench 2.1, Flash scored 76.2%. Gemini 3.1 Pro scored 70.3% on the same test. So the cheap, fast tier passed the flagship that shipped just three months earlier.

The table below shows that gap on the headline benchmark.

Model	Terminal-Bench 2.1	Tier
Gemini 3.5 Flash	76.2%	Fast, lower cost
Gemini 3.1 Pro	70.3%	Prior flagship
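The gap in the table can be read three ways, and the short calculation below works through each using only the two Google-reported scores. The function name `gap_summary` is illustrative.

```python
def gap_summary(new_score: float, old_score: float) -> tuple[float, float, float]:
    """Return (point gap, relative gain %, cut in unfinished tasks %)."""
    point_gap = new_score - old_score
    relative_gain = point_gap / old_score * 100
    # Share of previously failed tasks that the newer model now completes.
    failure_cut = point_gap / (100 - old_score) * 100
    return round(point_gap, 1), round(relative_gain, 1), round(failure_cut, 1)


flash = 76.2   # Gemini 3.5 Flash, Terminal-Bench 2.1 (Google-reported)
pro_31 = 70.3  # Gemini 3.1 Pro, same test (Google-reported)

points, relative, failures = gap_summary(flash, pro_31)
print(points, relative, failures)
```

The result is a 5.9-point lead, which is about an 8.4% relative gain in pass rate and close to a 20% reduction in unfinished tasks.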

On speed, Google claims Flash produces output tokens about 4 times faster than other frontier models. That would place the model in what Google calls the top-right quadrant: high intelligence and high speed at once.

One caveat belongs in plain sight. Every figure here comes from Google’s own announcement. Independent runs were not available at publication. So treat the numbers as vendor-reported until outside labs confirm them.

Where You Can Use Gemini 3.5 Flash Today

Flash shipped to a wide set of surfaces on day one. The split between free consumer access and paid developer channels is clean, so it is easy to pick the right entry point.

Consumer: the Gemini app and AI Mode in Google Search, rolling out globally.
Developer: the Gemini API through Google AI Studio, plus Android Studio.
Agent building: Google Antigravity, Google’s agentic development platform, and Vertex AI.
Enterprise: Gemini Enterprise and the Gemini Enterprise Agent Platform.

For most people, the easiest path is the Gemini app, where Flash is free. Developers who want API access can start in Google AI Studio.

Figure: Google AI Studio, the developer entry point for the Gemini API, showing a gallery of starter app templates.

Image: Google Developers Blog

Two details are still open. Google did not state per-token pricing or the context-window size for the 3.5 family in the launch post. Rather than guess, treat both as unconfirmed until Google publishes them.

Gemini 3.5 Pro: What We Know and When It Arrives

The launch is only half the story. Gemini 3.5 Pro, the higher-capability tier, is confirmed but not yet shipped. Google’s announcement says Pro is being used internally and will roll out the following month, putting it in June 2026.

Google has not published Pro benchmark numbers, pricing, or a precise date. So the gap between Flash and Pro is still an unknown, and that gap is the figure production teams will care about most.

Shipping Flash first is itself notable. The cheaper, faster tier leads the release instead of trailing it. That order suggests Google sees the fast tier as the model most people and most agents will reach for by default.

Two things are worth watching when Pro lands. First, whether its agent and coding scores meaningfully beat Flash. Second, whether the price gap justifies Pro for production agents, or whether Flash is good enough for most runs.

What Gemini 3.5 Flash Means for Agentic Coding in 2026

The pitch behind Flash is economic. Long agent workflows fan out into many model calls per task. A 4x output-speed claim and a lower-cost tier both compound across a run, although Google has not yet published exact pricing. So a cheap, fast model changes the math for anyone building agents.

The benchmark choices line up with that pitch. Strong scores on MCP Atlas and Terminal-Bench point at the workloads developers actually run today: tool-calling agents and terminal automation. Google did not pick benchmarks at random here.

The release also fits a broader 2026 trend. Frontier-level capability keeps moving into the cheap, fast tier instead of staying locked in the flagship. When the budget model passes last quarter’s flagship, the whole price curve shifts.

There is a practical effect for teams already running agents. If your stack fans out dozens of calls per task, a faster, cheaper model cuts both wall-clock time and bill. That is true even if Flash trails Pro by a few points on a hard benchmark. Speed and price often win at the workflow level.
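A rough wall-clock model shows how a 4x output-speed claim might play out over a run. Every number below is hypothetical (call count, tokens per call, decode speeds, and fixed per-call overhead), and the calls are assumed to run one after another. Nothing here is a measured figure for any real model.

```python
def run_seconds(calls: int, output_tokens: int,
                tokens_per_sec: float, overhead_sec: float) -> float:
    """Sequential estimate: each call pays fixed overhead plus decode time."""
    return calls * (overhead_sec + output_tokens / tokens_per_sec)


# All inputs are illustrative assumptions, not published numbers.
baseline = run_seconds(calls=40, output_tokens=500,
                       tokens_per_sec=50, overhead_sec=1.0)
faster = run_seconds(calls=40, output_tokens=500,
                     tokens_per_sec=200, overhead_sec=1.0)  # 4x decode speed

print(f"baseline: {baseline:.0f}s, faster: {faster:.0f}s, "
      f"speedup: {baseline / faster:.2f}x")
```

With these inputs the run drops from 440 seconds to 140 seconds, a speedup of about 3.1x, not 4x. Fixed overhead such as network latency and tool execution does not shrink with faster decoding, so the end-to-end gain depends on how much of each call is spent generating output.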

Still, the honest read is cautious. The numbers are promising but vendor-reported. The real test is independent benchmarks and hands-on agent runs over the coming weeks, plus the pricing detail Google has not yet shared. Until then, treat the 76.2% and the 4x speed claim as a strong starting point, not a settled result.