Strategy6 minMay 20, 2026

Google Launches Gemini 3.5 Flash at $1.50 per Million Tokens, Targeting Corporate Inference Costs

Escritório executivo no Vale do Silício pela manhã com cadeira Eames, caderno aberto com anotações manuscritas, xícara de café fumegante e monitor lateral mostrando keynote tech fora de foco

Model presented by Sundar Pichai at I/O 2026 surpasses Claude Opus 4.7 and GPT-5.5 in agentic benchmarks, is 50% cheaper in input than the previous generation, and opens beta for the autonomous agent Spark for AI Ultra subscribers.

Google positioned the Gemini 3.5 Flash yesterday (19) as the central vector in its competition for corporate AI workloads, and this time the central argument was price. In a presentation at Google I/O 2026 in Mountain View, Sundar Pichai unveiled the model at $1.50 per million input tokens and $9.00 per million output tokens, with input caching at $0.15. These figures place Flash 3.5 in direct collision with Anthropic's Claude Opus 4.7, which costs $5 for input and $25 for output per million, and OpenAI's GPT-5, priced at $1.25 for input and $10 for output, but with a crucial competitive advantage in benchmarks: Flash 3.5 outperforms both on the MCP Atlas and the majority of agentic evaluation suites, according to figures released by Google.

The internal reading of Google's own portfolio is equally aggressive. The company's default agent orchestration has become 50% cheaper in input and 40% cheaper in output compared to the previous generation of Flash, with a superior benchmark profile. Pichai used this metric to support the thesis of mass migration of workloads: companies processing one trillion tokens per day could save around one billion dollars annually if they transition 80% of their workloads to Flash 3.5, as calculated on stage. The message has a clear recipient. In previous cycles, the buying argument for the model was raw performance. Now, the argument has shifted to unit inference cost at scale, an area where Google operates with an advantage due to proprietary hardware, particularly the co-designed TPU 8T and 8I generations with the models.

The immediate reaction from the technical community acknowledged the move. Simon Willison, a reference in technical model analysis, noted on the same day that Flash 3.5 is more expensive than the previous generation in absolute terms, but that Google plans to use it for everything, making the relevant comparison not against Flash 3.0 but against the Pro tiers of competitors. This reading aligns with Mountain View's strategy: shifting the perception from a cheap small model to a frontier model at the price of Flash.

Spark Enters the Race for Personal Agents

Alongside Flash, Google launched Gemini Spark, described as an active partner that performs work on behalf of the user and under their direction. Unlike assistants that respond to questions, Spark runs on dedicated virtual machines in Google Cloud and maintains execution in the background even when the user disconnects. The integration with external tools uses the MCP protocol, with support for third parties expected in the coming months.

The release begins this week for selected testers. AI Ultra plan subscribers in the US will receive the beta next week, with integration into Chrome set for this summer and into the Android Halo interface by the end of the year. The choice of channel signals priority: Spark is born as a premium product aimed at those already paying for the upper tier of the Gemini ecosystem, only later extending to corporate versions within Gemini Enterprise. For Google, this is the first serious attempt to transform the AI assistant into a persistent execution layer, a territory so far dominated by vertical offerings such as OpenAI's Operator and Claude's Computer Use.

Commoditisation Arrives at the Model Layer

The scenario leaves the market's middle layer in an uncomfortable position. Providers selling intermediate APIs at prices close to the cost of Flash 3.0 lose margin overnight. Consultancies pricing based on model capacity rather than full execution need to reformulate proposals in ongoing RFPs. The reading for CIOs in the second half budgeting cycle is straightforward: the cost per model query is no longer a critical variable for large-scale AI solution architecture, and the discussion is shifting to tool latency, agent observability, and governance of sensitive data. Pichai summarised the phase as one where people want to see the value in the products they use every day. This statement defines the 2026 competition less by model capability and more by its translation into verifiable operational savings.