Thursday, March 5, 2026

Google Gemini's Lightweight Flash model boosts performance at lower costs

Google introduced Gemini 3.1 Flash-Lite, a cost-optimized model designed for high-volume developer workloads, now available in preview via Google AI Studio and Vertex AI. 

Priced at $0.25 per million input tokens and $1.50 per million output tokens, the model delivers a 2.5x faster time to first answer token and 45 percent faster output speed than Gemini 2.5 Flash while maintaining similar or better quality.
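To put those per-token prices in concrete terms, here is a minimal Python sketch that estimates spend for a batch of requests. Only the two rates come from the announcement; the function name and the sample workload (10,000 requests of 2,000 input and 500 output tokens each) are hypothetical, and the estimate ignores any extra tokens the model may bill for reasoning, caching discounts, or other pricing tiers.

```python
# Preview prices quoted in the post, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, each with 2,000 input
    # tokens and 500 output tokens.
    requests = 10_000
    cost = estimate_cost_usd(requests * 2_000, requests * 500)
    print(f"Estimated cost: ${cost:.2f}")  # Estimated cost: $12.50
```

In this example the output tokens account for $7.50 of the $12.50 total even though they are only a fifth of the volume, because output is priced at six times the input rate.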

On industry benchmarks, Flash-Lite scores 1432 on Arena.AI’s leaderboard and outperforms larger Gemini models from prior generations, reaching 86.9 percent on GPQA Diamond and 76.8 percent on MMMU Pro despite its smaller footprint. 

The model ships with adjustable thinking levels, letting developers control reasoning depth and manage costs across tasks that range from high-frequency translation and content moderation to more complex ones like UI generation and multi-step agent execution.

Observers noted that while the new Flash-Lite costs less than Flash or Pro, it costs more than earlier iterations of Flash-Lite. 

Ref: Google
