What's So Great About Gemini 3, the Model Everyone's Praising? | Yunqi Tech π

云启资本 (Yunqi Partners) · November 20, 2025

AI's evolution is far from over

Google has shattered the relative calm in the AI large-model space.

On November 19, the long-anticipated Gemini 3 officially launched to widespread acclaim. Even Sam Altman, CEO of rival OpenAI, posted a congratulatory tweet. Some reviewers have dubbed it a "triple-threat AI god of war," combining the strongest reasoning, multimodal, and vibe coding capabilities to date.

What exactly makes Gemini 3 so powerful? Read on with this edition of "Yunqi Tech π."

This article is republished from "AI Era" (excerpted).

Original title: Google Gemini 3 Launches Overnight, Crushing GPT-5.1 — Altman's Rare Congratulations

Three Key Highlights (Condensed Version)

The birth of Gemini 3 marks another major step forward for Google on the path to AGI!

First, its reasoning ability is exceptionally strong — it can deeply understand problems and deliver more insightful answers.

In particular, it excels at answering complex scientific questions.

Building, deconstructing, and reassembling detailed 3D voxel art with code

Second, it boasts world-leading multimodal understanding, handling text, video, and code with equal ease.

Whether interpreting long videos or transforming papers into interactive guides, Gemini 3 can handle it all.

Third, in vibe coding, Gemini 3 has blown the ceiling off.

With a simple sentence, it can produce a beautiful, dynamic application. And it precisely grasps intent, knowing exactly how to implement it.

Meanwhile, its agentic coding skills have grown even stronger, seamlessly integrating with existing tools and pairing perfectly with the new platform Google Antigravity.

Gemini 3 Pro: PhD-Level Reasoning Crushes Everything

With top-tier reasoning and multimodal capabilities, Gemini 3 Pro can turn any idea into reality!

It outclasses its predecessor, 2.5 Pro, across the board, leading every core benchmark by a wide margin.

Topped the LM Arena leaderboard with a breakthrough 1501 Elo score
Scored 37.5% on Humanity's Last Exam (HLE) without using any tools
Achieved 91.9% on GPQA Diamond, demonstrating PhD-level reasoning
Set a new SOTA of 23.4% on MathArena Apex, a new record in mathematics

Gemini 3 leads the pack across a range of key AI benchmarks
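
To give a rough sense of what an Elo score such as 1501 means, the sketch below converts a rating gap into an expected head-to-head win rate. It assumes the leaderboard follows the conventional Elo scale (base 10, divisor 400); the rival ratings are hypothetical values chosen only for illustration and are not figures from the leaderboard.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the conventional Elo scale.

    A draw counts as half a win, so the value can be read as a win rate.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    gemini_3_pro = 1501  # LM Arena score quoted in the article
    for gap in (0, 25, 50, 100):
        rival = gemini_3_pro - gap  # hypothetical rival rating, illustration only
        win_rate = elo_expected_score(gemini_3_pro, rival)
        print(f"rating gap {gap:>3}: expected win rate {win_rate:.1%}")
```

On this scale equal ratings give an even 50% split, and the expected win rate rises smoothly as the gap widens, so a lead of a few dozen points corresponds to a modest but consistent preference in pairwise comparisons.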

Beyond its stellar text-based performance, Gemini 3 Pro is also the multimodal king: it scored 81% on MMMU-Pro and 87.6% on Video-MMMU, redefining multimodal reasoning.

It also achieved an industry-leading 72.1% on SimpleQA Verified, representing massive progress in factual accuracy.

This means Gemini 3 Pro possesses ultra-high reliability in tackling complex problems across science, mathematics, and numerous other fields.

Every interaction with Gemini 3 Pro carries unprecedented "depth and nuance."

Its responses are smart, concise, and direct — ditching clichés and flattery to offer genuine insight: telling you what you need to hear, not just what you want to hear.

It acts like a true thought partner, offering new ways to understand information and express yourself.

Whether generating high-fidelity visualization code, explaining obscure scientific concepts, or sparking creative brainstorming sessions, Gemini 3 Pro delivers.

Gemini 3 can write visualization code for plasma flow in a tokamak device, and compose a poem capturing the essence of fusion physics

Gemini 3 Pro API pricing is listed on Google AI Studio.

Gemini 3 Deep Think: A New Peak in Intelligence

This time, Gemini 3 Deep Think officially ushers in a new era of "deep thinking," pushing the boundaries of intelligence even further.

Building on Gemini 3's reasoning and multimodal understanding capabilities, it achieves a qualitative leap and is even better equipped to tackle complex problems.

Across multiple benchmarks, Gemini 3 Deep Think outperforms Gemini 3 Pro:

On HLE and GPQA Diamond, it scored 41% (without tools) and 93.8% respectively.

Moreover, it set a new record of 45.1% on ARC-AGI-2 (with code execution, ARC Prize Verified), demonstrating a strong ability to handle novel, previously unseen problems.

Gemini 3 Deep Think excels on some of the most challenging AI benchmarks

One Million Tokens, Full-Modality Explosion

From its inception, Gemini was built to be "cross-multimodal" — spanning text, images, video, audio, and code, freely navigating across all forms of information.

Gemini 3 takes this to the next level, integrating state-of-the-art reasoning, visual and spatial understanding, leading multilingual performance, and a 1 million token context window.

It helps people learn in whatever way suits them best.

Say you want to learn a traditional family recipe. Gemini 3 can decipher handwritten recipes in different languages and translate them into a shareable family cookbook.

Or want to learn a new topic? Just throw academic papers, long video lectures, or tutorials at it, and Gemini 3 can generate code for interactive flashcards, visualizations, or other formats.

It can even analyze pickleball match footage to identify areas for improvement and generate targeted training plans to comprehensively elevate your game.

Not only that, in Search's AI Mode, Gemini 3 now enables new generative UI experiences.

These include immersive visual layouts, plus interactive tools and simulations — all generated on the fly, fully in response to the query.

In AI Mode in Search, learn complex topics like how RNA polymerase works through generative UI

Vibe Coding, Purely by Voice

Building on 2.5 Pro's success, Gemini 3 delivers on its promise to turn any developer's idea into reality.

It excels at zero-shot generation and can handle complex prompts and instructions to render richer, more interactive web UIs.

As mentioned, Gemini 3 is the best "vibe coding" and agentic coding model Google has built to date.

On the WebDev Arena leaderboard, Gemini 3 surged to the top with a 1487 Elo score.

It also scored 54.2% on Terminal-Bench 2.0, which measures a model's tool-use ability to operate a computer via the terminal.

On SWE-bench Verified, which measures coding agents, it hit 76.2%, far surpassing 2.5 Pro.

The following demos show Gemini 3's true power in action.

Write a retro 3D spaceship game with rich visuals and stronger interactivity? No problem.

Build a playable sci-fi world with shaders? So easy.

Create a richer, more interactive web UI and application? Still effortlessly handled!

It really does look as if front-end developers are no longer needed...

Currently, developers worldwide can build with Gemini 3 on Google AI Studio, Vertex AI, Gemini CLI, and the brand-new agent development platform Google Antigravity.

It's also integrated with multiple third-party platforms, including Cursor, GitHub, JetBrains, Manus, Replit, and others.
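
For readers who want to try a prompt like the ones above, here is a minimal sketch of a single text request to the Gemini API over REST, using only the Python standard library. The model ID `gemini-3-pro-preview`, the `v1beta` path, and the `GEMINI_API_KEY` environment variable name are assumptions for illustration and may differ from what your account exposes; check Google AI Studio for the current values.

```python
import json
import os
import urllib.request

# Assumed values: the model ID and API version are illustrative and may differ.
MODEL = "gemini-3-pro-preview"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
)


def build_request(prompt: str, api_key: str, model: str = MODEL) -> urllib.request.Request:
    """Build (but do not send) a generateContent request with one text part."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        ENDPOINT.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def generate(prompt: str, api_key: str) -> str:
    """Send the request and return the text of the first part of the first candidate."""
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=120) as resp:
        payload = json.load(resp)
    return payload["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")  # assumed variable name
    if key:
        print(generate("Write a retro 3D spaceship game in a single HTML file.", key))
    else:
        print("Set GEMINI_API_KEY to send the request.")
```

Keeping request construction separate from sending makes the payload easy to inspect without network access or an API key.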

Long-Horizon Planning, a Stand-In for Human Hands

Since ushering in the agentic era with Gemini 2, Google has kept evolving its models.

It has not only improved Gemini's coding agent capabilities but also enhanced its ability to plan reliably further into the future.

All of this has just been validated on the Vending-Bench 2 leaderboard, where Gemini 3 topped the charts by an overwhelming margin.

This test deeply examines AI's long-horizon planning ability in complex scenarios by simulating the operation of a vending machine business.

Impressively, throughout the full simulated operating year, Gemini 3 Pro maintained consistent tool use and decision-making, achieving higher returns without deviating from the task.

Compared to other frontier models, Gemini 3 Pro demonstrates superior long-horizon planning, yielding significantly higher returns

This means Gemini 3 can better help humans accomplish tasks in daily life.

It combines deeper reasoning with improved, more consistent tool use, acting on people's behalf by handling more complex, multi-step workflows from start to finish.

For example, it can book local services for you or organize your inbox, while you only need to steer and give instructions throughout.

"Google Antigravity": A Revolutionary Agent Development Platform

With the arrival of Gemini 3, Google's agentic capabilities have entered a new phase:

Models can now run for extended periods across multiple platforms without human intervention.

While not yet at the level of "fully unattended and running continuously for days," Google is moving closer to a world where, rather than interacting with agents through single prompts or tool calls, you engage with them at a higher level of abstraction.

To that end, Google has officially launched its agent development platform, Google Antigravity, where developers collaborate with agents on a "task" basis.

Leveraging Gemini 3's advanced reasoning, tool use, and agentic coding capabilities, Google Antigravity upgrades AI assistance from just another tool in the developer toolkit to a fully engaged, proactive collaborator.

Building on the familiar AI IDE experience, Google Antigravity opens a dedicated interface for agents with direct access to the editor, terminal, and browser.

Now, agents can autonomously plan and simultaneously execute complex end-to-end software tasks on your behalf, while verifying the code they generate.

In the following case, on Google Antigravity, an "end-to-end agentic workflow" powered by Gemini 3 drives a flight tracking application.

The agent independently plans, writes the application code, and verifies its execution through browser-based computer operation.

Beyond Gemini 3 Pro, Google Antigravity is also tightly integrated with the Gemini 2.5 computer-use model, and the image editing model Nano Banana (Gemini 2.5 Image).

Netizens Are Going Wild

Right now, Gemini 3 is dominating the conversation online, with a wave of stunning real-world demos released.

Logan Kilpatrick, head of Google AI Studio, ran the bouncing ball test at 10x difficulty.

Result: Gemini 3 Pro nailed it perfectly on the first try! (This was not a best-of-N selection; it was generated from the very first prompt.)

Pietro Schirano, former AI engineer at Anthropic and founder of MagicPath, first had Gemini 3 Pro create a 3D Lego editor.

Unexpectedly, it perfectly implemented the user interface, complex spatial logic, and all functionality in a single generation.

Meanwhile, Gemini 3 Pro's game development performance is equally astonishing.

With just a single text prompt, it recreated the classic iOS game Ridiculous Fishing, complete with sound effects and background music.

Additionally, it accomplished something few large models had managed before — building a fully functional Game Boy emulator.

And yes, it even drew the Game Boy's appearance directly in SVG.

Most notably, Gemini 3 was trained entirely on Google TPUs. That's Google's moat.

Reference:

https://blog.google/products/gemini/gemini-3/