Elon Musk Unveils Grok 3, But Does It Live Up to the "Strongest Model on Earth" Hype? | Yunqi Tech π

Yunqi Capital (云启资本) · February 18, 2025

The AI Race Heats Up

DeepSeek has intensified global competition among large language models.

Recently, OpenAI has been making frequent announcements, teasing progress on GPT-4.5 and GPT-5. Elon Musk's xAI went a step further and simply shipped a new model: it released Grok 3, the latest version of its chatbot, today (February 18), claiming that the model's beta reasoning capabilities surpass those of every other AI model.

How good is this new model really? This edition of Yunqi Tech π breaks it down for you.

This article is republished from Tencent Tech.

Original title: "The Grok 3 Elon Musk Has Been Hype-Training Is Finally Here"

By Jin Lu, special contributor to Tencent Tech

February 18 — Elon Musk's AI startup xAI has released its latest Grok 3 chatbot, entering a new round of competition with large models from OpenAI and DeepSeek. Musk has repeatedly described Grok 3 as "the smartest AI on Earth," drawing considerable industry attention even before its launch.

Musk said Grok 3 will first roll out to X's Premium+ subscribers, who will serve as the initial users. Additionally, xAI has introduced a standalone subscription called SuperGrok for dedicated fans, offering the most advanced features and the earliest access to new capabilities.

xAI stated that Grok 3's pre-training is complete, and the team has been actively integrating reasoning capabilities into the current Grok 3 model. However, this integration remains in early stages and requires further refinement. Grok 3 is currently undergoing continued training.

Beyond the Grok 3 reasoning model, xAI is also training a mini version of this reasoning model. When the two are compared on reasoning tasks, the mini model sometimes performs slightly better than the Grok 3 reasoning model itself.

01

Claimed Performance Surpasses DeepSeek and ChatGPT

At the launch event, Musk explained that xAI named its chatbot Grok because the term originates from American science fiction writer Robert Heinlein's novel Stranger in a Strange Land. In the book, "Grok" is used by a character raised on Mars to mean comprehensive and profound understanding of something.

Musk and his AI team claim that Grok 3's beta reasoning capabilities exceed those of existing AI models. In benchmark tests related to reasoning and test-time compute, Grok 3 achieved better results than DeepSeek-R1, OpenAI o1, OpenAI o3-mini (high), and Gemini 2.0 Flash Thinking.

According to xAI's comparison benchmarks, Grok 3 scored higher than Gemini 2.0 Pro, DeepSeek V3, and GPT-4o in science, coding, and mathematics. Moreover, in blind testing, an early version of Grok 3 topped the LMArena leaderboard with a record-breaking score of 1402, becoming the first AI model to break the 1400-point barrier. That score places it ahead of models from major industry competitors including Google, OpenAI, and DeepSeek.
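To give a rough sense of what an Arena score of 1402 means, the sketch below applies the standard Elo expectation formula, on the assumption that Arena scores can be read on the usual Elo scale. The rival rating of 1380 is a hypothetical number chosen for illustration, not a figure from the leaderboard.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation: chance that A is preferred over B in one comparison."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical example: a 1402-rated model against an assumed 1380-rated rival.
grok3_score = 1402      # score reported in the article
rival_score = 1380      # assumption, for illustration only
print(f"{expected_win_rate(grok3_score, rival_score):.1%}")
```

Under these assumptions, a gap of about 20 points corresponds to winning only slightly more than half of head-to-head votes, so crossing 1400 is a milestone but not a sign of overwhelming dominance.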

02

Building an AI Supercomputing Center in 122 Days, Doubling Its Performance

During the demonstration, Musk and several xAI executives shared how they built Grok. Musk revealed that because xAI wanted to launch Grok 3 as quickly as possible, time was extremely tight, and the team realized it had to build a data center in just four months. xAI stated that it took 122 days to get the first 100,000 GPUs online and running, forming what it says is currently the world's largest fully connected H100 cluster. xAI then accelerated the expansion further, completing phase two in just 92 days and doubling computing capacity (which, by this calculation, brings the total to roughly 200,000 GPUs).

The xAI team also demonstrated how Grok 3 handles interesting tasks, such as planning an Earth-to-Mars spacecraft mission. Grok 3 generated an animated 3D trajectory map of a space launch, showing a viable path from Earth to Mars and back. Producing it requires Grok 3 to understand complex physics.
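As a much simpler illustration of the physics behind such a trajectory, the sketch below estimates the one-way travel time of a Hohmann transfer between Earth and Mars. It is not the code Grok 3 produced in the demo: it assumes circular, coplanar orbits (1.0 AU and 1.524 AU), ignores launch windows and the return leg, and the function name is made up for this example.

```python
import math

AU_KM = 1.495978707e8          # one astronomical unit in kilometres
MU_SUN = 1.32712440018e11      # Sun's gravitational parameter, km^3/s^2


def hohmann_transfer_days(r1_au: float, r2_au: float) -> float:
    """One-way Hohmann transfer time between two circular orbits, in days."""
    # The transfer ellipse touches both orbits, so its semi-major axis
    # is the average of the two orbital radii.
    semi_major_km = (r1_au + r2_au) / 2.0 * AU_KM
    # The transfer takes half of the ellipse's orbital period (Kepler's third law).
    seconds = math.pi * math.sqrt(semi_major_km ** 3 / MU_SUN)
    return seconds / 86400.0


# Assumed mean orbital radii: Earth 1.0 AU, Mars 1.524 AU.
print(round(hohmann_transfer_days(1.0, 1.524)), "days one way")
```

With these simplifications the trip comes out at roughly eight and a half months each way. A real mission plan, like the one shown in the demo, also has to account for the planets' changing positions.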

Grok 3 also showcased potential in automated game development. The xAI team asked Grok 3 to create a new game on the spot that combined Tetris and Bejeweled. The Python script Grok 3 generated defined game constants, colors, block shapes, and other elements, and presented its own gameplay twist: when at least three blocks of the same color connect, they are eliminated and a gravity mechanism is triggered, much as in Bejeweled.
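The match-and-gravity rule described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the script Grok 3 generated: the grid representation, the function names, and the choice to count only horizontal and vertical neighbours as "connected" are all assumptions.

```python
from collections import deque

EMPTY = None  # assumed marker for an empty cell


def find_groups(grid, min_size=3):
    """Return groups of orthogonally connected same-color cells of at least min_size."""
    rows, cols = len(grid), len(grid[0])
    seen, groups = set(), []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is EMPTY or (r, c) in seen:
                continue
            color, group, queue = grid[r][c], [], deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill over one color
                cr, cc = queue.popleft()
                group.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen and grid[nr][nc] == color):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            if len(group) >= min_size:
                groups.append(group)
    return groups


def apply_gravity(grid):
    """Let the remaining blocks in every column fall to the bottom (the last row)."""
    rows, cols = len(grid), len(grid[0])
    for c in range(cols):
        blocks = [grid[r][c] for r in range(rows) if grid[r][c] is not EMPTY]
        column = [EMPTY] * (rows - len(blocks)) + blocks
        for r in range(rows):
            grid[r][c] = column[r]


def resolve(grid):
    """Clear matches and apply gravity until the board is stable; return cells cleared."""
    cleared = 0
    while True:
        groups = find_groups(grid)
        if not groups:
            return cleared
        for group in groups:
            for r, c in group:
                grid[r][c] = EMPTY
                cleared += 1
        apply_gravity(grid)
```

Calling `resolve` on a small board shows the chain reaction this rule allows: clearing one group lets other blocks fall, which can bring a new group of three together.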

Furthermore, Grok 3 includes a feature called Big Brain, a reasoning model mode that allows for deeper thinking when processing queries. Musk noted that just 17 months ago, the original Grok model could barely solve high school problems, but it has improved tremendously since. He humorously remarked that "Grok is ready for college."

03

Developing Deep Search Capabilities, Considering Open-Sourcing Grok 2

In addition to launching Grok 3, xAI revealed that the company is establishing an AI game studio focused on serving consumers. Moreover, xAI is developing a deep search capability (DeepSearch) for Grok. This will become one of its AI agent's core capabilities. DeepSearch is a reasoning chatbot that can articulate its process for understanding queries and planning responses. The demonstration showed that DeepSearch has research, brainstorming, and data analysis functions. Musk's team also said they intend to launch a voice-based chatbot "as soon as possible."

Regarding whether Grok 3 will be open-sourced, Musk said, "We typically open-source the previous generation model when a new model is released, so in a few months, we will also open-source Grok 2."

As Grok 3 continues to set new records, the AI race is heating up. By strengthening reasoning capabilities, building massive computing clusters, and making experimental forays into applications like gaming, xAI is rapidly joining DeepSeek and OpenAI in the top tier of large-model developers. Grok 3's continued iteration shows that the AI competition is nowhere near its finish line. Instead, it is entering an even more intense phase.