Looking Back at AI's "May": The Five Most Important Little Things | Attent!on Tech Notes

Yunqi Capital · May 31, 2024

The future is already here — it's just not evenly distributed.

In a world of noise and chaos, what is real?

Nonstop LLM launches, an endless stream of AI assistant products, price cuts too numerous to count: "frenzied" and "cutthroat" barely do justice to the AI industry in May. One media outlet put it bluntly: this was the "busiest month in AI since ChatGPT's release."

Deeply rooted in the AI space, Yunqi Capital has been tracking industry developments closely. On the final day of May, we are using this edition of "Attent!on Tech Notes" to sort through the month's most notable developments and share some of our observations and reflections. We will continue this series to help readers find direction amid the information deluge.

1. OpenAI, Google, Microsoft

Updating the "AI Full Suite"

"More striking than performance gains: multimodality"

In mid-to-late May, the closely watched trio of OpenAI, Google, and Microsoft held developer conferences back-to-back, rolling out "AI full suites" spanning models to applications to hardware — nothing was off-limits in this race. What new heights did they take AI to? A quick recap:

On May 14, OpenAI struck first with GPT-4o, an end-to-end natively multimodal flagship model. In a launch event of just 26 minutes, it demonstrated low-latency AI interaction across voice, images, video, and other forms of human communication, making major waves across the industry.

Google responded the next day at I/O 2024. On the model layer, it presented an upgraded Gemini 1.5 Pro, the video generation model Veo, and the image generation model Imagen 3. On the application layer, it launched AI search, AI-powered Gmail, and its heavily promoted multimodal assistant "Project Astra." As with GPT-4o, multimodal interaction was a standout highlight of Google's updates.

A week later, Microsoft unveiled over 50 major updates at Build 2024. The dominant theme was Copilot: building on its intelligent assistant functionality, the company launched Team Copilot and customizable Copilots, and on the eve of the conference it introduced "Copilot+PC," aiming to reshape productivity in both software and hardware.

Yunqi Observation

End-to-end native multimodality has become the main "arena" for large models. Whether it's the headline-grabbing GPT-4o or Google's Project Astra, the most impressive capabilities were real-time, emotionally expressive voice and video interactions. These abilities rest on algorithmic unification at the model layer: steps such as ASR (automatic speech recognition) and TTS (text-to-speech), previously handled by separate components, are now merged into a single model.
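To make the contrast concrete, here is a minimal toy sketch of the two designs. Every name in it (`Audio`, `asr`, `text_llm`, `tts`, and the two assistant functions) is a hypothetical stub written for illustration, not any vendor's API. It assumes a simplified picture in which an utterance is just words plus a tone label, and it shows only one point: in a cascaded pipeline the tone is dropped at the ASR step, so the later stages cannot respond to it.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    """Toy stand-in for a speech clip: what was said and how it was said."""
    words: str
    tone: str


def asr(audio: Audio) -> str:
    """Hypothetical speech-to-text stub: keeps the words, drops the tone."""
    return audio.words


def text_llm(prompt: str) -> str:
    """Hypothetical text-only model stub: it never sees the speaker's tone."""
    return f"reply to '{prompt}'"


def tts(text: str) -> Audio:
    """Hypothetical text-to-speech stub: no tone to mirror, so it uses a default."""
    return Audio(words=text, tone="neutral")


def cascaded_assistant(audio: Audio) -> Audio:
    """Three separate steps chained together: ASR, then text model, then TTS."""
    return tts(text_llm(asr(audio)))


def end_to_end_assistant(audio: Audio) -> Audio:
    """Single-model stub: the tone stays available from input to output."""
    return Audio(words=f"reply to '{audio.words}'", tone=audio.tone)


question = Audio(words="we won", tone="excited")
print(cascaded_assistant(question).tone)    # neutral
print(end_to_end_assistant(question).tone)  # excited
```

Both assistants produce the same words here; only the end-to-end stub can carry the speaker's tone through to its reply, which is the property behind the emotionally expressive demos described above.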

The evolution of end-to-end native multimodal models will further transform human-computer interaction and open up greater imaginative space for application-layer innovation.

Yet the performance improvements themselves were not particularly striking. The growth curve of large model capabilities appears to be flattening. Yunqi Capital partner Chen Yu told Late Post that this trend will likely continue, with one key reason being that high-quality data for training large language models is nearly exhausted.

2. LLM Startups Enter the Fray

AI Assistant Products

"Each has its selling point, but differentiation remains limited"

The domestic AI arena was equally raucous. While iterating model capabilities, LLM startups also frequently launched assistant applications built on their self-developed models.

MiniMax — "Hailuo" AI launched, voice interaction is the highlight

On May 15, Yunqi angel-round portfolio company MiniMax officially released its productivity product "Hailuo AI." Available on web, iOS, and Android, it is MiniMax's second consumer-facing product, following its entertainment app "STARFIELD." Hailuo AI integrates MiniMax's self-developed multimodal large models, including the trillion-parameter MoE language model abab 6.5 released in April, as well as voice and image models, so it supports multimodal interaction across text, images, and voice. Of particular note is the voice interaction: users can make voice calls to ask questions, with support for voice settings and voice cloning.

01.AI — First closed-source model released, integrated into consumer-facing productivity product

On May 13, 01.AI, where Kai-Fu Lee serves as CEO, released Yi-Large, a hundred-billion-parameter model — the company's first closed-source model in nearly a year since its founding. Yi-Large was also integrated into 01.AI's recently launched AI productivity tool "Wanzhi." Positioned as a "one-stop AI work platform for Q&A, reading, and creation," this is essentially a consumer-facing efficiency tool centered on long-document reading and creation.

Baichuan — Base model updated, AI assistant product launched

On May 22, Baichuan released its latest-generation base model Baichuan 4, alongside its first AI assistant application "Baixiaoying." Beyond standard features like knowledge Q&A, text reading and organization, and assisted creation, multi-turn search and targeted search capabilities are the differentiators Baixiaoying aims to emphasize.

Yunqi Observation

A year into the race, self-developed applications have become an additional focus for LLM startups alongside model capabilities, with the dual-driven approach of technology plus applications becoming increasingly common.

Yet surveying the AI assistant landscape, no qualitative differences in functionality or user experience have emerged — and meaningful differentiation is key to reinforcing user mindshare and building brand identity.

3. Domestic Tech Giants Continue Refining Models

"More models, more application scenarios"

This month, Alibaba, ByteDance, Tencent, and other major tech companies successively released upgraded models, further embedding model capabilities into their respective application ecosystems.

Alibaba — Closed-source Qwen iteration, benchmarked against GPT-4 Turbo

On May 9, Alibaba Cloud released closed-source base model Qwen 2.5, iterating on the 2.1 version from last December. Understanding capability, logical reasoning, instruction following, and coding ability improved by 9%, 16%, 19%, and 10% respectively over the previous generation. In OpenCompass's subjective comprehensive evaluation, the model tied with GPT-4 Turbo for first place; in the objective evaluation, it trailed GPT-4 Turbo, Claude 3 Opus, and GLM-4 but led ERNIE Bot 4.0. Notably, to suit different use cases, Qwen was released in multiple sizes, from 500 million to 110 billion parameters. Alibaba Cloud also open-sourced the 110-billion-parameter model Qwen1.5-110B, currently the largest open-source model in China.

ByteDance — Doubao model family debuts, nine models released at once

On May 15, ByteDance's self-developed model, rebranded from "Skylark" to "Doubao," officially launched on Volcano Engine to provide external services. The Doubao family comprises nine models: two general-purpose models plus seven vertical-scenario models for role-play, speech recognition, speech synthesis, voice cloning, text-to-image, function calling, and vectorization. ByteDance did not disclose parameter counts or benchmark results.

Tencent — Hunyuan model upgraded, text-to-image model open-sourced

On May 17, Tencent announced an upgrade to its Hunyuan large model, with overall performance improving 50% over the previous generation and some Chinese capabilities matching GPT-4. Tencent also advanced its open-source efforts: the upgraded Hunyuan text-to-image model was fully open-sourced, with the Hunyuan MoE model soon to follow. For developers, Tencent launched three PaaS products: a large model knowledge engine, image creation engine, and video creation engine. Leveraging its vast application ecosystem, Tencent noted that Hunyuan has been deployed across 600+ internal businesses and scenarios including Tencent Meeting and Tencent Books.

Yunqi Observation

Major tech companies are still chasing GPT-4, but parameter count is no longer their absolute target. Each is tailoring the "hammer" to the "nail," and each company's application ecosystem is becoming a proving ground for its model capabilities.

4. The LLM API Price War

"Largely a big-company game"

Amid the flood of new models and applications, the price war running through it all is impossible to ignore.

On May 6, DeepSeek, the LLM startup under High-Flyer Quant, kicked off the price-cutting wave by pricing its open-source model DeepSeek-V2 at 1 RMB per million input tokens, roughly 1/100th of GPT-4's API price. Over the following two weeks, Zhipu AI, ByteDance, iFlytek, Alibaba, Baidu, and Tencent successively announced LLM API price cuts. Baidu, Tencent, and iFlytek made certain lightweight models free under conditions each company defined.
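As a rough sense of scale, the arithmetic behind that ratio can be sketched as follows. The DeepSeek-V2 figure is the one quoted above; the GPT-4 figure is not an official price but an assumption back-derived from the "roughly 1/100th" ratio, and the monthly token volume is a hypothetical workload chosen only for illustration.

```python
# Input price in RMB per million tokens.
RMB_PER_MILLION_INPUT_TOKENS = {
    "DeepSeek-V2": 1.0,        # figure quoted in the article
    "GPT-4 (implied)": 100.0,  # assumption: implied by the "roughly 1/100th" ratio
}


def input_cost_rmb(model: str, input_tokens: int) -> float:
    """Cost in RMB of sending `input_tokens` input tokens to `model`."""
    return RMB_PER_MILLION_INPUT_TOKENS[model] * input_tokens / 1_000_000


monthly_input_tokens = 500_000_000  # hypothetical workload
for model in RMB_PER_MILLION_INPUT_TOKENS:
    cost = input_cost_rmb(model, monthly_input_tokens)
    print(f"{model}: {cost:,.0f} RMB")
# DeepSeek-V2: 500 RMB
# GPT-4 (implied): 50,000 RMB
```

Output tokens are typically billed separately and are left out of this sketch.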

According to statistics from AI industry media outlet "Zhidx," as of May 24, looking only at top-tier models, the steepest discounts came from ByteDance and DeepSeek, with prices dropping to single-digit RMB per million tokens. iFlytek, Tencent Cloud, and Alibaba Cloud followed, with top-tier model input or output pricing falling below 100 RMB per million tokens.

Yunqi Observation

This price-cutting wave objectively reflects the declining cost of LLM inference. And with no clear performance gap between major companies' flagship models, price cuts are seen as an inevitable strategy to attract developers and acquire more high-quality data — keeping the model optimization "flywheel" spinning.

What's evident is that this round of price cuts is dominated by "major companies," with few startups participating. Yunqi Capital partner Chen Yu told Late Post that price wars ultimately favor big companies, "unless startups find a different path with distinct product and commercialization strategies."

5. Embodied Intelligence Continues Evolving

"Humanoid robot R&D and financing both active"

Embodied intelligence is considered one of the ideal scenarios for grounding large language models in the physical world. Amid the bustling AI "May," embodied intelligence continued to advance.

In early-to-mid May, multiple industry exhibitions took place, including ICRA (International Conference on Robotics and Automation), a top academic gathering in robotics and automation. Three Yunqi portfolio companies, Keenon Robotics, RealMan Intelligent, and Kujiale, showcased new robotic products and technologies at these events.

Humanoid robot developments also warrant attention. On the product front, on May 13, Unitree launched a robot world model and its new humanoid robot Unitree G1, with the entry-level G1 standard edition priced at 99,000 RMB — the industry's lowest. On the financing front, according to Yunqi's incomplete statistics, humanoid-track startups including Booster Robotics, LimX Dynamics, and Elephant Robotics raised new funding rounds.

Earlier, in late April, Yunqi angel-round portfolio company "Astribot" unveiled its first self-developed AI robot, the Astribot S1. Through imitation learning, the S1 can perform multiple complex tasks useful to humans with agility, flexibility, and smoothness rivaling those of adult humans, establishing a new standard for AI robots. The product is being tested with large-model integration and is expected to be commercialized within 2024.

Yunqi Observation

Progress toward AGI has brought both product development and investment activity in embodied intelligence to an active phase. Yunqi is bullish on robots entering the embodied intelligence value creation phase, believing their generality and generalization capabilities will bring enormous gains to robotic deployment. We look forward to seeing embodied intelligence companies achieve more breakthrough innovations combining software and hardware, building on existing hardware supply chain advantages.

As May ends and summer approaches, Yunqi will continue tracking new sparks and variables in the AGI wave. Technology is ever-renewing, and we hope to journey alongside more innovative forces.

*Factual information in this article was compiled from publicly available media reports.