Yunqi Research Select | Mid-Year AI Industry Review: Ten Hopes and Heartbreaks in the Year of Implementation

Yunqi Capital · July 30, 2024

How Does the Balance Between Cost and Effectiveness Tip?

Performance surpassing GPT-3.5 Turbo, at a price more than 60% lower. On July 19, OpenAI made headlines again with GPT-4o mini, a lightweight model built around this value proposition, adding another chip to the table for AI application deployment. Just four days later, Meta unveiled Llama 3.1, a "juggernaut"-class open-source model that beat the closed-source flagship GPT-4o on multiple benchmarks.

This seems to encapsulate the trajectory of AI in 2024: the gap between open and closed-source models is narrowing, model costs are falling faster than expected, and the barriers to application development keep dropping. As AI continues to drive technological transformation in 2024, Yunqi Capital remains committed to embracing these shifts — seizing opportunities with agility while maintaining a cool-headed, discerning view of a rapidly evolving industry.

A day in AI, a year on Earth. In this edition of Yunqi Research Select, we share the trends and directions we've observed over the past half-year amid the relentless churn of technological waves and industry dynamics.

I. Review of Core Judgments from Early 2024

In 2023, when GPT ignited the generative AI explosion, foundation models were unquestionably the focal point of capital deployment. Crunchbase data shows that global AI startups raised nearly $50 billion in 2023, with $18 billion flowing to just three model companies: OpenAI, Anthropic, and Inflection AI. In China, six major foundation model startups, MiniMax among them, reached unicorn status with valuations above $1 billion after multiple funding rounds.

The application layer also saw a flood of new entrants, with product launches moving at breakneck speed. In a16z's September 2023 GenAI Top 50 by web traffic, 80% of listed companies were new. In the March 2024 update of the list, 40% of the listed companies were still first-time appearances.

We believe that foundation model capability is the core substrate determining AI application performance. The prevailing evolutionary thread in the AI industry at this stage is to compensate for foundation model deficiencies through sustained cost investment, while refining application performance to better serve diverse scenarios and customer needs. Surviving the many layers of uncertainty in an industry's early days requires finding a dynamic equilibrium between cost and performance. Thus, cost reduction and performance improvement are the two core themes we track.

At the start of 2024, we made the following judgments about the growth trajectory of the AI industry.

Foundation Model Capabilities Continue Advancing; Discrete Technical Breakthroughs Drive Cost Reduction and Performance Gains

The foundation capability race remains white-hot, with startup unicorns and tech giants at home and abroad all pushing forward, honing model capabilities and driving down costs along the Scaling Law path. Potential technical breakthroughs in parallel training, inference optimization, and MoE will further amplify cost-reduction effects; advances in RAG, Planning, Memory, and next-generation architectures will help elevate AI application performance.

Meanwhile, capabilities in video generation models, image generation models, and multimodal models — all strongly linked to application-layer feature enhancement and product innovation — will also continue advancing.

Venture Capital Gravity Shifts Toward Application Layer; Model Efficiency Gains Accelerate Application Deployment

With sustained heavy investment, the foundation model landscape at home and abroad has largely taken shape. Compared to application development demand, large model supply is relatively abundant. Based on this, we judge that incremental venture opportunities at the model layer are limited, and market gravity is shifting toward the application layer.

Capability improvements in video generation models, image generation models, and multimodal models are catalyzing more innovation at the application layer and accelerating deployment. Consumer-facing (2C) applications in entertainment scenarios such as gaming and social are gaining new features and room for innovation. This is especially true where underlying technical breakthroughs deliver step-change improvements in quality and reductions in cost, which may in turn stimulate market demand for such applications.

Technical breakthroughs such as on-device model deployment will expand the capability boundaries of consumer electronics. AI functionality promises a differentiation window for traditional categories in a growth-stagnant consumer electronics sector, while also creating fertile ground for the birth and breakout of AI-native new categories better suited to emerging needs.

II. Key Mid-Year Trends

Model Layer: Stronger and Cheaper

1. Foundation Model Cost Reduction Ahead of Expectations

"Price cuts" were the unavoidable keyword in the 2024 large model market. We predicted in early 2024 that, according to AI's cost equivalent of Moore's Law, large model inference costs could fall to 1% of current levels within 3-5 years, and that this cost reduction would herald an inflection point for AI application explosion.
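As a rough check on what that forecast implies, the arithmetic can be sketched in a few lines of Python. The only inputs are the figures from the forecast itself (a fall to 1% of current cost over 3 to 5 years). The constant year-over-year rate of decline is a simplifying assumption, and the function name `implied_annual_decline` is purely illustrative.

```python
def implied_annual_decline(final_fraction: float, years: float) -> float:
    """Constant yearly decline rate that takes cost from 1.0 down to final_fraction.

    Assumes the same percentage drop every year (a simplification).
    """
    if not 0 < final_fraction <= 1:
        raise ValueError("final_fraction must be in (0, 1]")
    if years <= 0:
        raise ValueError("years must be positive")
    # Fraction of the cost that remains after each year.
    yearly_factor = final_fraction ** (1 / years)
    return 1 - yearly_factor


# Forecast from the text: inference cost falls to 1% within 3 to 5 years.
for years in (3, 5):
    rate = implied_annual_decline(0.01, years)
    print(f"{years} years to reach 1%: costs fall about {rate:.0%} per year")
```

Under that assumption, reaching 1% in five years implies a drop of roughly 60% every year, and reaching it in three years implies roughly 78% every year.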

But driven by falling model inference costs and market "price wars," model prices have dropped faster than we expected. On May 6, DeepSeek, the foundation model startup under High-Flyer Quant, kicked off a price-cut wave with its open-source model DeepSeek-V2, priced at 1 RMB per million input tokens, roughly 1/100th of GPT-4's API pricing. Within half a month, Zhipu AI, ByteDance, iFlytek, Alibaba, Baidu, and Tencent all followed suit. ByteDance's price cuts reached 97%, while Baidu, Tencent, and iFlytek made some lightweight models free under certain conditions.

The overseas parallel came with OpenAI's July 19 release of GPT-4o mini, whose performance surpasses GPT-3.5 Turbo at a price more than 60% lower.

This wave of price cuts, combined with the narrowing gap between open and closed-source models, challenges the business model of closed-source foundation model startups. Building differentiation through proprietary applications is the key to breaking through.

2. Video Generation Models Enter a New Generation

In 2024, the first blockbuster in the AI model layer still came from OpenAI, the trailblazer of this AI wave. Sora's debut lifted large models' language comprehension, multi-shot generation consistency, image clarity, complex scene creation, and physical world simulation to a "new generation." It validated the path of "GPT-4 → simple analysis of multimodal content → generation of multimodal content" for text-to-video products, and again confirmed the cost-efficiency strength of GPT-4 as the most powerful foundation model of its time.

By July, Sora still showed no commercial progress. But video generation has clearly been a key focus of model-layer efforts in recent months. According to incomplete statistics from industry media outlet Zhidx, since Sora's release, at least eight domestic and international players including Stability AI, Shengshu Technology, ByteDance, Kuaishou, Luma AI, and Runway have launched new products or models. Among these, Kuaishou's Kling AI was the first high-quality publicly available video generation service. Its single-generation video length reaches 120 seconds, with an additional continuation feature that enables users to generate videos of approximately 3 minutes.

Beyond generation length, new-generation products have seen substantial improvements in resolution, frame rate, and motion consistency.

3. Open-Closed Source Gap Continues Narrowing

The open-versus-closed-source debate extended into 2024, with the gap between closed and open-source models continuing to shrink and the trend toward AI democratization strengthening.

In February, Google released its open-source Gemma model series. In April, Meta released the open-source Llama 3 series, whose "high-end" 70-billion-parameter version was benchmarked against GPT-3.5. In July, Meta and Mistral launched Llama 3.1 and Mistral Large 2 on consecutive days; Llama 3.1 led GPT-4 across general capability, coding, mathematics, reasoning, tool use, long-context, and multilingual benchmarks, and traded blows with GPT-4o in some capabilities. Domestically, in May, Alibaba Cloud's open-source 110-billion-parameter Qwen1.5-110B ranked first in performance among domestic open-source models.

4. End-to-End Multimodality Becomes Core Competitiveness

Large models' understanding and generation capabilities across modalities (text, speech, vision) are key to optimizing AI application performance and elevating user experience. In May, the busiest month for product launches by mainstream players, OpenAI introduced GPT-4o with its end-to-end multimodal focus, setting the direction for model-layer iteration with real-time emotional voice interaction and real-time video interaction.

These capabilities rest on GPT-4o's algorithmic unification at the model layer, which integrates previously separate multimodal large models into one. For example, in voice interaction, previously discrete steps like ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) were merged into a single model, making multimodal content understanding and generation faster and more efficient. As end-to-end multimodal capabilities improve, application-layer innovation gains more room for imagination.
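The contrast between a cascaded voice pipeline and an end-to-end model can be illustrated with a toy Python sketch. This is not OpenAI's implementation and calls no real model: the `Speech` type and every function below are hypothetical stubs, and the text-only model sitting between ASR and TTS is our assumption about a conventional cascaded design. The sketch shows only one point: when stages are joined by plain text, anything not captured in the transcript (here, tone of voice) never reaches the later stages.

```python
from dataclasses import dataclass


@dataclass
class Speech:
    words: str  # what was said
    tone: str   # how it was said, e.g. "upset" or "neutral"


# Cascaded pipeline: three separate stages joined by plain text.
def asr(audio: Speech) -> str:
    """Stub speech recognizer: keeps the words, drops the tone."""
    return audio.words


def text_llm(prompt: str) -> str:
    """Stub text-only model: can react only to the transcript."""
    return f"You said: {prompt}"


def tts(text: str) -> Speech:
    """Stub synthesizer: always speaks in a neutral tone."""
    return Speech(words=text, tone="neutral")


def cascaded_reply(audio: Speech) -> Speech:
    return tts(text_llm(asr(audio)))


# End-to-end: a single (stub) model maps audio in to audio out.
def end_to_end_reply(audio: Speech) -> Speech:
    """Stub unified model: sees words and tone together."""
    reply_tone = "soothing" if audio.tone == "upset" else "neutral"
    return Speech(words=f"You said: {audio.words}", tone=reply_tone)


user = Speech(words="my order is late", tone="upset")
print("cascaded tone:  ", cascaded_reply(user).tone)
print("end-to-end tone:", end_to_end_reply(user).tone)
```

In this toy setup both paths return the same words, but only the end-to-end stub can adapt its tone, because the cascaded path discarded that signal at the ASR step.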

5. Pace of Top-Tier Model Iteration Slows

Despite the model layer's continued push toward greater power in 2024, more than a year after GPT-4's release, the pace of advancement at the top has visibly decelerated. Take GPT-4o: while regarded as the most significant closed-source large model update of 2024, its performance gains beyond audio, video, and image capabilities were limited. On traditional benchmarks for text, reasoning, and code intelligence, it was roughly on par with GPT-4 Turbo from November 2023.

From an application effects standpoint, however, subtle differences in model capabilities still produce vastly different user experiences. GPT-4, for instance, achieves only 60-70% accuracy on many test metrics, which is why most large model products can only exist in relatively fault-tolerant forms like conversational interfaces. Given this, refining foundation capabilities and achieving breakthroughs at the model layer remain pressing priorities.

Application Layer: Awaiting the Breakout Moment

1. Model Capability Advances Catalyze Application-Layer Supply

Since 2022, significantly lower engineering barriers have made AI application features easier and faster to develop. Meanwhile, capability improvements in video generation, image generation, and multimodal models have drawn more players into the application layer, with industry dynamics reshuffling rapidly. In a16z's March 2024 updated GenAI Application (Web) Top 50, 40% of listed applications were from new companies.

We continue tracking application opportunities across Consumer (representative sectors: gaming, virtual companionship, content communities, consumer electronics, search), Pro User (representative sectors: video editing, knowledge management, music editing, learning tools, browser sidebars), and Enterprise (representative sectors: various B2B software + AI).

2. Yet Application-Layer Commercialization Progress Remains Slow

In overseas markets, B2B scenarios require deep integration with business workflows because of model hallucination and controllability issues. Combined with enterprise budget contractions in the current economic cycle, this means B2B AI applications struggle to generate revenue from AI alone, and overall deployment has fallen short of expectations. 2C/Prosumer tool applications have proven relatively easier to land. In particular, some Chinese companies going global have found product-market fit by using AI-driven feature gains in casual scenarios like gaming and emotional companionship. But overall retention rates lag behind the previous wave of internet products.

Domestically, on the enterprise side, "traditional applications + AI" have seen faster commercialization progress than AI-native applications. B2B applications in customer service, marketing, and home design are delivering superior service to clients by layering on AI capabilities. On the consumer side, no breakout hit has yet emerged. Chatbots, text tools, image generation, and code development tools show generally low differentiation, while AI software applications' overall penetration on core terminal devices remains below 1%.

3. Bottleneck: Harder PMF, Model Capabilities Still Need Improvement

This relatively slow commercialization progress also reflects the difficulty of achieving product-market fit (PMF) in AI applications. Compared to mobile internet, the boundary between technology and product is blurrier in the AI era. Model capabilities largely determine product functionality, so every step forward in model capabilities opens new possibilities for iterating product features. Product managers must therefore grasp technical boundaries clearly while pinpointing user demand precisely.

Returning to the relationship between models and applications, current model capabilities remain insufficient to deliver a win-win experience of both cost and performance at the application layer. Foundation model deficiencies still require compensation.

4. Mature Hardware Categories Race to Add AI; AI-Native Hardware Struggles to Gain Traction

ChatGPT's rapid breakout spawned a wave of AI hardware enthusiasm, giving rise to AI-native devices like AI Pin, Tab AI, and Rabbit R1. These products innovated on form factor and marketed themselves as "phone replacements," garnering significant attention in the short term, but most failed to remain in the game. The core reason: AI brought no substantive improvement to the hardware's functionality itself, and so was not enough to catalyze new user needs.

On the "hardware + AI" front, industry leaders have embedded AI in, or announced plans to embed it in, general-purpose hardware categories like PCs and phones. Microsoft released a series of local large models in May and optimized Microsoft Copilot functionality; Huawei, Samsung, and Vivo have all launched phone products with cloud or on-device large models; and Apple's AI phone ambitions became visible with June's unveiling of Apple Intelligence, with cross-app content interaction and open APIs among the highlights worth anticipating. Beyond this, in mature categories like voice recorders, headphones, and glasses, some established players have used AI features to carve out differentiated paths and break through growth plateaus.

We are bullish on the differentiation window that generalized AI capabilities bring to mature categories, and also look forward to new opportunities for AI-native hardware in niche markets as AI capabilities advance.

5. Embodied Intelligence R&D and Financing Remain Active

In 2024, embodied intelligence, as the optimal physical-world vehicle for AI capability deployment, remains a hot sector, with various players advancing toward general-purpose robots that best match humanity's ultimate imagination of artificial intelligence.

Tesla, the integrated hardware-software player at the industry's forefront, released Optimus Gen 2, which improves on the first generation's walking speed and is moving quickly toward the AI robot of popular imagination. Domestic AI robotics startups have also been active in product development and financing. For example, Yunqi Capital angel-round portfolio company Astribot (星尘智能) released its first self-developed AI robot, Astribot S1 Demo, in April, billed as having the "strongest manipulation performance" among robots of the same specification.

We believe that large models are the key to whether embodied intelligence's generalization advantages can truly be unleashed, determining whether robots' perception, decision-making, and action capabilities can achieve step-function breakthroughs. Currently, embodied intelligence foundation model approaches have not yet converged, and targeted breakthroughs at the model layer are worth anticipating.

III. Conclusion

As we enter the second half of 2024, application deployment has become the AI industry's central theme. While continuing to follow the model layer and infrastructure layer, we are particularly focused on AI applications in the following areas:

AI + Enterprise Software. Enterprise software can achieve greater feature innovation under AI enhancement, improving average revenue per customer while reducing costs and expenses. AI agents can also complete complex or repetitive tasks, providing enterprises with digital labor.
AI + Professional Service. AI will significantly alter traditional service companies' unit economics, resolving efficiency and cost bottlenecks while expanding service boundaries.
Productivity tools. Large models have disrupted traditional search engines and information retrieval methods, with generative AI bringing notable efficiency gains. Meanwhile, AI-enabled task automation promises to further liberate human labor from repetitive work, unlocking more creative value.
AI for science. We are optimistic about large model applications in scientific discovery, novel drug research, new materials discovery, and automated mathematical theorem proving, and expect AI to deliver greater value in improving research efficiency and expanding scientific frontiers.
Embodied intelligence. The technical advantage of generalization will carry robotics from the era of automation into the era of intelligence. Current embodied intelligence technical routes have not yet converged, and product forms remain at an embryonic stage. We welcome and support innovative enterprises' explorations across different paths.
Consumer electronics. Product functionality advantages brought by AI capabilities, combined with China's manufacturing supply chain dividends, create substantial value space for AI + consumer electronics.
Pan-entertainment. We see opportunity in AI's incremental value in enhancing gaming playability and creating emotional value, and in the vast innovation space that end-to-end multimodality brings to entertainment applications.

Portions of this content were AI-generated based on Yunqi Capital internal research.