September AI Watch: New Models, New Apps, and Where AGI Goes From Here | Yunqi Capital Attent!on Tech Notes

Yunqi Capital · September 29, 2024

The Revolution Continues

With 2024 nearly three-quarters gone, just as industry skepticism about the pace of AI development was creeping in, the leaders of the wave delivered new answers.

Looking back at September, tech giants like OpenAI, Apple, Meta, and ByteDance made notable splashes on both the model and hardware fronts, while emerging players like MiniMax also scored important advances.

Yunqi Capital, deeply rooted in the AI space, has been closely tracking industry developments. As September draws to a close, we're using this edition of "Attent!on: Yunqi Tech Notebook" to sort through the month's most noteworthy developments and share our observations and reflections — helping everyone find direction amid the vast sea of information.

OpenAI Releases GPT-o1

What Can a New Paradigm Change?

In the early hours of September 13 Beijing time, OpenAI broke from its usual pattern and released its new-generation large model series GPT-o1 without any advance PR buildup — unveiling the project outsiders had dubbed "Strawberry."

This new model, whose numbering starts from "1" rather than "5," is OpenAI's first series trained with reinforcement learning to reason through complex problems. Complex reasoning is GPT-o1's core capability, with standout performance on mathematics and programming tasks, and the OpenAI team calls it the company's strongest reasoning model to date. Behind this lies a shift in technical approach for large models: through reinforcement learning, o1 learns to generate and refine its own chain-of-thought (CoT). OpenAI co-founder and CEO Sam Altman posted on the social media platform X that o1 marks the beginning of a new paradigm.

Chain-of-thought refers to breaking a problem's solution into several steps and tackling them in sequence. According to the "dual process" model from cognitive science, the human brain makes decisions in two modes: one is fast, automatic, and unconscious (System 1); the other is slow, deliberate, and conscious (System 2). Chain-of-thought capability is key to approaching System 2. Under the previous "predict the next token" training paradigm, GPT models had only System 1 capabilities. With GPT-o1 gaining System 2 capabilities, the model can, beyond mining existing knowledge, also begin to generate new knowledge.

However, in the evaluation results comparing o1-preview with GPT-4o, GPT-4o still holds the advantage in personal writing and text editing tasks, which indicates that o1 does not yet outperform it in everyday communication and text generation.

Another hope the industry has pinned on GPT-o1 concerns scaling laws: as the bottleneck of simply piling on more data and compute becomes increasingly apparent, can the path to better model capability shift from the pre-training side to the inference and reinforcement learning side? The answer remains to be seen.

Yunqi Quick Take

  • GPT-o1's core change is combining large language models with reinforcement learning, which yields substantial improvements in thinking and reasoning. A commonly cited mechanism is that, during the chain-of-thought process, a PRM (Process Reward Model) scores each step the LLM produces, steering the model toward better final answers. In theory, GPT-o1 should perform better in scenarios that demand deep thinking, such as code, mathematics, and broader logical reasoning.
  • Worth noting: the benchmarks chosen for GPT-o1 focus on mathematics and programming, where answers are clear-cut and therefore well suited to training and running a reward model. In more complex scenarios without clear-cut answers, designing reward models becomes considerably harder, and collecting the corresponding data is more challenging.
  • Generality and generalization are important traits that distinguish this wave of AI from the previous one. So far, GPT-o1 has not shown a clear path for generalizing its reasoning capabilities, which leaves much to look forward to in subsequent iterations.
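
To make the idea of step-level scoring concrete, here is a minimal sketch in Python. It is not OpenAI's method, whose details have not been published. It assumes a toy setting where each reasoning step is a line of arithmetic that can be checked exactly, and every name in it (toy_step_reward, score_chain, best_of_n) is hypothetical. A real PRM would be a learned model rather than a rule.

```python
from typing import Callable, List, Sequence

Chain = List[str]  # one candidate chain-of-thought, as a list of steps


def toy_step_reward(step: str) -> float:
    """Hypothetical stand-in for a learned PRM.

    Scores a step of the form 'a op b = c' as 1.0 if the arithmetic
    is correct and 0.0 otherwise (including unparseable steps).
    """
    try:
        lhs, rhs = step.split("=")
        a, op, b = lhs.split()
        a_val, b_val, claimed = int(a), int(b), int(rhs)
    except ValueError:
        return 0.0
    results = {"+": a_val + b_val, "-": a_val - b_val, "*": a_val * b_val}
    return 1.0 if results.get(op) == claimed else 0.0


def score_chain(chain: Sequence[str], step_reward: Callable[[str], float]) -> float:
    """Aggregate step scores with min, so one bad step sinks the chain."""
    return min((step_reward(step) for step in chain), default=0.0)


def best_of_n(candidates: Sequence[Chain], step_reward: Callable[[str], float]) -> Chain:
    """Return the candidate chain with the highest aggregated step score."""
    return max(candidates, key=lambda chain: score_chain(chain, step_reward))


if __name__ == "__main__":
    # Three candidate solutions to (2 + 3) * 4.
    candidates = [
        ["2 + 3 = 6", "6 * 4 = 24"],  # first step is wrong
        ["2 + 3 = 5", "5 * 4 = 20"],  # every step is correct
        ["2 + 3 = 5", "5 * 4 = 25"],  # second step is wrong
    ]
    print(best_of_n(candidates, toy_step_reward))
```

The sketch also shows why mathematics and code are convenient test beds: each step can be verified mechanically. In open-ended tasks there is no such checker, which is the difficulty the second point above describes.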

Domestic Video Generation Models Explode

Is Sora Still Worth Waiting For?

With Sora's "actual product" still nowhere in sight, domestic generative AI players have already pushed video generation to the next level. In the past month alone, large model teams including MiniMax, Tongyi, Doubao, and the Shanghai AI Laboratory have all launched video generation models. Clearly, this has become a track that neither large model startups nor tech giants can afford to miss.

On August 30, Yunqi Capital angel-round portfolio company MiniMax released a full "multimodal suite," with its video generation model abab-video-1 drawing particular attention. The model generates high-resolution, high-frame-rate native video from text prompts and performs strongly in compression rate, text responsiveness, and style diversity. It produces 6-second clips from text and supports generating text within the video.

Notably, abab-video-1 is also one of the few domestically available "actual products" already open for use — from its release date, users could access it on the Hailuo website.

Also garnering significant attention were tech giants' video generation models. In mid-to-late September, Alibaba's Tongyi Wanxiang released its video generation model, supporting up to 5-second video generation at 30 frames per second with 720P resolution. ByteDance launched Doubao Video Generation-PixelDance and Doubao Video Generation-Seaweed, characterized by their support for naturally coherent motion and complex multi-subject interactions.

Yunqi Quick Take

  • In 2024, AI video generation has seen significant improvements across key dimensions including duration, resolution, and consistency.
  • However, the cost of using video generation models is still far from commercially viable. Judging by single-GPU generation time alone, producing a 7-second video takes tens of minutes, so embedding such features in apps incurs substantial costs. There is still a long way to go on cost reduction.
  • On the business model front, various parties are actively exploring e-commerce marketing, film and television creation, healthcare, and education application scenarios. But how much real demand exists in video workflows for AI-generated material? This remains to be validated through further video generation technology iteration and broader commercial practice.
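
The cost argument above can be turned into back-of-the-envelope arithmetic. The sketch below is illustrative only. The 7-second clip length comes from the text, while the 30-minute generation time (one reading of "tens of minutes") and the GPU rental rate of 2.0 currency units per hour are assumptions chosen for the example, not reported figures. The function names are hypothetical.

```python
def clips_per_gpu_hour(minutes_per_clip: float) -> float:
    """How many clips one GPU can produce in an hour."""
    return 60.0 / minutes_per_clip


def cost_per_clip(minutes_per_clip: float, gpu_hourly_rate: float) -> float:
    """Compute cost of one clip, given a GPU rental rate per hour."""
    return minutes_per_clip / 60.0 * gpu_hourly_rate


def cost_per_video_second(
    minutes_per_clip: float, gpu_hourly_rate: float, clip_seconds: float
) -> float:
    """Compute cost per second of generated video."""
    return cost_per_clip(minutes_per_clip, gpu_hourly_rate) / clip_seconds


if __name__ == "__main__":
    MINUTES_PER_CLIP = 30.0  # assumption: one reading of "tens of minutes"
    GPU_HOURLY_RATE = 2.0    # assumption: hypothetical rental price per GPU-hour
    CLIP_SECONDS = 7.0       # from the text: a 7-second video

    print("clips per GPU-hour:", clips_per_gpu_hour(MINUTES_PER_CLIP))
    print("cost per clip:", cost_per_clip(MINUTES_PER_CLIP, GPU_HOURLY_RATE))
    print(
        "cost per second of video:",
        round(cost_per_video_second(MINUTES_PER_CLIP, GPU_HOURLY_RATE, CLIP_SECONDS), 4),
    )
```

Under these assumptions a single GPU yields only two clips per hour, so throughput limits an app-scale feature as much as the per-clip price does.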

Apple's First AI Phone Debuts

Has the Smartphone Pioneer Fallen Behind?

In the harvest month of September, leading phone brands also bore new fruit. On September 10, two top phone brands — Apple and Huawei — simultaneously launched new products, making waves in software and hardware respectively.

Apple, which pioneered the smartphone era, positioned its new iPhone 16 series as the first iPhone designed for AI. But its AI feature rollout was notably restrained overall. The main selling point was "Visual Intelligence," reached through a new "Camera Control" button on the side of the phone, which serves as the entry point for AI to read image information and generate related content.

Additionally, faster AI processing, upgraded camera capabilities, and AI note-taking features were highlights of this update. The chip was particularly notable: the entire iPhone 16 lineup, including the base models, adopted the second-generation 3nm A18 chip with a 16-core neural engine and roughly doubled machine learning performance.

But in the most anticipated area of software ecosystem interconnectivity, the iPhone 16 series showed little obvious AI imprint. In fact, for phones — an already mature, red-ocean category — finding new growth curves within the typical 3-4 year user replacement cycle requires major innovations on either the software or hardware front.

Apple, with its complete product ecosystem, was once seen by the industry as having this opportunity. But this release's AI features fell short of expectations, with some quipping that the heavily hyped event only left people remembering a camera button.

By contrast, Huawei's premium trifold screen Mate XT, launched the same day, exceeded market expectations in its initial sales performance. The starting price of 19,999 yuan wasn't low, yet all models sold out rapidly on launch day, with scalper prices surging before gradually declining in late September. This again confirms that in the growth-stagnant phone market, foldable screens are one of the few remaining growth points.

Yunqi Quick Take

  • With phone users' replacement cycles lengthening and hardware becoming ever harder to differentiate, software features remain key to boosting purchase intent. AI-based feature updates from Apple and other leading phone brands are worth continued attention.
  • But the "division of labor" between on-device and cloud models needs further clarification. Which functions, or which parts of a function, should run on on-device models? Which need to rely on cloud-based models? Answering these questions would allow more AI feature innovation while staying within the limits of phone hardware performance.

Meta Releases Its Most Expensive AR Glasses Prototype

Why Is Hardware Innovation Focusing on Glasses?

Following its hit Ray-Ban Meta smart glasses, Meta recently unveiled its first AR glasses prototype, Orion. With hardware costs alone reportedly reaching $10,000 per unit, this prototype leads the industry across multiple key metrics.

For instance, in field of view (FOV) — a dimension with decisive impact on interactive experience — Orion uses optical-grade silicon carbide to deliver up to 70 degrees FOV while maintaining image quality, compared to the 30-50 degrees typical of most AR glasses on the market. Additionally, its body weight of just 100 grams shatters the impression that AR glasses are too bulky and heavy.

Of course, as a prototype, Orion still has some distance to cover before reaching the consumer market. But one phenomenon is hard to ignore: amid foundational technology shifts led by AI, glasses are becoming one of the most hotly contested categories in smart hardware. From the AI audio glasses made by Hive Box Technology, a Yunqi Capital Pre-A portfolio company, to video-capable glasses like Ray-Ban Meta, a range of glasses equipped with large model capabilities are "conquering" smart hardware bestseller lists.

Thus, whether glasses will become the next-generation terminal after phones is also worth watching.

Yunqi Quick Take

  • After PCs and phones, and with edge computing power and edge hardware improving in recent years, the tech world has never stopped searching for the next-generation terminal. Glasses, as smaller and lighter devices, are beginning to carry more information and richer experiences.
  • From a penetration perspective, billions of people worldwide already wear glasses every day, which gives glasses a broad market base as a new medium for smart interaction.
  • Beyond voice and touch, more cutting-edge interaction forms like brain-computer interfaces are worth anticipating.