Replicating Sora? Long-context or RAG? Are Agents a fake demand? | Yunqi Capital's Ten Questions X Attent!on

Yunqi Capital · March 29, 2024

Just enjoy the "petrichor" of AGI.

"Questions Matter More Than Answers"

— Douglas Adams expressed this crucial idea in The Hitchhiker's Guide to the Galaxy, the series often hailed as the "bible of science fiction."

In 1950, Alan Turing, the "father of computer science," posed a famous question — "Can machines think?" — setting the stage for artificial intelligence.

In 2017, the paper "Attention Is All You Need" introduced the Transformer architecture, illuminating OpenAI's possible path toward AGI. Not long ago, Jensen Huang predicted that AGI would pass the Turing test within five years.

This past week, the AI universe continued to bloom in every direction: OpenAI released the first user feedback videos for Sora; GPT-5 hasn't even launched yet, and GPT-6 training is already rumored to be maxing out 100,000 H100s; Moonshot AI triggered a long-context arms race among China's tech giants...

On one hand, technology keeps evolving at breakneck speed. On the other, commercialization challenges are intensifying. We're at an inflection point — the eve of the next wave of AI product explosions. Many questions may not have standard answers yet, but that's precisely what makes the possibilities infinite.

Last Sunday, the "Attent!on" Yunqi AGI+ Salon — Shanghai Premiere kicked off to a packed house. We received nearly 200 registrations, and invited close to 50 frontline engineers and product leads from major tech firms, top algorithm scientists, and founders of star startups for in-depth discussions.

Given the massive volume of closed-door conversation and its private nature, we've excerpted select content and distilled it into 10 key technical and product questions — the "Yunqi Ten Questions" — for your consideration. Enjoy.

Selected Participants

Alibaba, Huawei, Tencent, ByteDance, SenseTime

Shanghai Jiao Tong University, CUHK, Shanghai AI Lab, Fudan University Cancer Hospital

Moonshot AI, Zilliz, Xunpanyun, BentoML, and other AI practitioners

Emily, Dalton — Yunqi Frontier Technology investors

YQ&AI

01. What is Sora's biggest breakthrough?

Lead, Large Model Team | Major Tech Firm

Traditional video generation models are constrained by the memory of a single GPU, which is a scarce resource. This typically restricts training to clips of just a few seconds, too short to capture and express the richness of video content. Sora adopts the DiT (Diffusion Transformer) architecture, which allows the video model to be distributed across multiple GPUs running in parallel, removing the single-card memory bottleneck. This dramatically improves the model's expressiveness and training efficiency, enabling Sora to handle longer videos and more complex frame sequences.

Additionally, Sora demonstrates formidable capability in data processing, flexibly handling inputs without forcing images into fixed sizes. Collecting and processing video data is far more challenging than for text and images, particularly data cleaning and video-text alignment. The broader lesson from Sora: future multimodal AI models will need deeper breakthroughs in describing video content in text.
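To make the memory pressure and the flexible input handling concrete, here is a minimal sketch. It assumes a clip is cut into small spacetime patches and each patch becomes one Transformer token, a common approach in Transformer-based vision models. The patch sizes (2 frames by 16 by 16 pixels) and the function names are illustrative assumptions, not Sora's actual configuration.

```python
from math import ceil

# Hypothetical patch size: 2 frames x 16 px x 16 px per token (an assumption).
PATCH_T, PATCH_H, PATCH_W = 2, 16, 16

def patch_grid(frames, height, width):
    """Number of patches along time, height, and width.

    Rounding up means any resolution or duration fits by padding the
    last patch, so inputs need not be resized to one fixed shape.
    """
    return (ceil(frames / PATCH_T), ceil(height / PATCH_H), ceil(width / PATCH_W))

def token_count(frames, height, width):
    """Sequence length the Transformer would see for one clip."""
    t, h, w = patch_grid(frames, height, width)
    return t * h * w

print(token_count(16, 256, 256))   # short square clip -> 2048 tokens
print(token_count(160, 256, 256))  # 10x longer clip   -> 20480 tokens
print(token_count(16, 270, 480))   # non-square clip   -> 4080 tokens
```

The token count grows linearly with clip length, and attention cost grows faster than that, which is one way to see why long videos quickly outgrow a single card.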

YQ&AI

02. In the long-context race, why is RAG a critical technical approach?

Lead, AI Platform | Zilliz

Long Context and RAG aren't contradictory. RAG offers a more efficient way to provide context — a more stable and performant engineering implementation.

RAG's core lies in combining retrieval and generation capabilities. By leveraging an offline-maintained knowledge base, it extracts information relevant to the query and integrates it into the large model's inference process, yielding high-quality outputs. This approach effectively extends the large model's capacity to process long texts while maintaining precise information capture.
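The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration under stated assumptions: word counts stand in for a real embedding model, a Python list stands in for a vector database, and every function name here is hypothetical. The final call to a large model is left out; the sketch stops at the prompt that would be sent.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(docs):
    """Offline step: embed every document once and keep the vectors."""
    return [(doc, embed(doc)) for doc in docs]

def retrieve(index, query, k=2):
    """Online step: rank stored documents by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(index, query, k=2):
    """Put only the retrieved passages into the model's context."""
    context = "\n".join(retrieve(index, query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "the clinic opens at nine on weekdays",
    "sora generates video from text",
    "refunds are processed within five days",
]
index = build_index(docs)
print(build_prompt(index, "which model generates video", k=1))
```

Only the top-k passages reach the model, so the prompt stays short no matter how large the knowledge base grows. The expensive embedding work happens offline in build_index.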

However, RAG faces challenges, particularly in cost control and efficiency optimization. As the technology matures, we expect RAG to achieve more efficient online inference without sacrificing output quality. This demands not just algorithmic optimization but continuous engineering exploration — building effective indexes and leveraging offline computation to reduce costs. RAG's application scenarios are also expanding, from medical products to enterprise knowledge bases, all exploring how to harness this technology to improve service quality and user experience.

YQ&AI

03. How can foundation models differentiate?

Senior R&D Engineer | Moonshot AI

Moonshot AI's Kimi intelligent assistant now supports 2-million-character ultra-long lossless context input — one of our chosen differentiation vectors. Many of you may be waiting to try it; inference speed is still somewhat slow at the moment. We aim to make long-context the best it can be, domestically and globally, standing out enough in certain dimensions to compete head-to-head with OpenAI.

We've also noted that MiniMax excels at Function Call — a key direction we're focusing on next, further enhancing model controllability and applicability.

YQ&AI

04. How can AI agents deliver more value to users?

Product Manager, AI Agent Direction | Major Tech Firm

In current AI agent R&D and real-world application, complex task decomposition and execution still involve significant uncertainty — unpredictable token consumption, potential looping issues, and task deviation during execution. Thus, the prevailing trend leans toward developing simple, consensus-driven task flows that ensure controllable outcomes for better practical deployment.
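One way to keep outcomes controllable is to wrap the agent in explicit guards for the three risks named above. The sketch below is a minimal illustration, not any team's actual implementation: the policy function, the status strings, and the limits are all hypothetical, and loop detection is deliberately crude (any repeated action counts as a loop).

```python
def run_agent(policy, max_steps=8, token_budget=1000):
    """Run an agent loop with a step limit, a token budget, and loop detection.

    policy(history) returns (action, tokens_used); the action "done" ends the run.
    Returns (status, history_of_actions).
    """
    history, spent = [], 0
    for _ in range(max_steps):
        action, tokens = policy(history)
        spent += tokens
        if spent > token_budget:       # unpredictable token consumption
            return "budget_exceeded", history
        if action == "done":
            return "finished", history
        if action in history:          # crude check for looping
            return "loop_detected", history
        history.append(action)
    return "step_limit", history       # task drifted without finishing

def scripted(actions, tokens_per_step=100):
    """Hypothetical stand-in for a model: replays a fixed list of actions."""
    def policy(history):
        return actions[len(history)], tokens_per_step
    return policy

print(run_agent(scripted(["search", "summarize", "done"])))
print(run_agent(scripted(["search", "search"])))
print(run_agent(scripted(["a", "b", "c"]), token_budget=150))
```

A real system would replace the scripted policy with a model call and use a smarter loop check, but the guards themselves stay this simple, which is what makes the flow predictable.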

To maximize the value of AI agents, we need clarity on several points. First, define clear user goals. Second, consider how to help users understand and use agents effectively. We're exploring how to apply agent capabilities by parsing API documentation and generating matching user interfaces, letting users interact with agents in personalized ways.

When seeking application scenarios, we should focus on domains where agents can dramatically reduce implementation costs compared to traditional approaches. When users feel the cost difference between using an agent versus building with a large model directly, their perception of AI's value deepens.

Algorithm Scientist | Tencent AI Lab

An AI agent's core lies in three interconnected, indispensable capabilities: environmental perception, reasoning and decision-making, and execution control. Perception lets the agent understand its surroundings, whether physical or abstract. Reasoning and decision-making manifest in large language models, which in some respects surpass human capability. Execution control concerns how the agent translates decisions into actual action.

The key to technology-product integration is presenting these capabilities in a user-friendly manner. UI design is a prime example: it simplifies complex controls into intuitive operations. The goal is to integrate the agent's perception, reasoning, and execution so that users can interact as naturally as possible.

YQ&AI

05. What technical challenges do AI agents face in real-world scenarios?

Former NASA Scientist

AI agent R&D is trending toward fusing perception, decision-making, and execution capabilities. Technically, we need to solve how agents better adapt to environmental changes, improve planning generalization, and optimize effectively across different goals and scenarios.

Moreover, AI agent development must consider integration with existing systems — how to use agent capabilities to optimize and enhance the efficiency of current tools.

YQ&AI

06. Is Scaling Law the ultimate answer?

Algorithm Expert | Alibaba

Scaling Law isn't a universal key. AI, as an applied science, derives its value first from optimizing industrial scenarios — a major reason for AI's current hype. Industrial scenarios are complex and diverse, demanding deep algorithmic exploration and innovation rather than simply relying on model scale expansion.

For instance, in video generation, video model optimization and fine detail adjustment remain critical. Video generation requires not just video-text coordination but also handling pixel-level coordination within each frame and temporal relationships between frames. These complex interdependencies demand meticulous model design. Furthermore, Scaling Law struggles with different downstream task configurations. Take dance: whether solo, duet, or other forms, each requires specific model adjustments to optimally model and generate the desired signals. Only by combining algorithmic innovation with scenario optimization can we truly advance AI applications.

YQ&AI

07. Will future AI architecture be unified, or pluggable and modular?

Professor | Shanghai Jiao Tong University

On this question, our own thought processes while working provided inspiration. For example, our sense of smell typically only reacts to specific phenomena at critical moments, while vision and the brain collaborate flexibly and continuously. This led us to realize that future AI architecture might set different training objectives for different application scenarios, activating different training modules at appropriate times.

We envision that even if a model can handle various tasks, it should remain controllable. Thus, future architecture design must not only accommodate diverse scenarios but also accept external oversight and regulation.

YQ&AI

08. How can business scenarios be better integrated with AI?

Founder | AI Information Recommendation

In exploring AI and large model applications, we as entrepreneurs focus on two core issues: data boundaries and a model's potential for business optimization. We've realized that in video model training, since available internet data tends to be popular, trending content, models may fail to capture broader, more authentic reactions and behaviors of ordinary people — missing a more genuine slice of reality.

Second, regarding business optimization potential, we believe end-to-end models represent the future trend. Given current large model limitations, we use agents as a temporary compromise, combining different models and tools to adapt to various business scenarios. However, we believe that as technology advances, future powerful models will directly initiate workflows without needing complex agent orchestration.

YQ&AI

09. How should AI products go global with GTM?

Head of APAC | BentoML

Open source, continuous iteration, and overseas entity presence are the three pillars of our globalization strategy. In early overseas expansion, you might rely on personal connections or investor introductions to promote products, but this approach doesn't scale. Open source is a broader way to validate your product — especially when it solves industry pain points, open source can rapidly generate market feedback.

From a sales perspective, remember: "The product is never fully ready." Don't wait for perfection before seeking customers. Instead, find partners and early adopters willing to try during the iteration process, refining the product together. Finally, we believe physical presence in overseas markets is essential. Our regular offline events in San Francisco, for example, help build trust and credibility when promoting products — this "born global" mindset may yield better results.

YQ&AI

10. What lessons does Silicon Valley AI entrepreneurship offer?

Founder | AI Native Productivity Tools | Silicon Valley

Competition in Silicon Valley isn't limited to local entrepreneurs; it is global, among immigrant founders from everywhere. Silicon Valley's startup environment is unique: founders there are extraordinarily diligent, attending every event and seizing every opportunity.

Meanwhile, Silicon Valley AI entrepreneurs emphasize filtering for paying users from day one. They believe users without willingness to pay are better left uncontacted — this screening mechanism helps them focus on genuinely valuable customers early on. Silicon Valley's networks and communities are extremely important for startups, not just boosting product visibility but enabling rapid, invaluable user feedback critical for fast iteration.

Beyond the "Yunqi Ten Questions" intellectual collisions, the on-site "Open Mic" and "BBQ" sessions were lively too — discussions spanning different industrial scenarios and applications, plus old and new friends chatting over skewers 🍻

Professor | Fudan University Cancer Hospital

I want to understand how new AI technologies can be applied in healthcare, especially tumor pathology.

How far are we from Sora? What's the gap?

Guotai Junan Securities

Xunpanyun

Here to share some AI practices in cross-border and healthcare domains...

...

"Attent!on" Yunqi AGI+ Salon Shanghai — Event Recap

The AI universe keeps evolving. On the journey to find the next AI ACE product, there's no rush to find standard answers. Think, question, explore — embrace more possibilities together.

Due to venue capacity limits, many friends regrettably couldn't join us this time. The next "Attent!on" Yunqi AGI+ Salon will be held in Shenzhen in May. Stay tuned, and you're welcome to join the sharing and discussion :)