Talking Shop in a Bubble: How We Accidentally Threw a "Stand-Up Comedy" Show | Attent!on AI Open Source Night

Yunqi Capital · August 4, 2025

Unfiltered Thoughts on AI Open Source from the Front Lines

In 2025, stand-up comedy and open mics have become a surprisingly popular format for exchange in China's AI venture capital circles. Perhaps that's because unvarnished truths, not buried under PowerPoint slides, simply land harder.

Just over a week ago, we accidentally turned an AI open-source night into a comedy show.

On July 23, the evening Yunqi Capital and the OpenAtom Foundation jointly released the China Open Source Development Deep Dive Report, we teamed up with Amazon Web Services and "Cyber Zen" to host an AI open-source gathering at AGI Bar on Zhongguancun Startup Street.

No rigid agenda, no forced consensus. Thirty-plus AI practitioners active on the front lines of tech and entrepreneurship traded easy, witty remarks through four-plus hours of dense conversation. Plenty of substance, all over beer foam.

That night reaffirmed something: the value of open source has never been limited to technology and code — it's a trust mechanism that connects people. Today we're "open-sourcing" some of the highlights to share with you.

The gathering brought together over thirty friends from the trenches, spanning open-source databases, AI infrastructure, major internet platforms, and open-source communities.

There were old hands like PingCAP and Zilliz, with more than a decade in open source; builders who honed their products and content on communities like SegmentFault and OSChina; technical talent from the front lines of Amazon Web Services, Baidu, Xiaomi, TikTok, Ant Group, and Zhipu AI; plus serial entrepreneurs and Gen-Z founders who'd rushed over from the office with takeout and laptops in hand.

For the opening presentations, Xiaoting He, Applied Scientist at Amazon Web Services Greater China Solutions Development Center, and Xin Wei, Vice President at Yunqi Capital, who co-authored the China Open Source Development Deep Dive Report, delivered substantive takes from the angles of business practice and investment insight — covering open-source model deployment and commercialization progress.

The Real Problems and Real Solutions for Open Source's Last Mile

In the current flowering of open-source models, projects that have actually reached the front lines of business and achieved stable deployment remain rare. Applied Scientist Xiaoting He from Amazon Web Services shared a few stories to illuminate the real-world scenarios and thinking behind small-model fine-tuning.

1. The "Realist" Choice of Small Models: Dumb, but Practical

"Large models are powerful, but most clients can't afford private deployment." This was a pattern Xiaoting He observed widely. In his experience, small models at 14B, 7B, even 1.5B parameters are easier to deploy, easier to keep within budget, and more readily adopted. On one hand, enterprises lack the capacity to build large-scale clusters; on the other, H100 inference costs are prohibitive. So most real-world deployments end up back on the practical path of L40 or L4 GPUs, or even edge devices.

Yet small models' biggest weakness is also staring everyone in the face: they're "dumb." Even with fine-tuning, they can't match large models' generalization and reasoning capabilities. In this context, enterprises must make trade-offs within an "impossible triangle" — intelligence, cost, and latency; you can never have all three.

But this doesn't stop small models from shining. Xiaoting He pointed out that many business problems don't require a model to "solve Olympiad math": as long as it's "good enough," stable, and generalizes well, it creates value. In vertical scenarios like content moderation and translation, for instance, small models can absolutely achieve high-precision processing through fine-tuning, provided they are fed enough data and that data is carefully curated.

2. Fine-Tuning Isn't About Tuning the Model — It's About Tuning the Data

In this process, the true core of fine-tuning isn't technical difficulty; it's the quality of the data and how much explanation that data carries.

He shared one type of scenario: an internet client wanted to identify violation risks in user nicknames. The client provided a batch of "bad cases" that looked extremely ambiguous — some were hard to even understand how they constituted violations. The team discovered that relying solely on these result labels couldn't train an effective model; they had to get the business side to write out "why this counts as a bad case," and incorporate that explanatory process into the data, before the model could learn accurately.

This type of data he called "Thinking Data" — not just labels, but data inputs that contain human reasoning paths.

"We used to train models like teaching a cat to sit. I'd say 'sit,' push its butt down, and it sat. Later we realized it didn't understand the command — it sat because it saw the chicken breast." This analogy captures the most easily overlooked part of model training: learning the result without learning the reason.
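The difference between a bare label and "Thinking Data" can be sketched as a training record. This is only an illustrative sketch, not the format the team actually used: the field names (`nickname`, `rationale`, `label`), the prompt wording, and the `to_sft_record` helper are all hypothetical, and the placeholder strings stand in for real cases that the article does not disclose.

```python
import json
from dataclasses import dataclass


@dataclass
class ThinkingExample:
    """One moderation case: the input, the business side's reason, the verdict."""
    nickname: str
    rationale: str  # hypothetical field: "why this counts as a bad case"
    label: str      # hypothetical label values, e.g. "violation" or "ok"


def to_sft_record(ex: ThinkingExample) -> dict:
    """Build a prompt/completion pair in which the reasoning precedes the verdict,
    so the target text contains the reason and not only the result."""
    prompt = f"Does this nickname violate policy? Nickname: {ex.nickname}"
    completion = f"Reasoning: {ex.rationale}\nVerdict: {ex.label}"
    return {"prompt": prompt, "completion": completion}


# Placeholder content only; real nicknames and rationales come from the business team.
example = ThinkingExample(
    nickname="<nickname from a bad case>",
    rationale="<reviewer's explanation of why this is a bad case>",
    label="violation",
)
record = to_sft_record(example)
print(json.dumps(record, ensure_ascii=False))
```

A label-only record would stop at the verdict. Here the rationale written by the business side becomes part of what the model is trained to produce, which is one plausible way to fold the "why" into the data.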

3. Co-Building Data, Integrating Business: The True "Last Mile"

"Compute and tools are no longer the core bottlenecks. What's truly hard is building good data that's tightly bound to the business." Xiaoting He believes that co-building data and integrating business is the key to running open source's true "last mile." Take the violation identification scenario mentioned earlier: different companies, different people, may judge whether a piece of content violates rules differently.

And this data can't be auto-generated through scraping or crowdsourcing; it can only be co-built through deep collaboration between technical and business teams. This isn't a task that engineering can complete in "closed-loop" fashion. It's a problem of organizational collaboration and of building shared understanding. In Xiaoting He's view, the decisive factor for small models isn't parameter size but whether you can get "technology and business to co-build a set of effective, structured, clearly explained data." Only then can open-source models truly "run inside the business."

Globalization, Lightweight Operations, and Governance: Three Directions for Open Source Heading into Industry

Xin Wei, Vice President at Yunqi Capital, drawing on the recently released China Open Source Development Deep Dive Report, shared observations from an investment perspective on the current wave of AI technology entrepreneurship and open-source ecosystem evolution, while raising several under-discussed critical questions.

1. The Breakthrough for Open-Source Entrepreneurship: From the ToB Trap to Globalization

Chinese open-source projects struggle to write the full "commercialization chapter" that overseas projects can. It's not a capability gap — it's environmental: enterprise users generally lack the habit of paying for software, and low labor costs fuel a DIY orientation.

The real breakthrough path is globalization. Even if monetization isn't immediate, you can start with global community operations, establishing the loop of usage, feedback, and iteration. Once a project has a developer base overseas, customer unit price, renewal willingness, and brand spillover effects all change significantly.

Whether building ToB tools or ToC products, AI entrepreneurs today should be thinking about how to "go out" from Day 1. Globalization isn't just a sales strategy — it's a prerequisite for open-source ecosystem survival.

2. Beyond the Project, the Organization Itself Should Be "Open Source" — Small Teams Win Through Lightness and Speed

A clear shift in this new wave of AI entrepreneurship is the "lightweighting" of organizational structure. Model updates are measured in weeks; tools evolve by the day. Amid such high-frequency change, small teams can pivot quickly and run through trial and error fast.

Young, flat, unburdened teams are becoming more competitive. Compared to large organizations, small teams can "turn the wheel" more easily, dare more readily to abandon existing paths — even if today's direction is off, they can correct course in short order.

This structural flexibility also makes them more effective at spreading in open-source scenarios — a ten-person team releasing an agent demo or open-sourcing a data interface can quickly spark community attention and win early momentum.

3. Beyond the Model, There's Governance: Several Unresolved Questions About Open Source

Since early this year, open-source large language models, with DeepSeek-R1 as a representative, have been rapidly approaching or even surpassing some closed-source models. But in video and other multimodal directions, closed-source models still hold the advantage.

What's truly worth raising are the governance questions behind open source: Is the model truly "open source"? Are the code, training data, and license transparent? How does the community mechanism operate? If a model is repackaged or even "injected with risky behavior," how do you trace the source? And who bears responsibility?

In the AI era, open source isn't just about open-sourcing code — it requires open-sourcing "trust." From code, to data, to security mechanisms in the usage chain, these questions have no standard answers, but they must start being asked.

Substance in the Foam: The Unvarnished Truths That Got "Open-Sourced"

After the opening presentations, the next several hours belonged to an "open mic" session for the thirty-plus builders, thinkers, and whisperers in the room.

Not long into the free-form exchange, Mars Ma, CEO of OSChina, "ambushed" the gathering with a bottle of champagne, its rising bubbles pushing the evening's conversational energy to another peak.

From large-model company engineers who'd just finished coding and been spontaneously called up, to IT veterans with over a decade of global market experience, to serial entrepreneurs, AI platform technical practitioners, fintech AI platform builders, heads of leading open-source communities, Gen-Z founders...

Some recounted why they were resolutely going overseas. Some offered honest assessments of model paths and engineering practice. Some reflected on the soil and cycles of the ToB market. Some broke down the industry reality of "open-source large models versus small models." And many more met new friends, forged new connections.

Finally, we're sharing some of the evening's quotes anonymously — "open-sourcing" them, as it were — and hoping for more nights like this, where sparks of ideas can fly.

Open-source community leader, ten-plus years of experience across China and overseas

ToC runs six times faster than ToB, but ToB shouldn't rush — wait for ToC to work out the paradigm, then pull it back.

Open-sourcing large models isn't for people to actually use them; it's to prove: I really can train this.

Frontline practitioner at a large-model startup

Senior leader at an open-source distributed database company

AI is the extension of internet fundamentalism; open source + cloud + AI brings the maximum degree of freedom.

The original intention of technologists to change the world can't be lost, but surviving itself is victory.

IT veteran with ten-plus years in the industry

AI data entrepreneur

If industry competition focuses mainly on price wars and service wars rather than technology wars, then in the long run great products and technologies won't emerge from such an environment.

I hope this era can let developers work freely, with more right to say no.

AI infrastructure platform entrepreneur

Major internet platform, AI application development

Open source isn't about copying answers — it's about helping us see that a path can actually work.

I used to think developing AI applications was more about building, but it's actually more about discovering the "connection between application and user," and conveying that part to the model.