XiaoIce Closes Series A Funding Round, Li Di: AI Beings Will Be Everywhere in the Future | 5Y News

5Y Capital · July 13, 2021

XiaoIce unveils Super Natural Voice, bringing AI voices closer than ever to real human speech.

On July 12, XiaoIce unveiled its new Super Natural Voice technology. For the first time, this technology raises AI voice naturalness to a level nearly indistinguishable from real human speech, with support for universal, all-domain scenarios.

XiaoIce also announced its Series A funding round. The round was led by Hillhouse, with participation from 5Y Capital, Neumann, IDG, and GGV Capital, alongside existing investors Northern Light Venture Capital and NetEase. XiaoIce's valuation now exceeds the US$1 billion unicorn threshold.

Fisher Zhang, Partner at 5Y Capital, said: "Harry [Shum], Li Di, and the XiaoIce team are people I deeply respect and admire. They've done fascinating work in natural language interaction, and they're now taking on the ultimate dream of artificial general intelligence. Along the long road of AI development, we believe in, and want to support, these bold attempts to push past human limits. We wish them great success."

Li Di

CEO, XiaoIce

Q1

What does XiaoIce want to accomplish in the next five to ten years? What kind of company do you hope XiaoIce becomes?

Li Di: XiaoIce is a complete artificial general intelligence framework. The 18-year-old girl XiaoIce is merely the first prototype born from this framework. If we compare the XiaoIce framework to fertile soil, then the girl XiaoIce is a single tree in the forest that grows from it.

Over six years inside Microsoft, we built each component of this framework piece by piece, fusing natural language processing, computer speech, computer vision, and AI content generation into an organic whole. This made the framework work as a system and created a self-reinforcing cycle among technology, product, and de-identified data. We call this period Phase I.

In the coming years, XiaoIce's Phase II mission is to nurture hundreds of millions of AI beings from this framework. Each one will be as complete as the 18-year-old girl XiaoIce, yet unique in its conversational style, viewpoints, voice, visual presence, and even creative abilities. (The Super Natural Voice technology released today is one technical vehicle for this: its focus is not only naturalness but also the ability to support a vast number of differentiated Voice Fonts at the same time.) Our work will be to make such AI beings ubiquitous, keep their differences clear and stable, popularize the tools for training them, and build trust with human users.

Ultimately, the world will take on a new form in which humans and AI are intertwined. AI will be rich and diverse, not just a handful of assistants, and the XiaoIce framework is the underlying infrastructure supporting these AI instances. You can think of the XiaoIce framework as their OS. The analogy isn't perfect, but it's concise. That is the XiaoIce team's mission.

Q2

Looking back at XiaoIce's development in recent years, what do you think you've done right? What could you have done more of?

Li Di: AI, and AGI in particular, is still at a very early, germinal stage today, somewhat like the period around Mendel's publication of his laws of inheritance. So not only is there enormous room for the underlying technology to evolve, but even its fundamental concepts are far from settled. On the positive side, this gives practitioners like us ample opportunity to discover new knowledge. For example, just a few years ago attention was overwhelmingly on closed-domain systems; today, more players, including Google and Facebook, are moving toward the open domain we have focused on. Beyond the algorithms themselves, ideas are colliding and shifting even more intensely.

Among the things XiaoIce has done right over the past few years, the first, and the most fortunate, was realizing early on how important it is to build real-world user feedback loops. AI develops quickly in the lab, but true acceleration comes from iterating with real users. Today, roughly 60% of global AI interaction data is carried by the XiaoIce framework, and it spans a relatively wide variety of scenarios. This has been the main driver of our rapid technological progress.

The second is our choice of metrics. For a foundational framework, which typically fuses many technologies, metrics are especially critical. The XiaoIce team has taken its own distinctive approach to defining several core metrics, such as CPS (Conversation-turns Per Session) for our dialogue engine and ACD (Average Comfort Duration) for voice. In retrospect, these metrics are an important reason we have been able to build an advantage in open-domain, natural, emotionally human-like interaction.

Of course, we and our peers alike are all still at this germinal stage. What we discover today may well be overturned by our own work tomorrow.

Q3

Why did you choose to take investment from 5Y Capital?

Li Di: We and 5Y are a perfect match. Fisher is someone our team deeply respects, and we are highly aligned in our vision of the future and our fundamental pursuits. 5Y also brings much of the experience the XiaoIce team lacked during our years at Microsoft. These are the reasons we joined hands, and they make the XiaoIce team more confident about the future. Together, we believe we can open the door to a future in which humans and AI are intertwined.

XiaoIce Company, formerly the Microsoft XiaoIce team, is one of the world's largest complete AI frameworks by interaction volume, with technology spanning natural language processing, computer speech, computer vision, and AI content generation.

To accelerate its development, Microsoft announced on July 13, 2020, that it would spin off XiaoIce as an independent entity while retaining an investment stake. After the spin-off, XiaoIce Company would promote the integration of cutting-edge global technology with localized products and extend its lead in fundamental AI research.

The newly released Super Natural Voice technology not only raises naturalness to a new level but is also the world's first general-purpose, all-domain technology of its kind. It moves beyond single-scenario limitations, enabling AI interaction entities to engage in highly human-like interaction across the full range of human scenarios, including speaking, conversing, and singing.

Hundreds of different AI prototype entities within the XiaoIce framework have already been upgraded with the technology. Public technical demonstration videos show that AI voices are now difficult to distinguish from real human voices.

5Y Capital (formerly Morningside Venture Capital) currently manages approximately $5 billion across USD and RMB funds. We believe the world would be a better place if you, whom others call crazy, began to be believed.

BEIJING · SHANGHAI · SHENZHEN · HONG KONG

WWW.5YCAP.COM