"Let Every Talented Ordinary Person Create Music" — ACE Virtual Singer Closes Pre-A Round | 5Y News

5Y Capital · September 8, 2021

The next-generation music creation and distribution platform.

ACE Virtual Singer recently announced the completion of a multi-million-dollar Pre-A funding round, led by ZhiChun Capital with continued participation from existing investor 5Y Capital.

ACE Virtual Singer is a music creation app powered by AI vocal synthesis technology. Its parent company, Beijing Shiyu Technology, is dedicated to empowering users with a new music engine for UGC music creation and distribution. By connecting structured data across the creation and consumption ends, the company aims to generate new music supply while making it possible to revolutionize music media formats and consumption experiences in the future.


Guo Jing

Founder, ACE Virtual Singer


Q1

Why did you start ACE Virtual Singer?

Guo Jing: 2015-2016 was my gap year. I went to Silicon Valley to explore what interesting directions might be worth pursuing. It was a peculiar turning point in my life. To explore while preparing for what I wanted to do next, I started writing machine learning code, building websites, hand-drawing animations, and hacking phone hardware, despite coming from a non-tech background.

This made me realize that many things aren't that difficult. What's likely holding back people's creativity isn't creativity itself, but rather tool barriers and false authority shaped by vested interests within the field. The former raises the threshold for people to even begin trying; the latter often sets wrong goals, making people feel that "it has to be that refined, high-end, and professional to count as a work." But creativity shouldn't be about people's ability to use tools, or their ability to turn themselves into tools — it's the most fundamental thing about being human, the fleeting idea and the grand feeling... Tools should better serve people. I call this concept "de-toolification."

At the end of 2018, my partner and I wrapped up our first AI venture and were looking for new opportunities when we saw a breakthrough in AI generation — AI could empower music creation, making good music no longer monopolized by top creators but accessible to every talented, spirited ordinary person. This excited us tremendously. Music had never been UGC-ified. We started searching for answers to what exactly was blocking ordinary people from creating music.

Q2

What do you imagine for the future of AI-empowered music creation? What role will ACE Virtual Singer play in it?

Guo Jing: ACE will be the next-generation music creation and distribution platform.

Today's music faces high barriers at every stage — creation, production, distribution, and monetization — resulting in a typical PGC ecosystem. For talented people, each step from wanting to learn songwriting, to producing a releasable track, to actually getting it out there and noticed, to generating real income, is fraught with difficulties. The more cost consumed at each stage, the more the entire industry tends to filter for a small number of certain, low-risk bets — the so-called "head" content.

From day one, ACE has been working on both creation and distribution simultaneously. On the creation side, AI is a powerful technical variable. Our AI vocal synthesis, for instance, digitizes "voice" — the most important instrument in music creation — allowing creators to summon virtual singers of various timbres at any time. They're available 24/7 for you to experiment with your work. While all other instruments have already been digitized and callable, digitizing the human voice has remained the unfinished step in digital music creation.

Furthermore, technologies like AI composition and lyric-writing can lower the creation barrier further, turning some of the "essay questions" in creation into "multiple-choice questions" — driving creation through aesthetics rather than tool proficiency. But we don't limit ourselves to AI technology alone. Any technical variable that can lower music creation barriers will be incorporated into ACE's iteration. The endgame for ACE's creation tools will be a new-generation music creation engine combining AI, cloud rendering, real-time collaboration, and music visualization.

The number of music creators should be in the tens of millions — there's still a gap of one to two orders of magnitude. We believe new creators can bring entirely new and broader content spaces and formats, which is how we approach the optimal solution for music as content. When humanity is sailing on that new ocean of music, they probably won't be able to imagine how dry and scarce our current music is.

Q3

What's been the hardest thing since starting up? What's been most rewarding?

Guo Jing: Most rewarding is that we've pushed AI vocal synthesis technology to possibly the best level globally — in terms of singing naturalness, singer fidelity, and emotional richness, we haven't found any better product in the market. And our team is a group of atypical tech talents: our CTO originally developed game engines, our lead algorithm engineer was born in 2002. We tried many approaches and ultimately achieved breakthroughs in vocal synthesis algorithms and engineering.

On talent discovery, we've also taken an unconventional path, from initially worshipping credentials to actively finding young people with strong hands-on ability. Interestingly, this positive feedback from talent further convinces me that human creativity comes from ideas and spirit — professionalism and experience accumulation are just low-ceiling entry barriers. In any field, once people cross this threshold, doing something well in that domain is fundamentally no different from cooking a good meal or cleaning one's room. Capable people cook well too, as long as they cross the threshold of acquiring pots and pans.

For over a year I struggled with finding technical talent, so I once thought solving technical problems was the hardest. But now I feel that finding the right thing to do is always the hardest part of entrepreneurship. We believe in the grand vision of democratizing music creation, but at each stage, what's the primary problem, how to design the path, what goal to reach first — these are the hardest things, and what I should be solving.

Q4

What most intrigues you about Gen Z? How do you understand Gen Z creators?

Guo Jing: Actually, I don't think Gen Z is fundamentally different from young people of any previous era — they're all sincere people whose true selves haven't degenerated. The process of aging is the process of exchanging your true self for worldly resources, status, and experience.

In the exchange between true self and utilitarianism, today's young people face what may be the friendliest environment yet. They can preserve their true self to an unprecedented degree while also achieving utilitarian self-realization. So you'll find they always exceed us old folks' imagination. This is especially true of those born after 2000 and 2005: they have extremely diverse hobbies and aesthetics, yet reach consensus where you least expect it. They don't perform individuality for show's sake; they like what's good, whether it's rock, hip-hop, EDM, Phoenix Legend, or ethnic singers. They have strong preferences but no obvious taboos, and they are morally fastidious.

The way to understand Gen Z is to imagine yourself as someone unbound by worldly utilitarianism and existing values — what would you like, what would you hate. These traits also show in Gen Z's creations. Their songs can be literary and aloof, or absurd and meme-y — there's no adult-imposed hierarchy. If it sounds good and is creative, they think it's good.

Q5

What contrarian thinking do you have in your field? What's the biggest misconception about you?

Guo Jing: One contrarian view is directed at the music production industry. The common belief is that today's short, fast media is destroying people's musical aesthetics. I disagree. I think people should spend more attention appreciating what truly makes your "DNA move" in music — those big ideas. A big idea is when a song depicts a life scene you've never seen in Haifeng dialect, making you feel like you've time-traveled to a 1990s township, riding a beat-up motorcycle in flip-flops. A big idea isn't Blu-ray audio quality, or the rich dynamics in sound that musicians often emphasize.

Also, regarding creation tools lowering barriers — I initially agreed with most people that the lower the barrier, the better. But now I feel that barrier and expression are a trade-off. If the barrier is so low that it compresses the core logic of content, content expression will inevitably be sacrificed. Just as short video tools eventually had to teach creators what transitions and editing are, but Xiaokaxiu didn't need to, because Xiaokaxiu wasn't a content creation tool but a tool for extracting social candy. Such tools need maximum input-output ratio — minimum learning cost for maximum fun content — but don't care about expression, and can't truly create consumable content.

So in our AI-empowered music approach, we didn't choose the fully automatic generation route, because I feel that if you press a button and an algorithm gives you a song saying "here, this is your song," that's not empowering expression.

Q6

What non-work question have you been thinking about recently?

Guo Jing: The elevator constantly plays an ad: "For tiger-skin chicken feet, eat Wang Xiaolu." Why are they called "tiger-skin" chicken feet? The pitted texture seems more like toad skin. I think it's because of "tiger-skin green peppers," which are called that because after frying, the peppers have dark stripes like a tiger's pattern. But after frying, the pepper skin also shows a pitted texture, and with chicken feet, that same pitted texture makes people associate them with the tiger-skin name. In school, teachers told us "magnificent" (美轮美奂) describes grand architecture, but every time I wrote compositions, I used it to describe our school's beauty queen. Was I wrong? Does language evolution follow inductive consensus, or deductive metadata?

There's a song on ACE with a comment: "Good work, just a bit good, although it sounds good, it's just relatively good, not saying it's not good anywhere, just relatively good." Another user replied: "You're saying it by saying it?" (dying laughing hhhhh) That is the evolution of internet slang: any human with context can sense how much sense it makes and how much it improves expressive efficiency (not just meaning, but emotion), yet these things are hard to explain through linguistic narrative logic. I can only borrow a meme caption: "Shh, don't speak, feel it with your heart."

Q7

Recommend a work you've recently enjoyed?

Guo Jing: Wu Tiao Ren's Riding a Bicycle Leading a Pig.

Q8

Why did you choose 5Y Capital's investment?

Guo Jing: Among all content format UGC-ifications, music has always been a tough nut to crack. Many characteristics of music give it a certain head-heavy tendency — not everyone believes music can be UGC-ified. But 5Y and we both believe in the fundamental human need for self-expression, and everyone's latent expressive desire and imagination. We believe the basic creator base for any content format should be "talented ordinary people."

These underlying beliefs are our native motivation for doing this, and also require us to choose partners with the same beliefs and imagination at the same level.

Shi Yunfeng

Investment Manager, 5Y Capital

Q1

Why invest in ACE Virtual Singer?

Shi Yunfeng: The value of tools has long been underestimated. When you give users a creation tool, the creativity they unleash is often amazing. The maturation of front-facing phone cameras in 2010 dramatically lowered video creation barriers, giving rise to the Douyin and Kuaishou that we scroll through daily. We've been thinking about what technical variable today could reduce creation barriers by 100x, and AGC (AI Generated Content) is an answer I believe in. I believe AI has the opportunity to play the role that mobile cameras played in the early mobile internet era.

Music's past state might be: 100,000 people listen, 1,000 people sing, 1 person creates songs. The ratio of "creating music / listening to music" in this world is far lower than other media formats. Music creation has been enormously challenging for ordinary people (like me, who can't carry a tune). But music is one of humanity's oldest and most natural forms of self-expression. Today, ACE Virtual Singer lets ordinary people create music with AI, play with music, express themselves through music — and very interesting and unique AI music styles have emerged.

Most AI applications on the consumer end today are about matching content with people. I hope AI isn't just a "guess what you like" matcher — AI has the opportunity to inspire more ordinary users' creativity in music creation. AI-generated music might be like short video apps in their early days: most people didn't see value in a few seconds of GIF on phones compared to web dramas on streaming sites. But when we stretch the timeline, AI and user co-created music may be what users listen to every day in 10 years.

Currently, in the Chinese AI vocal synthesis field, ACE's AI vocal synthesis technology has reached globally top-tier levels, far surpassing other vocal synthesis engines in singing naturalness, emotional expression, technical advancement, and ease of use.

ACE Virtual Singer provides users with multiple AI virtual singers. In ACE's creation tools, users can input melody and lyrics and select AI virtual singers to perform. Combined with ACE's BGM library, smart scale, and flexible "playing voice" feature, users can intuitively "feel out" a song with accompaniment even without music theory knowledge. Additionally, ACE's "lyric-filling" feature lets users create secondary works based on other users' melodies, further lowering music creation barriers.

ACE Virtual Singer has built a complete growth system from zero music foundation to veteran producer, meeting creators' needs to record life and express themselves through music creation while helping creators advance quickly. Among ACE Virtual Singer's quality creators, 60% are first-time music creators. A typical example is creator "Yu Zhang," who started with zero music foundation and has released nearly 30 original works on the platform.

Creators range in age from 15 to 22. Gen Z has broad music style preferences and strong expressive desire. Songs created on ACE Virtual Singer show unique consumability, with released works spanning ancient Chinese style, ACG, meme culture, and more. Quality works have spontaneously spread on other platforms to become hits. Every day, 1,000-2,000 new songs are born on the ACE platform, and this number continues growing. Gen Z music creators' talent and explosive power are already evident: among Spotify's top 20 most-streamed artists, Gen Z singers already occupy 5 seats.

Beyond the creation end, ACE also provides community features where users can publish works to share with others, and listeners can find favorite songs in their feed and follow preferred creators. Currently, ACE Virtual Singer has reached over one million cumulative users, also concentrated in the 15-22 age range, primarily post-2000 and post-2005. Users encourage each other and interact frequently on work pages, forming a positive cycle of content creation and consumption.

But ACE Virtual Singer is just a small step toward Beijing Shiyu Technology's vision of "AI-empowered music creation, building a new music creation and consumption paradigm." Next, Beijing Shiyu Technology will upgrade its existing creation tools beyond "AI vocal creation," spinning off ACE's creation tools into a standalone music creation engine, ACE Studio, to better empower Gen Z creators, said Beijing Shiyu Technology CEO Guo Jing.

The Beijing Shiyu Technology team has diverse and creative backgrounds, with engineers from Tencent, ByteDance, and the Chinese Academy of Sciences, a musical prodigy who founded his own label as a sophomore and became Warner's youngest signed creator, and a genius algorithm engineer born in 2002 who started coding at 13.

Traditional music creation tools have numerous problems to solve: almost all DAWs (Digital Audio Workstations) aren't sold in China, they require paid local downloads of large sound libraries, and creating music from scratch with them is too difficult. These issues make music creation barriers extremely high and creation efficiency low. ACE Studio will be based on cloud rendering, using AI empowerment, modularized assets, and real-time collaboration so that more promising creators are not shut out by installation, creativity, and collaboration barriers. ACE Studio aims to become the gathering place for a new generation of music creators.

The combination of ACE Studio and ACE Virtual Singer will thoroughly connect music creation and distribution data pathways, with structured MetaData at the core, changing the existing music creation and consumption ecosystem: creation becomes iterative, copyright becomes verifiable, presentation formats diversify — thereby accelerating music industry evolution, truly enabling quality creators to emerge continuously (rather than waiting ten years for another Jay Chou), greatly enriching music supply and scenarios, quickly satisfying user demands, and fundamentally changing the relationship between people and music.

Clearly, the music industry hasn't seen major change in years, and the path Beijing Shiyu Technology has chosen isn't easy. But as Silicon Valley godfather Peter Thiel said, startups are like guerrilla fighters, preferring to choose remote mountain forests where survival is difficult as their base. For Guo Jing, doing what others don't do, and taking on the problems that are hard to crack, gives him and his team a greater sense of achievement. This is not only how startups operate but the essence of entrepreneurship, and it determines how high a barrier you set for competitors.

5Y Capital (formerly Morningside Venture Capital) currently manages approximately RMB 32 billion in dual-currency USD and RMB funds. 5Y Capital seeks out, supports, and inspires lonely entrepreneurs, providing support that ranges from morale to every aspect of business operations. We believe that when the you whom others see as crazy begins to be believed in, the world will become a different place.

BEIJING · SHANGHAI · SHENZHEN · HONG KONG

WWW.5YCAP.COM