When AI Translation Becomes a Commodity, Where Does Innovation Begin? A Deep Dive with InnAIO's Co-Founder | Yunqi Capital Attent!on Podcast

Yunqi Capital · October 21, 2025

The Product Logic and Overseas Playbook of "New Translation Species"

Cross-border communication is often punctuated by moments of "I don't understand" and "I don't know how to respond." The story of smart translation brand InnAIO begins right there.

Wei Bo, a company founder constantly on the move globally, found himself repeatedly stymied by language barriers. Rather than wait for the perfect product to appear, he chose to build his own solution — a magnetic "puck" translator, along with a suite of online smart translation products.

In an increasingly crowded smart translation market, how do you define a product starting from real needs? How do you innovate on form factor around user characteristics? And on the unavoidable path of going global, how do you find a rhythm that's both composed and efficient?

In this episode, we speak with Cathy Chen, co-founder of InnAIO and a marketer born after 1995 who crossed over from bone-conduction sports headphones to smart translation. We discuss the growth trajectory of InnAIO's smart translation product matrix and a vivid, practical methodology for global brand-building.

Follow "Attent!on" on Xiaoyuzhou to listen to this episode

Guests

Cathy Chen, Co-founder of InnAIO

A marketer born after 1995 with deep roots in overseas tech product markets. Formerly EMEA Marketing Lead at Shokz, where she led the 0→1 launch and systematic growth of multiple headphone products in European markets. Extensive experience in consumer tech brand globalization and overseas marketing.

Linda Li, Managing Director at Yunqi Capital (Host)

Below are edited excerpts from this episode.

Defining Product from Need: The Birth of a "Digital Simultaneous Interpreter"

Linda:

First, help us understand — what kind of company is InnAIO?

Cathy Chen:

For most listeners, InnAIO is probably still a relatively new brand. What we do is provide cross-border language solutions. We're an AI hardware company integrating software and hardware, and our purpose and slogan is "Speak the world's languages in your voice." We currently have products on both the B2C and B2B sides, across hardware and software. Our flagship B2C hardware product is an extremely compact, portable translator that supports over 150 languages with millisecond-level translation.

Beyond that, on the software side we have "Tingyibao," which can turn any headphones into translation headphones, so there's no need to buy dedicated translation headphones. On the B2B side, we provide accessible meeting rooms, which let thousands of people in the same meeting each speak their native language and still communicate smoothly.

We hope that with this product lineup, whatever translation need you encounter, there will be a well-matched solution.

Linda:

I'm curious — why is your brand called InnAIO?

Cathy Chen:

The name is actually quite interesting. It stands for Innovation, AI, Open. Innovation captures our company's philosophy of innovation; AI is our underlying logic; Open represents our open and inclusive attitude toward languages and cultures worldwide. Combined and abbreviated, that gives InnAIO.

Linda:

Looking at your background, you switched from bone-conduction headphones to AI translation. At least to me, these seem like quite different product domains. What attracted you to this space?

Cathy Chen:

They seem quite far apart, but first, both belong to the 3C electronics category, and they have certain similarities. Whether headphones or AI translation, both can be understood as extensions of our organs: both solve problems of hearing and speaking. The niche target audiences for the two products aren't that different either. Their price points are in the same range, and, frankly speaking, the people who need or like both products tend to be older and skew male.

The only difference is that Shokz's users probably had stronger sports needs, so Shokz expanded along the sports line from that core audience, while InnAIO expands along business scenarios from the same core audience. And whether it's bone-conduction or AI translation hardware, in European and American markets both are still somewhat new tech, so they also attract tech enthusiasts, and those user groups overlap to some extent. So if you drill down, are they that different? Maybe not so much.

Linda:

That's quite interesting. I'm also very curious about the next question. There are quite a few form factors for AI products with translation capabilities now: AI glasses, pendants, brooches. Why did you choose your current form factor?

Cathy Chen:

Assuming our assumptions about our target users hold, the question becomes: from their perspective, what kind of device would they be willing to carry regularly? That question determines the product's form factor. The pendants you just mentioned are more decorative and probably have a higher share of female users; even if men are willing to wear them, it's probably not the style our particular group would prefer.

By contrast, take our current magnetic portable form factor. Now that Apple has MagSafe chargers, our little puck is exactly the same size, so it looks like a very lightweight wireless charger that can attach to your phone anytime. Glasses are also an everyday option, but frankly speaking, if we made glasses, the hardware cost would be very high. One problem with the AI glasses currently on the market, for example, is how to get people to wear them comfortably and unobtrusively all day, like their regular glasses. Solving that at an early stage would require a massive hardware investment.

So for InnAIO's first product, we chose the more portable magnetic puck.

Linda:

What about headphones? Headphones are also a hardware form factor that people are relatively accepting of now.

Cathy Chen:

My old job was headphones after all, so I do have some authority to speak here (laughs).

First, headphones have the same problem as glasses. If you look at the translation-focused headphones currently on the market, they need to handle large translation volumes at good speed, so they can't be small, and the fit is definitely not good enough. The first selling point of Shokz's open-ear line was comfort. Why make comfort the first selling point? Because most headphones on the market don't solve comfort well.

We feel that an AI translator is ultimately a tool; if you make it a pair of headphones, you attach other attributes to it. First you have to solve comfort. Then users occasionally want to listen to music, so do you need to solve sound quality too? The requirements keep adding up layer by layer. So for a company that's just starting out, finding a lower-cost solution is the better approach.

Linda:

That's a very clear perspective. Since you're partly targeting the B2B market, on the B2B side you make things as convenient as possible, while on the B2C side comfort matters more. So you chose to build platform-level software that works with whichever comfortable headphones users already have.

Cathy Chen:

Yes. When we feel we can't yet provide a very good headphone solution, we'd rather provide a software solution. The core of our overall product lineup and R&D still revolves around users' ultimate needs and what they actually want.

Technology and Differentiation: Voice Cloning, Market Positioning, Big Tech Challenges

Linda:

Having said all that, could you explain how AI translation devices actually work? Some of our listeners are not from tech backgrounds and are curious about this. Compared to traditional software services, what are the advantages of hardware products?

Cathy Chen:

The technology side isn't really my expertise, but the upside is that I can explain it simply, in plain terms like anyone else would: how AI translation works, and what its unique advantages are.

Back to the first question: how does an AI translation device actually work? We can imagine it as a digital simultaneous interpretation expert proficient in multiple languages. The first thing it needs to do is go from sound to text — it needs to recognize, to hear clearly what you're saying. This involves some speech recognition technology. After hearing clearly, the second step is to understand what you're saying and translate it. This core step relies on natural language understanding or machine translation technology.

The final step is to have it speak the words aloud, which uses speech synthesis technology. Through these steps, you ultimately get real-time translated output.
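The three stages described above (speech recognition, translation, speech synthesis) can be sketched as a simple pipeline. This is only an illustrative sketch, not InnAIO's implementation: the class name, the stand-in functions, and the tiny phrase table are all hypothetical, and real systems would plug actual speech and translation models into each slot.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationPipeline:
    """Hypothetical three-stage pipeline: audio in, translated audio out."""
    recognize: Callable[[bytes], str]    # speech recognition: audio -> source text
    translate: Callable[[str], str]      # machine translation: source -> target text
    synthesize: Callable[[str], bytes]   # speech synthesis: target text -> audio

    def run(self, audio: bytes) -> bytes:
        text = self.recognize(audio)         # step 1: hear clearly
        translated = self.translate(text)    # step 2: understand and translate
        return self.synthesize(translated)   # step 3: speak the result


# Toy stand-ins so the sketch runs; "audio" here is just UTF-8 text.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


PHRASES = {"ni hao": "hello", "xie xie": "thank you"}  # illustrative only


def fake_translate(text: str) -> str:
    # Unknown phrases pass through unchanged.
    return PHRASES.get(text.lower(), text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = TranslationPipeline(fake_recognize, fake_translate, fake_synthesize)
print(pipeline.run(b"ni hao"))
```

Keeping each stage behind a plain callable means any one of them can be swapped (for example, a different translation backend) without touching the other two.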

Linda:

Very clear — voice capture, voice recognition, then translation to synthesis. What role does the large model play here?

Cathy Chen:

The biggest use is naturally in the language processing and machine translation stage I just mentioned. Once large language models are integrated, this simultaneous interpretation expert can understand more complex logic, handle professional terminology and even culturally implicit meanings, and so infer what you mean. This makes its translations less stiff and more human-like.

Linda:

So do you have a small model layer above your large model?

Cathy Chen:

Yes. Overall, our model is self-developed: we have a very large corpus of our own, and we use it. On the large-model side, we call GPT overseas and DeepSeek in China as semantic understanding models. So it's a self-developed model as the base, combined with calls to external large models.

Linda:

You mentioned voice cloning earlier — I'm curious, first, in what scenario did you think of having this function? And how is this technology implemented?

Cathy Chen:

This is also a very interesting story. The product was born from our founder Wei Bo's need to solve his own communication problems. Wei Bo had to travel around the world and communicate with different clients, especially in places like the Middle East, so language became a real obstacle for him. But there didn't seem to be any product on the market that solved this pain point. So what to do? Build it himself.

People overseas like to send voice messages on WhatsApp, and unlike WeChat, WhatsApp doesn't provide voice-to-text functionality, so the other party also expects voice replies. On that premise, our product started with a cross-app translation function: it translates voice directly inside social apps and converts what you say into a voice message in the other party's language, ready to send.

This is where voice cloning comes in. When two people are communicating on a social app, if the other party speaks in their own natural voice while yours is an AI voice, the conversation feels very unnatural. Is there a way to make it seem as if you're really speaking the other party's language with them in real time on these apps? From that idea, the voice cloning function was born.

Linda:

That's quite interesting. I also saw a video about your product that left a deep impression on me: this year's Shenzhen Two Sessions used it. Can you tell us how you discovered this government affairs scenario? And how do needs differ across different users and scenarios? Give us the overall picture.

Cathy Chen:

This goes back to the accessible meeting room I mentioned on the B2B side, and even our consumer translator. We feel they have unique advantages in such scenarios. Although such meetings generally have simultaneous interpretation, anyone who's experienced it knows that to use the interpretation equipment you first have to wear those headphones, and some people don't like how the sound comes through.

If we provide a more personalized solution, perhaps they would prefer it. That's how we came to appear at the Two Sessions.

As for whether the needs of people at the Two Sessions or similar meetings differ from those of ordinary users: the answer is definitely yes. In such relatively formal government settings, what people value most is translation accuracy and translation speed, more than functions like voice cloning or cross-app translation.

Linda:

So behind the need for precision and rapid response speed, what's the biggest challenge for you?

Cathy Chen:

Our challenge is the specialized vocabulary of government affairs scenarios, much of which our language models rarely cover, because their source data comes mostly from consumers' everyday language. So we need to import that lexicon on top of our existing model to improve the reliability and accuracy of those terms.
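One common way to layer a domain lexicon on top of a general translation model is to protect glossary terms with placeholders before translation and restore the approved renderings afterward. The sketch below illustrates that general technique under stated assumptions; it is not a description of InnAIO's method, and the function name, glossary entry, and toy translator are hypothetical.

```python
from typing import Callable, Dict


def translate_with_glossary(
    text: str,
    glossary: Dict[str, str],
    translate: Callable[[str], str],
) -> str:
    """Translate `text`, forcing glossary terms to their approved renderings."""
    placeholders: Dict[str, str] = {}
    # Match longer terms first so a short term cannot split a longer one.
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        if term in text:
            token = f"__TERM{i}__"
            text = text.replace(term, token)
            placeholders[token] = glossary[term]
    translated = translate(text)  # the general model never sees the terms
    for token, target in placeholders.items():
        translated = translated.replace(token, target)
    return translated


# Hypothetical glossary entry and toy general-purpose translator.
glossary = {"zhengfu gongzuo baogao": "Government Work Report"}


def toy_translate(s: str) -> str:
    return s.replace("qing kan", "please see")


print(translate_with_glossary("qing kan zhengfu gongzuo baogao", glossary, toy_translate))
```

The design assumes the underlying translator passes placeholder tokens through unchanged, which holds for this toy function but would need checking against any real model.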

Linda:

You also mentioned that you have software that works with whatever comfortable headphones users prefer. But big tech companies are now building this kind of software into phones. Some of them only support their own headphones for now, but I believe that as the software matures, they'll likely open up the ecosystem too. How does InnAIO think about this?

Cathy Chen:

I actually have a small question mark about this. As far as I know, the prerequisite for using their translation software today is that users first buy their headphones, so there's a precondition. That brings us back to the original question: are users willing to buy a separate pair of translation headphones? Or is translation just one need among many, so that their own headphones combined with a more economical software solution are enough? That question still has to be validated by the market.

Linda:

I very much agree; this is a question of how the market takes shape. For example, Apple is now launching system-level translation with great fanfare, though it can only be used overseas for now. When I saw this feature, I found it quite impressive, because it delivers free, universal translation through built-in apps. From the perspective of InnAIO's business model, what do you think?

Cathy Chen:

The answer depends on whether we choose to enter a larger market covering more people, or a more focused, vertical market. Assuming the latter, Apple's entry is actually somewhat good for us. It's the same situation my previous company Shokz faced: their open-ear headphones also compete with Apple's headphones, and they too worried about the impact, especially after AirPods 4 offered a semi-open-ear design with noise cancellation.

But if our core audience, or most of the users who buy our products, come for specific vertical needs, they won't choose a solution that covers broader scenarios but requires compromises. Moreover, at our current stage the AI translation market is still largely untapped, and a large population still needs to be educated. So the entry of big brands helps educate the market for you.

Then, at the consideration stage, whom do users ultimately choose? That's a contest of product strength. If one day Apple's headphones can match our language coverage and all the unique selling points we're pushing now, then perhaps there truly won't be room for us, but that's quite unlikely. Most likely, translation will only be a highlight and value-added feature of Apple AirPods. Apple won't build specialized products around this scenario, so there will still be room to serve vertical audiences.


Going Global Methodology: From 0→1 to Systematic Growth

Linda:

You've already started the going global process. Where's the first stop?

Cathy Chen:

Our first stop is definitely Europe and the US. By single-country volume, the US is definitely the largest, so the US is unavoidable. The US also has quite a few regions that speak Spanish or French, so the overall market is large. But Europe is the region with the largest overall market for translation devices. It's just that when starting in Europe, you can't cover every country at once; you can only pick some key countries and slowly expand outward. This led to our initial strategy of entering Europe and America first: they're relatively similar in overall marketing approach and content preferences, so resources can be more concentrated.

Linda:

I went to the Middle East last year and discovered that people in different regions there, even just 100 kilometers apart, speak different languages. The translation need is a real headache.

Cathy Chen:

Yes, we're also working on the Middle East, just with a different approach. This revolves around one of our biggest strategic questions: simply put, online first or offline first? After consideration, we decided to start online and gradually expand to offline, because offline is slower to start and to scale, so we hope online can scale quickly and provide backing for offline.

Following this logic, European and American markets are relatively easier to do online, and also more homogeneous. Frankly speaking, Amazon has the largest volume, and from a brand perspective you add independent sites; those are the two most important channels, with TikTok and others as derivative channels. But the Middle East is different. Amazon's market share there, for example, is very small, which means doing online in the Middle East requires rebuilding a whole system and team from scratch, at higher cost.

Linda:

Then regarding overseas marketing approaches for different countries, can you give us a framework to quickly understand?

Cathy Chen:

The marketing approach goes back to the 5A model that marketing people often hear about: Aware, Appeal, Ask, Act, Advocate. Countries differ in their placement channels and content preferences, but what's common is that we need to meet users' different needs at each of these stages.

Take a simple example from the US market I just mentioned. Something familiar to everyone in China is live commerce on Xiaohongshu or Douyin. It's common in China, but this format is relatively rare in the US. There are some influencers who sell products on TikTok or Instagram, but their sales-driving role isn't as strong as in China.

So what are the key channels in the US? On the promotion side, what matters more are KOL marketing, PR marketing, BD marketing, and social media marketing. Beyond these, in owned-audience (private-domain) operations, there are also EDM marketing and affiliate marketing. These are basically the channels you'll definitely touch when building a brand. But depending on a brand's stage and mission, the channels it focuses on may differ.

On top of the six marketing channels I just mentioned, in recent years the top priority overseas has been digital marketing, which simply put is ad placement. Why is it the top priority? Because it's the only marketing channel that can be tied directly to ROI, and it's also the channel many brands invest in most heavily in their early stages.

So what we call integrated marketing means looking at which promotional channels to combine for a specific marketing purpose, and what content to place in those channels. Mixing and matching channels and content in this way gives you an integrated marketing plan.

What I just said already covers mainstream European countries. For our traditionally defined top three, Germany, the UK, and France, this approach won't bring major surprises. But European markets have one particularity: their offline share is very high.

Germany and the UK are still manageable, but in France, and in Italy and Spain, which we didn't mention, offline share even exceeds online, which leads to different marketing approaches. You can think of online as just a matter of which mix you put together, but for some European markets, especially non-first-tier ones, offline marketing becomes the top priority instead.

Linda:

What kind of offline operations are there?

Cathy Chen:

Offline leans more toward channel marketing. Going back to what I said, in these countries, older users especially like to browse stores in person. So how are your offline channels arranged? Do you have channel resources? These become quite important. Of course, all of this comes after the brand can get into offline stores, so we generally wouldn't choose such markets in the first stage, because store entry itself takes quite a bit of time.

For the full episode, subscribe to the "Attent!on" channel on the Xiaoyuzhou app.