AI Will Change Education, But Education Won't Be Only AI | MonoX

Monolith Capital · September 1, 2025

The bottleneck in learning is no longer knowledge itself, but rather willingness and focused attention.

Right now, education is one of the most active frontiers for entrepreneurs riding the AI application wave.

AI is opening up new possibilities for education, from intelligent teaching assistants and personalized learning-path recommendations to automated content generation and multimodal interactive instruction. But this sector is no smooth ride. The path to real-world deployment is littered with hurdles (technical performance bottlenecks, user acceptance barriers, business-model dilemmas) that practitioners must tackle one by one.

Recently, the MonoX salon series hosted an offline event themed "How AI Is Transforming the Education Industry." Dozens of guests from the education sector gathered to explore the practical road ahead for AI-powered education.

During the event, guests discussed the following topics:

  • Engineering bottlenecks and solutions for AI deployment in education scenarios
  • Shifting user learning needs and enhancing engagement
  • Experimental exploration and observation of new education models
  • Commercialization challenges and real-world obstacles in AI+education
  • Future-oriented technology evolution and industry ecosystem predictions

We've compiled the core takeaways from this discussion, hoping they prove useful for fellow practitioners.

Table of Contents:

  1. Engineering Bottlenecks and Solutions
  2. Shifting User Needs
  3. Experiments and Observations on New Education Models
  4. Commercialization Challenges and Real-World Obstacles
  5. Future-Oriented Technology and Ecosystem Predictions
  6. Where Are the Opportunities in AI+Education?

1. Engineering Bottlenecks and Solutions

1.1 Beyond Chatbots: Multimodal and Performance Challenges

Empowering education with AI first requires clearing a technical threshold. Many early attempts focused on conversational AI tutors, but real teaching scenarios go far beyond Q&A chat: they involve visual, voice, and other multimodal interactions. For example, photo-based problem solving and homework grading are common needs, yet most education devices (such as student tablets) have limited on-device computing power, which makes it difficult to recognize and crop individual test questions locally in real time.

The industry's practical workaround is to shift computation to the cloud, using more powerful servers to run advanced models (such as YOLOv8) and applying image enhancement and correction to improve recognition accuracy. Meanwhile, because Chinese textbooks and workbooks come in complex, varied layouts, OCR of handwriting is also challenging: general-purpose APIs achieve only about 65%-70% accuracy in formula-heavy, mixed Chinese-English scenarios. The image below shows a page from a People's Education Press junior high math textbook. Its diverse layout and abundance of symbols and graphics already make OCR difficult; add handwritten answers, and the difficulty increases further.

Complex layouts of domestic textbooks and workbooks

To improve results, developing proprietary models is a viable path. Some teams use newer object detection models (like YOLOv10) to accurately locate question areas, combined with Vision Transformer (ViT) and other multimodal algorithms to classify question types, boosting recognition precision.
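The two-stage approach described above (a detector locates each question, a classifier assigns its type) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: `detect`, `crop`, and `classify` are hypothetical callables standing in for a YOLO-family detector, an image-cropping step, and a ViT-style classifier; no real model API is called, and the 0.5 score threshold is arbitrary.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# A box is (x0, y0, x1, y1) in pixel coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class QuestionRegion:
    box: Box
    question_type: str
    score: float


def extract_questions(
    page: Any,
    detect: Callable[[Any], List[Tuple[Box, float]]],  # hypothetical detector wrapper
    crop: Callable[[Any, Box], Any],                    # hypothetical cropping helper
    classify: Callable[[Any], str],                     # hypothetical question-type classifier
    min_score: float = 0.5,
) -> List[QuestionRegion]:
    """Stage 1 proposes question boxes; stage 2 labels each confident crop."""
    regions: List[QuestionRegion] = []
    for box, score in detect(page):
        if score < min_score:
            continue  # skip weak detections rather than classify noise
        regions.append(QuestionRegion(box, classify(crop(page, box)), score))
    # Sort top-to-bottom, then left-to-right, to roughly follow reading order.
    regions.sort(key=lambda r: (r.box[1], r.box[0]))
    return regions
```

Passing the models in as plain functions keeps the orchestration testable with toy stand-ins and lets either stage be swapped (for example, one detector version for another) without touching the pipeline logic.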

1.2 Voice Interaction: ASR and TTS Customization

In oral practice and conversational teaching, the technical details of automatic speech recognition (ASR) and text-to-speech (TTS) directly shape product experience.

One particularly salient issue is voice recognition for child users: student responses are typically short, accented, and surrounded by environmental noise. When a student answers a simple "no" in a noisy home environment, calling off-the-shelf model APIs often yields poor results.

Some startup teams have tackled this scenario by training their own Whisper-like models and combining them with speaker identification, noise reduction, and text correction modules, pushing recognition accuracy to 96.5%. On the synthesis side, a key challenge when AI voices educational materials is correctly reading formulas, phonetic symbols, pinyin, and other special content, which requires deep model customization. Some teams use multi-token prediction and other techniques to speed up synthesis and improve the reading of special symbols.
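The "text correction" step is easiest to picture for the short, closed-ended answers mentioned above. One plausible ingredient (an assumption, not the teams' disclosed method) is to snap a noisy transcript to the closest answer the exercise expects, using Python's standard-library `difflib.get_close_matches`; the function name and the 0.6 cutoff are illustrative.

```python
import difflib
from typing import Optional, Sequence


def correct_short_answer(transcript: str, expected: Sequence[str],
                         cutoff: float = 0.6) -> Optional[str]:
    """Map a noisy ASR transcript onto the closest expected answer (hypothetical helper).

    Returns None when nothing is similar enough, so the app can ask the
    child to repeat instead of guessing."""
    cleaned = transcript.strip().lower().rstrip(".!?")
    candidates = [e.lower() for e in expected]
    matches = difflib.get_close_matches(cleaned, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# "no" misheard as "know" in a noisy room is recovered; unrelated noise is rejected.
print(correct_short_answer("Know.", ["yes", "no"]))
print(correct_short_answer("banana", ["yes", "no"]))
```

Constraining recognition to the expected answer set is only possible because the exercise context is known in advance, which is one reason education-specific ASR can outperform generic APIs on very short replies.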

1.3 Agents: Teaching Strategy Training

This year's AI sensation, Agents, has also generated considerable anticipation for education applications. So what does a genuinely useful education Agent need?

The consensus is that it should act like a human teacher: not just answering questions, but calling up courseware, reviewing past learning data, even sketching diagrams on the fly to explain difficult points. This means the Agent needs to learn teaching strategies, and an effective way to do so is targeted training on high-quality one-on-one human tutoring data. Many teams are already accumulating this type of data, hoping to make AI teachers behave more like excellent human educators.
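To make the idea of an Agent that calls up courseware, reviews past data, and sketches diagrams concrete, here is a toy tool-dispatch loop. It is a hypothetical sketch: the tool names, the `StudentState` fields, and the rules in `choose_actions` are invented for illustration; in the approach described above, that policy would instead be learned from one-on-one tutoring data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StudentState:
    topic: str
    recent_errors: int  # consecutive wrong answers on the current topic
    seen_before: bool   # whether past learning records mention this topic


# Hypothetical tools standing in for "call up courseware", "review past
# learning data", and "sketch a diagram"; real ones would call actual services.
def fetch_courseware(s: StudentState) -> str:
    return f"courseware:{s.topic}"


def review_history(s: StudentState) -> str:
    return f"history:{s.topic}"


def draw_diagram(s: StudentState) -> str:
    return f"diagram:{s.topic}"


TOOLS: Dict[str, Callable[[StudentState], str]] = {
    "fetch_courseware": fetch_courseware,
    "review_history": review_history,
    "draw_diagram": draw_diagram,
}


def choose_actions(s: StudentState) -> List[str]:
    """Hand-written placeholder for a policy that would be learned from tutoring data."""
    actions: List[str] = []
    if s.seen_before:
        actions.append("review_history")  # connect to what the student already studied
    if s.recent_errors >= 2:
        actions.append("draw_diagram")    # change representation after repeated mistakes
    actions.append("fetch_courseware")    # always ground the explanation in course material
    return actions


def run_turn(s: StudentState) -> List[str]:
    """Execute the chosen tools in order and collect their outputs."""
    return [TOOLS[name](s) for name in choose_actions(s)]
```

Separating the policy (`choose_actions`) from the tools makes the point of the paragraph visible: the tools are easy to build, and the hard, data-hungry part is deciding which one a good teacher would reach for next.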

Overall, the current engineering approach to AI+education is pragmatic and multi-pronged: addressing bottlenecks across different modalities and segments with targeted optimizations rather than relying on one general-purpose large model to solve everything. When on-device computing falls short, cloud supplements it; when general models have blind spots, specialized proprietary models fill them; when teaching scenarios demand it, dedicated AI agents are built — working through technical constraints one by one.

2. Shifting User Needs

As technology matures, the competitive focus of education products is shifting. One particularly striking view from this discussion: in the AI era, the core of education is no longer knowledge transmission, but learning engagement.

Specifically, when AI can handle vast amounts of knowledge work, the value of pure knowledge delivery declines in relative terms, and getting students to invest time, focus, and persistence becomes more critical. AI dramatically accelerates access to information, but focused learning time remains the bottleneck for learning outcomes, and AI alone cannot directly solve students' concentration problems.

Therefore, rather than endlessly supplying more content, it's better to increase the fun and interactivity of the learning process, thereby extending and strengthening students' learning investment.

This brings a series of shifting user needs: students (and parents) are moving from expecting education products to "give me knowledge" to "make me want to learn."

This explains why gamified learning carries such high hopes in the AI era: by making learning more fun, it secures users' focused engagement time. Duolingo's success illustrates the point. Although it is a language-learning app company, it is in essence closer to a gaming company, using points, levels, and other mechanics to get users "hooked" on practicing.

Duolingo

On this topic, some entrepreneurs believe that simply embedding AI technology into traditional institutions' teaching workflows (the typical B2B model) often fails to produce products that demonstrate AI's advantages; the real opportunity lies in B2C — directly reaching learners with AI-native, entirely new experiences.

Of course, this doesn't mean AI should completely replace humans. On the contrary, in K-12 stages requiring high emotional companionship, pure AI virtual teachers are not the optimal solution.

The likely mainstream future form is a "human + AI" hybrid model: AI handles standardized knowledge delivery, personalized practice, and data analysis, freeing teachers from repetitive tasks; while human teachers focus on emotional support, learning habit cultivation, and guidance at critical junctures. This approach leverages AI's efficiency and precision while preserving human teachers' irreplaceable roles in motivation and supervision, better satisfying students' and parents' psychological need for "someone to accompany and oversee."

It is worth noting that AI may also reshape the supply side of education services in response to these new user demands. One guest proposed a community-based "Lianjia model" (named after the real-estate brokerage known for its neighborhood storefronts): in the future, every community would have a "district teacher" who, backed by a robust AI teaching platform, serves hundreds of children at once (where previously they might handle only a dozen or so), reaching families directly through offline neighborhood trust rather than expensive online customer acquisition. Parents would then get a localized service, with a real person present, at high value for money.

Looking at broader markets, AI technology lowers the barrier to education supply. With AI-powered real-time translation, automatic lesson plan generation, and other capabilities, people previously constrained by language or expertise may become capable of teaching. For example, a mom in a third-tier city with standard Mandarin could use AI tools to teach Chinese to foreigners online and earn income. This would vastly enrich education supply and open new markets through cost advantages.

"Lianjia Model"

3. Experiments and Observations on New Education Models

To achieve stronger learning engagement, numerous teams are experimenting with new teaching models. Here are three representative experimental cases emerging in the industry:

3.1 AI Companion Learning

One startup team built a prototype AI teacher service loop: it first analyzed large volumes of one-on-one human class recordings, then interacted with students through smart speakers after class and pushed practice questions via WeChat, linking in-class and after-class learning.

With this model, the team generated tens of millions of yuan in revenue within just 7-8 months through private-domain operations in its early startup phase. However, the team found that in this form the AI teacher lacked proactivity: it connected with children only in limited scenarios and struggled to form long-term emotional companionship.

To strengthen companionship, the team envisioned introducing innovative hardware form factors, combined with AI digital humans replacing human teachers for repetitive, low-complexity segments like "read-along practice," allowing human teachers to focus on core instruction. Through such software-hardware integration, they aim to build tighter emotional connections, ultimately getting children to learn proactively rather than passively completing tasks.

3.2 Entertainment-Based Learning

This model draws on short-video culture and immersive entertainment experiences, embedding learning content within broader entertainment scenarios.

For example, some teams are trying "Douyin-style learning" — presenting authentic, interesting Chinese conversation scenarios in short video feeds, where users can watch and learn one clip after another just like scrolling Douyin. Another approach is "role-play (Cosplay)" learning: users enter preset video storylines, choose to play one of the characters (such as playing the emperor in Empresses in the Palace), and engage in immersive dialogue practice, even inviting friends to join the performance.

These creative approaches gamify and socialize the learning process, letting users unconsciously practice language and other skills while being entertained. More importantly, AI is deeply embedded at every step: when users encounter unfamiliar content, they can summon AI anytime for contextually relevant explanations. This deeply integrated model ensures coherent, immersive learning experiences.

3.3 Learning Through Play

The third path blends hands-on practice with AI guidance, making the learning process itself as fun as playing a game.

One entrepreneur shared a case: a child independently completes a Lego building course on an iPad, guided by an AI Agent. Throughout the process, the AI provides real-time, step-by-step 3D building guidance, while also recognizing the child's free creations through the camera and responding with emotionally rich feedback and interaction. The core lies in creating an empathetic learning companion: the AI uses visual and facial-expression recognition to monitor the child's emotions, proactively offering hints and encouragement when the child runs into difficulty, so that the child enjoys hands-on exploration rather than getting frustrated and giving up.

To achieve this, the team adopted a multi-agent collaborative architecture: different Agents respectively handle curriculum planning, difficulty adjustment, emotional support, creative stimulation, and other tasks — each with its own role while coordinating together. This product uses a B2C subscription model, developing hundreds of different courses around a single Lego teaching set, substantially enhancing the value and lifecycle of physical toys.
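A multi-agent setup like the one described can be pictured as several narrow agents that each propose an action, with a coordinator deciding what the child actually sees. The sketch below is hypothetical: the agent classes, thresholds, and priority scheme are illustrative assumptions rather than the team's architecture, and the frustration score is assumed to come from an expression-recognition model that is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass
class Observation:
    step: int             # current step in the building course
    total_steps: int
    frustration: float    # 0..1, assumed output of an expression-recognition model
    seconds_stuck: float  # time since the child last made progress


@dataclass
class Suggestion:
    agent: str
    priority: int         # higher wins when agents compete for the child's attention
    message: str


class Agent(Protocol):
    name: str

    def propose(self, obs: Observation) -> Optional[Suggestion]: ...


class EmotionalSupportAgent:
    name = "emotional_support"

    def propose(self, obs: Observation) -> Optional[Suggestion]:
        if obs.frustration >= 0.7:
            return Suggestion(self.name, 3, "Encourage the child and offer a small hint")
        return None


class DifficultyAgent:
    name = "difficulty"

    def propose(self, obs: Observation) -> Optional[Suggestion]:
        if obs.seconds_stuck >= 60:
            return Suggestion(self.name, 2, "Break the current step into smaller sub-steps")
        return None


class CurriculumAgent:
    name = "curriculum"

    def propose(self, obs: Observation) -> Optional[Suggestion]:
        if obs.step >= obs.total_steps:
            return Suggestion(self.name, 1, "Unlock the next lesson")
        return None


class CreativityAgent:
    name = "creativity"

    def propose(self, obs: Observation) -> Optional[Suggestion]:
        if obs.step >= obs.total_steps and obs.frustration < 0.3:
            return Suggestion(self.name, 1, "Invite a free-build variation of the model")
        return None


def coordinate(agents: List[Agent], obs: Observation) -> List[Suggestion]:
    """Collect proposals and order them by priority (sorting is stable for ties)."""
    proposals = [a.propose(obs) for a in agents]
    ranked = [p for p in proposals if p is not None]
    return sorted(ranked, key=lambda p: -p.priority)
```

Ranking by priority lets emotional support pre-empt curriculum moves when a child is upset, matching the "empathetic companion" goal; the fixed thresholds here could be replaced by learned policies without changing the coordinator.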

Lego in Education

4. Commercialization Challenges and Real-World Obstacles

Despite the endless stream of new AI+education products, actually making the business work requires confronting commercialization challenges head-on. The industry currently faces at least three major dilemmas.

4.1 High Customer Acquisition Costs

Acquiring student or parent users remains prohibitively expensive.

When a startup needs to scale, it has little choice but to buy ads on traffic platforms such as Douyin and Baidu. The resulting customer acquisition cost (CAC) per user is very high, severely compressing margins and threatening the company's survival.

Many teams try private-domain traffic (community operations, referrals) for cold starts in early stages; while effective at small scale, private domains struggle to support exponential user growth. Even looking overseas, advertising on Meta (Facebook) and similar platforms is similarly costly with poor ROI.

Fundamentally, whether burning money on user acquisition is sustainable depends on whether lifetime value (LTV) can cover those high acquisition costs. If product retention and paid renewal rates are strong enough, early ad spend is worthwhile; but the reality is that most AI education products currently don't generate sufficient user lifetime value to support acquisition costs that routinely run into hundreds of yuan per user.
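The LTV-versus-CAC test above can be written down with the common constant-churn approximation, in which a customer's expected lifetime in months is 1 divided by the monthly churn rate. The function names and all numbers below are hypothetical, chosen only to show the arithmetic.

```python
def lifetime_value(monthly_revenue: float, gross_margin: float, monthly_churn: float) -> float:
    """Constant-churn LTV: monthly gross profit times expected lifetime (1 / churn) in months."""
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be in (0, 1]")
    return monthly_revenue * gross_margin / monthly_churn


def acquisition_pays_off(ltv: float, cac: float, min_ratio: float = 1.0) -> bool:
    """True if lifetime value covers acquisition cost by at least `min_ratio`."""
    return ltv >= min_ratio * cac


# Hypothetical figures, not from the event: 100 yuan/month, 60% margin, 20% monthly churn.
ltv = lifetime_value(100, 0.6, 0.2)
print(ltv, acquisition_pays_off(ltv, 500))  # compare against a hypothetical 500-yuan CAC
```

Under these assumed numbers the LTV is about 300 yuan, so a 500-yuan acquisition cost is not recovered; raising retention (lowering churn) is the lever that moves LTV fastest in this model.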

4.2 Parent Acceptance Barrier

Even with mature technology and products, there is a perception gap to bridge before parents, who ultimately pay, will buy in.

Many parents are inherently skeptical of "pure AI teaching," even holding a deep-rooted belief — "without a real person, the service isn't worth much." They're willing to pay premium prices for services that may not be particularly effective but offer human companionship (like "one-on-one vocabulary memorization accompaniment"), because real people provide certainty and a sense of supervision.

By contrast, when seeing AI teachers in class, no matter how interactive or entertaining, parents tend to suspect: isn't this just repackaged recorded video? So they struggle to willingly pay high prices.

In the past, the "dual-teacher classroom" model succeeded largely because the live-streamed star teacher satisfied parents' aspiration for top-tier teaching resources, while the tutor's offline follow-up built emotional connections with students and boosted renewal rates. The dual-teacher model precisely met parents' expectation of getting both results and companionship.

At root, parents buying education services are purchasing a "guaranteed outcome" (such as grade improvement, admission slots), and pure AI products currently struggle to provide outcome guarantees or sufficient trust.

Dual-teacher classroom scenario

4.3 Products Still "Not Quite There"

The final dilemma is that current industry products may simply not yet be compelling enough to win users over.

One guest mentioned a striking phenomenon: certain educational smart hardware products (like learning tablets heavily promoted on Douyin) have return rates as high as 40%. In other words, even if ad placement data suggests customer acquisition is profitable, massive user returns and refunds after trial actually leave the books in the red.

Such alarming return rates reflect a huge gap between product experience and user expectations: users feel the product is "not worth it" and return it en masse. Without solving real user pain points and earning genuine word-of-mouth, even the best concepts and technical selling points will ultimately be drowned out by the market's real vote.
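The effect of returns on the books can be shown with a little arithmetic. Assuming (hypothetically) that each kept device yields a fixed gross margin, each returned device yields no margin plus a handling cost, and the ad spend per order is sunk either way, expected profit per order and the break-even return rate are:

```python
def profit_per_order(margin_if_kept: float, cac: float, return_rate: float,
                     return_cost: float = 0.0) -> float:
    """Expected profit per ad-acquired order after returns (hypothetical model).

    A returned device earns no margin, but its acquisition spend is already
    sunk and handling it costs `return_cost`."""
    if not 0 <= return_rate <= 1:
        raise ValueError("return_rate must be in [0, 1]")
    return (1 - return_rate) * margin_if_kept - return_rate * return_cost - cac


def break_even_return_rate(margin_if_kept: float, cac: float, return_cost: float = 0.0) -> float:
    """Return rate r solving (1 - r) * margin - r * return_cost - cac = 0."""
    return (margin_if_kept - cac) / (margin_if_kept + return_cost)
```

With illustrative figures of 600 yuan margin, 400 yuan CAC, and 50 yuan handling cost per return, an order shows +200 yuan before returns, but a 40% return rate turns it into about -60 yuan, and the break-even return rate is only about 31%. This is how ad data can "look profitable" while the books end up in the red.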

AI learning tablet

5. Future-Oriented Technology and Ecosystem Predictions

Despite the difficulties, we can still think multidimensionally about the future of AI reshaping the education industry.

5.1 Start with Sales

One entrepreneur proposed that since human teachers remain irreplaceable in the short term for high-end niche markets, rather than obsessively making AI into a teacher, it's better to first apply AI to empower non-teaching segments and get the commercial loop running.

A typical approach is to use AI for sales conversion. One team, selling to middle-class families at an annual price of 30,000-40,000 yuan per customer, described the AI sales system it developed: the system can automatically complete the entire process on WeCom (enterprise WeChat), from adding new customers, introducing course products, answering questions, and negotiating prices to, finally, guiding customers to pay for trial lessons.

This essentially has AI assume work previously done by ground sales consultants and course salespeople, solving chronic problems of traditional sales teams like "difficult management, high training costs, talent turnover, unstable performance" — and reportedly, this team's AI sales conversion rate exceeds human sales averages.
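The sales flow described above (add the contact, introduce the product, answer questions, negotiate, guide payment for a trial lesson) maps naturally onto a small state machine. In this sketch the stage names follow the text, while the event labels and the human-handoff stage are hypothetical additions; in practice a language model would presumably classify each customer message into an event.

```python
from enum import Enum, auto


class Stage(Enum):
    ADD_CONTACT = auto()
    INTRODUCE_PRODUCT = auto()
    ANSWER_QUESTIONS = auto()
    NEGOTIATE_PRICE = auto()
    GUIDE_PAYMENT = auto()
    TRIAL_BOOKED = auto()
    HUMAN_HANDOFF = auto()  # hypothetical fallback, not mentioned in the source


# The happy path described in the text, in order.
FUNNEL = [Stage.ADD_CONTACT, Stage.INTRODUCE_PRODUCT, Stage.ANSWER_QUESTIONS,
          Stage.NEGOTIATE_PRICE, Stage.GUIDE_PAYMENT, Stage.TRIAL_BOOKED]
TERMINAL = {Stage.TRIAL_BOOKED, Stage.HUMAN_HANDOFF}


def next_stage(stage: Stage, event: str) -> Stage:
    """Advance the conversation given a (hypothetical) event label:
    "advance", "question", or "escalate"."""
    if stage in TERMINAL:
        return stage
    if event == "escalate":
        return Stage.HUMAN_HANDOFF
    if event == "question":
        return Stage.ANSWER_QUESTIONS  # loop back whenever the customer raises a doubt
    if event == "advance":
        return FUNNEL[FUNNEL.index(stage) + 1]
    return stage  # unrecognized events keep the current stage
```

Making the stages explicit is one way to get the consistency the text credits AI sales with: every conversation follows the same script, and conversion can be measured stage by stage.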

Many entrepreneurs have already achieved interim results, increasing automation and intelligence in operations, marketing, and service processes outside teaching itself, and they plan to keep increasing investment in this area.

5.2 Going Global

Another trend is looking toward overseas blue-ocean markets. As domestic education competition intensifies and regulation tightens, numerous teams are exploring using AI technology advantages to expand overseas, applying Chinese AI and content to meet global users' education needs.

One particularly notable direction is AI-powered Chinese language education going global.

Globally, demand for learning Chinese is exploding due to trade and employment opportunities — many foreigners' motivation for learning Chinese directly comes from "making money," such as wanting to do business with China or work in Chinese enterprises. Yet high-quality Chinese teachers overseas are currently both scarce and expensive, while online live human classes are constrained by language barriers (requiring English intermediation) and carry high prices.

In response, some startup teams are trying to build "hyper-realistic AI Chinese tutors," offering native language + Chinese bilingual teaching at accessible prices. Their user acquisition method is also uniquely efficient: having international students in China serve as hosts for bilingual teaching livestreams on overseas TikTok, gaining large volumes of precise, free user traffic, then transforming these student host personas into digital human AI teachers within the app, forming a trust loop from acquisition to conversion. Simultaneously, by concentrating on specific population scenarios (like Southeast Asian or Middle Eastern markets), they accumulate massive real-world corpora to train highly precise vertical AI models, building first-mover advantage.

5.3 Start with L2-Level Products

Amid technological evolution and market competition, education startups are also rethinking product positioning and roadmaps. Startups must avoid the trap of "proving their technology is the coolest" and instead adopt a pragmatic, incremental strategy.

This means products don't need to pursue a fully realized AI teacher from the start. They can begin by satisfying essential needs, even if their capabilities only match "Level 2 autonomous driving": handling 80% of typical scenarios and iterating gradually on the remaining 20%. This way a team survives while building the future, continuously refining its AI models through real business and iterating on user data rather than burning money in isolation on "perfect AI."
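One way to operationalize the "L2" idea (an assumption on our part, borrowing the driver-assist analogy in which a human takes over when the system is unsure) is confidence-based routing: the AI handles requests it is confident about, and the rest go to a human. The `confidence` field, the function names, and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    text: str
    confidence: float  # assumed self-reported model confidence in [0, 1]


def route(requests: List[Request], threshold: float = 0.8) -> Tuple[List[Request], List[Request]]:
    """Split requests: the AI handles confident cases, humans take the rest."""
    automated = [r for r in requests if r.confidence >= threshold]
    escalated = [r for r in requests if r.confidence < threshold]
    return automated, escalated


def automation_coverage(requests: List[Request], threshold: float = 0.8) -> float:
    """Share of requests the AI handles on its own."""
    if not requests:
        return 0.0
    automated, _ = route(requests, threshold)
    return len(automated) / len(requests)
```

Logging the escalated cases together with the human's resolution produces exactly the kind of real-business data the paragraph above recommends iterating on, so automation coverage can grow over time.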

Some entrepreneurs also propose positioning the product as a "full-process solution" rather than a pure teaching aid. For example, an AI learning app covering the complete chain from diagnostic testing and customized learning plans to knowledge delivery and practice feedback brings its user value closer to that of an online school. This gives the company the confidence to price against expensive offline training classes, rather than falling into the trap of making cheap utility apps.

In short, product forms that can make money and stand firm should be the optimal combination of AI technology and actual teaching needs — beware of flashy concept stacking for its own sake.

6. Where Are the Opportunities in AI+Education?

Looking ahead 3-5 years, most believe AI's reshaping of the education industry will be a gradual integration process, rather than a sudden dramatic scene where some super AI teacher comprehensively replaces existing education models.

Moreover, AI's positioning will differ across ages and domains: high-intelligence AI teachers may first demonstrate tremendous value in high school and above education, adult vocational training, and other markets of mature learners. These users are self-directed with complex learning needs, where AI's powerful intelligence can truly shine.

For K-12 students, the bottleneck often isn't knowledge itself but learning habits and focus. For them, a well-produced interactive recorded lesson may already be sufficiently effective; what's truly lacking is offline supervision or learning atmosphere, which may need to be supplemented through physical scenarios like smart self-study rooms.

What's foreseeable is that human teachers will continue playing important roles in nurturing people for a considerable time to come, but AI will become an omnipresent productivity tool: it may not replace human care and inspiration, but can dramatically reduce the costs of content production and personalized services.

Today, the cost of producing a complete course with AI may be a small fraction of what it used to be: for example, 200,000 yuan can now accomplish courseware development that previously required 50 million yuan. This radical change in cost structure will redefine entry barriers and competitive dynamics for education products: well-funded, technically strong teams can rapidly expand their content territory, while smaller teams also gain opportunities to enter niche markets thanks to lower costs.
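Taking the two figures quoted above at face value, the ratio of new to old courseware cost works out as follows (plain arithmetic on the quoted numbers, no additional data):

```latex
\frac{200{,}000\ \text{yuan}}{50{,}000{,}000\ \text{yuan}} = 0.004 = 0.4\%,
\qquad
\frac{50{,}000{,}000}{200{,}000} = 250
```

In other words, the example implies roughly a 250-fold reduction in content production cost.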

What cannot be ignored is that giants like OpenAI and Google are also deeply embedding general large model capabilities into education through their products, even launching dedicated education versions.

ChatGPT education model

When giant foundation model companies with more powerful technical bases, more massive data, and unmatched brand influence enter the arena, we must ask: where exactly lies the core competitiveness of startups?

The answer probably isn't in technology, at least no longer in prompt engineering or foundation models themselves. Any simple chatbot-style education application risks being marginalized by foundation-model companies.

Some answers are already emerging in entrepreneurs' minds, and reflected in the discussion above. We will continue following this space, and look forward to seeing more genuinely grounded innovative explorations in the future.