ZhenFund-backed "Mindverse" Closes Series A, Bringing Total Funding to Nearly $50 Million, to Build Continuously Learning Agent Models

ZhenFund · June 2, 2026

Let intelligence truly understand every unique soul.

How do you build an advanced agent model that keeps learning?

As the intelligence ceiling of large models keeps getting pushed higher, the definitive answer to "continual learning" still hasn't emerged.

Mindverse recently closed a Series A round led by Meituan, with participation from Yuanhe Puhua, Shokz, Variable Capital, and existing investors, bringing its total funding to nearly $50 million. ZhenFund was the company's first investor, backing it from day one in 2024, and has supported it ever since.

"True agent capability doesn't come from clever prompt engineering — it comes from post-training." Mindverse is one of the few startups betting on the "inside" of the model. Building on top of general-purpose large models, it uses reinforcement learning to teach the model how to get things done through complex, multi-step real-world tasks, transforming it from "knowing a lot" to "getting things done well."

The key to continuous evolution is LoRA (Low-Rank Adaptation). Think of it as attaching countless lightweight "skill modules" to a powerful shared brain: each module uses very few parameters, can be updated independently, and stays isolated from the others. This lets the model keep accumulating memories and capabilities for specific users or scenarios at extremely low cost, rather than being retrained from scratch every time.
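To make the "skill module" picture concrete, here is a minimal sketch of a standard LoRA layer in plain Python (no ML framework, so it runs anywhere). It assumes the common formulation y = Wx + (alpha/r)·B(Ax), with the base weight W frozen and only the small matrices A and B trainable. The class name `LoRALinear`, the initialization scale, and the `alpha` default are illustrative choices, not Mindverse's implementation.

```python
import random
from typing import List

Matrix = List[List[float]]

def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LoRALinear:
    """A frozen base weight W plus a trainable low-rank update B @ A (sketch)."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        self.weight = weight  # shared base weights: never updated
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank  # the usual alpha / r scaling
        # A (rank x d_in) gets a small random init; B (d_out x rank) starts at zero,
        # so a freshly attached adapter does not change the base model's output.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)              # W x
        delta = matvec(self.B, matvec(self.A, x))  # B (A x), without forming B @ A
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        """Count only the adapter's numbers; W is not trained."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

Because B starts at zero, attaching a fresh adapter leaves the base model's behavior unchanged; training then moves only A and B (for a 4x4 weight at rank 1, that is 8 numbers instead of 16).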

Three years ago, when the whole industry's attention was still fixed on pre-training, Mindverse founder Andrew set down a view, in a paper co-authored with Shunyu Yao, that almost no one agreed with at the time:

Agent capability ultimately has to come from model training itself, not from piecing together prompts and frameworks.

Early internal research meeting at Mindverse

Three years later, as industry attention has shifted from pre-training to post-training, the company has found that the path it had been quietly following put it right at the center of the tide. It will soon open-source its trained 750B agent model, which will also be the first reinforcement learning post-training of GLM 5.1 completed by a team other than Zhipu AI itself.

They saw this coming early, yet the team remains small. Mindverse's core R&D team is about 20 people, with members who have worked at DeepSeek, ByteDance Seed, and xAI or studied at Tsinghua University, MIT, and Duke; collectively they have published more than 200 papers at top-tier conferences.

The two founders, Kaijie Chen and Andrew, have worked together since 2018. That year they both took leave from school to start a business, worked on robotics, and ran a lab; they then returned to school separately and reunited in 2023. Andrew steers the technical paradigm, from agent training to advanced agent models. Chief Scientist Xiaoteng Ma brings a decade of reinforcement learning expertise, while Kaijie Chen focuses mainly on business models and on judging user value.

In this conversation with Kaijie Chen, we wanted to understand: How do you use post-training to build a model that's cheap, useful, and keeps growing?

The Second Half of Model Improvement Lies in Post-Training

Q: Over the past year or two, industry attention has visibly shifted from pre-training to post-training. When did post-training truly become important?

Kaijie Chen: Today the boundary between pre-training and post-training is increasingly blurred, with pre-training now also mixing in large amounts of agent trajectory data. But broadly speaking, we can still distinguish them this way: pre-training mainly uses internet data to build a basic understanding of the world, while post-training turns that understanding into concrete capabilities.

The real inflection point probably came when DeepSeek released R1. That was the first time the industry saw reinforcement learning systematically drive improvements in large model capability, and it was also when post-training's standing began to rise rapidly. Before that, post-training might have consumed only 3% to 5% of the compute used for pre-training. Now, the vast majority of model capability progress happens during post-training.

One important reason is that the industry has begun accumulating data that never existed on the internet before. Products like Claude Code generate large numbers of agent trajectories from real tasks; once preserved, these trajectories become an important foundation for continually improving models through post-training.

Q: What does post-training actually solve for models? Is it capability, alignment, or teaching it to "learn to do things"?

Kaijie Chen: It's about enhancing real-world task capability on top of the foundation of "basic understanding of the world." Pre-training gives it knowledge and a worldview, but someone who knows a lot of things isn't the same as someone who can get things done — post-training fills in that second half: how to apply existing understanding in real tasks and apply it correctly.

And there's an even more future-oriented form of this, called continual learning. What we want to create is a training method that lets models evolve and improve at very low cost, learning new knowledge and taking on new tasks, while also forgetting knowledge and tasks that are no longer needed, with performance improving gradually. It doesn't become fixed after training, but continuously updates itself while operating in real-world scenarios.

Q: You decided to solve this through training, and you moved on it very early. The 2023 FireAct paper proposed that "agent capability comes from training, not prompts" — that was still a non-consensus view at the time. Why were you willing to bet on it so early?

Kaijie Chen: This judgment grew out of my second startup after leaving school. At the time we were building AI games with GPT-2- and GPT-3-era models. Their capabilities were very limited, but we already needed to build an AI world that kept changing with user behavior, which essentially meant constructing complex workflows.

During those two years, we saw one problem very clearly. A single step with a 95% success rate seems very reliable, but when you chain more than a dozen steps together, errors compound and ultimately ruin the entire experience. Long-horizon tasks can't rely on prompt stitching alone; the capability ultimately has to be acquired through training. What people today commonly call a trajectory is essentially a continuous sequence of thoughts and actions.
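The compounding effect is easy to check with a line of arithmetic. Assuming each step succeeds independently with the same probability p (a simplification; real failures are often correlated), an n-step chain succeeds end to end with probability p^n. At 95% per step, a 13-step chain succeeds only about 51% of the time, and a 14-step chain drops below half.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independent steps."""
    return p_step ** n_steps

for n in (1, 5, 10, 13, 14, 20):
    print(f"{n:2d} steps: {chain_success(0.95, n):.3f}")
```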

Later, Shunyu Yao proposed ReAct, which organizes thinking and acting into a single continuous trajectory. When we saw it, it resonated with us immediately. From then on, we became increasingly convinced that agent capability would ultimately come back to training itself. After Andrew published the FireAct paper with him, we decided to keep going down this path, found a company, and make it happen.
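A trajectory in the ReAct sense interleaves reasoning ("Thought"), tool use ("Action"), and environment feedback ("Observation"). The sketch below shows one plausible way to store such a trajectory as training data; the field names, the single end-of-episode reward, and the tool-call strings are hypothetical and not taken from FireAct or from Mindverse.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    thought: str       # the model's reasoning before it acts
    action: str        # e.g. a tool call, kept as a plain string here
    observation: str   # what the environment returned

@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)
    reward: float = 0.0  # one outcome reward for the whole episode (an assumption)

    def add(self, thought: str, action: str, observation: str) -> None:
        self.steps.append(Step(thought, action, observation))

    def to_text(self) -> str:
        """Flatten into the interleaved Thought/Action/Observation format."""
        lines = [f"Task: {self.task}"]
        for s in self.steps:
            lines += [f"Thought: {s.thought}",
                      f"Action: {s.action}",
                      f"Observation: {s.observation}"]
        return "\n".join(lines)

# Hypothetical two-step episode using made-up tools.
traj = Trajectory("Find a vegetarian restaurant near the office")
traj.add("I need nearby options first.", "search_places(query='vegetarian')", "3 results")
traj.add("The first result is open now.", "finish('Green Leaf Cafe')", "done")
traj.reward = 1.0
print(traj.to_text())
```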

Q: This path inevitably goes through LoRA. But most people's impression of LoRA still stops at "adding a filter to an image." It's clearly not playing that role for you. How should we re-understand it, and what's its relationship with reinforcement learning?

Kaijie Chen: We initially chose LoRA for very practical reasons — it's an extremely cost-effective training method.

You can think of it as an adapter: instead of updating the entire model, it freezes the original weights and trains a small low-rank update alongside them, using very few parameters to approximate the effect of full-model training. Our earliest cluster had only dozens to a hundred GPUs, and that constraint forced us to squeeze every drop of efficiency out of our compute.

But today LoRA has evolved into the technical foundation for building continual learning — it's what enables model capabilities to be continuously carried and updated.

LoRA and reinforcement learning are two separate things with distinct roles. Reinforcement learning is the primary method in post-training, responsible for actually training out the model's capabilities. At the trillion-parameter scale, both reinforcement learning and adapting LoRA to it are difficult, but both are unavoidable.
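The division of labor described here (reinforcement learning supplies the learning signal, LoRA holds the learned change) can be illustrated with a toy example. The sketch below runs REINFORCE, a basic policy-gradient method, on a two-action problem where the base model's logits stay frozen and only an additive "adapter" offset is updated. The adapter is a plain vector standing in for a LoRA, and the environment, learning rate, and step count are arbitrary assumptions; this is not Mindverse's training recipe.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

base_logits = [0.0, 0.0]  # frozen "base model" preference over two actions
adapter = [0.0, 0.0]      # trainable offset, standing in for a LoRA
rewards = [0.0, 1.0]      # toy environment: only action 1 completes the task
lr = 0.5
rng = random.Random(0)

for _ in range(300):
    probs = softmax([b + a for b, a in zip(base_logits, adapter)])
    action = 0 if rng.random() < probs[0] else 1
    r = rewards[action]
    # REINFORCE (no baseline, for brevity): the gradient of log pi(action)
    # with respect to the logits is onehot(action) - probs.
    for i in range(len(adapter)):
        grad_log = (1.0 if i == action else 0.0) - probs[i]
        adapter[i] += lr * r * grad_log  # only the adapter moves

final = softmax([b + a for b, a in zip(base_logits, adapter)])
print(f"P(action 1) = {final[1]:.3f}; base logits unchanged: {base_logits}")
```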

Q: What was the real turning point in your research? We noticed a subtle detail — almost simultaneously, Thinking Machines in Silicon Valley was working on the exact same thing.

Kaijie Chen: The turning point came around September 2025.

We discovered that using LoRA for reinforcement learning on sufficiently large MoE models caused no performance loss. A lightweight, low-rank way of updating the model achieved the same results as updating all of the model's parameters. That meant we could match full-parameter training at 1/10 the cost. It stopped being a tradeoff between performance and efficiency and became a pure efficiency gain.

Our first reaction to this result was to doubt ourselves. Only when Thinking Machines published "LoRA Without Regret" on September 29th, with conclusions fully consistent with ours, did we feel reassured by the independent confirmation.

By late December, we had completed trillion-parameter LoRA reinforcement learning, publishing around the same time as Thinking Machines. At that point only two organizations in the world could do this; this year, with Fireworks (Cursor's partner on the Composer model) joining them, there are still only three.

Q: You called LoRA "the technical foundation for building continual learning." What does that mean specifically? Why does this LoRA layer become the key component for a model's "continual learning"?

Kaijie Chen: It's a smaller layer on top of the base model. Our upcoming model, for example, is the base model plus this LoRA layer on top; the LoRA layer has roughly 0.5% as many parameters as the base model, though the total grows as you add more LoRAs. Because this layer has so few parameters, it's cheap, easy to train, and scalable.
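How small is "small"? For one weight matrix of shape d_out x d_in, a rank-r LoRA adds r·(d_in + d_out) trainable numbers, against d_out·d_in in the full matrix. The sketch below assumes a hypothetical 4096 x 4096 projection. The whole-model ratio quoted in the answer above also depends on how many matrices receive adapters and at what rank, so it is not derived from this calculation.

```python
def lora_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Trainable LoRA parameters as a fraction of one dense d_out x d_in matrix."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return lora / full

for r in (1, 8, 16, 32):
    print(f"rank {r:2d}: {lora_fraction(4096, 4096, r):.3%} of the matrix")
```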

For example, suppose I serve a financial client and first train their stock and market data into a financial reasoning model. Three months later, many things have happened in financial markets, stock prices have changed — what about this new data?

For OpenAI or Anthropic, folding this data into another round of pre-training would be extremely expensive and difficult to organize. But this financial client only needs to keep training its LoRA on the new data, because the LoRA itself is small enough. A LoRA's size isn't fixed either. It can be made very, very small, a thin slice for every individual person; training on one person's monthly data might cost on the order of tens of dollars. Even the largest LoRA, one that matches full-parameter training, costs only tens or hundreds of thousands of dollars. So there's a great deal of room to adjust: whether an enterprise has little data or a lot, and whether it wants incremental gains or near-pre-training-level improvements such as learning a new programming language, the LoRA can be trained for it. Individual LoRAs are thin, easy to train, additive, and cheap; that's the first level of what LoRA means for continual learning.

Q: If you couldn't use any jargon, how would you explain to an ordinary person what you're actually doing?

Kaijie Chen: We're taking a sufficiently powerful large model base and hanging many "skill modules" on it, allowing one model to simultaneously become thousands or tens of thousands of models each with their own strengths, serving different people, different enterprises, and different scenarios.

That base is the base model, providing the ceiling for general intelligence; those "skill packs" are LoRAs, each carrying a small, specific thing — it could be a user's long-term preferences, a company's business know-how, or the playbook for a certain type of task.

The default approach used to be "one model serves everyone," with everyone sharing the same parameters. What we want to do is the opposite: share one smart base, but let every person and every scenario have their own parameters on top that can keep growing. We call this structure mixture of LoRA.

Q: "Mixture of LoRA" — the name immediately makes people think of MoE, the familiar mixture of experts. Are these two "mixtures" the same thing?

Kaijie Chen: There are things we've learned from MoE, but they're different. In MoE, a single expert can't complete inference on its own; it's more like a computational unit that the model divides internally. But in mixture of LoRA, every LoRA is unique, can be called independently, and corresponds to a clearly defined capability.

For example, say I want to do financial tasks. I can hang 10 LoRAs at once: one learns stock prices, one learns financial reports, one learns risk control... each learns its own thing. Later, if I want to add two new tasks, like tips for Hong Kong IPO lotteries, I don't need to touch the 10 already-trained LoRAs at all. I just train two new LoRAs, hang them up when they're done, and the model's capabilities grow by that much without affecting anything old. That's why we call it a structure "naturally suited for continual learning": its capabilities accumulate piece by piece, instead of requiring the entire model to be retrained every time something new is added, at the risk of forgetting old skills. This is the second level of what LoRA means: the continual expansion enabled by mixture of LoRA.
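The "hang new LoRAs without touching old ones" property can be sketched as a registry of named adapters over one frozen base. This minimal illustration rests on two assumptions the interview does not state: active adapters are combined by simply summing their low-rank updates, and the caller selects adapters by name rather than through a learned router (which is also the main contrast with MoE, where a router picks experts internally). Adapter names are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]

def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class MixtureOfLoRA:
    """One frozen base weight shared by many named, independently callable adapters."""

    def __init__(self, base: Matrix):
        self.base = base
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}  # name -> (B, A)

    def register(self, name: str, B: Matrix, A: Matrix) -> None:
        if name in self.adapters:
            raise ValueError(f"adapter {name!r} already exists")
        self.adapters[name] = (B, A)  # existing adapters are never modified

    def forward(self, x: List[float], active: List[str]) -> List[float]:
        out = matvec(self.base, x)
        for name in active:  # low-rank updates of the selected adapters are summed
            B, A = self.adapters[name]
            delta = matvec(B, matvec(A, x))
            out = [o + d for o, d in zip(out, delta)]
        return out

model = MixtureOfLoRA([[1.0, 0.0], [0.0, 1.0]])
model.register("stock_prices", [[1.0], [0.0]], [[1.0, 0.0]])
before = model.forward([1.0, 2.0], ["stock_prices"])
model.register("hk_ipo", [[0.0], [1.0]], [[0.0, 1.0]])  # added later
after = model.forward([1.0, 2.0], ["stock_prices"])      # old behavior unchanged
```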

We're also exploring more distant possibilities, like letting LoRAs negotiate and collaborate with each other. Once we have this mixture of LoRA architecture, we'll pay attention to how different LoRAs cooperate, and whether the model's diversity brings better results.

Q: This structure materializes into something concrete — the model you're about to release?

Kaijie Chen: We'll soon open-source our trained model, which natively supports mixture of LoRA. It's a 750B parameter Agent model: 744B of pre-trained GLM 5.1 plus 6B of LoRA. We should be the first team besides Zhipu AI itself to complete reinforcement learning post-training on GLM 5.1.

Doing LoRA reinforcement learning on GLM 5.1 involves real engineering barriers: you need to adapt DSA (DeepSeek Sparse Attention) and MTP (Multi-Token Prediction). This model isn't chasing a general base model that "knows everything." It has been deeply post-trained specifically for agent scenarios, mainly generative UI coding, everyday-life chat, long-chain reasoning, and tool calling.

Q: You're defining the new model as an Agent Model. How should we understand this term? Is everything everyone invests in post-training ultimately for this?

Kaijie Chen: The latest frontier models are all agent-oriented models.

Take Claude: after it launched Claude Code, model training started using Claude Code data, which is completely different from how we normally use Doubao with its "ask one question, get one answer" pattern. In Claude Code, writing a piece of code is a very long task with lots of interaction in between — it's long-chain data. After training on this data, Claude becomes increasingly "agent native," increasingly adapted to agent architecture, because it was trained on this data in the first place. So models and application scenarios reinforce each other; everyone is evolving in this direction, just at different paces.

We're doing the same thing, just with the scenario placed in everyday life. Macaron is our agent harness. In life scenarios, there are similarly many complex tool calls, code executions, and lots of fuzzy requests where users themselves don't know what they want. We string these into continuous task chains, letting the model get better through training along these chains to improve agent performance.

When we say agent model, we mean: this model is trained to be used in a multi-round agent environment, specifically optimized for this environment. It's still a model, but trained for agent tasks.

What sets us apart is that there are almost no models on the market specifically optimized for agent workflows. Many domestic open-source models are still catching up to the most advanced generation of GPT and Claude, so most of their energy still goes into pre-training, into catching up first; they probably don't yet have the bandwidth to do the agent part of post-training particularly well.

Anthropic is gradually doing this with Claude, and doing it very well, but they have many more priorities to manage. We train models specifically for agent tasks and make them better at agent work: tool calling, memory retrieval, knowing when to hand a task back to the user, and knowing when to keep thinking across multiple rounds.

In the Model Era, Time Is the Biggest Moat

Q: People first came to know Mindverse through Macaron. You mentioned that Macaron isn't just a consumer product but the model's agent harness. Can you explain specifically how the models and the products feed each other? How is this different from what people usually call "training models on user data"?

Kaijie Chen: From the very beginning, we've looked at model training and C-end application iteration as one thing. It's not as simple as "build the model first, then use the product to collect some data" — it's a bidirectional loop.

But we differ from many others in one key way: we don't train directly on user data. Privacy in daily life is just as important as privacy at work, yet many others train directly on user data. Our approach is to use user feedback to understand the distributions and characteristics of the data, then build our own simulation environment and train the model inside it. We deliberately add lots of noise, interference, and extreme cases, because real user behavior is already very extreme: users interrupt midway, change goals, and give wrong or outdated information. A model trained in this environment can withstand the situations agents actually encounter. And post-training needs very little data: tens of thousands or hundreds of thousands of samples is already a meaningful scale. Unlike pre-training, it doesn't need massive volume; what matters more is extremely high data quality.
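The noise-injection idea can be sketched as a small user simulator that, with some probability per turn, interrupts, switches goals, or supplies stale information. Everything here (the `simulate_user` function, the turn types, the canned messages, and the single `noise` knob) is a hypothetical illustration of the technique, not Mindverse's environment; a real simulator would generate far richer behavior.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    kind: str  # "request", "interrupt", "goal_change", or "stale_info"
    text: str

def simulate_user(goal: str, alt_goals: List[str], n_turns: int,
                  noise: float, seed: int = 0) -> List[Turn]:
    """Generate a scripted user with injected noise (hypothetical simulator)."""
    rng = random.Random(seed)
    turns = [Turn("request", goal)]
    for _ in range(n_turns - 1):
        if rng.random() >= noise:  # a well-behaved turn
            turns.append(Turn("request", f"follow-up on: {goal}"))
            continue
        kind = rng.choice(["interrupt", "goal_change", "stale_info"])
        if kind == "goal_change":
            goal = rng.choice(alt_goals)  # the user changes their mind midway
            turns.append(Turn(kind, goal))
        elif kind == "interrupt":
            turns.append(Turn(kind, "wait, hold on"))
        else:
            turns.append(Turn(kind, "use the address I gave you last year"))
    return turns

script = simulate_user("plan a trip to Kyoto", ["plan a trip to Osaka"],
                       n_turns=10, noise=1.0, seed=1)
```

Turning `noise` up or down lets the same task be replayed from clean to adversarial, which is one way to cover the extreme cases the answer describes.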

Conversely, the model also feeds the product. These trained capabilities are deployed directly back into Macaron after training. The ceiling of product experience is fundamentally determined by model capabilities. This is the same logic as Anthropic: Claude's training directly serves Claude Code, and what runs in Claude Code flows back to train the model — just our scenario is everyday life.

Macaron's significance for us isn't having one more product entry point, but providing the model with a real, long-term agent harness and training environment that continuously generates feedback. Macaron now has over 2 million users and more than 100,000 DAU.

Q: You place a lot of emphasis on "generative UI." Isn't it enough for the model to explain answers clearly? Why must it know how to "draw interfaces"?

Kaijie Chen: Having the model return everything to you as text isn't actually a good form of expression.

Humans are inherently visual animals; our perception of graphics is significantly better than our perception of text. For the same information, showing a chart is definitely clearer than writing those numbers into a long paragraph — what this saves is your cognitive load. What Google presented at I/O about omni means the same thing: models should deliver results in richer forms, rather than always dumping a pile of text for you to digest yourself.

On A2UI, the generative UI standard Google defined, the state of the art isn't measured only by whether the model can generate UI, but by how much cognitive load the generated interface removes for the user. This is especially critical in daily-life scenarios: if you ask "what should I eat today," getting a few directly clickable option cards is a completely different experience from getting 300 words of text. Whether a model can "speak properly" directly determines the consumer experience.

Q: The benchmarks you've published for the model are also quite interesting — you achieved SOTA on life tasks, but for hardcore tasks like code and math, you explicitly said you'll approach but not chase first place. That trade-off itself is a statement, right?

Kaijie Chen: This choice itself says what kind of company we are.

We particularly agree with Shunyu Yao's view in "The Next Phase of AI": going forward, benchmarks may be the most important part of model training, because what benchmarks you choose is what tasks you want the model to get stronger at.

We picked four: Living Bench is our own definition; Vita Bench is from Meituan — these two target life-category long-chain tasks, like trip planning, which sounds simple but actually involves many steps and personal preferences; A2UI is Google's proposed generative UI standard; PinchBench is a benchmark commonly used overseas to characterize agent task performance for things like OpenClaw. We achieved SOTA on all four.

Customer service, coding, pure math — these traditional tasks matter to us too, but they're not where we most want to fight for first place. We'll approach the best open-source levels, but won't compete for first there. Put simply, we don't want to make a general model that aces every test; we want to make the best agent model at "getting complex things done in real life."

But from another angle, our entire training framework is reusable across multiple scenarios. Through this first model release, we're essentially validating that the "base large model + skill packs" path works in complex long-chain tasks. So facing broader enterprise vertical demands, we don't need to train models from scratch — we just need to quickly enhance specialized skills for corresponding scenarios based on the same base, covering new benchmarks at extremely low marginal cost.

Q: We heard you can cut post-training costs by 10x without sacrificing performance. Where exactly do those savings come from, and what's hard about doing this at trillion-parameter scale?

Kaijie Chen: The savings come from not having to replicate an entire massive model for every user and every scenario.

Think of it this way: if you wanted to deploy a full trillion-parameter model for thousands of people, you'd be copying the same colossal thing thousands of times over. The compute required would be astronomical — economically impossible. But in our architecture, those thousands of models share one base, each carrying only a tiny LoRA. The compute needed barely increases compared to deploying a single model. What you're saving is those thousands of redundant base copies.
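The saving described here can be estimated by counting stored parameters. The sketch below compares N full copies against one shared base plus N adapters. The 744B base size comes from the model described elsewhere in this interview; the 50M-parameter adapter and N = 1,000 are hypothetical, and the count ignores activations, KV cache, and batching, so it illustrates scale rather than measuring real serving cost.

```python
def deployment_params(n_users: int, base_params: float, adapter_params: float,
                      shared_base: bool) -> float:
    """Total parameters held in memory for n_users personalized models."""
    if shared_base:
        return base_params + n_users * adapter_params  # one base, many adapters
    return n_users * (base_params + adapter_params)    # one full copy per user

BASE = 744e9     # base parameters (figure quoted in the interview)
ADAPTER = 50e6   # hypothetical per-user adapter size
N = 1000         # hypothetical number of users

copies = deployment_params(N, BASE, ADAPTER, shared_base=False)
shared = deployment_params(N, BASE, ADAPTER, shared_base=True)
print(f"full copies: {copies:.3e} params, shared base: {shared:.3e} params")
print(f"ratio: {copies / shared:.0f}x")
```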

As for why "bigger means harder" — the difficulty doesn't scale linearly, it hits you as a series of engineering thresholds. Slapping a LoRA on a small model is nothing special. But stably training near-trillion-parameter models, and simultaneously deploying hundreds or thousands of LoRAs, requires an entire systems engineering stack: rewriting operators, managing VRAM, keeping training and inference consistent, loading and switching between millions of skills, isolating between multiple clients... every single one becomes a hard problem at this scale.

In China, we may be the only ones doing LoRA training at this size. We're even pushing toward the extreme low end: traditional LoRAs typically use rank 16 or 32, but we're researching algorithms at rank 1, or with even fewer parameters than rank 1, because a lot of personalization doesn't actually need to store that much information. The smaller the skill pack, the better the cost-performance, and the more of them you can hang on a single base.
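At rank 1, a LoRA update collapses to an outer product u·v^T of two vectors, so each adapter stores only d_out + d_in numbers and can be applied without ever forming the full matrix. The sketch below assumes plain Python lists; the function names are illustrative, and the even smaller variants mentioned above are not covered.

```python
from typing import List

def rank1_apply(W: List[List[float]], u: List[float], v: List[float],
                x: List[float]) -> List[float]:
    """Compute (W + u v^T) x without materializing the d_out x d_in update."""
    base = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    vx = sum(vi * xi for vi, xi in zip(v, x))      # scalar v^T x
    return [b + ui * vx for b, ui in zip(base, u)]  # add u * (v^T x)

def adapter_size(d_out: int, d_in: int, rank: int) -> int:
    """Stored numbers for one rank-r adapter on a d_out x d_in matrix."""
    return rank * (d_out + d_in)
```

For a 4096 x 4096 matrix, a rank-1 adapter stores 8,192 numbers versus 131,072 at rank 16, which is why shrinking the rank directly multiplies how many adapters fit in the same memory.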

Q: "Quantity" is a key word here. In December you could hang 10 LoRAs on one base; now you're talking millions. What enabled that leap? And does "number of models" itself become a new scaling dimension?

Kaijie Chen: Two things.

First, making LoRAs smaller and smaller: the rank-1 approach I just mentioned makes each one easier to carry. Second, better caching. Where others might use three cache layers, we added a fourth, plus a lot of parallel processing. So it's not millions activated simultaneously; it's millions that can each be activated extremely fast, within roughly a second. A request comes in, hits a LoRA that isn't active yet, and still gets a response within a second. So the idea that "one batch can only handle dozens" isn't really a limitation; it depends on resources. If you want to deploy millions, just add more GPUs.
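The interview does not describe Mindverse's cache design beyond the extra layer, so the following is only a generic sketch of the technique: a multi-tier LRU cache where a hit is promoted to the fastest tier and evictions cascade down one tier at a time, backed by a store that always holds every adapter. Tier capacities and the `TieredAdapterCache` name are hypothetical; in practice the tiers might correspond to GPU memory, host memory, and local disk.

```python
from collections import OrderedDict
from typing import Dict, List

class TieredAdapterCache:
    """Multi-tier LRU cache for adapter weights (hypothetical design).

    Tier 0 is the fastest and smallest. Anything evicted from the last tier
    is simply dropped, because the backing store always keeps every adapter.
    """

    def __init__(self, capacities: List[int], store: Dict[str, bytes]):
        self.capacities = capacities
        self.tiers: List[OrderedDict] = [OrderedDict() for _ in capacities]
        self.store = store  # backing storage: adapter name -> serialized weights

    def get(self, name: str) -> bytes:
        for tier in self.tiers:
            if name in tier:
                weights = tier.pop(name)
                self._insert(0, name, weights)  # promote a hit to the fastest tier
                return weights
        weights = self.store[name]  # missed every tier: load from backing storage
        self._insert(0, name, weights)
        return weights

    def _insert(self, level: int, name: str, weights: bytes) -> None:
        if level >= len(self.tiers):
            return  # fell off the last tier; the backing store still has it
        tier = self.tiers[level]
        tier[name] = weights
        tier.move_to_end(name)  # mark as most recently used
        if len(tier) > self.capacities[level]:
            old_name, old_weights = tier.popitem(last=False)  # least recently used
            self._insert(level + 1, old_name, old_weights)    # demote one tier down

    def tier_of(self, name: str) -> int:
        """Return the tier index holding `name`, or -1 if it is only in storage."""
        for level, tier in enumerate(self.tiers):
            if name in tier:
                return level
        return -1
```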

The idea that the number of models itself becomes a scaling dimension is genuinely exciting for us. The main thread of large model scaling has been making one model bigger and bigger; the agent era adds another line: scaling the number of models too.

We've validated that it works. The more models you hang, the more overall intelligence rises, steadily and roughly linearly in the natural log of the number of models. That was a pretty shocking finding for us. So we can have one per person, one per company, or one per task domain.
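One way to formalize the log-linear trend described above is the fit below, where S is an overall score and N is the number of attached models; a and b are fitted constants that the interview does not report. Its practical meaning is that each doubling of N adds about the same increment, so going from 10 to a million models is worth roughly 11.5 times b.

```latex
S(N) \approx a + b \ln N
\quad\Longrightarrow\quad
S(2N) - S(N) \approx b \ln 2,
\qquad
S(10^{6}) - S(10) \approx b \ln 10^{5} \approx 11.5\, b
```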

Q: You say only three companies globally can do this, but this sounds more like "got there first." If a major tech company commits fully, even builds their own LoRA post-training architecture, could they do it? What's your real moat?

Kaijie Chen: In large models, time itself is a moat.

Look at OpenAI and Anthropic: there's no real barrier between them, no "can do" versus "can't do." They share the same technical foundations, and people move back and forth. Today's AI is a constant cycle of forming consensus, chasing consensus, and forming new consensus. From the question of whether any consensus existed at all, to reinforcement learning, to o1 and R1, then to agents, everyone takes turns leading. The real difference lies in that alternation: who builds it first, who moves faster, and who can first form a loop with users and B2B customers and lock in value.

But we have accumulated some things others can't easily bypass. One is genuine engineering depth and industry recognition. We're building AReaL-MinT with the open-source community alongside Ant Group and Huawei, and verl-mint with ByteDance and NVIDIA — the two main reinforcement learning frameworks domestically — both integrating our LoRA technology. NVIDIA featured us on their homepage. This isn't PR; it's people actually using us at the foundation layer.

Another is our starting point for looking at problems. Major tech companies typically build models top-down from pre-training, from data and infrastructure. We work backward from user needs, from problems that emerge in real products. This insight that grows out of product is something people only training models in labs can never get.

Q: Where specifically do these partnerships with major tech companies land? Following the money — what's your commercial logic? You serve cloud providers at the infrastructure layer while also building your own products. Don't those conflict?

Kaijie Chen: The partnerships operate at several levels.

With NVIDIA, it's bidirectional technical co-building in the open-source community — we write the operators, we build the foundation platform together. With ByteDance and Ant Group, it's co-building reinforcement learning frameworks in open source; we use their platforms and contribute our efficient training methods back. Moving up to the business layer: because we have efficient concurrent training and inference infrastructure that can cut customer training costs by an order of magnitude — roughly to 1/10th — we've formed partnerships with Huawei Cloud, Microsoft Cloud, Alibaba Cloud, Volcano Engine, and others. With Huawei, it's a deep strategic partnership.

As for whether they conflict, we're pretty clear-minded: we don't want to become a purely commercialized company. If a direction needs to become large-scale service requiring heavy investment, we'd rather let platform partners like Huawei Cloud and Microsoft Cloud scale it, while we stay focused on the technology itself. So "serving cloud providers while also building our own products" isn't left-hand fighting right-hand — it's division of labor. They do scale; we do the frontier. On the consumer side, it's mainly Macaron. For where we are today, getting the backend technology good enough matters more than rushing to revenue. When the technology is truly there, demand naturally finds you.

"Model memory shouldn't be a notebook — it should live in parameters"

Q: When a base carries thousands of LoRAs, what new things start happening between models?

Kaijie Chen: Division of labor and collaboration start emerging. Andrew shared an analogy that really excited me — he feels we're making models grow "biology."

Before biology existed, there was only chemistry: atoms and molecules. The critical transition from chemistry to life was the cell membrane. It defined the boundary between inside and outside, establishing the essence of a living organism. In AI, we call this boundary Isolation. Each LoRA is an independent unit, like an individual wrapped in a cell membrane.

Previous models only had "physics and chemistry" — competing on parameter count, data volume, compute. But when you can isolate models from each other while letting them exchange information efficiently, it's like the move from single-celled to multicellular life. Division of labor naturally forms, followed by heredity and evolution. AI's trajectory is stepping from pure chemistry into the long river of biological evolution.

Q: But Isolation sounds like a very "engineering" word, even a bit mundane. Why elevate it to such importance?

Kaijie Chen: Precisely because it looks mundane, it's easily underestimated.

When people talk about the future of memory, they usually fixate on two fancy directions: better model architectures, more efficient algorithms. Isolation ranks third — it sounds like dirty work, just "keeping data separate." But as I said, that leap from chemistry to biology depended on the cell membrane — this "isolation."

And Isolation isn't just a technical problem; it's the precondition for this whole thing to actually enter society. There are walls between enterprises. One company cannot and will not hand over its long-term memory to be kneaded together with others' into one unified large model.

The same applies between people — if one model holds both my long-term memory and yours, I could just ask it and extract your entire privacy. That's terrifying. Each person's, each enterprise's memory must be cleanly separated. LoRA's "one base, countless independent skill packs" is, for now, a very good way to achieve this isolation.
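At the software level, the simplest form of this isolation is to scope every adapter lookup by its owner, so one tenant's request has no path to another tenant's parameters. The sketch below is a hypothetical illustration of that lookup boundary only; a production system would also need authentication, encryption, and audit controls that are out of scope here.

```python
from typing import Dict, Tuple

class IsolatedAdapterStore:
    """Adapters keyed by (owner, name); lookups never cross owners."""

    def __init__(self) -> None:
        self._adapters: Dict[Tuple[str, str], bytes] = {}

    def save(self, owner: str, name: str, weights: bytes) -> None:
        self._adapters[(owner, name)] = weights

    def load(self, owner: str, name: str) -> bytes:
        # `owner` should come from the authenticated request, never from user
        # input. Because it is part of the key, no call can return another
        # owner's adapter.
        key = (owner, name)
        if key not in self._adapters:
            raise KeyError(f"no adapter {name!r} for {owner!r}")
        return self._adapters[key]

store = IsolatedAdapterStore()
store.save("alice", "memory", b"alice-weights")
store.save("bob", "memory", b"bob-weights")
store.save("alice", "finance", b"alice-finance")
```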

Q: Why are you convinced that the base large model itself can't solve "memory" and "personalization," that you need a mechanism like LoRA to fill the gap?

Kaijie Chen: Because today's mainstream memory approach is essentially writing things into an external document or database — you can think of it as a constantly lengthening notebook hanging next to the model. It remembers facts and context.

This works well at first; the model understands you better with use. But it has an unavoidable flaw: this notebook only grows, never shrinks. The more it records, the lower the probability that the model can actually "read into its brain" the specific thing you need right now — because the model's real reading window is limited. Past a certain threshold, the experience starts degrading. Consumers haven't used a product that gets "worse with use" in a long time. WeChat gets better the more you use it, because you accumulate more friends. But a notebook-style memory assistant might start getting dumber by week three.
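The "notebook only grows" problem can be made concrete with a deliberately crude model: if the model can read at most K notes at once (because of its context window or a top-K retrieval step) and the note you need is equally likely to be any of N notes, the chance that it is in view is min(1, K/N). The uniform-relevance assumption and the window and note sizes below are hypothetical; real retrieval ranks notes rather than picking them at random, so this is a pessimistic baseline.

```python
def fraction_in_view(n_notes: int, window_tokens: int, tokens_per_note: int) -> float:
    """Chance a needed note fits in the window, assuming each note is equally likely."""
    if n_notes == 0:
        return 1.0
    capacity = window_tokens // tokens_per_note  # how many notes fit at once
    return min(1.0, capacity / n_notes)

# Hypothetical: an 8,000-token memory budget, 100-token notes, notes doubling weekly.
for week, notes in enumerate([40, 80, 160, 320], start=1):
    print(f"week {week}: {notes:3d} notes, {fraction_in_view(notes, 8000, 100):.0%} in view")
```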

Our judgment is that true long-term memory shouldn't be written in an external notebook — it should be "trained into parameters." What's written into prompts or documents is temporary, external; what's trained into parameters is something the model itself grows, a stable capability. LoRA happens to be the right tool for this — it distills your preferences, habits, and ways of interacting into a small slice of model parameters, rather than a piece of text that could get pushed out of the window at any time.

Q: Under this "parametric memory" direction, we noticed you actually have more than just the LoRA line — there's also something called δ-mem. One is an offline-trained parameter skill pack; the other is a real-time-updating online memory matrix. How do these two divide labor in your memory system? Or are you yourselves betting on which one is right?

Kaijie Chen: These two aren't as opposed as people think. δ-mem also grew out of the LoRA approach — fundamentally it's doing the same thing, sinking memory into parameters rather than hanging it outside. It's just that in our R&D process, some innovative architectural ideas emerged, so we built it out, and it turned out to work pretty well.

Q: If in three to five years, the general base model becomes strong enough to directly understand every user, does your whole "hang a LoRA for each person" approach become meaningless?

Kaijie Chen: I don't think so, and the reason is precisely Isolation, which I mentioned earlier.

The most fundamental point is that each person's data, experience, and life history are stored separately. My data and another person's are hard to mix, and shouldn't be mixed, to train a single model that then serves all of us well.

The base model will definitely keep getting smarter, but each person's unique experiences will still need to be supported by data that belongs to them alone — and that data will ultimately settle into parameters and model layers that are yours. So even as the foundation grows stronger, the need for "each individual to have their own isolated slice of parameters" won't disappear; if anything, it becomes more essential. A stronger base just makes every personalized skill module hanging off it more valuable, not less.

Q: Another hot term these past two years is "harness": wrapping a model in a framework that supplies its environment and memory. Wouldn't "general model + harness" be enough? Do you even need your "general model + LoRA" approach?

Kaijie Chen: We actually build harnesses ourselves, and because we do harness and model training together, we have more room to do this well.

On "post-training plus harness," we're pretty much on par with the best teams out there. At the same time, we've chosen our own direction: everyday life, meaning clothing, food, housing, and transportation, the long-running themes of living. In this direction, by putting model training, post-training, continual-learning LoRAs, and the harness together, I believe we can create the most distinctive and valuable product experience.

So the development of harnesses is good for us, because we can train models specifically for a harness, which a lot of teams can't do. To be specific: our product has a dedicated model. You casually record and share fragments of your life, and it gets to know you better and better, recommending restaurants, workout plans, weight-loss plans, and what to buy for your kids, with increasing accuracy. This experience requires the model and the harness to work together. OpenAI, for instance, wouldn't train a dedicated harness and a dedicated model specifically for this. That's our opportunity: putting product form and model training together.

Q: If the LoRA path doesn't produce expected results in one or two years, or three to five years, would you pivot to something else? Or are you all-in on LoRA?

Kaijie Chen: Two things haven't changed in three years. First, from day one we've insisted on using training methods to improve agent capabilities. Second, research and product do co-design together: real products provide real tasks and real failure cases, which we feed back to train the model. Today you rarely see excellent model companies without their own products; conversely, it's also quite hard to build an excellent product without your own model.

Q: How do you define what kind of company you are? Would you straight up call yourselves a "model company"? Compared to Moonshot AI, Zhipu AI, what's the difference?

Kaijie Chen: We've become a Frontier Lab that builds Agent models.

But this is somewhat different from the model companies people are familiar with. Moonshot AI, Zhipu AI — these are more starting from pre-training, from data and infrastructure, to build general base models. We start from user needs, from problems that emerge in real products, to do post-training and continuous learning. Put bluntly: others have the model first then find the scene; we work backwards from the scene to the model.

This naturally leads to certain characteristics. Post-training is inherently closer to users: you have to understand the data to do post-training well. Pre-training learns from the internet, learning human knowledge; post-training learns scenarios, learning how to interact better within a scenario. Even company scale differs: pre-training and post-training differ in required compute by roughly half an order of magnitude to a full one, about three to ten times, and the resulting company scale differs too. In China, companies training models from this perspective are probably quite rare.

Outsiders sometimes call this form a Neolab: not a traditional lab, but a new kind of AI company organization with young teams and high talent density, whose goal is not to package an AI application but to keep probing the technical ceiling. Overseas, Thinking Machines Lab, Ilya's SSI, and Fei-Fei Li's World Labs have this quality; domestically it's still relatively rare. We're roughly this form. In technical depth we have a lot in common with them, but we started on product and model earlier.

Q: When did you clearly decide "we want to be a post-training company"? What was the biggest struggle in between, and how did you decide?

Kaijie Chen: When the company was born, Andrew's paper planted the seed. It's called FireAct: Toward Language Agent Fine-tuning, which is about moving toward post-training for large language model agents. But making it solid was hard: you had to rally researchers, have enough compute and funding to support exploration, and find answers on application direction; otherwise you'd be training in a vacuum. Over these two and a half years, the work has mostly been about turning that idea into reality.

Deciding to do large-scale reinforcement learning was really hard. When we started, there were maybe only four or five companies in China doing it: DeepSeek, Moonshot AI, ByteDance, Alibaba, and us. Committing was hard back then: we didn't have much money or many people, yet we were taking on something this difficult. But without reinforcement learning you can't do post-training, so in the end we bit the bullet and did it. Looking back today, it was the right choice.

I could grit my teeth because we were convinced we're a post-training company. Our view of what kind of company to build has been consistent: it should be a successful company with real technical value.

Q: Now high-performance general models are increasingly closed-source, and you need sufficiently large models to work well. If in the future models are all closed-source and you even become a model purchaser, how much profit margin is left?

Kaijie Chen: I think there will always be open-source models. Right now the gap between open and closed source isn't large. If one day the difference becomes huge, things might change. But I think China will continue to have good open-source models; that won't change. As long as there's a number two, players will still tend toward open source. If everything really does go closed source and we have to buy access, then weighing cost-effectiveness against the value we generate by serving users would become a question for our future business model. The company isn't at the stage of thinking about this yet. It's also possible that in that scenario we'd do something like what Microsoft and OpenAI did early on, deeply partnering with one company; that's not impossible.

Q: Three years from now, how do you want people to remember Mindverse? Have you thought about the endgame — IPO, acquisition, or something else?

Kaijie Chen: The endgame in our minds is a flywheel between agent model and consumer product. Our technology drives product experiences that others can't build, including hardware and other form factors, and we're working with some companies on this. At the same time, this training and deployment capability will serve more and more enterprises; the enterprise (2B) line is also growing fast. Looking further out, this industry's endgame might be astronomical amounts of compute deployed in space, consuming more power than a country currently generates. That's a distant vision. Everything else is process.

Q: If you had to choose between "making a history-changing research breakthrough" and "building a sustainably profitable company," which side do you pick?

Kaijie Chen: We'd choose the research breakthrough side. Not that we don't care about profit — we believe that if you truly solve problems others can't solve technically, business will naturally come to you. The reverse doesn't hold.

If this path succeeds, an ordinary person's life would have fewer worries and more grounded happiness. But everyone's circumstances differ, and a thousand people have a thousand kinds of joys and sorrows. That is precisely the full meaning of "personalization": not giving everyone a smarter model, but letting intelligence truly understand every unique soul.