"Xinzhou Technology" Closes $50 Million Series A Led by Meituan | Linear Portfolio
An agent's capabilities ultimately come down to the underlying model training.

Mindverse has announced the completion of its Series A funding round led by Meituan, with participation from Yuanhe Puhua, Shokz, Variable Capital, and follow-on investments from existing shareholders. The company's historical investors include top-tier funds such as Ant Group, HSG, Linear Capital, ZhenFund, and Gaorong Ventures. The funding round totaled nearly $50 million.
Building on general-purpose large models, the company uses reinforcement learning to teach them how to get things done through complex, multi-step real-world tasks — transforming models from "knowing a lot" to "getting things done well."
In a recent conversation with GeekPark, founder Kaijie Chen explained how post-training methods can produce a model that's cheap, capable, and keeps improving over time.
How do you build an advanced agent model that keeps learning? As the ceiling for large model intelligence keeps getting pushed higher, the definitive answer to "continual learning" still hasn't emerged.
Mindverse may be one of the few startups betting on what's happening inside the model itself. Building on general-purpose large models, it uses reinforcement learning to teach them how to get things done through complex, multi-step real-world tasks, transforming models from "knowing a lot" to "getting things done well."
The key to continuous evolution is LoRA (low-rank adaptation). Think of it as hanging countless lightweight "skill modules" on a powerful shared brain: each module uses only a tiny fraction of the parameters, yet can be updated independently and kept isolated from the others. This lets the model keep accumulating memories and capabilities for specific users or scenarios at minimal cost, rather than retraining from scratch every time.
Three years ago, when the entire industry's attention was still fixated on pre-training, Mindverse founder Andrew made a claim, in a paper co-authored with Shunyu Yao, that almost nobody agreed with:
Agent capabilities must ultimately come from model training itself, not from piecing together prompts and frameworks.
Early Mindverse internal research meeting
Three years later, as industry attention has shifted from pre-training to post-training, the company discovered that the path it had been quietly walking had placed it right at the center of the tide. It will soon open-source its trained 750B-parameter agent model, which will also be the world's first reinforcement learning post-training result completed on GLM 5.1.
They saw this coming early, yet the team remains small. Mindverse's core R&D team numbers about 20 people, drawn from DeepSeek, ByteDance Seed, and xAI, with academic backgrounds at Tsinghua University, MIT, and Duke. Together they have published over 200 top-tier conference papers.
The two founders, Kaijie Chen and Andrew, dropped out of school together to start a company back in 2018. They worked on robotics, ran a lab, then each returned to school before coming back together in 2023. Andrew identified the shift in technical paradigm, from agent training to advanced agent models. Chief Scientist Xiaoteng Ma brought a decade of reinforcement learning expertise. Kaijie Chen focuses primarily on the business model and on judging user value.
In this conversation with Kaijie Chen, we wanted to understand: How do you use post-training to produce a model that's cheap, capable, and keeps improving? Here is GeekPark's conversation with founder Kaijie Chen:

Q: Over the past year or two, industry attention has visibly shifted from pre-training to post-training. When did post-training really become important?
Kaijie Chen: Today the boundary between pre-training and post-training is increasingly blurred, with pre-training itself incorporating large amounts of agent trajectory data. But broadly speaking, the distinction still holds: pre-training primarily uses internet data to build a basic understanding of the world, while post-training converts that understanding into concrete capabilities.
The real inflection point probably came with DeepSeek's release of R1. That was the first time the industry saw reinforcement learning systematically drive large model capability improvements, and it marked the beginning of post-training's rapid rise in importance. Before that, post-training might have consumed only 3% to 5% of the compute used for pre-training. Now, the vast majority of model capability improvements happen during post-training.
One important reason is that the industry has begun accumulating data that didn't previously exist on the internet. Products like Claude Code are generating massive amounts of agent trajectories from real tasks, which are being captured and becoming an important foundation for driving continuous post-training evolution.
Q: What does post-training actually solve for models? Is it capability, alignment, or "learning to do things"?
Kaijie Chen: It's about enhancing real-world task capability on top of the foundation of "basic understanding of the world." Pre-training gives it knowledge and a worldview, but someone who knows a lot isn't the same as someone who can get things done — post-training fills in that second half: how to apply existing knowledge correctly in real tasks.
And there's an even more future-oriented form of this, called continual learning. What we want is a training method that lets models evolve and improve at very low cost, learning new knowledge and taking on new tasks, while also forgetting knowledge and tasks that are no longer needed — allowing performance to improve gradually. It doesn't freeze after training; it keeps updating itself while operating in real-world scenarios.
Q: You committed to solving this through training quite early. In 2023, FireAct proposed that "agent capabilities come from training, not prompts" — that was a non-consensus view at the time. Why were you willing to bet on it so early?
Kaijie Chen: This judgment came from my second entrepreneurial experience after leaving school. At the time we were building AI games, using GPT-2 and GPT-3 era models. The capabilities were quite limited, but we already needed to build an AI world that constantly evolved with user behavior — essentially constructing complex workflows.
During those two years, we clearly saw one problem. A single step with 95% success rate seems quite high, but when you chain a dozen steps together, errors compound and ultimately destroy the entire experience. Long-horizon tasks cannot rely solely on prompt engineering; capabilities must ultimately be acquired through training. What people today call a trajectory is essentially a continuous thought-and-action sequence.
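To make the compounding concrete, here is a minimal sketch. It assumes each step succeeds independently with the same probability, which is a simplification the interview does not state; the function name is illustrative. Under that assumption, a 12-step chain at 95% per step completes end to end only about 54% of the time.

```python
def chain_success_rate(step_success: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes the steps are independent and share one success rate,
    which is a simplification made for illustration.
    """
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_success ** steps


if __name__ == "__main__":
    # Show how a high per-step rate erodes as the chain gets longer.
    for n in (1, 5, 12, 20):
        rate = chain_success_rate(0.95, n)
        print(f"{n:>2} steps at 95% each: {rate:.1%} end-to-end")
```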
Later, Shunyu Yao proposed ReAct, organizing thinking and action into a continuous trajectory. When we saw this, it strongly resonated with us. From that point on, we became increasingly convinced that agent capabilities would ultimately come back to training itself. After Andrew published the FireAct paper with him, we decided to continue down this path, start a company, and make it happen.
Q: This path inevitably goes through LoRA. But most people's impression of LoRA still stops at "adding a filter to an image." It's clearly not playing that role for you. How should we re-understand it, and what's its relationship with reinforcement learning?
Kaijie Chen: We chose LoRA initially for a very practical reason — it's an extremely cost-effective training method. You can think of it as an adaptive adapter: instead of touching the entire model, it extracts the most critical parameters and trains those, achieving the full model's training effect with very few parameters. Because we only had dozens or at most a hundred cards in our cluster, this constraint forced us to squeeze every drop of efficiency from our compute.
But today LoRA has evolved into the technical foundation for building continual learning — it's what enables model capabilities to be continuously carried and updated. It and reinforcement learning are actually two separate things with distinct roles. Reinforcement learning is the primary method in post-training, responsible for actually training out the model's capabilities. At the trillion-parameter scale, both making reinforcement learning work with LoRA and adapting LoRA itself are difficult, but neither can be avoided.
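For readers who want the mechanics behind the "adapter" description, here is a minimal sketch of the standard LoRA formulation. It is not Mindverse's implementation, which the interview does not describe. In the standard formulation the base weight matrix stays frozen and two small matrices, A and B, are trained alongside it rather than selected from it. The class name, the tiny dimensions, and the pure-Python matrix helper are illustrative assumptions.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen base weights, never updated
        # A starts with small random values and B starts at zero, so the
        # adapter contributes nothing until it has been trained.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))  # low-rank path: B (A x)
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self):
        rank, d_in = len(self.a), len(self.a[0])
        d_out = len(self.b)
        return rank * d_in + d_out * rank


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
    print(layer.forward([1.0, 1.0]))   # identical to the base layer at init
    print(layer.trainable_params())    # only A and B count as trainable
```

Because B is initialized to zero, attaching a fresh adapter leaves the base model's outputs unchanged, and training then moves only the small A and B matrices.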
Q: So what was the real turning point in your research? We noticed a subtle detail — almost simultaneously, Thinking Machines in Silicon Valley was working on the exact same thing.
Kaijie Chen: The turning point came around September 2025. We discovered that doing reinforcement learning with LoRA on a sufficiently large MoE model incurred no performance loss. A lightweight low-rank method for updating the model achieved the same results as updating all parameters across the entire model. This meant we could get identical results to full-parameter training at one-tenth the cost. What had been a trade-off between performance and efficiency became a monotonic optimization for efficiency.
Our first reaction when we got this result was to doubt ourselves. It wasn't until Thinking Machines subsequently published "LoRA Without Regret" on September 29th, with conclusions completely consistent with ours, that we felt reassured to see independent confirmation.
By late December, we had completed trillion-parameter LoRA reinforcement learning, publishing around the same time as Thinking Machines. Globally, only two groups could do this at that point; this year, with Fireworks (Cursor Composer's model partner), there are three.
Q: You called LoRA "the technical foundation for building continual learning." What does that mean specifically? Why does this LoRA layer become the key component for a model's "continual learning"?
Kaijie Chen: It's a smaller layer on top of the base model. For example, our latest upcoming model is the base model plus this LoRA layer on top. The LoRA layer's parameters are roughly 0.5% of the base model's, though with many modules it can be larger. Because this layer doesn't have many parameters, it's cheap to train, easy to train, and scalable.
Here's an example: suppose I'm serving a financial client and first train their stock and market data into a financial reasoning model. Three months later, the financial markets have moved and stock prices have changed. What about this new data? For OpenAI or Anthropic, incorporating it into pre-training would be extremely expensive and difficult, and would require mobilizing a great deal of resources. But for this financial client, because the LoRA itself is small enough, they can simply continue training this LoRA with the new data. The LoRA's size isn't fixed either. It can be made very, very small, so small that each person has their own thin slice. Training on one person's data for a month might cost on the order of tens of dollars. And the largest LoRA, capable of matching full-parameter training results, costs only tens or hundreds of thousands of dollars.
So it has enormous flexibility: whether you have a little enterprise data or a lot, it works; whether you want near-pretraining-level gains or need it to learn a new programming language, that works too. Thin, easy to train, stackable, cheap — that's the first level of what LoRA means for continual learning, on a single LoRA.
Q: Without using any jargon, how would you explain to an ordinary person what it is you're actually doing?
Kaijie Chen: We're taking a powerful base model and hanging lots of "skill packs" off it, so that one model can simultaneously become thousands of models with different strengths, serving different people, different companies, different scenarios.
The base is the base model, providing the ceiling of general intelligence; those "skill packs" are LoRAs, each carrying a small slice of something specific — it could be a user's long-term preferences, a company's operational experience, the playbook for a certain type of task. The default approach used to be "one model serves everyone," with everyone sharing the same parameters; what we want to do is the opposite: share one smart base, but have each person, each scenario, have their own slice of parameters that can keep growing. We call this architecture mixture of LoRA.
Q: "Mixture of LoRA" immediately brings MoE to mind — the familiar mixture of experts. Are these two mixtures the same thing?
Kaijie Chen: There are things we've learned from MoE, but they're different. In MoE, a single expert can't perform inference on its own; it's more like a computational unit that the model divides internally. But in mixture of LoRA, each LoRA is unique, can be called independently, and corresponds to a clearly defined capability.
For example, say I'm doing a financial task — I can hang 10 LoRAs at once, one learning stock prices, one learning financial reports, one learning risk control... each learning its own thing. Later, if I need to add two new tasks, like tips for Hong Kong IPO lottery draws, I don't need to touch those 10 already-trained LoRAs at all — I just add two more LoRAs to learn the new tasks, hang them up when they're done, and the model's capabilities naturally expand by that piece, with the old ones completely unaffected. That's why we say it's a structure "naturally suited for continual learning." Because all its capabilities accumulate block by block, rather than having to retrain the entire model every time you add something new, risking forgetting old skills. That's the second level of what LoRA means, in the continual expansion of mixture of LoRA.
We're also exploring possibilities further out, like having LoRAs negotiate and collaborate with each other. Once we have this mixture of LoRA architecture, we're interested in how different LoRAs might work together, and whether the model's diversity could produce better results.
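The "add a skill without touching the old ones" property described in this answer can be sketched as a registry of adapters on one shared base. This is a toy illustration only: the interview does not say how mixture of LoRA combines adapters, so summing each active adapter's correction is an assumption, and the class, method, and adapter names are hypothetical.

```python
class SharedBase:
    """One frozen base function plus independently attachable adapters."""

    def __init__(self, base_fn):
        self.base_fn = base_fn
        self.adapters = {}  # adapter name -> correction function

    def attach(self, name, adapter_fn):
        # Attaching a new adapter never modifies existing entries.
        if name in self.adapters:
            raise ValueError(f"adapter {name!r} is already attached")
        self.adapters[name] = adapter_fn

    def run(self, x, active):
        # Assumption: active adapters each add a correction to the base output.
        y = self.base_fn(x)
        for name in active:
            y += self.adapters[name](x)
        return y


if __name__ == "__main__":
    model = SharedBase(lambda x: 2.0 * x)
    model.attach("stock_prices", lambda x: 0.5 * x)
    before = model.run(3.0, ["stock_prices"])

    # A new task arrives later: attach one more adapter, retrain nothing else.
    model.attach("hk_ipo_tips", lambda x: -1.0)
    after = model.run(3.0, ["stock_prices"])

    print(before == after)  # the existing skill behaves exactly as before
    print(model.run(3.0, ["stock_prices", "hk_ipo_tips"]))
```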
Q: This structure manifests in something concrete — the model you're about to release?
Kaijie Chen: Yes, we'll soon open-source the model we've trained. It is a 750B-parameter agent model that natively supports mixture of LoRA, consisting of the 744B-parameter pre-trained GLM 5.1 plus 6B parameters of LoRA. We should be the first team besides Zhipu AI itself to complete reinforcement learning post-training on GLM 5.1.
Doing LoRA reinforcement learning on GLM 5.1 has real engineering hurdles — you need to adapt DSA (DeepSeek Sparse Attention), plus MTP (Multi-Token Prediction). Our model isn't chasing the "knows everything" general base model; it's specifically deep post-trained for agent scenarios, mainly serving generative UI coding, everyday life chat, long-chain reasoning, and tool calling.
Q: You're defining this new model as an Agent Model. How should we understand that term? Is everything everyone's investing in post-training ultimately for this?
Kaijie Chen: The latest frontier models are all agent-oriented models. Take Claude: after it launched Claude Code, model training started using Claude Code data, which is completely different from how we normally use Doubao with its "ask one thing, get one answer" pattern. In Claude Code, writing a piece of code is a very long task with lots of interaction in between — it's long-chain data. After training on this data, Claude becomes increasingly "agent native," increasingly adapted to agent architecture, because it was trained on this data in the first place. So models and application scenarios reinforce each other; everyone is evolving in this direction, at different paces.
We're doing the same thing, just with life as our scenario. Macaron is our agent harness; in life scenarios there are similarly many complex tool calls, code executions, and lots of fuzzy requests where users themselves don't know what they want. We string these into continuous task chains, training the model to perform better along these chains to improve agent performance. So when we say agent model, we mean: this model is trained to be used in a multi-turn agent environment, specifically optimized for that environment. It's still a model, but trained for agent tasks.
What's unusual about us is that there are almost no models on the market specifically optimized for agent workflows. Most open-source models domestically are still catching up to the most advanced generation of GPT and Claude, so much of their energy is still on pre-training — how to catch up first, perhaps not yet having the bandwidth to do the agent part particularly well in post-training.
Claude is gradually doing it, and doing it very well, but they have many more topics to manage. We're specifically training models for agent tasks, making them better at agent work — tool calling, memory retrieval, when to hand tasks back to users, when to continue multi-turn thinking — all of this it will do better.

Q: People first came to know Mindverse through Macaron. You mentioned that Macaron isn't just a consumer product, but the model's agent harness. Can you explain specifically how the model and product feed each other? How is this different from what people usually mean by "training models on user data"?
Kaijie Chen: From the very beginning, we've looked at model training and consumer app iteration as one thing. It's not as simple as "build the model first, then use the product to collect some data" — it's a bidirectional loop.
But we have a key difference from many others: we don't directly train on user data. Privacy in life matters just as much as privacy at work, yet many people will directly use user data to train models. Our approach is to use user feedback to understand the distribution and characteristics within data, then build our own simulation environment and put the model in it to train. We deliberately add lots of noise, interference, and extreme cases, because real user behavior is already very extreme: they'll interrupt midway, change goals, and provide wrong or outdated information. A model trained in this environment can withstand the situations that agents actually encounter in reality. And post-training doesn't actually need that much data — tens of thousands or hundreds of thousands of examples is already a meaningful scale. Unlike pre-training, it doesn't need massive volume; what matters more is very high data quality.
Conversely, the model also feeds the product. These trained capabilities are deployed directly back into Macaron after training; the ceiling of product experience is fundamentally determined by model capability. This is the same logic as Anthropic: Claude's training directly serves Claude Code, and what runs through Claude Code flows back to train the model — except our scenario is life. So Macaron's significance for us isn't as an additional product entry point, but as providing the model with a real, long-term, continuously feedback-generating agent harness and training environment. Macaron currently has over 2 million users and more than 100,000 DAU.
Q: You place a lot of emphasis on "generative UI." Isn't it enough for the model to explain answers clearly? Why does it need to "draw interfaces"?
Kaijie Chen: Having the model return everything as text isn't actually a good form of expression. Humans are inherently visual animals; our perception of graphics is significantly better than our perception of text. For the same information, showing a chart is definitely clearer than writing those numbers out in a long paragraph — what this saves is your cognitive load. What Google presented at I/O with omni means the same thing: models should deliver results in richer forms, rather than always dumping a wall of text for you to digest yourself.
So on the A2UI standard that Google defined, SOTA isn't just measured by "can the model generate UI," but by "how much cognitive load does the interface it generates reduce for the user." In life scenarios this is especially critical: asking "what should I eat today" and getting a few directly tappable option cards versus getting 300 words of text — completely different experiences. Whether a model can "communicate well" directly determines the consumer experience.
Q: The benchmarks you've published for the model are also interesting — you achieved SOTA on life tasks, but for hardcore tasks like code and math, you explicitly said you approach but don't chase first place. That trade-off itself is a statement, right?
Kaijie Chen: That choice itself says what kind of company we are. We strongly agree with Shunyu Yao's view in "AI's Second Half": going forward, benchmarks may be the most important part of model training, because what benchmarks you choose is what tasks you want your model to get stronger at.
We picked four: Living Bench, which we defined ourselves; Vita Bench, from Meituan — these target life-category long-chain tasks, like trip planning, which sounds simple but actually involves many steps and personal preferences; A2UI, the generative UI standard from Google; and PinchBench, commonly used overseas to characterize agent task performance for things like OpenClaw. We achieved SOTA on all four.
Customer service, code-writing, pure math — these traditional tasks matter to us too, but they're not where we most want to be first. We'll approach the best open-source model levels, but won't compete for first place there. Put simply, we don't want to build a general model that aces every test; we want to build the best agent model at "getting complex things done in real life."
But from another angle, our entire training framework is reusable across multiple scenarios. Through this first model we're releasing, we're essentially validating that the "base large model + skill packs" path works in complex long-chain tasks. So facing broader enterprise vertical demands, we don't need to train models from scratch — we just need to quickly enhance specialized skills for corresponding scenarios based on the same base, covering new benchmarks at extremely low marginal cost.
Q: We heard you can cut post-training costs by 10x while keeping performance the same. Where exactly do those savings come from, and what's hard about doing this at the trillion-parameter scale?
Kaijie Chen: The savings come from not having to replicate an entire large model for every user and every scenario. To use an analogy, if you wanted to deploy a complete trillion-parameter model for thousands of people, you'd be copying the same massive object thousands of times over — the compute required would be astronomical, economically impossible. But in our architecture, these thousands of models share one base, each carrying only a small LoRA. The compute needed barely increases compared to deploying a single model. What you save is those thousands of redundant copies of the base.
As for why "bigger means harder," the difficulty doesn't scale linearly — it hits engineering thresholds one after another. Attaching a LoRA to a small model is nothing special, but stably training on a near-trillion-parameter model and simultaneously deploying hundreds or thousands of LoRAs is an entire systems engineering problem: operators need rewriting, memory needs managing, training and inference need to stay consistent, millions of skills need loading and switching, multiple customers need isolation... every single one becomes a hard problem at this scale.
Domestically, we might be the only ones doing LoRA training at this size. We're even pushing toward the extremely small end — traditional LoRAs typically use rank 16 or 32, but we're researching algorithms with rank equal to 1 or even smaller, because a lot of personalization doesn't actually need that much information stored. The smaller the skill pack, the better the cost-performance, and the more you can hang on a single base.
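To put the rank figures in proportion, here is a rough count under the standard LoRA parameterization, in which an adapter of rank r on a d_in by d_out weight matrix trains r * (d_in + d_out) values. The 4096 by 4096 layer is a hypothetical size chosen for illustration, not a dimension of any model named in the interview.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in one dense weight matrix."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a standard LoRA adapter on that matrix."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in = d_out = 4096  # hypothetical layer size, for illustration only
    full = full_param_count(d_in, d_out)
    for rank in (32, 16, 1):
        lora = lora_param_count(d_in, d_out, rank)
        print(f"rank {rank:>2}: {lora:>7} params, {lora / full:.4%} of the layer")
```

Adapter size grows linearly with rank, so a rank-1 adapter is one sixteenth the size of a rank-16 one on the same layer, which is why smaller ranks let more adapters share a single base.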
Q: "Quantity" is a key word here. In December last year you could hang 10 LoRAs on one base; this year you're saying millions. What enabled that leap? And does "number of models" itself sound like a new scaling dimension?
Kaijie Chen: Two things. First, making LoRAs smaller and smaller — the rank-1 approach I just mentioned makes each one easier to carry. Second, better caching mechanisms — where others might use three layers of cache, we added a fourth, plus a lot of parallel processing methods. So it's not millions activated simultaneously; it's millions that can be activated extremely fast, roughly within one second. A request comes in, hits a LoRA that isn't yet active, and still gets a response within a second. So "the same batch can only do a few dozen" isn't actually a limitation — it depends on resources. If you want to deploy millions, just add more cards.
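The idea of keeping a limited set of adapters "hot" while millions more can be loaded on demand can be sketched with a single-tier least-recently-used cache. The four-layer design mentioned above is not described in the interview, so this simplification, along with the class name and the loader function, is an assumption for illustration.

```python
from collections import OrderedDict


class AdapterCache:
    """Single-tier LRU cache for adapter weights (illustrative only)."""

    def __init__(self, capacity, load_fn):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_fn = load_fn  # fetches an adapter from slower storage
        self.hot = OrderedDict()  # adapter id -> weights, oldest first
        self.loads = 0

    def get(self, adapter_id):
        if adapter_id in self.hot:
            # Cache hit: mark this adapter as most recently used.
            self.hot.move_to_end(adapter_id)
            return self.hot[adapter_id]
        # Cache miss: load on demand, then evict the least recently used.
        weights = self.load_fn(adapter_id)
        self.loads += 1
        self.hot[adapter_id] = weights
        if len(self.hot) > self.capacity:
            self.hot.popitem(last=False)
        return weights


if __name__ == "__main__":
    cache = AdapterCache(capacity=2, load_fn=lambda i: f"weights-{i}")
    for user in ("a", "b", "a", "c"):
        cache.get(user)
    print(list(cache.hot), cache.loads)
```

A request for an adapter that is not resident still succeeds; it simply pays the load cost once, and the capacity can grow with the hardware available.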
That "model count itself becomes a scaling dimension" is something that genuinely excites us. The main scaling thread for large models has been making one model bigger and bigger; the agent era adds another line: scaling the number of models too.
We've validated that this works. The more models you hang, the more overall intelligence rises stably — roughly a linear improvement on a natural log scale. This was actually a pretty shocking finding for us. So we can do one per person, one per company, or one per task domain.
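One way to read "a linear improvement on a natural log scale" is the relation below. The symbols are assumptions introduced here: S is some aggregate capability score, N is the number of LoRAs attached, and a and b are fitted constants. The interview gives no values for any of them.

```latex
\[
S(N) \approx a + b \ln N,
\qquad
S(2N) - S(N) \approx b \ln 2
\]
```

Under this reading, each doubling of the number of adapters adds roughly the same fixed increment to the score, so gains continue but require multiplicatively more adapters.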
Q: You say only three companies globally can do this, but this sounds more like "got there first." If a major tech company commits to it, even builds their own LoRA post-training architecture, could they do it? What's your real moat?
Kaijie Chen: In large models, time itself is a barrier. Look at OpenAI and Anthropic: there's no real moat between them, no "can do" versus "can't do." Same technical platform, people move around. Today's AI is a process of constantly "forming consensus, chasing consensus, forming new consensus." From whether there's consensus at all, to reinforcement learning, to o1 and R1, to agents, everyone takes turns leading. The real difference in this alternation is who builds it first, who moves faster, and who can first form a loop with users and B2B customers and lock in that value.
But we have accumulated some things others can't easily bypass. One is genuine engineering depth and industry recognition. We're building AReaL-MinT with the open-source community alongside Ant Group and Huawei, and verl-mint with ByteDance and NVIDIA — the two main reinforcement learning frameworks domestically, both integrating our LoRA technology. NVIDIA featured us on their homepage. This isn't PR; it's people actually using us at the foundation layer.
Another is that we start from a different place. Major tech companies typically build models top-down from pre-training, from data and infrastructure. We work backward from user needs, from problems that emerge in real products. This insight that grows out of product is something people only training models in labs simply can't get.
Q: Where specifically do these partnerships with major tech companies land? Following the money — what's your commercial logic? You do both underlying infrastructure for cloud providers and your own products. Don't those conflict?
Kaijie Chen: The partnerships operate at several levels. With NVIDIA, it's bidirectional technical collaboration in the open-source community — we write operators, co-build the underlying platform together. With ByteDance and Ant, it's co-building reinforcement learning frameworks in open source; we use their platforms and contribute our efficient training methods upward. Moving up to the business layer: because we have efficient concurrent training and inference infrastructure that can cut customer training costs by an order of magnitude, roughly to 1/10 of original, we partner with Huawei Cloud, Microsoft, Alibaba Cloud, Volcano Engine — with Huawei, it's a deep strategic partnership.
As for whether they conflict, we're pretty clear-minded: we don't want to become a purely commercialized company. If a direction needs to become large-scale service requiring heavy investment, we'd rather let platform partners like Huawei Cloud and Microsoft scale it, while we stay focused on the technology itself. So "serving cloud providers while also building our own products" isn't left hand fighting right hand — it's division of labor. They do scale; we do the ceiling. On the consumer side, it's mainly Macaron. For where we are today, getting the backend technology good enough matters more than rushing for revenue. When the technology truly delivers, demand naturally finds you.

Q: When thousands of LoRAs hang on one base, what new things start happening between models?
Kaijie Chen: Division of labor and collaboration start emerging. Andrew shared an analogy that genuinely excited me — he feels we're making models grow "biology."
Before biology existed, there was only chemistry, only atoms and molecules. The critical transition from chemistry to life was the cell membrane. It drew a clear inside-outside boundary, defining the essence of a living organism. In AI, we call this boundary isolation. Every LoRA is an independent unit, like an individual wrapped in a cell membrane.
Previous models only had "physics and chemistry" — competing on parameter count, data volume, compute. But when you can isolate models from each other while letting them efficiently exchange information, it's like single-celled organisms evolving into multicellular life: division of labor naturally forms, followed by heredity and evolution. AI's trajectory is stepping from pure chemistry into the long river of biological evolution.
Q: But "isolation" sounds like a very "engineering" word, even somewhat mundane. Why elevate it to such importance?
Kaijie Chen: Precisely because it looks mundane, it's easily underestimated. When people talk about the future of memory, they usually fixate on two fancy directions: better model architectures, more efficient algorithms. Isolation ranks third — it sounds like grunt work of "keeping data separate." But as I said, that leap from chemistry to biology depended on the cell membrane's "isolation."
And isolation isn't merely a technical problem — it's the precondition for this whole system to truly enter society. There are barriers between enterprises; one company cannot and will not hand over its long-term memory to be kneaded into someone else's unified model. Between individuals it's even more so — if one model holds both my long-term memory and yours, I could extract your entire privacy just by asking it. That's terrifying. So every person's, every enterprise's memory must be cleanly separated. LoRA's "one base, countless independent skill packs" is, for now, a very good way to achieve this isolation.
Q: Why are you convinced that large models alone can't solve "memory" and "personalization," that you must use a mechanism like LoRA to supplement them?
Kaijie Chen: Because today's mainstream memory approach essentially writes things into an external document or database. You can think of it as a constantly growing notepad hanging next to the model, recording facts and context. This works well initially; the model understands you better with use. But it has an unavoidable flaw: this notepad only grows, never shrinks, while the amount the model can actually read at any given moment, its "reading bandwidth," is limited. So the more that is stored, the lower the probability of hitting exactly what you need right now, and past some critical point the experience starts degrading. Consumers haven't used a product that gets "worse with use" in a long time. WeChat gets better the more you use it, because you have more friends. But a notepad-style memory assistant might start getting dumber by week three.
Our judgment is that true long-term memory shouldn't be written in an external notepad, but "trained into parameters." What's written into a prompt or document is temporary and external; what's trained into parameters is a stable capability the model has grown itself. LoRA happens to be the tool that carries this: it distills your preferences, habits, and ways of interacting with it into a small slice of model parameters, rather than text that could get pushed out of the context window at any moment.
Q: Under this "parametric memory" direction, we noticed you actually have more than just the LoRA line — there's also something called δ-mem. One is an offline-trained parameter skill pack; the other is a real-time-updating online memory matrix. How do these two divide labor in your memory system? Or are you yourselves betting on which one is right?
Kaijie Chen: Actually these two aren't as opposed as people think. δ-mem also grew out of this LoRA methodology. Fundamentally it's doing the same thing: distilling memory into parameters rather than hanging it outside. It's just that in our R&D process, some innovative architectural ideas emerged, so we built it out, and it turned out to work pretty well.
Q: Then I have to ask the sharpest question. If in three to five years, the general base model itself becomes strong enough to directly understand every user, does your whole "hang a LoRA for every person" approach become meaningless?
Kaijie Chen: I don't think so, and the reason is precisely the isolation I mentioned earlier. The most fundamental point is that each person's data, experience, and life history are preserved separately. My data and another person's are hard to mix, and shouldn't be mixed, into the training of one model that is then expected to serve all of us well. The model itself will definitely get smarter, but each person's unique experience ultimately still needs to be supported by data belonging to that person, and that data will eventually settle into parameters belonging to you, into model layers belonging to you. So even as the base gets stronger, the need for "each entity having a slice of parameters that belongs only to itself and is isolated" won't disappear. If anything, it becomes a harder requirement. A stronger base only makes each personalized skill pack hanging on it more valuable; it doesn't erase them.
Q: Another buzzword these past two years has been "harness" — wrapping a contextual memory framework around a model. Could "general-purpose model + harness" be enough, making your "general-purpose model + LoRA" approach unnecessary?
Kaijie Chen: We actually build harnesses ourselves, and because we integrate harness with model training, we have more room to do this well. On "post-training plus harness," we're pretty much on par with the best teams out there. At the same time, we've chosen our own direction: daily life — clothing, food, housing, transportation, long-term living themes. In this direction, putting model training, post-training, continual learning via LoRA, and harness all together — I believe we can create the most distinctive and valuable product experience.
So the evolution of harness is actually good for us, because we can train models specifically for harness — a lot of teams can't do that. To be specific: in our product experience, there's a dedicated model. You casually jot down notes, share life fragments, and it gets to know you better and better — recommending restaurants you need, workout plans, weight-loss plans, what to buy for your kids, getting more and more accurate. This experience requires model and harness to work in tandem. OpenAI, for instance, wouldn't specifically train a dedicated harness and dedicated model for this. That's our opportunity — putting product form and model training together.
Q: If the LoRA path doesn't deliver expected results in one or two years, or three to five, would you pivot to something else? Or are you all-in on LoRA?
Kaijie Chen: What hasn't changed in three years comes down to two things. First, from day one we've insisted on using training methods to improve agent capabilities. Second, we co-design research and product: real products provide real tasks and real failure cases, which feed back into model training. Today you rarely see an excellent model company without its own product, and going the other way, building a strong product without your own model, is quite hard too.
Q: So how do you define what kind of company you are? Would you straight-up call yourselves a "model company"? How are you different from Moonshot AI, Zhipu AI, and the like?
Kaijie Chen: We've become a Frontier Lab that builds agent models, but we're quite different from the model companies people are familiar with. Moonshot AI and Zhipu AI come more from the pre-training side, from data and infrastructure, building general-purpose base models. We come from user needs, from problems that emerge in real products, and do post-training and continual learning. Put bluntly: others have a model first and then find scenarios; we work backward from scenarios to the model.
This naturally leads to certain characteristics. Post-training is inherently closer to users — you have to understand data to do better post-training. Pre-training is about learning from the internet, learning human knowledge; post-training is about learning scenarios, learning how to interact better within a scenario. Even company scale differs — the compute needed for pre-training versus post-training differs by roughly half an order of magnitude, three to ten times, and the final scale is different too. In China, companies training models from this perspective should be quite rare.
Externally, people sometimes call this form a Neo Lab. It's not a lab in the traditional sense, but a new kind of AI company organization — young teams, high talent density, where the goal isn't packaging an AI application but continuously pushing the technical ceiling. Overseas, places like Thinking Machines Lab, Ilya's SSI, and Fei-Fei Li's World Labs have this quality; domestically, it's still relatively rare. We're roughly this form — we share technical depth with them, but started earlier on product and model.
Q: When did you clearly decide "we're going to be a post-training company"? What was the biggest struggle in this, and how did you finally decide?
Kaijie Chen: Actually, it was planted when the company was born, with Andrew's paper, Towards Language Agent Fine-Tuning, which points toward post-training for large language model agents. But making it solid was hard. You had to rally researchers, have enough compute and funding to support exploration, and find answers on application direction, because you can't train in a vacuum. More than anything, it was about turning the idea into reality over these two and a half years.
Deciding to do large-scale reinforcement learning was genuinely difficult. When we did it, there were maybe only four or five teams in China doing so: DeepSeek, Moonshot AI, ByteDance, Alibaba, and us. Committing was hard back then: not much money, not many people, yet we were taking on something this difficult. But without reinforcement learning, you can't do post-training, so in the end we bit the bullet and did it. Looking back today, it was the right choice. I could grit my teeth because we were convinced we're a post-training company, and that matched the kind of company we wanted to build: a successful, technically valuable one.
Q: Now high-performance general-purpose models are increasingly closed-source, yet you need sufficiently large models to work well. If in the future models are all closed-source and you even become a model purchaser, how much profit margin remains?
Kaijie Chen: I think there will always be open-source models. Right now the gap between open and closed source isn't large. If one day the gap becomes huge, things might be different. But I believe China will continue to have good open-source models; that won't change. As long as there's a second place, people will still tend toward open source. If it really all goes closed-source and we have to purchase, then working out cost-effectiveness and how much value serving users can generate would become a business model question for the future. The company isn't at the stage of thinking about this today. It's also possible that in such a scenario, we'd do something like what Microsoft and OpenAI did initially: a deep partnership with a particular company. That's not impossible.
Q: Three years from now, how do you hope people remember Mindverse? Have you thought about the endgame — IPO, acquisition, or something else?
Kaijie Chen: The endgame in our minds is a flywheel between agent model and consumer product. Our technology drives product experiences that others can't replicate — this even includes hardware and other forms, and we're collaborating with some companies. Meanwhile, this training and deployment capability will serve more and more enterprises; the B2B line is also growing fast. Looking further out, this industry's endgame might be astronomical compute deployed in space, exceeding today's national power generation — that's a distant vision. Everything else is process.
Q: If you had to choose between "making a history-changing research breakthrough" and "building a sustainably profitable company," which side would you pick?
Kaijie Chen: We'd choose the research breakthrough side. It's not that we don't care about profitability — we believe that if you truly solve problems others can't solve technically, business will naturally come to you. The reverse doesn't hold.
If this path succeeds, an ordinary person's life would have fewer worries and more grounded happiness. But everyone's circumstances differ, and joys and sorrows look different for every person. That is precisely the full meaning of "personalization": not giving everyone a smarter model, but making intelligence truly understand every unique soul.

