AI Virtual Cells: Exploring Digital Life | FreeS Fund Report
Why We're Bullish on AI Virtual Cells
Cells are the fundamental building blocks of life, and the cells in our bodies are constantly changing. According to a 2021 study published in Nature Medicine, a person renews roughly 60–100 grams of cells daily — about 330 billion cells. That means approximately 3.8 million cells are being replaced in your body every single second.
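As a quick sanity check on the per-second figure, the arithmetic can be reproduced in a few lines of Python. The only inputs are the roughly 330 billion cells per day cited above and the number of seconds in a day; the variable names are ours.

```python
# Daily cell turnover cited in the text (about 330 billion cells per day).
cells_per_day = 330e9
seconds_per_day = 24 * 60 * 60  # 86,400 seconds

cells_per_second = cells_per_day / seconds_per_day
print(f"{cells_per_second / 1e6:.1f} million cells per second")
```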
From a microscopic perspective, complex exchanges of matter, energy flows, and signal transmissions are happening nonstop between the inside and outside of cells. These dynamic yet orderly processes drive our growth, development, aging, and even disease.
Understanding changes in cell state helps us understand life itself. In humanity's long journey to explore the mysteries of life, we've relied on microscopes to observe cell morphology, in vitro experiments to parse biochemical reactions, and cell models to test drug efficacy. Now, a new paradigm is emerging: the AI virtual cell — using artificial intelligence to simulate and explore the processes of life.
Under this new paradigm, a cell's state can be likened to an "article." Biological observations such as gene sequences, gene expression, and cell images are its words and sentences, while the biological rules governing cell fate are its underlying logic and grammar. The AI virtual cell "reads" this article and infers the meaning behind cellular behavior.
In this research report, we'll focus on core topics related to virtual cells, including but not limited to:
- What is a virtual cell? What problems can it solve?
- Why has the virtual cell received so much attention in the past year?
- What innovation opportunities exist in the virtual cell space?
We hope this brings fresh perspectives and angles. If you're following the virtual cell industry, feel free to reach out to the author, Da Xie (xie.da@freesvc.com).
Reader Giveaway: What changes and value do you think virtual cells will bring? Share your thoughts in the comments. By 17:00 on June 26, 2025, the three most thoughtful commenters will each receive a copy of physicist Erwin Schrödinger's What Is Life?.
What Is an AI Virtual Cell?
Apart from viruses, the cell is the smallest unit of life: it is not only the "brick" that makes up our bodily structures but also the basic unit that carries out life functions.
Deciphering how cells and genes work in an orderly fashion has always been one of the most central propositions in the life sciences.
In traditional biology, we typically follow a "hypothesis-experiment" research paradigm to understand different life phenomena. But when it comes to systematically reconstructing the full picture of a cell, traditional methods fall short: significant challenges remain both in describing cellular dynamics with abstract mathematical and physical formulas and in comprehensively studying every gene and gene combination in the human genome.
The AI virtual cell is showing potential to address these challenges. It is not an intuitively visible physical simulation but a foundational model for the life sciences, comparable in kind to GPT-4o or DeepSeek.
By applying deep learning to massive biological datasets and imaging data, AI virtual cells can predict, from a global perspective, how cells respond under different biological conditions. For example: Will a certain drug activate a specific pathway in a cell? What type of cell will a stem cell differentiate into under specific conditions? How does a cell interact with its neighboring cells?
This modeling approach differs from traditional theoretical deduction. It uses data as its language, letting the AI itself "learn" to depict cell states. In other words, it doesn't necessarily need to fully understand the mechanism of every gene first; instead, it leverages statistical patterns in large-scale data to rapidly identify potentially valid biological conclusions that have yet to be discovered.
Can Life Be Digitally Simulated?
When discussing virtual cells, one unavoidable core question is — can life be "simulated"? Before diving deeper, let's return to the definition of life.
I. The Definition of Life
Across different disciplines, the definition of life varies slightly:
- Biology: Life is defined by characteristics such as metabolism, growth, development, and reproduction.
- Chemistry: Life depends on the interaction of organic molecules (such as nucleic acids and proteins), capable of self-maintenance and self-replication.
- Physics: Focuses on the relationship between energy and entropy. Simply put, life continuously consumes energy to maintain the order of its various functions.
- NASA: Defines life as a chemical system capable of self-maintenance and Darwinian evolution.
Synthesizing these perspectives, if a system has clear boundaries, can maintain internal order through energy exchange, and possesses self-replication ability, it can be considered life. If we can embody these characteristics of life in the digital world, then simulating life becomes theoretically feasible.
Although organisms are complex, they are highly ordered — whether in metabolism, energy exchange, or self-replication, this orderliness is evident. It is precisely because of this that building virtual cell models through AI becomes practically possible.
II. Is Simulating Life at the Cellular Level Necessary? Is It Feasible?
As the basic unit of life activities, cells play an indispensable role in scientific research and drug development. Researchers exploring the minimal genome required for life activities, or testing the activity of drug molecules in inhibiting tumor cell proliferation, often conduct their work at the cellular level.
Physicist Richard Feynman once said: "What I cannot create, I do not understand." Building foundational virtual cell models may be an important step toward deciphering the orderly mechanisms of life.
Virtual cell technology, by simulating changes in cell state, not only helps reveal biological signaling pathways but also holds promise for advancing drug manufacturing processes and optimizing engineered cells in pharmaceutical industrial production.
So, does AI have the capability to "simulate" cell states?
If understanding cell states is likened to solving complex systems of equations, then AI may be that expert skilled at finding "solutions." There are three key reasons:
First, the unknowns in the equations are finite.
Although gene regulation, metabolic pathways, and signal transduction inside cells appear complex, they are not entirely random — they are highly structured and ordered. This means that even when thousands of possible biomolecular interactions exist simultaneously, the core variables that truly affect cell state are actually limited. Or put another way, cellular complexity can potentially be "captured and modeled."
Second, we already have a large number of "equations."
Advances in modern molecular biology and imaging technology have enabled scientists to acquire massive amounts of causal data about cell state changes. For example, through gene editing, single-cell omics, high-resolution imaging, and other integrated approaches, we can meticulously observe and interpret how specific genetic alterations change cell states. These observational data are like individual equations, helping us gradually approximate a true model of cell state.
Third, the "tools" for solving equations are already in place.
Building on the orderliness of life, deep learning algorithms have demonstrated powerful capabilities — they can extract low-dimensional underlying patterns from high-dimensional, complex biological data and generalize to previously unobserved scenarios, making reasonable and effective inferences.
Therefore, with the accumulation of individual characteristic data, advances in biological omics observation methods, and the development of deep learning algorithms, simulating cell states is gradually becoming reality.
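To make the third point concrete, here is a minimal sketch on synthetic data. It assumes a hypothetical set of 200 cells whose 20 gene readouts are all driven by one hidden regulatory program plus noise, and it uses principal component analysis (computed by power iteration) as a simple linear stand-in for deep learning. The function names, sizes, and noise level are illustrative assumptions, not taken from any real dataset or model.

```python
import random


def simulate_cells(n_cells=200, n_genes=20, noise=0.1, seed=0):
    """Synthetic expression matrix: one hidden program drives every gene."""
    rng = random.Random(seed)
    loadings = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
    cells = []
    for _ in range(n_cells):
        program = rng.gauss(0.0, 1.0)  # hidden activity of the program in this cell
        cells.append([program * w + rng.gauss(0.0, noise) for w in loadings])
    return cells


def top_component_share(data, iterations=100):
    """Fraction of total variance captured by the first principal component."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    total_variance = sum(cov[j][j] for j in range(d))
    # Power iteration converges to the direction of largest variance.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    top_eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
    )
    return top_eigenvalue / total_variance


cells = simulate_cells()
share = top_component_share(cells)
print(f"Share of variance explained by one component: {share:.2f}")
```

In this toy setting a single component accounts for most of the variance across 20 genes, which is the sense in which a high-dimensional readout can hide a low-dimensional pattern. Real cells involve many interacting, nonlinear programs, which is where deep learning comes in.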
III. Industry Validation Cases: Evo2 and Geneformer
In fact, the approach of using foundational models to simulate living systems has already received preliminary validation in the industry.
Evo2, developed by the US-based Arc Institute, was pre-trained on roughly 9.3 trillion nucleotides of genomic sequence, enabling it to judge how likely and how plausible a given base mutation in a gene sequence is, with predictions that align with established biological knowledge.
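One common way to use a sequence model for this kind of judgment is to compare the likelihood the model assigns to the mutated sequence with the likelihood of the original. The sketch below illustrates the idea with a toy first-order Markov model over nucleotides. It is not Evo2 and does not use Evo2's API; the training sequences, function names, and scoring rule are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def train_markov(sequences):
    """Count nucleotide transitions and return smoothed log-probabilities."""
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}  # add-one smoothing
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    log_probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        log_probs[a] = {b: math.log(counts[a][b] / total) for b in ALPHABET}
    return log_probs


def log_likelihood(seq, log_probs):
    """Sum of transition log-probabilities along the sequence."""
    return sum(log_probs[p][n] for p, n in zip(seq, seq[1:]))


def mutation_score(reference, position, new_base, log_probs):
    """Log-likelihood of the mutant minus that of the reference.

    Negative values mean the model finds the mutation less plausible.
    """
    mutant = reference[:position] + new_base + reference[position + 1:]
    return log_likelihood(mutant, log_probs) - log_likelihood(reference, log_probs)


# Hypothetical training data with a strongly repeated ACGT pattern.
model = train_markov(["ACGTACGTACGT", "ACGTACGTAC"])
reference = "ACGTACGT"
score = mutation_score(reference, 2, "T", model)  # G -> T at position 2
print(f"Score for G->T at position 2: {score:.2f}")
```

A mutation that breaks the pattern seen in training receives a negative score, while substituting the same base leaves the score at zero.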
Geneformer, developed by Xiaole Shirley Liu's team at Harvard University, was trained on roughly 30 million single-cell transcriptomes. During training, part of each cell's gene expression information was randomly masked, and the model had to "fill in" the missing information from the expression of the other genes.
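The masking step described above can be sketched as follows. This is a minimal illustration of how a masked training example might be prepared, not Geneformer's actual code: the representation of a cell as a list of gene names, the 15% masking fraction, the `<mask>` token, and the function name are all assumptions.

```python
import random

MASK = "<mask>"


def mask_genes(gene_tokens, mask_fraction=0.15, seed=0):
    """Hide a random subset of genes and return (inputs, targets).

    inputs:  the cell with masked positions replaced by MASK
    targets: {position: original gene} that a model would be trained to recover
    """
    rng = random.Random(seed)
    n_masked = max(1, round(len(gene_tokens) * mask_fraction))
    masked_positions = set(rng.sample(range(len(gene_tokens)), n_masked))
    inputs = [MASK if i in masked_positions else gene
              for i, gene in enumerate(gene_tokens)]
    targets = {i: gene_tokens[i] for i in masked_positions}
    return inputs, targets


# A hypothetical cell represented by 20 placeholder gene names.
cell = [f"GENE{i}" for i in range(20)]
inputs, targets = mask_genes(cell)
print(inputs)
print(targets)
```

During training, the model sees only `inputs` and is scored on how well it predicts the entries in `targets`, which pushes it to learn which genes tend to be expressed together.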
Experiments show that Geneformer possesses strong reconstruction capabilities. More importantly, in practical applications, Geneformer can accurately classify cells and demonstrates good generalization and "analogy" capabilities for novel cell types it has never seen before.
Both Evo2 and Geneformer provide important upstream tools for life science research and even drug development. By comparison, Evo2 reveals patterns implicit in gene sequences across different species (such as humans and dogs), while Geneformer focuses on gene expression regulation patterns across different cell types. This may explain why, at the virtual cell level, model construction generally begins with single-cell transcriptome data (similar to Geneformer).
What Problems Can Virtual Cells Solve? What Changes and Value Will They Bring?
Since building virtual cells is technically feasible, what specific real-world problems can virtual cells solve? And how will they drive transformation and development in the biopharmaceutical industry?
I. Understanding the Complexity of Gene Regulation and Predicting New Targets
Despite continuous advances in medical research, our understanding of human biology remains limited.
Take drug development as an example: among roughly 20,000 human genes, fewer than 5% (about 700, or roughly 3.5%) are associated with existing drugs. This means current drug development may have touched only the tip of the iceberg, with numerous targets still waiting to be discovered.
Furthermore, various model cell lines (such as yeast, HEK293, and CHO cells) play important roles in scientific research and fermentation production, yet their genomes and gene interaction networks have not been fully characterized. Traditional "hypothesis-verification" methods often reveal only local interactions between genes, making it difficult to provide comprehensive insights.
By contrast, virtual cell models offer a data-driven new perspective by integrating cellular omics data and imaging information. They can understand complex interactions between genes from a global perspective, uncover undiscovered gene interaction patterns, and predict new drug targets.
II. Potentially Improving Drug Efficacy in the Human Body
The modern drug development process typically begins with preliminary experiments in test tubes, gradually transitioning to cell or cell cluster-level screening, then animal model validation, and finally human clinical trials.
However, humans differ significantly from other animals biologically. Even primates closest to humans in evolutionary terms, such as monkeys and apes, are separated from us by approximately 25 million years of evolution. Consequently, many drugs that perform well in animal models fail to replicate that success in human clinical trials. Even among drugs that ultimately receive approval, clinical efficacy rates still face challenges.
Virtual cell technology offers a new solution to this problem. By building and training models based on large amounts of real human cell data, virtual cells can more accurately simulate the biological characteristics of human cells. This makes virtual cell models more aligned with actual human conditions when assessing whether drugs activate specific pathways, induce apoptosis, or trigger drug resistance — thereby potentially improving drug efficacy in human settings.
III. Leading a Paradigm Shift in Drug Development
In recent years, the cost of new drug development has continued to rise. In 2010, the average R&D investment per approved drug was approximately $1 billion. By 2023, this figure had climbed to about $2.3 billion. Yet the average internal rate of return (IRR) for in-house R&D pipelines at pharmaceutical companies hovers around only 5%, far below the average level of other high-tech industries.
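To put the cost trend in annual terms, the two figures above can be converted into an implied compound growth rate. This is a back-of-the-envelope sketch that assumes smooth compounding between 2010 and 2023, which the underlying data may not follow.

```python
# Average R&D investment per approved drug, in billions of US dollars (from the text).
cost_2010 = 1.0
cost_2023 = 2.3
years = 2023 - 2010

# Compound annual growth rate implied by the two endpoints.
annual_growth = (cost_2023 / cost_2010) ** (1 / years) - 1
print(f"Implied annual growth in R&D cost: {annual_growth:.1%}")
```

Under that assumption, per-drug R&D cost has grown by roughly 6 to 7% a year.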
Against this backdrop, finding new methods that can significantly improve R&D efficiency has become an urgent need for the biopharmaceutical industry.
Virtual cell technology provides an entirely new solution. It bypasses lengthy basic research phases, using designed virtual experiments to more efficiently identify potential targets and screen for drugs more effective for patients. This approach reduces the most time-consuming wet lab screening, validation, and iteration processes in traditional development.
ARK Invest noted in its Big Ideas 2025 report that the combination of single-cell omics and AI will drive the development of virtual cells and reshape drug discovery.
With the support of virtual cell technology, in the short term, the role of wet experiments will shift from being the primary means of data generation to becoming a tool for validating virtual cell simulation results. In the long term, this paradigm shift is expected to significantly shorten drug development cycles, reduce costs, and enable the development of personalized medicines.
Why Are Virtual Cells Receiving Increasing Attention?
If we zoom out on the timeline, virtual cells are not an entirely new concept. As early as 1998, attempts were made to represent intracellular reactions with mathematical formulas. In 2012, Stanford University's Covert team used complex mathematical modeling to simulate the entire life cycle of a Mycoplasma cell, from gene expression to division.
So why are virtual cells receiving increasing attention right now?
I. Policy Drivers: Regulatory Direction Accelerating Virtual Cell Implementation

In recent years, policy initiatives have begun actively promoting AI applications in the life sciences, becoming one of the key drivers for virtual cell development.
In 2024, the US President's Council of Advisors on Science and Technology proposed developing AI technology to reveal cell functions. Although the term "virtual cell" was not explicitly used at the time, the core idea already pointed toward using AI to understand complex cellular mechanisms.
In April 2025, the FDA announced plans to gradually eliminate animal testing requirements for drugs such as monoclonal antibodies, proposing AI-based toxicity prediction models, cell line testing, and organoid toxicity testing in laboratory environments as primary alternatives. In early June, the FDA even released its AI tool Elsa ahead of schedule.
These policies can be interpreted as regulatory agencies' gradual recognition of AI-driven new drug R&D models, paving the way for virtual cell technology applications.
II. In Scientific Research, Virtual Cell Technology Is Becoming a Focus of Attention

Since Geneformer's release in 2023, the academic community has repeatedly emphasized the enabling role of virtual cells in biomedicine through releasing new models and publishing perspective articles.

For example, foundational virtual cell models such as scGPT, scFoundation, and Geneformer have been published in top academic journals; the Chan Zuckerberg Initiative's scientific team released a long-term vision for using AI to build virtual cells and empower biomedical research; and Nature listed "biological foundation models (including AI-built virtual cell models)" among seven key technologies to watch in 2025.
III. Industry Build-Out Is Accelerating: Major Virtual Cell Projects Launching at Home and Abroad

At the industry level, virtual cells are showing strong momentum, with substantial capital flowing in and major projects launching in quick succession.
Abroad, AI pharmaceutical company Xaira Therapeutics, co-founded by Nobel Chemistry laureate Professor David Baker, released a large-scale single-cell perturbation sequencing dataset in June 2025 to support virtual cell research. Companies such as Recursion Pharmaceuticals have pivoted to focus specifically on virtual cell modeling, exploring more efficient drug development pathways.
Domestically, multiple national-level major projects are also taking shape in the virtual cell field. The Human Organ Physiology and Pathology Simulation Facility, a national major science and technology infrastructure project under the 14th Five-Year Plan, began construction in Beijing in 2024. In March 2025, another national major science and technology infrastructure project, the Human Cell Lineage Large-Scale Research Facility, was launched in Guangzhou. These facilities may signal China's systematic positioning in the virtual cell field.
IV. AI-Enabled Pharmaceutical R&D Has Received Preliminary Validation, Opening Up Room for Virtual Cell Technology to Grow

Over the past decade, artificial intelligence has made significant progress in early-stage pharmaceutical R&D, particularly in target and molecule discovery.
A Boston Consulting Group research report shows that as of 2023, 67 AI-driven drug pipelines had entered clinical stages or received approval, with target discovery for 24 of these pipelines benefiting from AI technology. Additionally, AI has shortened preclinical drug development cycles by 30–50%, reduced costs by 25–50%, and achieved Phase I clinical trial success rates of 80–90% — far exceeding the industry average of around 50%.
As the next step, virtual cells, a product of the intersection between artificial intelligence and the life sciences, are expected to play important roles in two key directions, gene regulatory network analysis and cell state prediction, and to gradually integrate into the entire drug development process.
V. Thanks to Declining Data and Computing Costs, Barriers to Virtual Cell Development Are Lowering
The data and computing costs required to build virtual cell models are continuously decreasing.
With advances in sequencing technology and detection methods, the cost of acquiring multi-dimensional data such as genomics, transcriptomics, and proteomics has dropped substantially. A decade ago, whole-genome sequencing cost thousands of dollars; today it can be done for just a few hundred dollars.

Meanwhile, computing resource costs are declining at an exponential rate. The proliferation of specialized chips such as GPUs and TPUs has made processing massive biological data more efficient, improving the accuracy and generalization capabilities of virtual cell models.

The decline in data and computing costs is accelerating the development of virtual cell technology. The speed of data generation and analysis has increased. Lower costs have accelerated technology adoption, enabling more research institutions and startups to participate in virtual cell R&D.
Overall, virtual cells stand at a new inflection point. With policy support, scientific breakthroughs, industry investment, and advances in AI, they are poised for rapid development and a far-reaching impact on the biopharmaceutical field.
Innovation Opportunities in the Virtual Cell Field
Since Geneformer's release in 2023, the cutting-edge interdisciplinary field of virtual cells has attracted increasing numbers of companies, particularly in Europe and America where startups have been especially active. These companies can be broadly divided into two categories:
The first type focuses on building foundational virtual cell models. Such models may leverage strong generalization capabilities to support multiple application scenarios, including but not limited to disease diagnosis, drug discovery, and personalized medicine.
The second type chooses to skip building foundational models, instead focusing on specific biological scenarios such as tumor drug sensitivity, iPSC (induced pluripotent stem cell) differentiation pathways, and embryonic stem cell development simulation — directly developing specialized cell models for these tasks.
I. Innovation Opportunities for Foundational Model Development Companies
1. Multimodal Data Collaborative Modeling: Enhancing Virtual Cell Model Flexibility and Generalization
Currently, due to limitations in biological data volume, foundational virtual cell models mainly rely on the single modality of single-cell transcriptomics. Introducing multimodal data (such as cell images, single-cell spatial omics, epigenomics, and proteomics) for collaborative modeling is a key innovation direction for enhancing virtual cell model flexibility and generalization capabilities.
By integrating multiple data types, virtual cell models can more comprehensively capture cell state changes and their complex biological contexts. When building these multimodal foundational models, the efficiency of data "alignment" at the single-cell level becomes a core competitive advantage for companies.
2. Application Advantages of General Foundational Models
From an application perspective, general foundational models are pre-trained on large amounts of data spanning different cell types and multiple omics types, giving them a broad knowledge base. This extensive pre-training gives the models strong generalization capabilities, so that specialized models derived from them can perform well even with limited task-specific data.
In fact, existing research has confirmed that in tasks such as cell classification, drug sensitivity prediction, and cell state prediction under genetic perturbation, specialized models developed based on general models outperform models designed solely for a single task.
3. Foundational Model Developers Need to Strengthen Collaboration with Third Parties
As the variety and complexity of niche scenarios and specific tasks in drug development increase (such as studying interactions between immune cells and tumor cells in the tumor microenvironment), companies with foundational model technology are gradually gaining industry attention.
In the biopharmaceutical field, data related to specific scenarios and tasks is often concentrated in the hands of drug development companies and research institutions. Promoting collaboration between foundational model developers and pharmaceutical companies as well as research institutes not only helps validate foundational models' transfer and generalization capabilities and promotes model iteration, but is also a natural choice for foundational model developers in the near term.
4. Long-term Development Strategy: Building Data Moats and Independent Validation Capabilities
In the long run, to establish data moats in niche areas and gain first-mover advantages, foundational model development companies need to gradually build capabilities for independently validating model predictions, and shift toward providing molecular or cellular-level collaborations or building proprietary pipelines. This is not only key to standing out in competition, but also an important step toward achieving higher-value conversion.
5. Typical Case: Recursion Pharmaceuticals
In the virtual cell foundational model field, Recursion Pharmaceuticals is a representative example. The company integrates pathway models, protein models, and atomic models to support its proprietary pipeline's clinical development. By using AI to analyze complex biological pathways and build detailed cellular network maps, it helps identify disease mechanisms and therapeutic targets, and simulates cell behavior under disease conditions.
II. Innovation Opportunities in Specialized Model Vertical Domains
Specialized models focus on training for specific cell types or application scenarios. With advantages in specific data types, their performance on specific tasks may be comparable to specialized models derived from foundational models. This development model may suit enterprises that already have clear industry demand validation and data accumulation in vertical domains.
1. Development Strategy: Leveraging Proprietary Data Advantages and Expertise to Close the Loop From Data to Application
The innovation opportunity for specialized models lies in "vertical" development: leveraging a company's proprietary data advantages and deep expertise to close the loop from data to application.
For example, companies with large amounts of clinical sequencing data in the immunotherapy space can prioritize developing specialized virtual cell models for immune T cells to support discovery of new immunotherapy targets; companies focused on tumor omics testing can train specialized tumor cell models based on patient sequencing results to accelerate screening for personalized tumor therapeutics or even develop proprietary pipelines.
2. Typical Case: Asimov
A typical case is Boston-based biotech company Asimov, which provides drug or vaccine production optimization services to pharmaceutical companies by engineering CHO, HEK293, and other host cell lines.
Asimov possesses a vast genetic element library and multi-omics measurement systems, on which it has built a full-stack genetic circuit design platform for programming living cells. By modifying genetic circuits virtually on this platform, Asimov can predict engineered cells' production efficiency and send the optimized designs to wet lab validation. Asimov currently targets stable yields of 7–11 g/L, and if the productivity of a delivered modified CHO cell line falls below 5 g/L, the company provides that cell line to the customer free of charge.
III. Summary
ARK Invest founder Cathie Wood has predicted that within the next five years, training costs for virtual cell foundational models will drop by more than two orders of magnitude. This would give more companies, including established pharmaceutical companies, the opportunity to enter the field and to choose their own AI virtual cell development path, whether foundational models or specialized models.
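To get a feel for what such a prediction implies year by year, the sketch below converts a hundredfold drop over five years into an annual rate. It assumes the decline is spread evenly across the five years and uses exactly two orders of magnitude, the lower bound of the prediction.

```python
# Lower bound of the prediction: a 100x cost reduction over five years.
total_reduction = 100.0
years = 5

# Constant yearly factor that compounds to the total reduction.
annual_factor = total_reduction ** (1 / years)
annual_decline = 1 - 1 / annual_factor
print(f"About {annual_factor:.1f}x cheaper each year "
      f"({annual_decline:.0%} annual cost decline)")
```

Under these assumptions, training costs would need to fall by roughly 60% every year.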
Working backward from the end goal, specialized model development capabilities and performance in niche scenarios may be key to forming competitive advantages. For startups, three early leverage points may matter most: specialized model development capabilities, proprietary data moats, and actual products such as cells or molecules as deliverables.
Looking ahead, in the intersection of artificial intelligence and life sciences, virtual cells will become the industry's frontier and next milestone.