2025: Three Directions Where AI Is Reshaping Drug Discovery

FreeS Fund·January 16, 2025

How Can AI Help Drug Development Cross the "Valley of Death"?

In 2024, the Nobel Prize in Chemistry was awarded to three scientists for work on computational protein design and protein structure prediction; two of them, Demis Hassabis and John Jumper, were honored for developing AlphaFold.

AlphaFold is an AI tool that predicts a protein's three-dimensional structure from its amino acid sequence. Proteins are essential components of human cells and tissues, and protein structures serve as critical starting points for drug discovery: most drug development relies on rational design based on protein structures. AlphaFold demonstrated the potential of AI and computational technology in biomedicine, and in recent years the advance of large language models such as GPT has opened up even more possibilities for the biopharmaceutical industry.

According to tech media outlet Tech Emergence, AI technology could save the pharmaceutical industry up to $26 billion annually in R&D costs. A 2024 study by Boston Consulting Group found that AI-generated drug molecules achieved success rates of 80% to 90% in Phase I clinical trials, well above the historical average of 50%.

While some in the industry once questioned AI's prospects in pharmaceutical applications, today AI-related research is being conducted across every stage of drug development — from initial target identification to compound discovery, preclinical research, clinical trials, and even post-market safety monitoring and commercialization.

As early as 2015, Lipeng Lai, along with Shuhao Wen and Jian Ma, co-founded XtalPi. They chose a distinctive path: deeply integrating AI technology with robotic automation to provide drug discovery and materials science R&D solutions and services to global and domestic companies in pharmaceuticals, materials science (including agricultural technology, energy, novel chemicals, and cosmetics), and related industries.

During the COVID-19 pandemic, XtalPi used AI prediction algorithms combined with experimental validation to help Pfizer identify the optimal crystal form of Paxlovid (an oral COVID-19 treatment) in just six weeks, accelerating the drug's development process.

In 2016, FreeS Fund became an investor in XtalPi's Series A round. In June 2024, XtalPi (2228.HK) was officially listed on the Main Board of the Stock Exchange of Hong Kong, becoming what has been called "China's first AI pharmaceutical stock."

At the recent FreeS Fund 2024 Annual Investor Summit, Lipeng Lai, co-founder and Chief Innovation Officer of XtalPi, delivered a speech titled "The Value and Future Opportunities of AI in Drug Innovation," exploring the new possibilities created by the intersection of AI and biopharmaceuticals. Topics he covered included:

What transformations has AI brought, from protein structure prediction to protein design?
How can AI help overcome "Eroom's Law" in biopharmaceuticals?
How can we find a globally optimal drug development path?
How can we improve the utilization efficiency of bioscience data?
How can we bridge the "Valley of Death" from preclinical to clinical translation?


/ 01 / What Role Does AI Play in Biopharmaceuticals?

Biomedicine is a market of enormous social value with steady growth, but it is also full of challenges.

Specifically, drug development typically faces the "rule of three 10s": development cycles of more than 10 years, investments of more than $1 billion, and success rates below 10%. The field also has its famous "Eroom's Law," the observation that the number of new drugs approved per billion dollars of R&D spending has halved roughly every nine years, so that R&D productivity declines exponentially over time. The name plays on the semiconductor industry's Moore's Law, which states that the number of transistors on an integrated circuit doubles approximately every two years: "Eroom" is "Moore" spelled backwards.
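Eroom's Law is a compounding claim, so a few lines of arithmetic show how quickly it bites. The sketch below assumes the commonly cited halving period of about nine years and an arbitrary starting value of 1.0 approvals per $1 billion; the function names are illustrative, not from any library.

```python
def drugs_per_billion(initial: float, years: float, halving_period: float = 9.0) -> float:
    """New drugs approved per $1B of R&D after `years`, if the ratio halves every `halving_period` years."""
    return initial * 0.5 ** (years / halving_period)


def transistors(initial: float, years: float, doubling_period: float = 2.0) -> float:
    """Transistor count after `years`, if it doubles every `doubling_period` years (Moore's Law)."""
    return initial * 2.0 ** (years / doubling_period)


if __name__ == "__main__":
    # Arbitrary starting value of 1.0 approvals per $1B; illustrative, not historical data.
    for years in (0, 9, 18, 36, 54):
        print(f"after {years:2d} years: {drugs_per_billion(1.0, years):.4f} approvals per $1B")
```

Under these assumptions, 54 years (six halvings) leaves 1/64 of the starting productivity, while Moore's Law over the same span would multiply transistor counts many millions of times.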

At the same time, in recent years we have felt AI's impact on biopharmaceuticals especially keenly.

In 2016, AlphaGo defeated Lee Sedol, and people witnessed a historic moment — though some at the time believed AI's capabilities might be limited to gaming.

Just two years later, AlphaFold emerged, making a major impact on biopharmaceuticals. Proteins are essential components of human cells and tissues, and protein structures are critical starting points for drug discovery. Most drug development relies on rational design based on protein structures.

Previously, obtaining a single protein structure cost tens of thousands to hundreds of thousands of RMB, making large-scale research prohibitively expensive for biopharmaceutical companies. While AlphaFold did not perfectly solve this problem at the time, it proved the potential of AI and computational technology in biomedicine. The recent development of large models like GPT has opened up even more possibilities for the industry.

Some voices in the industry once questioned AI's prospects in pharmaceuticals, but this is no longer a concern.

At every stage of drug development, from target identification through compound discovery, preclinical research, clinical trials, and post-market safety monitoring and commercialization, companies are conducting AI-related research, and there are already many successful commercialization cases. In addition, the 2024 Nobel Prize in Chemistry was awarded to three scientists for their outstanding contributions to protein structure prediction and computational protein design.

Building on structure prediction, many applications have already emerged. Here are three examples.

The first is mini-protein design, a field that has drawn significant attention in biomedicine. Mini-proteins are approximately 60 to 100 amino acids long. Because their structures are stable, they tend to remain stable in vivo, and some may even be able to penetrate cells (cell permeability, a compound's ability to cross lipid membranes, affects drug absorption and efficacy). During the COVID-19 pandemic, we designed a mini-protein targeting the interaction through which SARS-CoV-2 infects cells. In pseudovirus experiments (pseudoviruses share key biological properties with live viruses but lack their pathogenicity), it significantly blocked viral infection.

The second example involves a critically important class of proteins in the human body — cytokines. Cytokines play important roles in immune response and antiviral defense. We conducted research on interleukin-2, a type of cytokine. With AI-assisted design, we were able to quantitatively regulate interleukin-2's activation of immune responses in the human body while reducing its immunosuppressive functions.

Multiple companies have attempted this research. Typically, experience-based protein design makes modifications at the protein-protein interaction interface: "where you and I interact, that's where I make changes." With AI, however, we can identify so-called effective distal mutations. That is, the mutations we use to optimize interleukin-2 are not on the protein interaction surface; they act on protein function indirectly, from a distance. This greatly expands the design space for new drugs and, commercially, creates more opportunities for new patents and novel drugs.
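The interface-versus-distal distinction can be made concrete with a simple geometric rule. The sketch below is an assumption for illustration, not XtalPi's method: interface residues are those within a cutoff distance of the binding partner, and a mutation counts as distal when it sits farther than a second cutoff from every interface residue. The coordinates, the 8 Å and 10 Å cutoffs, and all names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]

# Hypothetical toy structure: residue number -> coordinate (Å), laid out along the x-axis.
CHAIN: Dict[int, Coord] = {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (10.0, 0.0, 0.0),
                           4: (20.0, 0.0, 0.0), 5: (30.0, 0.0, 0.0)}
# Hypothetical atom(s) of the binding partner, e.g. a receptor.
PARTNER: List[Coord] = [(-4.0, 0.0, 0.0)]


def interface_residues(chain: Dict[int, Coord], partner: List[Coord], cutoff: float = 8.0) -> Set[int]:
    """Residues of `chain` within `cutoff` Å of any partner atom."""
    return {res for res, xyz in chain.items() if any(math.dist(xyz, p) <= cutoff for p in partner)}


def classify_mutation(res: int, chain: Dict[int, Coord], interface: Set[int],
                      distal_cutoff: float = 10.0) -> str:
    """Label a mutated residue 'interface', 'proximal', or 'distal'."""
    if not interface:
        raise ValueError("no interface residues found")
    if res in interface:
        return "interface"
    nearest = min(math.dist(chain[res], chain[i]) for i in interface)
    return "distal" if nearest > distal_cutoff else "proximal"


if __name__ == "__main__":
    iface = interface_residues(CHAIN, PARTNER)
    for res in CHAIN:
        print(res, classify_mutation(res, CHAIN, iface))
```

Experience-based design would search only the "interface" set; the point made in the talk is that useful mutations can also sit in the much larger "distal" set.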

The final example is industrial enzyme engineering. By introducing AI-designed mutations, we developed five candidate molecules; the engineered enzyme reaches three times the catalytic efficiency of the wild-type enzyme at 50°C.
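The text does not say which metric "catalytic efficiency" refers to; a common convention is kcat/KM. Assuming that convention, the sketch below shows how a threefold gain would be computed. The kinetic constants are hypothetical numbers chosen for illustration, not measured data.

```python
from dataclasses import dataclass


@dataclass
class Kinetics:
    kcat: float  # turnover number, s^-1
    km: float    # Michaelis constant, mM

    @property
    def efficiency(self) -> float:
        """Catalytic efficiency kcat/KM, in mM^-1 s^-1."""
        return self.kcat / self.km


def fold_improvement(variant: Kinetics, wild_type: Kinetics) -> float:
    """How many times more efficient the variant is than the wild type."""
    return variant.efficiency / wild_type.efficiency


# Hypothetical constants at 50 °C, chosen for illustration only.
WILD_TYPE = Kinetics(kcat=12.0, km=2.0)  # efficiency 6.0 mM^-1 s^-1
VARIANT = Kinetics(kcat=27.0, km=1.5)    # efficiency 18.0 mM^-1 s^-1

if __name__ == "__main__":
    print(f"{fold_improvement(VARIANT, WILD_TYPE):.1f}x")  # 3.0x
```

A gain in this ratio can come from faster turnover (higher kcat), tighter substrate binding (lower KM), or both, which is why the ratio is usually reported rather than either constant alone.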

These examples demonstrate that AI applications have already moved beyond protein structure prediction.

In the currently high-profile fields of mRNA (messenger RNA) and siRNA (small interfering RNA), we have also conducted relevant research.

For mRNA, including mRNA vaccines and other mRNA products, we improved mRNA stability and intracellular expression through coordinated optimization of non-coding and coding regions. For siRNA, we used algorithms to enhance its targeting precision and silencing efficiency. This has potential value for improving drug efficacy, reducing toxicity, and lowering costs.
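The talk does not describe the optimization algorithm. As one hedged illustration of what coding-region optimization can involve, the sketch below performs greedy synonymous codon substitution: each codon is replaced with its highest-scoring synonymous codon, so the encoded protein stays identical. The preference scores are hypothetical, only a few amino acids are included, and a real pipeline would also weigh factors such as untranslated-region design and RNA secondary structure, which this toy ignores.

```python
# Subset of the standard genetic code, enough for the toy sequence below.
SYNONYMS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "E": ["GAA", "GAG"],
    "*": ["TAA", "TAG", "TGA"],  # stop codons
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}

# Hypothetical preference scores (higher is preferred); NOT a real host codon-usage table.
PREFERENCE = {
    "ATG": 1.0, "AAA": 0.2, "AAG": 0.8, "TTT": 0.3, "TTC": 0.7,
    "TTA": 0.05, "TTG": 0.1, "CTT": 0.1, "CTC": 0.2, "CTA": 0.05, "CTG": 0.5,
    "GAA": 0.4, "GAG": 0.6, "TAA": 0.6, "TAG": 0.1, "TGA": 0.3,
}


def translate(cds: str) -> str:
    """Translate a coding sequence (length must be a multiple of 3) into one-letter amino acids."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize(cds: str) -> str:
    """Greedily replace every codon with its highest-scoring synonym; the protein is unchanged."""
    return "".join(max(SYNONYMS[aa], key=PREFERENCE.__getitem__) for aa in translate(cds))


def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    original = "ATGAAATTTTTAGAATAA"  # M K F L E stop
    optimized = optimize(original)
    assert translate(optimized) == translate(original)  # same protein
    print(optimized, f"GC {gc_content(original):.2f} -> {gc_content(optimized):.2f}")
```

The rise in GC content here is a side effect of these particular hypothetical scores, not a goal built into the method.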

/ 02 / How to Find a Globally Optimal Drug Development Path?

AI already has many applications in biopharmaceuticals, so what is its greatest value?

Coming from a physics background, I particularly appreciate a 17th-century mathematical problem called the "brachistochrone problem": a ball rolls under gravity alone from a higher starting point on the left to a lower endpoint on the right. What shape of track gets it there in the shortest time?

The problem seems simple, with gravitational acceleration as the only constant, yet it took the leading mathematicians of the day, among them Newton, Leibniz, and the Bernoulli brothers, to solve it. Interestingly, the answer is highly counterintuitive. The fastest path is the red curve shown in the figure, an arc of a cycloid.

What feels most counterintuitive is this: to achieve the shortest overall time, the ball first dips below the height of the endpoint and then climbs back up to it (this happens when the endpoint lies far enough to the side relative to its drop, as in the figure). That path is hard to arrive at through intuition alone. The curve is known as the "brachistochrone."
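The counterintuitive result can be checked numerically. The sketch below treats the ball as a frictionless point mass, as the classical problem does, measures y downward from the start, and compares a straight ramp with the cycloid through the same endpoint. The endpoint (2, 1) is an arbitrary choice; for it, the cycloid's lowest point lies below the endpoint, which is the dip the text describes. Function names are illustrative.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def straight_line_time(x: float, y: float) -> float:
    """Time to slide from rest along a straight frictionless ramp to (x, y); y is the drop."""
    length = math.hypot(x, y)
    accel = G * y / length  # component of gravity along the ramp
    return math.sqrt(2 * length / accel)


def cycloid_through(x: float, y: float):
    """Return (theta_end, r) for the cycloid x = r(t - sin t), y = r(1 - cos t) passing through (x, y)."""
    ratio = x / y
    lo, hi = 1e-6, 2 * math.pi - 1e-6
    for _ in range(100):  # bisection: (t - sin t) / (1 - cos t) increases on (0, 2*pi)
        mid = (lo + hi) / 2
        if (mid - math.sin(mid)) / (1 - math.cos(mid)) < ratio:
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2
    return theta, y / (1 - math.cos(theta))


def cycloid_time(x: float, y: float) -> float:
    """Descent time along the cycloid from rest: theta_end * sqrt(r / g)."""
    theta, r = cycloid_through(x, y)
    return theta * math.sqrt(r / G)


if __name__ == "__main__":
    x, y = 2.0, 1.0  # endpoint 2 m across and 1 m below the start (arbitrary)
    theta, r = cycloid_through(x, y)
    print(f"straight ramp: {straight_line_time(x, y):.3f} s")
    print(f"cycloid:       {cycloid_time(x, y):.3f} s")
    print(f"cycloid's lowest point is {2 * r:.3f} m below the start (endpoint: {y} m)")
```

The dip occurs exactly when the cycloid passes its lowest point (theta = π) before reaching the endpoint, which happens when the horizontal distance exceeds π/2 times the drop.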

Drawing a parallel to biomedical R&D, this example is intuitively instructive. Project initiation is the starting point at the top of the slope; successful market launch and commercial returns are the endpoint. What we pursue along the way is not the local optimum at every step, but a path that, as a whole, makes drug development more efficient and benefits patients.
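The difference between optimizing each step and optimizing the whole path can be shown with a toy staged graph. In the sketch below, all node names and costs are hypothetical: a greedy planner always takes the cheapest next step, while a dynamic-programming planner minimizes the total cost to the endpoint.

```python
from functools import lru_cache

# Hypothetical stage graph: node -> {next node: cost}. Names and costs are illustrative only.
GRAPH = {
    "start": {"shortcut": 1, "detour": 4},
    "shortcut": {"launch": 10},  # cheap first step that leads to an expensive one
    "detour": {"launch": 2},     # costlier first step, cheap afterwards
    "launch": {},
}


def greedy_path(start: str = "start"):
    """Always take the cheapest next step (a local optimum at every stage)."""
    node, path, total = start, [start], 0
    while GRAPH[node]:
        node, cost = min(GRAPH[node].items(), key=lambda item: item[1])
        path.append(node)
        total += cost
    return total, path


@lru_cache(maxsize=None)
def best_cost(node: str) -> int:
    """Minimum total cost from `node` to the end (dynamic programming)."""
    if not GRAPH[node]:
        return 0
    return min(cost + best_cost(nxt) for nxt, cost in GRAPH[node].items())


def optimal_path(start: str = "start"):
    """Take the step minimizing step cost plus best remaining cost (the global optimum)."""
    node, path = start, [start]
    while GRAPH[node]:
        current = node
        node = min(GRAPH[current], key=lambda nxt: GRAPH[current][nxt] + best_cost(nxt))
        path.append(node)
    return best_cost(start), path


if __name__ == "__main__":
    print("greedy: ", greedy_path())   # (11, ['start', 'shortcut', 'launch'])
    print("optimal:", optimal_path())  # (6, ['start', 'detour', 'launch'])
```

The greedy planner's cheap first step locks it into an expensive second one, much as the straight ramp looks fastest at the outset.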

The brachistochrone problem turns on a single constant, gravitational acceleration. In biomedicine, by contrast, you need to collaborate with experts across hundreds of specialized fields while also weighing a range of policy factors, including reimbursement, health insurance, regulatory review, oversight, and more.

In this context, finding a globally optimal drug development path is the most fundamental value AI can provide. Put another way, this means thinking from first principles about how AI can help us anticipate what might happen along that path.

This approach has already been validated in practice. In a 2022 article, Pfizer compared its internal success rates, both phase by phase (Phase I, II, and III) and end to end across clinical development, with industry averages. The data showed that Pfizer's overall success rate after 2019 was significantly higher than the industry average. Pfizer summarized three key lessons: deeper biological understanding, greater molecular diversity, and more quantitative standards in decision-making.
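Phase-by-phase and end-to-end success rates are linked by simple multiplication: each phase-transition rate is the probability of passing that phase given that a candidate reached it, so by the chain rule the end-to-end rate is their product. The sketch below uses hypothetical rates, not Pfizer's or industry figures.

```python
from math import prod


def end_to_end_success(phase_rates: dict) -> float:
    """Chain rule: each rate is P(pass phase | reached phase), so the overall rate is their product."""
    return prod(phase_rates.values())


# Hypothetical phase-transition probabilities, for illustration only.
BASELINE = {"phase_1": 0.5, "phase_2": 0.3, "phase_3": 0.6}
IMPROVED = {"phase_1": 0.5, "phase_2": 0.45, "phase_3": 0.6}  # only Phase II changes

if __name__ == "__main__":
    base = end_to_end_success(BASELINE)
    better = end_to_end_success(IMPROVED)
    print(f"baseline {base:.1%}, improved {better:.1%}, ratio {better / base:.2f}")
```

Because the rates multiply, raising a single phase's rate by half lifts the end-to-end rate by half as well, and three middling phases readily compound to the sub-10% figure in the "rule of three 10s."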

AI can significantly strengthen all three. We can bring a more global understanding of biology into drug development, draw on comprehensive knowledge of different molecular types (such as small molecules, antibodies, fusion proteins, or cell and gene therapies), and incorporate more expert opinion and quantitative metrics. Practice has shown that we can break through Eroom's Law and deliver the commercial returns expected.

/ 03 / How to Enhance the Value of Bioscience Data?

AI primarily improves R&D efficiency in biomedicine on the data side. Especially in protein structure prediction and protein molecular design, the 2024 Nobel Prize in Chemistry laureate David Baker's team has already done excellent work — so is there still room for development in this field?

Here is one of our internal case studies. In the peptide field (here meaning short peptides of no more than 30 amino acids), we started with 200,000 publicly available data points, generated 800,000 internal data points through customized data augmentation, and collected approximately 10,000 fully proprietary internal data points. The model we trained on this data outperformed ProteinMPNN (the protein design model developed by David Baker's team) in peptide design.

I need to be clear here: we are not claiming to surpass David Baker's team's overall capabilities — they have indeed done outstanding work. But in specialized biomedical AI R&D, through proprietary data accumulation and model fine-tuning, we may achieve higher accuracy than general-purpose models.

Ilya Sutskever, former Chief Scientist at OpenAI, has suggested that pre-training as we know it may be approaching its limits, and that future AI progress may depend more on agentic AI and reasoning capabilities.

This assessment may hold true for the internet sector, but in AI for life sciences, I believe there is still substantial room. (See "AI for Science: At the Turning Point of a Research Paradigm | FreeS Report")

First, biological data remains limited. Second, as this example shows, AI trained on proprietary data still has significant room for improvement. We therefore believe that over the next three to four years, data will remain a critical factor for AI in biomedicine to achieve real-world impact and demonstrate its value.

There are three methods to enhance data value.

First, improve data quality and reduce data acquisition costs through a series of automated or standardized high-quality experimental methods.

At XtalPi, we have built a large-scale automated chemistry experiment cluster.

On top of the hardware, we have built a three-layer architecture. Closest to the physical hardware is a digital twin system: every new experimental protocol runs first in the digital twin environment before it is deployed physically. The middle layer is a chemical programming language that describes chemical reactions as programs. The top layer is a natural language system: users interact with it in natural language, which is translated into the chemical language and executed by the automated system. These are not concept renderings; they are systems already running in our Shenzhen and Shanghai laboratories.
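To make the lower two layers concrete, here is a minimal sketch under stated assumptions. The `Step` format stands in for a chemical programming language and `DigitalTwin` for the simulation layer that vets a protocol before it reaches hardware; both are hypothetical and far simpler than a real system, checking only vessel capacity and a temperature limit. The natural-language layer on top is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List

KNOWN_ACTIONS = {"add", "heat", "stir"}


@dataclass
class Step:
    """One instruction in a hypothetical chemical programming language."""
    action: str              # "add", "heat", or "stir"
    vessel: str
    volume_ml: float = 0.0   # used by "add"
    temp_c: float = 25.0     # used by "heat"


@dataclass
class DigitalTwin:
    """Hypothetical simulated lab that vets a protocol before it reaches real hardware."""
    capacity_ml: Dict[str, float]
    max_temp_c: float = 150.0

    def simulate(self, protocol: List[Step]) -> List[str]:
        """Run the protocol virtually; return a list of problems (empty means it looks safe)."""
        volumes: Dict[str, float] = {}  # local state, so the twin can be reused
        problems: List[str] = []
        for i, step in enumerate(protocol):
            if step.action not in KNOWN_ACTIONS:
                problems.append(f"step {i}: unknown action {step.action!r}")
            elif step.vessel not in self.capacity_ml:
                problems.append(f"step {i}: unknown vessel {step.vessel!r}")
            elif step.action == "add":
                volumes[step.vessel] = volumes.get(step.vessel, 0.0) + step.volume_ml
                if volumes[step.vessel] > self.capacity_ml[step.vessel]:
                    problems.append(f"step {i}: {step.vessel} would overflow")
            elif step.action == "heat" and step.temp_c > self.max_temp_c:
                problems.append(f"step {i}: {step.temp_c} °C exceeds the {self.max_temp_c} °C limit")
        return problems


def run_if_safe(protocol: List[Step], twin: DigitalTwin) -> bool:
    """Only protocols that pass the twin would be sent on; real hardware dispatch is omitted."""
    problems = twin.simulate(protocol)
    for problem in problems:
        print("rejected:", problem)
    return not problems


if __name__ == "__main__":
    twin = DigitalTwin(capacity_ml={"flask_1": 100.0})
    protocol = [Step("add", "flask_1", volume_ml=40.0),
                Step("add", "flask_1", volume_ml=50.0),
                Step("heat", "flask_1", temp_c=80.0)]
    print(run_if_safe(protocol, twin))  # True
    print(run_if_safe(protocol + [Step("add", "flask_1", volume_ml=20.0)], twin))  # rejection, then False
```

Keeping simulation state local to `simulate` means one twin can vet many candidate protocols without their volumes leaking into each other.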

Second, develop new experimental methods to increase data dimensions. In recent years, multi-omics technologies, spatial omics, and high-content experimental techniques have gradually developed, generating vast amounts of data.

In December 2024, Nature Methods named spatial proteomics its Method of the Year 2024.

Currently, spatial transcriptomics receives more attention in the industry (it analyzes the gene expression patterns of individual cells, as well as the spatial relationships and biological characteristics of cell populations, across temporal and spatial dimensions). But proteins are the molecules more directly responsible for carrying out life's activities. So if we can achieve spatial proteomics analysis, linking spatial cell-type information with proteomics data, that would be an exciting development.

Third, after obtaining large amounts of data, matching data analysis methods are needed. Because big data often comes with substantial noise, improving signal-to-noise ratio through better data analysis methods is crucial.

Looking back at the development of deep learning, 2012 was a pivotal moment. That year, a convolutional neural network (AlexNet) won the ImageNet image recognition competition by a wide margin, and within a few years such networks matched human-level accuracy on that benchmark. One key factor: earlier machine learning methods relied on manually defined features, such as the distance between the eyes, the length of the nose, the width of the mouth, and the aspect ratio of the face. Convolutional neural networks, in contrast, take raw image pixels as input and learn their own features, with no manual, experience-based choice of which features matter.

Similarly, for single-cell big data, we adapted the attention mechanism from Transformer models. The key change is this: previously, every cell was given equal weight. People manually excluded poor-quality data based on experience and kept the good data, but the cells retained were still weighted uniformly.

With the attention mechanism, the model itself judges the quality of each cell's data and assigns it a dynamic weight between 0 and 1. This change alone has brought significant improvements across a range of drug development tasks.
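The difference between uniform weighting and attention weighting can be sketched in a few lines. This is an assumption-laden illustration, not the architecture described in the talk: each cell's feature vector is scored by a scaled dot product with a query vector (learned in a real model, fixed by hand here), and a softmax turns the scores into weights between 0 and 1 that sum to 1. All data below are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn scores into weights in (0, 1) that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(cells: List[List[float]], query: List[float]) -> Tuple[List[float], List[float]]:
    """Score each cell against `query`, softmax the scores, and return (weights, weighted sum)."""
    scale = math.sqrt(len(query))  # scaled dot product, as in Transformer attention
    scores = [sum(q * x for q, x in zip(query, cell)) / scale for cell in cells]
    weights = softmax(scores)
    pooled = [sum(w * cell[d] for w, cell in zip(weights, cells)) for d in range(len(cells[0]))]
    return weights, pooled


def uniform_pool(cells: List[List[float]]) -> List[float]:
    """The older approach: every retained cell gets the same weight."""
    return [sum(cell[d] for cell in cells) / len(cells) for d in range(len(cells[0]))]


# Hypothetical per-cell feature vectors; the third looks like a low-quality outlier.
CELLS = [[2.0, 1.0], [1.8, 1.2], [-3.0, 0.0]]
# Hypothetical query vector; in a trained model this would be learned.
QUERY = [1.0, 0.0]

if __name__ == "__main__":
    weights, pooled = attention_pool(CELLS, QUERY)
    print("attention weights:", [round(w, 3) for w in weights])
    print("attention pooled: ", [round(v, 3) for v in pooled])
    print("uniform pooled:   ", [round(v, 3) for v in uniform_pool(CELLS)])
```

A softmax is only one way to get such weights; an independent sigmoid gate per cell would also give values between 0 and 1 without forcing them to sum to 1. The talk does not say which was used.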

I believe that in the future, Transformer-based models or newer architectures will deliver better results than traditional analysis methods.

The Future: Three Directions for AI to Reshape Drug Discovery

How will AI affect drug development in the future?

We are optimistic about three directions. First, although AI has applications across the entire industry chain, the next focus is enabling AI to help pharmaceutical companies bridge the "Valley of Death" from preclinical to clinical translation. If this breakthrough can be achieved, it will completely reshape modern drug development processes and approaches.

The challenge in drug development is that, for reasons of both cost and ethics, we cannot test drugs directly on humans. All drug development work therefore strives to build an evaluation system that does not depend on human testing, and we hope that system correlates better with human biology. Large language models, after pre-training, go through reinforcement learning from human feedback: an alignment process between human knowledge and the model, in which human experience helps the model make judgments. Preclinical evaluation needs a similar alignment with what actually happens in patients.

Previously, bridging preclinical and clinical translation relied on experts to select biomarkers (objectively measurable biological parameters that indicate disease status, physiological processes, or response to treatment). XtalPi is borrowing the idea behind RLHF (Reinforcement Learning from Human Feedback), using actual clinical disease states to guide that selection. So far, we have made preliminary progress in Alzheimer's disease and depression, and we hope to develop better preclinical screening models for psychiatric and neurological diseases in the future.
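As a much simpler stand-in for that idea (this is not RLHF, and not XtalPi's method), the sketch below lets observed clinical outcomes, rather than expert preference alone, rank candidate preclinical biomarkers by how strongly each correlates with the clinical effect. The marker names and values are hypothetical, and four data points are far too few for a real analysis.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_biomarkers(preclinical: Dict[str, List[float]], clinical: List[float]) -> List[Tuple[str, float]]:
    """Order candidate biomarkers by |correlation| with the observed clinical outcome."""
    scored = {name: pearson(values, clinical) for name, values in preclinical.items()}
    return sorted(scored.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: one value per compound already tested in patients.
CLINICAL_EFFECT = [0.1, 0.4, 0.5, 0.9]
READOUTS = {
    "marker_a": [1.0, 2.1, 2.9, 4.2],  # tracks the clinical effect closely
    "marker_b": [3.0, 1.0, 4.0, 2.0],  # weakly related
}

if __name__ == "__main__":
    for name, r in rank_biomarkers(READOUTS, CLINICAL_EFFECT):
        print(f"{name}: r = {r:+.2f}")
```

The shared idea is only that the feedback signal comes from what happened in patients; an actual RLHF-style setup would train a model against that signal rather than rank fixed markers.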

Second is agent systems built on large models, enabling human-machine collaboration with experts.

Drug development requires bringing together different experts for decision-making, which is challenging for many companies, especially early-stage biotech companies. If we can use large models to build virtual experts in different specialized fields and achieve human-machine collaboration, this would be extremely valuable work.

In 2019, AstraZeneca published an article on its 5R framework (right target, right tissue, right safety, right patient, and right commercial potential), explaining how integrating these different expert perspectives increased its drug development success rate from 4% to 19%.

At XtalPi's Innovation Center, there is a project called "Project42." The name comes from Douglas Adams's science fiction novel The Hitchhiker's Guide to the Galaxy, in which 42 is the answer to the ultimate question of life, the universe, and everything. It is an intelligent agent interaction system: you can converse with large language models and also get support from experts in clinical research, antibody research, drug design, and other areas. With it, one or two people can work roughly 5 to 10 times more efficiently on foundational drug design, literature research, and similar tasks.
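A minimal sketch of how such a system might dispatch a question to several specialist agents, under loudly stated assumptions: each `VirtualExpert` is hypothetical, routing is by simple keyword match, and the `answer` callables are placeholders where a real system would call a language model prompted for that specialty. No actual LLM API is used, and nothing here describes Project42's internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VirtualExpert:
    """Hypothetical stand-in for an LLM agent prompted with one specialty."""
    name: str
    keywords: List[str]
    answer: Callable[[str], str]  # placeholder for a model call


def route(question: str, experts: List[VirtualExpert]) -> Dict[str, str]:
    """Send the question to every expert whose keywords it mentions; fall back to all experts."""
    text = question.lower()
    chosen = [e for e in experts if any(k in text for k in e.keywords)] or experts
    return {e.name: e.answer(question) for e in chosen}


# Hypothetical expert roster with placeholder answer functions.
EXPERTS = [
    VirtualExpert("clinical", ["trial", "patient", "endpoint"], lambda q: "clinical view on: " + q),
    VirtualExpert("antibody", ["antibody", "epitope"], lambda q: "antibody view on: " + q),
    VirtualExpert("drug_design", ["molecule", "binding", "scaffold"], lambda q: "design view on: " + q),
]

if __name__ == "__main__":
    for name, reply in route("What trial endpoint suits this molecule?", EXPERTS).items():
        print(f"[{name}] {reply}")
```

Falling back to every expert when no keyword matches keeps a vague question from going unanswered; a production system would more likely let a model choose the experts.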

Third is building an open ecosystem. Cristóbal Valenzuela, co-founder and CEO of the American video-generation startup Runway, once said: "AI is becoming infrastructure, as important as electricity or the internet. Calling yourself an AI company today is like calling yourself an internet company. It means nothing because it's general. Every company uses the internet; every company will use AI."

We hope to learn from the development of the internet, whose most important lesson is the value of open communities and open-source ecosystems. We are actively building a platform ecosystem through close collaboration across technology, capital, scientific research, and industry. On this foundation, we have participated in or incubated innovative companies and projects in a range of directions, spanning oncology, immunology, and vaccines as well as materials design, agriculture, anti-aging, and more.

Finally, we must return to the original purpose of making medicines. I have always agreed with George W. Merck, second president of Merck & Co.: "We try never to forget that medicine is for the people. It is not for the profits. The profits follow, and if we have remembered that, they have never failed to appear." I believe this is a philosophy that technology platforms and pharmaceutical companies alike should uphold.
