Lijia Ma of Westlake Valley Intelligent Therapeutics: AI Knocks on the Door to Cures Through Gene Editing Therapy | Gaorong Ventures

Gaorong Ventures · July 4, 2023

AI for Science in practice.

As modern biomedicine advances, our understanding of life grows ever more sophisticated. Gene technology, in particular, allows us to delve into the "book of life" — tracing diseases to their origins and devising countermeasures. Whether for rare diseases or cancer, gene editing holds the promise of correcting errors at their source, demonstrating almost magical "one-time cure potential."

CRISPR, the third generation of gene editing technology, has been revolutionizing basic and clinical biomedical research since scientists first demonstrated its ability to achieve efficient genome editing in mammalian cells in 2013. With its advantages of efficiency, convenience, and broad applicability, CRISPR has dramatically improved our capacity to repair genomic errors — an achievement recognized with the 2020 Nobel Prize in Chemistry.

In recent years, CRISPR gene editing therapy pipelines have been emerging globally, spanning blood disorders, metabolic diseases, muscular dystrophy, and oncology. The world's first CRISPR gene editing therapy for sickle cell disease (SCD) and transfusion-dependent beta-thalassemia (TDT) was submitted for regulatory approval this year.

Yet gene editing therapies, and gene therapy more broadly, still face numerous hurdles: complex manufacturing processes, exorbitant drug prices, and safety profiles that require continued observation. Is it possible to dramatically reduce the cost of gene therapies in the future? And can AI help accelerate the many steps along the path to approved drugs?

For Lijia Ma, founder of Westlake Valley Intelligent Therapeutics, the answer is clear: "This is something we're deeply committed to. We want to combine high-throughput biotech data with AI technology to empower gene editing therapy across safety, efficacy, durability, and accessibility."

Ma heads the Functional Genomics and Gene Therapy Laboratory at Westlake University. She earned her PhD at the Beijing Institute of Genomics, Chinese Academy of Sciences, and completed postdoctoral work at the University of Chicago's Institute for Genomics and Systems Biology, with a long-term focus on large-scale functional genomics research. In February 2021, she co-founded Westlake Valley Intelligent Therapeutics, an AI-enabled gene editing therapy company dedicated to applying artificial intelligence across the full workflow of gene editing therapeutics. Gaorong Ventures invested in the company's angel round in 2021 and participated again in its Pre-A round.

Recently, at Gaorong Ventures' Generative AI series event, Ma shared the company's latest deep learning model developments, its AI for Science practice in gene editing, and how it is moving from technological innovation toward product realization and commercial readiness.

Gene editing therapy is still in its early stages but carries enormous market value. "One striking indicator: current gene editing therapeutics on the market cost about $3.1 million per injection. That really puts it in perspective," Ma noted. "The core thing we're doing is organically integrating biotech and AI to make gene editing drugs safer and no longer prohibitively expensive in the future."

Since the company's founding, Ma and her team have developed frontier technology through an iterative loop: producing high-throughput, high-precision, high-dimensional biotech data, building deep learning models on that data, and feeding the results back into the next round of experiments. The aim is to develop differentiated gene editing therapeutic products.

Traditional drugs generally target proteins, while gene therapies target aberrant DNA. "When we first proposed AI-enabled gene editing therapy, many people didn't understand, because biological experiments in this field take too long to collect data and are too inefficient."

But Ma believes the Central Dogma of molecular biology offers a way in. DNA is transcribed into RNA, RNA is translated into protein, and protein ultimately drives changes in cell function and morphology. If measurable dimensions can be identified along that chain and expressed in data formats that can serve as machine-learnable corpora or training datasets, then AI empowerment becomes viable.
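As one illustration of expressing a biological measurement target in a machine-learnable format, a nucleotide sequence can be turned into a one-hot matrix. This is a minimal sketch of a common encoding, not a description of the representation the team actually uses; the function name `one_hot` and the A, C, G, T column order are assumptions made for the example.

```python
# Illustrative only: a common way to encode DNA for a model.
# The column order A, C, G, T is an assumption of this sketch.
BASES = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA string as one row per base, one column per letter in BASES."""
    rows = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unsupported base: {base!r}")
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


if __name__ == "__main__":
    for row in one_hot("GATTACA"):
        print(row)
```

Each row holds a single 1 marking the base at that position, so a 20-nt gRNA becomes a 20×4 matrix that can be paired with a measured label such as editing efficiency.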

Ma explained how CRISPR works: "Through a designed guide RNA (gRNA) whose sequence is complementary to the target genomic sequence, you can bring the Cas nuclease to the right location. The Cas enzyme has the ability to modify DNA sequences, enabling precise gene editing through insertion, deletion, or base substitution at specific sites."

Video: Understanding CRISPR mechanism

That is, a gene editing therapeutic system requires three basic components — guide RNA (gRNA), Cas nuclease, and delivery system. Ma vividly compares the gene editing therapeutic system to a precision-guided long-range missile system: "The gRNA is like GPS navigation, responsible for finding the error and plotting the course; the delivery system is equivalent to the propulsion system; and the Cas nuclease is like the engineer who travels the planned route to the designated location and ultimately corrects the error."

Encouragingly, all three components are "programmable" and can be studied through big data and deep learning approaches.

Meanwhile, next-generation sequencing technology in the biotech field can read nucleic acid sequences at large scale and high quality, rapidly producing biological data.

The programmability of every component in the gene editing therapeutic system and the maturation of next-generation sequencing technology are the two preconditions that allow AI to make its mark in gene editing therapeutics.

So how do scientists develop a CRISPR gene editing drug? It mainly involves three steps.

Target Identification: Finding the genomic DNA that needs modification;

Editing Strategy: Choosing which "GPS (gRNA)" and Cas enzyme to use, which significantly impacts drug development efficiency;

Delivery System Design: Different targeting objectives require different delivery systems.

At each of these critical steps, Westlake Valley Intelligent Therapeutics has proposed its own Biotech+AI-assisted paradigm for gene therapy product development.

Ma explained that at the target identification stage, DNA pre-trained models can be combined with high-throughput functional genomics to accelerate target discovery. At the strategy selection stage, predictive models and databases help identify the most suitable gRNA and Cas enzyme. At the delivery stage, deep learning models help identify optimal delivery vectors.

Ma also shared the team's latest achievement — a deep learning model built through a novel strategy that can effectively predict multidimensional gRNA performance in CRISPR, further guiding the design of gRNA molecules with specific targeting and stability properties. The related research was published in Cell Discovery on May 16 ("Deep sampling of gRNA in the human genome and deep-learning-informed prediction of gRNA activities").

The human genome contains 6×10⁸ potential gRNAs with NGG-PAM. "In the ideal scenario, we would experimentally measure all possible paths corresponding to every target in the coordinate system and store them in the 'GPS.' But such experimental volume is beyond current technological capabilities." Ma and her team selected approximately 1% of this data space for high-quality experimental sampling. Supported by a high-throughput gRNA-target sequence library, they comprehensively characterized the on-target editing efficiency, off-target editing specificity, and DSB repair profiles of 920,000 gRNAs across two human cell lines (K562 and Jurkat), "obtaining very clear label values."
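To make the idea of a candidate gRNA space concrete, the sketch below scans a DNA string on both strands for protospacers immediately followed by an NGG PAM. It is a toy illustration under stated assumptions, not the team's pipeline: the function names, the 20-nt default guide length, and the example sequence are all hypothetical, and input is assumed to contain only A, C, G, T, or N.

```python
# Toy sketch: enumerate candidate SpCas9 target sites (protospacer + NGG PAM).
# Names and defaults here are illustrative assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_ngg_guides(seq: str, guide_len: int = 20) -> list[tuple[str, int, str, str]]:
    """Return (strand, start, protospacer, pam) for each site with an NGG PAM.

    For the "-" strand, `start` is an offset into the reverse-complemented
    sequence, not a coordinate on the input string.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for i in range(len(s) - guide_len - 3 + 1):
            pam = s[i + guide_len : i + guide_len + 3]
            if pam[1:] == "GG":  # first PAM base may be anything (the "N")
                hits.append((strand, i, s[i : i + guide_len], pam))
    return hits


if __name__ == "__main__":
    toy = "ACGTACGTACGTACGTACGTTGGCATCCAAGCTAGCTAGCTAGCTAGCT"
    for hit in find_ngg_guides(toy):
        print(hit)
```

Running a scan like this over a whole genome is what yields a candidate space on the order described above, which is why exhaustive experimental measurement is impractical and sampling is needed.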

Building on this dataset, currently the largest and highest-quality gRNA-target sequence dataset, Westlake Valley Intelligent Therapeutics developed corresponding deep learning models, including AIdit_ON for predicting gRNA activity, AIdit_OFF for predicting off-target activity, and AIdit_DSB for predicting SpCas9-induced DSB repair profiles. "In the future, when a new coordinate appears, the 'GPS model' can help us rapidly plan the optimal route."

"In head-to-head comparisons with other similar models, our models show significant advantages." Ma and her team also built a public website embedding the three high-performance models (https://crispr-aidit.com). Researchers can input gene names, sequence fragments, or FASTA files with sequences to obtain multidimensional gRNA prediction data, enabling more precise gRNA selection.
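For readers who want to prepare sequence input as a FASTA file, the following minimal sketch writes named sequences in standard FASTA layout (a `>` header line followed by wrapped sequence lines). The article does not specify the exact input requirements of the website, so the record names, the 60-character line width, and the function name `to_fasta` are assumptions for illustration only.

```python
# Minimal sketch: format named DNA sequences as standard FASTA text.
# Line width and record names are illustrative assumptions.
def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Return FASTA text with one '>' header per record and wrapped sequence lines."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        seq = seq.upper()
        lines.extend(seq[i : i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = to_fasta({"example_target": "acgt" * 20})
    print(text, end="")
```

The resulting text can be saved to a file with a `.fasta` extension before upload.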

Ma further explained that the team used this model to attempt drug pipeline design for a rare disease, compressing what previously required 3-4 years of gene editing strategy design into just over three months. The pipeline is expected to advance to clinical stages in the second half of this year. "This shows the tremendous transformation that AI-enabled gene editing therapy research paradigms can bring to drug development efficiency."

Beyond its gene editing platform, Westlake Valley Intelligent Therapeutics is also developing a target discovery platform for predicting new gene therapy targets; a novel enzyme discovery platform to develop proprietary gene editing enzymes and address patent issues for global expansion; and a delivery vector platform to create a new paradigm for AI-generated AAV mutant evolution.

"CRISPR technology itself is very new," Ma said. As a pioneer in the field, she shared some of the challenges encountered along the way.

First is abstracting biomedical problems in gene editing into computational biology problems: "You must transform biotech domain problems into datasets that are suitable for machine learning."

Second is where the data comes from. "Data production in the gene editing field is extremely difficult and requires very strong industry know-how."

The third major challenge is cross-disciplinary integration between biotech and AI. "Our AI experts must have strong willingness and ability to actively learn about gene editing technology. Otherwise, they won't know how to tune parameters or build models."

Westlake Valley Intelligent Therapeutics has assembled an interdisciplinary team spanning genomics, deep learning, cell biology, and immunology. "We've gone through a period of adjusting to one another along the way and understand each other better now. For example, after our AI model iterations, the experimental team needs 3-6 months to produce data, a timeline that can be hard for AI people to grasp. Facing such challenges, we accelerate the experimental team's data output while the AI team continues optimizing models."

Built on a foundation of high-quality data and original technology-driven gene editing therapeutic platforms, Westlake Valley Intelligent Therapeutics aims to "more rapidly decipher problems in the genetic code and correct them through gene editing." The driving force behind this is a deep conviction that "gene editing drugs will inevitably become as important to human health as traditional small-molecule and large-molecule drugs."