image: (From Left) Distinguished Professor Sang Yup Lee, Dr. Gi Bae Kim, Professor Bernhard Palsson
Credit: KAIST
“We know the genes, but not their functions.” To resolve this long-standing bottleneck in microbial research, a joint research team has proposed a cutting-edge research strategy that leverages Artificial Intelligence (AI) to drastically accelerate the discovery of microbial gene functions.
KAIST announced on January 12th that a research team led by Distinguished Professor Sang Yup Lee from the Department of Chemical and Biomolecular Engineering, in collaboration with Professor Bernhard Palsson from the Department of Bioengineering at UCSD, has published a comprehensive review paper. The study systematically analyzes and organizes the latest AI-based research approaches aimed at revolutionizing the speed of gene function discovery.
Since the early 2000s, when whole-genome sequencing became a reality, there have been high expectations that the genetic blueprint of life would be fully decoded. However, even twenty years later, the roles of a significant portion of genes within microbial genomes remain unknown.
While various experimental methods—such as gene deletion, analysis of gene expression profiles, and in vitro activity assays—have been employed, discovering gene functions remains a time-consuming and costly endeavor. This is primarily due to the limitations of large-scale experimentation, complex biological interactions, and the discrepancy between laboratory results and actual in vivo responses.
To overcome these hurdles, the research team emphasized that an AI-driven approach combining computational biology with experimental biology is essential.
In this paper, the team provides a comprehensive overview of computational biology approaches that have facilitated gene function discovery, ranging from traditional sequence similarity analysis to the latest deep-learning-based AI models.
Notably, 3D protein structure prediction technologies such as AlphaFold (developed by Google DeepMind) and RoseTTAFold (developed by the University of Washington) have opened new doors. These tools go beyond simple functional estimation, offering the potential to understand the underlying mechanisms of how gene functions operate. Furthermore, generative AI is now extending research boundaries toward designing proteins with specifically desired functions.
Focusing on transcription factors (proteins that act as genetic switches) and enzymes (proteins that catalyze chemical reactions), the team presented various application cases and future research directions that integrate gene sequence analysis, protein structure prediction, and diverse metagenomic analyses.
To overcome the biases and limitations inherent in traditional gene discovery, the researchers highlighted the need for an “Active Learning” framework where AI guides the experimental process.
Active Learning is a method where an AI model identifies predictions with high uncertainty and suggests specific experiments to resolve them. The results are then fed back into the model to improve its accuracy. This iterative loop allows researchers to efficiently validate the most critical gene functions first.
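The loop described above can be illustrated with a minimal Python sketch. It is not the method from the review paper. It assumes a toy setting in which each candidate gene has a single hypothetical feature score between 0 and 1, the "model" is a simple logistic curve, and the "experiment" is a lookup in a hidden truth table. The gene names, the 0.6 threshold, and every function name here are invented for illustration only.

```python
import math

def predict_proba(x, labeled):
    """Toy model (an assumption): a logistic curve centred halfway between
    the highest known negative and the lowest known positive feature."""
    negatives = [f for f, y in labeled if y == 0]
    positives = [f for f, y in labeled if y == 1]
    lo = max(negatives) if negatives else 0.0
    hi = min(positives) if positives else 1.0
    boundary = (lo + hi) / 2
    return 1.0 / (1.0 + math.exp(-20.0 * (x - boundary)))

def uncertainty(p):
    """Binary entropy in bits; largest when the prediction is near 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def active_learning(pool, run_experiment, seed_labeled, budget):
    """Repeatedly test the gene whose prediction is most uncertain,
    then feed the result back into the labeled set."""
    labeled = list(seed_labeled)   # (feature, outcome) pairs known so far
    unlabeled = dict(pool)         # gene name -> feature, not yet tested
    history = []                   # order in which genes were tested
    for _ in range(budget):
        if not unlabeled:
            break
        gene = max(
            unlabeled,
            key=lambda g: uncertainty(predict_proba(unlabeled[g], labeled)),
        )
        feature = unlabeled.pop(gene)
        outcome = run_experiment(gene)      # stands in for a wet-lab assay
        labeled.append((feature, outcome))  # feedback step
        history.append(gene)
    return labeled, history

# Hypothetical candidates and a hidden ground truth (feature >= 0.6).
pool = {"geneA": 0.1, "geneB": 0.3, "geneC": 0.5,
        "geneD": 0.55, "geneE": 0.7, "geneF": 0.9}
hidden_truth = {g: int(f >= 0.6) for g, f in pool.items()}

labeled, history = active_learning(
    pool,
    run_experiment=lambda g: hidden_truth[g],
    seed_labeled=[(0.0, 0), (1.0, 1)],
    budget=3,
)
print(history)
```

In this toy run, the three experiments go to the genes nearest the model's current decision boundary, so the boundary is narrowed without testing every candidate. A real system would replace the logistic curve with a trained predictor and the lookup with an actual assay.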
The team stressed that this approach requires tight integration with automated experimental platforms and shared research infrastructures, such as biofoundries. They also noted that “failed data”—experiments that did not yield the expected results—must be shared as vital learning assets for future research.
“While deep learning-based prediction performance has improved significantly, developing ‘Explainable AI’ models that can provide biological justifications for their results remains a critical challenge,” said Dr. Gi Bae Kim of KAIST, a co-author of the study.
Distinguished Professor Sang Yup Lee emphasized, “The key to surpassing the limits of gene function discovery lies in combining a systematic, AI-guided experimental framework with an automated research infrastructure under the direction of human researchers. Establishing a research ecosystem where prediction and validation are repeatedly linked is essential.”
The paper was published on January 7th in Nature Microbiology, a peer-reviewed microbiology journal published by Nature Portfolio.
Publication Information
- Title: Approaches for accelerating microbial gene function discovery using artificial intelligence
- DOI: 10.1038/s41564-025-02214-1
- Authors: Bernhard O. Palsson (UCSD, First Author), Sang Yup Lee (KAIST, Second and Corresponding Author), Gi Bae Kim (KAIST, Third Author)
- This work was supported by the "Development of platform technologies of microbial cell factories for the next-generation biorefineries" project (2022M3J5A1056117) and the "Development of advanced synthetic biology source technologies for leading the biomanufacturing industry" project (RS-2024-00399424) of the National Research Foundation of Korea, funded by the Korean Ministry of Science and ICT.
Journal
Nature Microbiology
Method of Research
Literature review
Subject of Research
Not applicable
Article Title
Approaches for accelerating microbial gene function discovery using artificial intelligence
Article Publication Date
7-Jan-2026
COI Statement
Bernhard O. Palsson (UCSD, First Author), Sang Yup Lee (KAIST, Second and Corresponding Author), Gi Bae Kim (KAIST, Third Author)