Foundation models in molecular biology
Higher Education Press
Foundation models, building on their success in natural language processing and image generation, are changing how molecular biology studies multi-level molecular correlations. They are trained on large datasets spanning DNA, RNA, and protein sequences, single-cell transcriptomics, and spatial transcriptomics. These models decode intricate relationships (e.g., gene regulatory networks, protein interaction hubs) to predict functions, design therapeutics, and infer spatial tissue dynamics. Current frameworks include ESM-2 (protein structure-function prediction), scGPT (single-cell data integration), and DNABERT (genomic variant interpretation). Future directions emphasize multimodal integration (combining sequences, structures, and omics), interpretable attention mechanisms for biological insight, and scalable architectures for high-resolution spatiotemporal data. Addressing data heterogeneity and model generalizability will be necessary to unlock applications in precision biomedicine.
- Journal: Biophysics Reports
- Funder
- Method of Research: experimental study