News Release

USTC researchers develop a method named SCUBA for protein design

Peer-Reviewed Publication

University of Science and Technology of China

image: The principle of protein design using the SCUBA model.

Credit: Image by HUANG Bin et al.

In a Nature paper published on Feb 9th, the team of Prof. LIU Haiyan and Prof. CHEN Quan from the University of Science and Technology of China (USTC) of the Chinese Academy of Sciences (CAS) reported a new method for de novo protein design named SCUBA (Side Chain-Unknown Backbone Arrangement). SCUBA employs a novel statistical learning strategy and enables a continuous, extensive search of the main chain structure space. This makes it possible to automatically generate protein main chain structures with “high designability”, meaning that amino acid sequences can be selected that fold closely into such structures.

Proteins are the foundation of life and the main executors of cellular functions, with their structures and functions determined by their amino acid sequences. Currently, almost all proteins with stable three-dimensional structures are natural ones, whose amino acid sequences are the result of long-term natural evolution. When the structures and functions of natural proteins fail to meet industrial or biomedical needs, it is desirable to design proteins that provide the missing functions.

Previously reported studies of de novo protein design have relied on existing (natural) fragments as building blocks for assembling new overall structures. This approach has various shortcomings: the resulting designs are excessively monotonous, they lack the diversity of natural proteins, and the process is overly sensitive to the targeted main chain structures. The most difficult problem in de novo protein design, for which a systematic approach is still lacking, is how to explore the space of main chain structures with “high designability” efficiently and extensively, so as to discover novel, plausible protein structures that realize diverse functions.

The USTC research team has spent years developing a data-driven protein design method. They first established the ABACUS (A Backbone-based Amino aCid Usage Survey) model for designing amino acid sequences for given main chain structures. Subsequently, the team developed the SCUBA model for designing new main chain structures without pre-specified amino acid sequences. SCUBA comprises an analytical energy function learned from original structural data by kernel density estimation followed by neural network training. The energy function describes, with high fidelity, high-dimensional correlations in natural protein structures. It is used to guide the automatic generation of new main chain structures with “high designability”.
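The idea of learning an energy function by kernel density estimation followed by neural network training can be illustrated with a toy sketch. This is not the SCUBA implementation: it assumes a single periodic angle as the only structural variable, uses synthetic samples in place of real structural data, and takes the energy to be the negative logarithm of the estimated density. All names (`kde_density`, `kde_energy`, `TinyMLP`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random

def kde_density(x, samples, bandwidth):
    """Gaussian kernel density estimate at angle x (radians), using wrapped differences."""
    total = 0.0
    for s in samples:
        d = math.atan2(math.sin(x - s), math.cos(x - s))  # difference wrapped to (-pi, pi]
        total += math.exp(-0.5 * (d / bandwidth) ** 2)
    return total / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

def kde_energy(x, samples, bandwidth=0.4, floor=1e-3):
    """Assumed energy definition: negative log density, with a floor to keep it finite."""
    return -math.log(kde_density(x, samples, bandwidth) + floor)

class TinyMLP:
    """One tanh hidden layer on (sin x, cos x), so the fitted energy is smooth and periodic."""
    def __init__(self, hidden, rng):
        self.w1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        f = (math.sin(x), math.cos(x))
        h = [math.tanh(w[0] * f[0] + w[1] * f[1] + b) for w, b in zip(self.w1, self.b1)]
        out = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return out, h, f

    def energy(self, x):
        return self.forward(x)[0]

    def train_step(self, x, target, lr):
        """One stochastic gradient step on the squared error 0.5 * (out - target)^2."""
        out, h, f = self.forward(x)
        err = out - target
        for j in range(len(h)):
            grad_h = err * self.w2[j] * (1.0 - h[j] ** 2)  # uses w2 before it is updated
            self.w2[j] -= lr * err * h[j]
            self.w1[j][0] -= lr * grad_h * f[0]
            self.w1[j][1] -= lr * grad_h * f[1]
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err

def mean_loss(net, grid, targets):
    return sum(0.5 * (net.energy(x) - t) ** 2 for x, t in zip(grid, targets)) / len(grid)

rng = random.Random(0)
# Synthetic "observed" angles clustered around -1.0 rad (a stand-in for real data).
samples = [rng.gauss(-1.0, 0.5) for _ in range(200)]

# Step 1: kernel density estimation gives energy targets on a grid of angles.
grid = [-math.pi + 2 * math.pi * i / 36 for i in range(36)]
targets = [kde_energy(x, samples) for x in grid]

# Step 2: a small neural network is trained to reproduce those targets.
net = TinyMLP(hidden=8, rng=rng)
loss_before = mean_loss(net, grid, targets)
pairs = list(zip(grid, targets))
for _ in range(800):
    rng.shuffle(pairs)
    for x, t in pairs:
        net.train_step(x, t, lr=0.01)
loss_after = mean_loss(net, grid, targets)

print("mean loss before / after training:", loss_before, loss_after)
print("network energy at -1.0 and 2.0 rad:", net.energy(-1.0), net.energy(2.0))
```

In this sketch the trained network plays the role of an analytical energy function: it can be evaluated at any angle, not only at the grid points, and it is differentiable. A real model would depend on many coupled structural variables rather than a single angle.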

Theoretical calculations and experiments showed that main chain structure design with SCUBA overcomes the limitation of having to rely on existing fragments to build new protein structures. It thus significantly expands the diversity of protein structures accessible to de novo design and facilitates the design of novel structures not observed in nature. In this study, the team reported high-resolution crystal structures of nine de novo designed protein molecules, all with actual structures in close agreement with their corresponding design models. Four of the designed proteins have novel topologies that have not been observed in natural proteins.

This impressive work provides a novel method for de novo protein design that addresses the shortcomings of existing methods. “Unlike existing approaches, the premise of this approach is that it allows one to design a much broader range of protein geometries than what is observed in nature”, said one of the reviewers.
