The 2024 Nobel Prize in Chemistry has been awarded to David Baker "for computational protein design" and to Demis Hassabis and John M. Jumper "for protein structure prediction". Proteins are life's essential building blocks, nature's most ingenious molecular machines and the basis of all living organisms. Hassabis and Jumper have developed a series of artificial intelligence models to address the decades-long structural biology problem of how to predict the complex 3D structures of proteins solely from their linear amino acid sequences, while Baker has dedicated his scientific career to designing and constructing proteins that are not, and even cannot be, found in nature. In recognition of this award, Nature Portfolio presents a collection of research, review and opinion articles that celebrates both contributions by the awardees and the advances they have inspired.
AlphaFold predicts the distances between pairs of residues; these predictions are used to construct a potential of mean force that accurately describes the shape of a protein and can be optimized with gradient descent to predict protein structures.
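The idea of turning pairwise distances into a potential and minimizing it by gradient descent can be illustrated with a small toy. This is only a sketch under stated assumptions, not AlphaFold's method: it swaps in a simple harmonic penalty on Cartesian coordinates in place of a learned potential of mean force, and the function names, step size and helix-like reference points are all hypothetical.

```python
import math
import random


def pairwise_distances(coords):
    """Return the full matrix of Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def potential_and_grad(coords, target):
    """Toy potential: sum of squared errors between current and target distances.

    Assumption: a harmonic penalty stands in for a learned potential of mean force.
    """
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = [coords[i][k] - coords[j][k] for k in range(3)]
            dist = math.sqrt(sum(d * d for d in diff)) or 1e-9  # avoid division by zero
            err = dist - target[i][j]
            energy += err * err
            for k in range(3):
                g = 2.0 * err * diff[k] / dist
                grad[i][k] += g
                grad[j][k] -= g
    return energy, grad


def random_coords(n, seed=0):
    """Random starting points in the cube [-1, 1]^3."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]


def descend(coords, target, steps=2000, lr=0.01):
    """Plain gradient descent on the toy potential; returns new coordinates."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        _, grad = potential_and_grad(coords, target)
        for i, g in enumerate(grad):
            for k in range(3):
                coords[i][k] -= lr * g[k]
    return coords


# Hypothetical "true" structure: six points along a helix-like curve.
reference = [[math.cos(t), math.sin(t), 0.5 * t] for t in range(6)]
target = pairwise_distances(reference)

start = random_coords(len(reference))
initial_energy, _ = potential_and_grad(start, target)
final = descend(start, target)
final_energy, _ = potential_and_grad(final, target)
print(f"energy before: {initial_energy:.3f}, after: {final_energy:.3f}")
```

Because distances are unchanged by rotation, translation and reflection, the recovered points match the reference only up to those transformations, and descent from a random start may settle in a local minimum.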
AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture.
AlphaFold is used to predict the structures of almost all of the proteins in the human proteome; the availability of high-confidence predicted structures could enable new avenues of investigation from a structural perspective.
AlphaFold 3 has a substantially updated architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues with greatly improved accuracy over many previous specialized tools.
AlphaFold is a neural-network-based approach to predicting protein structures with high accuracy. We describe how it works in general terms and discuss some anticipated impacts on the field of structural biology.
Predicting the structure of a folded protein from first principles for any given amino-acid sequence remains a formidable computational challenge. To recruit human abilities to the task, these authors turned their Rosetta structure prediction algorithm into an online multiplayer game in which thousands of non-scientists competed and collaborated to produce new algorithms and search strategies for protein structure refinement. This shows that computationally complex problems can be effectively 'crowd-sourced' through interactive multiplayer games.
The trRosetta server, a web-based platform for fast and accurate protein structure prediction, is powered by deep learning and Rosetta. This protocol includes procedures for using the web-based server as well as the standalone package.
Deep learning has transformed protein structure modeling. Here we relate AlphaFold and RoseTTAFold to classical physically based approaches to protein structure prediction, and discuss the many areas of structural biology that are likely to be affected by further advances in deep learning.
The novel Foldseek clustering algorithm defines 2.30 million clusters of AlphaFold structures, identifying remote structural similarity of human immune-related proteins in prokaryotic species.
This study investigates the extent to which the AlphaFold database has structurally illuminated, at high predicted accuracy, proteins that are challenging to annotate for function or putative biological role using standard homology-based approaches.
An analysis of the evolutionary distribution of predicted structures for the metamorphic protein KaiB using AF-Cluster reveals that both conformations of KaiB were distributed in clusters across the KaiB family.
Here, the authors evaluate the performance of AlphaFold2 and its predicted structures on common structural biological applications, including missense variants, function and ligand binding site prediction, modeling of interactions and modeling of experimental structural data.
AlphaFold2 has already changed structural biology, but its true power may lie in how it changes the way we think about cells and organisms. Two studies broadly assess its utility and limitations in providing structural models to shed light in areas such as mutations, protein–protein interactions, and phosphorylation.
AlphaFold 3 represents a breakthrough in predicting the 3D structures of complexes directly from their sequences, offering insights into biomolecular interactions. Extending predictions to molecular behavior and function requires a shift from viewing biomolecules as static 3D structures to dynamic conformational ensembles.
This paper presents an iterative procedure where AlphaFold models are automatically rebuilt on the basis of experimental density maps and the rebuilt models are used as templates in new AlphaFold predictions.
ColabFold is a free and accessible platform for protein folding that provides accelerated prediction of protein structures and complexes using AlphaFold2 or RoseTTAFold.
OpenFold is a trainable open-source implementation of AlphaFold2. It is fast and memory efficient, and the code and training data are available under a permissive license.
An analysis of AlphaFold protein structure predictions shows that while in many cases the predictions are highly accurate, there are also many instances where the predicted structures or parts of predicted structures do not agree with experimentally resolved data. Therefore, care must be taken when using these predictions for informing structural hypotheses.
CombFold is a combinatorial and hierarchical assembly algorithm for predicting structures of large protein complexes utilizing pairwise interactions between subunits predicted by AlphaFold2.
Evolutionary analysis guided by predicted structures of AlphaFold 2 elucidates novel aspects of rapidly evolving pathogen effectors from fungal phytopathogens.
Deep graph neural networks can refine a predicted protein model efficiently with fewer computing resources. The accuracy is comparable to that of the leading physics-based methods that rely on time-consuming conformation sampling.
AlphaFold2 is a popular protein structure prediction tool; however, achieving high accuracy remains challenging for certain proteins that share fewer homologs with the database. Here, the authors develop a new version of the MULTICOM system to improve the multiple sequence alignment, structural template, model ranking, model refinement, and hence the accuracy of AlphaFold2 prediction.
Computational protein design was used to generate eight enzymes that were able to catalyse the Kemp elimination, a model reaction for proton transfer from carbon. Directed evolution was used to enhance the catalytic activity of the designed enzymes, demonstrating that the combination of computational protein design and directed evolution is a highly effective strategy to create novel enzymes.
Computational protein design is used to create a protein that binds the steroid digoxigenin (DIG) with high affinity and selectivity; the computational design methods described here should help to enable the development of a new generation of small molecule receptors for synthetic biology, diagnostics and therapeutics.
Computationally designed icosahedral protein-based assemblies can protect their genetic material and evolve in biochemical environments, suggesting a route to the custom design of synthetic nanomaterials for non-viral drug delivery.
A technique for the de novo design of switchable protein systems controlled by induced conformational change is demonstrated for three functional motifs, in vitro and in yeast and mammalian cells.
Enzymes use substrate-binding energy to promote ground-state association and to selectively stabilize the reaction transition state. Mutations in the amino-terminal domain of the monomeric homing endonuclease I-AniI, which cleaves with high sequence specificity in the centre of a 20-base-pair DNA target site, are now found to have different effects on the kinetic parameters of the enzyme than those in the carboxy-terminal domain, revealing an unexpected asymmetry in the use of enzyme–substrate binding energy for catalysis.
A computational method is reported that can be used to design protein nanomaterials in which two distinct subunits co-assemble into a specific architecture; five 24-subunit cage-like protein nanomaterials are designed, and experiments show that their structures are in close agreement with the computational design models.
Computational methods for the de novo design of conformationally restricted peptides produce exceptionally stable short peptides stabilized by backbone cyclization and/or internal disulfide bonds that are promising starting points for a new generation of peptide-based drugs.
The computational design of an extremely stable icosahedral self-assembling protein nanocage is presented; the icosahedron should be useful for applications ranging from calibrating fluorescence microscopy to drug delivery.
Despite substantial effort, the de novo design of a stable TIM-barrel protein fold has remained elusive. A Rosetta-based computational strategy identifies a unique 184-residue sequence that adopts a TIM-barrel structure, as revealed by X-ray crystallography.
A computational method to design cyclic protein homo-oligomers has been developed. Using this approach, a series of idealized repeat proteins incorporating designed interfaces that direct their assembly into complexes possessing cyclic symmetry were fabricated. Of the 96 oligomers that were characterized experimentally, 15 were shown to be consistent with the computational model.
Baker, Marcos and colleagues analyze β-arches (loops connecting unpaired β-strands) and derive rules used for de novo design of a hyperthermostable jellyroll structure, with eight antiparallel β-strands forming double-stranded β-helices.
A hyper-stable de novo protein mimic of interleukin-2 computationally designed to not interact with a regulatory T-cell specific receptor subunit has improved therapeutic activity in mouse models of melanoma and colon cancer.
Chris Garcia, David Baker and colleagues use a computational approach to develop designed repeat protein binders (DRPBs), which function as human Frizzled (Fz) subtype-selective antagonists and enable identification of Fz subtypes active in different organs.
An approach for the design of protein pores is demonstrated by the computational design and subsequent experimental expression of both an ion-selective and a large transmembrane pore.
The trRosetta neural network was used to iteratively optimise model proteins from random 100-amino-acid sequences, resulting in "hallucinated" proteins, which when expressed in bacteria closely resembled the model structures.
The authors show that consideration of global backbone strain enables successful de novo design of larger αβ-proteins with five- and six-stranded β-sheets flanked by α-helices.
A design pipeline is presented whereby binding proteins can be designed de novo without the need for prior information on binding hotspots or fragments from structures of complexes with binding partners.
The process of protein crystallization is poorly understood and difficult to program through the primary sequence. Here the authors develop a computational approach to designing three-dimensional protein crystals with prespecified lattice architectures with high accuracy.
Recently, a pipeline for the design of protein-binding proteins using only the structure of the target protein was reported. Here, the authors report that the incorporation of deep learning methods into the original pipeline increases experimental success rate by ten-fold.
RoseTTAFold2-Lite uses residue–residue coevolution and protein structure prediction to identify and structurally characterize protein–protein interactions in bacterial pathogens.
Engineering the tunability of protein assembly in response to pH changes within a narrow range is challenging. Here the authors report the de novo computational design of pH-responsive protein filaments that exhibit rapid, precise, tunable and reversible assembly and disassembly triggered by small pH changes.
A computer Go program based on deep neural networks defeats a human professional player to achieve one of the grand challenges of artificial intelligence.
A new computational approach that can be used to refine the three-dimensional structural models of proteins is described. When used to refine models generated from nuclear magnetic resonance data, the method can improve the accuracy of the structures in terms of the backbone conformations and the placement of core side chains. In addition, the approach can be used to generate significantly better solutions to the X-ray crystallographic phase problem in molecular replacement trials.
A deep learning approach enables accurate computational design of soluble and functional analogues of membrane proteins, expanding the soluble protein fold space and facilitating new approaches to drug screening and design.
A surface-centric approach captures the physical and chemical determinants of molecular recognition, enabling the de novo design of protein interactions and of artificial proteins with function.
Cysteine is the most intrinsically nucleophilic amino acid in proteins, but the absence of a consensus sequence that defines functional cysteines in proteins has hindered their discovery and characterization. Here, a proteomics method to quantitatively profile the intrinsic reactivity of cysteine residues directly in native biological systems is described. Hyper-reactive cysteines were identified in several proteins of uncharacterized function, including a residue conserved across eukaryotes that is shown to be required for yeast viability and involved in iron–sulphur protein biogenesis.
This study proposes a diffusion model, ProteinSGM, for the design of novel protein folds. The designed proteins are diverse, experimentally stable and structurally consistent with predicted models.
The de novo design of functional membrane proteins is a formidable challenge. Now, water-soluble peptides have been designed that assemble into α-helical barrels with accessible, polar and hydrated central channels. Insights from these structures have been used to produce stable membrane-spanning, cation-selective channels.
How a lasso cyclase ties a lasso peptide into its characteristic knot has remained poorly understood. Here the authors identify key molecular interactions that guide lasso peptide folding and cyclase substrate tolerance to inform cyclase engineering for expanded lasso peptide diversity.
This Review describes the de novo design of metalloproteins, which perform numerous functions essential to life. By understanding the relationship between the symmetry of the protein structure and the metal active site, we can design novel, functional metalloproteins from scratch.
Recent combinations of structure-based and sequence-based calculations and machine learning tools have dramatically improved protein engineering and design. Although designing complex protein structures remains challenging, these methods have enabled the design of therapeutically relevant activities, including vaccine antigens, antivirals and drug-delivery nano-vehicles.
Retrobiosynthesis aims to create novel biosynthetic pathways for the beneficial production of molecules of interest. This Review outlines how machine learning can help to advance retrobiosynthesis by improving retrosynthesis planning, enzyme identification and selection, and the engineering of enzymes and pathways.
This Perspective proposes practical guidance to the application of AlphaFold2 for structure prediction of different classes of proteins including rigid globular proteins, intrinsically disordered proteins and alternative conformational states. The use of evaluation metrics to predict reliability of the resulting models and their integration with experimental data are also discussed.
A protein's shape is crucial for fulfilling its function within a cell. This Review discusses how molecular dynamics simulations have given us insight into the processes that turn a linear chain of amino acids into a unique three-dimensional protein.
Diffusion models are deep-learning-based generative models that can generate new data from input parameters. This Review discusses applications of diffusion models in bioinformatics and computational biology.
Protein structures predicted using artificial intelligence will aid medical research, but the greatest benefit will come if clinical data can be similarly used to better understand human disease.
Protein engineering is a powerful tool to create new proteins with useful functions and behaviors, but it is slow, laborious and requires specialized knowledge, limiting its broad application. Here, the authors present a system that combines AI and experimental automation to autonomously engineer proteins without human intervention.
Two threads of research in the quest for methods that predict the 3D structures of proteins from their amino-acid sequences have become fully intertwined. The result is a leap forward in the accuracy of predictions.
Dramatic advances in protein structure prediction have sparked debate as to whether the problem of predicting structure from sequence is solved or not. Here, I argue that AlphaFold2 and its peers are currently limited by the fact that they predict only a single structure, instead of a structural distribution, and that this realization is crucial for the next generation of structure prediction algorithms.
The greatly improved prediction of protein 3D structure from sequence achieved by the second version of AlphaFold in 2020 has already had a huge impact on biological research, but challenges remain; the protein folding problem cannot be considered solved. We expect fierce competition to improve the method even further and new applications of machine learning to help illuminate proteomes and their many interactions.
Achieving cost-competitive bio-based processes requires development of stable and selective biocatalysts. In this Perspective, the authors propose an integrated solution combining growth-coupled selection with machine learning and automated workflows to accelerate development pipelines.
Following the publication of AlphaFold2 and RoseTTAFold in 2021, the field of protein structure prediction has moved quickly to incorporate these advances into protein engineering.
AlphaFold is a breakthrough in protein structure prediction, but limitations in its application to computation- and structure-guided drug discovery remain. As with structure prediction, public-domain data and benchmarking initiatives will be essential to advance the field of computational drug design.
This Perspective discusses the application of algorithmic methods throughout the preclinical phases of drug discovery to accelerate initial hit discovery, mechanism-of-action elucidation and chemical property optimization.