LucianoSphere (Luciano Abriata, PhD)

Summary

The article presents the groundbreaking capabilities of the new software RoseTTAFold All-Atom (RFAA) and its related diffusion model RFdiffusionAA, which revolutionize protein structure prediction and design by accurately modeling complex biological assemblies involving proteins, nucleic acids, small molecules, and metals, and by creating new proteins designed to bind specific molecules.

Abstract

The article discusses a significant advance in biomolecular modeling: the RoseTTAFold All-Atom (RFAA) software. Developed at the Institute for Protein Design under Prof. David Baker, RFAA uses deep learning to predict the structures of complex biological systems, including proteins, DNA, RNA, small molecules, and metal ions. It matches the accuracy of AlphaFold2 on protein-only tasks. It also excels at docking small molecules into proteins, accounting for ligand flexibility and covalent modifications. Additionally, the article introduces RFdiffusionAA, a diffusion model that designs new proteins from scratch and whose designs have been experimentally validated to bind their target molecules. The implications of these tools are vast, potentially transforming fields such as structural biology, drug discovery, and protein engineering.

Opinions

  • The author expresses that predicting the 3D structures of proteins with high accuracy is now a reality, thanks to deep learning networks like AlphaFold2 and the new RFAA software.
  • The author is enthusiastic about RFAA's ability to model complete biological assemblies, considering it a breakthrough in understanding how proteins interact with other molecules to carry out their biological functions.
  • The author highlights the potential of RFdiffusionAA to bring protein design and engineering to a new level, emphasizing its ability to create proteins that can bind to other molecules, which is crucial for drug discovery.
  • The author suggests that the mathematical elements developed for AlphaFold have been instrumental in the advancements made by RFAA and other subsequent methods.
  • The author believes that the training dataset used for RFAA, which includes a diverse range of biomolecule structures, allows the model to learn general atomic interactions applicable to various biomolecules.
  • The author is optimistic about the future applications of RFAA and RFdiffusionAA, foreseeing their use in discovering new biology, improving protein-ligand docking, and enabling innovative protein design.
  • The author indicates that the rapid development of AI tools in protein design, such as RFAA and RFdiffusionAA, is making this an exciting time for protein biotech and related scientific fields.

New AI Method for Protein Structure Prediction Handles All Kinds of Biologically Relevant Molecules

Allowing scientists to predict their joint structures and to create new proteins designed to specifically bind defined molecules

Image composed by the author from DALL-E 2 generations and his own illustrations.

Predicting the complex three-dimensional structures of proteins with high accuracy is no longer a dream, thanks to deep learning networks like AlphaFold2 and others that followed it. But proteins don’t work alone. They interact with other proteins, with DNA and RNA, and with small molecules and ions of all kinds, all crucial to their biological function. These interactions have long been very hard to model, but that is now changing, again thanks to deep learning.

Presented in a preprint last week, the new software RoseTTAFold All-Atom (I’ll call it just RFAA) comes from the Institute for Protein Design led by Prof. David Baker, the guru of protein modeling and design. It is a deep learning network that can model complete biological assemblies containing proteins, nucleic acids, small molecules, and metals, and it even handles covalent modifications of amino acids. RFAA matches the accuracy of AlphaFold 2 (AF2) on protein-only tasks and excels at docking small molecules into proteins, even accounting for their flexibility. It can also predict assemblies of proteins with multiple nucleic acid chains, small molecules, and ions, all together. And the excitement doesn’t stop there. The same preprint presents a related diffusion model called RFdiffusionAA, which designs new proteins from scratch by building them around small molecules and other non-protein molecules. A dedicated section of the preprint shows that designs tested in the wet lab actually bind their targets. RFAA opens a new door into biology that extends what AlphaFold 2 and other methods can already do. RFdiffusionAA, in turn, can take the whole field of protein design and engineering to a new level.

Imagine having to solve a complex 3D puzzle without knowing what the final picture looks like. That’s the challenge scientists face when trying to predict the 3D structures of biological macromolecules such as proteins and their complexes with nucleic acids, small molecules, ions, and more. AlphaFold 2 cracked a big part of the problem: how to predict the 3D structures of isolated proteins and of complexes involving multiple proteins. But biology uses many other kinds of molecules beyond proteins, both small and large. So how can we predict structures that involve more than just proteins? A new preprint from David Baker’s group at the Institute for Protein Design presents a breakthrough in biomolecular modeling and design that brings us one step closer to solving this puzzle.

The preprint (link at the end) introduces a new software called RoseTTAFold All-Atom (RFAA) that uses deep learning to model complete biological assemblies. RFAA can predict the structures of isolated proteins and protein-only complexes. It can also predict those of proteins in combination with other molecules like DNA, RNA, small molecules, and metal ions. RFAA can even take into account covalent modifications of amino acids. These modifications change the chemical properties of proteins and are increasingly associated with disease states, with the inheritance of information beyond genes, and more.

What really sets RFAA apart is its ability to accurately dock small molecules into proteins, accounting for their flexibility. Moreover, RFAA can predict complex assemblies of proteins with multiple nucleic acid chains and small molecules, providing a more comprehensive view of the biological system as a whole. All this is crucial for understanding how proteins interact with other molecules to carry out their biological functions, for example when the protein is an enzyme that catalyzes a reaction on a small-molecule substrate. It is also crucial for creating new molecules that tune the structure, function, or interactions of a target protein, which is the basis of pharmaceutical development.
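Docking accuracy is usually measured by comparing the predicted ligand pose with the experimentally determined one. Here is a minimal Python sketch of that comparison. It assumes both poses are already in the same reference frame (for example, after superposing the protein) and list the same heavy atoms in the same order. The function names and the 2 Å success cutoff are illustrative assumptions. The cutoff is a common convention in docking benchmarks rather than a detail taken from the preprint.

```python
import math


def ligand_rmsd(predicted, reference):
    """Heavy-atom RMSD (in Å) between two poses of the same ligand.

    Assumes atom i of `predicted` corresponds to atom i of `reference` and
    that both poses share a reference frame; symmetry-equivalent atom
    mappings are not considered in this sketch.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must have the same, non-zero number of atoms")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared / len(predicted))


def is_successful_dock(predicted, reference, cutoff=2.0):
    """Hypothetical success criterion: pose within `cutoff` Å RMSD."""
    return ligand_rmsd(predicted, reference) < cutoff


# Toy three-atom "ligand"; the prediction is shifted by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(x, y, z + 1.0) for x, y, z in reference]
print(ligand_rmsd(predicted, reference))         # 1.0
print(is_successful_dock(predicted, reference))  # True
```

Real evaluations usually also handle symmetric atoms, such as the two oxygens of a carboxylate, by taking the minimum RMSD over equivalent atom mappings.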

But the excitement doesn’t stop there. The preprint also presents a related model called RFdiffusionAA. It was fine-tuned from RFAA on diffusion denoising tasks so that it learns to “design” new proteins that fold as specified by the user. Thus, RFdiffusionAA can build entirely new proteins that bind to other (protein or non-protein) molecules. This has enormous implications for protein design and engineering, opening up new possibilities for drug discovery and other applications. Until recently, we could only really tackle the problem of designing proteins that bind to other proteins. The “all-atom” nature of the new model now makes it possible to design proteins around small molecules and other non-protein partners too.
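Diffusion models like this one are trained by corrupting known structures with noise and asking the network to undo the corruption. The sketch below shows only the generic forward (noising) step of a denoising diffusion model applied to a set of 3D coordinates, in Python with NumPy. The linear noise schedule, its parameters, and the function names are assumptions for illustration. The RFdiffusion family of models also noises residue orientations, which this sketch leaves out.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t (a common, assumed default schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)


def noise_coordinates(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) for an (N, 3) array of coordinates.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


betas = linear_beta_schedule(200)
alpha_bar = np.cumprod(1.0 - betas)  # fraction of signal left after t steps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((10, 3))    # toy "structure": 10 points in 3D
xt, eps = noise_coordinates(x0, t=150, alpha_bar=alpha_bar, rng=rng)
# A denoising network would be trained to recover x0 (or eps) from (xt, t).
```

Generation runs the learned process in reverse. It starts from pure noise and repeatedly denoises, which is how a model of this kind can produce new structures rather than reproduce known ones.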

How RFAA and RFdiffusionAA work

This new work is yet another big leap in the revolution that DeepMind’s AlphaFold 2 started in biology. Indeed, most molecular modeling methods that came after AlphaFold exploit some of the mathematical elements developed for that tool, as presented in the seminal 2021 Nature paper from the Google-owned lab. RFAA is no exception.

Central to RFAA is how it handles three kinds of information. The first is sequence information (protein and nucleic acid sequences). The second is structure (3D coordinates of the atoms of all molecules being modeled). The third is representations of small molecules and amino acid modifications. RFAA combines sequence-based descriptions of the protein and nucleic acid components with atomic graph representations of small molecules and amino acid modifications. In these graphs, the bonded structure of a small molecule is represented with nodes as atoms and edges as bonds. This description of small molecule structure is quite similar to the one we used in our own system for protein interface prediction and context-aware design.
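To make the graph description concrete, here is a minimal Python sketch of a small molecule stored as a graph. Element symbols serve as node labels and bond orders as edge labels. The class and method names are hypothetical, hydrogens are omitted, and this is not RFAA’s actual data structure.

```python
from dataclasses import dataclass, field

BOND_ORDERS = {"single", "double", "triple", "aromatic"}


@dataclass
class MolecularGraph:
    """A small molecule as a graph: nodes are atoms, edges are bonds."""

    elements: list[str]  # node labels: one element symbol per atom
    bonds: dict[tuple[int, int], str] = field(default_factory=dict)  # edge labels

    def add_bond(self, i: int, j: int, order: str) -> None:
        if order not in BOND_ORDERS:
            raise ValueError(f"unknown bond order: {order}")
        # Store each undirected edge once, with the smaller index first.
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i: int) -> list[int]:
        """Indices of atoms bonded to atom i."""
        return sorted(b if a == i else a for (a, b) in self.bonds if i in (a, b))


# Acetic acid, heavy atoms only: CH3-C(=O)-OH
acetic_acid = MolecularGraph(elements=["C", "C", "O", "O"])
acetic_acid.add_bond(0, 1, "single")  # methyl C to carboxyl C
acetic_acid.add_bond(1, 2, "double")  # C=O
acetic_acid.add_bond(1, 3, "single")  # C-OH
print(acetic_acid.neighbors(1))  # [0, 2, 3]
```

Because nodes carry only element types and edges only bond orders, the same structure can describe any ligand or modified residue, whatever its size.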

RFAA combines all these descriptions using an architecture based on RoseTTAFold2. The RoseTTAFold neural network has three tracks, called the “1D”, “2D” and “3D” tracks. Each was extended as follows to accept new input tokens and create RFAA:

  • The 1D track takes as input the chemical element type of each non-polymer atom. It extends RoseTTAFold’s vocabulary of 20 amino acids and 8 nucleic acid bases with 46 new tokens representing the element types most commonly found in the Protein Data Bank.
  • The 2D track in RoseTTAFold receives pairwise distance information from homologous templates. In RFAA it also receives information about the chemical bonds between atoms, encoding for each pair of atoms whether they are joined by a single, double, triple or aromatic bond. These features are linearly embedded and summed with the initial pair features at the beginning of every recycle of the network, allowing the network to learn about bond lengths, angles, and planarity.
  • Finally, the 3D track handles coordinate information and iteratively refines the predicted structure through many hidden layers. In RFAA it also receives information about the chirality of the stereocenters in the small molecules.
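The 2D and 3D track inputs described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not RFAA’s code. The extra “none” channel for unbonded pairs, the feature size, and the random embedding weights are all choices made for this example. Using a signed volume to express chirality is also an assumption.

```python
import numpy as np

BOND_TYPES = ["none", "single", "double", "triple", "aromatic"]


def bond_one_hot(num_atoms, bonds):
    """(N, N, 5) one-hot bond-type features for every pair of atoms."""
    feats = np.zeros((num_atoms, num_atoms, len(BOND_TYPES)))
    feats[:, :, 0] = 1.0  # default channel: no bond
    for (i, j), order in bonds.items():
        k = BOND_TYPES.index(order)
        for a, b in ((i, j), (j, i)):  # bonds are symmetric
            feats[a, b, :] = 0.0
            feats[a, b, k] = 1.0
    return feats


def add_bond_features(pair, bonds, weight):
    """Linearly embed bond features and sum them into the pair features.

    pair: (N, N, d) pair features; weight: (5, d) embedding matrix.
    Per the preprint, bond features are re-added at the start of every recycle.
    """
    return pair + bond_one_hot(pair.shape[0], bonds) @ weight


def chirality_sign(center, a, b, c):
    """+1 or -1 for the handedness of neighbours a, b, c around center (0 if planar)."""
    signed_volume = np.dot(a - center, np.cross(b - center, c - center))
    return int(np.sign(signed_volume))


rng = np.random.default_rng(0)
pair = np.zeros((3, 3, 8))                      # 3 atoms, 8 pair channels
weight = rng.standard_normal((len(BOND_TYPES), 8))
updated = add_bond_features(pair, {(0, 1): "double"}, weight)

origin = np.zeros(3)
x, y, z = np.eye(3)
print(chirality_sign(origin, x, y, z), chirality_sign(origin, x, z, y))  # 1 -1
```

Swapping two neighbours flips the sign. That is exactly the distinction between mirror-image stereocenters that bond connectivity alone cannot capture.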

Like other methods, but now including non-protein atoms, RFAA generates an internal representation of the full system as a disconnected gas of amino acid residues, nucleic acid bases, and freely moving atoms. It then transforms this gas into physically plausible assembly structures.
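One way to picture this “gas” is sketched below in Python with NumPy. Each residue is a rigid body placed by its own rotation and translation, with no chain connectivity enforced, while each free atom is just a point. The function names and the approximate idealized backbone coordinates are assumptions for illustration, not RFAA internals.

```python
import numpy as np

# Approximate idealized backbone atoms (N, CA, C) in a residue's local frame, in Å.
LOCAL_BACKBONE = np.array([
    [-0.525, 1.363, 0.0],  # N
    [0.0, 0.0, 0.0],       # CA at the frame origin
    [1.526, 0.0, 0.0],     # C along the local x axis
])


def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def place_residue(rotation, translation, local=LOCAL_BACKBONE):
    """Rigid-body transform of a residue's local coordinates into global space."""
    return local @ rotation.T + translation


def move_free_atoms(positions, displacements):
    """Free (non-polymer) atoms carry no frame: they move as independent points."""
    return positions + displacements


# Two residues floating independently, plus two free atoms (e.g. two metal ions).
res_a = place_residue(np.eye(3), np.array([0.0, 0.0, 0.0]))
res_b = place_residue(rotation_z(np.pi / 2), np.array([10.0, 0.0, 0.0]))
atoms = move_free_atoms(np.array([[5.0, 5.0, 5.0], [6.0, 5.0, 5.0]]),
                        np.array([[0.1, 0.0, 0.0], [0.0, -0.1, 0.0]]))
```

Rigid transforms preserve the internal geometry of each residue. The network's job is to move these independent pieces into positions where residues chain up correctly and ligand atoms sit in chemically sensible places.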

To train RFAA, the authors used a dataset of protein-biomolecule structures derived from the Protein Data Bank. It includes protein-small molecule complexes, protein-metal complexes, and covalently modified proteins. They also included small molecule crystal structures from the Cambridge Structural Database. This compensates for the fact that small molecules are far less represented than proteins in the Protein Data Bank. By training on this diverse yet balanced dataset, the model could learn general atomic interactions that describe geometry and physics similarly well for proteins, nucleic acids, and small molecules.
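A common way to balance training data from several sources is to sample each training example from a source chosen with fixed mixing weights. The Python sketch below illustrates that general idea. The source names and weights are hypothetical and are not the ratios used in the preprint.

```python
import random
from collections import Counter

# Hypothetical mixing weights (they sum to 1.0); not taken from the preprint.
SOURCE_WEIGHTS = {
    "pdb_protein_small_molecule": 0.4,
    "pdb_protein_metal": 0.2,
    "pdb_covalent_modification": 0.1,
    "csd_small_molecule_crystal": 0.3,
}


def sample_sources(weights, n, seed=0):
    """Draw n source labels with probability proportional to their weights."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n)


def mixture_fractions(samples):
    """Observed fraction of training examples drawn from each source."""
    counts = Counter(samples)
    return {name: count / len(samples) for name, count in counts.items()}


batch_sources = sample_sources(SOURCE_WEIGHTS, n=10_000)
print(mixture_fractions(batch_sources))  # fractions close to SOURCE_WEIGHTS
```

Sampling by source rather than uniformly over all structures keeps the abundant protein data from drowning out the rarer small-molecule examples.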

Applications

I won’t go into detail here, but the potential applications of RFAA and RFdiffusionAA are vast. RFAA will help in discovering new biology, which is of course of great interest. It will also enable new forms of protein-ligand docking and virtual screening, two operations central to the search for new drugs.

But more exciting, to me at least, is RFdiffusionAA, which will open up a whole new world in protein design. In this preprint alone (and who knows what other exciting things the authors are trying out as I write and as you read this!), they present designed proteins that bind three kinds of targets. These are digoxigenin (the steroid aglycone of the cardiac drug digoxin), heme (an enzymatic cofactor), and bilins. All were validated experimentally in the wet lab, which confirmed that they bind their target molecules as predicted.

References and further reads

This work brings us one step closer to solving the intricate puzzle of biomolecular structures and holds great promise for advancing fields like structural biology, drug discovery, and protein engineering. Read more about it right at the source:

I guess I will have to update this other article just months after I published it… it all goes so fast! But the good thing is that these methods are available for use:

Here’s the first protein design AI tool of these revolutionary times started by AlphaFold 2:

Other applications of AI to chemistry and biology
