Structural bioinformatics
Over a year of AlphaFold 2 free to use and of the revolution it triggered in biology
Confident modeling of protein structures, prediction of their interactions with other biomolecules, and even protein design are now within everybody’s reach thanks to the revolution started by Deepmind.

Introduction | Democratizing the use of AlphaFold 2 and new adaptations of it | What’s next | Protein language models | Pre-made databases of structural models built with these modern tools | Unexpected things that AlphaFold 2 seems to be able to do | Further reads on the topic and on broader structural biology
Introduction
It all started when a private company entered a contest on protein structure prediction for the first time ever, and won it. It then won again in the following edition, and released its star program in an open form for all scientists (also students, actually everybody!) to use for free. Moreover, its users don’t even need a powerful computer, because the calculations can run right on a GPU in the cloud!
We are talking about Deepmind, the artificial intelligence (AI) research company of the Alphabet group, the parent company of Google.
Deepmind entered CASP, a competition on protein structure prediction in which usually only academic groups participate, in its 13th edition, right when I was serving as one of the assessors who evaluate the predicted structures. With its AlphaFold program, Deepmind turned out to be the best predictor. As I discussed in my articles, this new program was the best as of CASP13 but it wasn’t out of this world, and it hadn’t really invented anything novel; rather, it engineered the most out of what academics were already doing at the time.
But then in the next edition of CASP, CASP14, Deepmind came in again with AlphaFold 2, a full rewrite based on several new AI concepts. “Artificial intelligence” is a somewhat unfortunate name for a family of mathematical models that try to mimic how brains work: simple units (neurons) each integrate relatively simple inputs, and when coupled together they can crack very complex problems. Hence these methods are also known as “machine learning” or “artificial neural network” methods.
Unlike AlphaFold 1, AlphaFold 2 was truly revolutionary, and it triggered in turn a revolution in all fields related to biology at the atomic level, what’s known as “structural biology”. See, structural biology attempts to explain how biology emerges from how atoms connect into complex so-called “biomolecules” that work together to build up cells, tissues, and organisms. Among these biomolecules, proteins are of special interest because they provide the main variety of functions inside cells. Proteins help with everything in a cell’s life, from keeping its shape to replicating DNA, breaking down food, photosynthesizing, controlling the cell cycle, and keeping the system healthy, just to mention a few broad-scope functions out of thousands or probably millions.
While other biomolecules are relatively simple in their 3D structure, proteins are not. But in order to understand how they work, how to tune them for our benefit, etc., we need to know this 3D structure. This essentially entails knowing the positions in space of all the atoms that make up the protein, which we can determine with some experimental techniques for some proteins, but not for all. And even when we can, experiments are costly and can take decades to work out. Therefore, scientists have long studied ways to predict these 3D structures from the simplest representation of proteins: the sequences of amino acids that make them up.
I have explained the basics of protein structure, CASP (the contest on predicting protein structures from which this whole revolution emerged), and AlphaFold 2 in detail in various works. If you are starting in the subject from scratch, these are probably the first blog entries you should read:
Then, to know more about CASP and AlphaFold 2, check more of my blog entries and also, why not, my peer-reviewed articles:
Besides, I regularly answer the questions that people ask about all this, so go ahead and consult me:
Democratizing the use of AlphaFold 2 and new adaptations of it
In July 2021, Deepmind published the academic paper about AlphaFold 2 and released all the code to run it, which entailed a change in how molecular and structural biology would develop from that point on:
Deepmind released a usable version of AlphaFold as a Colab notebook (hence running online on GPUs provided by Google) and then immediately a preprint came out that presented different variations of Colab notebooks to run AlphaFold in different ways.
The availability of these notebooks meant that everybody could run the new, powerful program from anywhere, literally even from their phones. Moreover, one could edit the code and adapt it to specific use cases, opening several new lines of research in basic, fundamental biology and also in applied computational biotechnology, for example to create new proteins.
Even the underlying computer science was so well explained and left so open that scientists started building on it to create new programs addressing other problems about the structures of proteins and biological macromolecules, problems that AlphaFold 2 wasn’t even designed to address. Here are two such examples, of AI programs aimed at predicting interactions between proteins and other biological molecules, and at designing new proteins:
What’s next?
A new edition of CASP, CASP15, is now under way, and we will know its results by the end of this year. I discussed the changes that CASP has introduced to adapt the competition to the new capabilities available to everybody:
So far CASP15 has released 54 targets across its different categories, including some interesting enzymes, bacterial proteins, a couple of large protein complexes, and also (new to CASP) RNA molecules and RNA-protein complexes.
One point of special interest in CASP15 is whether we can predict how proteins move, besides their average static structures…
Deepmind is not participating in this CASP, but another tech giant has stepped in instead: Meta, which has focused its work on so-called protein language models. I’ve covered these recently too, as I summarize next.
Protein language models
In order to model a protein from its sequence, AlphaFold 2 first builds an alignment of multiple sequences of proteins related to the query of interest. This alignment is processed by a so-called “language model” that is specialized for proteins, i.e. that knows how to extract some data from it, data that presumably contains information about the structure. This data is then fed into the downstream parts of the artificial neural network that makes up AlphaFold, namely the core that predicts the actual structure of the protein as the output.
Some groups, most notably Meta, have now developed much more complex versions of these “protein language models”. These models in fact learn so much about protein sequences, and about the evolutionary patterns that relate them to protein structure and function, that they can process input sequences without any need for alignments. Somehow, these AI models don’t need alignments because they have already learned, during training, the relationships and patterns that run through the protein universe. And because they skip the alignment step, they are much faster than AlphaFold, producing models of comparable quality while running up to 60 times faster. I describe how these and related models work in more detail in two blog entries:
In particular, the latter presents Meta’s language model for proteins, how protein structures are modeled from it, and how its fast running speeds enabled Meta to process over 600 million protein sequences to model their structures.
Pre-made databases of structural models built with these modern tools
Both Deepmind with AlphaFold 2 and Meta with ESMFold have applied their methods to millions of protein sequences, to derive precomputed databases of protein “structures” (actually predicted models) for as many proteins as possible. Teamed up with the European Bioinformatics Institute (EMBL-EBI), Deepmind released over 200 million models; meanwhile, Meta released a whole Atlas containing over 600 million models coming from metagenomic data (data obtained by sequencing genetic material at large scale directly from environmental samples).

Academics have also made their contributions; for example, we have processed a large fraction of the models released by Deepmind to identify interaction-prone surfaces in them. Additionally, a new paper just out describes a resource called AlphaFill that has processed all of Deepmind’s models to include candidate ligands (which aren’t predicted by AlphaFold itself but can in principle be “transplanted”):
Unexpected things that AlphaFold 2 seems to be able to do
Immediately after AlphaFold 2’s release, scientists started to see that it was useful for things other than predicting structures. For example, the so-called pLDDT metric, which AlphaFold provides together with each model as a per-residue estimate of its quality, was found to also act as a good predictor of protein disorder (note that not all proteins are well-structured; some are very disordered and this disorder is actually key to their function).
There were many additional observations that were initially anecdotal, mostly propagated through Twitter, but that were eventually investigated in enough detail to make a paper out of them. Very recently, several groups came together to address these “side-applications” of AlphaFold 2, and reported all their findings in a paper:
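To make the pLDDT idea concrete, here is a minimal sketch in Python, under some assumptions. AlphaFold writes each residue's pLDDT into the B-factor column of the PDB files it outputs, so the values can be read without any special library. The cutoff of 50 and the minimum segment length of 3 residues are rule-of-thumb assumptions chosen for illustration, not validated thresholds, and the function names and the toy model are hypothetical.

```python
def read_plddt(pdb_text):
    """Return {(chain, residue_number): pLDDT} read from the CA atoms of a PDB-format model."""
    plddt = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, B-factor in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            plddt[(chain, resnum)] = float(line[60:66])
    return plddt


def low_confidence_segments(plddt, cutoff=50.0, min_length=3):
    """Group consecutive residues with pLDDT below `cutoff` into (chain, start, end) segments.

    Assumption: the cutoff and min_length defaults are illustrative only.
    """
    segments = []
    current = []  # residues of the low-pLDDT stretch currently being built
    for (chain, resnum), score in sorted(plddt.items()):
        is_low = score < cutoff
        extends = bool(current) and current[-1] == (chain, resnum - 1)
        if is_low and extends:
            current.append((chain, resnum))
            continue
        # The running stretch ends here; keep it only if it is long enough.
        if len(current) >= min_length:
            segments.append((current[0][0], current[0][1], current[-1][1]))
        current = [(chain, resnum)] if is_low else []
    if len(current) >= min_length:
        segments.append((current[0][0], current[0][1], current[-1][1]))
    return segments


def toy_ca_line(serial, resnum, plddt, chain="A"):
    """Build one PDB-format CA line for a made-up model (hypothetical helper)."""
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}           C")


# Toy model: ten residues with invented pLDDT values (not real data).
toy_scores = [92, 90, 88, 45, 40, 35, 30, 85, 48, 91]
toy_model = "\n".join(
    toy_ca_line(i, i, score) for i, score in enumerate(toy_scores, start=1)
)

segments = low_confidence_segments(read_plddt(toy_model))
print(segments)  # [('A', 4, 7)]
```

In this toy case residues 4 to 7 are flagged as a candidate disordered stretch, while the isolated low value at residue 9 is ignored because it is shorter than the minimum length. On a real model, any flagged segment is only a hint: low pLDDT can also simply mean that the prediction is poor.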
On a more technical note, one of the most interesting points was the finding that “subsampling” the multiple sequence alignment used by AlphaFold 2 results in conformational variability that could potentially be meaningful. This apparently hasn’t been explored in detail for many proteins, or if it was then the findings were probably negative, but for a few systems the results seemed to make sense in the context of the related biology:
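As a rough illustration of what “subsampling” an alignment means, here is a minimal Python sketch, under stated assumptions. It is not AlphaFold's own code: the function name, the toy alignment, and the choice of plain uniform random sampling are all hypothetical. The idea is simply to keep the query sequence and a random handful of the other aligned sequences, then repeat with different random seeds so that each shallow alignment can be passed to a structure predictor (that step is not shown here).

```python
import random


def subsample_msa(msa, max_seqs, seed=None):
    """Return the query (first row) plus a random subset of the other aligned sequences.

    `msa` is a list of aligned sequences as strings, with the query first.
    At most `max_seqs` sequences are returned, query included.
    """
    if not msa:
        raise ValueError("the alignment is empty")
    if max_seqs < 1:
        raise ValueError("max_seqs must be at least 1 to keep the query")
    rng = random.Random(seed)  # a seeded generator makes each subsample reproducible
    query, others = msa[0], msa[1:]
    n_keep = min(max_seqs - 1, len(others))
    return [query] + rng.sample(others, n_keep)


# Toy alignment with invented sequences (not real data).
toy_msa = [
    "MKTAYIAKQR",  # query
    "MKTAYLAKQR",
    "MRTAYIARQR",
    "MKSAYIAKQK",
    "MKTGYIAKHR",
    "MKTAYIVKQR",
]

# Several shallow alignments, one per seed; each could give a different model.
shallow_msas = [subsample_msa(toy_msa, max_seqs=3, seed=s) for s in range(4)]
for sub in shallow_msas:
    print(sub)
```

Comparing the models obtained from the different shallow alignments is what would reveal any conformational variability; whether that variability is biologically meaningful has to be checked case by case.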
Further reads on the topic and on broader structural biology
Two guides to my articles on AlphaFold:
The official CASP15 website, where the results will be out by mid-December 2022:
And if you like my style of communicating advances in biological sciences…
www.lucianoabriata.com I write and photoshoot about everything that lies in my broad sphere of interests: nature, science, technology, programming, etc. Become a Medium member to access all its stories (affiliate links of the platform for which I get small revenues without cost to you) and subscribe to get my new stories by email. To consult about small jobs check my services page here. You can contact me here.





