The recent release of over 200 million protein structures predicted by DeepMind’s AlphaFold 2, made available in collaboration with EMBL’s European Bioinformatics Institute (EMBL-EBI), has ushered in a new era of protein research. Here I summarize the findings of two papers published in Nature this week that explore this protein universe. Both adapt existing tools, such as clustering algorithms and structural comparison methods, to work on very large data volumes, and use them to shed light on the structural diversity, evolutionary relationships, and functional potential of proteins at an unprecedented scale.
Proteins are the workhorses of biology, governing a myriad of cellular processes, from energy generation to cell division. While the number of known protein sequences has grown rapidly thanks to advances in genomics, the determination of their 3D structures has lagged behind because experimental methods do not scale to the same degree. AlphaFold 2, the AI system developed by DeepMind, has transformed the landscape of protein structure prediction. The AlphaFold Protein Structure Database (AFDB) now houses more than 200 million predicted protein structures, a milestone in computational biology.
This week, two groups of authors report in Nature how AlphaFold 2’s protein models can be used to unlock new insights into the protein universe. Both studies rely on versions of existing tools adapted to the huge volume of data in the AFDB, for example faster clustering algorithms and methods for structural comparison. With these adapted tools, the works explore the vast expanse of protein structures, their evolutionary origins, and their functional implications.