LucianoSphere (Luciano Abriata, PhD)

Summary

This webpage provides a curated list of essential computational tools and online resources for structural biology, tailored for students and researchers in the field.

Abstract

The website outlines a collection of key websites and programs that are critical for modern structural biology, which heavily relies on computational methods. It emphasizes the importance of these tools for processing and analyzing the vast amounts of data generated by high-resolution techniques such as X-ray crystallography, cryo-electron microscopy (CryoEM), and nuclear magnetic resonance (NMR) spectroscopy. The article categorizes resources for sequence analysis, experimental and modeled structures, molecular visualization, modeling, and specialized software for techniques like NMR, X-ray diffraction, and Cryo-electron microscopy. It also touches on the potential future impact of augmented and virtual reality on molecular visualization and manipulation. The list is derived from a hands-on structural biology course at EPFL, aiming to guide students and researchers through the complex landscape of bioinformatics tools and databases.

Opinions

  • The author believes that computational resources are indispensable for the analysis and interpretation of structural biology data.
  • There is an appreciation for the diverse and specialized nature of software required for different structural biology techniques, acknowledging the challenges this poses.
  • The author expresses enthusiasm for the potential of augmented and virtual reality to revolutionize molecular visualization and interaction.
  • The article conveys a practical approach to learning, emphasizing hands-on activities with these tools as part of the educational process.
  • The author values the standardization and accessibility of structural data provided by resources like the wwPDB and the EBI-AlphaFold database.
  • There is an acknowledgment of the importance of predictive tools such as AlphaFold and other protein structure prediction methods in advancing the field.
  • The author suggests that the current mouse-screen interfaces for molecular visualization will soon be enhanced by more immersive technologies.

Key Websites and Programs for Structural Biology and Bioinformatics

Notes from a real-life course for Master and PhD levels

Introduction

Skip introduction and see the links

Modern structural biology relies heavily on computing for a variety of reasons. First and foremost, the sheer amount of data generated by structural biology experiments, especially the high-resolution techniques such as X-ray crystallography, cryo-electron microscopy (CryoEM) and nuclear magnetic resonance spectroscopy (NMR), is vast and requires powerful computational resources to process and analyze it. Additionally, deep characterization of the complex nature of biological structures often requires methods that are intrinsically computational, such as molecular dynamics simulations (see this) or some kind of advanced tool for prediction, such as the popular AlphaFold (see here), to fully understand their function and interactions.

Furthermore, most structural biology techniques require specialized programs to process and analyze the data. For example, data from cryo-electron microscopy or X-ray diffraction is acquired with software that comes with the hardware, and then inspected, processed and analyzed with programs like Relion, EMAN2, cryoSPARC, FREALIGN, Coot, and other specialized apps for image/diffraction processing and analysis. Totally different in nature, NMR data is also most often processed with the software that comes with the instruments, and then there is a series of programs specialized for data acquisition, processing and analysis in different domains, depending on whether you are using the technique to identify a small molecule, to assign a protein’s resonances, or to measure relaxation or diffusion, to mention a few. The same happens with other lower-resolution techniques, where most of the time software and formats are not interconvertible; think of mass spectrometry data, small-angle scattering, spectroscopies such as fluorescence or infrared, etc. The data is of varied nature, often requires very different methods to be processed and even displayed, and instruments for different techniques are most often built by different companies.

The field of structural biology also relies heavily on various web-based resources, such as databases and servers, to store, share, and analyze large amounts of structural data, or even to make predictions. Here we cover predictions only superficially, focusing on protein structure prediction among the myriad of other kinds of predictions that can be made (of localization sequences, ORFs, genome annotations, etc.). The main structural resources include the Worldwide Protein Data Bank (wwPDB) and its tens of related sites, and the databases of structures predicted through novel methods such as AlphaFold. These resources, especially the wwPDB, are critical for the field because they provide easy access to structural data in a standardized way, connected to other databases and also to publications.

And there’s yet another cornerstone of modern structural biology that relies on computing: molecular graphics programs such as PyMOL, Coot, Chimera or VMD, among the best-known ones. These programs are essential for even the simplest tasks related to visualizing molecular structures. Most provide some general tools and features, but also specialize in different niches; for example, VMD is specialized for molecular simulations, Coot for X-ray diffraction, Chimera is widely used for CryoEM maps, etc. In the near future, tools based on augmented reality or virtual reality could kick in and revolutionize the field, by allowing not only true 3D visualization but also very natural handling of molecules, even with two hands, which is impossible with current software that uses only the mouse as input.

This article provides a curated list of essential software and online tools for structural biology, grouped by topic and with some useful comments along the way. I compiled this list from an actual hands-on course on structural biology that I lecture with my colleagues at EPFL. We ask students to install these tools, or access them at least once, before the class on each topic, and then offer them hands-on activities to carry out with them. We usually provide only a list, so this document is useful as an extension with minimal explanations about each program or resource.

Jump to: - Structures of biological macromolecules - Molecular visualization - Molecular modeling - Nuclear Magnetic Resonance - X-ray diffraction by protein crystals - Cryo-electron microscopy

Structures of biological macromolecules

Sequences and sequence analyses for structural aims

https://www.uniprot.org/ - a comprehensive, freely accessible database of protein sequences with annotations about structure, function, evolution, localization, etc., including cross-references to other relevant databases.
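For scripted access, sequences can also be pulled from UniProt without going through the website. What follows is only a minimal sketch, assuming that the public REST service at rest.uniprot.org serves FASTA files at `/uniprotkb/<accession>.fasta`; the helper names (`fasta_url`, `parse_fasta`, `fetch_sequence`) are hypothetical and not part of the course material.

```python
from urllib.request import urlopen

# Assumed base URL of UniProt's public REST service.
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def fasta_url(accession: str) -> str:
    """Build the (assumed) FASTA download URL for a UniProt accession."""
    return f"{UNIPROT_REST}/{accession.strip().upper()}.fasta"


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


def fetch_sequence(accession: str) -> str:
    """Download one entry (requires network access) and return its sequence."""
    with urlopen(fasta_url(accession), timeout=30) as response:
        records = parse_fasta(response.read().decode("utf-8"))
    return next(iter(records.values()))
```

With network access, `fetch_sequence("P69905")` would return the sequence of that entry as a plain string, ready to paste into the predictors listed below.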

A predictor of transmembrane helices (TMHMM 2.0): https://services.healthtech.dtu.dk/service.php?TMHMM-2.0

One of a few predictors of transmembrane beta barrels: http://www.compgen.org/tools/PRED-TMBB2

One among tens of disorder predictors: https://iupred.elte.hu/

MoRF prediction: https://morf.msl.ubc.ca/index.xhtml - MoRF stands for molecular recognition feature, a short, disordered segment of a protein that becomes ordered upon specific binding to a target, usually in order to carry out a function.

One of a few coiled-coil prediction tools: http://cb.csail.mit.edu/cb/multicoil2/cgi-bin/multicoil2.cgi

Experimental structures

https://www.rcsb.org/ - The main US-based sub-site of the wwPDB.

Clickable chart of the PDB and its related databases, at my website: http://lucianoabriata.altervista.org/papersdata/bib2016.html

https://www.ebi.ac.uk/emdb/ - The resource that collects 3D EM maps and associated experimental data determined using electron microscopy or tomography of biological specimens.

Modeled structures

AlphaFold database by DeepMind/EBI: https://alphafold.ebi.ac.uk/

AlphaFill (AlphaFold models enriched with ligands and co-factors): https://alphafill.eu/

ESM Atlas by Meta: https://esmatlas.com/

Integrative PDB: https://pdb-dev.wwpdb.org/

SAXS-based modeling: https://www.sasbdb.org/

Model Archive by the Swiss Institute of Bioinformatics: https://www.modelarchive.org/

All models of protein structures ever generated for CASP: https://predictioncenter.org/download_area/

Structural analysis and manipulation websites/programs

Foldseek (search the PDB or AlphaFold database for entries structurally similar to your query structure): https://search.foldseek.com/

DALI to search the PDB and EBI-AlphaFold: http://ekhidna2.biocenter.helsinki.fi/dali/

Electrostatics: https://server.poissonboltzmann.org/

Sequence-independent structural alignment with TM scores: https://zhanggroup.org/TM-align/
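To make the TM score less of a black box, here is a minimal sketch of how it is computed once two structures have already been aligned and superimposed. It uses the commonly published formula, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24 (L − 15)^(1/3) − 1.8 Å, where L is the length of the reference protein and d_i are the distances between aligned residue pairs. The function names are hypothetical, and the 0.5 Å floor on d0 for short chains is an assumption of this sketch. Note that TM-align additionally searches for the alignment and superposition that maximize this score, which is not attempted here.

```python
def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 (in Angstrom) of the TM score."""
    if length <= 15:
        return 0.5  # assumption: floor for very short chains
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, reference_length: int) -> float:
    """TM score from distances (Angstrom) between aligned residue pairs.

    `distances` holds one value per aligned pair after superposition;
    unaligned residues contribute nothing, so partial coverage lowers
    the score because the sum is normalized by the reference length.
    """
    d0 = tm_d0(reference_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / reference_length


if __name__ == "__main__":
    # Hypothetical example: 100-residue reference, 80 aligned pairs at 2 A.
    print(round(tm_score([2.0] * 80, 100), 3))
```

Because every term lies between 0 and 1, the score is bounded by the fraction of the reference that is aligned, which is why it is less sensitive to protein size and local deviations than a plain RMSD.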

Find potential ligands in structure: https://zhanggroup.org/COFACTOR/

PDB manipulation suite (shift numbers, chains, map B-factors, etc.): http://lucianoabriata.altervista.org/pdbms/index.html
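Operations like shifting residue numbers are simple because the legacy PDB format is fixed-width: in ATOM and HETATM records the chain identifier sits in column 22 and the residue number in columns 23-26. The following is a minimal sketch of such a shift under those assumptions; the function name is hypothetical and this is not the code behind the suite linked above.

```python
from typing import Optional


def shift_residue_numbers(pdb_text: str, offset: int,
                          chain: Optional[str] = None) -> str:
    """Add `offset` to residue numbers of ATOM/HETATM records.

    If `chain` is given, only records of that chain are changed.
    All other lines (and columns) are passed through untouched.
    """
    out = []
    for line in pdb_text.splitlines():
        is_atom = line.startswith(("ATOM", "HETATM")) and len(line) >= 26
        if is_atom and (chain is None or line[21] == chain):
            new_number = int(line[22:26]) + offset  # columns 23-26
            if not -999 <= new_number <= 9999:
                raise ValueError("residue number does not fit in 4 columns")
            line = f"{line[:22]}{new_number:>4d}{line[26:]}"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    with open("input.pdb") as handle:
        shifted = shift_residue_numbers(handle.read(), 100, chain="A")
    with open("shifted.pdb", "w") as handle:
        handle.write(shifted + "\n")
```

The same column-slicing approach extends to renaming chains (column 22) or overwriting B-factors (columns 61-66), though mmCIF files need a proper parser instead.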

Sequence manipulation suite: https://www.bioinformatics.org/sms2/

Molecular Visualization programs and websites

Visualization is key, and while we and others work on bringing augmented and virtual reality to make it all more immersive and intuitive, for now we are stuck with mouse-and-flat-screen interfaces. These are the programs that people actually use today:

PyMOL: https://pymol.org/2/ or the legacy PyMOL 0.99 from various sources like https://pymol.en.uptodown.com/windows/download

VMD (best for molecular simulations and formats from the simulation world): https://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=VMD

ChimeraX: https://www.cgl.ucsf.edu/chimerax/

Prepare any structure for view in smartphone augmented reality or high-end virtual reality: https://molecularweb.epfl.ch/pages/pdb2ar.html

Molecular Modeling websites/programs

General protein modeling

Modern tools for protein modeling in the absence of homology (pre-AlphaFold): http://lucianoabriata.altervista.org/papersdata/bib2020.html

ColabFold (various AlphaFold versions plus ESMFold and RoseTTAFold): https://github.com/sokrypton/ColabFold. Full tutorial here:

ESMFold: https://esmatlas.com/resources?action=fold

EBI-AlphaFold database: https://alphafold.ebi.ac.uk/

AlphaFill (AlphaFold models enriched with ligands and co-factors): https://alphafill.eu/

(C-)I-TASSER and related tools: https://zhanggroup.org/I-TASSER/ and https://zhanggroup.org/C-I-TASSER/

SwissModel for rapid template search and homology modeling including oligo state: https://swissmodel.expasy.org/

Docking: https://wenmr.science.uu.nl/ (see also AlphaFold-multimer in ColabFold!!)

Canonical structures

Build canonical coiled-coils: http://coiledcoils.chm.bris.ac.uk/ccbuilder2/builder

Larger biomolecular systems

OPM for proteins in membranes: https://opm.phar.umich.edu/

CHARMM-GUI (website to build complex models and also to parametrize them for molecular dynamics simulations):

Proteins in membrane or just membranes: https://charmm-gui.org/?doc=input/membrane.bilayer

Micelles: https://charmm-gui.org/?doc=input/membrane.micelle

Nanodiscs: https://charmm-gui.org/?doc=input/membrane.nanodisc

Explore more: LPS modeling, glycans, glycolipids, organic polymers, carbon nanotubes, etc.

Soon: build systems by moving molecules with your bare hands!

Small molecules

Hack-a-mol (create small molecules and interconvert them between different formats): https://chemapps.stolaf.edu/jmol/jsmol/hackamol.htm

Nuclear Magnetic Resonance software and resources

TopSpin (Bruker’s software for data acquisition and analysis on their instruments): https://www.bruker.com/en/products-and-solutions/mr/nmr-software/topspin.html

CARA for spectral assignment: http://cara.nmr.ch/doku.php/cara_downloads

Sparky-NMRfam for spectral analysis: https://nmrfam.wisc.edu/nmrfam-sparky-distribution/

Cyana for structure calculation (not covered, paid program): https://www.las.jp/english/products/cyana.html

Unio for automated NOESY assignment (coupled to Cyana, so not covered): http://unio-nmr.fr/

BioMagResBank (compiles all NMR data relevant to biology, from protein structure assignments to tools and databases): https://bmrb.io/

Data files for use in the course:

Practical on spectra navigation, assignment and structure calculation: download here

Practical on live monitoring of a protein phosphorylation reaction: download here

X-ray diffraction

XDS (for X-ray diffraction data processing; Linux/Mac only): https://xds.mr.mpg.de/

Phenix (X-ray structure determination and refinement; CryoEM refinement): https://phenix-online.org

CCP4 suite (X-ray structure determination and refinement): https://www.ccp4.ac.uk/download/#os=mac

Coot (X-ray and EM model building): included in the CCP4 installation (Win/Mac/Linux), or available separately for Linux: https://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/ or for Linux/Mac: https://phenix-online.org/download/other.html or for Windows: https://bernhardcl.github.io/coot/ Note: only the CCP4 versions on Linux/Mac are the latest.

X11/XQuartz (essential to install CCP4 or Coot on a Mac): https://www.xquartz.org

Cryo-electron microscopy

CryoSPARC (installed on server already, access via Firefox browser): https://cryosparc.com

Relion: https://www3.mrc-lmb.cam.ac.uk/relion/index.php?title=Main_Page

EMAN2: https://blake.bcm.edu/emanwiki/EMAN2

SPHIRE: https://sphire.mpg.de/wiki/doku.php
