Key Websites and Programs for Structural Biology and Bioinformatics
Notes from a real-life course for Master and PhD levels
Introduction
→ Skip introduction and see the links
Modern structural biology relies heavily on computing for a variety of reasons. First and foremost, the amount of data generated by structural biology experiments, especially by high-resolution techniques such as X-ray crystallography, cryo-electron microscopy (CryoEM) and nuclear magnetic resonance spectroscopy (NMR), is vast and requires powerful computational resources to process and analyze. Additionally, fully understanding the function and interactions of biological structures often requires methods that are intrinsically computational, such as molecular dynamics simulations or advanced prediction tools like the popular AlphaFold.
Furthermore, most structural biology techniques require specialized programs to process and analyze the data. For example, data from cryo-electron microscopy or X-ray diffraction is acquired with software that comes with the hardware, and then inspected, processed and analyzed with programs like Relion, EMAN2, cryoSPARC, FREALIGN, Coot, and other specialized apps for image/diffraction processing and analysis. NMR data, totally different in nature, is also usually processed with the software that comes with the instruments, and there is a series of programs specialized for data acquisition, processing and analysis in different domains, depending on whether you are using the technique to identify a small molecule, to assign a protein’s resonances, or to measure relaxation or diffusion, to mention a few. The same happens with other, lower-resolution techniques, where software and file formats are most often not interconvertible: think of mass spectrometry, small-angle scattering, or spectroscopies such as fluorescence or infrared. The data is of varied nature and often requires very different methods to be processed and even displayed, and instruments for different techniques are most often built by different companies.
The field of structural biology also relies heavily on web-based resources, such as databases and servers, to store, share, and analyze large amounts of structural data, and even to make predictions. Here we cover predictions only superficially, focusing on protein structure prediction among the myriad other kinds of predictions that can be made (of localization sequences, ORFs, genome annotations, etc.). The main structural resources include the Worldwide Protein Data Bank (wwPDB) and its tens of related sites, and the databases of structures predicted through novel methods such as AlphaFold. These resources, especially the wwPDB, are critical for the field because they provide easy access to structural data in a standardized way, connected to other databases and to publications.
And there’s yet another cornerstone of modern structural biology that relies on computing: molecular graphics programs such as PyMOL, Coot, Chimera or VMD, among the best-known ones. These programs are essential for even the simplest tasks related to visualizing molecular structures. Most provide some general tools and features but also specialize in different niches; for example, VMD is specialized for molecular simulations, Coot for X-ray diffraction, and Chimera is widely used for CryoEM maps. In the near future, tools based on augmented or virtual reality could revolutionize the field by allowing not only true 3D visualization but also very natural handling of molecules, even with two hands, which is impossible with current software that takes input only from the mouse.
This article provides a curated list of essential software and online tools for structural biology, grouped by topic and with some useful comments along the way. I compiled this list from an actual hands-on course on structural biology that I teach with my colleagues at EPFL. We ask students to install these tools, or to access them at least once, before the class on each topic, and then give them hands-on activities to carry out with them. We usually provide only a list, so this document serves as an extension with minimal explanations about each program or resource.
Jump to: - Structures of biological macromolecules - Molecular visualization - Molecular modeling - Nuclear Magnetic Resonance - X-ray diffraction by protein crystals - Cryo-electron microscopy
Structures of biological macromolecules
Sequences and sequence analyses for structural aims
https://www.uniprot.org/ - a comprehensive, freely accessible database of protein sequences with annotations about structure, function, evolution, localization, etc., including cross-references to other relevant databases.
A predictor of transmembrane helices (TMHMM 2.0): https://services.healthtech.dtu.dk/service.php?TMHMM-2.0
One of a few predictors of transmembrane beta barrels: http://www.compgen.org/tools/PRED-TMBB2
One among tens of disorder predictors: https://iupred.elte.hu/
MoRF prediction: https://morf.msl.ubc.ca/index.xhtml - MoRF stands for molecular recognition feature, a short, disordered segment of a protein that becomes ordered upon binding specifically to a target, usually in order to carry out a function.
One of a few coiled-coil prediction tools: http://cb.csail.mit.edu/cb/multicoil2/cgi-bin/multicoil2.cgi
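Most of the predictors above take a protein sequence as input, and UniProt is the usual place to get one. As a small illustration, the sketch below builds the download URL for one UniProt entry in FASTA format and parses the record. It assumes that UniProt's REST service follows the pattern `https://rest.uniprot.org/uniprotkb/<accession>.fasta` (check the UniProt help pages in case this has changed); the function names are hypothetical and chosen only for this example.

```python
from urllib.request import urlopen

# Assumed URL pattern for UniProt's REST service; verify it in the UniProt help pages.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def uniprot_fasta_url(accession: str) -> str:
    """Build the download URL for one UniProt entry in FASTA format."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip().upper())


def parse_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for the first record of a FASTA string."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; keep only the first one
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def fetch_sequence(accession: str) -> tuple[str, str]:
    """Download and parse one UniProt entry (needs network access)."""
    with urlopen(uniprot_fasta_url(accession), timeout=30) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

With network access, `fetch_sequence("P69905")` should return the header and sequence of that entry, ready to paste into any of the web servers listed in this section.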
Experimental structures
https://www.rcsb.org/ -The main US-based sub-site of the wwPDB.
Clickable chart for the PDB and its related databases, at my website: http://lucianoabriata.altervista.org/papersdata/bib2016.html
https://www.ebi.ac.uk/emdb/ -The resource that collects 3D EM maps and associated experimental data determined using electron microscopy or tomography of biological specimens.
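Entries from the PDB can also be retrieved and inspected with a few lines of code. The sketch below is only an illustration: it assumes that RCSB serves legacy PDB-format files at `https://files.rcsb.org/download/<ID>.pdb` for classic four-character IDs, and it counts residues per chain by reading the fixed columns of `ATOM` records (chain ID in column 22, residue number plus insertion code in columns 23-27). The function names are hypothetical.

```python
from urllib.request import urlopen

# Assumed download pattern for classic four-character PDB IDs.
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the download URL for one entry in legacy PDB format."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character PDB ID, got %r" % pdb_id)
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id)


def residues_per_chain(pdb_text: str) -> dict:
    """Count distinct residues per chain from the ATOM records of a PDB file."""
    seen = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain = line[21]       # column 22: chain identifier
        residue = line[22:27]  # columns 23-27: residue number + insertion code
        seen.setdefault(chain, set()).add(residue)
    return {chain: len(residues) for chain, residues in seen.items()}


def fetch_pdb(pdb_id: str) -> str:
    """Download one entry as text (needs network access)."""
    with urlopen(rcsb_pdb_url(pdb_id), timeout=30) as response:
        return response.read().decode("utf-8")
```

For example, `residues_per_chain(fetch_pdb("1ubq"))` would give a quick overview of the chains in that entry. Very large entries are distributed only in mmCIF format, which this sketch does not handle.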
Modeled structures
AlphaFold database by DeepMind and EBI: https://alphafold.ebi.ac.uk/
AlphaFill (AlphaFold models enriched with ligands and co-factors): https://alphafill.eu/
ESM Atlas by Meta: https://esmatlas.com/
Integrative PDB: https://pdb-dev.wwpdb.org/
SAXS-based modeling: https://www.sasbdb.org/
Model Archive by the Swiss Institute of Bioinformatics: https://www.modelarchive.org/
All models of protein structures ever generated for CASP: https://predictioncenter.org/download_area/
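When working with predicted models, the first thing to check is how confident the prediction is. AlphaFold models in PDB format store the per-residue confidence score (pLDDT, from 0 to 100) in the B-factor column, so it can be read with a simple parser. The sketch below is a minimal example under that assumption; the function names are hypothetical, and the default threshold of 70 follows the AlphaFold database convention of treating pLDDT above 70 as confident.

```python
def plddt_per_residue(pdb_text: str) -> dict:
    """Map residue number -> pLDDT, read from the CA atoms of an AlphaFold model."""
    scores = {}
    for line in pdb_text.splitlines():
        is_atom = line.startswith("ATOM") and len(line) >= 66
        if is_atom and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])  # columns 23-26: residue number
            scores[residue_number] = float(line[60:66])  # columns 61-66: B-factor (pLDDT here)
    return scores


def summarize_plddt(scores: dict, confident: float = 70.0) -> tuple:
    """Return (mean pLDDT, fraction of residues at or above the threshold)."""
    values = list(scores.values())
    if not values:
        raise ValueError("no CA atoms found")
    mean = sum(values) / len(values)
    fraction = sum(value >= confident for value in values) / len(values)
    return mean, fraction
```

Regions with low pLDDT often correspond to flexible or disordered segments, so it is worth comparing them against the disorder predictors listed earlier. Note that this parser only makes sense for single-chain models such as those in the AlphaFold database; for experimental structures the same column holds actual B-factors.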
Structural analysis and manipulation websites/programs
Foldseek (search the PDB or the AlphaFold database for entries structurally similar to your query structure): https://search.foldseek.com/
DALI to search the PDB and EBI-AlphaFold: http://ekhidna2.biocenter.helsinki.fi/dali/
Electrostatics: https://server.poissonboltzmann.org/
Sequence-independent structural alignment with TM scores: https://zhanggroup.org/TM-align/
Find potential ligands in structure: https://zhanggroup.org/COFACTOR/
PDB manipulation suite (shift numbers, chains, map B-factors, etc.): http://lucianoabriata.altervista.org/pdbms/index.html
Sequence manipulation suite: https://www.bioinformatics.org/sms2/
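To make the TM-score reported by TM-align less of a black box, the sketch below evaluates its defining formula for one given superposition: each aligned residue pair at distance d contributes 1/(1 + (d/d0)^2), the sum is divided by the length L of the reference protein, and d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. This is only an illustration with hypothetical function names. The real program also searches for the alignment and superposition that maximize the score, which is not done here, and the lower bound of 0.5 Å on d0 for very short chains is an assumption of this sketch.

```python
def tm_d0(reference_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if reference_length <= 15:
        return 0.5  # assumed lower bound for very short chains
    d0 = 1.24 * (reference_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score(distances, reference_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distance (in angstroms) between each pair of aligned CA atoms.
    reference_length: number of residues in the protein used for normalization.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    d0 = tm_d0(reference_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / reference_length
```

A perfect superposition of all residues gives a score of 1, and unaligned residues simply contribute nothing to the sum, which is why the score depends on which of the two proteins is used for normalization.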
Molecular Visualization programs and websites
Visualization is key, and while we and others work on bringing augmented and virtual reality to make it all more immersive and intuitive, for now we are stuck with mouse-and-flat-screen interfaces. These are the programs that people actually use today:
PyMOL: https://pymol.org/2/ or the legacy PyMOL 0.99 from various sources such as https://pymol.en.uptodown.com/windows/download
VMD (best for molecular simulations and formats from the simulation world): https://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=VMD
ChimeraX: https://www.cgl.ucsf.edu/chimerax/
Prepare any structure for view in smartphone augmented reality or high-end virtual reality: https://molecularweb.epfl.ch/pages/pdb2ar.html
Molecular Modeling websites/programs
General protein modeling
Modern tools for protein modeling in the absence of homology (pre-AlphaFold): http://lucianoabriata.altervista.org/papersdata/bib2020.html
ColabFold (various AlphaFold versions plus ESMFold and RoseTTAFold): https://github.com/sokrypton/ColabFold
ESMFold: https://esmatlas.com/resources?action=fold
EBI-AlphaFold database: https://alphafold.ebi.ac.uk/
AlphaFill (AlphaFold models enriched with ligands and co-factors): https://alphafill.eu/
(C-)I-TASSER and related tools: https://zhanggroup.org/I-TASSER/ and https://zhanggroup.org/C-I-TASSER/
SwissModel for rapid template search and homology modeling including oligo state: https://swissmodel.expasy.org/
Docking: https://wenmr.science.uu.nl/ (see also AlphaFold-multimer in ColabFold!!)
Canonical structures
Build canonical coiled-coils: http://coiledcoils.chm.bris.ac.uk/ccbuilder2/builder
Larger biomolecular systems
OPM for proteins in membranes: https://opm.phar.umich.edu/
CHARMM-GUI (website to build complex models and also to parametrize them for molecular dynamics simulations):
Proteins in membrane or just membranes: https://charmm-gui.org/?doc=input/membrane.bilayer
Micelles: https://charmm-gui.org/?doc=input/membrane.micelle
Nanodiscs: https://charmm-gui.org/?doc=input/membrane.nanodisc
Explore more: LPS modeling, glycans, glycolipids, organic polymers, carbon nanotubes, etc.
Soon: build systems by moving molecules with your bare hands!
Small molecules
Hack-a-mol (create small molecules and interconvert them between different formats): https://chemapps.stolaf.edu/jmol/jsmol/hackamol.htm
Nuclear Magnetic Resonance software and resources
TopSpin (Bruker’s software for data acquisition and analysis on their instruments): https://www.bruker.com/en/products-and-solutions/mr/nmr-software/topspin.html
CARA for spectral assignment: http://cara.nmr.ch/doku.php/cara_downloads
Sparky-NMRfam for spectral analysis: https://nmrfam.wisc.edu/nmrfam-sparky-distribution/
Cyana for structure calculation (not covered, paid program): https://www.las.jp/english/products/cyana.html
Unio for automated NOESY assignment (coupled to Cyana, so not covered): http://unio-nmr.fr/
BioMagResBank -compiles all NMR data relevant to biology, from protein structure assignments to tools and databases: https://bmrb.io/
Data files for use in the course:
→ Practical on spectra navigation, assignment and structure calculation: download here
→ Practical on live monitoring of a protein phosphorylation reaction: download here
X-ray diffraction
XDS (X-ray diffraction data processing; Linux/Mac only): https://xds.mr.mpg.de/
Phenix (X-ray structure determination and refinement; CryoEM refinement): https://phenix-online.org
CCP4 suite (X-ray structure determination and refinement): https://www.ccp4.ac.uk/download/#os=mac
Coot (X-ray and EM model building): included in the CCP4 installation (Win/Mac/Linux), or Linux: https://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/, or Linux/Mac: https://phenix-online.org/download/other.html, or Windows: https://bernhardcl.github.io/coot/. Note: only the CCP4 versions on Linux/Mac are the latest.
X11/XQuartz — Essential to install CCP4 or Coot on a Mac: https://www.xquartz.org
Cryo-electron microscopy
CryoSPARC — https://cryosparc.com (installed on server already, access via Firefox browser)
Relion - https://www3.mrc-lmb.cam.ac.uk/relion/index.php?title=Main_Page
EMAN2 — https://blake.bcm.edu/emanwiki/EMAN2
SPHIRE — https://sphire.mpg.de/wiki/doku.php
www.lucianoabriata.com





