Dr. Alessandro Crimi


5 Cool AI-Powered Drug Discovery Tools

Harnessing the power of machine learning with Open Source tools for drug discovery

Credits: Unsplash

Proteins, made up of chains of amino acids, are the building blocks of life, and drug discovery tasks generally include predicting protein folding, docking, and interactions. These tasks have applications ranging from vaccines to other kinds of drugs, especially for neurodegenerative diseases, as well as in materials science. As in other fields, biotech, biology, and medicine have benefited from advances in machine learning and quantum computing. The main advantage of in-silico drug design is that a virtually unlimited number of variations can be simulated and tested, whereas trial-and-error tests in the real world involve tenfold costs and time. In silico, these tasks are typically framed as optimization problems; such a procedure looks like the one shown in the video below:

Among the most popular tools for this kind of optimization is ab initio Rosetta, which is based on the Monte Carlo method. More recently, quantum annealers have been outperforming this kind of optimization by exploiting processes such as quantum tunneling. Further applications based on machine learning techniques also exist. I will skip commercial tools such as Exscientia, as well as quantum machine learning; if you are interested in the latter, read the article below.
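To illustrate the Monte Carlo idea behind tools like Rosetta, here is a minimal sketch of Metropolis simulated annealing in Python. It is not Rosetta's protocol, which is far more elaborate (it assembles fragments of known structures and uses its own score function). The energy function toy_energy over a chain of torsion angles is a hypothetical stand-in, and the names toy_energy and metropolis_anneal, the cooling schedule, and the step size are assumptions chosen for illustration.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical energy of a chain described by torsion angles (radians).

    Each angle prefers -60 degrees and neighbouring angles are weakly coupled.
    This is a stand-in for a real score function, not Rosetta's.
    """
    target = math.radians(-60.0)
    local = sum(1.0 - math.cos(a - target) for a in angles)
    coupling = sum(0.1 * (1.0 - math.cos(a - b)) for a, b in zip(angles, angles[1:]))
    return local + coupling


def metropolis_anneal(energy, state, steps=20000, t_start=2.0, t_end=0.01,
                      step_size=0.5, seed=0):
    """Simulated annealing with the Metropolis acceptance rule."""
    rng = random.Random(seed)
    current = list(state)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for k in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        # Propose a move: perturb one randomly chosen torsion angle.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.gauss(0.0, step_size)
        e_candidate = energy(candidate)
        delta = e_candidate - e_current
        # Accept downhill moves always, uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


if __name__ == "__main__":
    init_rng = random.Random(1)
    start = [init_rng.uniform(-math.pi, math.pi) for _ in range(10)]
    best, e_best = metropolis_anneal(toy_energy, start)
    print(f"start energy {toy_energy(start):.3f} -> best energy {e_best:.4f}")
```

The key design choice is the acceptance rule: uphill moves are accepted with probability exp(-dE/T), so the search can escape local minima while the temperature is high and settles into a minimum as T decreases.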

Instead, in this article I list the best-known open-source tools for drug discovery, linking their GitHub repositories and official papers:

  • AlphaFold
  • Generative Tensorial Reinforcement Learning (GENTRL)
  • DeepChem
  • Open Drug Discovery Toolkit (ODDT)
  • Unified rational protein engineering with sequence-based deep representation learning (UNIREP)

AlphaFold

Over the last two years, biotech news has focused on this tool from DeepMind. Indeed, the two versions of DeepMind's program outperformed the other 97 teams in the 13th and 14th editions of the Critical Assessment of Structure Prediction (CASP), a biennial protein-structure prediction challenge. AlphaFold is essentially a neural network that predicts a protein's 3D structure from its amino-acid sequence. It takes the sequence as input and combines multiple sequence alignments, geometric representations, and deep learning to obtain the protein's structure. The 14th CASP edition included the ORF3a protein of SARS-CoV-2, and other related COVID-19 proteins were tested as well.
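Multiple sequence alignments are the main evolutionary input to such models. As a hedged illustration of the simplest feature that can be derived from an MSA, the sketch below computes a position-specific frequency profile and a per-column Shannon entropy (low entropy means a conserved position). AlphaFold's real feature pipeline is much richer; the function names msa_profile and column_entropy and the toy alignment are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"


def msa_profile(msa):
    """Per-column frequencies of the 20 amino acids and the gap symbol."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    alphabet = AMINO_ACIDS + GAP
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa)
        profile.append({aa: counts.get(aa, 0) / len(msa) for aa in alphabet})
    return profile


def column_entropy(freqs):
    """Shannon entropy in bits; 0 means the column is fully conserved."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


if __name__ == "__main__":
    # Hypothetical toy alignment of three homologous sequences.
    msa = ["MKV-L", "MRVAL", "MKIAL"]
    for col, freqs in enumerate(msa_profile(msa)):
        top = max(freqs, key=freqs.get)
        print(col, top, round(freqs[top], 2), round(column_entropy(freqs), 2))
```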

Since plenty of reviews of this tool have already appeared all over the media, I will only share the repository for the first version here, which won CASP13 in 2018 and is based on TensorFlow 1.14. The original paper was published in Nature in 2020.

AlphaFold 2 has been presented, but a "proper article" and the official repository have not been published yet. Until they are, the official publication is the abstract by Jumper et al. at CASP14, and unofficial PyTorch-based implementations are available at https://github.com/lucidrains/alphafold2

The major difference between AlphaFold 1 and AlphaFold 2 lies in the neural network architecture: version 1 used convolutional neural networks (CNNs), while version 2 uses Transformers.

AlphaFold 2 architecture summary. Credits: DeepMind.
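The core operation of the Transformer architecture (Vaswani et al., 2017) is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The sketch below implements that generic building block in NumPy, treating each residue as a token. It is not AlphaFold 2's architecture, whose attention modules are more specialized and, at the time of writing, not fully published; the toy embeddings and projection matrices are assumptions.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(q k^T / sqrt(d_k)) v, as in the Transformer paper.

    q: (n_queries, d_k), k: (n_keys, d_k), v: (n_keys, d_v).
    mask: optional boolean (n_queries, n_keys); False entries are blocked.
    """
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    residues = rng.standard_normal((5, 8))  # 5 residues, 8-dim toy embeddings
    w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
    out, attn = scaled_dot_product_attention(residues @ w_q, residues @ w_k, residues @ w_v)
    print(out.shape, attn.sum(axis=-1))
```

In a full Transformer layer, Q, K, and V are learned linear projections of the token embeddings and several such attention heads run in parallel; the optional boolean mask simply prevents attention to disallowed positions.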

Generative Tensorial Reinforcement Learning

Generative Tensorial Reinforcement Learning (GENTRL) is a tool developed by Insilico Medicine to accelerate the process of experimental validation. At its core is a variational autoencoder. The framework uses tensor decompositions to encode the relations between molecular structures and their properties, and to learn from data with missing values. As the name suggests, it is based on reinforcement learning; a typical pipeline is shown below.

Credits: GENTRL
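Since a variational autoencoder sits at GENTRL's core, here is a minimal NumPy sketch of the two generic VAE ingredients: the reparameterization trick and the closed-form KL divergence to a standard normal prior, combined into a negative ELBO with a squared-error reconstruction term. The linear encoder and decoder weights (w_mu, w_logvar, w_dec) are hypothetical; GENTRL's actual model works on molecular representations and adds tensor decompositions and reinforcement learning on top, none of which is shown here.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way keeps it differentiable with respect to
    mu and log_var, which is what lets a VAE be trained by backpropagation.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)


def vae_loss(x, w_mu, w_logvar, w_dec, rng):
    """Negative ELBO (up to constants and scaling) for a toy linear Gaussian VAE."""
    mu, log_var = x @ w_mu, x @ w_logvar  # hypothetical linear encoder
    z = reparameterize(mu, log_var, rng)
    x_hat = z @ w_dec                     # hypothetical linear decoder
    reconstruction = np.sum((x - x_hat) ** 2, axis=-1)
    return float(np.mean(reconstruction + kl_to_standard_normal(mu, log_var)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 10))     # 16 toy feature vectors
    w_mu, w_logvar = 0.1 * rng.standard_normal((2, 10, 3))
    w_dec = 0.1 * rng.standard_normal((3, 10))
    print(vae_loss(x, w_mu, w_logvar, w_dec, rng))
```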

The code is available in the repository and is based on PyTorch and RDKit. It has been used to discover potent inhibitors of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases. In the reported experiment, the entire process took only 21 days using the computers available at the time.

DeepChem

DeepChem is an open-source deep learning framework that aims to democratize drug discovery. The features listed on its website are:

Predict the solubility of small drug-like molecules

Predict binding affinity for small molecule to protein targets

Predict physical properties of simple materials

Analyze protein structures and extract useful descriptors

Count the number of cells in a microscopy image

It uses Google TensorFlow and scikit-learn for machine learning tasks, and RDKit for operations on molecular data, such as converting SMILES strings into molecular graphs.
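DeepChem delegates SMILES parsing to RDKit (for example Chem.MolFromSmiles). To show what converting a SMILES string into a molecular graph means, here is a hypothetical, dependency-free toy parser. It handles only a small subset of SMILES (unbracketed organic-subset atoms, single, double, and triple bonds, branches, and single-digit ring closures) and ignores aromatic lowercase atoms, charges, stereochemistry, and implicit hydrogens; real molecules should go through RDKit.

```python
# Organic-subset atoms; two-letter symbols come first so "Cl" is not read as "C".
ORGANIC_SUBSET = ("Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I")
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def smiles_to_graph(smiles):
    """Toy parser: returns (atoms, bonds) with bonds as (i, j, order) tuples."""
    atoms, bonds = [], []
    branch_stack = []    # atom to return to when a branch ')' closes
    ring_open = {}       # ring-closure digit -> (atom index, bond order)
    prev = None          # atom the next atom bonds to
    pending_order = 1    # bond order for the next bond
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch in BOND_ORDERS:
            pending_order = BOND_ORDERS[ch]
            i += 1
        elif ch == "(":
            branch_stack.append(prev)
            i += 1
        elif ch == ")":
            prev = branch_stack.pop()
            i += 1
        elif ch.isdigit():
            if ch in ring_open:  # second occurrence closes the ring
                j, order = ring_open.pop(ch)
                bonds.append((j, prev, max(order, pending_order)))
            else:                # first occurrence opens it
                ring_open[ch] = (prev, pending_order)
            pending_order = 1
            i += 1
        else:
            symbol = next((s for s in ORGANIC_SUBSET if smiles.startswith(s, i)), None)
            if symbol is None:
                raise ValueError(f"unsupported SMILES token at position {i}: {ch!r}")
            atoms.append(symbol)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending_order))
            prev, pending_order = idx, 1
            i += len(symbol)
    if branch_stack or ring_open:
        raise ValueError("unbalanced branches or unclosed rings")
    return atoms, bonds


def adjacency(n_atoms, bonds):
    """Adjacency list: atom index -> list of (neighbour index, bond order)."""
    adj = {k: [] for k in range(n_atoms)}
    for a, b, order in bonds:
        adj[a].append((b, order))
        adj[b].append((a, order))
    return adj


if __name__ == "__main__":
    atoms, bonds = smiles_to_graph("CC(=O)O")  # acetic acid
    print(atoms, bonds)
```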

Open Drug Discovery Toolkit

Open Drug Discovery Toolkit (ODDT) is a modular and comprehensive Python toolkit for cheminformatics and molecular modeling. It does not build on any of the well-known deep learning libraries; for machine learning it requires only scikit-learn.

ODDT includes many state-of-the-art methods, such as the machine learning scoring functions RF-Score and NNScore. The goal is to provide a toolkit that others can customize for their own computer-aided drug discovery. Apart from the tutorial linked in the repository, there is a reference paper published in the Journal of Cheminformatics. To handle and convert molecular data, it requires either the already-mentioned RDKit or Open Babel to be installed.
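RF-Score, one of the scoring functions mentioned above, describes a protein-ligand complex by counting protein-ligand atom pairs of each element combination within a distance cutoff and feeds these counts to a random forest. The NumPy sketch below computes such a 36-dimensional feature vector (4 protein elements × 9 ligand elements), assuming the element sets and 12 Å cutoff of the original RF-Score description. The function rf_score_features is hypothetical and is not ODDT's API.

```python
import numpy as np

# Element sets assumed from the original RF-Score description.
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")


def rf_score_features(prot_elems, prot_xyz, lig_elems, lig_xyz, cutoff=12.0):
    """Count protein-ligand atom pairs per element combination closer than cutoff (angstroms).

    Returns a vector of length 36 ordered as (protein element, ligand element).
    """
    prot_xyz = np.asarray(prot_xyz, dtype=float)
    lig_xyz = np.asarray(lig_xyz, dtype=float)
    prot_elems = np.asarray(prot_elems)
    lig_elems = np.asarray(lig_elems)
    # Pairwise distances, shape (n_protein_atoms, n_ligand_atoms).
    dists = np.linalg.norm(prot_xyz[:, None, :] - lig_xyz[None, :, :], axis=-1)
    close = dists < cutoff
    features = []
    for pe in PROTEIN_ELEMENTS:
        for le in LIGAND_ELEMENTS:
            pair_mask = (prot_elems == pe)[:, None] & (lig_elems == le)[None, :]
            features.append(int(np.count_nonzero(close & pair_mask)))
    return np.array(features)


if __name__ == "__main__":
    # Toy complex: two protein atoms and two ligand atoms (coordinates in angstroms).
    feats = rf_score_features(["C", "O"], [[0, 0, 0], [20, 0, 0]],
                              ["C", "N"], [[1, 0, 0], [0, 5, 0]])
    print(feats.shape, feats.sum())
```

A regressor such as scikit-learn's RandomForestRegressor would then be trained on these vectors against measured binding affinities; ODDT ships ready-made RF-Score models, so in practice those are the ones to use.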

Among its interesting features are three molecular shape comparison methods: USR, USRCAT, and Electroshape.
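USR (Ultrafast Shape Recognition) summarizes a conformer's shape with 12 numbers: for four reference points (the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one), it takes the first three moments of the distribution of atomic distances to that point. The NumPy sketch below follows that recipe and scores similarity as 1 / (1 + mean absolute descriptor difference). It assumes the three moments are the mean, variance, and third central moment; real implementations, including ODDT's, may normalize them differently, and the function names are hypothetical. USRCAT and Electroshape extend the same idea with atom-type and charge information.

```python
import numpy as np


def usr_descriptor(coords):
    """12 USR-style shape descriptors from an (n_atoms, 3) coordinate array."""
    xyz = np.asarray(coords, dtype=float)
    ctd = xyz.mean(axis=0)                         # molecular centroid
    d_ctd = np.linalg.norm(xyz - ctd, axis=1)
    cst = xyz[np.argmin(d_ctd)]                    # atom closest to the centroid
    fct = xyz[np.argmax(d_ctd)]                    # atom farthest from the centroid
    ftf = xyz[np.argmax(np.linalg.norm(xyz - fct, axis=1))]  # farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        d = np.linalg.norm(xyz - ref, axis=1)
        mean = d.mean()
        # Assumed moments: mean, variance, third central moment.
        descriptor += [mean, d.var(), ((d - mean) ** 3).mean()]
    return np.array(descriptor)


def usr_similarity(a, b):
    """Similarity in (0, 1]; 1 means identical descriptors."""
    return 1.0 / (1.0 + np.abs(a - b).mean())


if __name__ == "__main__":
    # Hypothetical 5-atom conformer; a rigid rotation + translation keeps its shape.
    coords = np.array([[0, 0, 0], [1.5, 0, 0], [1.5, 1.2, 0.3],
                       [3.1, 1.0, -0.4], [-0.7, 0.9, 1.1]])
    rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    moved = coords @ rot_z.T + np.array([5.0, -2.0, 3.0])
    print(usr_similarity(usr_descriptor(coords), usr_descriptor(moved)))
```

Because the descriptors depend only on interatomic distances, they need no alignment of the two molecules, which is what makes the comparison fast.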

Unified rational protein engineering with sequence-based deep representation learning (UNIREP)

Unified rational protein engineering with sequence-based deep representation learning (UniRep) is a deep learning architecture that, from unlabeled amino-acid sequences, learns protein representations that are semantically rich and evolutionarily, structurally, and biophysically grounded. It is based on a multiplicative LSTM (mLSTM), a recurrent neural network architecture for sequence modeling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures. The learned representation can be used, for example, to predict the stability of natural and de novo designed proteins.
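In the multiplicative LSTM, the previous hidden state first interacts multiplicatively with the input, m_t = (W_mx x_t) ⊙ (W_mh h_{t-1}), and m_t then replaces h_{t-1} in the usual LSTM gate equations. The NumPy sketch below runs such a cell over a one-hot-encoded protein sequence and averages the hidden states into a fixed-length vector, roughly how UniRep summarizes a protein. The weights are random, biases are omitted, and the hidden size is tiny; the trained UniRep model is much larger and was learned by next-amino-acid prediction, so this only illustrates the data flow. All function names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for t, aa in enumerate(seq):
        x[t, AMINO_ACIDS.index(aa)] = 1.0
    return x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, rng, scale=0.1):
    """Random (untrained) weights; biases are omitted for brevity."""
    names = ["mx", "hx", "ix", "fx", "ox"]   # input-to-hidden weights
    params = {f"W_{n}": scale * rng.standard_normal((n_hidden, n_in)) for n in names}
    for n in ["mh", "hm", "im", "fm", "om"]:  # hidden/intermediate weights
        params[f"W_{n}"] = scale * rng.standard_normal((n_hidden, n_hidden))
    return params


def mlstm_step(p, x_t, h_prev, c_prev):
    m = (p["W_mx"] @ x_t) * (p["W_mh"] @ h_prev)  # multiplicative intermediate state
    h_hat = p["W_hx"] @ x_t + p["W_hm"] @ m
    i = sigmoid(p["W_ix"] @ x_t + p["W_im"] @ m)   # input gate
    f = sigmoid(p["W_fx"] @ x_t + p["W_fm"] @ m)   # forget gate
    o = sigmoid(p["W_ox"] @ x_t + p["W_om"] @ m)   # output gate
    c = f * c_prev + i * np.tanh(h_hat)
    h = np.tanh(c) * o
    return h, c


def unirep_style_embedding(seq, p, n_hidden):
    """Average hidden state over the sequence (illustrative, untrained)."""
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    states = []
    for x_t in one_hot(seq):
        h, c = mlstm_step(p, x_t, h, c)
        states.append(h)
    return np.mean(states, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(len(AMINO_ACIDS), 16, rng)
    print(unirep_style_embedding("MKTAYIAK", params, 16).round(3))
```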

The code is freely available on GitHub, and the reference paper was published in Nature Methods in 2019.

Workflow to learn and apply deep protein representations. Credits: Alley et al.

Conclusions

Many pharmaceutical companies still struggle with drug design because of the high costs and long timelines involved. Computational tools, especially those based on machine learning, hold the promise of facilitating this process. A typical high-throughput screening library contains about 1 million compounds, each typically costing 50–100 USD, so the compounds alone cost roughly 50–100 million USD. Doing the same in silico requires just a computer with a good graphics card.

Nevertheless, it is important to bear in mind that deep learning models are still largely black boxes. A tremendous amount of work has been done to incorporate AI tools into the drug discovery cycle to speed it up, but further successful applications of these tools will be needed before the full potential of AI in drug discovery can be realized. It is much appreciated that even commercial companies release open-source tools and are open to communities extending them.

CONNECT IF YOU WANT

@Dr_Alex_Crimi
@dr.alecrimi
Alessandro Crimi — YouTube
https://web.facebook.com/dralecrimi