Know Your Drugs: A Knowledge Graph for Drug-Drug-Disease Interactions
Build a shared database for drugmakers, FDA, doctors, and patients
Disclaimer: This article does not provide medical advice. It is intended for informational purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment.
In today’s healthcare landscape, treating complex diseases often requires taking multiple medications concurrently, such as the cocktail therapies against HIV or COVID. Synergistic or additive drug combinations can increase efficacy, reduce toxicity, shorten treatment duration, or prevent drug resistance (1). While this approach helps in the fight against cancer or infections, in other cases combining drugs is a bad idea because of harmful drug-drug interactions. These interactions, in which one medication alters the effectiveness or safety of another, can range from mild inconveniences to life-threatening risks.
Navigating this complex drug-drug interaction landscape demands a shared responsibility among all stakeholders: drug manufacturers, the Food and Drug Administration (FDA), doctors, and most importantly, patients. Firstly, manufacturers and the FDA play a crucial role in thoroughly evaluating and labeling medications for potential interactions. Accurate and comprehensive labeling empowers healthcare professionals and patients to make informed decisions. Secondly, doctors are gatekeepers, carefully considering the potential interactions before issuing prescriptions. A proactive approach, including reviewing patient medication histories and consulting drug interaction databases, is vital to minimize risks. Finally, patients need to be actively involved in safeguarding their own health, especially those under the care of multiple specialists who may be unaware of the other prescriptions (1). They should ask healthcare providers informed questions about drug-drug interactions and report adverse drug reactions to the authorities.
The key to shared responsibility is shared data accessible to all four parties, especially the drug-drug interaction knowledge data. Unfortunately, this data is both rare and difficult to access, sitting either behind paywalls or within data tables. As a result, it is hard to develop an open, easy-to-use knowledge portal for the public.
This article addresses this exact issue. It uses drug-drug and drug-disease data from kaggle.com to construct a user-friendly knowledge graph. On the one hand, the graph lets users efficiently query the known interactions of existing drugs. On the other hand, the pipeline predicts potential interactions for a drug by analyzing the interactions of its chemically similar counterparts within the knowledge graph. The code for this project is hosted on my GitHub repository.
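The prediction idea can be pictured with a small sketch. It is not the pipeline's actual code. Molecular similarity is normally computed on chemical fingerprints with a cheminformatics toolkit; as a stand-in that runs on the standard library alone, this sketch computes a Tanimoto (Jaccard) coefficient over character trigrams of SMILES strings, the text encoding of chemical structures. The function names, the 0.5 threshold, and the toy data are all assumptions made for illustration.

```python
def trigrams(smiles):
    """Break a SMILES string into its set of overlapping 3-character fragments."""
    padded = "^" + smiles + "$"  # mark the start and end of the string
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_interactions(query_smiles, known_smiles, interactions, threshold=0.5):
    """Borrow interaction partners from known drugs that resemble the query.

    known_smiles: {drug name: SMILES}
    interactions: {drug name: list of interacting drug names}
    Returns {partner: (most similar known drug, similarity score)}.
    """
    query = trigrams(query_smiles)
    predicted = {}
    for name, smiles in known_smiles.items():
        score = tanimoto(query, trigrams(smiles))
        if score < threshold:
            continue  # not similar enough to borrow from
        for partner in interactions.get(name, ()):
            best = predicted.get(partner)
            if best is None or score > best[1]:
                predicted[partner] = (name, score)
    return predicted


# Toy data for illustration only; it is not taken from the article's graph.
known_smiles = {
    "Aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "Caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
}
interactions = {"Aspirin": ["Warfarin"], "Caffeine": ["Ciprofloxacin"]}

# Query: the methyl ester of aspirin, chosen only because it is a close analog.
print(predict_interactions("CC(=O)OC1=CC=CC=C1C(=O)OC", known_smiles, interactions))
```

In this toy run, the query resembles Aspirin far more than Caffeine, so only Aspirin's interaction partner is proposed. A string-based score like this is sensitive to how a SMILES string happens to be written, which is one reason real similarity searches rely on fingerprints instead.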
1. Data and architecture
I merged two Kaggle.com datasets for this project: Indian Medicine Data (Apache 2.0), which details drug-drug interactions and side effects, and 11000 Medicine Details (CC0: Public Domain), which provides additional side effect information. In addition, I retrieved the may_treat, may_prevent, contraindicated_with_disease, has_mechanism_of_action, has_structural_class, and has_therapeutic_class relations from RXNORM in UMLS. You can read more about UMLS and its data retrieval process in my previous article, Getting Insights from 3,000+ Clinical Trials in a Knowledge Graph.

The drugs’ chemical structures in SMILES format were fetched from PubChem via its REST API. They serve later as references in molecular similarity searches. Finally, I attached KEGG IDs to the drugs (Figure 1) for KEGG users. The data were sorted into nodes and relations: the nodes were formatted as JSON, while the relations were stored as TSV.
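As a rough illustration of the SMILES lookup, here is a minimal sketch against PubChem's PUG REST interface. It is an assumption about how such a fetch could look, not the project's actual code. The URL pattern and the `PropertyTable` response layout reflect my understanding of PUG REST and should be checked against PubChem's documentation; the helper names `smiles_url`, `parse_smiles`, and `fetch_smiles` are hypothetical.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def smiles_url(drug_name):
    """Build the PUG REST URL that asks for a compound's SMILES by name."""
    name = quote(drug_name, safe="")  # escape spaces and slashes in the name
    return f"{PUG_REST}/compound/name/{name}/property/CanonicalSMILES/JSON"


def parse_smiles(payload):
    """Pull the first SMILES value out of a decoded PropertyTable response.

    Any key ending in "SMILES" is accepted, because the key returned by
    PubChem may differ from the property name that was requested.
    """
    for record in payload.get("PropertyTable", {}).get("Properties", []):
        for key, value in record.items():
            if key.endswith("SMILES"):
                return value
    return None


def fetch_smiles(drug_name, timeout=10.0):
    """Fetch one drug's SMILES; return None when PubChem reports 404."""
    try:
        with urlopen(smiles_url(drug_name), timeout=timeout) as response:
            return parse_smiles(json.load(response))
    except HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Calling `fetch_smiles("Aspirin")` would perform the network request. Keeping the URL building and the parsing in separate functions lets both be checked offline, and a polite pause between requests is advisable when looping over hundreds of drugs.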
2. The knowledge graph
The JSON and TSV files were then imported into a Neo4j database. There are five types of nodes in the knowledge graph (Figure 2). Among them, the Medicine and the Condition nodes are the most important (Figures 2 & 3). This small graph contains 499 medicines, 869 conditions, and 1,395 drug-drug interactions.


As Figures 2 and 3 show, there can be three types of relationships between Medicine and Condition: MAY_TREAT, MAY_PREVENT, and CONTRAINDICATED_WITH_DISEASE. Here, contraindication refers to a situation in which a medicine should not be used for a disease. For example, aspirin, a blood thinner, is generally contraindicated for those with bleeding disorders, as it can worsen the condition. Contraindications come in two categories: absolute and relative. Absolute contraindications represent a clear “no” due to high potential risks. A severe penicillin allergy, for instance, would be an absolute contraindication for taking any medication that contains penicillin. In contrast, relative contraindications require careful consideration, weighing the potential benefits against the risks involved. Interestingly, the knowledge graph may contain some hints about these two categories.
## Code 1
MATCH p=(m:Medicine)-[r:CONTRAINDICATED_WITH_DISEASE]->(c:Condition)
WHERE (m)-[:MAY_TREAT]->(c) OR (m)-[:MAY_PREVENT]->(c)
RETURN p
To identify potential relative contraindications, this query searches for drugs that have a CONTRAINDICATED_WITH_DISEASE relation with a disease and, at the same time, a MAY_TREAT or MAY_PREVENT relation with that same disease.

The graph returned 20 such pairs, including Loperamide ➡️ Diarrhea (1) and Digoxin ➡️ Atrial Fibrillation (2). Atrial fibrillation, myocardial infarction, heart failure, atrial flutter, and six other diseases form a cluster.
Next, I modified the query slightly to search for absolute contraindications.
## Code 2
MATCH p=(m:Medicine)-[r:CONTRAINDICATED_WITH_DISEASE]->(c)
WHERE NOT (m)-[:MAY_TREAT]->(c) AND NOT (m)-[:MAY_PREVENT]->(c)
RETURN p LIMIT 50
The idea behind this query is that if a drug has only a CONTRAINDICATED_WITH_DISEASE relation with a disease, and no beneficial relation with it, the pair is likely a case of absolute contraindication. The query yields pairs such as Glimepiride ➡️ Diabetic ketoacidosis, Warfarin ➡️ Hemorrhage, and 1,317 others.
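The logic shared by Code 1 and Code 2 comes down to two set operations, which the following sketch spells out on plain Python sets instead of a Neo4j graph. The function name and the input layout (sets of `(medicine, condition)` tuples) are assumptions for illustration. The example pairs are the ones named in this article, but whether each beneficial pair is a MAY_TREAT or a MAY_PREVENT relation in the graph is not stated, so placing them under `may_treat` is also an assumption.

```python
def split_contraindications(may_treat, may_prevent, contraindicated):
    """Split contraindicated (medicine, condition) pairs into two groups.

    relative: the pair also has a beneficial relation (as in Code 1)
    absolute: the pair has no beneficial relation at all (as in Code 2)
    """
    beneficial = set(may_treat) | set(may_prevent)
    contraindicated = set(contraindicated)
    relative = contraindicated & beneficial
    absolute = contraindicated - beneficial
    return relative, absolute


# Pairs mentioned in the article; their placement in may_treat is assumed.
may_treat = {("Loperamide", "Diarrhea"), ("Digoxin", "Atrial Fibrillation")}
may_prevent = set()
contraindicated = {
    ("Loperamide", "Diarrhea"),
    ("Digoxin", "Atrial Fibrillation"),
    ("Glimepiride", "Diabetic ketoacidosis"),
    ("Warfarin", "Hemorrhage"),
}

relative, absolute = split_contraindications(may_treat, may_prevent, contraindicated)
print(sorted(relative))
print(sorted(absolute))
```

Every contraindicated pair lands in exactly one of the two groups, which is why the two Cypher queries differ only in their WHERE clause.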
Afterward, I asked for antagonistic drug pairs that treat the same disease but share some of the side effects.
## Code 3
// Pairs of interacting medicines that treat the same condition and share side effects
MATCH (m1:Medicine)-[:MAY_TREAT]->(c:Condition)<-[:MAY_TREAT]-(m2:Medicine)-[:INTERACTS_WITH]->(m1)
WITH m1, m2, apoc.coll.intersection(m1.side_effects, m2.side_effects) AS shared_side_effects
WHERE size(shared_side_effects) > 0
RETURN DISTINCT m1.name, m2.name, shared_side_effects
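For readers less familiar with Cypher, the matching logic of Code 3 can be mirrored on plain Python containers. This is a sketch under assumptions, not part of the project: the function name, the input layout, and the placeholder drug names are all hypothetical.

```python
def shared_side_effect_pairs(may_treat, interacts_with, side_effects):
    """Mirror the logic of Code 3 on in-memory data.

    may_treat:      {medicine: set of conditions it may treat}
    interacts_with: iterable of (m2, m1) pairs, read as m2 INTERACTS_WITH m1
    side_effects:   {medicine: list of side effects}
    Returns sorted, de-duplicated (m1, m2, shared side effects) rows.
    """
    rows = set()
    for m2, m1 in interacts_with:
        if m1 == m2:
            continue  # a drug paired with itself is not of interest here
        # Both medicines must treat at least one common condition.
        if not may_treat.get(m1, set()) & may_treat.get(m2, set()):
            continue
        shared = set(side_effects.get(m1, ())) & set(side_effects.get(m2, ()))
        if shared:
            rows.add((m1, m2, tuple(sorted(shared))))
    return sorted(rows)


# Placeholder data; the names are not real drugs.
may_treat = {
    "DrugA": {"Hypertension"},
    "DrugB": {"Hypertension", "Angina"},
    "DrugC": {"Asthma"},
}
interacts_with = [("DrugB", "DrugA"), ("DrugC", "DrugA")]
side_effects = {
    "DrugA": ["Dizziness", "Nausea"],
    "DrugB": ["Nausea", "Headache"],
    "DrugC": ["Nausea"],
}

print(shared_side_effect_pairs(may_treat, interacts_with, side_effects))
```

Here DrugC is dropped even though it shares a side effect with DrugA, because the two do not treat a common condition.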







