Drug repurposing with graph neural networks for COVID-19 and Myotonic Dystrophy type 1

Start date: 01-02-2022
End date: 31-07-2022

Clinical Problem

The development of novel drugs is an expensive, time-consuming and risky process. Drug repurposing, in which already tested and approved drugs are used for indications other than those they were originally developed for, is a possible solution to this problem. It can significantly shorten the development and testing process, reducing costs, duration and the risks involved in testing. This is especially valuable for novel and rare diseases.

This project presents a method that uses graph neural networks for link prediction on biomedical knowledge graphs in order to predict potential drug repurposing candidates. We focused on the novel disease COVID-19 and the rare disease Myotonic Dystrophy type 1 (DM1).

COVID-19 is a contagious disease caused by the SARS-CoV-2 virus. Its outbreak in late 2019 led to a global pandemic. Common symptoms include fever, cough and fatigue; more severe symptoms such as breathing difficulties can lead to hospitalisation. DM1 is a rare, genetic, muscular disease that affects at least 1 in 8,000 people. It is highly variable in severity, type of symptoms and age of onset. Common symptoms are muscle weakness, muscle stiffness, daytime sleepiness and disturbance of the heart rhythm.

For both diseases, there is no effective treatment yet. Drug repurposing using graph neural networks can therefore be of great value for these diseases.

Methods

In this study, the drug repurposing knowledge graph (DRKG) is used as the biomedical knowledge graph. It incorporates knowledge on many different diseases, including COVID-19 and DM1, represented as triples, for example [drug, treats, disease].
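As an illustration of the triple format, the sketch below stores a handful of triples and derives each node's outgoing edges. The node and relation names are hypothetical and are not taken from DRKG; the out-degree reported in the result tables is assumed to be the number of outgoing edges of a node.

```python
from collections import defaultdict

# Hypothetical triples in (head, relation, tail) form; the identifiers are
# illustrative and are not taken from DRKG itself.
triples = [
    ("Compound::A", "treats", "Disease::X"),
    ("Compound::A", "binds", "Gene::G1"),
    ("Gene::G1", "interacts_with", "Gene::G2"),
    ("Disease::Y", "associated_with", "Gene::G2"),
]


def build_adjacency(triples):
    """Map each head node to its outgoing (relation, tail) pairs."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
    return adjacency


adjacency = build_adjacency(triples)

# Out-degree = number of outgoing edges of a node (assumed definition).
out_degree = {node: len(edges) for node, edges in adjacency.items()}
print(out_degree["Compound::A"])
```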

To find drug repurposing candidates, a relational graph autoencoder (GAE) is implemented to learn a low-dimensional representation (embedding) of each node in the knowledge graph. Each node embedding is learnt from the local structure of the node's neighbourhood. The model is optimized with a binary cross-entropy loss on balanced batches (i.e., minority classes are oversampled to account for edge-type imbalance). Potential drug repurposing candidates are identified through link prediction: the similarity between the embeddings of two nodes of interest is computed, and nodes with highly similar embeddings are considered likely to be connected. We therefore look for drug and disease nodes with similar embeddings that are not yet connected in the original knowledge graph. The embeddings are evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
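The building blocks named above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's implementation: the decoder is assumed to be a sigmoid over the dot product of two embeddings (a relational GAE may instead use a relation-specific decoder), the encoder that produces the embeddings is omitted, and all function names and example edges are hypothetical. AUPRC is left out for brevity.

```python
import math
import random


def link_probability(emb_u, emb_v):
    """Sigmoid of the dot product of two node embeddings (assumed decoder)."""
    dot = sum(a * b for a, b in zip(emb_u, emb_v))
    if dot >= 0:
        return 1.0 / (1.0 + math.exp(-dot))
    # Numerically stable branch for negative inputs.
    exp_dot = math.exp(dot)
    return exp_dot / (1.0 + exp_dot)


def binary_cross_entropy(prob, label, eps=1e-12):
    """BCE for one edge: label 1 = edge exists, label 0 = negative sample."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def balanced_batch(edges_by_type, per_type, rng):
    """Draw the same number of edges for every edge type.

    Sampling is done with replacement, so rare edge types are oversampled.
    """
    batch = []
    for edges in edges_by_type.values():
        batch.extend(rng.choices(edges, k=per_type))
    return batch


def auroc(scores_pos, scores_neg):
    """Probability that a positive edge outscores a negative one (ties count half)."""
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


rng = random.Random(0)
edges_by_type = {
    "treats": [("Compound::A", "Disease::X")],
    "interacts_with": [
        ("Gene::G1", "Gene::G2"),
        ("Gene::G2", "Gene::G3"),
        ("Gene::G1", "Gene::G3"),
    ],
}
batch = balanced_batch(edges_by_type, per_type=3, rng=rng)
print(len(batch))
print(auroc([0.9, 0.8], [0.1, 0.85]))
```

In this toy batch the single "treats" edge is drawn three times, which is what oversampling a minority edge type amounts to.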

We enriched the generic DRKG with experimental disease-specific information. For COVID-19, transcriptomic data of hospitalised patients were analysed to identify specific gene-gene relations, which were added to the network. For DM1, transcriptomic data measured in the brain, blood and muscles were analysed to identify relevant genes and gene-gene relations. Additionally, we experimented with knowledge directly linked to the disease of interest. For COVID-19, we used novel knowledge obtained from the rapidly growing body of research. For DM1, we used a dataset on symptoms and associated genes. We investigated how the predictions made with these enriched knowledge graphs differed from those made with the generic knowledge graph, and whether the enrichment led to more relevant results.
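Conceptually, both kinds of enrichment add triples to the generic graph. The sketch below shows one way this could look; the `enrich` helper, the relation names and the example triples are hypothetical, and how the gene-gene relations are derived from the transcriptomic data is not covered here.

```python
def enrich(base_triples, extra_triples):
    """Return the base triples plus any extra triples not already present.

    The order of the base triples is preserved and duplicates are skipped.
    """
    seen = set(base_triples)
    merged = list(base_triples)
    for triple in extra_triples:
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    return merged


generic = [
    ("Compound::A", "treats", "Disease::X"),
    ("Gene::G1", "interacts_with", "Gene::G2"),
]

# Indirect evidence: gene-gene relations, e.g. derived from transcriptomic data.
gene_gene = [
    ("Gene::G1", "interacts_with", "Gene::G2"),  # already in the generic graph
    ("Gene::G2", "coexpressed_with", "Gene::G3"),
]

# Direct evidence: edges attached to the disease node itself.
direct = [("Disease::Y", "associated_with", "Gene::G3")]

graph_indirect = enrich(generic, gene_gene)
graph_direct = enrich(generic, direct)
print(len(graph_indirect), len(graph_direct))
```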

Results

The results are described separately for the two diseases below. A more in-depth analysis can be found in the theses.

COVID-19

Generic knowledge graph

The table below shows the top-10 ranked drug repurposing candidates for COVID-19 based on the generic knowledge graph. Most drugs in this top-10 are generic and not clearly relevant to COVID-19.

| Drug name | Probability | Out-degree |
| --- | --- | --- |
| Cyclosporine | 0.989 | 2924 |
| Colchicine | 0.987 | 964 |
| Methotrexate | 0.982 | 2030 |
| Cyclophosphamide | 0.979 | 1435 |
| Cisplatin | 0.978 | 1868 |
| Simvastatin | 0.976 | 1569 |
| Paclitaxel | 0.975 | 1505 |
| Curcumin | 0.974 | 544 |
| Doxorubicin | 0.974 | 2011 |
| Cholesterol | 0.974 | 854 |

Disease-specific gene-expression knowledge graph

After adding the gene-expression data, the top-10 shown in the table below is obtained. Again, most drugs are generic and not clearly relevant. One interesting drug in this top-10 is dexamethasone, an FDA-approved anti-inflammatory drug that is used as a COVID-19 treatment and has been reported to reduce the need for ventilation and oxygen. The top-10 also partly overlaps with the one obtained from the generic knowledge graph (cyclosporine and methotrexate appear in both). We therefore looked at the differences between the predicted probabilities of all drugs for both knowledge graphs, shown in the figure below the table.

| Drug name | Probability | Out-degree |
| --- | --- | --- |
| Cyclosporine | 0.991 | 2924 |
| Clarithromycin | 0.987 | 783 |
| Methotrexate | 0.980 | 2030 |
| Ketoconazole | 0.978 | 874 |
| Fluconazole | 0.977 | 1503 |
| Dexamethasone | 0.977 | 2158 |
| Azithromycin | 0.974 | 1187 |
| Itraconazole | 0.974 | 806 |
| Prednisolone | 0.974 | 1135 |
| Dextrose, unspecified form | 0.973 | 1691 |

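[Figure: differences in predicted probability of all drugs between the generic and gene-expression knowledge graphs (COVID-19)]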

The figure shows that adding the gene-expression data has a small effect on some of the drugs. Most of the affected drugs are pushed down the ranking, but some are pushed up. The latter are potentially interesting, as they are positively affected by adding COVID-19-related data. For a closer look at these drugs, we refer to the thesis.
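A comparison of this kind can be expressed as a per-drug rank shift between two sets of predictions. The sketch below is illustrative only: the function names, drug names and probabilities are made up, and drugs with tied probabilities are ordered arbitrarily.

```python
def rank_positions(probabilities):
    """Map each drug to its 1-based rank, highest probability first."""
    ordered = sorted(probabilities, key=probabilities.get, reverse=True)
    return {drug: position for position, drug in enumerate(ordered, start=1)}


def rank_shifts(generic, enriched):
    """Rank change per drug; a positive value means the drug moved up."""
    before = rank_positions(generic)
    after = rank_positions(enriched)
    shared = before.keys() & after.keys()
    return {drug: before[drug] - after[drug] for drug in shared}


# Hypothetical predicted probabilities for three drugs.
generic = {"drug_a": 0.90, "drug_b": 0.80, "drug_c": 0.70}
enriched = {"drug_a": 0.60, "drug_b": 0.85, "drug_c": 0.75}

shifts = rank_shifts(generic, enriched)
moved_up = sorted(drug for drug, shift in shifts.items() if shift > 0)
print(shifts)
print(moved_up)
```

Drugs with a positive shift correspond to those pushed up the ranking by the enrichment.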

New direct evidence knowledge graph

After adding new direct evidence, the top-10 drug repurposing candidates shown in the table below are obtained. As before, most drugs are generic and not clearly relevant, and cyclosporine is again ranked as the best candidate. However, the figure below the table shows that the drugs are substantially more affected by adding the new direct evidence. This implies that new evidence that is directly linked to the disease node has a bigger effect on the ranking. For a closer look at these affected drugs, we refer to the thesis.

| Drug name | Probability | Out-degree |
| --- | --- | --- |
| Cyclosporine | 0.985 | 2924 |
| Clarithromycin | 0.975 | 783 |
| Itraconazole | 0.973 | 806 |
| Fluconazole | 0.970 | 1503 |
| Zidovudine | 0.966 | 1127 |
| Colchicine | 0.965 | 964 |
| Ketoconazole | 0.960 | 874 |
| Azithromycin | 0.956 | 1187 |
| Methotrexate | 0.955 | 2030 |
| Omeprazole | 0.954 | 1215 |

[Figure: effect of adding new direct evidence on the predicted probabilities of the drugs (COVID-19)]

The results above show that the biggest effect is observed when the generic knowledge graph is enhanced with novel knowledge that is directly linked to the disease node of interest. A larger difference does not by itself imply better results, but it does show that this type of data influences the predictions more.

DM1

The table below lists the top-10 drugs predicted using the generic DRKG. Norepinephrine, epinephrine and dopamine seemed interesting at first, as they are linked to the treatment of symptoms related to DM1. In practice, however, they are only used in severe cases, which makes them unlikely drug repurposing candidates. All other drugs are likely to be irrelevant.

| Drug name | Probability | Out-degree |
| --- | --- | --- |
| Ethanol | 0.992 | 1637 |
| Dextrose | 0.990 | 1691 |
| Oxygen | 0.983 | 967 |
| Cholesterol | 0.982 | 854 |
| Norepinephrine | 0.978 | 876 |
| Epinephrine | 0.977 | 992 |
| Melatonin | 0.977 | 1038 |
| Dopamine | 0.977 | 1047 |
| Ascorbic acid | 0.976 | 707 |
| Curcumin | 0.976 | 544 |

Extending DRKG with disease-specific gene-gene edges

In the first series of experiments, DRKG was enriched with disease-specific gene-gene edges derived from the aforementioned gene-expression datasets. Adding these edges had only a limited impact: the probabilities and ranking positions of the drugs changed little. The table below shows the top-10 drugs predicted using DRKG extended with DM1-related gene-gene edges derived from blood gene-expression data. Of the gene-expression experiments, this ranking differed most from the one predicted using the generic knowledge graph. Four drugs overlap with the generic ranking. Metformin is an interesting prediction, as it showed efficacy for DM1 in preclinical research. However, it is already connected to the disease by a direct edge in DRKG. All other drugs are either irrelevant or do not show a clear connection to DM1.

| Drug name | Probability | Out-degree |
| --- | --- | --- |
| Ethanol | 0.995 | 1637 |
| Dextrose | 0.990 | 1691 |
| Melatonin | 0.990 | 1038 |
| Nitric Oxide | 0.989 | 1590 |
| Dexamethasone | 0.989 | 2158 |
| Acetylsalicylic acid | 0.989 | 1492 |
| Metformin | 0.989 | 1436 |
| Progesterone | 0.985 | 1768 |
| Nicotine | 0.985 | 1444 |
| Oxygen | 0.984 | 967 |

Below, the predicted probability for each drug in the generic ranking is plotted against the probability predicted in this experiment. The probabilities are more similar at the top and bottom of the ranking than in between. The drugs whose probabilities differ are analysed in detail in the thesis.

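[Figure: predicted probability per drug, generic DRKG versus DRKG extended with blood-derived gene-gene edges (DM1)]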

Extending DRKG with DM1 related symptoms and associated genes

In the second experiment, DRKG was extended with knowledge on DM1-related symptoms and associated genes. The ranking and the probabilities per drug differed substantially more from the generic ranking than in the gene-expression experiments, as also shown in the scatter plot below. Two drugs overlap with the generic ranking. Phenytoin and colchicine are interesting predictions: phenytoin is used for the treatment of myotonia, which is a symptom of DM1, and colchicine showed efficacy in previous research. For all other drugs, again no direct link to DM1 or one of its symptoms was found.

| Drug name | Probability | Out-degree |
| --- | --- | --- |
| Phenytoin | 0.982 | 1642 |
| Colchicine | 0.979 | 964 |
| Dinoprostone | 0.979 | 693 |
| Vitamin E | 0.978 | 480 |
| Dexamethasone | 0.977 | 2158 |
| Acetylsalicylic acid | 0.976 | 1492 |
| Hydrocortisone | 0.976 | 1550 |
| Ethanol | 0.976 | 1637 |
| Epinephrine | 0.975 | 992 |
| Morphine | 0.975 | 1678 |
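The overlap counts quoted for the two DM1 experiments can be checked directly from the three top-10 tables in this section. The short sketch below does so with set intersections; the drug names are copied from the tables, while the variable names are only illustrative.

```python
# Top-10 drug names copied from the three DM1 tables in this section.
generic_top10 = {
    "Ethanol", "Dextrose", "Oxygen", "Cholesterol", "Norepinephrine",
    "Epinephrine", "Melatonin", "Dopamine", "Ascorbic acid", "Curcumin",
}
gene_gene_top10 = {
    "Ethanol", "Dextrose", "Melatonin", "Nitric Oxide", "Dexamethasone",
    "Acetylsalicylic acid", "Metformin", "Progesterone", "Nicotine", "Oxygen",
}
symptom_top10 = {
    "Phenytoin", "Colchicine", "Dinoprostone", "Vitamin E", "Dexamethasone",
    "Acetylsalicylic acid", "Hydrocortisone", "Ethanol", "Epinephrine",
    "Morphine",
}

overlap_gene_gene = sorted(generic_top10 & gene_gene_top10)
overlap_symptoms = sorted(generic_top10 & symptom_top10)

print(overlap_gene_gene)  # ['Dextrose', 'Ethanol', 'Melatonin', 'Oxygen']
print(overlap_symptoms)   # ['Epinephrine', 'Ethanol']
```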

The scatter plot below shows that the probabilities predicted for each drug differed more from those of the generic knowledge graph than in the gene-expression experiments. These drugs are also analysed in more detail in the thesis.

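[Figure: scatter plot of predicted probability per drug, generic DRKG versus DRKG extended with DM1-related symptoms and associated genes]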

Similar to the results on COVID-19, adding edges directly to the disease node in the knowledge graph (here, knowledge on disease symptoms) changes the ranking more. However, it did not lead to the prediction of drugs that are evidently useful for the treatment of DM1. A larger change in the ranking therefore again does not necessarily mean a better ranking in terms of potential efficacy.

Conclusion

During this project, we developed a pipeline for the prediction of drug repurposing candidates using graph neural networks. In comparison to published work, we tried to build a more disease-specific knowledge graph, potentially resulting in more relevant drug repurposing candidates. The results show that there are differences between the predicted rankings for both diseases that were investigated. However, at first glance, the top-ranked drug repurposing candidates are mostly generic and irrelevant. Furthermore, the top of the ranking is influenced by highly connected nodes, which biases our results towards these nodes and makes them less reliable. Future work should be aimed at mitigating this bias. The results also show a clearer difference in predicted probabilities when the generic knowledge graph is enhanced with direct edges than when it is enhanced with indirect edges (i.e., the gene-gene edges). However, the results are hard to validate: no ground truth is available, as there is no cure for the diseases we investigated.
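One simple way to quantify the bias towards highly connected nodes would be a rank correlation between predicted probability and out-degree over all drugs. The sketch below is a diagnostic under stated assumptions, not part of the project's pipeline: it computes Spearman's correlation with average ranks for ties, and the input values are hypothetical. It assumes neither input is constant, since the correlation is undefined in that case.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value equals the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical predictions in which probability rises with out-degree.
probabilities = [0.99, 0.95, 0.90, 0.80]
out_degrees = [2900, 2000, 1500, 500]
rho = spearman(probabilities, out_degrees)
print(rho)
```

A value close to 1 over the full drug list would indicate that the ranking largely follows node connectivity.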

People

Djesse Dirckx

Master Student

CMBI, Radboudumc

Lotte Willems

Master Student

CMBI, Radboudumc

Peter-Bram 't Hoen

Professor

Bioinformatics, Radboudumc

Tom Heskes

Professor

Data Science, Radboud University