Thesis (Ph.D.) - Indiana University, Luddy School of Informatics, Computing, and Engineering, 2022
What is the strongest biomedical evidence about a disease for the discovery of novel pharmaceutical therapies? This is a fundamental challenge for biomedical scientists, and it translates directly to a parallel question for informatics and data science: Can we systematically assemble and query heterogeneous biomedical knowledge graphs in a computational discovery platform, guided by rational, algorithmic measures of relevance and confidence, to facilitate scientific discovery? And how have continuing waves of scientific and technological progress informed and empowered these inquiries?
The research described herein consists of several projects unified by this common theme, each from a distinct area of molecular biomedicine. The three main projects are (1) Badapple: Bioassay data associative promiscuity prediction learning engine, (2) TIGA: Target illumination GWAS analytics, and (3) KGAP: Knowledge graph analytics platform.
Badapple employs empirical bioassay data from PubChem and the NIH Molecular Libraries Program to recognize patterns of promiscuity (non-selectivity) associated with molecular scaffolds. KGAP combines data from two NIH programs, LINCS (Library of integrated network-based cell signatures), which provides genomic signatures, and IDG (Illuminating the druggable genome), to generate and evaluate hypotheses for novel drug targets from gene expression profiles. TIGA processes data from the NHGRI-EBI GWAS Catalog to aggregate experimental genome-wide variant-to-trait associations as novel drug target hypotheses. Peer-reviewed papers with the author as first author have been published for Badapple in 2016, TIGA in 2021, and KGAP in 2022. Relevant portions of other projects are also described, each reinforcing the common theme that scientific discovery is empowered by rational, algorithmic, semantic, domain-aware assembly and querying of knowledge graphs.