
Knowledge Management Center for Illuminating the Druggable Genome

Avi Ma’ayan, PhD
Principal Investigator
Professor, Department of Pharmacological Sciences
Director, Mount Sinai Center for Bioinformatics
Icahn School of Medicine at Mount Sinai
NIH grant number: U24CA224260-01


To better understand the function of the understudied protein targets, which are the focus of the implementation phase of the Illuminating the Druggable Genome (IDG) project, we impute knowledge using machine learning strategies. To support this imputation, we organize data from many omics- and literature-based resources into attribute tables where genes are the rows and their attributes are the columns. Examples of such attribute tables include gene or protein expression in cancer cell lines (CCLE) or human tissues (GTEx), changes in expression in response to drug perturbations or single-gene knockdowns (LINCS), regulation by transcription factors based on ChIP-seq data (ENCODE), and phenotypes observed in mice when single genes are knocked out (KOMP). In total, we process and abstract data from over 100 resources. We then predict target function, target association with pathways, small molecules and drugs that modulate the activity and expression of the target, and target relevance to human disease. Overall, the KMC-ISMMS develops a resource that will accelerate target and drug discovery.
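The attribute-table idea can be illustrated with a minimal Python sketch. It is not the KMC-ISMMS pipeline: the gene names, attributes, and function labels are made up, and the similarity-weighted vote (Jaccard similarity between rows) is only one simple stand-in for the machine learning strategies mentioned above.

```python
from collections import defaultdict

# Toy binary attribute table: rows are genes, columns are attributes.
# Every gene name, attribute, and label below is hypothetical.
ATTRIBUTES = ["high_in_brain", "high_in_liver", "up_after_drug_X", "bound_by_TF_Y"]
TABLE = {
    "GENE_A": [1, 0, 1, 0],
    "GENE_B": [1, 0, 1, 1],
    "GENE_C": [0, 1, 0, 0],
    "GENE_D": [0, 1, 0, 1],
    "GENE_U": [1, 0, 1, 1],  # understudied gene with no known function
}
KNOWN_FUNCTION = {
    "GENE_A": "synaptic signaling",
    "GENE_B": "synaptic signaling",
    "GENE_C": "lipid metabolism",
    "GENE_D": "lipid metabolism",
}


def jaccard(row_a, row_b):
    """Jaccard similarity between two binary attribute rows."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


def impute_function(gene, table, known):
    """Rank candidate functions for `gene` by similarity-weighted votes
    from annotated genes; scores are normalized to sum to 1."""
    votes = defaultdict(float)
    for other, label in known.items():
        if other != gene:
            votes[label] += jaccard(table[gene], table[other])
    total = sum(votes.values())
    if total == 0:
        return []
    ranked = [(label, score / total) for label, score in votes.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for label, score in impute_function("GENE_U", TABLE, KNOWN_FUNCTION):
        print(f"{label}: {score:.2f}")
```

In this toy table the understudied gene shares most of its attributes with the two genes annotated as synaptic signaling, so that label ranks first. A real table would have thousands of genes and attributes, and the vote would usually be replaced by a trained classifier.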

Diverse datasets from different resources are organized into attribute tables, to which machine learning strategies are applied to impute knowledge about the gene function of the understudied IDG targets.

Screenshot from the ARCHS4 user interface: To develop the ARCHS4 resource, all available FASTQ files from RNA-seq experiments were retrieved from the Gene Expression Omnibus (GEO) and aligned using a cloud-based infrastructure. In total, 137,792 samples are accessible through ARCHS4: 72,363 mouse and 65,429 human. Through efficient use of cloud resources and dockerized deployment of the sequencing pipeline, the alignment cost per sample is reduced to less than one cent. The ARCHS4 web interface supports exploration of the processed data through querying tools, interactive visualization, and gene landing pages. For each gene, including all the IDG targets of interest, a landing page provides average expression across cell lines and tissues, top co-expressed genes, and predicted biological functions and protein-protein interactions based on prior knowledge combined with co-expression data.
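As a rough illustration of how "top co-expressed genes" can be derived from an expression matrix, the Python sketch below ranks genes by Pearson correlation with a query gene across samples. The expression values and gene names are invented, and the use of Pearson correlation is an assumption for the example; the text above does not specify how ARCHS4 computes co-expression.

```python
import math

# Toy expression matrix: gene -> expression values across five samples.
# All names and numbers are hypothetical.
EXPRESSION = {
    "GENE_U": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_A": [2.0, 4.1, 6.2, 7.9, 10.0],
    "GENE_B": [5.0, 4.0, 3.1, 2.0, 1.0],
    "GENE_C": [3.0, 1.0, 4.0, 1.0, 5.0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def top_coexpressed(gene, expression, k=2):
    """Return the k genes most positively correlated with `gene`."""
    scores = [
        (other, pearson(expression[gene], values))
        for other, values in expression.items()
        if other != gene
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    for name, r in top_coexpressed("GENE_U", EXPRESSION):
        print(f"{name}: r = {r:.3f}")
```

The annotations of the top co-expressed genes could then feed a vote or classifier to suggest functions for an understudied target, which is the general idea behind combining prior knowledge with co-expression data.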





KMC-ISMMS publications:

  1. Wang Z, He E, Sani K, Jagodnik KM, Silverstein M, Ma'ayan A. Drug Gene Budger (DGB): An application for ranking drugs to modulate a specific gene based on transcriptomic signatures. Bioinformatics 2018 Aug 31. doi: 10.1093/bioinformatics/bty763. PMID: 30169739

  2. Clarke DJB, Kuleshov MV, Schilder BM, Torre D, Duffy ME, Keenan AB, Lachmann A, Feldmann AS, Gundersen GW, Silverstein MC, Wang Z, Ma'ayan A. eXpression2Kinases (X2K) Web: linking expression signatures to upstream cell signaling networks. Nucleic Acids Research 2018 Jul 2;46(W1):W171-W179. PMID: 29800326

  3. Grimes M, Hall B, Foltz L, Levy T, Rikova K, Gaiser J, Cook W, Smirnova E, Wheeler T, Clark NR, Lachmann A, Zhang B, Hornbeck P, Ma'ayan A, Comb M. Integration of protein phosphorylation, acetylation, and methylation data sets to outline lung cancer signaling networks. Science Signaling 2018 May 22;11(531). pii: eaaq1087. PMID: 29789295

  4. Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, Silverstein MC, Ma'ayan A. Massive mining of publicly available RNA-seq data from human and mouse. Nature Communications 2018 Apr 10;9(1):1366. PMID: 29636450

  5. Torre D, Krawczuk P, Jagodnik KM, Lachmann A, Wang Z, Wang L, Kuleshov MV, Ma'ayan A. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses. Scientific Data 2018 Feb 27;5:180023. PMID: 29485625

  6. Wang Z, Lachmann A, Keenan AB, Ma'ayan A. L1000FWD: fireworks visualization of drug-induced transcriptomic signatures. Bioinformatics 2018 Jun 15;34(12):2150-2152. PMID: 29420694

  7. Wang Z, Li L, Glicksberg BS, Israel A, Dudley JT, Ma'ayan A. Predicting age by mining electronic medical records with deep learning characterizes differences between chronological and physiological age. Journal of Biomedical Informatics 2017 Nov 4. pii: S1532-0464(17)30240-X. PMID: 29113935

  8. Niepel M, Hafner M, Duan Q, Wang Z, Paull EO, Chung M, Lu X, Stuart JM, Golub TR, Subramanian A, Ma'ayan A, Sorger PK. Common and cell-type specific responses to anti-cancer drugs revealed by high throughput transcript profiling. Nature Communications 2017 Oct 30;8(1):1186. PMID: 29084964

  9. Fernandez NF, Gundersen GW, Rahman A, Grimes ML, Rikova K, Hornbeck P, Ma'ayan A. Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Scientific Data 2017 Oct 10;4:170151. PMID: 28994825

  10. Asada N, Kunisaki Y, Pierce H, Wang Z, Fernandez NF, Birbrair A, Ma'ayan A, Frenette PS. Differential cytokine contributions of perivascular haematopoietic stem cell niches. Nature Cell Biology 2017 Mar;19(3):214-223. PMID: 28218906

  11. Shameer K, Glicksberg BS, Hodos R, Johnson KW, Badgeley MA, Readhead B, Tomlinson MS, O'Connor T, Miotto R, Kidd BA, Chen R, Ma'ayan A, Dudley JT. Systematic analyses of drugs and disease indications in RepurposeDB reveal pharmacological, biological and epidemiological factors influencing drug repositioning. Briefings in Bioinformatics 2017 Feb 15. PMID: 28200013

  12. Gundersen GW, Jagodnik KM, Woodland H, Fernandez NF, Sani K, Dohlman AB, Ung PM, Monteiro CD, Schlessinger A, Ma'ayan A. GEN3VA: aggregation and analysis of gene expression signatures from related studies. BMC Bioinformatics 2016 Nov 15;17(1):461. PMID: 27846806

  13. Wang Z, Monteiro CD, Jagodnik KM, Fernandez NF, Gundersen GW, Rouillard AD, Jenkins SL, Feldmann AS, Hu KS, McDermott MG, Duan Q, Clark NR, Jones MR, Kou Y, Goff T, Woodland H, Amaral FM, Szeto GL, Fuchs O, Schüssler-Fiorenza Rose SM, Sharma S, Schwartz U, Bausela XB, Szymkiewicz M, Maroulis V, Salykin A, Barra CM, Kruth CD, Bongio NJ, Mathur V, Todoric RD, Rubin UE, Malatras A, Fulp CT, Galindo JA, Motiejunaite R, Jüschke C, Dishuck PC, Lahl K, Jafari M, Aibar S, Zaravinos A, Steenhuizen LH, Allison LR, Gamallo P, de Andres Segura F, Dae Devlin T, Pérez-García V, Ma'ayan A. Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nature Communications 2016 Sep 26;7:12846. PMID: 27667448

  14. Wang Z, Clark NR, Ma'ayan A. Drug-induced adverse events prediction with the LINCS L1000 data. Bioinformatics 2016 Aug 1;32(15):2338-45. PMID: 27153606

  15. Wang Z, Ma'ayan A. An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study. F1000Research 2016 Jul 5;5:1574. PMID: 27583132

  16. Rouillard AD, Gundersen GW, Fernandez NF, Wang Z, Monteiro CD, McDermott MG, Ma'ayan A. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database (Oxford) 2016 Jul 3;2016. PMID: 27374120

  17. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A, McDermott MG, Monteiro CD, Gundersen GW, Ma'ayan A. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research 2016 Jul 8;44(W1):W90-7. PMID: 27141961

  18. Duan Q, Reid SP, Clark NR, Wang Z, Fernandez NF, Rouillard AD, Readhead B, Tritsch SR, Hodos R, Hafner M, Niepel M, Sorger PK, Dudley JT, Bavari S, Panchal RG, Ma'ayan A. L1000CDS2: LINCS L1000 characteristic direction signatures search engine. NPJ Systems Biology and Applications 2016;2. pii: 16015. PMID: 28413689

  19. Khan JA, Mendelson A, Kunisaki Y, Birbrair A, Kou Y, Arnal-Estapé A, Pinho S, Ciero P, Nakahara F, Ma'ayan A, Bergman A, Merad M, Frenette PS. Fetal liver hematopoietic stem cell niches associate with portal vessels. Science 2016 Jan 8;351(6269):176-80. PMID: 26634440

  20. Li L, Cheng WY, Glicksberg BS, Gottesman O, Tamler R, Chen R, Bottinger EP, Dudley JT. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Science Translational Medicine 2015 Oct 28;7(311):311ra174. PMID: 26511511

  21. Rouillard AD, Wang Z, Ma'ayan A. Abstraction for data integration: Fusing mammalian molecular, cellular and phenotype big datasets for better knowledge extraction. Computational Biology and Chemistry 2015 Oct;58:104-19. PMID: 26101093

  22. Gundersen GW, Jones MR, Rouillard AD, Kou Y, Monteiro CD, Feldmann AS, Hu KS, Ma'ayan A. GEO2Enrichr: browser extension and server app to extract gene sets from GEO and analyze them for biological functions. Bioinformatics 2015 Sep 15;31(18):3060-2. PMID: 25971742

  23. Wang Z, Clark NR, Ma'ayan A. Dynamics of the discovery process of protein-protein interactions from low content studies. BMC Systems Biology 2015 Jun 6;9:26. PMID: 26048415

  24. Duan Q, Wang Z, Fernandez NF, Rouillard AD, Tan CM, Benes CH, Ma'ayan A. Drug/Cell-line Browser: interactive canvas visualization of cancer drug/cell-line viability assay datasets. Bioinformatics 2014 Nov 15;30(22):3289-90. PMID: 25100688

  25. Ma'ayan A, Duan Q. A blueprint of cell identity. Nature Biotechnology 2014 Oct;32(10):1007-8. PMID: 25299921

  26. Ma'ayan A, Rouillard AD, Clark NR, Wang Z, Duan Q, Kou Y. Lean Big Data integration in systems biology and systems pharmacology. Trends in Pharmacological Sciences 2014 Sep;35(9):450-60. PMID: 25109570

  27. Duan Q, Flynn C, Niepel M, Hafner M, Muhlich JL, Fernandez NF, Rouillard AD, Tan CM, Chen EY, Golub TR, Sorger PK, Subramanian A, Ma'ayan A. LINCS Canvas Browser: interactive web app to query, browse and interrogate LINCS L1000 gene expression signatures. Nucleic Acids Research 2014 Jul;42(Web Server issue):W449-60. PMID: 24906883