Knowledge Management Center for Illuminating the Druggable Genome


Avi Ma’ayan, PhD
Principal Investigator
Professor, Department of Pharmacological Sciences
Director, Mount Sinai Center for Bioinformatics
Icahn School of Medicine at Mount Sinai


To better understand the function of the understudied protein targets that are the focus of the implementation phase of the Illuminating the Druggable Genome (IDG) project, we impute knowledge using machine learning strategies. To support these predictions, we organize data from many omics- and literature-based resources into attribute tables in which genes are the rows and their attributes are the columns. Examples of such attribute tables include gene or protein expression in cancer cell lines (CCLE) or human tissues (GTEx), changes in expression in response to drug perturbations or single-gene knockdowns (LINCS), regulation by transcription factors based on ChIP-seq data (ENCODE), and phenotypes observed in mice when single genes are knocked out (KOMP). In total, we process and abstract data from over 100 resources. From these tables we predict target function, target association with pathways, small molecules and drugs that modulate the activity and expression of the target, and target relevance to human disease. Overall, the KMC-ISMMS is developing a resource intended to accelerate target and drug discovery.
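The attribute-table idea described above can be illustrated with a minimal Python sketch. It is not the KMC-ISMMS pipeline: the gene symbols, attribute names, class labels, and the helper functions `build_attribute_table`, `jaccard`, and `impute_label` are all hypothetical, and a simple nearest-neighbor vote stands in for whatever machine learning strategy is actually used.

```python
from collections import defaultdict

# Hypothetical gene-attribute associations keyed by resource name.
# Gene symbols and attribute names are made up for illustration.
associations = {
    "CCLE": [("GENE_A", "high_in_HELA"), ("GENE_B", "high_in_HELA"),
             ("GENE_X", "high_in_HELA")],
    "GTEx": [("GENE_A", "high_in_liver"), ("GENE_X", "high_in_liver"),
             ("GENE_C", "high_in_brain")],
    "KOMP": [("GENE_B", "abnormal_heart"), ("GENE_C", "abnormal_behavior")],
}


def build_attribute_table(associations):
    """Return {gene: set of 'resource:attribute'}, i.e. a sparse binary
    table with genes as rows and attributes as columns."""
    table = defaultdict(set)
    for resource, pairs in associations.items():
        for gene, attribute in pairs:
            table[gene].add(f"{resource}:{attribute}")
    return dict(table)


def jaccard(a, b):
    """Jaccard similarity between two sets of attributes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def impute_label(table, known_labels, target, k=2):
    """Similarity-weighted vote among the k labeled genes whose
    attribute profiles are most similar to the target gene."""
    scored = sorted(
        ((jaccard(table[target], table[g]), g)
         for g in known_labels if g != target and g in table),
        reverse=True,
    )
    votes = defaultdict(float)
    for score, gene in scored[:k]:
        votes[known_labels[gene]] += score
    return max(votes, key=votes.get) if votes else None


table = build_attribute_table(associations)
known_labels = {"GENE_A": "kinase", "GENE_B": "ion_channel", "GENE_C": "GPCR"}
predicted = impute_label(table, known_labels, "GENE_X")
print(predicted)
```

Here the unlabeled `GENE_X` shares its full attribute profile with `GENE_A`, so it inherits that gene's label. In practice each resource would contribute thousands of columns, and a trained classifier would replace the vote.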

NIH grant number: U24CA224260-01

ARCHS4 user interface

Diverse datasets from different resources are organized into attribute tables so that machine learning strategies can be applied to impute knowledge about the function of the understudied IDG targets.

Screenshot from the ARCHS4 user interface: To develop the ARCHS4 resource, all available FASTQ files from RNA-seq experiments were retrieved from the Gene Expression Omnibus (GEO) and aligned using a cloud-based infrastructure. In total, 137,792 samples are accessible through ARCHS4: 72,363 mouse and 65,429 human samples. Through efficient use of cloud resources and Dockerized deployment of the sequencing pipeline, the alignment cost per sample is reduced to less than one cent. The ARCHS4 web interface provides intuitive exploration of the processed data through querying tools, interactive visualization, and gene landing pages. For each gene, including all the IDG targets of interest, the landing pages provide average expression across cell lines and tissues, top co-expressed genes, and predicted biological functions and protein-protein interactions based on prior knowledge combined with co-expression data.
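As a rough illustration of how co-expression can be combined with prior knowledge, the sketch below ranks co-expressed genes by Pearson correlation and scores a candidate function by the mean correlation with genes already annotated to it. This is an assumed, simplified scheme rather than the ARCHS4 implementation; the expression values, gene symbols, and the functions `pearson`, `top_coexpressed`, and `function_score` are hypothetical.

```python
from math import sqrt

# Hypothetical expression matrix: gene -> expression values measured
# across the same ordered set of samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.5],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_X": [1.5, 2.5, 3.5, 4.5],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def top_coexpressed(expression, gene, n=2):
    """Return the n genes most positively correlated with `gene`."""
    scores = [
        (pearson(expression[gene], values), other)
        for other, values in expression.items()
        if other != gene
    ]
    return [other for _, other in sorted(scores, reverse=True)[:n]]


def function_score(expression, gene, gene_set):
    """Mean correlation between `gene` and the genes already annotated
    to a function (the prior-knowledge gene set)."""
    members = [g for g in gene_set if g in expression and g != gene]
    if not members:
        return 0.0
    total = sum(pearson(expression[gene], expression[m]) for m in members)
    return total / len(members)


neighbors = top_coexpressed(expression, "GENE_X")
score_with_a = function_score(expression, "GENE_X", {"GENE_A"})
score_with_c = function_score(expression, "GENE_X", {"GENE_C"})
print(neighbors, score_with_a > score_with_c)
```

A function whose annotated genes track the query gene across samples scores high, while one whose genes are anti-correlated scores low.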





KMC-ISMMS publications:

  1. Kuleshov MV, Stein DJ, Clarke DJB, Kropiwnicki E, Jagodnik KM, Bartal A, Evangelista JE, Hom J, Cheng M, Bailey A, Zhou A, Ferguson LB, Lachmann A, Ma'ayan A. The COVID-19 Drug and Gene Set Library. Patterns (N Y) 2020 Sep 11;1(6):100090. PMID: 32838343
  2. Wojciechowicz ML, Ma'ayan A. GPR84: an immune response dial? Nature Reviews Drug Discovery 2020 Jun;19(6):374. PMID: 32494048
  3. Edwards JJ, Rouillard AD, Fernandez NF, Wang Z, Lachmann A, Shankaran SS, Bisgrove BW, Demarest B, Turan N, Srivastava D, Bernstein D, Deanfield J, Giardini A, Porter G, Kim R, Roberts AE, Newburger JW, Goldmuntz E, Brueckner M, Lifton RP, Seidman CE, Chung WK, Tristani-Firouzi M, Yost HJ, Ma'ayan A, Gelb BD. Systems Analysis Implicates WAVE2 Complex in the Pathogenesis of Developmental Left-Sided Obstructive Heart Defects. JACC Basic to Translational Science 2020 Apr 8;5(4):376-386. PMID: 32368696
  4. Bartal A, Lachmann A, Clarke DJB, Seiden AH, Jagodnik KM, Ma'ayan A. EnrichrBot: Twitter bot tracking tweets about human genes. Bioinformatics 2020 Jun 1;36(12):3932-3934. PMID: 32277816
  5. Heitman N, Sennett R, Mok KW, Saxena N, Srivastava D, Martino P, Grisanti L, Wang Z, Ma'ayan A, Rompolas P, Rendl M. Dermal sheath contraction powers stem cell niche relocation during hair cycle regression. Science 2020 Jan 10;367(6474):161-166. PMID: 31857493
  6. Duncan A, Heyer MP, Ishikawa M, Caligiuri SPB, Liu XA, Chen Z, Micioni Di Bonaventura MV, Elayouby KS, Ables JL, Howe WM, Bali P, Fillinger C, Williams M, O'Connor RM, Wang Z, Lu Q, Kamenecka TM, Ma'ayan A, O'Neill HC, Ibanez-Tallon I, Geurts AM, Kenny PJ. Habenular TCF7L2 links nicotine addiction to diabetes. Nature 2019 Oct;574(7778):372-377. PMID: 31619789
  7. Fernandez DM, Rahman AH, Fernandez NF, Chudnovskiy A, Amir ED, Amadori L, Khan NS, Wong CK, Shamailova R, Hill CA, Wang Z, Remark R, Li JR, Pina C, Faries C, Awad AJ, Moss N, Bjorkegren JLM, Kim-Schulze S, Gnjatic S, Ma'ayan A, Mocco J, Faries P, Merad M, Giannarelli C. Single-cell immune landscape of human atherosclerotic plaques. Nature Medicine 2019 Oct;25(10):1576-1588. PMID: 31591603
  8. Nakahara F, Borger DK, Wei Q, Pinho S, Maryanovich M, Zahalka AH, Suzuki M, Cruz CD, Wang Z, Xu C, Boulais PE, Ma'ayan A, Greally JM, Frenette PS. Engineering a haematopoietic stem cell niche by revitalizing mesenchymal stromal cells. Nature Cell Biology 2019 Apr 15. doi: 10.1038/s41556-019-0308-3. PMID: 30988422
  9. Wang Z, Lachmann A, Ma'ayan A. Mining data and metadata from the gene expression omnibus. Biophysical Reviews 2019 Feb;11(1):103-110. PMID: 30594974
  10. Ellis RJ, Wang Z, Genes N, Ma'ayan A. Predicting opioid dependence from electronic health records with machine learning. BioData Mining 2019 Jan 29;12:3. PMID: 30728857
  11. Mok KW, Saxena N, Heitman N, Grisanti L, Srivastava D, Muraro MJ, Jacob T, Sennett R, Wang Z, Su Y, Yang LM, Ma'ayan A, Ornitz DM, Kasper M, Rendl M. Dermal condensate niche fate specification occurs prior to formation and is placode progenitor dependent. Developmental Cell 2019 Jan 7;48(1):32-48.e5. PMID: 30595537
  12. Oprea TI, Jan L, Johnson GL, Roth BL, Ma'ayan A, Schürer S, Shoichet BK, Sklar LA, McManus MT. Far away from the lamppost. PLoS Biology 2018 Dec 11;16(12):e3000067. PMID: 30532236
  13. Torre D, Lachmann A, Ma'ayan A. BioJupies: Automated Generation of Interactive Notebooks for RNA-Seq Data Analysis in the Cloud. Cell Systems 2018 Nov 28;7(5):556-561.e3. PMID: 30447998
  14. Wang Z, He E, Sani K, Jagodnik KM, Silverstein M, Ma'ayan A. Drug Gene Budger (DGB): An application for ranking drugs to modulate a specific gene based on transcriptomic signatures. Bioinformatics 2018 Aug 31. doi: 10.1093/bioinformatics/bty763. PMID: 30169739
  15. Clarke DJB, Kuleshov MV, Schilder BM, Torre D, Duffy ME, Keenan AB, Lachmann A, Feldmann AS, Gundersen GW, Silverstein MC, Wang Z, Ma'ayan A. eXpression2Kinases (X2K) Web: linking expression signatures to upstream cell signaling networks. Nucleic Acids Research 2018 Jul 2;46(W1):W171-W179. PMID: 29800326
  16. Grimes M, Hall B, Foltz L, Levy T, Rikova K, Gaiser J, Cook W, Smirnova E, Wheeler T, Clark NR, Lachmann A, Zhang B, Hornbeck P, Ma'ayan A, Comb M. Integration of protein phosphorylation, acetylation, and methylation data sets to outline lung cancer signaling networks. Science Signaling 2018 May 22;11(531). pii: eaaq1087. PMID: 29789295
  17. Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, Silverstein MC, Ma'ayan A. Massive mining of publicly available RNA-seq data from human and mouse. Nature Communications 2018 Apr 10;9(1):1366. PMID: 29636450
  18. Torre D, Krawczuk P, Jagodnik KM, Lachmann A, Wang Z, Wang L, Kuleshov MV, Ma'ayan A. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses. Scientific Data 2018 Feb 27;5:180023. PMID: 29485625
  19. Wang Z, Lachmann A, Keenan AB, Ma'ayan A. L1000FWD: fireworks visualization of drug-induced transcriptomic signatures. Bioinformatics 2018 Jun 15;34(12):2150-2152. PMID: 29420694
  20. Wang Z, Li L, Glicksberg BS, Israel A, Dudley JT, Ma'ayan A. Predicting age by mining electronic medical records with deep learning characterizes differences between chronological and physiological age. Journal of Biomedical Informatics 2017 Nov 4. pii: S1532-0464(17)30240-X. PMID: 29113935
  21. Niepel M, Hafner M, Duan Q, Wang Z, Paull EO, Chung M, Lu X, Stuart JM, Golub TR, Subramanian A, Ma'ayan A, Sorger PK. Common and cell-type specific responses to anti-cancer drugs revealed by high throughput transcript profiling. Nature Communications 2017 Oct 30;8(1):1186. PMID: 29084964
  22. Fernandez NF, Gundersen GW, Rahman A, Grimes ML, Rikova K, Hornbeck P, Ma'ayan A. Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Scientific Data 2017 Oct 10;4:170151. PMID: 28994825
  23. Asada N, Kunisaki Y, Pierce H, Wang Z, Fernandez NF, Birbrair A, Ma'ayan A, Frenette PS. Differential cytokine contributions of perivascular haematopoietic stem cell niches. Nature Cell Biology 2017 Mar;19(3):214-223. PMID: 28218906
  24. Shameer K, Glicksberg BS, Hodos R, Johnson KW, Badgeley MA, Readhead B, Tomlinson MS, O'Connor T, Miotto R, Kidd BA, Chen R, Ma'ayan A, Dudley JT. Systematic analyses of drugs and disease indications in RepurposeDB reveal pharmacological, biological and epidemiological factors influencing drug repositioning. Briefings in Bioinformatics 2017 Feb 15. PMID: 28200013
  25. Gundersen GW, Jagodnik KM, Woodland H, Fernandez NF, Sani K, Dohlman AB, Ung PM, Monteiro CD, Schlessinger A, Ma'ayan A. GEN3VA: aggregation and analysis of gene expression signatures from related studies. BMC Bioinformatics 2016 Nov 15;17(1):461. PMID: 27846806
  26. Wang Z, Monteiro CD, Jagodnik KM, Fernandez NF, Gundersen GW, Rouillard AD, Jenkins SL, Feldmann AS, Hu KS, McDermott MG, Duan Q, Clark NR, Jones MR, Kou Y, Goff T, Woodland H, Amaral FM, Szeto GL, Fuchs O, Schüssler-Fiorenza Rose SM, Sharma S, Schwartz U, Bausela XB, Szymkiewicz M, Maroulis V, Salykin A, Barra CM, Kruth CD, Bongio NJ, Mathur V, Todoric RD, Rubin UE, Malatras A, Fulp CT, Galindo JA, Motiejunaite R, Jüschke C, Dishuck PC, Lahl K, Jafari M, Aibar S, Zaravinos A, Steenhuizen LH, Allison LR, Gamallo P, de Andres Segura F, Dae Devlin T, Pérez-García V, Ma'ayan A. Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd. Nature Communications 2016 Sep 26;7:12846. PMID: 27667448
  27. Wang Z, Clark NR, Ma'ayan A. Drug-induced adverse events prediction with the LINCS L1000 data. Bioinformatics 2016 Aug 1;32(15):2338-45. PMID: 27153606
  28. Wang Z, Ma'ayan A. An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study. F1000Research 2016 Jul 5;5:1574. PMID: 27583132
  29. Rouillard AD, Gundersen GW, Fernandez NF, Wang Z, Monteiro CD, McDermott MG, Ma'ayan A. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database (Oxford) 2016 Jul 3;2016. PMID: 27374120
  30. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A, McDermott MG, Monteiro CD, Gundersen GW, Ma'ayan A. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research 2016 Jul 8;44(W1):W90-7. PMID: 27141961
  31. Duan Q, Reid SP, Clark NR, Wang Z, Fernandez NF, Rouillard AD, Readhead B, Tritsch SR, Hodos R, Hafner M, Niepel M, Sorger PK, Dudley JT, Bavari S, Panchal RG, Ma'ayan A. L1000CDS2: LINCS L1000 characteristic direction signatures search engine. NPJ Systems Biology and Applications 2016;2. pii: 16015. PMID: 28413689
  32. Khan JA, Mendelson A, Kunisaki Y, Birbrair A, Kou Y, Arnal-Estapé A, Pinho S, Ciero P, Nakahara F, Ma'ayan A, Bergman A, Merad M, Frenette PS. Fetal liver hematopoietic stem cell niches associate with portal vessels. Science 2016 Jan 8;351(6269):176-80. PMID: 26634440
  33. Li L, Cheng WY, Glicksberg BS, Gottesman O, Tamler R, Chen R, Bottinger EP, Dudley JT. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Science Translational Medicine 2015 Oct 28;7(311):311ra174. PMID: 26511511
  34. Rouillard AD, Wang Z, Ma'ayan A. Abstraction for data integration: Fusing mammalian molecular, cellular and phenotype big datasets for better knowledge extraction. Computational Biology and Chemistry 2015 Oct;58:104-19. PMID: 26101093
  35. Gundersen GW, Jones MR, Rouillard AD, Kou Y, Monteiro CD, Feldmann AS, Hu KS, Ma'ayan A. GEO2Enrichr: browser extension and server app to extract gene sets from GEO and analyze them for biological functions. Bioinformatics 2015 Sep 15;31(18):3060-2. PMID: 25971742
  36. Wang Z, Clark NR, Ma'ayan A. Dynamics of the discovery process of protein-protein interactions from low content studies. BMC Systems Biology 2015 Jun 6;9:26. PMID: 26048415
  37. Duan Q, Wang Z, Fernandez NF, Rouillard AD, Tan CM, Benes CH, Ma'ayan A. Drug/Cell-line Browser: interactive canvas visualization of cancer drug/cell-line viability assay datasets. Bioinformatics 2014 Nov 15;30(22):3289-90. PMID: 25100688
  38. Ma'ayan A, Duan Q. A blueprint of cell identity. Nature Biotechnology 2014 Oct;32(10):1007-8. PMID: 25299921
  39. Ma'ayan A, Rouillard AD, Clark NR, Wang Z, Duan Q, Kou Y. Lean Big Data integration in systems biology and systems pharmacology. Trends in Pharmacological Sciences 2014 Sep;35(9):450-60. PMID: 25109570
  40. Duan Q, Flynn C, Niepel M, Hafner M, Muhlich JL, Fernandez NF, Rouillard AD, Tan CM, Chen EY, Golub TR, Sorger PK, Subramanian A, Ma'ayan A. LINCS Canvas Browser: interactive web app to query, browse and interrogate LINCS L1000 gene expression signatures. Nucleic Acids Research 2014 Jul;42(Web Server issue):W449-60. PMID: 24906883