IDG Header

Getting Started with the IDG KMC Datasets and Tools


The Illuminating the Druggable Genome (IDG) consortium is a National Institutes of Health (NIH) Common Fund program designed to enhance our knowledge of under-studied proteins, more specifically, proteins unannotated within the three most commonly drug-targeted protein families: G-protein coupled receptors, ion channels, and protein kinases. Since 2014, the IDG Knowledge Management Center (IDG-KMC) has generated several open-access datasets and resources that jointly serve as a highly translational machine-learning-ready knowledgebase focused on human protein-coding genes and their products. The goal of the IDG-KMC is to develop comprehensive integrated knowledge for the druggable genome to illuminate the uncharacterized or poorly annotated portion of the druggable genome. The tools derived from the IDG-KMC provide either user-friendly visualizations or ways to impute the knowledge about potential targets using machine learning strategies. In the following protocols, we describe how to use each web-based tool to accelerate illumination in under-studied proteins. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.


Nucleic Acids Research Database issue features IDG digital resources: TCRD and Pharos

TCRD and Pharos 2021: mining the human proteome for disease biology.
Two resources produced from these efforts are the Target Central Resource Database (TCRD) and Pharos, a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins by aggregating a multitude of data sources, ranking targets based on the amount of data available, and presenting data in a machine-learning-ready format. Since the 2017 release, both TCRD and Pharos have produced two major releases, which have incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein-protein interactions, protein-disease and protein-phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos, in order to empower users to find new areas of study in the druggable genome.


IDG Digital Tool Fest - Spring 2022

IDG is hosting the first IDG DIGITAL TOOL FEST on Tuesday, November 30th with an exciting line-up of 10-minute demonstrations. The presented tools, developed within the IDG consortium, cover mechanisms for exploring drugs and their gene targets within the context of information extracted from text-mining, expression data, and signaling pathways. The tools also let users construct their own specialized queries and access information programmatically.



Nucleic Acids Research Database issue features IDG digital resource: DrugCentral

New additions include a set of pharmacokinetic properties for ∼1000 drugs and a sex-based separation of side effects, processed from FAERS (FDA Adverse Event Reporting System), as well as a drug repositioning prioritization scheme based on the market availability and intellectual property rights for FDA-approved drugs. In the context of the COVID-19 pandemic, we also incorporated REDIAL-2020, a machine learning platform that estimates anti-SARS-CoV-2 activities, as well as the 'drugs in news' feature, which offers a brief enumeration of the most interesting drugs at the present moment. The full database dump and data files are available for download from the DrugCentral web portal.


Nucleic Acids Research Database issue features IDG digital resource: The Dark Kinase Knowledgebase

Here, we describe a data resource, the Dark Kinase Knowledgebase (DKK), that is specifically focused on providing data and reagents for these understudied kinases to the broader research community. Supported through NIH’s Illuminating the Druggable Genome (IDG) Program, the DKK is focused on data and knowledge generation for 162 poorly studied or ‘dark’ kinases. Types of data provided through the DKK include parallel reaction monitoring (PRM) peptides for quantitative proteomics, protein interactions, NanoBRET reagents, and kinase-specific compounds.


IDG Consortium

Sponsored by NIH’s Common Fund, the Illuminating the Druggable Genome (IDG) Consortium aims to highlight current knowledge of protein targets through the integration of informatics tools, and to further study the function of specific understudied targets in three main druggable protein families: G-protein coupled receptors, ion channels, and protein kinases. The consortium consists of a network of Data and Resource Generation Centers (DRGCs), each focusing its research on one of the three protein families; the Knowledge Management Center (KMC), which organizes data and integrated informatics tools across various resources to illuminate understudied protein targets; and the Resource Dissemination and Outreach Center (RDOC), which facilitates annotation and distribution of the resources brought forth to shed light on these targets. In the spring of 2019, three additional groups joined IDG, focusing on making Cutting Edge Informatics Tools for Illuminating the Druggable Genome (CEIT).


Pharos is the user interface to the Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) program funded by the National Institutes of Health (NIH) Common Fund (Grant No. 1U24CA224370-01). The goal of the KMC is to develop a comprehensive, integrated knowledge base for the Druggable Genome (DG) to illuminate the uncharacterized and/or poorly annotated portion of the DG, focusing on four of the most commonly drug-targeted protein families: G-protein-coupled receptors (GPCRs), nuclear receptors (NRs), ion channels (ICs), and kinases.



Thanks to technological advances in genomics, transcriptomics, proteomics, metabolomics, and related fields, projects that generate a large number of measurements of the properties of cells, tissues, model organisms, and patients are becoming commonplace in biomedical research. In addition, curation projects are making great progress mining biomedical literature to extract and aggregate decades' worth of research findings into online databases. Such projects are generating a wealth of information that can potentially guide research toward novel biomedical discoveries and advances in healthcare. To facilitate access to and learning from biomedical Big Data, we created the Harmonizome: a collection of information about genes and proteins from 114 datasets provided by 66 online resources.



On June 15th, the Target2035 webinar series will be highlighting "Drugging the dead – selective targeting of pseudokinases". It is a free webinar with four speakers: Patrick Eyers (University of Liverpool, IDG), Ben Major (Washington University in St. Louis, IDG), James Murphy (Walter and Eliza Hall Institute, Melbourne), Michael Lazarus (Mount Sinai School of Medicine). Please register at



Exciting news from the CEIT awardee group at Reactome. Today, June 7th, they announced the new Reactome IDG Portal, where users can leverage Reactome's extensive pathway knowledge together with IDG’s Target Central Resource Database for numerous explorations of biochemical reactions, protein expression and biological pathways. Check it out!

Recent IDG News/Events



Knowledge Management Centers (NIH RFA-RM-16-024)

The main focus of the KMC is to develop a platform that aggregates and integrates data from various sources. Combined with machine learning algorithms and expert curation, the resulting knowledge could inspire scientists to seek and explore new associations within the human proteome.


T.I. Oprea (UNM)

The KMC grant at UNM is led by Dr. Tudor Oprea and includes collaborations with Dr. Lars Juhl Jensen’s lab at NNFCPR and with the group led by Dr. Andrew Leach at EMBL-EBI. Work from UNM includes the development of TCRD, the main database supporting Pharos, and appropriate visualization tools, such as TINx.



As a partner with the KMC, NCATS is developing and implementing PHAROS as the User Interface Portal to access all the integrated data, metadata and annotation collected into TCRD and via tools from ISMMS.


A. Leach (EMBL)

EMBL-EBI has developed tools for automated extraction and expert curation of medicinal chemistry data. The group led by A. Leach has extracted pertinent target-chemical pairs from the patent literature and late-stage drug development, as well as the clinical candidate literature.



L.J. Jensen (NNFCPR)

Work by Dr. Lars Juhl Jensen has resulted in improved text-mining technology that is applied to scientific literature for scoring how well studied each target protein may be and to support target prioritization. This text-mining platform also helps provide tissue and disease associations for the targets.
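The idea of scoring how well studied a target is can be illustrated with a minimal sketch. It assumes a simple fractional counting scheme in which every document carries a total weight of 1, split evenly among the distinct targets it mentions. This scheme, the function names, and the toy target names are illustrative assumptions; they are not a description of the actual text-mining platform.

```python
from collections import defaultdict


def fractional_mention_scores(documents):
    """Score targets by fractional counting of literature mentions.

    `documents` is an iterable of collections of target names, one
    collection per document. Each document contributes a total weight
    of 1, split evenly among the distinct targets it mentions.
    (Assumed scheme, for illustration only.)
    """
    scores = defaultdict(float)
    for mentioned in documents:
        targets = set(mentioned)  # count each target once per document
        if not targets:
            continue  # a document with no recognized targets adds nothing
        share = 1.0 / len(targets)
        for target in targets:
            scores[target] += share
    return dict(scores)


def rank_least_studied(scores, all_targets):
    """Order targets from least to most studied; ties break by name."""
    return sorted(all_targets, key=lambda t: (scores.get(t, 0.0), t))


# Hypothetical toy corpus; the target names are placeholders.
docs = [
    {"TARGET_A"},
    {"TARGET_A", "TARGET_B"},
    {"TARGET_A", "TARGET_B", "TARGET_C"},
    set(),
]
scores = fractional_mention_scores(docs)
ranking = rank_least_studied(
    scores, ["TARGET_A", "TARGET_B", "TARGET_C", "TARGET_D"]
)
print(ranking)
```

Under this assumed scheme, a target that never appears in the corpus scores zero and sorts first, which is one way a low score could flag a candidate for prioritization.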


A. Ma'ayan (ISMMS)

The KMC at ISMMS is led by Dr. Avi Ma’ayan. Through systematic data integration and application of machine learning methods, the Ma'ayan Lab is filling knowledge gaps about the understudied IDG targets of interest. The Harmonizome, part of which is incorporated into Pharos, provides access to all integrated data and methods.

DRGC – Data and Resource Generation Centers (NIH RFA-RM-16-026)

The objective of the DRGCs is to further illuminate understudied targets in the three main druggable protein families: G-protein coupled receptors, ion channels, and protein kinases. Their experimental workflows incorporate multiple robust assays of the cellular basis of target action and extend to animal models. The data and resources generated are disseminated via the RDOC.


B.L. Roth (UNC)
B. Shoichet (UCSF)


Collaborative work by Dr. Bryan Roth and Dr. Brian Shoichet focuses on illuminating the druggable GPCR-ome through a two-pronged approach: empirical screening of drugs, followed by computational screening against modeled structures of the GPCR to produce optimized lead compounds. Their work has led to the discovery of the molecule “ogerin”, which binds to the previously orphaned GPR68.



M.T. McManus (UCSF)

Ion Channels

Dr. Michael T. McManus, together with Dr. Lily Jan, leads a group of researchers from UCSF and UCD focused on illuminating ion channels by using CRISPR technology to map expression profiles, assess channel activities, develop antibodies, and generate new mouse lines.


G.L. Johnson (UNC)


Dr. Gary L. Johnson has established a network of collaborators, Dr. Shawn Gomez (UNC), Dr. Ben Major (UNC), Dr. Tim Willson (SGC), Dr. Reid Townsend (WU), and Dr. Peter Sorger (HMS), to illuminate the function of the understudied druggable kinome. Their work uses Multiplex Inhibitor Beads (MIB) / Mass Spectrometry (MS) technologies to identify kinase activation status in response to perturbants, and applies the results to model cell lines and patient-derived xenografts.


RDOC - Resource Dissemination and Outreach Center (NIH RFA-RM-16-025)

The RDOC serves as a hub for exposing the outcomes of the IDG, whether by disseminating the resources and data generated by the DRGCs or by training the public on the utility of the knowledge accumulated by the KMC. The mission of the RDOC is to highlight and expose the work of the IDG in a manner that is enduring and fruitful for further scientific discovery.


S.C. Schürer (UofMiami)

The RDOC grant at U of Miami is led by Dr. Stephan Schürer. Work from his group will focus on disseminating the resources generated by the DRGCs, by stewarding the implementation of metadata standards for IDG and developing a Resource Management System (RMS) capable of handling a variety of data and resources.



T.I. Oprea & L.A. Sklar (UNM)

The RDOC collaboration extends to UNM. Dr. Larry Sklar heads the Management and Administration Core (MAC), which supports and coordinates activities across the entire IDG Consortium. Dr. Tudor Oprea leads IDG outreach to other consortia or groups and develops training programs for PHAROS and informatics tools.


KMC Events