Transcriptomic biomarkers of toxicological effect

Oleg Stroganov1, Bing Zhou1, Tyler Myers1, Nicole Kleinstreuer2, Warren Casey2, Kelly Shipkowski2,3, Scott Auerbach4
1: Rancho Biosciences, San Diego, CA 92025
2: National Institute of Environmental Health Sciences, Division of Translational Toxicology, Research Triangle Park, NC 27709
3: Contract Office Representative
4: Project Lead
DOI: https://doi.org/10.22427/NTP-DATA-500-010-002-000-8
Dataset: https://cebs.niehs.nih.gov/datasets/search/biomarker-data-collection


Publication


Phase 1 Abstract

This project centered on curating in vivo transcriptional biomarkers of toxicity for use in interpreting Division of Translational Toxicology (DTT) transcriptomic dose-response studies, and secondarily for assessing the construct validity of in vitro biological systems. The curation encompassed extensive data compilation for each gene, including aliases, associations with toxicity and disease at the transcriptional level, and summaries of related protein families and structures. Additionally, we collected information on proteins known to interact with gene products, linked each gene to relevant gene databases, and annotated it with Gene Ontology (GO) terms, MSigDB signatures, and pathway descriptions. The curation also extended to the cellular locations of gene products, mechanistic insights into gene roles in pathogenesis, upstream regulators such as transcription factors, and expression profiles in specific tissues and cell types under normal and disease conditions. We also explored the chemical inducers of the transcriptional response and documented biomarker associations with specific diseases in relevant organs. This effort entailed extensive curation, data crawling, machine learning, and the use of artificial intelligence, specifically an LLM-based agentic workflow. The resulting product comprises 125 reports covering transcriptional biomarkers across 11 tissues: liver, kidney, heart, skeletal muscle, lung, intestine, skin, thyroid, bone marrow, colon, and brain. This foundational work is pivotal to our initial documentation of how transcriptional biomarkers function within biological systems and respond to toxicological challenges, thereby facilitating more accurate assessments of potential hazards and therapeutic interventions. This curated resource is intended to serve as a living repository of information, updated as additional knowledge becomes available.

Phase 2 Abstract

The Division of Translational Toxicology (DTT) initiated a project to curate in vivo transcriptional biomarkers of toxicity to interpret transcriptomic dose-response studies and assess in vitro biological systems. Phase 1 delivered 125 reports covering transcriptional biomarkers across 11-12 tissues. Building on this, Phase 2 aimed to identify an additional 100 genes not annotated in Phase 1 that are plausibly associated with toxicity across 42 tissues, to enhance the existing curation pipeline using Large Language Models (LLMs), and to integrate the curated information into an evergreen wiki resource. A key objective was to develop an AI-driven, agentic pipeline using LLMs to rigorously assess the consensus confidence regarding the association of transcriptional biomarkers with target organ toxicity, validated against expert-curated information. The project employed a hybrid approach combining manual curation by PhD scientists with automated data extraction and AI-assisted summarization. LLM-based agentic workflows were used to generate summaries of gene expression changes, protein structure, and mechanistic insights. Specific sections, such as associations with toxicity (Section 2), protein family and structure (Section 3), and role in other tissues (Section 12), were primarily generated with LLM assistance. Mechanistic information (Section 9) was generated using a multi-agent system. Factual summary statements were assigned confidence scores by the GPT-4o model, ranging from 10 (absolutely confident) to -10 (false). Rigorous quality control, including manual review by expert curators and automated checks, ensured data accuracy, relevance, and consistency. This foundational work provides a comprehensive, empirically derived dataset of transcriptional biomarkers, pivotal for understanding how biomarkers function within biological systems and respond to toxicological challenges. It advances the field of toxicogenomics by offering a scalable and precise tool for biomarker discovery and characterization, thereby facilitating more accurate assessments of potential hazards and therapeutic interventions. The curated resource is intended to be a living repository of information, regularly updated as new knowledge becomes available.

Materials and Methods


Materials & Methods

Biomarker Summaries


Summaries per Tissue

Search biomarker summaries from the Transcriptomic biomarkers of toxicological effect dataset.

Adrenal Gland Related Biomarker Summaries
Bone Marrow Related Biomarker Summaries
Brain Related Biomarker Summaries
Colon Related Biomarker Summaries
Esophagus Related Biomarker Summaries
Heart Related Biomarker Summaries
Intestine Related Biomarker Summaries
Kidney Related Biomarker Summaries
Liver Related Biomarker Summaries
Lung Related Biomarker Summaries
Mammary Related Biomarker Summaries
Ovary Related Biomarker Summaries
Peripheral Nerve Related Biomarker Summaries
Skeletal Muscle Related Biomarker Summaries
Skin Related Biomarker Summaries
Testis Related Biomarker Summaries
Thyroid Related Biomarker Summaries

All Tissues Summary

Supplemental Data


Phase 1 Extracted Data

Phase 2 Extracted Data

Additional Files