This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

DetoxNet: Toxicity Identification Using LLMs and Understanding its Drug-Origin for Model Fairness

2025 · 0 citations

Abstract

Toxicity detection has become indispensable for many drug therapies and targeted medicines, because many proteins can form stable intermediate structures inside human cells that are detrimental to the human body. This paper explores protein toxicity classification using large language models (LLMs) applied to protein datasets, focusing mainly on insights from drug-induced toxicity. We use a multimodal deep learning framework that integrates a pre-trained protein language model and graph neural networks for protein toxicity prediction, and we incorporate different scenarios in which toxicity is induced through drug usage. We examine extensively how LLMs can identify stable toxic sequence motifs and patterns, drawing parallels between toxin classification and enzyme functional motifs. In particular, we highlight sequence segments that align with known toxic domains, such as a cysteine-rich neurotoxin domain. We show how these motif-level insights can be leveraged to understand drug-induced toxicity, analogous to toxicophores in small molecules. Our model, DetoxNet, outperforms prior state-of-the-art methods (including its predecessor and ensemble language model approaches) on both original and independently induced test sets. We also analyze challenges in ensuring model fairness in toxicity prediction, such as class imbalance and domain-specific biases, address some of these issues (e.g., through balanced training sets and focal loss to handle a 1:30 toxic:non-toxic imbalance), and assess how the model performs on induced datasets such as drug-induced toxicity. We also highlight how generative protein models can be used for exploratory sequence design to minimize toxicity (e.g., in silico mutation scanning for safer therapeutics). Overall, we conclude that LLMs provide powerful tools for uncovering toxic motifs and understanding their origins. By translating these insights to drug safety (identifying structural "toxicophores" in proteins and compounds), researchers can improve model interpretability and fairness through structural analysis.
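The abstract names focal loss as one remedy for the 1:30 toxic:non-toxic imbalance but gives no formula. As a rough illustration only, the standard binary focal loss can be sketched as below. This is not the authors' implementation: the function name `binary_focal_loss` and the values of `alpha` and `gamma` are assumptions chosen for illustration, and the paper's actual settings are not stated on this page.

```python
import math


def binary_focal_loss(p_toxic, label, alpha=0.75, gamma=2.0, eps=1e-12):
    """Standard binary focal loss for a single example.

    p_toxic: predicted probability of the toxic class (0..1).
    label:   1 for toxic, 0 for non-toxic.
    alpha:   weight on the toxic (minority) class; illustrative value.
    gamma:   focusing parameter; illustrative value.
    """
    # Probability the model assigned to the true class.
    p_t = p_toxic if label == 1 else 1.0 - p_toxic
    # Class weight for the true class.
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct non-toxic example contributes far less loss
# than a badly misclassified toxic one.
easy_negative = binary_focal_loss(0.05, 0)
hard_positive = binary_focal_loss(0.05, 1)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, which makes the down-weighting effect of the focusing term easy to compare against a plain baseline.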

Institutions

Indian Institute of Technology Guwahati (IN)

Topics

Computational Drug Discovery Methods · Artificial Intelligence in Healthcare and Education · Big Data and Digital Economy
