This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
DetoxNet: Toxicity Identification Using LLMs and Understanding its Drug-Origin for Model Fairness
Citations: 0
Authors: 2
Year: 2025
Abstract
Toxicity detection is essential for many drug therapies and targeted medicines, because many proteins can induce stable intermediate structures inside human cells that are detrimental to the human body. This paper explores protein toxicity classification using large language models (LLMs) applied to protein datasets, focusing mainly on insights from drug-induced toxicity. We use a multimodal deep learning framework that integrates a pre-trained protein language model and graph neural networks for protein toxicity prediction. We also incorporate scenarios in which toxicity is induced through drug usage. We examine how LLMs can identify stable toxic sequence motifs and patterns, drawing parallels between toxin classification and enzyme functional motifs. In particular, we highlight sequence segments that align with known toxic domains, such as a cysteine-rich neurotoxin domain. We show how these motif-level insights can be leveraged to understand drug-induced toxicity, analogously to toxicophores in small molecules. Our model, DetoxNet, outperforms prior state-of-the-art methods (including its predecessor and ensemble language model approaches) on both original and independently induced test sets. We also analyze challenges in ensuring model fairness in toxicity prediction, such as class imbalance and domain-specific biases, address some of these issues (e.g., through balanced training sets and focal loss to handle a 1:30 toxic:non-toxic imbalance), and examine how the model performs on induced datasets such as drug-induced toxicity. We also highlight how generative protein models can be used for exploratory sequence design to minimize toxicity (e.g., in silico mutation scanning for safer therapeutics).
Overall, we conclude that LLMs provide powerful tools for uncovering toxic motifs and understanding their origins. By translating these insights to drug safety, that is, by identifying structural “toxicophores” in proteins and compounds, researchers can improve model interpretability and fairness through structural analysis.
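The abstract mentions focal loss as one way to handle the 1:30 toxic:non-toxic imbalance. As a rough illustration only, the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), might be sketched as below. The function names and the `alpha` and `gamma` values are assumptions chosen for illustration; the abstract does not state the hyperparameters or implementation DetoxNet actually uses.

```python
import math


def binary_focal_loss(p, y, alpha=0.5, gamma=2.0, eps=1e-12):
    """Focal loss for a single example.

    p     : predicted probability of the positive (toxic) class
    y     : true label, 1 = toxic, 0 = non-toxic
    alpha : weight on the positive class (illustrative value, not from the paper)
    gamma : focusing parameter (illustrative value, not from the paper)
    """
    # Clamp the probability so that log() never receives 0.
    p = min(max(p, eps), 1.0 - eps)
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, alpha=0.5, gamma=2.0):
    """Average focal loss over paired lists of probabilities and labels."""
    losses = [binary_focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With `gamma=0` the expression reduces to class-weighted cross-entropy. Larger `gamma` values shift the loss toward hard, misclassified examples, which is why the technique is commonly paired with heavily imbalanced data.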