OpenAlex · Updated hourly · Last updated: 28 Apr 2026, 21:16

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Abstract 4100: Large language models for tumor genomic interpretation

2026 · 0 citations · Cancer Research

0 citations · 13 authors · 2026

Abstract

Introduction: Algorithms trained on real-world data aid in tumor genomic prediction tasks, such as identifying cancer driver mutations and inferring cancer type. The extent to which generalist large language models (LLMs) trained on large natural-language corpora can replace or supplement such domain-specific algorithms with zero-shot inference is unknown.

Methods: We evaluated the zero-shot performance of proprietary (GPT-5, o3-mini, GPT-4o, and Claude 3.7 Sonnet), open-weight (DeepSeek and Qwen3), and domain-specialized medical (MedGemma) LLMs on three tasks: (i) distinguishing tumor-somatic mutations from clonal hematopoietic (CH) variants in patients with matched tumor-whole blood profiling (N=37,179 patients; 54,807 samples); (ii) classifying oncogenic variants using the OncoKB dataset as a positive control (N=10,489 patients; 10,752 samples; 13,470 variants); and (iii) predicting cancer type from tumor genomic profiles in the multi-institution AACR GENIE dataset (N=97,074 patients; 102,791 samples).

Results: Multiple LLMs approached the accuracy of MetaCH, a supervised model for distinguishing somatic tumor mutations from CH variants. o3-mini achieved the highest accuracy for distinguishing oncogenic driver from benign passenger mutations. Among patients with non-small cell lung cancer and mutations in KEAP1, those with variants of uncertain significance (VUSs) classified as oncogenic by GPT-5 had worse overall survival than those with VUSs classified as benign. GPT-5, o3-mini, and Claude 3.7 Sonnet approached the accuracy of a supervised model, GDD-ENS, at classifying 34 cancer types from tumor genomic profiles. Ensemble approaches combining predictions from GPT-5 and GDD-ENS improved cross-institutional generalizability and performance by an average of 20%. In their reasoning, LLMs discussed clinically relevant genomic features consistent with feature importances from GDD-ENS.
Conclusion: Without task-specific training, LLMs achieved performance comparable to specialized supervised models across all tasks.

Citation Format: Jennifer Yu, Madison Darmofal, Michele Waters, January Choy, Thinh N. Tran, Chenlian Fu, Leah Morales, Kaicheng U, Ross L. Levine, Nikolaus Schultz, Michael F. Berger, Quaid Morris, Justin Jee. Large language models for tumor genomic interpretation [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 4100.
