This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Interpretable Fine-tuned Large Language Models Facilitate Making Genetic Test Decisions for Rare Diseases
Citations: 0
Authors: 13
Year: 2026
Abstract
Clinical decision making often relies on expert judgment guided by established guidelines, which can be hard to standardize and too abstract to implement directly. For example, selecting between gene panels and whole exome/genome sequencing (WES/WGS) for rare disease diagnosis frequently requires interpreting evidence-based recommendations from the American College of Medical Genetics and Genomics (ACMG) guidelines. Traditional machine learning (ML) models that predict suitable genetic tests often have limited interpretability. We hypothesize that large language models (LLMs) can be fine-tuned to “mimic” clinicians’ reasoning patterns by interpreting and applying clinical guidelines with chain-of-thought (CoT) reasoning. We present RareDAI, an integrative approach that addresses this challenge by analyzing heterogeneous clinical data, including unstructured notes and structured Phecodes. Using seven domain-specific questions, we guide the Llama 3.1 and Qwen 3 models to generate structured CoT outputs. These outputs are refined via our proposed self-distillation fine-tuning (SDFT) approach, which enables the model to produce interpretable reasoning before it makes a recommendation. RareDAI outperforms traditional supervised fine-tuning and base LLMs (e.g., Llama 3.1, GPT-4) by up to 10-20% on all metrics (accuracy, precision, recall, and F1-score) on both in-house and external data, effectively assisting clinicians in selecting between diagnostic modalities across healthcare systems.
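The four metrics named in the abstract can be illustrated with a minimal Python sketch for a two-way decision between a gene panel and WES/WGS. This is not the authors' evaluation code. The function name `binary_metrics`, the label strings, and the choice of "WES/WGS" as the positive class are assumptions made only for this example.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="WES/WGS"):
    """Accuracy, precision, recall and F1 for a two-class test recommendation.

    The label strings and the positive class are illustrative assumptions,
    not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count each (is_true_positive_class, is_predicted_positive_class) pair.
    pairs = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = pairs[(True, True)]
    fn = pairs[(True, False)]
    fp = pairs[(False, True)]
    tn = pairs[(False, False)]

    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    truth = ["WES/WGS", "WES/WGS", "panel", "panel"]
    preds = ["WES/WGS", "panel", "panel", "WES/WGS"]
    print(binary_metrics(truth, preds))
```

Computing all four values from one confusion count makes it easier to see where a model gains or loses: a higher recall with unchanged precision, for instance, means fewer patients who needed WES/WGS were routed to a panel.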
Similar works
Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology
2015 · 30,914 citations
A global reference for human genetic variation
2015 · 19,451 citations
The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data
2012 · 18,066 citations
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data
2010 · 15,253 citations
A method and server for predicting damaging missense mutations
2010 · 13,440 citations