This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Enhancing atherosclerosis risk prediction with strategic feature and case selections in large language model

2025 · 0 citations · European Heart Journal
9 authors

Abstract

Introduction

Large language models (LLMs) have the potential to deliver accurate risk stratification and disease prediction by integrating multimodal data, such as electronic health records, medical images, and genomic profiles. However, in complex tasks such as atherosclerosis risk estimation, prompt design is critical to LLM performance. Several studies have demonstrated that in-context learning, in which the prompt contains example cases, improves LLM performance when those cases are carefully selected. Because manual selection is usually very costly, automatic case selection is desirable.

Methods

This study aims to improve the effectiveness of in-context learning in LLMs for atherosclerosis risk prediction by introducing a strategy for case selection. Our method involves two key components: feature selection and case selection. In the feature selection phase, we compute the mutual information between the clinical features and atherosclerosis risk using the database to extract the essential clinical features. This step removes unimportant features that would otherwise degrade the case selection mechanism. Then, in the case selection phase, we select from the database the cases that are most similar to the patient being diagnosed. This study compares three metrics for computing the similarity score:

- Mahalanobis distance (trained via large margin nearest neighbor classification; LMNN [1])
- Cosine similarity (between the vector representations in KNN-augmented in-context example selection; KATE [2])
- Euclidean distance (between the prompt features)

This combined strategy supplies the LLM with informative cases and thereby enhances diagnostic performance.

Results

We used a fine-tuned Llama3-8B model for our target language and analyzed a dataset of 117,709 cases. Each case contained 96 features collected from annual health check-up data. From this dataset, 1,000 cases each were randomly selected for the validation and test datasets. The remaining data were used as the database in our system. We considered a binary classification task and assigned the positive label when the Cardio-Ankle Vascular Index (CAVI) was greater than 8.0. The proportion of selected features was adjusted in 10% increments to maximize the F1 score on the validation dataset. The results show that our proposed method, the strategic feature and case selections, significantly improved the F1 score on the test dataset by around 2% compared with the zero-shot prompt and random selection (see the attached table). Among the three similarity metrics, Euclidean distance yielded the highest F1 scores. We also observed that the feature selection phase with the adjusted number of selected features improved the F1 score in almost all settings.

Conclusion

Our strategy for feature and case selections improved the performance of in-context learning for LLM-based atherosclerosis risk prediction.
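
The two phases described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how mutual information is estimated, so the sketch assumes features that are already discretized, and it shows only the Euclidean variant of case selection. All function names, the toy data, and the 0.4 proportion are hypothetical; only the CAVI threshold of 8.0 comes from the abstract.

```python
import math
from collections import Counter


def label_from_cavi(cavi, threshold=8.0):
    """Positive label (1) when CAVI exceeds the threshold stated in the abstract."""
    return 1 if cavi > threshold else 0


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences.

    Assumption: values are already discretized (binned); the abstract
    does not specify the estimator actually used.
    """
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def select_features(rows, labels, proportion):
    """Indices of the top `proportion` of features, ranked by MI with the label."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    k = max(1, round(proportion * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def nearest_cases(rows, query, feature_idx, k):
    """Indices of the k database rows closest to `query` (Euclidean distance
    restricted to the selected features)."""
    def distance(row):
        return math.sqrt(sum((row[j] - query[j]) ** 2 for j in feature_idx))

    return sorted(range(len(rows)), key=lambda i: distance(rows[i]))[:k]


# Hypothetical toy database: 4 cases, 3 discretized features each.
database = [
    [0, 1, 5],
    [0, 0, 3],
    [1, 1, 5],
    [1, 0, 3],
]
cavi_values = [7.2, 7.9, 8.4, 9.1]
labels = [label_from_cavi(v) for v in cavi_values]

selected = select_features(database, labels, proportion=0.4)
neighbors = nearest_cases(database, query=[1, 0, 4], feature_idx=selected, k=2)
```

In this toy setup only the first feature carries information about the label, so it is the one retained, and the retrieved cases are those that match the query on that feature. In a fuller pipeline the retrieved cases would be placed in the prompt as in-context examples, and the proportion would be tuned on a validation set as the abstract describes.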

Similar works