This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

The Intention-Execution Disconnect in Medical AI: The ReXecution Framework for Evaluating Real-World Clinical Performance

2025 · 0 citations · Open Access

Abstract

We present the ReXecution framework for conducting clinician-centered assessments of medical AI assistants, providing detailed insights into their reliability in realistic clinical settings. Using this framework, we assessed AI assistants for chest X-ray (CXR) interpretation, exploring the gap between current model capabilities and real-world radiological needs. Unlike prior benchmarks that rely on automatically generated questions with limited clinical relevance, our dataset consists of 100 expert-curated tasks that radiologists might realistically present to an AI assistant in their day-to-day workflow. Through detailed manual review by a radiologist, we evaluated two leading foundation models, ChatGPT o3 and MedGemma, on these tasks. While both models demonstrated considerable medical knowledge and reasoning capabilities, they frequently struggled to interpret images and execute tasks accurately, producing correct outputs in only 5–10% of cases. Our manual evaluation highlights a critical mismatch: models often understand radiology concepts in the abstract but cannot reliably execute their plans when interpreting specific medical images. This work identifies key gaps in current models' ability to serve as comprehensive radiology assistants and offers insights into how model development and evaluation can better align with real-world clinician needs, enabling seamless clinician-AI collaboration.

Topics

Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
