This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Large Language Models for Zero-Shot Procedure Extraction in Orthopedic Surgery: A Comparative Evaluation

2025 · 1 citation · medRxiv · Open Access

Abstract

Background: Operative notes in electronic health records contain critical information for understanding surgical care, yet manual coding is time-consuming, costly, and inconsistent. Large language models (LLMs) promise to transform this process by automatically extracting detailed procedure information, a capability with significant implications for scaling clinical registries and advancing surgical research.

Methods: We conducted a large-scale evaluation of state-of-the-art LLMs for zero-shot structured information extraction from orthopedic clinical notes. Fourteen open-source and proprietary models were tested on 800 real operative notes, annotated by both an orthopedic surgeon and an administrator using a curated list of 74 procedure classes. We compared model outputs to human annotations, assessing accuracy and exploring the effects of model scale, reasoning capabilities, and prompt design.

Results: Across models, LLMs consistently outperformed administrator-assigned labels, achieving macro-F1 scores above 0.6 and improving over administrative coding by up to 10 points. Larger models and reasoning capabilities further boosted performance, though gains plateaued beyond 30 billion parameters. Performance varied by procedure frequency, revealing clear strengths and persistent challenges for rare or complex cases.

Conclusion: Modern LLMs can already outperform routine administrative coding in extracting detailed surgical procedure data, pointing to a future where registry curation could be faster, cheaper, and more consistent. Yet full alignment with surgical experts remains an open challenge, especially for rare procedures, emphasizing the need for domain adaptation and thoughtful deployment. Our findings illustrate how general-purpose LLMs can advance automated clinical data curation and inform the next generation of surgical informatics.

Topics

Artificial Intelligence in Healthcare and Education; Hip and Femur Fractures; Radiomics and Machine Learning in Medical Imaging
