OpenAlex · Updated hourly · Last updated: 9 April 2026, 11:27

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Automating Registry Data Collection of Total Ankle Arthroplasty Complications Using Artificial Intelligence: A Pilot Study

2025 · 0 citations · Foot & Ankle Orthopaedics · Open Access

Citations: 0 · Authors: 8 · Year: 2025

Abstract

Research Type: Level 3 - Retrospective cohort study, Case-control study, Meta-analysis of Level 3 studies
Introduction/Purpose: Identifying and tracking surgical complications is a critical component of maintaining quality registries and improving clinical care. Medical record review to record complications is often performed by a dedicated clinical team and is time-consuming and expensive. In recent years, Large Language Models (LLMs) have emerged as a promising Artificial Intelligence (AI) tool for retrieving clinical information from patient records more efficiently and accurately. However, early literature has shown that, without careful prompt design, LLMs are vulnerable to error. The primary purpose of this study is to determine whether an LLM platform, compared with traditional clinical chart reviewers, can be used to reliably and automatically screen for complications directly from the medical notes of patients who underwent total ankle arthroplasty (TAA).
Methods: Following IRB approval, patient records were retrospectively identified from an institutional TAA registry, with surgeries performed from 2015 to 2024. Patients were manually evaluated by the research team for intraoperative fracture (IOF), deep vein thrombosis (DVT), superficial wound infection (SWI), and deep wound infection (DWI). Patient records were then scrubbed of HIPAA identifiers, age, and gender, and input into an LLM for analysis. An automated script was developed that assessed each patient visit note for complications and recorded the result in a table as “yes” or “no” according to whether a complication occurred. Disagreements between the reviewer and LLM determinations were secondarily reviewed by a blinded investigator to produce a final, gold-standard data set. The sensitivity and specificity of reviewer and LLM chart review were compared. Statistical difference between the groups was determined using a McNemar test, and similarity of decisions was evaluated using an Intraclass Correlation Coefficient (ICC).
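
The comparison described in the Methods can be illustrated with a minimal Python sketch. It is not the study's script: the function names, the toy labels, and the choice of the exact (binomial) two-sided form of the McNemar test are assumptions made here for illustration, since the abstract does not state which variant was used. The ICC calculation is omitted.

```python
from math import comb


def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) for paired yes/no labels.

    A value is NaN when it is undefined, e.g. sensitivity when the
    gold standard contains no positive cases.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(p and t for p, t in pairs)
    fn = sum((not p) and t for p, t in pairs)
    tn = sum((not p) and (not t) for p, t in pairs)
    fp = sum(p and (not t) for p, t in pairs)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def mcnemar_exact(rater_a, rater_b):
    """Two-sided exact McNemar p-value (assumed variant).

    Only discordant pairs matter: cases where exactly one rater said yes.
    """
    b = sum(x and not y for x, y in zip(rater_a, rater_b))
    c = sum(y and not x for x, y in zip(rater_a, rater_b))
    n = b + c
    if n == 0:
        return 1.0  # the raters never disagree
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up labels for six procedures (True = complication recorded).
# These are NOT data from the study.
gold = [True, True, False, False, False, False]
reviewer = [True, False, False, False, False, False]
llm = [True, True, True, False, False, False]

reviewer_metrics = sensitivity_specificity(reviewer, gold)
llm_metrics = sensitivity_specificity(llm, gold)
p_value = mcnemar_exact(reviewer, llm)
```

In this toy data the LLM labels catch both true complications but add one false positive, which is the same direction of trade-off the abstract reports. The NaN guard matters for a complication with no gold-standard positives, where sensitivity cannot be computed.
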
Results: A total of 1952 notes were reviewed for 310 TAA procedures. The final rates of IOF, DVT, SWI, and DWI were 5.2%, 0.0%, 4.2%, and 0.6%, respectively. Chart reviewers had high agreement with the LLM in evaluating DVT (100% match, ICC 1.00) and DWI (99.7% match, ICC 0.88), but significant disagreement in the rates of IOF (97.1% match, ICC 0.77, p = 0.008) and SWI (91.9% match, ICC 0.49, p = 0.05). After secondary review by a blinded author, the LLM was found to have a higher sensitivity than reviewers for SWI (0.85 vs. 0.69, respectively) and IOF (1.00 vs. 0.50, respectively). However, reviewers had a higher specificity than the LLM for SWI (0.98 vs. 0.95, respectively).
Conclusion: Our results support that current LLMs can be applied to screen free-text medical records for complications at a sensitivity and specificity comparable to clinical chart reviewers. The LLM is more prone to both true and false positives, as indicated by its higher sensitivity and lower specificity. Importantly, this assessment of complications using the LLM was completely automated and did not require human intervention to run, allowing substantially higher efficiency. LLMs show promise to dramatically scale the size and reliability of clinical outcome and quality registries in coming years. Further refinement of the LLM script may improve accuracy.
Comparing identified complications following total ankle arthroplasty between manual chart review and a large language model: Rate of intraoperative fracture (IOF), deep vein thrombosis (DVT), superficial wound infection (SWI), and deep wound infection (DWI) identified by manual chart reviewers and a large language model (LLM). Sensitivity and specificity for each are provided relative to a verified gold standard, a fellowship-trained, blinded investigator.

Topics

Artificial Intelligence in Healthcare and Education · Cutaneous Melanoma Detection and Management · Electronic Health Records Systems