This is an overview page with metadata for this scientific work. The full article is available from the publisher.
MedBot vs RealDoc: efficacy of large language modeling in physician-patient communication for rare diseases
Citations: 5
Authors: 10
Year: 2025
Abstract
OBJECTIVES: This study assesses the abilities of 2 large language models (LLMs), GPT-4 and BioMistral 7B, in responding to patient queries, particularly concerning rare diseases, and compares their performance with that of physicians.

MATERIALS AND METHODS: A total of 103 patient queries and corresponding physician answers were extracted from EXABO, a question-answering forum dedicated to rare respiratory diseases. The responses provided by physicians and generated by LLMs were ranked on a Likert scale by a panel of 4 experts based on 4 key quality criteria for health communication: correctness, comprehensibility, relevance, and empathy.

RESULTS: The performance of generative pretrained transformer 4 (GPT-4) was significantly better than the performance of the physicians and BioMistral 7B. While the overall ranking considers GPT-4's responses to be mostly correct, comprehensible, relevant, and empathetic, the responses provided by BioMistral 7B were only partially correct and empathetic. The responses given by physicians rank in between. Although the experts concur that an LLM could lighten the load for physicians, rigorous validation is considered essential to guarantee dependability and efficacy.

DISCUSSION: Open-source models such as BioMistral 7B offer the advantage of privacy by running locally in healthcare settings. GPT-4, on the other hand, demonstrates proficiency in communication and knowledge depth. However, challenges persist, including the management of response variability, the balancing of comprehensibility with medical accuracy, and the assurance of consistent performance across different languages.

CONCLUSION: The performance of GPT-4 underscores the potential of LLMs in facilitating physician-patient communication. However, it is imperative that these systems are handled with care, as erroneous responses have the potential to cause harm without the requisite validation procedures.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,545 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,436 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,935 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,589 citations