This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Physician and Large Language Model Chatbot Responses to Ear, Nose, and Throat Inquiries on an Online Forum: A Comparative Analysis (Preprint)
Citations: 0
Authors: 8
Year: 2024
Abstract
BACKGROUND: Large language models (LLMs) have the potential to improve the accessibility and quality of medical information for patients. Assessing the quality of LLM-generated responses in real-world clinical settings is crucial for determining their suitability and optimizing healthcare efficiency.
OBJECTIVE: This study aims to comprehensively evaluate the reliability of responses generated by an LLM-driven chatbot compared to those written by physicians, demonstrating that artificial intelligence (AI) can enhance the quality of otorhinolaryngological advice in complex, nuanced text-based workflows.
METHODS: Inquiries and verified physician responses related to otorhinolaryngology posted on a public social media forum between December 20 and 21, 2023, were extracted and anonymized. ChatGPT-4 was tasked with generating responses to each inquiry. A panel of seven board-certified otorhinolaryngologists evaluated both physician and ChatGPT-4 responses in a masked, randomized manner. The responses were assessed on six criteria: overall quality, empathy, alignment with medical consensus, accuracy or appropriateness of information, inquiry comprehension, and potential harm. Logistic regression analysis was employed to identify predictors of preference for ChatGPT-4 responses and their influence on overall quality.
RESULTS: A total of 60 question–response pairs were included in the analysis. ChatGPT-4 responses were significantly longer than physician responses (median: 162 vs 67 words; p<.0001). The expert panel preferred ChatGPT-4-generated responses in 70.7% of cases. ChatGPT-4 responses were rated higher across all six criteria. Multivariate analysis identified significant predictors of preference for ChatGPT-4 responses: alignment with medical consensus (odds ratio [OR]: 2.783), incorrect or inappropriate information (OR: 2.540), and empathy (OR: 1.362). For physician responses, alignment with medical consensus (OR: 1.477), empathy (OR: 1.089), inquiry comprehension (OR: 0.529), and word count (OR: 0.007) positively impacted overall quality. For chatbot responses, empathy (OR: 1.209), information appropriateness (OR: 0.903), and alignment with medical consensus (OR: 0.768) were significantly associated with high-quality ratings.
CONCLUSIONS: ChatGPT-4 outperformed physicians in generating high-quality responses. Therefore, integrating AI into clinical workflows may enhance the quality of physicians' responses by improving comprehension of complex inquiries and providing more detailed information, thereby enhancing perceived quality.
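The abstract reports odds ratios from a multivariable logistic regression that links panel ratings to preference for the chatbot's response. As a rough illustration of that kind of analysis, the sketch below fits a logistic model on synthetic data in Python with statsmodels; the column names, rating scales, and data are assumptions for illustration only, not the study's actual dataset or code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical ratings table: one row per question-response pair, with panel
# scores for criteria named in the abstract (column names are illustrative).
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "consensus": rng.integers(1, 6, n),      # alignment with medical consensus
    "empathy": rng.integers(1, 6, n),        # perceived empathy
    "inaccuracy": rng.integers(1, 6, n),     # incorrect/inappropriate information
    "comprehension": rng.integers(1, 6, n),  # inquiry comprehension
    "word_count": rng.integers(30, 250, n),  # response length in words
})
# Binary outcome: did the panel prefer the ChatGPT-4 response for this pair?
df["preferred_chatgpt"] = rng.integers(0, 2, n)

# Multivariable logistic regression; exponentiated coefficients are odds ratios.
predictors = ["consensus", "empathy", "inaccuracy", "comprehension", "word_count"]
X = sm.add_constant(df[predictors])
model = sm.Logit(df["preferred_chatgpt"], X).fit(disp=0)
print(pd.DataFrame({"OR": np.exp(model.params), "p": model.pvalues}).round(3))
```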
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,393 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,259 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,688 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,502 citations