
This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

SUN-809 Can AI Chatbots Provide Accurate Information About Thyroid Eye Disease?

2025 · 0 citations · Journal of the Endocrine Society · Open Access

10 authors

Abstract

Disclosure: H. Lee: None. A. Shams: None. G. Wu: None. S. Sidhu: None. A. Sidhu: None. E. Pan: None. A. Ashok: None. P. Badala: None. B. Hoang: None. M. Del Buono: None.

Background: Thyroid eye disease affects about 0.25% of people and is more common in women than in men (incidence of 16 vs. 2.9 per 100,000 per year). It develops in 25% to 40% of patients with Graves’ disease. Patients with thyroid eye disease also have an elevated risk of other ocular conditions, such as dry eye disease.

Purpose: To evaluate how accurately AI chatbots diagnose and provide information about thyroid eye disease.

Methods: Questions were asked in English to Claude, Cohere, Gemini, GPT 4o Mini, and GPT 4o. The chatbots’ textual responses were recorded and translated with help from native speakers. Each response received a manual score and an AI self-score on a scale from 1 (least accurate) to 5 (most accurate). Paired t-tests were used to compare the two scores, with the score difference defined as the manual score minus the AI self-score.

Results: Pearson’s correlation coefficient between the manual and AI scores across the LLMs was r = 0.505, indicating a moderate positive correlation. Both the manual and the AI scores across the various chatbots are therefore considered fairly accurate. The manual scores did not differ significantly among the LLMs (one-way ANOVA: F(4,15) = 2.28, p = 0.108). AI self-scores and manual scores were broadly aligned across chatbots, suggesting that the chatbots assessed the quality of their own responses reasonably accurately.

Conclusion: These LLMs need further training on more diverse datasets to handle queries about less common diseases.

Presentation: Sunday, July 13, 2025
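The statistics named in the Methods and Results can be illustrated with a short, standard-library-only Python sketch. It assumes the score difference is computed per response as manual score minus AI self-score (as the Methods state), uses made-up placeholder scores rather than the study's data, and re-derives the ANOVA p-value from the reported F(4,15) = 2.28 through the regularized incomplete beta function (the continued-fraction method described in Numerical Recipes). All function and variable names are hypothetical. The degrees of freedom imply k = 5 groups (the five chatbots) and N = 20 scored observations, i.e. four per chatbot if the design was balanced; that is an inference, not something the abstract states.

```python
import math
import statistics

# HYPOTHETICAL placeholder scores on the abstract's 1-5 scale (5 = most accurate).
# These are NOT the study's data; they only illustrate the calculations.
manual_scores = [5, 4, 4, 3, 5, 4]
ai_self_scores = [5, 5, 4, 4, 5, 4]


def paired_t(manual, ai):
    """Paired t statistic on d = manual - ai (as in Methods); returns (t, df)."""
    diffs = [m - a for m, a in zip(manual, ai)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences; must be non-zero
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300

    def guard(v):
        # Avoid division by zero inside the Lentz recurrence.
        return tiny if abs(v) < tiny else v

    c, d = 1.0, 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Evaluate the continued fraction where it converges quickly; else use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail p-value P(F > f) for F(d1, d2): I_x(d2/2, d1/2), x = d2/(d2 + d1*f)."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


if __name__ == "__main__":
    t, df = paired_t(manual_scores, ai_self_scores)
    print(f"paired t = {t:.3f} (df = {df})")
    print(f"Pearson r = {pearson_r(manual_scores, ai_self_scores):.3f}")
    # Reported ANOVA: between-groups df = k - 1 = 4 (k = 5 chatbots),
    # within-groups df = N - k = 15 (so N = 20 observations).
    print(f"p for F(4,15) = 2.28: {f_sf(2.28, 4, 15):.4f}")
```

Run as a script, this prints the paired t statistic and Pearson's r for the placeholder scores, plus a p-value of roughly 0.109 for F(4,15) = 2.28. The small gap from the reported p = 0.108 is consistent with F having been rounded to two decimals. In practice a statistics library (for example SciPy) would replace the hand-written beta function; it is written out here only to keep the sketch dependency-free.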
