This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluation of artificial intelligence (AI) chatbots for providing sexual health information: a consensus study using real-world clinical queries
Citations: 10
Authors: 16
Year: 2025
Abstract
INTRODUCTION: Artificial Intelligence (AI) chatbots could potentially provide information on sensitive topics, including sexual health, to the public. However, their performance compared to nurses and across different AI chatbots, particularly in the field of sexual health, remains understudied. This study evaluated the performance of three AI chatbots, two prompt-tuned (Alice and Azure) and one standard chatbot (ChatGPT by OpenAI), in providing sexual health information on questions that experienced sexual health nurses could correctly answer. METHODS: We analysed 195 anonymised sexual health questions received by the Melbourne Sexual Health Centre phone line. A panel of experts, using a consensus-based approach, evaluated responses to these questions from nurses and the three AI chatbots in a blinded order. Performance was assessed based on overall correctness and five specific measures: guidance, accuracy, safety, ease of access, and provision of necessary information. We conducted subgroup analyses for clinic-specific (e.g., opening hours) and general sexual health questions, and a sensitivity analysis excluding questions that Azure could not answer. RESULTS: Alice demonstrated the highest overall correctness (85.2%; 95% confidence interval (CI), 82.1-88.0%), followed by Azure (69.3%; 95% CI, 65.3-73.0%) and ChatGPT (64.8%; 95% CI, 60.7-68.7%). Prompt-tuned chatbots outperformed the base ChatGPT across all measures. Among all outcome measures, all chatbots performed best on safety, with Azure achieving the highest safety score (97.9%; 95% CI, 96.4-98.9%), indicating the lowest risk of providing potentially harmful advice. In subgroup analysis, all chatbots performed better on general sexual health questions than on clinic-specific queries. Sensitivity analysis showed a narrower performance gap between Alice and Azure when excluding questions Azure could not answer.
CONCLUSIONS: Prompt-tuned AI chatbots demonstrated superior performance in providing sexual health information compared to base ChatGPT, with high safety scores particularly noteworthy. However, all AI chatbots showed susceptibility to generating incorrect information. These findings suggest the potential for AI chatbots as adjuncts to human healthcare providers for providing sexual health information while highlighting the need for continued refinement and human oversight. Future research should focus on larger-scale evaluations and real-world implementations.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,561 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,452 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,948 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,797 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations