This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Large Language Models for Cancer Communication: Evaluating Linguistic Quality, Safety, and Accessibility in Generative AI (Preprint)
Citations: 0
Authors: 5
Year: 2025
Abstract
BACKGROUND: Effective communication about breast and cervical cancers remains a public health challenge, marked by widespread misinformation and barriers to understanding cancer-related language. Large language models (LLMs) offer potential for scalable health communication, yet the trade-offs between the quality, safety, and accessibility of general-purpose and medical-domain LLMs remain underexplored.

OBJECTIVE: We propose a comprehensive evaluation framework and systematically assess the performance of LLMs in generating breast and cervical cancer information, with a focus on linguistic quality, safety and trustworthiness, and communication accessibility and affectiveness.

METHODS: This mixed-methods evaluation study assessed outputs from five general-purpose and three medical LLMs using real-world breast and cervical cancer-related questions curated from publicly available medical datasets. LLM-generated responses were evaluated in a controlled offline setting. Primary outcomes included linguistic quality (fluency, coherence, accuracy), safety and trustworthiness (toxicity, bias, harm potential), and communication accessibility and affectiveness (readability, empathy, clarity). Qualitative ratings were performed by domain experts, while quantitative metrics were compared across models. Statistical analyses included Welch's ANOVA to detect differences in metric scores, Games-Howell tests for pairwise comparisons, and Hedges' g to assess effect sizes.

RESULTS: General-purpose LLMs, particularly Llama 3 and Gemma, demonstrated superior linguistic quality and affectiveness but often produced complex outputs that may limit accessibility. In contrast, medical LLMs (e.g., MedAlpaca, BioMistral) generated simpler content suitable for broader audiences but scored lower in safety and empathy due to higher levels of hallucination, bias, and toxicity.

CONCLUSIONS: While LLMs show promise for improving digital cancer communication, our findings reveal a trade-off between domain specialization and overall communication quality and safety. Future development of health-focused LLMs should prioritize hybrid modeling strategies to enhance trust, clarity, and clinical relevance in patient-facing tools.

CLINICALTRIAL: Not applicable.
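For readers who want to reproduce this style of analysis, the sketch below shows how the reported statistical pipeline (Welch's ANOVA, Games-Howell post-hoc tests, and Hedges' g) could be run with the pingouin Python package. It is a minimal illustration only: the data layout, column names, file path, and the choice of the fluency metric are assumptions, not details taken from the paper.

```python
# Minimal sketch of a Welch's ANOVA + Games-Howell + Hedges' g workflow,
# assuming per-response metric scores in a long-format table with one row
# per LLM response. The CSV path and column names ("model", "fluency")
# are hypothetical placeholders.
import pandas as pd
import pingouin as pg

scores = pd.read_csv("llm_metric_scores.csv")  # hypothetical input file

# Welch's ANOVA: do mean fluency scores differ across the evaluated models?
anova = pg.welch_anova(data=scores, dv="fluency", between="model")

# Games-Howell post-hoc test for pairwise comparisons between models
pairwise = pg.pairwise_gameshowell(data=scores, dv="fluency", between="model")

# Hedges' g effect size for one illustrative pair of models
llama = scores.loc[scores["model"] == "Llama 3", "fluency"]
medalpaca = scores.loc[scores["model"] == "MedAlpaca", "fluency"]
g = pg.compute_effsize(llama, medalpaca, eftype="hedges")

print(anova)
print(pairwise)
print(f"Hedges' g (Llama 3 vs. MedAlpaca, fluency): {g:.2f}")
```

The same pattern would be repeated for each outcome metric (e.g., coherence, toxicity, readability), with the dependent-variable column swapped accordingly.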