OpenAlex · Updated hourly · Last updated: 11 May 2026, 13:41

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

Development of a Synthetic Oncology Pathology Dataset for Large Language Model Evaluation in Medical Text Classification

2025 · 0 citations · Studies in health technology and informatics · Open Access

Citations: 0 · Authors: 11 · Year: 2025

Abstract

BACKGROUND: Large Language Models (LLMs) are promising tools for classifying oncology pathology reports, with the potential to improve efficiency, accuracy, and automation. However, legal and ethical concerns restrict the use of real patient data, which makes privacy-compliant alternatives necessary. OBJECTIVES: This study aimed to develop a synthetic oncology pathology dataset that serves as a benchmark for LLM evaluation and enables reproducible, privacy-preserving AI research. METHODS: A total of 227 synthetic pathology reports were generated with Microsoft Copilot, ChatGPT Plus, and Perplexity Pro to ensure structural and linguistic diversity. The reports cover three organ sites, prostate (n=75), lung (n=78), and breast (n=74), and are split almost evenly between malignant (n=113) and benign (n=114) findings. Three independent cancer registrars reviewed and classified the reports in a consensus-based validation process. RESULTS & CONCLUSION: The dataset provides a structured, clinically relevant benchmark for evaluating LLM performance in pathology text classification. It enables AI model assessment without compromising data privacy, paving the way for scalable and ethical AI-driven oncology documentation.
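The two breakdowns reported in the abstract (by organ site and by finding) can be cross-checked with a few lines of arithmetic. The sketch below is purely illustrative and not part of the study; the names `SITE_COUNTS`, `FINDING_COUNTS`, `check_composition`, and `proportions` are hypothetical. It only uses the marginal totals stated above, because the abstract does not give the site-by-finding cross-tabulation.

```python
# Marginal counts as stated in the abstract (hypothetical variable names).
SITE_COUNTS = {"prostate": 75, "lung": 78, "breast": 74}
FINDING_COUNTS = {"malignant": 113, "benign": 114}
TOTAL_REPORTS = 227


def check_composition(site_counts, finding_counts, total):
    """Return True if both marginal breakdowns sum to the stated total."""
    return (
        sum(site_counts.values()) == total
        and sum(finding_counts.values()) == total
    )


def proportions(counts):
    """Return each category's share of the total, rounded to three decimals."""
    n = sum(counts.values())
    return {label: round(count / n, 3) for label, count in counts.items()}


if __name__ == "__main__":
    print(check_composition(SITE_COUNTS, FINDING_COUNTS, TOTAL_REPORTS))
    print(proportions(SITE_COUNTS))
    print(proportions(FINDING_COUNTS))
```

Both breakdowns sum to 227, and the malignant/benign split is close to 50/50, which is what "almost evenly" refers to.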
