This is an overview page with metadata for this scientific work. The full article is available from the publisher.
CORAL: Expert-Curated Oncology Reports to Advance Language Model Inference
Citations: 56
Authors: 6
Year: 2024
Abstract
BACKGROUND: Both medical care and observational studies in oncology require a thorough understanding of a patient's disease progression and treatment history, which are often elaborately documented within clinical notes. As large language models (LLMs) are being considered for use within medical workflows, it becomes important to evaluate their potential in oncology. However, no current information representation schema fully encapsulates the diversity of oncology information within clinical notes, and no comprehensively annotated oncology notes exist publicly, which limits thorough evaluation.

METHODS: We assessed the zero-shot ability of three LLMs (GPT-3.5-turbo, GPT-4, and FLAN-UL2) to extract detailed oncological information from two narrative sections of clinical progress notes. Model performance was quantified with BLEU-4, ROUGE-1, and exact-match (EM) F1 score metrics.

RESULTS: Our team annotated 9028 entities, 9986 modifiers, and 5312 relationships. The GPT-4 model exhibited the best overall performance, with an average BLEU score of 0.73, an average ROUGE score of 0.72, an average EM F1 score of 0.51, and an average accuracy of 68% (expert manual evaluation on a subset). Notably, GPT-4 was proficient in tumor characteristic and medication extraction and demonstrated superior performance in the advanced reasoning tasks of inferring symptoms due to cancer and considerations of future medications. Common errors included partial responses with missing information and hallucinations with note-specific information.

CONCLUSIONS: By developing a comprehensive schema and benchmark of oncology-specific information in oncology notes, we uncovered both the strengths and the limitations of LLMs. Our evaluation showed variable zero-shot extraction capability among the GPT-3.5-turbo, GPT-4, and FLAN-UL2 models and highlighted a need for further improvements, particularly in complex medical reasoning, before LLMs can perform reliable information extraction for clinical research, complex population management, and documentation of quality patient care.
(Funded by the National Institutes of Health, the Food and Drug Administration, and others.)
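
The abstract reports ROUGE-1 and exact-match (EM) F1 scores without describing how they are computed. The following is a minimal sketch of both metrics, assuming whitespace tokenization, case-insensitive comparison, and set-based matching of extracted items; these choices are illustrative assumptions and may differ from the paper's actual evaluation protocol. The function names and the example data are hypothetical and are not taken from the CORAL corpus. BLEU-4 is omitted because it additionally requires higher-order n-gram counts and a brevity penalty.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate string."""
    # Assumption: lowercase whitespace tokens; real evaluations may stem or strip punctuation.
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def exact_match_f1(gold: set[str], predicted: set[str]) -> float:
    """F1 where a predicted item counts only if it equals a gold item exactly."""
    # Assumption: matching ignores case and surrounding whitespace.
    gold_norm = {g.strip().lower() for g in gold}
    pred_norm = {p.strip().lower() for p in predicted}
    true_pos = len(gold_norm & pred_norm)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_norm)
    recall = true_pos / len(gold_norm)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: gold annotations versus a model's extracted medications.
gold_meds = {"letrozole", "palbociclib", "ondansetron"}
pred_meds = {"Letrozole", "palbociclib", "tamoxifen"}
em_f1 = exact_match_f1(gold_meds, pred_meds)

# Hypothetical example: a free-text tumor description.
r1 = rouge1_f1(
    "invasive ductal carcinoma of the left breast",
    "ductal carcinoma left breast",
)
```

Exact match gives no credit for partially correct answers, while unigram overlap does. This is one plausible reason an EM F1 score can sit well below the ROUGE score for the same outputs.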