OpenAlex · Updated hourly · Last updated: 6 April 2026, 02:46

This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Evaluating large language models for natural-language-to-code generation on aggregate Czech public health data analysis

2025 · 0 citations · medRxiv · Open Access

7 authors

Abstract

Large language models (LLMs) are increasingly explored as tools for healthcare research and data analysis. However, their applicability to structured public health datasets, especially in non-English contexts, remains underexamined. We systematically evaluated 11 state-of-the-art LLMs on their ability to generate executable Python code for analytical queries over Czech public health datasets, focusing on incidence and prevalence data provided by the National Health Information Portal (known as NZIP). A set of representative analytical queries was designed, covering filtering, aggregation, weighted averages, and identification of primary diagnoses. Each model was prompted in Czech and assessed on code executability, correctness of results, and ability to adapt to local terminology. In most cases, the models generated syntactically valid code within one minute, but performance varied across models. On the main objective, replicating “ground truth” queries as defined in the dataset documentation, ChatGPT-4o achieved the highest accuracy, followed closely by GPT-4.1 mini. Claude and Gemini models frequently failed to apply critical filtering instructions, while Deepseek-R1, though accurate, defaulted to English output. Some models produced code that executed successfully but returned incorrect results, underscoring the need for systematic validation. Overall, LLMs show strong potential as coding assistants in public health analytics, even in Czech-language settings. Their integration into hybrid human–AI workflows, combined with validation mechanisms and retrieval-augmented generation, may accelerate the creation of reliable analytical pipelines.
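The abstract separates code that merely executes from code that returns the correct result. The following is a minimal sketch of that distinction, not the authors' evaluation harness. It assumes a hypothetical toy table (columns `region`, `year`, `sex`, `population`, `cases`, which are invented and are not the NZIP schema), a hypothetical convention that generated code stores its answer in a variable named `result`, and a population-weighted prevalence query as the ground truth. The candidate snippets stand in for model output and are likewise invented.

```python
import math

# HYPOTHETICAL aggregate table: one row per (region, year, sex).
# Column names and values are invented for illustration; not the NZIP schema.
ROWS = [
    {"region": "A", "year": 2021, "sex": "total", "population": 2000, "cases": 20},
    {"region": "A", "year": 2022, "sex": "male", "population": 1000, "cases": 30},
    {"region": "A", "year": 2022, "sex": "female", "population": 1000, "cases": 50},
    {"region": "A", "year": 2022, "sex": "total", "population": 2000, "cases": 80},
    {"region": "B", "year": 2022, "sex": "total", "population": 6000, "cases": 150},
]


def ground_truth(rows):
    """Population-weighted prevalence per 1,000 for 2022, using 'total' rows only."""
    sel = [r for r in rows if r["year"] == 2022 and r["sex"] == "total"]
    rates = [r["cases"] / r["population"] * 1000 for r in sel]  # per-region rate
    weights = [r["population"] for r in sel]                    # population weights
    return sum(x * w for x, w in zip(rates, weights)) / sum(weights)


# HYPOTHETICAL stand-ins for LLM-generated code; each must assign `result`.
CANDIDATES = {
    # Applies both filters, then weights by population.
    "correct": (
        "sel = [r for r in rows if r['year'] == 2022 and r['sex'] == 'total']\n"
        "result = 1000 * sum(r['cases'] for r in sel) / sum(r['population'] for r in sel)\n"
    ),
    # Executes fine but omits the year filter, so 2021 data leaks in.
    "missing_filter": (
        "sel = [r for r in rows if r['sex'] == 'total']\n"
        "result = 1000 * sum(r['cases'] for r in sel) / sum(r['population'] for r in sel)\n"
    ),
    # Fails at run time: references a column that does not exist.
    "bad_column": "result = sum(r['prevalence'] for r in rows)\n",
}


def evaluate(code, rows, expected, rel_tol=1e-9):
    """Classify a code string as 'error', 'wrong' or 'correct'."""
    namespace = {"rows": rows}
    try:
        exec(code, namespace)  # untrusted code: run only inside a sandbox
    except Exception:
        return "error"
    result = namespace.get("result")
    if not isinstance(result, (int, float)):
        return "wrong"
    return "correct" if math.isclose(result, expected, rel_tol=rel_tol) else "wrong"


if __name__ == "__main__":
    expected = ground_truth(ROWS)
    for name, code in CANDIDATES.items():
        print(name, evaluate(code, ROWS, expected))
```

Running the script prints `correct correct`, `missing_filter wrong` and `bad_column error`. The `missing_filter` case executes without error yet returns 25.0 instead of 28.75, which is the "runs but is wrong" failure mode the abstract describes. Comparing with a tolerance (`math.isclose`) rather than `==` avoids spurious mismatches from floating-point rounding in otherwise equivalent formulas.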
