OpenAlex · Updated hourly · Last updated: 15.03.2026, 19:54

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

English-Czech Output Bias in LLMs: A Geometry-Based Case Study

2025 · 0 citations · Proceedings of the International Conference on AI Research · Open Access
Open full text at the publisher

0 Citations · 3 Authors · Year: 2025

Abstract

The rapid integration of large language models (LLMs) into educational, professional, and public discourse has prompted increasing scrutiny of their multilingual capabilities. While English dominates as a testing and training language, understanding LLM performance in less-resourced languages—such as Czech—is critical for equitable AI deployment. This study investigates a subtle but systematic bias in LLM behaviour: the relative verbosity of their responses in Czech versus English within the domain of elementary geometry. We compiled a dataset of 48 paired mathematical prompts, posed in both Czech and English to six prominent LLMs (ChatGPT, Claude, Gemini, Mistral Large, Copilot Quick-Nuance, and Copilot Deep-Thinker), yielding 576 total responses. Each model was accessed in a controlled language-specific context to ensure fair comparison. Using surface-level metrics—word count and character count—we observed a consistent pattern: English responses were significantly longer than Czech ones across all models. Statistical analysis confirmed the robustness of these differences, with medium to large effect sizes (Cohen’s d) in both metrics. Notably, even morphologically richer Czech did not yield longer outputs in character count, contradicting initial assumptions. Beyond confirming a consistent verbosity gap, our analysis employed rigorous statistical testing, including paired t-tests and Wilcoxon signed-rank tests, as well as effect size estimation to quantify the magnitude of the disparity. We interpret these findings in the context of known architectural and training imbalances in LLM development—particularly differences in how text is segmented and processed, alongside the relative abundance of English-language data. While stylistic conventions and user context may also influence response length, our results consistently indicate that LLMs, even those marketed as multilingual, tend to produce more verbose output in English. This raises concerns about potential discrepancies in explanation quality across languages, which may have implications for fairness and pedagogical effectiveness in multilingual educational settings. The study lays the groundwork for follow-up research that will move beyond surface metrics toward semantic content analysis of mathematical reasoning across languages. Future work will assess whether English verbosity corresponds to greater mathematical depth, or if Czech responses deliver equivalent content more concisely. This line of inquiry is vital for ensuring fairness, clarity, and effectiveness in multilingual AI deployment—especially in contexts such as mathematics education, where explanation quality directly impacts learning outcomes.
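The abstract describes the analysis only at a high level. The following Python sketch illustrates the kind of pipeline it implies: computing word and character counts for paired English/Czech responses, then running a paired t-test, a Wilcoxon signed-rank test, and Cohen's d for paired samples. Function names, variable names, and the toy answers are illustrative assumptions, not the authors' code or data.

```python
# A minimal sketch, assuming a straightforward implementation of the analysis the
# abstract describes: surface-level length metrics for paired English/Czech answers,
# a paired t-test, a Wilcoxon signed-rank test, and Cohen's d for paired samples.
# Function names and the toy responses below are illustrative, not the authors' code.

import numpy as np
from scipy.stats import ttest_rel, wilcoxon


def word_count(text: str) -> int:
    """Surface metric 1: number of whitespace-separated tokens."""
    return len(text.split())


def char_count(text: str) -> int:
    """Surface metric 2: number of characters, including spaces."""
    return len(text)


def paired_cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d for paired samples: mean of the differences over their SD."""
    diff = a - b
    return float(diff.mean() / diff.std(ddof=1))


def compare_lengths(english_responses, czech_responses, metric=word_count):
    """Compare response lengths for the same prompts answered in English and Czech."""
    en = np.array([metric(t) for t in english_responses], dtype=float)
    cs = np.array([metric(t) for t in czech_responses], dtype=float)
    t_stat, t_p = ttest_rel(en, cs)      # paired t-test on per-prompt lengths
    w_stat, w_p = wilcoxon(en, cs)       # non-parametric counterpart
    return {
        "mean_english": en.mean(),
        "mean_czech": cs.mean(),
        "t_statistic": t_stat, "t_pvalue": t_p,
        "wilcoxon_statistic": w_stat, "wilcoxon_pvalue": w_p,
        "cohens_d": paired_cohens_d(en, cs),
    }


if __name__ == "__main__":
    # Toy paired responses (hypothetical, not from the study's dataset).
    en_answers = [
        "The area of a triangle is half the base times the height.",
        "A right angle measures exactly ninety degrees.",
    ]
    cs_answers = [
        "Obsah trojúhelníku je polovina základny krát výška.",
        "Pravý úhel měří devadesát stupňů.",
    ]
    print(compare_lengths(en_answers, cs_answers, metric=word_count))
    print(compare_lengths(en_answers, cs_answers, metric=char_count))
```

In the study itself this comparison would be run over all 48 prompt pairs per model; the sketch only shows the shape of the computation for the two surface metrics the abstract names.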

Similar works

Authors

Institutions

Topics

Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods