This is an overview page with metadata about this scientific work. The full article is available from the publisher.
A Study on the Performance of SOTA LLMs on Nepalese IOE Entrance Examination
Citations: 0
Authors: 7
Year: 2026
Abstract
Large language models (LLMs) have been extensively evaluated on standardized tests in high-resource settings, including studies in India, the United States, and other nations. However, their performance on entrance examinations in low-resource countries such as Nepal remains underexplored. This study evaluates four state-of-the-art (SOTA) LLMs: ChatGPT (gpt-5-2025-08-07), Claude (claude-opus-4.1-20250805), Gemini (gemini-2.5-pro), and DeepSeek (deepseek-chat), on multiple-choice questions from the Tribhuvan University (TU) Institute of Engineering (IOE) entrance examination in Nepal, covering Mathematics, Physics, Chemistry, and English. We assessed overall scores, subject-level performance, and systematic behavioral patterns in the models' answers. The models scored between 77.24% and 85.77%, considerably above the average performance of past human test takers. Performance varied notably by subject: Chemistry had the highest share of correct responses (85.71%-92.86%) and Mathematics the lowest (72.16%-80.11%). Counterintuitively, English underperformed even though LLMs are trained primarily on natural language data. Statistical analysis revealed position-dependent patterns: across all models, questions whose correct answer sat at position 4 had 78.4% higher odds of being answered correctly. Chi-square tests identified significant option-selection bias only in DeepSeek, and only on incorrectly answered questions. These findings indicate that while LLMs demonstrate above-average competency on the IOE entrance examination, they still exhibit systematic biases in their responses.
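The option-selection bias test mentioned in the abstract can be sketched as a chi-square goodness-of-fit test on how often a model picks each answer position. The counts, the four-option setup, and the critical value below are illustrative assumptions, not the paper's data.

```python
# Minimal sketch of a chi-square goodness-of-fit test for option-selection
# bias. All counts below are hypothetical, not taken from the study.

def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum over options of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of how often a model chose options 1-4 on the
# questions it answered incorrectly.
observed = [10, 14, 9, 31]
total = sum(observed)

# Null hypothesis: no bias, so each of the 4 options is equally likely.
expected = [total / 4] * 4

stat = chi_square_stat(observed, expected)

# Critical value for alpha = 0.05 with df = 4 - 1 = 3 is about 7.815.
biased = stat > 7.815
print(f"chi-square = {stat:.3f}, biased = {biased}")
```

In practice a library routine such as `scipy.stats.chisquare` would also return the p-value; the pure-Python version above is kept dependency-free for clarity.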
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,436 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,311 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,753 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,523 citations