OpenAlex · Updated hourly · Last updated: 12.04.2026, 01:48

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A Study on the Performance of SOTA LLMs on Nepalese IOE Entrance Examination

2026 · 0 citations · European Journal of Applied Science Engineering and Technology · Open Access

Citations: 0 · Authors: 7 · Year: 2026

Abstract

Large language models have been extensively evaluated on standardized tests in high-resource settings, including studies in India, the United States, and other nations. However, their performance on entrance examinations in low-resource countries such as Nepal remains less explored. This study evaluates four state-of-the-art (SOTA) LLMs, ChatGPT (gpt-5-2025-08-07), Claude (claude-opus-4.1-20250805), Gemini (gemini-2.5-pro), and DeepSeek (deepseek-chat), on multiple-choice questions from the Tribhuvan University (TU) Institute of Engineering (IOE) entrance examination in Nepal, covering Mathematics, Physics, Chemistry, and English. We assessed overall scores, performance across subjects, and systematic behavioral patterns in the models' answers. Overall scores ranged from 77.24% to 85.77%, considerably above the average performance of human test takers in past years. Performance varied notably by subject: Chemistry had the highest share of correct responses (85.71% to 92.86%) and Mathematics the lowest (72.16% to 80.11%). Counterintuitively, English underperformed, even though LLMs are trained mainly on natural-language data. Statistical analysis revealed position-dependent patterns: across all models, questions whose correct answer was at position 4 had 78.4% higher odds of being answered correctly. Chi-square tests identified a significant option-selection bias among incorrect answers only for DeepSeek. These findings indicate that while LLMs demonstrate above-average competency on the IOE entrance examination, they still exhibit systematic biases in their responses.
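
The abstract reports two statistical quantities without showing how they are computed. As a hedged illustration only, the Python sketch below assumes that the "78.4% higher odds" figure comes from a logistic-regression-style odds ratio (78.4% higher odds means an odds ratio of 1.784, i.e. a log-odds coefficient of ln 1.784 ≈ 0.579), and that option-selection bias was checked with a chi-square goodness-of-fit test against a uniform spread over the four answer positions. Neither assumption is confirmed by the abstract; the function names and the sample answer positions are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def percent_change_in_odds(beta: float) -> float:
    """'% higher odds' implied by a log-odds coefficient beta (odds ratio = exp(beta))."""
    return (math.exp(beta) - 1.0) * 100.0


def log_odds_from_percent(percent: float) -> float:
    """Log-odds coefficient implied by a '% higher odds' figure (inverse of the above)."""
    return math.log(1.0 + percent / 100.0)


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def option_bias_test(choices: list[int], n_options: int = 4) -> tuple[float, float]:
    """Hypothetical helper: goodness-of-fit test of chosen option positions
    (1..n_options) against a uniform distribution. Returns (statistic, p-value)."""
    if n_options != 4:
        # The closed-form tail probability above only covers df = n_options - 1 = 3.
        raise ValueError("this sketch only handles 4-option questions")
    if not choices or any(c < 1 or c > n_options for c in choices):
        raise ValueError("choices must be a non-empty list of positions 1..4")
    counts = Counter(choices)
    expected = len(choices) / n_options  # uniform expectation per position
    stat = sum((counts[k] - expected) ** 2 / expected for k in range(1, n_options + 1))
    return stat, chi2_sf_df3(stat)


if __name__ == "__main__":
    # Hypothetical positions picked on wrongly answered questions (illustrative only).
    wrong_picks = [1, 1, 1, 2, 1, 3, 1, 4, 1, 2, 1, 1]
    stat, p = option_bias_test(wrong_picks)
    print(f"chi2 = {stat:.2f}, p = {p:.4f}")
    # A log-odds coefficient of about 0.579 corresponds to 78.4% higher odds.
    print(f"beta = {log_odds_from_percent(78.4):.3f}")
```

Under these assumptions, a small p-value from `option_bias_test` would indicate that a model's wrong answers cluster on particular positions rather than being spread evenly. The closed-form tail probability is used only to avoid a SciPy dependency; `scipy.stats.chisquare` handles the general case.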


Topics

Artificial Intelligence in Healthcare and Education · Text Readability and Simplification · Explainable Artificial Intelligence (XAI)