This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Fine-tuning AI models for enhanced consistency and precision in chemistry educational assessments
4 citations · 5 authors · 2025
Abstract
Integrating artificial intelligence (AI) into educational assessment represents a paradigm shift, especially in STEM subjects such as chemistry, which require complex problem-solving and written feedback. This study examines how fine-tuning affects the performance of four AI models (Gemini 1.5, Gemini 1.0, BERT, and XLNet) on chemistry tasks. The models were applied to three evaluation tasks: Seq2Seq for Sodium Reaction Grading, a Regression Task for Overall Grading, and Attention-Based Grading for Key Steps Grading. Fine-tuning significantly improved the models' accuracy, precision, and stability. Gemini 1.5 outperformed the other models on all three metrics, with accuracy increasing from 80% to 89.5% and true positive rate (TPR) from 0.73 to 0.93, whereas the TPR of Gemini 1.0 rose from 0.69 to 0.89. BERT and XLNet, which started from substantially lower baselines, also improved significantly, particularly in identifying fundamental steps in the evaluation. These advances highlight the critical role of fine-tuning in aligning AI model output with expert grading standards, ensuring accurate and reliable assessment. The results confirm that fine-tuning is essential for preparing AI models for teaching applications, particularly for complex tasks such as chemistry evaluation, and thus enables scalable solutions. The findings provide further justification for the wider adoption of fine-tuned AI models to improve the reliability, scalability, and effectiveness of grading systems in STEM education.
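The abstract reports accuracy and true positive rate (TPR) without spelling out how they are computed. As an illustration only, the sketch below assumes the standard definitions applied to binary grading decisions (for example, whether a student's key step is judged correct): accuracy is the fraction of items where the model agrees with the expert, and TPR is TP / (TP + FN) over items the expert marked as positive. The function names and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of items where the model's label matches the expert's label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """TPR (recall) = TP / (TP + FN), counted over items the expert labeled positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp + fn == 0:
        raise ValueError("no positive items in y_true")
    return tp / (tp + fn)


# Hypothetical labels: 1 = step judged correct, 0 = step judged incorrect.
expert = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
model = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

print(accuracy(expert, model))            # 0.8 (8 of 10 items agree)
print(true_positive_rate(expert, model))  # ≈ 0.857 (6 of 7 expert positives found)
```

Under these assumed definitions, a TPR increase such as 0.73 to 0.93 means the fine-tuned model misses far fewer of the items that experts marked as correct, while accuracy also counts agreement on the negative items.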
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,402 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,270 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,702 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,507 citations