OpenAlex · Updated hourly · Last updated: 12.03.2026, 03:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

ChatGPT vs human expertise in the context of IT recruitment

2024 · 11 citations · Expert Systems with Applications · Open Access

Citations: 11 · Authors: 1 · Year: 2024

Abstract

This study addresses the gap in understanding the extent to which AI can replace human technical interviewers in the recruitment process. It investigates the potential of Large Language Models (LLMs), specifically ChatGPT, Google Gemini and Mistral, to assess candidates’ competencies in Information Technology (IT) compared with evaluations made by human experts. In the experiment, three experienced DevOps specialists assessed the written responses of 21 candidates to ten industry-relevant questions, each answer limited to 500 characters. The evaluation used a simple yet effective scale from −2 to 2, where −2 indicates a negative assessment for an incorrect answer, 0 an ambiguous or incomplete answer, and 2 an excellent response. The LLMs then evaluated the same set of responses using identical criteria and the same scale. This comparative analysis aims to determine how reliably and accurately AI can replicate expert human judgement in IT recruitment. The findings, supported by Fleiss’ kappa, show that the human reviewers are not perfectly aligned in their judgements. The AI tools, however, also lack consistency: repeating the same review request may produce a different decision. The results are expected to contribute to the ongoing discourse on AI-assisted decision-making and its practical applications in human resource management and recruitment.

• Comparison of ChatGPT, other LLMs and human experts in evaluating IT candidates’ skills.
• AI assessments are significantly faster but noticeably less consistent than human reviews.
• Fleiss’ kappa identified minor inter-rater unreliability in the human reviewers’ evaluations.
• No significant difference in average scores between human and AI assessments.
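The abstract reports inter-rater agreement with Fleiss’ kappa, κ = (P̄ − P̄ₑ) / (1 − P̄ₑ), where P̄ is the mean observed agreement per rated item and P̄ₑ the agreement expected by chance from the overall category proportions. The following is a minimal Python sketch of that standard formula, not the author’s code. It assumes the −2 to 2 scale takes the five integer values −2, −1, 0, 1, 2 and treats them as nominal categories; the function name `fleiss_kappa` and the toy ratings are hypothetical and are not data from the study.

```python
from collections import Counter

# Assumption: the -2..2 scale uses these five integer values.
CATEGORIES = (-2, -1, 0, 1, 2)


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Fleiss' kappa for N items, each rated by the same number n of raters.

    ratings[i] holds the n scores given to item i (one per rater).
    Categories are treated as nominal, as in the standard Fleiss formulation.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][j]: how many raters put item i into category j
    counts = []
    for item in ratings:
        c = Counter(item)
        unknown = set(c) - set(categories)
        if unknown:
            raise ValueError(f"scores outside the scale: {sorted(unknown)}")
        counts.append([c[cat] for cat in categories])

    # Observed agreement P_i for each item, then its mean P_bar
    p_items = [
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Overall category proportions p_j and chance agreement P_e
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        # Every rating falls into a single category, so kappa is undefined.
        raise ZeroDivisionError("kappa undefined: chance agreement equals 1")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data (NOT from the study): 4 answers, 3 raters each.
    toy = [[2, 2, 2], [0, 0, -2], [-2, -2, -2], [2, 0, 2]]
    print(round(fleiss_kappa(toy), 3))  # prints 0.489
```

The same function could, under the same assumptions, be applied to repeated LLM runs on identical answers by treating each run as a "rater", which gives one way to quantify the run-to-run inconsistency the abstract describes.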


Topics

AI and HR Technologies · Economic and Technological Developments in Russia · Artificial Intelligence in Healthcare and Education