This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Artificial Intelligence Augmented Competence Committees: A Collaborative Path Forward in Competency‐Based Medical Education
Citations: 0 · Authors: 3 · Year: 2026
Abstract
The growing adoption of Competency-Based Medical Education (CBME) has fundamentally reshaped postgraduate medical training, emphasising demonstrated competencies rather than time-based progression. This individualised and flexible learning framework prepares residents to meet evolving healthcare demands [1]. Competence committees (CCs) are central to CBME processes, synthesising high volumes of data, including entrustable professional activity (EPA) assessments, in-training evaluation reports (ITERs), simulation assessments and academic coach summaries, to make high-stakes resident advancement decisions [2]. However, narrative data formats, variability in assessor language and the high volume of assessment requirements can make the process labour-intensive, prone to inconsistency and at risk of cognitive overload [3]. These challenges risk delaying feedback, creating inequities and reducing transparency for learners. Recent developments in artificial intelligence (AI) present new opportunities to assist CCs in managing and interpreting complex data. AI's potential to standardise synthesis, identify trends and highlight areas for early intervention may boost CC efficiency and accuracy. This viewpoint article explores emerging evidence supporting AI applications in CBME and outlines the ethical and practical considerations for responsible use.

AI encompasses a range of computational techniques that identify patterns and make predictions. Within medical education, machine learning (ML) enables pattern recognition in large datasets, while natural language processing (NLP) and large language models (LLMs) extend these capabilities into language interpretation and generation [4]. In practice, simpler NLP methods (such as bag-of-words models) are interpretable and may be most appropriate for transparent theme extraction from feedback, but they can miss context. Transformer-based LLMs (e.g., GPT), on the other hand, can generate summaries and draft feedback but risk hallucinations and lack transparency [4]. Classical ML models offer quantitative consistency but lack the contextual sensitivity of language models. Selecting among these approaches depends on whether the goal is interpretability, scalability or contextual understanding [4].

Early AI studies demonstrate feasibility. Abbott [5] used NLP to predict milestone ratings from narrative comments with strong accuracy, reducing variability in committee interpretation, while Stahl [6] applied topic modelling to EPA comments, identifying language that mapped to autonomy levels. Yilmaz [7] achieved 87% accuracy in classifying feedback into competency domains, although these tools were better at confirming satisfactory performance than at detecting struggling residents. These examples suggest that AI tools are best suited as preliminary filters rather than final decision-makers. Beyond text-based analyses, AI also enables predictive analytics: in a precision-education framework, AI uses ongoing data to forecast outcomes, and Turner [4] illustrates how AI can enable proactive data collection and personalised coaching. In surgery and other procedure-based training, computer vision and deep learning have been used to assess technical skill from video with high accuracy [8]. Furthermore, Lee [9] demonstrated that an LLM-based report-writing system could assist teachers in generating competency assessments with high sensibility and contextual accuracy, significantly reducing workload while maintaining quality.
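As a concrete illustration of the text-classification work cited above, the sketch below routes narrative comments to competency domains with a simple bag-of-words pipeline. It is a minimal, hypothetical example: the comments, domain labels and model choices are invented here and do not reproduce the methods of Abbott [5] or Yilmaz [7], and a real programme would need a large, locally validated labelled corpus.

```python
# Minimal sketch: classifying narrative feedback into competency domains
# with a bag-of-words pipeline (scikit-learn). The comments and labels
# below are invented for illustration; they are not data from the cited
# studies.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical EPA-style narrative comments with competency-domain labels.
comments = [
    "Clearly explained the discharge plan to the family",
    "Struggled to prioritise tasks during a busy overnight shift",
    "Presented a well-organised differential on rounds",
    "Needs prompting to hand over care safely between shifts",
]
domains = ["communicator", "leader", "medical expert", "collaborator"]

# TF-IDF features feed a linear classifier; such models are interpretable
# (coefficients map back to words) but ignore word order and context.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(comments, domains)

# New comments can be routed to a domain as a *preliminary* filter;
# the competence committee still reviews the underlying evidence.
print(model.predict(["Led the handover calmly and delegated clearly"]))
```

The interpretability of such a linear model is precisely the trade-off discussed above: its coefficients can be inspected, whereas an LLM-based summariser could capture more context at the cost of transparency.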
While not specific to CCs, these examples illustrate AI's ability to provide objective, scalable assessments where human raters may disagree. In non-technical domains, generative AI could even help standardise written assessments or exam questions. Together, these innovations signal a growing body of research suggesting that AI could make CC operations more efficient, consistent and proactive. Within the context of CCs, AI has the potential to enhance educator expertise in several ways:

- Summarising narrative feedback. NLP tools have been tested to distil hundreds of narrative comments into key themes. In one pilot, educators noted that reviewing narrative feedback was 'time-consuming' and explored whether NLP could create efficiencies [10]. Reviews of AI in medical education support this idea: Khan [11] concludes that AI-driven text analysis can 'automate theme extraction', substantially reducing faculty workload.
- Identifying learners at risk. ML models can estimate learner competence and predict who may struggle. Abbott [5] built NLP models from evaluation text that predicted residents' milestone ratings with high accuracy. In practice, a resident exhibiting certain patterns of EPA feedback could be flagged early, allowing CCs to offer proactive, targeted support.
- Auditing for bias. NLP methods can audit narrative comments for potential bias. Prior AI-based analyses found systematic differences in the language faculty used to describe students of different genders or racial/ethnic backgrounds [12]. Such tools could be adapted to highlight gendered or culturally biased wording in current feedback pools, prompting committees to scrutinise and address those biases during their deliberations.
- Standardising interpretation. One persistent issue in CCs is inter-rater reliability, where differing interpretations of the same data lead to inconsistent decisions [13]. AI tools such as ChatGPT could be used to standardise interpretations by assessing data against explicit guidelines provided by CCs, reducing variability and making the assessment process more consistent.
- Drafting summaries and learning plans. Advances in LLMs are beginning to influence CC work. One CC study noted that future LLM improvements might enhance narrative summarisation and even suggest competency-based phrasing [10]. In theory, generative AI could eventually help standardise written summaries or draft personalised learning plans for residents.
- Reducing educator workload. AI-driven feedback analysis can substantially lower educator workload [11]. By automating routine tasks such as data collation and preliminary analysis, AI may free committee members to focus their expertise on nuanced deliberation and decision-making. This would not only enhance efficiency but also allow members to engage with complex cases where judgement and contextual understanding are critical.

Before any AI tool is used, data must be de-identified and securely stored; a minimal de-identification sketch follows this paragraph. Institutions should form a cross-disciplinary AI oversight group (educators, data scientists, ethicists, clinicians and learners) to develop consent forms and data pipelines. Iterative feedback loops between educators and data scientists are essential to ensure algorithms reflect educational priorities and technical accuracy [14]. Consent processes should clearly inform residents how their assessment data might be used for educational improvement, and AI tools should be trained on anonymised, context-relevant data (e.g., past assessments within the same programme).
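The sketch below shows what a first-pass, rule-based de-identification step could look like before narrative data reach any AI tool. The patterns and the example comment are hypothetical; real deployments would rely on validated de-identification software and human spot checks rather than a handful of regular expressions.

```python
# Minimal sketch of rule-based de-identification applied before any AI
# analysis. Patterns and the example comment are illustrative only; a
# production pipeline would use validated de-identification tooling and
# manual spot checks.
import re

PATTERNS = [
    (re.compile(r"\bDr\.?\s+[A-Z][a-z]+\b"), "[STAFF]"),     # staff names after a title
    (re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"), "[NAME]"),  # naive First Last names
    (re.compile(r"\b\d{7,}\b"), "[ID]"),                     # long numeric identifiers
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),  # simple date formats
]

def deidentify(comment: str) -> str:
    """Replace likely identifiers in a narrative comment with placeholders."""
    for pattern, placeholder in PATTERNS:
        comment = pattern.sub(placeholder, comment)
    return comment

print(deidentify("Jane Doe (MRN 30412987) was supervised by Dr. Smith on 03/14/2025."))
# -> "[NAME] (MRN [ID]) was supervised by [STAFF] on [DATE]."
```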
Begin with a small, controlled pilot (e.g., one programme's committee) in which AI analyses run in parallel to the usual review process. Collect feedback on utility and errors, and adjust AI models and processes based on what the pilot reveals. AI should serve as an assistant, not a decision-maker: if a system highlights 'communication issues', the committee should examine the evidence, discuss context and decide whether intervention is appropriate. The AI output becomes a conversation starter, not a verdict.

Introducing AI into CCs is as much a cultural change as a technical one. Faculty may initially be sceptical about how to interpret AI outputs. Building confidence requires open dialogue, iterative training and demonstrating tangible benefits such as reduced workload or clearer resident feedback. Institutions should create safe pilot environments where AI tools run in shadow mode, generating summaries for comparison without influencing actual decisions, until reliability and trust are established. Committee members must understand how AI systems work, what their limitations are and how to interpret their outputs. Training could include short workshops or interactive scenario-based modules covering key concepts such as bias, data privacy and uncertainty; for example, institutions could host a seminar where faculty review AI-summarised feedback alongside the raw comments to learn to judge the AI's accuracy. Faculty should also learn how to communicate with residents about AI-supported decisions. This requires both technical instruction and a cultural shift towards treating AI as a collaborative partner rather than a disruptive force.

Continuous validation of AI-assisted decisions against independent benchmarks is necessary to ensure reliability, and research on the impact of AI on CC workflows and trainee outcomes may provide valuable insights for further refinement. A gradual roll-out with continuous monitoring and evaluation (e.g., checking predicted 'at-risk' flags against actual outcomes, as sketched below) would then allow practical use. Ethical implementation should align with established frameworks such as the Association of American Medical Colleges' principles for responsible AI in medical education and the broader 'human-in-the-loop' model, ensuring that final accountability rests with educators [15]. Training should include modules on ethical reasoning, bias recognition and data management so that committee members internalise the technical and ethical mindset necessary for responsible AI application.

Deliberate attention to ethics and fairness is required. AI models reflect the data they are trained on; if past assessments embed bias, AI may propagate it. For example, if narrative feedback carries subtle gender or racial bias, NLP tools might erroneously flag certain groups as at-risk. Rigorous validation and bias audits are therefore required before trusting AI outputs [8]. Transparency with residents and educators about when and how AI is used in assessment is equally essential: informed consent procedures should ensure trainees understand how their assessment data will be processed by AI, and trainees should have an avenue to question any AI-generated recommendation. CC members should understand, at least qualitatively, how an AI tool reached a given suggestion. Black-box algorithms may undermine trust; where possible, explainable models should be used and their limitations disclosed.
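As one concrete form the shadow-mode monitoring described above could take, the sketch below compares hypothetical AI 'at-risk' flags against the committee's independent decisions for the same residents. All numbers are invented; the point is only to show how sensitivity and specificity could be tracked during a pilot.

```python
# Minimal sketch of a shadow-mode check: AI 'at-risk' flags are compared
# against the committee's independent decisions before the tool is allowed
# to influence deliberations. The flag and outcome vectors are invented.
from sklearn.metrics import confusion_matrix

# 1 = flagged/judged at-risk, 0 = on track (hypothetical pilot data).
ai_flags          = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
committee_outcome = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(committee_outcome, ai_flags).ravel()
sensitivity = tp / (tp + fn)  # share of struggling residents the AI caught
specificity = tn / (tn + fp)  # share of on-track residents left unflagged

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
# Committees would also review each false negative qualitatively, since
# missing a struggling resident is costlier than a spurious flag.
```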
Delineating the role of AI in the decision-making process and giving committee members interpretive control over AI-generated outputs are critical to maintaining trust and accountability. AI could generate initial performance summaries and recommendations, which committee members would then review and contextualise. Because trainee assessment data are sensitive, only de-identified data should be analysed by AI, and data access must be controlled to maintain privacy standards. Institutional policies determining who oversees, audits and maintains these tools must be in place before AI is used in CCs. Such guidelines should emphasise that AI in medical education be used ethically, with full transparency to learners and educators and with strict attention to data privacy [15].

The integration of AI into CC workflows has the potential to be a paradigm shift that strengthens the foundational principles of CBME. By augmenting the capabilities of CCs, AI may streamline data synthesis, improve decision-making consistency and optimise the use of committee expertise. Yet while AI may streamline synthesis and promote consistency, its role must remain supportive rather than determinative: the greatest promise of AI lies not in replacing committee judgement but in amplifying its effectiveness through responsible, transparent collaboration. AI-augmented CCs represent a transformative step towards realising the full potential of CBME: fostering competent, compassionate and adaptable medical professionals equipped to meet the challenges of tomorrow.

Author Contributions
Nibra Yasin: conceptualization, investigation, funding acquisition, writing – original draft, writing – review and editing, project administration. Elif Bilgic: writing – review and editing, writing – original draft, funding acquisition, investigation, conceptualization, project administration, supervision. Mohammad S. Zubairi: conceptualization, investigation, funding acquisition, writing – original draft, writing – review and editing, project administration, supervision.

Acknowledgements
We would like to extend our gratitude to Dr. Quang Ngo and Dr. James Leung for their invaluable support and contribution to this work.

Funding
This work was funded through the Department of Pediatrics Education Endowment Fund, McMaster University.

Conflicts of Interest
The authors declare no conflicts of interest.

Data Availability Statement
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,303 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,155 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,555 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,453 citations