OpenAlex · Updated hourly · Last updated: 24.03.2026, 11:09

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Collaborative intelligence in AI: Evaluating the performance of a council of AIs on the USMLE

2025 · 1 citation · PLOS Digital Health · Open Access

Citations: 1 · Authors: 16 · Year: 2025

Abstract

The stochastic nature of next-token generation and the resulting response variability in Large Language Model (LLM) outputs pose challenges in ensuring consistency and accuracy on knowledge assessments. This study introduces a novel multi-agent framework, referred to as a "Council of AIs", to enhance LLM performance through collaborative decision-making. The Council consists of multiple GPT-4 instances that iteratively discuss and reach consensus on answers, facilitated by a designated "Facilitator AI." This methodology was applied to 325 United States Medical Licensing Exam (USMLE) questions across all three exam stages: Step 1, focusing on biomedical sciences; Step 2, evaluating clinical knowledge (CK); and Step 3, evaluating readiness for independent medical practice. The Council reached consensus answers that were correct 97%, 93%, and 94% of the time for Step 1, Step 2 CK, and Step 3, respectively, outperforming single-instance GPT-4 models. When the initial responses were not unanimous, Council deliberation reached a consensus that was the correct answer 83% of the time, and the Council corrected over half (53%) of the responses that majority vote had gotten incorrect. After discussion, the odds of a majority-vote response changing from incorrect to correct were 5 times (95% CI: 1.1, 22.8) higher than the odds of it changing from correct to incorrect. This study provides the first evidence that the semantic entropy of the response space can consistently be reduced to zero, demonstrated here through Council deliberation, which suggests that other mechanisms may achieve the same outcome. It also shows that in a Council model, response variability, often considered a limitation, can be transformed into a strength that supports adaptive reasoning and collaborative refinement of answers. These findings suggest new paradigms for AI implementation and reveal the heightened strength that emerges when AIs begin to collaborate as a collective rather than operate alone.
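The deliberation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the agents below are hypothetical stubs rather than GPT-4 instances, the Facilitator is reduced to a loop that shows every agent the previous round's answers, and "semantic entropy" is approximated as the Shannon entropy over multiple-choice answer labels. All function names (`deliberate`, `answer_entropy`, `fixed`, `swayable`) are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical agent interface: (question, answers from the previous round) -> answer label.
Agent = Callable[[str, List[str]], str]


def answer_entropy(answers: List[str]) -> float:
    """Shannon entropy in bits of the answer labels; 0.0 when every answer matches."""
    total = len(answers)
    return sum((n / total) * math.log2(total / n) for n in Counter(answers).values())


def deliberate(question: str, agents: List[Agent], max_rounds: int = 5) -> Optional[str]:
    """Re-poll all agents until they agree; return None if no consensus is reached."""
    answers = [agent(question, []) for agent in agents]
    for _ in range(max_rounds):
        if len(set(answers)) == 1:
            break
        # Facilitator step (simplified assumption): show the last round and ask again.
        answers = [agent(question, answers) for agent in agents]
    return answers[0] if len(set(answers)) == 1 else None


def fixed(label: str) -> Agent:
    """Stub agent that never changes its answer."""
    return lambda question, previous: label


def swayable(initial: str) -> Agent:
    """Stub agent that starts with `initial`, then adopts the previous round's majority."""
    def agent(question: str, previous: List[str]) -> str:
        if not previous:
            return initial
        return Counter(previous).most_common(1)[0][0]
    return agent


# Initial answers are B, A, B (not unanimous); one round of discussion aligns them.
council = [fixed("B"), swayable("A"), swayable("B")]
consensus = deliberate("hypothetical question", council)
print(consensus)
print(answer_entropy(["B", "A", "B"]), answer_entropy(["B", "B", "B"]))
```

In this toy setup the entropy of the answer labels is positive before discussion and zero once the council agrees, which mirrors the abstract's claim in a very simplified form. Real agents would exchange reasoning, not just labels, and consensus is not guaranteed to be correct.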

Related works