This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations
Citations: 23
Authors: 2
Year: 2024
Abstract
Large language models have the potential to be valuable in the healthcare industry, but it is crucial to verify their safety and effectiveness through rigorous evaluation. In our study, we evaluated LLMs, including Google's Gemini, across various medical tasks. Despite Gemini's capabilities, it underperformed compared to leading models like MedPaLM 2 and GPT-4, particularly in medical visual question answering (VQA), with a notable accuracy gap (Gemini at 61.45% vs. GPT-4V at 88%). Our analysis revealed that Gemini is highly susceptible to hallucinations, overconfidence, and knowledge gaps, which indicate risks if deployed uncritically. We also performed a detailed analysis by medical subject and test type, providing actionable feedback for developers and clinicians. To mitigate risks, we implemented effective prompting strategies that improved performance, and we contributed to the field by releasing a Python module for medical LLM evaluation and establishing a leaderboard on Hugging Face for ongoing research and development. The Python module is available at github.com/promptslab/RosettaEval
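The accuracy figures quoted above (e.g., Gemini at 61.45% vs. GPT-4V at 88%) follow the standard metric for multiple-choice and VQA benchmarks: the fraction of questions where the model's chosen option matches the answer key. The sketch below is purely illustrative and is not the RosettaEval API; the function and toy data are assumptions for demonstration.

```python
# Illustrative sketch (not the RosettaEval API): accuracy on
# multiple-choice medical QA, the metric behind scores like 61.45%.
def accuracy(predictions, answers):
    """Fraction of questions where the predicted option matches the key."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Hypothetical toy data: model selects among options A-D.
preds = ["A", "C", "B", "D", "A"]
gold  = ["A", "C", "D", "D", "B"]
print(f"{accuracy(preds, gold):.2%}")  # → 60.00%
```

In practice, benchmark scores also depend on how the model's free-text output is mapped to an option letter, which is one place prompting strategies can shift reported accuracy.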
Similar Works
The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods
2009 · 5,711 cit.
The Stress Process
1981 · 4,480 cit.
Mental health problems and social media exposure during COVID-19 outbreak
2020 · 2,793 cit.
Cross-national prevalence and risk factors for suicidal ideation, plans and attempts
2008 · 2,633 cit.
Psychological Aspects of Natural Language Use: Our Words, Our Selves
2002 · 2,556 cit.