OpenAlex · Updated hourly · Last updated: 2026-03-20, 04:26

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A benchmark for large language models in bioinformatics

2023 · 8 citations · Open Access

8 citations · 10 authors · Year: 2023

Abstract

The rapid advancements in artificial intelligence, particularly in Large Language Models (LLMs) such as GPT-4, Gemini, and LLaMA, have opened new avenues for computational biology and bioinformatics. We report the development of BioLLMBench, a novel framework designed to evaluate LLMs on bioinformatics tasks. This study assessed GPT-4, Gemini, and LLaMA through 2,160 experimental runs, focusing on 24 distinct tasks across six key areas: domain expertise, mathematical problem-solving, coding proficiency, data visualization, research paper summarization, and machine learning model development. Tasks ranged from fundamental to expert-level challenges, and each area was evaluated using seven specific metrics. A Contextual Response Variability Analysis was implemented to understand how model responses varied under different conditions. Results showed diverse performance: GPT-4 led in most tasks, achieving 91.3% proficiency in domain knowledge, while Gemini excelled in mathematical problem-solving with a 97.5% proficiency score. GPT-4 also outperformed the other models in machine learning model development, whereas Gemini and LLaMA struggled to generate executable code. All models faced challenges in research paper summarization, scoring below 40% on the ROUGE metric. Model performance variance increased when using a new chat window, though average scores remained similar. The study also discusses the limitations and potential misuse risks of these models in bioinformatics.
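The sub-40% summarization scores above are reported "using the ROUGE metric". The paper does not specify which ROUGE variant or implementation was used, but as a rough illustration of what such a score measures, here is a minimal sketch of ROUGE-1 F1, the unigram-overlap variant, computed from scratch (the function name and the whitespace tokenization are assumptions for this sketch, not the authors' protocol):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between reference and candidate.

    Precision = overlap / candidate length, recall = overlap / reference
    length; F1 is their harmonic mean. Tokenization here is plain
    lowercased whitespace splitting, a simplification.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection clips each unigram's count to the smaller side.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Example: 5 of 6 unigrams match on each side, so F1 = 5/6 ≈ 0.833.
score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
```

A score "below 40%" therefore means that, on average, well under half of the reference summary's words (weighted by precision as well) were recovered by the model-generated summary.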
