OpenAlex · Updated hourly · Last updated: 24.03.2026, 08:24

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Knowledge Distillation in Mixture of Experts for Multi-Modal Medical LLMs

2024 · 2 citations
Open full text at the publisher

Citations: 2 · Authors: 3 · Year: 2024

Abstract

Large Language Models (LLMs) have made significant strides in recent years, performing a wide range of complex tasks across domains. However, in specialized fields like healthcare, LLMs often fall short due to their reliance on general-purpose data and their inability to effectively process domain-specific, multi-modal inputs. Medical data, which typically includes both textual and visual information (e.g., X-rays, MRIs), requires models capable of processing these diverse formats in a cohesive and efficient manner. The demand for lightweight, multi-modal systems that can operate in resource-constrained clinical environments is growing, as many healthcare settings lack the computational power required for large-scale models.

In this work, we present a novel framework designed to optimize multi-modal medical tasks by leveraging pre-trained Mixture of Experts (MoE) models. We avoid redundant training by copying the weights from the Med MoE model, including the vision encoder, word embedding, and self-attention modules. The frozen weights allow the model to efficiently inherit general medical knowledge without further training. The next phase trains task-specific expert models through knowledge distillation, transferring the capabilities of larger models to smaller, specialized experts. This architecture enables the system to handle both discriminative and generative tasks while significantly reducing computational overhead.

Our approach integrates both expert and Expert of Experts (EoE) models, inspired by the multi-disciplinary team (MDT) strategy used in clinical settings, to enhance the system's ability to address complex medical decision-making scenarios. By selectively activating relevant experts based on task demands, the framework provides scalable, high-performance learning across diverse medical contexts.
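The knowledge-distillation step described above can be illustrated with a minimal sketch. This is not the paper's exact objective; it assumes the standard temperature-scaled KL divergence between teacher and student logits (the temperature `T`, the `T**2` scaling, and the function names are illustrative assumptions):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    A higher T softens both distributions so the student also learns
    from the teacher's relative rankings of unlikely classes; the T**2
    factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # student predictions
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()
```

When the student matches the teacher exactly the loss is zero; any mismatch in the softened distributions yields a positive penalty, which is what drives the transfer from the larger model to the smaller expert.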
Comprehensive experiments demonstrate that our framework achieves strong performance with a significantly reduced parameter count, offering a practical and efficient solution for real-world medical applications.

Related works

Authors

Institutions

Topics

Artificial Intelligence in Healthcare · Artificial Intelligence in Healthcare and Education · Biomedical Text Mining and Ontologies