This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Efficient Inference Offloading for Mixture-of-Experts Large Language Models in Internet of Medical Things
Citations: 6
Authors: 4
Year: 2024
Abstract
Despite recent significant advances in large language models (LLMs) for medical services, the difficulty of deploying LLMs in e-healthcare hinders complex medical applications in the Internet of Medical Things (IoMT). People are increasingly concerned about e-healthcare risks and privacy protection. Existing LLMs struggle to provide accurate answers to medical questions (Q&As) and to meet the resource demands of deployment in the IoMT. To address these challenges, we propose MedMixtral 8x7B, a new medical LLM based on the mixture-of-experts (MoE) architecture with an offloading strategy, enabling deployment on the IoMT and improving privacy protection for users. Additionally, we find that the factors that most affect latency are the device interconnection method, the location of the offloading servers, and disk speed.
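The abstract names an MoE architecture with an offloading strategy but does not specify its mechanics. As a rough illustration only, not the paper's method, the sketch below shows one common offloading pattern: expert weights parked on slow storage (CPU here, standing in for disk) are moved to the compute device only when the router selects them, with a small LRU cache bounding on-device memory. All names and parameters (`OffloadedMoELayer`, `cache_size`, the toy dimensions) are hypothetical.

```python
# Hypothetical sketch of expert offloading in an MoE layer; the actual
# MedMixtral 8x7B offloading strategy may differ.
from collections import OrderedDict
import torch
import torch.nn as nn

class OffloadedMoELayer(nn.Module):
    def __init__(self, num_experts=8, d_model=64, d_ff=256, cache_size=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        # Experts live on "slow" storage (CPU here, standing in for disk)
        # and are kept in a plain list so .to() never moves them all at once.
        self.cpu_experts = [
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ]
        self.cache = OrderedDict()  # expert id -> on-device expert (LRU order)
        self.cache_size = cache_size

    def _fetch(self, idx, device):
        # LRU policy: evict the least recently used expert when the cache
        # is full, then move the requested expert onto the compute device.
        if idx in self.cache:
            self.cache.move_to_end(idx)
        else:
            if len(self.cache) >= self.cache_size:
                self.cache.popitem(last=False)
            self.cache[idx] = self.cpu_experts[idx].to(device)
        return self.cache[idx]

    def forward(self, x, top_k=2):
        scores = self.router(x)                       # (batch, num_experts)
        weights, chosen = scores.topk(top_k, dim=-1)  # top-k routing
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(top_k):
            for idx in chosen[:, k].unique().tolist():
                mask = chosen[:, k] == idx
                expert = self._fetch(idx, x.device)
                out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

layer = OffloadedMoELayer()
y = layer(torch.randn(4, 64))  # only routed experts are fetched on demand
```

Under such a scheme, every cache miss triggers a weight transfer, which is consistent with the abstract's finding that disk speed and the interconnect between storage and the compute device dominate latency.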
Similar Works
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
2020 · 22,609 citations
La certeza de lo impredecible: Cultura Educación y Sociedad en tiempos de COVID19
2020 · 19,271 citations
A Multi-Modal Distributed Real-Time IoT System for Urban Traffic Control (Invited Paper)
2024 · 14,254 citations
UNet++: A Nested U-Net Architecture for Medical Image Segmentation
2018 · 8,503 citations
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
2021 · 7,117 citations