This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance Characterization of Large Language Models on High-Speed Interconnects
Citations: 6
Authors: 5
Year: 2023
Abstract
Large Language Models (LLMs) have recently gained significant popularity due to their ability to generate human-like text and perform a wide range of natural language processing tasks. Training these models usually requires a large amount of computational resources and is often done in a distributed manner. The choice of high-speed interconnect can significantly influence the efficiency of distributed training. There is therefore a need for systematic studies exploring the distributed training characteristics of these models on high-speed interconnects. This paper presents a comprehensive performance characterization of representative large language models: GPT, BERT, and T5. We evaluate their training performance in terms of iteration time, interconnect utilization, and scalability over different high-speed interconnects and communication protocols, including TCP/IP, IPoIB, and RDMA. We observe that interconnects play a vital role in LLM training. Specifically, RDMA-100 Gbps outperforms IPoIB-100 Gbps and TCP/IP-10 Gbps by an average of 2.51x and 4.79x, respectively, in training iteration time, and achieves the highest interconnect utilization (up to 60 Gbps) in both strong and weak scaling, compared to up to 20 Gbps for IPoIB and up to 9 Gbps for TCP/IP, leading to the shortest training time. We also observe that larger models tend to have higher communication bandwidth requirements, especially for AllReduce during backward propagation, which can take up to 91.12% of training time. Through our evaluation, we identify opportunities to reduce communication time and thereby improve LLM training performance. We extensively explore and summarize the role communication plays in distributed LLM training.
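To illustrate why larger models stress the interconnect during gradient AllReduce, the following is a minimal back-of-envelope sketch. It uses the standard traffic formula for a bandwidth-optimal ring AllReduce (a textbook result, not taken from this paper); the parameter count and worker count are hypothetical examples, not figures from the study.

```python
def ring_allreduce_bytes(param_count: int, workers: int,
                         bytes_per_param: int = 4) -> int:
    """Bytes each worker sends for one ring AllReduce over the full gradient.

    A bandwidth-optimal ring AllReduce moves 2 * (N - 1) / N times the
    gradient size per worker (reduce-scatter plus all-gather phases).
    """
    grad_bytes = param_count * bytes_per_param  # fp32 gradients assumed
    return int(2 * (workers - 1) / workers * grad_bytes)


def ideal_transfer_seconds(traffic_bytes: int, link_gbps: float) -> float:
    """Lower-bound transfer time on a link of the given bandwidth (Gbit/s)."""
    return traffic_bytes * 8 / (link_gbps * 1e9)


if __name__ == "__main__":
    # Hypothetical 1.5-billion-parameter model trained on 8 data-parallel workers.
    traffic = ring_allreduce_bytes(1_500_000_000, 8)
    for gbps in (10, 100):
        secs = ideal_transfer_seconds(traffic, gbps)
        print(f"{gbps:>3} Gbps link: {secs:.2f} s per AllReduce (ideal)")
```

Even this idealized model, which ignores latency and protocol overhead, shows a 10x gap in per-iteration communication time between a 10 Gbps and a 100 Gbps link, consistent with the abstract's observation that interconnect bandwidth dominates training efficiency for large models.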