This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
Citations: 0 · Authors: 9 · Year: 2025
Abstract
Large language models (LLMs) such as GPT-4 and LLaMA-3 exploit the powerful in-context learning (ICL) capability of the Transformer architecture to learn on the fly from a handful of examples. While ICL underpins many LLM applications, its full potential remains hindered by a limited understanding of its generalization boundaries and vulnerabilities. We present a systematic investigation of transformers' generalization capability with ICL relative to training-data coverage, defining a task-centric framework along three dimensions: inter-problem, intra-problem, and intra-task generalization. Through extensive simulated and real-world experiments, covering tasks such as function fitting, API calling, and translation, we find that transformers lack inter-problem generalization with ICL but excel at intra-task and intra-problem generalization. Training data that mixes a greater variety of tasks significantly enhances the generalization ability of ICL on unseen tasks, and even on known simple tasks. This suggests designing training data to maximize the diversity of covered tasks and to combine different tasks wherever possible, rather than focusing solely on the target test task.
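To make the function-fitting setting mentioned in the abstract concrete, the following is a minimal, hypothetical sketch of how an ICL probe might be constructed: a few-shot prompt is assembled from (x, f(x)) example pairs plus a query point, and a model's completion is then compared to the true function value. The function `make_icl_prompt`, the prompt format, and the choice of a linear function are all illustrative assumptions, not the authors' actual experimental setup.

```python
# Hypothetical sketch of an ICL function-fitting probe (illustrative only;
# the prompt format and names are assumptions, not the paper's actual setup).

def make_icl_prompt(fn, xs, query_x):
    """Format (x, f(x)) example pairs plus a query into a few-shot prompt."""
    lines = [f"x = {x} -> y = {fn(x)}" for x in xs]
    lines.append(f"x = {query_x} -> y =")  # the model is asked to complete this line
    return "\n".join(lines)

# Intra-problem generalization probe: same function, but the query point
# does not appear among the in-context examples.
linear = lambda x: 2 * x + 1
prompt = make_icl_prompt(linear, xs=[0, 1, 2, 3], query_x=5)
print(prompt)
```

Under this framing, inter-problem generalization would correspond to querying a function family never seen during training, while intra-task generalization would vary parameters (here, slope and intercept) within a seen family.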
Similar Works
Rethinking the Inception Architecture for Computer Vision
2016 · 30,316 cit.
MobileNetV2: Inverted Residuals and Linear Bottlenecks
2018 · 24,385 cit.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2020 · 21,292 cit.
CBAM: Convolutional Block Attention Module
2018 · 21,257 cit.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
2015 · 18,488 cit.