OpenAlex · Updated hourly · Last updated: 19.03.2026, 12:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A Cost-Aware Approach for Collaborating Large Language Models and Small Language Models


Citations: 0
Authors: 6
Year: 2025

Abstract

The emerging reasoning ability of large language models (LLMs) and the accompanying commercial applications offer a promising path for service providers to deploy intelligent agents in their own products through API calls. However, the black-box nature of LLMs has driven providers to rely on prompt tuning to improve reasoning quality and stay competitive, while the generated reasoning logic incurs additional service costs. Although some works have proposed collaborating LLMs with Small Language Models (SLMs) to reduce the frequency of LLM calls, most overlook the actual number of tokens exchanged with the LLM, so the cost can remain high. Furthermore, directly compressing the prompt to reduce tokens often leads to a significant loss of accuracy. To address these challenges, we propose a cost-aware approach for collaborating LLMs and SLMs, named Coco. In our method, a confidence-based task assignment scheme leverages the result confidence of the SLM to assess task complexity and determine whether LLM involvement is necessary. For complex tasks, the SLM adapts the input by compressing unnecessary information according to confidence. To counteract the potential loss of accuracy, prompt tuning-based reasoning optimization methods guide the LLM in generating both a reasoning logic sketch and the final result. Finally, logic alignment fuses the sketches from both models, ensuring the rationality of the reasoning logic. Experimental results on three open-source datasets demonstrate that our approach effectively reduces the cost of API calls to the LLM while preserving reasoning accuracy and the reasonableness of the generated logic.
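The confidence-based task assignment described in the abstract can be sketched as a simple routing rule: answer with the SLM when its confidence clears a threshold, and escalate to the LLM otherwise. The following is a minimal illustrative sketch, not the authors' implementation; the function names (`slm_predict`, `llm_predict`), the placeholder confidence heuristic, and the threshold value are all hypothetical.

```python
# Illustrative sketch of confidence-based SLM/LLM routing.
# All names and the threshold are assumptions, not from the paper.

CONF_THRESHOLD = 0.9  # assumed confidence cutoff for trusting the SLM


def slm_predict(task: str) -> tuple[str, float]:
    """Placeholder SLM: returns an answer and a confidence score.

    A real system would run a small local model and derive confidence
    from its output probabilities; here we fake it from task length.
    """
    answer = f"slm-answer({task})"
    confidence = 0.95 if len(task) < 20 else 0.5
    return answer, confidence


def llm_predict(task: str) -> str:
    """Placeholder LLM call (in practice, a paid API request)."""
    return f"llm-answer({task})"


def route(task: str) -> str:
    """Return the SLM answer when it is confident; otherwise escalate."""
    answer, confidence = slm_predict(task)
    if confidence >= CONF_THRESHOLD:
        return answer  # simple task: no LLM call, zero API cost
    return llm_predict(task)  # complex task: fall back to the LLM
```

Under this scheme, API cost scales with the fraction of tasks the SLM cannot handle confidently, rather than with the total number of tasks.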

Topics

Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education