This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Enhanced reasoning and task planning for surgical autonomy using multi-modal large language models with gradual learning
Citations: 0
Authors: 5
Year: 2026
Abstract
Large language models (LLMs) have been widely adopted in robotic applications in recent years, but their ability to plan long-horizon, complex tasks remains a challenge. In this work, we present a gradual learning method to address this challenge and explore its usability in surgical training tasks that demand high levels of reasoning, such as peg transfer and the sliding puzzle task. Experiments were conducted on the da Vinci Research Kit (dVRK) and in a simulation environment, with environment feedback initiating follow-up prompts for the LLM when necessary. Results showed that for complex tasks, the gradual learning method outperformed the direct approach in the LLM's task and motion planning, requiring fewer follow-up prompts and achieving higher success rates with faster execution. This suggests that for complex pseudo-surgical tasks it is more efficient to have the LLM solve simpler versions of the task while incrementally increasing the complexity, rather than tackling the full complex task at once. The approach shows promise for enhancing robot-assisted surgery, where tasks are complex, long-horizon, and demand high reasoning abilities.
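The abstract's gradual-learning loop can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: `query_llm` and `run_plan` are hypothetical stand-ins for the actual LLM call and robot/simulator execution, and the difficulty schedule is assumed.

```python
def query_llm(prompt):
    # Hypothetical stand-in for a real LLM call; returns a plan string.
    return f"plan for: {prompt}"

def run_plan(plan, difficulty):
    # Hypothetical stand-in for executing the plan on the dVRK or simulator.
    # Returns (success, feedback); here execution always "succeeds".
    return True, "ok"

def gradual_learning(task, max_difficulty, max_followups=3):
    """Solve simplified versions of `task` first, raising difficulty each round."""
    history = []  # solved rounds carried as context into later prompts
    for difficulty in range(1, max_difficulty + 1):
        prompt = f"{task} (difficulty {difficulty}); prior context: {history}"
        plan = query_llm(prompt)
        success, feedback = run_plan(plan, difficulty)
        followups = 0
        # Environment feedback initiates follow-up prompts on failure,
        # mirroring the feedback loop described in the abstract.
        while not success and followups < max_followups:
            plan = query_llm(f"{prompt}; execution feedback: {feedback}")
            success, feedback = run_plan(plan, difficulty)
            followups += 1
        if not success:
            return None  # even the simplified version could not be solved
        history.append((difficulty, plan))
    return history[-1][1]  # plan for the full-complexity task

plan = gradual_learning("peg transfer", max_difficulty=3)
```

The key design point is that each round's solution is fed back as context, so the final, full-complexity prompt builds on previously solved simpler instances rather than starting from scratch.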
Related Works
MizAR 60 for Mizar 50
2023 · 75,195 citations
ImageNet: A large-scale hierarchical image database
2009 · 60,998 citations
Microsoft COCO: Common Objects in Context
2014 · 41,540 citations
Fully convolutional networks for semantic segmentation
2015 · 36,566 citations
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,767 citations