This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
RT-DETRv3: Real-Time End-to-End Object Detection with Hierarchical Dense Positive Supervision
61
Citations
4
Authors
2025
Year
Abstract
RT-DETR is the first real-time end-to-end transformer-based object detector. Its efficiency comes from the framework design and the Hungarian matching. However, compared to dense supervision detectors like the YOLO series, the Hungarian matching provides much sparser supervision, leading to insufficient model training and making optimal results difficult to achieve. To address these issues, we propose a hierarchical dense positive supervision method based on RT-DETR, named RT-DETRv3. First, we introduce a CNN-based auxiliary branch that provides dense supervision and collaborates with the original decoder to enhance the encoder's feature representation. Second, to address insufficient decoder training, we propose a novel learning strategy involving self-attention perturbation. This strategy diversifies label assignment for positive samples across multiple query groups, thereby enriching positive supervision. Additionally, we introduce a shared-weight decoder branch for dense positive supervision to ensure that more high-quality queries match each ground truth. Notably, all of the aforementioned modules are training-only. We conduct extensive experiments on COCO val2017 to demonstrate the effectiveness of our approach. RT-DETRv3 significantly outperforms existing real-time detectors, including the RT-DETR series and the YOLO series. For example, RT-DETRv3-R18 achieves 48.1% AP (+1.6%/+1.4%) compared to RT-DETR-R18/RT-DETRv2-R18, while maintaining the same latency. Furthermore, RT-DETRv3-R101 attains 54.6% AP, outperforming YOLOv10-X. The code will be released at https://github.com/clxia12/RT-DETRv3.
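The contrast the abstract draws between sparse one-to-one (Hungarian-style) matching and dense supervision can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper or its code: the cost matrix is invented, `one_to_one_match` is a brute-force stand-in that finds the same optimum a Hungarian solver would on this tiny input, and `top_k_match` is a generic one-to-many rule rather than the assignment RT-DETRv3 actually uses.

```python
from itertools import permutations

def one_to_one_match(cost):
    """Minimum-cost one-to-one assignment by brute force (hypothetical helper).

    cost[q][g] is the cost of assigning query q to ground truth g.
    Returns a list of (query, ground_truth) pairs, one per ground truth.
    """
    num_queries, num_gt = len(cost), len(cost[0])
    best_pairs, best_total = None, float("inf")
    # Try every ordered choice of distinct queries, one per ground truth.
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total = total
            best_pairs = [(q, g) for g, q in enumerate(queries)]
    return best_pairs

def top_k_match(cost, k):
    """Illustrative one-to-many rule: each ground truth takes its k cheapest queries."""
    pairs = []
    for g in range(len(cost[0])):
        ranked = sorted(range(len(cost)), key=lambda q: cost[q][g])
        pairs.extend((q, g) for q in ranked[:k])
    return pairs

# Invented costs for 6 queries (rows) and 2 ground-truth boxes (columns).
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.9, 0.2],
    [0.6, 0.9],
    [0.8, 0.7],
]

sparse = one_to_one_match(cost)  # one positive query per ground truth
dense = top_k_match(cost, k=3)   # three positive queries per ground truth

print(len(sparse), "positives with one-to-one matching:", sparse)
print(len(dense), "positives with top-k matching:", dense)
```

With one-to-one matching only two of the six queries receive a positive label, whereas the one-to-many rule labels all six. That difference in the number of positive training signals is the sparsity the abstract refers to.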
Similar works
Deep Residual Learning for Image Recognition
2016 · 216,020 citations
U-Net: Convolutional Networks for Biomedical Image Segmentation
2015 · 85,918 citations
ImageNet classification with deep convolutional neural networks
2017 · 75,547 citations
Very Deep Convolutional Networks for Large-Scale Image Recognition
2014 · 75,404 citations
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
2016 · 52,636 citations