This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

RT-DETRv3: Real-Time End-to-End Object Detection with Hierarchical Dense Positive Supervision

2025 · 61 citations

Abstract

RT-DETR is the first real-time end-to-end transformer-based object detector. Its efficiency comes from its framework design and Hungarian matching. However, compared to dense supervision detectors like the YOLO series, Hungarian matching provides much sparser supervision, which leads to insufficient model training and makes optimal results difficult to achieve. To address these issues, we propose a hierarchical dense positive supervision method based on RT-DETR, named RT-DETRv3. First, we introduce a CNN-based auxiliary branch that provides dense supervision and collaborates with the original decoder to enhance the encoder's feature representation. Second, to address insufficient decoder training, we propose a novel learning strategy involving self-attention perturbation. This strategy diversifies label assignment for positive samples across multiple query groups, thereby enriching positive supervision. Additionally, we introduce a shared-weight decoder branch for dense positive supervision to ensure that more high-quality queries match each ground truth. Notably, all of the aforementioned modules are training-only. We conduct extensive experiments on COCO val2017 to demonstrate the effectiveness of our approach. RT-DETRv3 significantly outperforms existing real-time detectors, including the RT-DETR series and the YOLO series. For example, RT-DETRv3-R18 achieves 48.1% AP (+1.6%/+1.4%) compared to RT-DETR-R18/RT-DETRv2-R18, while maintaining the same latency. Furthermore, RT-DETRv3-R101 attains 54.6% AP, outperforming YOLOv10-X. The code will be released at https://github.com/clxia12/RT-DETRv3.
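
The contrast the abstract draws between sparse one-to-one matching and dense supervision can be illustrated with a minimal Python sketch. This is not the RT-DETRv3 implementation: the cost matrix, the function names, and the top-k rule are assumptions made for illustration only. A brute-force search stands in for the Hungarian algorithm, since both reach the same optimal assignment on inputs this small.

```python
from itertools import permutations


def one_to_one_match(cost):
    """Minimum-cost one-to-one assignment of queries to ground truths.

    cost[q][g] is a hypothetical matching cost between query q and ground
    truth g. Brute force over permutations reaches the same optimum as the
    Hungarian algorithm but is only viable for tiny inputs. Assumes there
    are at least as many queries as ground truths.
    """
    num_queries, num_gt = len(cost), len(cost[0])
    best_total, best_pairs = float("inf"), []
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total = total
            best_pairs = [(q, g) for g, q in enumerate(queries)]
    return sorted(best_pairs)


def one_to_many_match(cost, k):
    """Dense-style assignment: each ground truth takes its k cheapest queries.

    The top-k rule is an illustrative assumption, not the paper's rule.
    """
    num_queries, num_gt = len(cost), len(cost[0])
    pairs = []
    for g in range(num_gt):
        ranked = sorted(range(num_queries), key=lambda q: cost[q][g])
        pairs.extend((q, g) for q in ranked[:k])
    return sorted(pairs)


# Toy cost matrix: 6 queries (rows) x 2 ground-truth boxes (columns).
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.9, 0.2],
    [0.5, 0.5],
    [0.6, 0.6],
]

sparse = one_to_one_match(cost)      # one positive query per ground truth
dense = one_to_many_match(cost, k=2)  # k positive queries per ground truth
print(len(sparse), sparse)
print(len(dense), dense)
```

In this toy case, one-to-one matching yields two positive (query, ground truth) pairs out of six queries, while the top-2 rule yields four, which is the sense in which denser assignment supplies more positive training signal per image.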

Institutions

Baidu (China)

Topics

Advanced Neural Network Applications; Advanced Image and Video Retrieval Techniques; Medical Image Segmentation Techniques
