Recently, 3D object detection algorithms based on radar and camera fusion
have shown excellent performance, setting the stage for their application in
autonomous driving perception tasks. Existing methods have focused on dealing
with feature misalignment caused by the domain gap between radar and camera.
However, existing methods either neglect inter-modal feature interaction
during alignment or fail to effectively align features at the same spatial
location across modalities. To alleviate these problems, we propose a new
alignment model called Radar Camera Alignment (RCAlign). Specifically, we
design a Dual-Route Alignment (DRA) module based on contrastive learning to
align and fuse the features between radar and camera.
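The abstract gives no implementation details, so the following is only a minimal sketch of what contrastive alignment of radar and camera BEV features could look like: an InfoNCE-style loss treating features at the same BEV cell as positive pairs. All names, shapes, and the single-route formulation are illustrative assumptions, not the authors' DRA module (which uses two routes and also fuses features).

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(radar_bev, camera_bev, temperature=0.07):
    """InfoNCE-style loss pulling radar and camera features at the same
    BEV cell together and pushing apart features from different cells.

    radar_bev, camera_bev: (B, C, H, W) BEV feature maps (assumed shapes).
    """
    B, C, H, W = radar_bev.shape
    # Flatten the spatial grid: each BEV cell becomes one embedding.
    r = F.normalize(radar_bev.flatten(2).permute(0, 2, 1), dim=-1)   # (B, H*W, C)
    c = F.normalize(camera_bev.flatten(2).permute(0, 2, 1), dim=-1)  # (B, H*W, C)
    # Similarity between every radar cell and every camera cell.
    logits = torch.bmm(r, c.transpose(1, 2)) / temperature  # (B, HW, HW)
    # Positive pairs lie on the diagonal: the same spatial location.
    target = torch.arange(H * W, device=logits.device).expand(B, -1)
    # Symmetric loss: radar->camera and camera->radar.
    loss_rc = F.cross_entropy(logits.flatten(0, 1), target.flatten())
    loss_cr = F.cross_entropy(logits.transpose(1, 2).flatten(0, 1), target.flatten())
    return 0.5 * (loss_rc + loss_cr)
```

In practice one would likely subsample BEV cells, since the full HW-by-HW similarity matrix grows quadratically with grid size.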
Moreover, considering the
sparsity of radar BEV features, a Radar Feature Enhancement (RFE) module is
proposed to densify the radar BEV features using a knowledge distillation loss.
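One way to read the RFE idea is that a student head learns to produce denser radar BEV features by matching a denser teacher feature map. The sketch below uses a plain MSE distillation loss; the architecture, the teacher source (e.g., multi-sweep radar accumulation), and all names are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RadarFeatureEnhancer(nn.Module):
    """Hypothetical student head that densifies sparse radar BEV features."""

    def __init__(self, channels=64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, sparse_radar_bev):
        return self.refine(sparse_radar_bev)

def distillation_loss(student_feat, teacher_feat):
    """Match enhanced (student) radar BEV features to a denser teacher map."""
    return nn.functional.mse_loss(student_feat, teacher_feat.detach())

# Usage sketch: densify a sparse radar BEV map against a denser teacher.
enhancer = RadarFeatureEnhancer(channels=64)
sparse = torch.randn(2, 64, 128, 128)   # sparse radar BEV features (assumed shape)
teacher = torch.randn(2, 64, 128, 128)  # denser teacher features (assumed source)
loss = distillation_loss(enhancer(sparse), teacher)
```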
Experiments show that RCAlign achieves a new state of the art for radar-camera fusion 3D object detection on the public nuScenes benchmark.
Furthermore, RCAlign achieves a significant performance gain (4.3\% NDS and
8.4\% mAP) in real-time 3D detection compared to the latest state-of-the-art
method (RCBEVDet).