Radar-Camera Fusion with Deep Neural Networks for Automotive Object Detection

Reliable environment perception is critical for advanced driver assistance systems (ADAS) and automated driving systems, which rely heavily on sensor data from cameras, LiDARs, and radars. This dissertation focuses on the promising yet under-researched area of radar-camera fusion via deep neural networks. Radars and cameras complement each other well: radars provide robust measurements in challenging conditions, while cameras offer high-resolution visual data for object classification and accurate localization. Radar-camera fusion is challenging because the data from the two sensors differ in semantic level. Integrating sparse radar point clouds with detailed images raises the question of where within the neural network the fusion should take place. This thesis introduces Fusion Point Pruning, a technique that identifies ideal fusion points during training. Additionally, a novel projection method maps enriched radar data onto the image plane and increases data density based on azimuth uncertainty, which improves 2D object detection. Another challenge is that radars and cameras measure in different coordinate systems. Traditional fusion methods either lose the radar’s geometric information when merging on the image plane or require high computational resources to fuse in 3D space. This thesis proposes RC-BEVFusion, a new approach that merges radar and camera data in the bird’s eye view plane, leveraging advancements in camera-based object detection. This method utilizes two radar encoders and integrates seamlessly with existing image-only detection networks, significantly enhancing 3D object detection performance. An additional cross-dataset study on sensor quality and scale indicates that image networks benefit from large-scale, varied visual inputs, while radar networks rely more on resolution than volume. To showcase the real-world applicability of the proposed algorithms, a data processing pipeline was developed for a test vehicle equipped with multiple sensors. The pipeline includes extensive data pre-processing, calibration, and real-time data management. A new dataset was generated using a proposed automated labeling pipeline based on LiDAR data. Domain adaptation techniques were employed to enhance network performance on the generated dataset. The optimized networks were then tested in real time on an Edge AI device, balancing computational efficiency and detection accuracy. The findings contribute valuable insights into sensor fusion and automotive object detection.
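
As a rough illustration of the image-plane projection with azimuth-based densification mentioned in the abstract, the following sketch spreads each radar detection over several azimuth angles before projecting it with a pinhole camera model. It is not the thesis's implementation: the function name `project_radar_with_azimuth_spread`, the uniform sampling within one standard deviation `sigma_az`, the fixed point `height`, and the assumption that radar detections are already expressed in the camera frame are all hypothetical choices made for this example.

```python
import numpy as np

def project_radar_with_azimuth_spread(ranges, azimuths, sigma_az, K,
                                      n_samples=5, height=0.0):
    """Project radar detections to pixels, densified along azimuth.

    Assumptions (hypothetical, for illustration only):
      - detections are given as range [m] and azimuth [rad] in the camera frame,
        with x right, y down, z forward, and azimuth 0 along the z axis;
      - each detection is replicated at n_samples azimuths spaced evenly
        in [azimuth - sigma_az, azimuth + sigma_az];
      - all points lie at a fixed y coordinate `height`.
    Returns (uv, depth): pixel coordinates of shape (M, 2) and ranges of shape (M,).
    """
    ranges = np.asarray(ranges, dtype=float)
    azimuths = np.asarray(azimuths, dtype=float)

    # Azimuth offsets; a single sample keeps the measured azimuth unchanged.
    if n_samples == 1:
        offsets = np.zeros(1)
    else:
        offsets = np.linspace(-sigma_az, sigma_az, n_samples)

    # Replicate every detection once per offset.
    az = (azimuths[:, None] + offsets[None, :]).ravel()
    r = np.repeat(ranges, offsets.size)

    # Polar to Cartesian in the assumed camera frame.
    x = r * np.sin(az)
    z = r * np.cos(az)
    y = np.full_like(x, height)
    pts = np.stack([x, y, z], axis=0)  # shape (3, M)

    # Keep only points in front of the camera.
    in_front = pts[2] > 1e-6
    pts, r = pts[:, in_front], r[in_front]

    # Pinhole projection with the intrinsic matrix K.
    uvw = K @ pts
    uv = (uvw[:2] / uvw[2]).T
    return uv, r


# Example intrinsics (made-up values): focal length 1000 px, principal point (640, 360).
K_example = np.array([[1000.0, 0.0, 640.0],
                      [0.0, 1000.0, 360.0],
                      [0.0, 0.0, 1.0]])
uv_example, depth_example = project_radar_with_azimuth_spread(
    ranges=[10.0], azimuths=[0.0], sigma_az=0.05, K=K_example, n_samples=3)
```

In this sketch a single detection straight ahead becomes three image points arranged symmetrically around the principal point, so a larger azimuth uncertainty yields a wider horizontal footprint in the image.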

Metadata
Author:Lukas Stefan Stäcker
URN:urn:nbn:de:hbz:386-kluedo-90595
DOI:https://doi.org/10.26204/KLUEDO/9059
Advisor:Didier Stricker
Document Type:Doctoral Thesis
Cumulative document:No
Language of publication:English
Date of Publication (online):2025/06/14
Date of first Publication:2025/06/14
Publishing Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Granting Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Acceptance Date of the Thesis:2025/05/09
Date of the Publication (Server):2025/06/16
Tag:Deep Neural Networks; Object Detection; Radar-Camera Fusion
GND Keyword:Radar; Machine Vision; FMCW Radar; Deep Neural Network; Data Fusion
Page Number:XII, 127
Faculties / Organisational entities:Kaiserslautern - Fachbereich Informatik
CCS-Classification (computer science):I. Computing Methodologies / I.4 IMAGE PROCESSING AND COMPUTER VISION (REVISED)
DDC-Classification:0 General works, Computer science, Information science / 004 Computer science
Licence (German):Creative Commons 4.0 - Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND 4.0)