This repository is for the paper RGB-Event Fusion for Moving Object Detection in Autonomous Driving, by Zhuyun Zhou, Zongwei Wu, Rémi Boutteau, Fan Yang, Cédric Demonceaux, Dominique Ginhac.
A PDF version of the paper is available here.
The DSEC-MOD (DSEC - Moving Object Detection) dataset can be found here.
Moving Object Detection (MOD) is a critical vision task for successfully achieving safe autonomous driving. Despite plausible results of deep learning methods, most existing approaches are only frame-based and may fail to reach reasonable performance when dealing with dynamic traffic participants. Recent advances in sensor technologies, especially the Event camera, can naturally complement the conventional camera approach to better model moving objects. However, event-based works often adopt a pre-defined time window for event representation, and simply integrate it to estimate image intensities from events, neglecting much of the rich temporal information from the available asynchronous events. Therefore, from a new perspective, we propose RENet, a novel RGB-Event fusion Network, that jointly exploits the two complementary modalities to achieve more robust MOD under challenging scenarios for autonomous driving. Specifically, we first design a temporal multi-scale aggregation module to fully leverage event frames from both the RGB exposure time and larger intervals. Then we introduce a bi-directional fusion module to attentively calibrate and fuse multi-modal features. To evaluate the performance of our network, we carefully select and annotate a sub-MOD dataset from the commonly used DSEC dataset. Extensive experiments demonstrate that our proposed method performs significantly better than the state-of-the-art RGB-Event fusion alternatives.
- Mar. 17, 2023: Code of our RENet: RGB-Event fusion Network is released.
- Mar. 14, 2023: Dataset DSEC-MOD: DSEC - Moving Object Detection is released.
If you use any of this code or the dataset DSEC-MOD in your research, please cite the following work:
@inproceedings{zhou2023rgb,
title={Rgb-event fusion for moving object detection in autonomous driving},
author={Zhou, Zhuyun and Wu, Zongwei and Boutteau, R{\'e}mi and Yang, Fan and Demonceaux, C{\'e}dric and Ginhac, Dominique},
booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
pages={7808--7815},
year={2023},
organization={IEEE}
}
DSEC-MOD : DSEC - Moving Object Detection can be downloaded here: Training and Testing.
In total, our DSEC-MOD dataset contains 16 sequences (13314 frames), with 11 sequences (10495 frames) for training and 5 other sequences (2819 frames) for testing.
In each sequence:
- gt_bb: ground truth bounding boxes of moving objects;
- rgb_calib: RGB frames calibrated to the event-based coordinates, so that RGB and event maps have the same field of view and the same resolution;
- events: event data from the left sensor.
The directory layout should be:
└── DSEC_MOD
    ├── training
    │   ├── zurich_city_00_a
    │   │   ├── gt_bb
    │   │   │   ├── 000001.txt
    │   │   │   └── ...
    │   │   ├── rgb_calib
    │   │   │   ├── 000001.png
    │   │   │   └── ...
    │   │   └── events
    │   │       └── left
    │   │           ├── events.h5
    │   │           └── rectify_map.h5
    │   └── ...
    └── testing
        ├── zurich_city_13_a
        │   └── ...
        └── ...
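Before running anything, it can help to check that a local copy matches the layout above. The following is a minimal sketch, not part of the released code: the function name `list_sequences` is hypothetical, and it only assumes the folder and file names shown in the tree (`gt_bb/*.txt`, `rgb_calib/*.png`, `events/left/events.h5`).

```python
from pathlib import Path


def list_sequences(root, split):
    """Summarise every sequence folder found under root/split.

    Hypothetical helper: it relies only on the folder names shown in the
    directory tree and does not read any annotation or event content.
    """
    summary = {}
    split_dir = Path(root) / split
    for seq_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        summary[seq_dir.name] = {
            # Number of ground-truth annotation files
            "gt_bb": len(list((seq_dir / "gt_bb").glob("*.txt"))),
            # Number of calibrated RGB frames
            "rgb_calib": len(list((seq_dir / "rgb_calib").glob("*.png"))),
            # Whether the left event file is present
            "has_events": (seq_dir / "events" / "left" / "events.h5").is_file(),
        }
    return summary
```

Calling `list_sequences("DSEC_MOD", "training")` returns one entry per sequence folder. A sequence with zero counts or a missing `events.h5` probably indicates an incomplete download or a misplaced folder.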
DSEC is available here: https://dsec.ifi.uzh.ch.
Details can be found in the paper DSEC: A Stereo Event Camera Dataset for Driving Scenarios.
DSEC-MOS is available here: https://github.com/ZZY-Zhou/DSEC-MOS.
Details can be found in the paper DSEC-MOS: Segment Any Moving Object with Moving Ego Vehicle.
Pre-trained weights for RENet can be downloaded here.
To reproduce the experimental results reported in our paper, events should be pre-processed at 3 temporal scales (15 ms, 30 ms, 50 ms). Details can be found in Section III-A, E-TMA: Event-based Temporal Multi-scale Aggregation, of our paper.
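To illustrate what pre-processing at several temporal scales can look like, here is a minimal sketch, not the pre-processing used in the paper. It rests on several assumptions: events are given as `(t, x, y, polarity)` tuples with timestamps in microseconds, each window ends at the reference timestamp of the RGB frame, and an event frame is a signed per-pixel count. The function names are hypothetical; refer to Section III-A of the paper for the actual representation.

```python
def accumulate_events(events, t_ref_us, window_ms, height, width):
    """Sum signed polarities of the events falling in (t_ref - window, t_ref].

    Assumption: each event is a (t_us, x, y, polarity) tuple, where a
    polarity > 0 counts as +1 and any other value counts as -1.
    """
    t_start = t_ref_us - window_ms * 1000  # milliseconds -> microseconds
    frame = [[0] * width for _ in range(height)]
    for t, x, y, p in events:
        if t_start < t <= t_ref_us:
            frame[y][x] += 1 if p > 0 else -1
    return frame


def multi_scale_frames(events, t_ref_us, height, width, windows_ms=(15, 30, 50)):
    """Build one event frame per temporal scale, keyed by window length in ms."""
    return {
        w: accumulate_events(events, t_ref_us, w, height, width)
        for w in windows_ms
    }
```

With this alignment, every event inside the 15 ms window is also inside the 30 ms and 50 ms windows, so the longer windows only add older events.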
Then run the inference:
python3 det.py --task stream --model ROOT_OF_MODEL/best_model.pth --inference_dir PATH_TO_INF
The following command computes the frame or video mAP:
python3 ACT.py --task TASK_NAME --th THRESHOLD --inference_dir PATH_TO_INF
For instance:
python3 ACT.py --task frameAP --th 0.5 --inference_dir PATH_TO_INF
or
python3 ACT.py --task videoAP --th 0.2 --inference_dir PATH_TO_INF
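The meaning of `--th` is not spelled out above. In detection benchmarks, such a threshold is commonly the minimum intersection-over-union (IoU) between a predicted box and a ground-truth box for the prediction to count as correct. Assuming that reading, and assuming boxes in `(x1, y1, x2, y2)` corner format (the `gt_bb` file format is not described here), a minimal sketch of the matching criterion is given below. The function names are hypothetical and do not come from `ACT.py`.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, th=0.5):
    """Return True when the predicted box overlaps the ground truth enough."""
    return iou(pred_box, gt_box) >= th
```

Under this reading, two equally sized boxes that overlap by half have an IoU of 1/3, so they would match at a threshold of 0.2 but not at 0.5.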
The initial pre-trained weights are also available: a ResNet-101 backbone for the RGB stream and a ResNet-18 backbone for the Event stream.
- Clone
git clone https://github.com/ZZY-Zhou/RENet
cd RENet
- Create and activate conda environment
conda create -n ENV_NAME
conda activate ENV_NAME