This is a PyTorch implementation of the CVPR 2022 paper "Cross Modal Background Suppression for Audio-Visual Event Localization".
We address audio-visual event localization (AVE), which requires a model to recognize the event category and localize the event boundaries when the event is both audible and visible at the same time.
Unlike previous methods, we approach audio-visual event localization from the viewpoint of cross-modal background suppression. We define the "background" category from two aspects: 1) If the audio and visual information in a short video segment do not represent the same event, the segment is labeled as background. 2) If an event occurs in only one modality and has a low probability in the other, that event category is labeled as background for the video, e.g., an off-screen voice.
Hence, this paper proposes a novel cross-modal background suppression method that operates at two levels, time-level and event-level, which allows the audio and visual modalities to serve as complementary supervisory signals for the AVE task.
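The two background rules above can be illustrated with a small labeling sketch. This is not the model from the paper, only a plain restatement of the definitions. The function names, the `"background"` string, and the `threshold` value of 0.5 are assumptions made for the example.

```python
from typing import Dict, List

BACKGROUND = "background"  # hypothetical label name used only in this sketch


def segment_labels(audio: List[str], visual: List[str]) -> List[str]:
    """Time-level rule: a segment keeps its event label only when both
    modalities name the same (non-background) event."""
    if len(audio) != len(visual):
        raise ValueError("audio and visual label sequences must have equal length")
    return [
        a if a == v and a != BACKGROUND else BACKGROUND
        for a, v in zip(audio, visual)
    ]


def suppressed_events(
    audio_prob: Dict[str, float],
    visual_prob: Dict[str, float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Event-level rule: return the categories that are confident in exactly
    one modality and should therefore be treated as background."""
    suppressed = []
    for event in sorted(set(audio_prob) | set(visual_prob)):
        heard = audio_prob.get(event, 0.0) >= threshold
        seen = visual_prob.get(event, 0.0) >= threshold
        if heard != seen:
            suppressed.append(event)
    return suppressed
```

For instance, a video where speech is clearly audible but barely visible (an off-screen voice) would have `"speech"` returned by `suppressed_events`, while a dog that is both heard and seen would be kept.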
This package has the following requirements:
Python 3.7.6
PyTorch 1.10.2
CUDA 11.4
h5py 2.10.0
numpy 1.21.5
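To compare an existing environment against the versions listed above, a quick check along the following lines may help. It is a minimal sketch: the helper names are made up for this example, and it only compares the `__version__` strings of the three Python packages (it does not check Python or CUDA).

```python
import importlib
from typing import Dict, Optional

# Versions from the requirements list; keys are the import names.
REQUIRED = {"torch": "1.10.2", "h5py": "2.10.0", "numpy": "1.21.5"}


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def same_release(installed: Optional[str], required: str) -> bool:
    """Compare versions while ignoring local build tags such as '+cu113'."""
    if installed is None:
        return False
    return installed.split("+")[0] == required


def check_environment(required: Dict[str, str] = REQUIRED) -> Dict[str, bool]:
    """Map each required module to True when the installed release matches."""
    return {
        name: same_release(installed_version(name), version)
        for name, version in required.items()
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or different version'}")
```

A mismatch does not necessarily mean the code will fail; the listed versions are simply the ones stated in the requirements.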
The VGG visual features can be downloaded from Visual_feature.
The VGG-like audio features can be downloaded from Audio_feature.
The noisy visual features used for the weakly-supervised setting can be downloaded from Noisy_visual_feature.
After downloading the features, please place them into the data folder.
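Once the files are in place, a short script can confirm that they landed in the data folder and show what they contain. The sketch below assumes the features are HDF5 files with a `.h5` extension, which is suggested by the h5py requirement but not stated here. The helper names are hypothetical, and no dataset names inside the files are assumed.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def list_feature_files(data_dir: str = "data") -> List[str]:
    """Return the sorted names of .h5 files found directly inside data_dir."""
    return sorted(p.name for p in Path(data_dir).glob("*.h5"))


def describe_h5(path: str) -> Dict[str, Tuple[int, ...]]:
    """Map every dataset name in an HDF5 file to its shape."""
    import h5py  # imported lazily so the listing helper works without h5py

    shapes: Dict[str, Tuple[int, ...]] = {}

    def record(name, obj):
        # visititems also visits groups; only datasets have a shape to report.
        if isinstance(obj, h5py.Dataset):
            shapes[name] = tuple(obj.shape)

    with h5py.File(path, "r") as handle:
        handle.visititems(record)
    return shapes


if __name__ == "__main__":
    for file_name in list_feature_files("data"):
        print(file_name, describe_h5(str(Path("data") / file_name)))
```

If the listing comes back empty, the files are probably in a different directory or use a different extension than assumed here.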
If you are interested in the AVE raw videos, please refer to this repo and download the AVE dataset.
The configs/main.json file contains the main hyper-parameters used for fully-supervised training.
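To inspect or tweak these hyper-parameters from Python, the JSON file can be read with the standard library. This is a minimal sketch: `load_config` and its `overrides` argument are hypothetical helpers, the keys inside the config are not assumed, and whether the training scripts accept a modified config in this form is not confirmed here.

```python
import json
from typing import Any, Dict, Optional


def load_config(path: str, overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Read a JSON config file and apply optional top-level overrides."""
    with open(path, "r", encoding="utf-8") as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError(f"expected a JSON object at the top level of {path}")
    merged = dict(config)  # copy so the parsed object is left untouched
    merged.update(overrides or {})
    return merged


if __name__ == "__main__":
    # Prints the top-level keys without assuming what they are.
    print(sorted(load_config("configs/main.json").keys()))
```

The same helper should work for configs/weak.json, since it only relies on the file being a JSON object.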
Training
bash supv_train.sh
Evaluating
bash supv_test.sh
The configs/weak.json file contains the main hyper-parameters used for weakly-supervised training.
Training
bash weak_train.sh
Evaluating
bash weak_test.sh
The pretrained models can be downloaded from Supervised model and WeaklySupervised model.
After downloading the pretrained models, please place them into the Exps folder.
If you retrain the model, you can try different hyper-parameters or random seeds; the results may be better.
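When experimenting with random seeds, it helps to seed every random number generator in one place. The helper below is a common pattern rather than code from this repository. The name `set_seed` is an assumption, and how the training scripts expose a seed option is not stated here.

```python
import random


def set_seed(seed: int) -> None:
    """Seed Python, and NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not available; skip
    try:
        import torch

        torch.manual_seed(seed)
        # Safe to call without a GPU; it is ignored when CUDA is unavailable.
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not available; skip


if __name__ == "__main__":
    set_seed(0)
    print(random.random())
```

Seeding makes runs more repeatable, but some GPU operations can still be non-deterministic, so small differences between runs with the same seed are possible.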
Part of our code is borrowed from the following repositories.
We thank the authors for releasing their code. Please also consider citing their work.