Repetitive Activity Counting by Sight and Sound (CVPR 2021)
Yunhua Zhang, Ling Shao, Cees G.M. Snoek
demo.mp4
- Python 3.7.4
- PyTorch 1.4.0
- librosa 0.8.0
- opencv 3.4.2
- tqdm 4.54.1
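- The dependencies can be installed with pip, for example (the exact opencv-python build below is an assumption; any package providing OpenCV 3.4.2 should work):
pip install torch==1.4.0 librosa==0.8.0 opencv-python==3.4.2.17 tqdm==4.54.1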
- We provide an example video and the corresponding audio file, which exhibit the scale variation challenge, for the demo code.
- The pretrained checkpoints of our model can be downloaded at this link.
- To run the demo code:
python run_demo.py
- The "VGGSound" folder is modified from the original VGGSound repository.
- The "sync_batchnorm" folder comes from this repository.
- As cited in the paper, the regression function for counting uses the technique proposed in this paper "Deep expectation of real and apparent age from a single image without facial landmarks".
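- Below is a minimal sketch of that expectation-based regression idea, assuming the count range is discretized into bins; the class name, feature size, and maximum count are illustrative and not taken from this repository.

import torch
import torch.nn as nn

class ExpectationCountingHead(nn.Module):
    # Illustrative DEX-style head: classify over discrete count bins,
    # then take the softmax-weighted expectation as the regressed count.
    def __init__(self, in_features=512, max_count=32):
        super().__init__()
        self.fc = nn.Linear(in_features, max_count)
        # assumed discretization: counts 1..max_count
        self.register_buffer("bins", torch.arange(1, max_count + 1).float())

    def forward(self, x):
        probs = torch.softmax(self.fc(x), dim=-1)   # (batch, max_count)
        return (probs * self.bins).sum(dim=-1)      # expected count per sample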
- Some variables with "sr" in the names (sample rate) are for the temporal stride decision module.
- The performance of the released model on Countix-AV and Extreme Countix-AV is slightly higher than that reported in the paper, due to some hyperparameter adjustments.
- In our experiment, we extract the audio files (.wav) from videos with "moviepy", using the following code:
import moviepy.editor as mp
# cut the segment of interest from the video, then write its audio track to a .wav file
clip = mp.VideoFileClip(path_to_video).subclip(start_time, end_time)
clip.audio.write_audiofile(path_for_save)
If you want our extracted audio files, please send me an email or create an issue with your email address.
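A possible batch version of the extraction above, assuming pandas is available and that the released CSVs contain columns named video_id, repetition_start, and repetition_end (adjust these to the actual headers):

import pandas as pd
import moviepy.editor as mp

df = pd.read_csv("CountixAV_train.csv")
for _, row in df.iterrows():
    # cut each annotated segment and save its audio track as .wav
    clip = mp.VideoFileClip("videos/%s.mp4" % row["video_id"]).subclip(row["repetition_start"], row["repetition_end"])
    clip.audio.write_audiofile("audio/%s.wav" % row["video_id"])
    clip.close()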
For the following code, we train the modules separately, so two NVIDIA 1080Ti GPUs are enough for training. The visual model is trained on Countix, while the audio model and the cross-modal modules are trained on Countix-AV. The resulting overall model is meant to be tested on Countix-AV; to test on the Countix dataset, the reliability estimation module should be retrained on Countix. The hyperparameters influence performance to some extent; see the supplementary material for more details. Specifically, we try the number of branches from 20 to 50 to find the best one, and for the margin of the temporal stride decision module we try values from 1.0 to 3.0.
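As a toy illustration of how a reliability score for the audio stream could blend the two modalities' count predictions at test time (this is a simplified stand-in, not the exact fusion rule of the released code):

import torch

def fuse_counts(visual_count, audio_count, reliability):
    # reliability in [0, 1]: weight the audio prediction by how trustworthy
    # the sound signal is estimated to be, and the visual one by the rest
    return reliability * audio_count + (1.0 - reliability) * visual_count

# hypothetical values for one clip
print(fuse_counts(torch.tensor(7.2), torch.tensor(8.1), torch.tensor(0.3)))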
- Train the visual counting model
python train.py
Then, generate counting predictions with the trained model for each sample rate from 1 to 7 (a sketch of stride-based frame sampling is given below). After that, run this script to get the csv file for training the temporal stride decision module:
python generate_csv4sr.py
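The sample rate ("sr") here is a temporal stride over frames; a minimal sketch of sampling a clip at a given stride with OpenCV is shown below (the frame count and file name are placeholders, not the repository's actual settings):

import cv2

def sample_frames(video_path, stride, num_frames=64):
    # keep every stride-th frame, so larger strides span a longer time window
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while len(frames) < num_frames:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

# e.g. gather inputs for the stride sweep before running the visual model
# clips = {sr: sample_frames("example.mp4", sr) for sr in range(1, 8)}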
- Train the temporal stride decision module based on the visual modality only
python train_sr.py
- Train the temporal stride decision module based on sight and sound
python train_sr_audio.py
- Train the audio counting model
python train_audio.py
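For reference, a minimal librosa snippet for turning a .wav file into a log-mel spectrogram, a typical input representation for the sound stream; the sampling rate, FFT size, and number of mel bands here are assumptions, not the repository's settings:

import numpy as np
import librosa

y, sr = librosa.load("example.wav", sr=16000)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)   # (n_mels, time) array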
- Train the reliability estimation module
python train_conf.py
- Here we use the R(2+1)D model; replacing it with a stronger backbone, e.g. one from mmaction2, should yield better performance (a small sketch appears below).
- The code provided by https://github.com/Xiaodomgdomg/Deep-Temporal-Repetition-Counting is helpful.
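- As a sketch of the backbone swap mentioned above, torchvision's R(2+1)D-18 can be loaded as a clip-level feature extractor, assuming torchvision is installed; the input resolution and clip length below are just examples, and the repository's actual backbone configuration may differ.

import torch
from torchvision.models.video import r2plus1d_18

backbone = r2plus1d_18(pretrained=True)
backbone.fc = torch.nn.Identity()        # keep the 512-d pooled clip feature
clip = torch.randn(1, 3, 16, 112, 112)   # (batch, channels, frames, height, width)
features = backbone(clip)                # -> (1, 512)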
We provide the train, validation, and test sets of the Countix-AV dataset in CountixAV_train.csv, CountixAV_val.csv, and CountixAV_test.csv.
The dataset can be downloaded at this link.
If you have any problems with the code, feel free to send an email to me: y.zhang9@uva.nl or create an issue.