DSNet: A Flexible Detect-to-Summarize Network for Video Summarization [paper]
A PyTorch implementation of our paper DSNet: A Flexible Detect-to-Summarize Network for Video Summarization. Published in IEEE Transactions on Image Processing.
This project is developed on Ubuntu 18.04 with CUDA 9.0.176.
Create a virtual environment with Python 3.6, preferably using Anaconda.
conda create --name dsnet python=3.6
conda activate dsnet
Install the Python dependencies.
pip install -r requirements.txt
Download the pre-processed datasets, including the TVSum and SumMe datasets, into the datasets/ folder from either of the links below.
- (Baidu Cloud) Link: https://pan.baidu.com/s/1LUK2aZzLvgNwbK07BUAQRQ Extraction Code: x09b
- (Google Drive) https://drive.google.com/file/d/11ulsvk1MZI7iDqymw9cfL7csAYS0cDYH/view?usp=sharing
The datasets structure should now look like this:
DSNet
└── datasets/
    ├── eccv16_dataset_ovp_google_pool5.h5
    ├── eccv16_dataset_summe_google_pool5.h5
    ├── eccv16_dataset_tvsum_google_pool5.h5
    ├── eccv16_dataset_youtube_google_pool5.h5
    └── readme.txt
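Before training, it can help to confirm that the download was unpacked into the expected place. The following is a minimal sketch, not part of the repository: the helper name `missing_datasets` is hypothetical, and the file names are copied from the listing above. It assumes it is run from the DSNet root, so that `datasets/` is a relative path.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = [
    "eccv16_dataset_ovp_google_pool5.h5",
    "eccv16_dataset_summe_google_pool5.h5",
    "eccv16_dataset_tvsum_google_pool5.h5",
    "eccv16_dataset_youtube_google_pool5.h5",
]


def missing_datasets(root="datasets"):
    """Return the expected dataset files that are absent from `root`.

    Hypothetical helper for illustration; it only checks that the files
    exist and does not open or validate the HDF5 contents.
    """
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("All dataset files found.")
```

An empty result means all four .h5 files are in place.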
As with anchor-based models, to train anchor-free models on the canonical TVSum and SumMe splits, run
python train.py anchor-free --model-dir ../models/af_basic --splits ../splits/tvsum.yml ../splits/summe.yml --nms-thresh 0.4
Note that the NMS threshold is set to 0.4 for anchor-free models.
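To illustrate what the NMS threshold controls, here is a minimal sketch of greedy non-maximum suppression over temporal segments. It is a generic textbook version written for this explanation, not the code used in this repository; the names `temporal_iou` and `nms_1d` and the sample segments are hypothetical. A candidate segment is discarded when its temporal IoU with an already kept, higher-scoring segment exceeds the threshold.

```python
def temporal_iou(a, b):
    """IoU of two temporal segments given as (start, end) pairs."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(segments, scores, thresh=0.4):
    """Greedy 1D NMS (illustrative, not the repository's implementation).

    Visits segments from highest to lowest score and keeps a segment only
    if its IoU with every segment kept so far is at most `thresh`.
    Returns the indices of the kept segments.
    """
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(temporal_iou(segments[i], segments[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


# Made-up example: two pairs of heavily overlapping segments.
segments = [(0, 10), (2, 12), (20, 30), (21, 29)]
scores = [0.9, 0.8, 0.7, 0.6]
print(nms_1d(segments, scores))  # prints [0, 2]
```

In this example the second and fourth segments overlap their higher-scoring neighbours with IoU of about 0.67 and 0.8, so both are suppressed at a threshold of 0.4. A higher threshold would keep more overlapping segments.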
When evaluating anchor-free models, remember to specify the NMS threshold as 0.4.
python evaluate.py anchor-free --model-dir ../models/af_basic/ --splits ../splits/tvsum.yml ../splits/summe.yml --nms-thresh 0.4
To predict the summary of a raw video, use infer.py. For example, run
python infer.py anchor-based --ckpt-path ../models/custom/checkpoint/custom.yml.0.pt \
--source ../custom_data/videos/EE-bNr36nyA.mp4 --save-path ./output.mp4
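To summarize several raw videos, the same command can be generated once per file. The sketch below is an illustration under assumptions rather than a script shipped with the project: `build_infer_commands` is a hypothetical helper, the `_summary.mp4` output naming is an arbitrary choice, and only the flags shown in the command above (`--ckpt-path`, `--source`, `--save-path`) are used.

```python
from pathlib import Path


def build_infer_commands(ckpt_path, video_dir, out_dir, model_type="anchor-based"):
    """Build one infer.py argument list per .mp4 file in `video_dir`.

    Hypothetical helper: it only assembles the commands and does not run
    them. Videos are processed in sorted file-name order.
    """
    video_dir, out_dir = Path(video_dir), Path(out_dir)
    commands = []
    for video in sorted(video_dir.glob("*.mp4")):
        commands.append([
            "python", "infer.py", model_type,
            "--ckpt-path", str(ckpt_path),
            "--source", str(video),
            # Output name is an assumption made for this sketch.
            "--save-path", str(out_dir / f"{video.stem}_summary.mp4"),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains infer.py, so a failed inference stops the batch instead of being silently skipped.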