This repository contains the official PyTorch implementation of the following papers:
- DTVNet: Dynamic Time-lapse Video Generation via Single Still Image, ECCV'20, Spotlight
- DTVNet+: A High-Resolution Scenic Dataset for Dynamic Time-lapse Video Generation (a supplemental version that introduces a high-quality scenic dataset)
⭐ News (12/20/2021): We release a high-quality and high-resolution Quick-Sky-Time (QST) dataset in the extended version, which can be viewed as a new benchmark for high-quality scenic image and video generation tasks.

(Demo video: `Example.mp4`.)
This code has been developed under Python 3.7, PyTorch 1.5.1, and CUDA 10.1 on Ubuntu 16.04.
```shell
# Install python3 packages
pip3 install -r requirements.txt
```
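As a quick sanity check, the snippet below (a minimal sketch, not part of the repo) prints the installed versions so you can compare them against the ones listed above:

```python
import sys

import torch

# Compare against the versions this repo was developed with:
# Python 3.7, PyTorch 1.5.1, CUDA 10.1 (newer versions may also work).
print("python:", sys.version.split()[0])
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda, "| available:", torch.cuda.is_available())
```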
- Download the Sky Time-lapse dataset to `data`. You can refer to MDGAN and the corresponding code for more details about the dataset.
- Download example datasets and checkpoints from Google Drive or Baidu Cloud (Key: u6c0).
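If you are unsure whether the downloads landed in the right place, a quick check like the following can help; the paths below are inferred from the commands later in this README and may need adjusting:

```python
from pathlib import Path

# Paths inferred from the commands in this README; adjust if your layout differs.
expected = [
    "data/sky_timelapse",                # Sky Time-lapse dataset root
    "checkpoints/Sky/sky_ckpt.pth.tar",  # pre-trained ARFlow checkpoint
]
for p in expected:
    print(f"{p}: {'found' if Path(p).exists() else 'MISSING'}")
```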
- Our other work ARFlow (CVPR'20) is used as the unsupervised optical flow estimator in the paper. You can refer to `flow/ARFlow/README.md` for more details.
- Training:
  > Modify `configs/sky.json` if you use another data_root or settings.
  ```shell
  cd flow/ARFlow
  python3 train.py
  ```
- Testing:
  > The pre-trained model is located in `checkpoints/Sky/sky_ckpt.pth.tar`.
  ```shell
  python3 inference.py --show  # Test and show a single image pair.
  python3 inference.py --root ../../data/sky_timelapse/ --save_path ../../data/sky_timelapse/flow/  # Generate optical flow in advance for the Sky Time-lapse dataset.
  ```
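If you want to inspect the pre-computed flow files, a reader like the one below may be useful. It assumes the flows are stored in the standard Middlebury `.flo` format; if ARFlow saves them differently (e.g. as `.npy`), adapt accordingly:

```python
import numpy as np

def read_flo(path):
    """Read a Middlebury .flo file into an (H, W, 2) array of (u, v) vectors."""
    with open(path, "rb") as f:
        magic = np.fromfile(f, np.float32, count=1)[0]
        assert magic == 202021.25, "not a valid .flo file"
        w = int(np.fromfile(f, np.int32, count=1)[0])
        h = int(np.fromfile(f, np.int32, count=1)[0])
        data = np.fromfile(f, np.float32, count=2 * w * h)
    return data.reshape(h, w, 2)

# Hypothetical example path; the actual layout depends on the --save_path above.
# flow = read_flo("../../data/sky_timelapse/flow/some_clip/frame_0001.flo")
```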
- Train the DTVNet model:
  > Modify `configs/sky_timelapse.json` if you use another data_root or settings.
  ```shell
  python3 train.py
  ```
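For example, if your dataset lives somewhere else, you can patch the config programmatically instead of editing it by hand. This is a minimal sketch; the `data_root` key is assumed from the note above, and the real config may nest it differently:

```python
import json

cfg_path = "configs/sky_timelapse.json"
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["data_root"] = "/path/to/my/sky_timelapse"  # hypothetical custom location

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```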
- Test the DTVNet model:
  > The pre-trained model is located in `checkpoints/DTV_Sky/200708162546`.
  > Results are saved in `checkpoints/DTV_Sky/200708162546/results`.
  ```shell
  python3 Test.py
  ```
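To eyeball the generated results, you can stitch the output frames into a video. This sketch assumes the results directory contains per-frame PNG images, which may not match the actual output format:

```python
import glob

import imageio.v2 as imageio  # pip3 install imageio imageio-ffmpeg

frames_dir = "checkpoints/DTV_Sky/200708162546/results"  # path from the step above
writer = imageio.get_writer("preview.mp4", fps=25)
for path in sorted(glob.glob(f"{frames_dir}/*.png")):  # frame format is an assumption
    writer.append_data(imageio.imread(path))
writer.close()
```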
QST contains 1,167 video clips that are cut out from 216 time-lapse 4K videos collected from YouTube, and it can be used for a variety of tasks, such as (high-resolution) video generation, (high-resolution) video prediction, (high-resolution) image generation, texture generation, image inpainting, image/video super-resolution, image/video colorization, image/video animation, etc. Each short clip contains multiple frames (from a minimum of 58 frames to a maximum of 1,200 frames, 285,446 frames in total), and the resolution of each frame is more than 1,024 x 1,024. Specifically, QST consists of a training set (1,000 clips, 244,930 frames), a validation set (100 clips, 23,200 frames), and a testing set (67 clips, 17,316 frames). Click here (Key: qst1) to download the QST dataset.
```
# About QST:
├── Quick-Sky-Time
    ├── clips                                      # contains 1,167 raw video clips
        ├── 00MOhFGvOJs                            # [video ID of the raw YouTube video]
            ├── 00MOhFGvOJs 00_00_14-00_00_25.mp4  # [ID] [start time]-[end time]
            ├── ...
        ├── ...
    ├── train_urls.txt                             # index names of the train set
    ├── test_urls.txt                              # index names of the test set
    └── val_urls.txt                               # index names of the validation set
```
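Given this layout, the split files can be used to enumerate clips per split. A minimal sketch, assuming each `*_urls.txt` lists one clip name per line:

```python
from pathlib import Path

qst = Path("Quick-Sky-Time")  # adjust to wherever you unpacked the dataset
for split in ("train_urls.txt", "val_urls.txt", "test_urls.txt"):
    names = [ln.strip() for ln in (qst / split).read_text().splitlines() if ln.strip()]
    print(f"{split}: {len(names)} clips")  # expected: 1,000 / 100 / 67
```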
If our work is useful for your research, please consider citing:
```
@inproceedings{dtvnet,
  title={DTVNet: Dynamic time-lapse video generation via single still image},
  author={Zhang, Jiangning and Xu, Chao and Liu, Liang and Wang, Mengmeng and Wu, Xia and Liu, Yong and Jiang, Yunliang},
  booktitle={European Conference on Computer Vision},
  pages={300--315},
  year={2020},
  organization={Springer}
}

@article{dtvnet+,
  title={DTVNet+: A High-Resolution Scenic Dataset for Dynamic Time-lapse Video Generation},
  author={Zhang, Jiangning and Xu, Chao and Liu, Yong and Jiang, Yunliang},
  journal={arXiv preprint arXiv:2008.04776},
  year={2020}
}
```