- 2022.11.19 Init repository.
- 2022.11.21 Release the arXiv version with supplementary materials.
- 2023.04.04 🔥 Our code is publicly available.
- 2023.04.04 🔥 Release pretrained models.
- 2023.04.04 🔥 Release KITTI360-EX dataset.
- Code release.
- KITTI360-EX release.
- Towards higher performance at small extra cost.
Limited by hardware cost and system size, a camera's Field-of-View (FoV) is not always satisfactory. However, from a spatio-temporal perspective, information beyond a camera's physical FoV is off-the-shelf and can actually be obtained "for free" from past video streams. In this paper, we propose a novel task termed Beyond-FoV Estimation, aiming to exploit past visual cues and bidirectionally break through the physical FoV of a camera. We put forward a FlowLens architecture that expands the FoV by propagating features explicitly via optical flow and implicitly via a novel clip-recurrent transformer, which has two appealing features: 1) FlowLens comprises a newly proposed Clip-Recurrent Hub with 3D-Decoupled Cross Attention (DDCA) to progressively process global information accumulated in the temporal dimension. 2) A multi-branch Mix Fusion Feed Forward Network (MixF3N) is integrated to enhance the spatially precise flow of local features. To foster training and evaluation, we establish KITTI360-EX, a dataset for outer- and inner-FoV expansion. Extensive experiments on both video inpainting and Beyond-FoV Estimation show that FlowLens achieves state-of-the-art performance.
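To make the "decoupled" attention idea concrete, here is a minimal NumPy sketch of cross attention split into a temporal pass and a spatial pass. This is illustrative only, not the repo's actual DDCA module: all function names and the `(T, N, C)` token layout are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # batched scaled dot-product attention over the last two axes
    d = q.shape[-1]
    w = softmax(q @ k.swapaxes(-1, -2) / np.sqrt(d), axis=-1)
    return w @ v

def decoupled_cross_attention(query, memory):
    """query, memory: (T, N, C) = (frames, spatial tokens, channels).
    Instead of full attention over all T*N tokens at once, attention is
    decoupled: first across time per spatial location, then across space
    per frame, which is far cheaper than joint spatio-temporal attention."""
    # temporal pass: for each spatial location, attend over the time axis
    q_t = query.transpose(1, 0, 2)    # (N, T, C)
    m_t = memory.transpose(1, 0, 2)
    out = attend(q_t, m_t, m_t).transpose(1, 0, 2)  # back to (T, N, C)
    # spatial pass: within each frame, attend over spatial tokens
    return attend(out, memory, memory)              # (T, N, C)
```

The decoupling reduces the attention cost from O((T·N)²) for joint attention to O(N·T²) + O(T·N²), which is what makes accumulating cross-clip memory affordable.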
(Outer Beyond-FoV)
(Inner Beyond-FoV)
(Object Removal)
This repo has been tested in the following environment:

- torch == 1.10.2
- cuda == 11.3
- mmflow == 0.5.2
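One possible way to reproduce this environment (assuming conda and a CUDA 11.3-capable driver; the environment name and exact install channel are our assumptions, not prescribed by the repo):

```shell
# create an isolated environment (name is arbitrary)
conda create -n flowlens python=3.8 -y
conda activate flowlens

# PyTorch 1.10.2 built against CUDA 11.3
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 \
    -f https://download.pytorch.org/whl/cu113/torch_stable.html

# mmflow 0.5.2 (depends on mmcv-full)
pip install mmcv-full
pip install mmflow==0.5.2
```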
To train FlowLens(-S), use:
python train.py --config configs/KITTI360EX-I_FlowLens_small_re.json
To eval on KITTI360-EX, run:
python evaluate.py \
--model flowlens \
--cfg_path configs/KITTI360EX-I_FlowLens_small_re.json \
--ckpt release_model/FlowLens-S_re_Out_500000.pth --fov fov5
Turn on `--reverse` for test-time augmentation (TTA). Turn on `--save_results` to save your outputs.
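For example, combining both flags with the evaluation command above (a sketch reusing the same config and checkpoint paths):

```shell
python evaluate.py \
  --model flowlens \
  --cfg_path configs/KITTI360EX-I_FlowLens_small_re.json \
  --ckpt release_model/FlowLens-S_re_Out_500000.pth --fov fov5 \
  --reverse --save_results
```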
The pretrained models can be found here:
https://share.weiyun.com/6G6QEdaa
The preprocessed KITTI360-EX can be downloaded from here:
https://share.weiyun.com/BReRdDiP
Method | Test Logic | TTA | PSNR | SSIM | VFID | Runtime (s/frame) |
---|---|---|---|---|---|---|
FlowLens-S (Paper) | Beyond-FoV | w/o | 36.17 | 0.9916 | 0.030 | 0.023 |
FlowLens-S (This Repo) | Beyond-FoV | w/o | 37.31 | 0.9926 | 0.025 | 0.015 |
FlowLens-S+ (This Repo) | Beyond-FoV | w/ | 38.36 | 0.9938 | 0.017 | 0.050 |
FlowLens-S (This Repo) | Video Inpainting | w/o | 38.01 | 0.9938 | 0.022 | 0.042 |
FlowLens-S+ (This Repo) | Video Inpainting | w/ | 38.97 | 0.9947 | 0.015 | 0.142 |
Method | Test Logic | TTA | PSNR | SSIM | VFID | Runtime (s/frame) |
---|---|---|---|---|---|---|
FlowLens (Paper) | Beyond-FoV | w/o | 36.69 | 0.9916 | 0.027 | 0.049 |
FlowLens (This Repo) | Beyond-FoV | w/o | 37.65 | 0.9927 | 0.024 | 0.033 |
FlowLens+ (This Repo) | Beyond-FoV | w/ | 38.74 | 0.9941 | 0.017 | 0.095 |
FlowLens (This Repo) | Video Inpainting | w/o | 38.38 | 0.9939 | 0.018 | 0.086 |
FlowLens+ (This Repo) | Video Inpainting | w/ | 39.40 | 0.9950 | 0.015 | 0.265 |
Method | Test Logic | TTA | PSNR | SSIM | VFID | Runtime (s/frame) |
---|---|---|---|---|---|---|
FlowLens-S (Paper) | Beyond-FoV | w/o | 19.68 | 0.9247 | 0.300 | 0.023 |
FlowLens-S (This Repo) | Beyond-FoV | w/o | 20.41 | 0.9332 | 0.285 | 0.021 |
FlowLens-S+ (This Repo) | Beyond-FoV | w/ | 21.30 | 0.9397 | 0.302 | 0.056 |
FlowLens-S (This Repo) | Video Inpainting | w/o | 21.69 | 0.9453 | 0.245 | 0.048 |
FlowLens-S+ (This Repo) | Video Inpainting | w/ | 22.40 | 0.9503 | 0.271 | 0.146 |
Method | Test Logic | TTA | PSNR | SSIM | VFID | Runtime (s/frame) |
---|---|---|---|---|---|---|
FlowLens (Paper) | Beyond-FoV | w/o | 20.13 | 0.9314 | 0.281 | 0.049 |
FlowLens (This Repo) | Beyond-FoV | w/o | 20.85 | 0.9381 | 0.259 | 0.035 |
FlowLens+ (This Repo) | Beyond-FoV | w/ | 21.65 | 0.9432 | 0.276 | 0.097 |
FlowLens (This Repo) | Video Inpainting | w/o | 22.23 | 0.9507 | 0.231 | 0.085 |
FlowLens+ (This Repo) | Video Inpainting | w/ | 22.86 | 0.9543 | 0.253 | 0.260 |
Note that when using the "Video Inpainting" logic, the model is allowed to use additional reference frames from the future, and each local frame is estimated at least twice. This yields higher accuracy but slower inference, and it is not realistic for real-world deployment.
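The difference between the two test logics can be sketched as a reference-frame selection rule. This is an illustrative simplification, not the repo's exact sampling code; the function name and parameters are hypothetical.

```python
def reference_indices(t, num_frames, num_refs, future_allowed):
    """Pick reference-frame indices for local frame t.

    future_allowed=False ("Beyond-FoV" logic): causal, only frames up to t.
    future_allowed=True ("Video Inpainting" logic): a window centered on t,
    so future frames contribute and each frame is revisited by overlapping
    windows, improving accuracy at the cost of latency.
    """
    if future_allowed:
        half = num_refs // 2
        # center the window on t, clamped to the sequence boundaries
        start = max(0, min(t - half, num_frames - num_refs))
        return list(range(start, start + num_refs))
    # causal: only frames up to and including t are visible
    start = max(0, t - num_refs + 1)
    return list(range(start, t + 1))
```

For frame 5 of a 10-frame clip with 4 references, the causal logic sees frames [2, 3, 4, 5], while the inpainting logic sees [3, 4, 5, 6], i.e. it peeks one frame into the future.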
If you find our paper or repo useful, please consider citing our paper:
@article{shi2022flowlens,
  title={FlowLens: Seeing Beyond the FoV via Flow-guided Clip-Recurrent Transformer},
  author={Shi, Hao and Jiang, Qi and Yang, Kailun and Yin, Xiaoting and Wang, Kaiwei},
  journal={arXiv preprint arXiv:2211.11293},
  year={2022}
}
This project would not have been possible without the following outstanding repositories:
Hao Shi
Feel free to contact me if you have additional questions or are interested in collaboration. Please drop me an email at haoshi@zju.edu.cn. =)