
MasterHow/FlowLens


FlowLens: Seeing Beyond the FoV via Flow-guided Clip-Recurrent Transformer


Hao Shi·Qi Jiang·Kailun Yang·Xiaoting Yin·Kaiwei Wang


Update

  • 2022.11.19 Initialize the repository.
  • 2022.11.21 Release the arXiv version with supplementary materials.
  • 2023.04.04 🔥 The code is now publicly available.
  • 2023.04.04 🔥 Release the pretrained models.
  • 2023.04.04 🔥 Release the KITTI360-EX dataset.

TODO List

  • Code release (done).
  • KITTI360-EX release (done).
  • Higher performance at small extra cost.

Abstract

Limited by hardware cost and system size, a camera's field of view (FoV) is not always satisfactory. However, from a spatio-temporal perspective, information beyond the camera's physical FoV is readily available and can actually be obtained "for free" from past video streams. In this paper, we propose a novel task termed Beyond-FoV Estimation, which aims to exploit past visual cues and bidirectionally break through the physical FoV of a camera. We put forward the FlowLens architecture, which expands the FoV by propagating features explicitly via optical flow and implicitly via a novel clip-recurrent transformer. It has two appealing features: 1) FlowLens comprises a newly proposed Clip-Recurrent Hub with 3D-Decoupled Cross Attention (DDCA) to progressively process global information accumulated in the temporal dimension. 2) A multi-branch Mix Fusion Feed Forward Network (MixF3N) is integrated to enhance the spatially precise flow of local features. To foster training and evaluation, we establish KITTI360-EX, a dataset for outer and inner FoV expansion. Extensive experiments on both video inpainting and beyond-FoV estimation tasks show that FlowLens achieves state-of-the-art performance.
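To make the explicit, flow-guided half of the propagation concrete, here is a minimal NumPy sketch of backward warping: pixels from a past frame are pulled into the current frame's coordinates along an optical flow field, which is how content seen earlier can reappear outside the current FoV. This is a generic illustration, not the repo's implementation. The helper names `backward_warp` and `fill_beyond_fov`, the (dx, dy) flow layout, and the assumption that a dense flow is already available for every pixel (including those outside the FoV) are all hypothetical. The sketch does not model the clip-recurrent transformer, DDCA, or MixF3N.

```python
import numpy as np


def backward_warp(src, flow):
    """Bilinearly sample ``src`` (H, W, C) at p + flow(p) for each target pixel p.

    Hypothetical helper: flow[..., 0] is the x (column) displacement and
    flow[..., 1] the y (row) displacement in pixels, pointing from the
    current frame into ``src``. Returns the warped image and a boolean mask
    that is False where the sample lands outside ``src``.
    """
    h, w = flow.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = xs + flow[..., 0]
    y = ys + flow[..., 1]
    valid = (x >= 0) & (x <= w - 1) & (y >= 0) & (y <= h - 1)

    # Clamp so the four neighbours are always valid indices; invalid pixels are zeroed below.
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]

    # Standard bilinear interpolation between the four neighbouring source pixels.
    top = src[y0, x0] * (1 - wx) + src[y0, x1] * wx
    bottom = src[y1, x0] * (1 - wx) + src[y1, x1] * wx
    out = top * (1 - wy) + bottom * wy
    out[~valid] = 0
    return out, valid


def fill_beyond_fov(current, fov_mask, past, flow):
    """Copy warped past content into pixels outside the FoV (fov_mask == False).

    Hypothetical helper: pixels that the past frame cannot explain stay unchanged.
    """
    warped, valid = backward_warp(past, flow)
    take = (~fov_mask) & valid
    return np.where(take[..., None], warped, current)
```

In practice the flow would come from an estimator (the repo depends on MMFlow for that), and learned refinement is what handles regions the past frames never observed.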

Demos

Animated demos are provided for three settings:

  • Outer Beyond-FoV
  • Inner Beyond-FoV
  • Object Removal

Dependencies

This repo has been tested in the following environment:

torch == 1.10.2
cuda == 11.3
mmflow == 0.5.2
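A quick way to compare a local setup against the tested versions is sketched below. It uses only the standard library (`importlib.metadata`) plus `torch.version.cuda` when PyTorch is importable. The helper names and the rule of ignoring local build suffixes such as `+cu113` are assumptions for illustration, not part of the repo.

```python
from importlib import metadata

# Versions listed as tested in this README.
TESTED = {"torch": "1.10.2", "mmflow": "0.5.2"}
TESTED_CUDA = "11.3"


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(tested, installed):
    """Return {name: (tested, installed)} for every package that differs.

    Local build tags such as "+cu113" are ignored (an assumption: only the
    public release number is compared).
    """
    report = {}
    for name, want in tested.items():
        have = installed.get(name)
        public = have.split("+", 1)[0] if have else None
        if public != want:
            report[name] = (want, have)
    return report


def torch_cuda_version():
    """CUDA version PyTorch was built against, or None if torch is not importable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    print("package mismatches:", mismatches(TESTED, installed_versions(TESTED)))
    cuda = torch_cuda_version()
    if cuda != TESTED_CUDA:
        print(f"CUDA (torch build): {cuda}, tested: {TESTED_CUDA}")
```

Note that `torch.version.cuda` reports the CUDA version PyTorch was compiled with, not the installed driver version.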

Usage

To train FlowLens or FlowLens-S, run the following (this example uses the FlowLens-S config):

python train.py --config configs/KITTI360EX-I_FlowLens_small_re.json

To evaluate on KITTI360-EX, run:

python evaluate.py \
--model flowlens \
--cfg_path configs/KITTI360EX-I_FlowLens_small_re.json \
--ckpt release_model/FlowLens-S_re_Out_500000.pth --fov fov5

Add --reverse to enable test-time augmentation (TTA).

Add --save_results to save the outputs.
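When scripting several evaluations (for example, with and without TTA), the command above can be assembled programmatically. A minimal sketch follows. The helper name `build_eval_cmd` is hypothetical, and it assumes `--reverse` and `--save_results` are value-less switches, as the "turn on" wording suggests. Only flags shown in this README are used.

```python
import shlex


def build_eval_cmd(cfg_path, ckpt, fov, model="flowlens", reverse=False, save_results=False):
    """Assemble the evaluate.py argument list shown in this README (hypothetical helper)."""
    cmd = [
        "python", "evaluate.py",
        "--model", model,
        "--cfg_path", cfg_path,
        "--ckpt", ckpt,
        "--fov", fov,
    ]
    if reverse:  # test-time augmentation (TTA)
        cmd.append("--reverse")
    if save_results:  # write the outputs to disk
        cmd.append("--save_results")
    return cmd


if __name__ == "__main__":
    cmd = build_eval_cmd(
        cfg_path="configs/KITTI360EX-I_FlowLens_small_re.json",
        ckpt="release_model/FlowLens-S_re_Out_500000.pth",
        fov="fov5",
        reverse=True,
    )
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would launch the evaluation.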

Pretrained Models

The pretrained models can be downloaded here:

https://share.weiyun.com/6G6QEdaa

KITTI360-EX for Beyond-FoV Estimation

The preprocessed KITTI360-EX dataset can be downloaded here:

https://share.weiyun.com/BReRdDiP

Results

KITTI360EX-InnerSphere

| Method | Test Logic | TTA | PSNR (dB) | SSIM | VFID | Runtime (s/frame) |
|---|---|---|---|---|---|---|
| FlowLens-S (Paper) | Beyond-FoV | w/o | 36.17 | 0.9916 | 0.030 | 0.023 |
| FlowLens-S (This Repo) | Beyond-FoV | w/o | 37.31 | 0.9926 | 0.025 | 0.015 |
| FlowLens-S+ (This Repo) | Beyond-FoV | w/ | 38.36 | 0.9938 | 0.017 | 0.050 |
| FlowLens-S (This Repo) | Video Inpainting | w/o | 38.01 | 0.9938 | 0.022 | 0.042 |
| FlowLens-S+ (This Repo) | Video Inpainting | w/ | 38.97 | 0.9947 | 0.015 | 0.142 |

| Method | Test Logic | TTA | PSNR (dB) | SSIM | VFID | Runtime (s/frame) |
|---|---|---|---|---|---|---|
| FlowLens (Paper) | Beyond-FoV | w/o | 36.69 | 0.9916 | 0.027 | 0.049 |
| FlowLens (This Repo) | Beyond-FoV | w/o | 37.65 | 0.9927 | 0.024 | 0.033 |
| FlowLens+ (This Repo) | Beyond-FoV | w/ | 38.74 | 0.9941 | 0.017 | 0.095 |
| FlowLens (This Repo) | Video Inpainting | w/o | 38.38 | 0.9939 | 0.018 | 0.086 |
| FlowLens+ (This Repo) | Video Inpainting | w/ | 39.40 | 0.9950 | 0.015 | 0.265 |

KITTI360EX-OuterPinhole

| Method | Test Logic | TTA | PSNR (dB) | SSIM | VFID | Runtime (s/frame) |
|---|---|---|---|---|---|---|
| FlowLens-S (Paper) | Beyond-FoV | w/o | 19.68 | 0.9247 | 0.300 | 0.023 |
| FlowLens-S (This Repo) | Beyond-FoV | w/o | 20.41 | 0.9332 | 0.285 | 0.021 |
| FlowLens-S+ (This Repo) | Beyond-FoV | w/ | 21.30 | 0.9397 | 0.302 | 0.056 |
| FlowLens-S (This Repo) | Video Inpainting | w/o | 21.69 | 0.9453 | 0.245 | 0.048 |
| FlowLens-S+ (This Repo) | Video Inpainting | w/ | 22.40 | 0.9503 | 0.271 | 0.146 |

| Method | Test Logic | TTA | PSNR (dB) | SSIM | VFID | Runtime (s/frame) |
|---|---|---|---|---|---|---|
| FlowLens (Paper) | Beyond-FoV | w/o | 20.13 | 0.9314 | 0.281 | 0.049 |
| FlowLens (This Repo) | Beyond-FoV | w/o | 20.85 | 0.9381 | 0.259 | 0.035 |
| FlowLens+ (This Repo) | Beyond-FoV | w/ | 21.65 | 0.9432 | 0.276 | 0.097 |
| FlowLens (This Repo) | Video Inpainting | w/o | 22.23 | 0.9507 | 0.231 | 0.085 |
| FlowLens+ (This Repo) | Video Inpainting | w/ | 22.86 | 0.9543 | 0.253 | 0.260 |

Note that with the "Video Inpainting" test logic, the model may use additional reference frames from the future, and each local frame is estimated at least twice. This yields higher accuracy at the cost of slower inference, and this setting is not realistic for real-world deployment.
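The difference between the two test logics can be illustrated with frame scheduling alone. The sketch below is hypothetical and not taken from the repo: `causal_clips` walks the video in consecutive, non-overlapping clips so each frame is estimated exactly once as the stream arrives, while `overlapping_clips` uses a half-clip stride so every frame falls into at least two windows, some of which extend into later frames. `fuse` then averages the repeated estimates. The per-window `predict` callback stands in for the model, and averaging is an assumed fusion rule.

```python
def causal_clips(num_frames, clip_len):
    """Consecutive, non-overlapping clips: each frame is estimated exactly once."""
    return [
        list(range(start, min(start + clip_len, num_frames)))
        for start in range(0, num_frames, clip_len)
    ]


def overlapping_clips(num_frames, clip_len):
    """Windows with stride clip_len // 2, starting one stride before frame 0.

    For clip_len >= 2 every frame falls into at least two windows.
    """
    stride = max(1, clip_len // 2)
    clips = []
    for start in range(-stride, num_frames, stride):
        clip = [t for t in range(start, start + clip_len) if 0 <= t < num_frames]
        if clip:
            clips.append(clip)
    return clips


def fuse(num_frames, clips, predict):
    """Average the per-frame estimates produced by ``predict`` over all clips."""
    sums = [0.0] * num_frames
    counts = [0] * num_frames
    for clip in clips:
        for t, value in zip(clip, predict(clip)):
            sums[t] += value
            counts[t] += 1
    return [s / c for s, c in zip(sums, counts)], counts
```

With `clip_len=4` on 10 frames, the overlapping schedule issues 6 window calls versus 3 for the causal one.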

Citation

If you find our paper or repo useful, please consider citing our paper:

@article{shi2022flowlens,
title={FlowLens: Seeing Beyond the FoV via Flow-guided Clip-Recurrent Transformer},
author={Shi, Hao and Jiang, Qi and Yang, Kailun and Yin, Xiaoting and Wang, Kaiwei},
journal={arXiv preprint arXiv:2211.11293},
year={2022}
}

Acknowledgement

This project would not have been possible without the following outstanding repositories:

STTN, MMFlow

Devs

Hao Shi

Contact

Feel free to contact me if you have additional questions or are interested in collaborating. Please email me at haoshi@zju.edu.cn. =)