Commit

[Docs] Add Abstract in model README (#28)
* [Docs] Add Abstract in model README

* add images

* revise images size
MeowZheng authored Nov 20, 2021
1 parent 3b37c16 commit b5c59aa
Showing 8 changed files with 218 additions and 8 deletions.
26 changes: 25 additions & 1 deletion configs/flownet/README.md
@@ -1,6 +1,30 @@
# FlowNet

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Convolutional neural networks (CNNs) have recently been very successful
in a variety of computer vision tasks, especially on those linked to
recognition. Optical flow estimation has not been among
the tasks CNNs succeeded at. In this paper we construct CNNs
which are capable of solving the optical flow estimation problem
as a supervised learning task. We propose and compare two architectures:
a generic architecture and another one including a layer that correlates
feature vectors at different image locations. Since existing ground truth
data sets are not sufficiently large to train a CNN, we generate a large
synthetic Flying Chairs dataset. We show that networks trained
on this unrealistic data still generalize very well to
existing datasets such as Sintel and KITTI, achieving competitive accuracy
at frame rates of 5 to 10 fps.
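
A minimal PyTorch sketch of the correlating layer described above: feature
vectors of the first image are compared against shifted copies of the second
image's features over a window of displacements. The function name, shapes,
and `max_disp` are illustrative assumptions, not the repository's
implementation.

```python
import torch
import torch.nn.functional as F

def correlation(feat1, feat2, max_disp=4):
    """Correlate feat1 (B, C, H, W) with shifted copies of feat2.

    Returns a (B, (2*max_disp+1)**2, H, W) volume of matching scores.
    """
    b, c, h, w = feat1.shape
    feat2 = F.pad(feat2, [max_disp] * 4)  # pad H and W by max_disp on each side
    scores = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = feat2[:, :, dy:dy + h, dx:dx + w]
            # dot product of feature vectors, averaged over channels
            scores.append((feat1 * shifted).mean(dim=1, keepdim=True))
    return torch.cat(scores, dim=1)
```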

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731289-41f87333-c35e-4f15-8d3c-164e005200b8.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

31 changes: 30 additions & 1 deletion configs/flownet2/README.md
@@ -1,6 +1,35 @@
# FlowNet2

## Introduction
## Abstract

<!-- [ABSTRACT] -->

The FlowNet demonstrated that optical flow estimation
can be cast as a learning problem. However, the state of
the art with regard to the quality of the flow has still been
defined by traditional methods. Particularly on small displacements
and real-world data, FlowNet cannot compete with variational methods.
In this paper, we advance the concept of end-to-end learning of optical flow
and make it work really well.
The large improvements in quality and speed are caused
by three major contributions: first, we focus on the training data
and show that the schedule of presenting data during training is very important.
Second, we develop a stacked architecture that includes warping
of the second image with intermediate optical flow. Third,
we elaborate on small displacements by introducing a sub-network specializing
on small motions. FlowNet 2.0 is only marginally slower than
the original FlowNet but decreases the estimation error by more than 50%.
It performs on par with state-of-the-art methods, while running at interactive
frame rates. Moreover, we present faster variants that allow optical flow
computation at up to 140 fps with accuracy matching the original FlowNet.
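
A sketch of the warping step mentioned above: the second image is
backward-warped by the intermediate flow so the next network in the stack
only refines the residual motion. A hypothetical helper assuming flow in
pixel units; not the repository's implementation.

```python
import torch
import torch.nn.functional as F

def warp(img2, flow):
    """Backward-warp img2 (B, C, H, W) by flow (B, 2, H, W) given in pixels."""
    b, _, h, w = img2.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack((xs, ys)).float().to(img2.device)  # (2, H, W), x first
    coords = base.unsqueeze(0) + flow                     # sampling positions
    # normalize coordinates to [-1, 1] as grid_sample expects
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                  # (B, H, W, 2)
    return F.grid_sample(img2, grid, align_corners=True)
```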

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731310-af0c4586-97b6-4a1e-9ada-50c7b2ee0851.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

29 changes: 28 additions & 1 deletion configs/irr/README.md
@@ -1,6 +1,33 @@
# IRR

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Deep learning approaches to optical flow estimation
have seen rapid progress over the recent years. One common trait of
many networks is that they refine an initial flow estimate either
through multiple stages or across the levels of a coarse-to-fine representation.
While leading to more accurate results, the downside of this is an increased
number of parameters. Taking inspiration from both classical
energy minimization approaches as well as residual
networks, we propose an iterative residual refinement (IRR)
scheme based on weight sharing that can be combined with
several backbone networks. It reduces the number of parameters,
improves the accuracy, or even achieves both. Moreover,
we show that integrating occlusion prediction and bi-directional
flow estimation into our IRR scheme can
further boost the accuracy. Our full network achieves
state-of-the-art results for both optical flow
and occlusion estimation across several standard datasets.
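
A minimal sketch of the weight-sharing idea: a single refinement module is
applied repeatedly, each pass adding a residual to the running flow estimate,
so extra iterations cost no extra parameters. `refine_net` is a placeholder
for the actual decoder.

```python
import torch.nn as nn

class IterativeResidualRefinement(nn.Module):
    """One shared refinement network applied num_iters times."""

    def __init__(self, refine_net: nn.Module, num_iters: int = 5):
        super().__init__()
        self.refine_net = refine_net  # single set of weights, reused each pass
        self.num_iters = num_iters

    def forward(self, feat, flow):
        for _ in range(self.num_iters):
            flow = flow + self.refine_net(feat, flow)  # residual update
        return flow
```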

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731424-9cda1d89-e222-4bcf-b1b8-b18b31f7643b.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

33 changes: 32 additions & 1 deletion configs/liteflownet/README.md
@@ -1,6 +1,37 @@
# LiteFlowNet

## Introduction
## Abstract

<!-- [ABSTRACT] -->

FlowNet2, the state-of-the-art convolutional neural
network (CNN) for optical flow estimation, requires over
160M parameters to achieve accurate flow estimation. In
this paper we present an alternative network that outperforms
FlowNet2 on the challenging Sintel final pass and
KITTI benchmarks, while being 30 times smaller in the
model size and 1.36 times faster in the running speed. This
is made possible by drilling down to architectural details
that might have been missed in the current frameworks: (1)
We present a more effective flow inference approach at each
pyramid level through a lightweight cascaded network. It
not only improves flow estimation accuracy through early
correction, but also permits seamless incorporation of descriptor matching
in our network. (2) We present a novel flow regularization layer
to ameliorate the issue of outliers and vague flow boundaries
by using a feature-driven local convolution. (3) Our network owns
an effective structure for pyramidal feature extraction and embraces feature
warping rather than image warping as practiced in FlowNet2.
Our code and trained models are available at
https://github.com/twhui/LiteFlowNet.
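
A sketch of the feature-driven local convolution used for flow
regularization: per-pixel filter weights predicted from features smooth the
flow field while respecting motion boundaries. Shapes and names are
illustrative assumptions, not the published layer.

```python
import torch
import torch.nn.functional as F

def feature_driven_local_conv(flow, weights, k=3):
    """flow: (B, 2, H, W); weights: (B, k*k, H, W) predicted from features."""
    b, _, h, w = flow.shape
    weights = torch.softmax(weights, dim=1)  # normalize each per-pixel filter
    out = []
    for c in range(2):  # regularize u and v components separately
        patches = F.unfold(flow[:, c:c + 1], k, padding=k // 2)  # (B, k*k, H*W)
        patches = patches.view(b, k * k, h, w)
        out.append((patches * weights).sum(dim=1, keepdim=True))
    return torch.cat(out, dim=1)
```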

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731269-eee91f40-1a4d-4c9e-afc6-6d90b0674b62.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

26 changes: 25 additions & 1 deletion configs/liteflownet2/README.md
@@ -1,6 +1,30 @@
# LiteFlowNet2

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Over four decades, the majority of works have addressed the problem of optical flow estimation using variational methods. With the
advance of machine learning, some recent works have attempted to address the problem using convolutional neural networks (CNNs)
and have shown promising results. FlowNet2, the state-of-the-art CNN, requires over 160M parameters to achieve accurate flow
estimation. Our LiteFlowNet2 outperforms FlowNet2 on Sintel and KITTI benchmarks, while being 25.3 times smaller in the model size
and 3.1 times faster in the running speed. LiteFlowNet2 is built on the foundation laid by conventional methods and resembles the
corresponding roles as data fidelity and regularization in variational methods. We compute optical flow in a spatial-pyramid formulation
as SPyNet but through a novel lightweight cascaded flow inference. It provides high flow estimation accuracy through early
correction with seamless incorporation of descriptor matching. Flow regularization is used to ameliorate the issue of outliers and vague
flow boundaries through feature-driven local convolutions. Our network also owns an effective structure for pyramidal feature extraction
and embraces feature warping rather than image warping as practiced in FlowNet2 and SPyNet. Compared to LiteFlowNet,
LiteFlowNet2 improves the optical flow accuracy on Sintel Clean by 23.3%, Sintel Final by 12.8%, KITTI 2012 by 19.6%, and KITTI
2015 by 18.8%, while being 2.2 times faster. Our network protocol and trained models are made publicly available on
https://github.com/twhui/LiteFlowNet2.
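
A sketch of the spatial-pyramid formulation mentioned above: flow is
estimated at the coarsest level, then upsampled (with displacements rescaled)
and refined level by level. The `estimate` callable stands in for the
cascaded flow inference module and is an assumption.

```python
import torch.nn.functional as F

def pyramid_flow(feats1, feats2, estimate):
    """feats1/feats2: lists of feature maps, coarsest first."""
    flow = None
    for f1, f2 in zip(feats1, feats2):
        if flow is None:
            flow = f1.new_zeros(f1.size(0), 2, f1.size(2), f1.size(3))
        else:
            # upsample the coarser estimate and rescale displacement values
            flow = 2.0 * F.interpolate(flow, scale_factor=2, mode='bilinear',
                                       align_corners=False)
        flow = flow + estimate(f1, f2, flow)  # residual inference per level
    return flow
```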

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731269-eee91f40-1a4d-4c9e-afc6-6d90b0674b62.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

28 changes: 27 additions & 1 deletion configs/maskflownet/README.md
@@ -1,6 +1,32 @@
# MaskFlowNet

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Feature warping is a core technique in optical flow estimation;
however, the ambiguity caused by occluded areas during warping is a major
problem that remains unsolved. In this paper, we propose
an asymmetric occlusion-aware feature matching module,
which can learn a rough occlusion mask that filters useless (occluded) areas
immediately after feature warping without any explicit supervision.
The proposed module can be easily integrated into
end-to-end network architectures and enjoys performance
gains while introducing negligible computational cost. The
learned occlusion mask can be further fed into a subsequent
network cascade with dual feature pyramids with which we
achieve state-of-the-art performance. At the time of submission,
our method, called MaskFlownet, surpasses all published optical flow
methods on the MPI Sintel, KITTI 2012 and 2015 benchmarks.
Code is available at https://github.com/microsoft/MaskFlownet.
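
A minimal sketch of the occlusion-aware module described above: a learned
mask, trained without explicit supervision, gates the warped features so
occluded regions contribute less. Layer names and the learnable bias term are
illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class OcclusionAwareGate(nn.Module):
    """Gate warped features with a rough, learned occlusion mask."""

    def __init__(self, channels):
        super().__init__()
        self.mask_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, warped_feat):
        occ = torch.sigmoid(self.mask_head(warped_feat))  # occlusion mask
        return warped_feat * occ + self.bias  # down-weight occluded areas
```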

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731471-ed5fc41b-59f9-4e00-b27b-d0456b2a09a2.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

28 changes: 27 additions & 1 deletion configs/pwcnet/README.md
@@ -1,6 +1,32 @@
# PWC-Net

## Introduction
## Abstract

<!-- [ABSTRACT] -->

We present a compact but effective CNN model for optical flow,
called PWC-Net. PWC-Net has been designed
according to simple and well-established principles: pyramidal processing,
warping, and the use of a cost volume.
Cast in a learnable feature pyramid, PWC-Net uses the current optical flow
estimate to warp the CNN features of the
second image. It then uses the warped features and features of
the first image to construct a cost volume, which
is processed by a CNN to estimate the optical flow. PWC-Net is 17 times
smaller in size and easier to train than the
recent FlowNet2 model. Moreover, it outperforms all published optical flow
methods on the MPI Sintel final pass and
KITTI 2015 benchmarks, running at about 35 fps on Sintel
resolution (1024×436) images. Our models are available
on https://github.com/NVlabs/PWC-Net.
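
A compact sketch of one pyramid level combining the three principles named in
the abstract. `warp`, `correlation`, and `decoder` are placeholders for real
modules, so this shows the data flow only.

```python
def pwc_level(feat1, feat2, up_flow, warp, correlation, decoder):
    """Pyramid level: warp features, build a cost volume, decode the flow."""
    warped = warp(feat2, up_flow)         # warp second image's features
    cost = correlation(feat1, warped)     # cost volume over displacements
    return decoder(cost, feat1, up_flow)  # CNN regresses flow at this level
```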

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731246-f94698da-9c69-419d-bafe-7b9baab4a7aa.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

25 changes: 24 additions & 1 deletion configs/raft/README.md
@@ -1,6 +1,29 @@
# RAFT

## Introduction
## Abstract

<!-- [ABSTRACT] -->

We introduce Recurrent All-Pairs Field Transforms (RAFT),
a new deep network architecture for optical flow. RAFT extracts per-pixel
features, builds multi-scale 4D correlation volumes for all pairs
of pixels, and iteratively updates a flow field through a recurrent unit
that performs lookups on the correlation volumes. RAFT achieves
state-of-the-art performance. On KITTI, RAFT achieves an F1-all error of
5.10%, a 16% error reduction from the best published result (6.10%).
On Sintel (final pass), RAFT obtains an end-point-error of 2.855 pixels,
a 30% error reduction from the best published result (4.098 pixels). In
addition, RAFT has strong cross-dataset generalization as well as high
efficiency in inference time, training speed, and parameter count. Code
is available at https://github.com/princeton-vl/RAFT.
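
A sketch of the all-pairs correlation volume: every pixel of the first
frame's feature map is compared with every pixel of the second, giving a 4D
volume that the recurrent update operator can look up. Shapes and the
normalization are illustrative, not the official code.

```python
import torch

def all_pairs_correlation(feat1, feat2):
    """feat1, feat2: (B, C, H, W) -> correlation volume (B, H, W, H, W)."""
    b, c, h, w = feat1.shape
    f1 = feat1.flatten(2).transpose(1, 2)   # (B, H*W, C)
    f2 = feat2.flatten(2)                   # (B, C, H*W)
    corr = torch.matmul(f1, f2) / c ** 0.5  # dot product of every pixel pair
    return corr.view(b, h, w, h, w)
```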

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731339-c1978af7-c9de-4b21-9d6c-e786daff9601.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->
