Commit

[Docs] Add Abstract in model README (#28)
* [Docs] Add Abstract in model README

* add images

* revise images size
MeowZheng authored Nov 20, 2021
1 parent 3b37c16 commit b5c59aa
Showing 8 changed files with 218 additions and 8 deletions.
26 changes: 25 additions & 1 deletion configs/flownet/README.md
@@ -1,6 +1,30 @@
# FlowNet

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Convolutional neural networks (CNNs) have recently been very successful
in a variety of computer vision tasks, especially on those linked to
recognition. Optical flow estimation has not been among
the tasks CNNs succeeded at. In this paper we construct CNNs
which are capable of solving the optical flow estimation problem
as a supervised learning task. We propose and compare two architectures:
a generic architecture and another one including a layer that correlates
feature vectors at different image locations. Since existing ground truth
data sets are not sufficiently large to train a CNN, we generate a large
synthetic Flying Chairs dataset. We show that networks trained
on this unrealistic data still generalize very well to
existing datasets such as Sintel and KITTI, achieving competitive accuracy
at frame rates of 5 to 10 fps.
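
A minimal PyTorch sketch of the correlating layer described above: feature
vectors of the first image are compared against shifted copies of the second
image's features over a window of displacements. The function name, shapes,
and `max_disp` are illustrative assumptions, not the repository's
implementation.

```python
import torch
import torch.nn.functional as F

def correlation(feat1, feat2, max_disp=4):
    """Correlate feat1 (B, C, H, W) with shifted copies of feat2.

    Returns a (B, (2*max_disp+1)**2, H, W) volume of matching scores.
    """
    b, c, h, w = feat1.shape
    feat2 = F.pad(feat2, [max_disp] * 4)  # pad H and W by max_disp on each side
    scores = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = feat2[:, :, dy:dy + h, dx:dx + w]
            # dot product of feature vectors, averaged over channels
            scores.append((feat1 * shifted).mean(dim=1, keepdim=True))
    return torch.cat(scores, dim=1)
```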

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731289-41f87333-c35e-4f15-8d3c-164e005200b8.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

31 changes: 30 additions & 1 deletion configs/flownet2/README.md
@@ -1,6 +1,35 @@
# FlowNet2

## Introduction
## Abstract

<!-- [ABSTRACT] -->

The FlowNet demonstrated that optical flow estimation
can be cast as a learning problem. However, the state of
the art with regard to the quality of the flow has still been
defined by traditional methods. Particularly on small displacements
and real-world data, FlowNet cannot compete with variational methods.
In this paper, we advance the concept of end-to-end learning of optical flow
and make it work really well.
The large improvements in quality and speed are caused
by three major contributions: first, we focus on the training data
and show that the schedule of presenting data during training is very important.
Second, we develop a stacked architecture that includes warping
of the second image with intermediate optical flow. Third,
we elaborate on small displacements by introducing a sub-network specializing
on small motions. FlowNet 2.0 is only marginally slower than
the original FlowNet but decreases the estimation error by more than 50%.
It performs on par with state-of-the-art methods, while running at interactive
frame rates. Moreover, we present faster variants that allow optical flow
computation at up to 140 fps with accuracy matching the original FlowNet.
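
A sketch of the warping step mentioned above: the second image is
backward-warped by the intermediate flow so the next network in the stack
only refines the residual motion. A hypothetical helper assuming flow in
pixel units; not the repository's implementation.

```python
import torch
import torch.nn.functional as F

def warp(img2, flow):
    """Backward-warp img2 (B, C, H, W) by flow (B, 2, H, W) given in pixels."""
    b, _, h, w = img2.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack((xs, ys)).float().to(img2.device)  # (2, H, W), x first
    coords = base.unsqueeze(0) + flow                     # sampling positions
    # normalize coordinates to [-1, 1] as grid_sample expects
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                  # (B, H, W, 2)
    return F.grid_sample(img2, grid, align_corners=True)
```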

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731310-af0c4586-97b6-4a1e-9ada-50c7b2ee0851.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

29 changes: 28 additions & 1 deletion configs/irr/README.md
@@ -1,6 +1,33 @@
# IRR

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Deep learning approaches to optical flow estimation
have seen rapid progress over the recent years. One common trait of
many networks is that they refine an initial flow estimate either
through multiple stages or across the levels of a coarse-to-fine representation.
While leading to more accurate results, the downside of this is an increased
number of parameters. Taking inspiration from both classical
energy minimization approaches as well as residual
networks, we propose an iterative residual refinement (IRR)
scheme based on weight sharing that can be combined with
several backbone networks. It reduces the number of parameters,
improves the accuracy, or even achieves both. Moreover,
we show that integrating occlusion prediction and bi-directional
flow estimation into our IRR scheme can
further boost the accuracy. Our full network achieves
state-of-the-art results for both optical flow
and occlusion estimation across several standard datasets.
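
A minimal sketch of the weight-sharing idea: a single refinement module is
applied repeatedly, each pass adding a residual to the running flow estimate,
so extra iterations cost no extra parameters. `refine_net` is a placeholder
for the actual decoder.

```python
import torch.nn as nn

class IterativeResidualRefinement(nn.Module):
    """One shared refinement network applied num_iters times."""

    def __init__(self, refine_net: nn.Module, num_iters: int = 5):
        super().__init__()
        self.refine_net = refine_net  # single set of weights, reused each pass
        self.num_iters = num_iters

    def forward(self, feat, flow):
        for _ in range(self.num_iters):
            flow = flow + self.refine_net(feat, flow)  # residual update
        return flow
```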

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731424-9cda1d89-e222-4bcf-b1b8-b18b31f7643b.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

33 changes: 32 additions & 1 deletion configs/liteflownet/README.md
@@ -1,6 +1,37 @@
# LiteFlowNet

## Introduction
## Abstract

<!-- [ABSTRACT] -->

FlowNet2, the state-of-the-art convolutional neural
network (CNN) for optical flow estimation, requires over
160M parameters to achieve accurate flow estimation. In
this paper we present an alternative network that outperforms
FlowNet2 on the challenging Sintel final pass and
KITTI benchmarks, while being 30 times smaller in the
model size and 1.36 times faster in the running speed. This
is made possible by drilling down to architectural details
that might have been missed in the current frameworks: (1)
We present a more effective flow inference approach at each
pyramid level through a lightweight cascaded network. It
not only improves flow estimation accuracy through early
correction, but also permits seamless incorporation of descriptor matching
in our network. (2) We present a novel flow regularization layer
to ameliorate the issue of outliers and vague flow boundaries
by using a feature-driven local convolution. (3) Our network owns
an effective structure for pyramidal feature extraction and embraces feature
warping rather than image warping as practiced in FlowNet2.
Our code and trained models are available at
https://github.com/twhui/LiteFlowNet.
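
A sketch of the feature-driven local convolution used for flow
regularization: per-pixel filter weights predicted from features smooth the
flow field while respecting motion boundaries. Shapes and names are
illustrative assumptions, not the published layer.

```python
import torch
import torch.nn.functional as F

def feature_driven_local_conv(flow, weights, k=3):
    """flow: (B, 2, H, W); weights: (B, k*k, H, W) predicted from features."""
    b, _, h, w = flow.shape
    weights = torch.softmax(weights, dim=1)  # normalize each per-pixel filter
    out = []
    for c in range(2):  # regularize u and v components separately
        patches = F.unfold(flow[:, c:c + 1], k, padding=k // 2)  # (B, k*k, H*W)
        patches = patches.view(b, k * k, h, w)
        out.append((patches * weights).sum(dim=1, keepdim=True))
    return torch.cat(out, dim=1)
```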

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731269-eee91f40-1a4d-4c9e-afc6-6d90b0674b62.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

26 changes: 25 additions & 1 deletion configs/liteflownet2/README.md
@@ -1,6 +1,30 @@
# LiteFlowNet2

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Over four decades, the majority of works have addressed the problem of optical flow estimation using variational methods. With the
advance of machine learning, some recent works have attempted to address the problem using convolutional neural networks (CNNs)
and have shown promising results. FlowNet2, the state-of-the-art CNN, requires over 160M parameters to achieve accurate flow
estimation. Our LiteFlowNet2 outperforms FlowNet2 on Sintel and KITTI benchmarks, while being 25.3 times smaller in the model size
and 3.1 times faster in the running speed. LiteFlowNet2 is built on the foundation laid by conventional methods and resembles the
corresponding roles as data fidelity and regularization in variational methods. We compute optical flow in a spatial-pyramid formulation
as SPyNet but through a novel lightweight cascaded flow inference. It provides high flow estimation accuracy through early
correction with seamless incorporation of descriptor matching. Flow regularization is used to ameliorate the issue of outliers and vague
flow boundaries through feature-driven local convolutions. Our network also owns an effective structure for pyramidal feature extraction
and embraces feature warping rather than image warping as practiced in FlowNet2 and SPyNet. Compared to LiteFlowNet,
LiteFlowNet2 improves the optical flow accuracy on Sintel Clean by 23.3%, Sintel Final by 12.8%, KITTI 2012 by 19.6%, and KITTI
2015 by 18.8%, while being 2.2 times faster. Our network protocol and trained models are made publicly available on
https://github.com/twhui/LiteFlowNet2.
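
A sketch of the spatial-pyramid formulation mentioned above: flow is
estimated at the coarsest level, then upsampled (with displacements rescaled)
and refined level by level. The `estimate` callable stands in for the
cascaded flow inference module and is an assumption.

```python
import torch.nn.functional as F

def pyramid_flow(feats1, feats2, estimate):
    """feats1/feats2: lists of feature maps, coarsest first."""
    flow = None
    for f1, f2 in zip(feats1, feats2):
        if flow is None:
            flow = f1.new_zeros(f1.size(0), 2, f1.size(2), f1.size(3))
        else:
            # upsample the coarser estimate and rescale displacement values
            flow = 2.0 * F.interpolate(flow, scale_factor=2, mode='bilinear',
                                       align_corners=False)
        flow = flow + estimate(f1, f2, flow)  # residual inference per level
    return flow
```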

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731269-eee91f40-1a4d-4c9e-afc6-6d90b0674b62.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

28 changes: 27 additions & 1 deletion configs/maskflownet/README.md
@@ -1,6 +1,32 @@
# MaskFlowNet

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Feature warping is a core technique in optical flow estimation;
however, the ambiguity caused by occluded areas during warping is a major
problem that remains unsolved. In this paper, we propose
an asymmetric occlusion-aware feature matching module,
which can learn a rough occlusion mask that filters useless (occluded) areas
immediately after feature warping without any explicit supervision.
The proposed module can be easily integrated into
end-to-end network architectures and enjoys performance
gains while introducing negligible computational cost. The
learned occlusion mask can be further fed into a subsequent
network cascade with dual feature pyramids with which we
achieve state-of-the-art performance. At the time of submission,
our method, called MaskFlownet, surpasses all published optical flow
methods on the MPI Sintel, KITTI 2012 and 2015 benchmarks.
Code is available at https://github.com/microsoft/MaskFlownet.
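
A minimal sketch of the occlusion-aware module described above: a learned
mask, trained without explicit supervision, gates the warped features so
occluded regions contribute less. Layer names and the learnable bias term are
illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class OcclusionAwareGate(nn.Module):
    """Gate warped features with a rough, learned occlusion mask."""

    def __init__(self, channels):
        super().__init__()
        self.mask_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, warped_feat):
        occ = torch.sigmoid(self.mask_head(warped_feat))  # occlusion mask
        return warped_feat * occ + self.bias  # down-weight occluded areas
```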

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731471-ed5fc41b-59f9-4e00-b27b-d0456b2a09a2.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

28 changes: 27 additions & 1 deletion configs/pwcnet/README.md
@@ -1,6 +1,32 @@
# PWC-Net

## Introduction
## Abstract

<!-- [ABSTRACT] -->

We present a compact but effective CNN model for optical flow,
called PWC-Net. PWC-Net has been designed
according to simple and well-established principles: pyramidal processing,
warping, and the use of a cost volume.
Cast in a learnable feature pyramid, PWC-Net uses the current optical flow
estimate to warp the CNN features of the
second image. It then uses the warped features and features of
the first image to construct a cost volume, which
is processed by a CNN to estimate the optical flow. PWC-Net is 17 times
smaller in size and easier to train than the
recent FlowNet2 model. Moreover, it outperforms all published optical flow
methods on the MPI Sintel final pass and
KITTI 2015 benchmarks, running at about 35 fps on Sintel
resolution (1024×436) images. Our models are available
on https://github.com/NVlabs/PWC-Net.
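
A compact sketch of one pyramid level combining the three principles named in
the abstract. `warp`, `correlation`, and `decoder` are placeholders for real
modules, so this shows the data flow only.

```python
def pwc_level(feat1, feat2, up_flow, warp, correlation, decoder):
    """Pyramid level: warp features, build a cost volume, decode the flow."""
    warped = warp(feat2, up_flow)         # warp second image's features
    cost = correlation(feat1, warped)     # cost volume over displacements
    return decoder(cost, feat1, up_flow)  # CNN regresses flow at this level
```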

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731246-f94698da-9c69-419d-bafe-7b9baab4a7aa.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->

25 changes: 24 additions & 1 deletion configs/raft/README.md
@@ -1,6 +1,29 @@
# RAFT

## Introduction
## Abstract

<!-- [ABSTRACT] -->

We introduce Recurrent All-Pairs Field Transforms (RAFT),
a new deep network architecture for optical flow. RAFT extracts per-pixel
features, builds multi-scale 4D correlation volumes for all pairs
of pixels, and iteratively updates a flow field through a recurrent unit
that performs lookups on the correlation volumes. RAFT achieves
state-of-the-art performance. On KITTI, RAFT achieves an F1-all error of
5.10%, a 16% error reduction from the best published result (6.10%).
On Sintel (final pass), RAFT obtains an end-point-error of 2.855 pixels,
a 30% error reduction from the best published result (4.098 pixels). In
addition, RAFT has strong cross-dataset generalization as well as high
efficiency in inference time, training speed, and parameter count. Code
is available at https://github.com/princeton-vl/RAFT.
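
A sketch of the all-pairs correlation volume: every pixel of the first
frame's feature map is compared with every pixel of the second, giving a 4D
volume that the recurrent update operator can look up. Shapes and the
normalization are illustrative, not the official code.

```python
import torch

def all_pairs_correlation(feat1, feat2):
    """feat1, feat2: (B, C, H, W) -> correlation volume (B, H, W, H, W)."""
    b, c, h, w = feat1.shape
    f1 = feat1.flatten(2).transpose(1, 2)   # (B, H*W, C)
    f2 = feat2.flatten(2)                   # (B, C, H*W)
    corr = torch.matmul(f1, f2) / c ** 0.5  # dot product of every pixel pair
    return corr.view(b, h, w, h, w)
```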

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/76149310/142731339-c1978af7-c9de-4b21-9d6c-e786daff9601.png" width="400"/>
</div>

## Citation

<!-- [ALGORITHM] -->
