Tutorial2020_Stereo: Exercises on Recent Deep-Learning Stereo Vision Models. Prepared for the Summer School 2020 at the airlab, Carnegie Mellon University
This repository contains the exercise code for the course "Recent Advances of Binocular Stereo Vision", held by the airlab at Carnegie Mellon University as part of the Summer School 2020.
The course covers both recent non-learning and learning-based methods. This repository contains the learning-based models discussed in the lecture. The non-learning part can be found here.
This exercise code presents two popular deep-learning structures for passive binocular stereo vision, namely the 3D cost volume structure [1] and the cross-correlation structure [2] (although [2] was designed for optical flow).
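To make the first structure concrete, here is a minimal pure-Python sketch of a concatenation-style cost volume in the spirit of PSMNet [1]: for every candidate disparity d, the left feature at pixel x is stacked with the right feature at x - d. The nested-list layout [channel][row][column] and the function name are illustrative assumptions; the actual models build this volume with PyTorch tensors on the GPU, and PSMNet regularizes it with 3D convolutions.

```python
def build_concat_cost_volume(feat_left, feat_right, max_disp):
    """Return a volume indexed [2C][D][H][W] (hypothetical helper).

    For disparity d, channel c of the left feature at (y, x) is paired with
    channel c of the right feature at (y, x - d). Positions with x - d < 0
    have no valid match and stay zero.
    """
    channels = len(feat_left)
    height = len(feat_left[0])
    width = len(feat_left[0][0])
    # Zero-initialized volume: [2C][D][H][W].
    volume = [
        [[[0.0] * width for _ in range(height)] for _ in range(max_disp)]
        for _ in range(2 * channels)
    ]
    for d in range(max_disp):
        for c in range(channels):
            for y in range(height):
                for x in range(d, width):
                    # First C channels: left features; last C: shifted right features.
                    volume[c][d][y][x] = feat_left[c][y][x]
                    volume[channels + c][d][y][x] = feat_right[c][y][x - d]
    return volume
```

Positions where x - d falls outside the right image are left at zero, which is one common choice for the missing matches.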
The backbone comes from PSMNet [1]. The original PSMNet is modified so that it also estimates the uncertainty of its own disparity prediction. This modified model is called PSMNU. Please refer to [3] for more details about PSMNU.
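PSMNet [1] turns the regularized cost volume into a disparity with a soft argmin: a softmax over the negated costs gives a probability for each disparity level, and the prediction is the expected disparity. The sketch below shows this for a single pixel. The variance it also returns is only a hypothetical illustration of a spread measure; it is not necessarily how PSMNU estimates uncertainty, which is described in [3].

```python
import math


def softmax(values):
    # Subtract the maximum for numerical stability.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_argmin(costs):
    """Disparity regression for one pixel: sum_d d * softmax(-cost)_d."""
    probs = softmax([-c for c in costs])
    disparity = sum(d * p for d, p in enumerate(probs))
    # Hypothetical spread measure (not necessarily PSMNU's uncertainty):
    # the variance of the per-pixel disparity distribution.
    variance = sum(p * (d - disparity) ** 2 for d, p in enumerate(probs))
    return disparity, variance
```

A low cost at a single level yields a sharp distribution and a small variance; flat costs spread the probability and raise it.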
The implementation of the cross-correlation layer is mainly taken from PWC-Net [2], which performs optical flow estimation. The version provided here is modified to work with PyTorch 1.5, the current version at the time of writing. Cross-correlation is performed only along the x-axis, since we only care about the disparity, not the optical flow.
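The x-only search can be sketched as a 1D correlation: for each disparity d, the left and right feature vectors at x and x - d are multiplied channel by channel and averaged, following the channel-normalized correlation of PWC-Net [2]. This is a minimal pure-Python sketch under the assumptions that only non-negative disparities are searched and out-of-range positions are zero; the repository's CUDA layer may use a different range, padding, or normalization.

```python
def correlation_1d(feat_left, feat_right, max_disp):
    """Return corr[d][y][x] = mean_c fL[c][y][x] * fR[c][y][x - d].

    Unlike the 2D search window used for optical flow, the search runs
    only along x, over disparities d in [0, max_disp). Positions with
    x - d < 0 are left at zero. Inputs are nested lists [c][y][x].
    """
    channels = len(feat_left)
    height = len(feat_left[0])
    width = len(feat_left[0][0])
    corr = [[[0.0] * width for _ in range(height)] for _ in range(max_disp)]
    for d in range(max_disp):
        for y in range(height):
            for x in range(d, width):
                total = 0.0
                for c in range(channels):
                    total += feat_left[c][y][x] * feat_right[c][y][x - d]
                # Normalize by the feature length, as in PWC-Net.
                corr[d][y][x] = total / channels
    return corr
```

The output has one channel per disparity, so its size grows with the disparity range but not with the feature depth.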
Pre-trained models can be found here.
Please note that there are two models for each type of structure: one for RGB images and one for grayscale images. All models are trained on the Scene Flow datasets [4] (FlyingThings3D and Monkaa, but not the Driving scenes). The models with a 3D cost volume use a maximum disparity of 256 pixels; the models with cross-correlation are trained with a maximum disparity range of 128 pixels.
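The maximum disparity directly sets the depth of the cost volume. The hypothetical helper below estimates the shape and float32 size of one concatenation cost volume; its defaults (1/4 resolution, 64 channels from two 32-channel feature maps) are assumptions modelled on PSMNet [1], not values read from this repository.

```python
def cost_volume_footprint(height, width, max_disp, channels=64,
                          downsample=4, bytes_per_value=4):
    """Return (shape, size in MiB) of one concatenation cost volume.

    Assumed defaults: volume built at 1/downsample resolution with
    `channels` feature channels, stored as float32 (4 bytes per value).
    """
    shape = (channels, max_disp // downsample,
             height // downsample, width // downsample)
    values = 1
    for n in shape:
        values *= n
    return shape, values * bytes_per_value / 2 ** 20
```

For a 540x960 Scene Flow frame and a 256-pixel maximum disparity, this gives a shape of (64, 64, 135, 240), about 506 MiB for a single volume; a 128-pixel limit halves that. The 3D convolutions typically allocate further volumes of similar size, so the total footprint is larger.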
The pre-trained models are provided for the exercises, not for benchmarking, since their overall performance is not optimized.
A GPU is required to run the exercises.
The code and pre-trained models were tested on Ubuntu 18.04 LTS with Python 3.6.9, CUDA 10, PyTorch 1.0+, and TorchVision.
Use the following command to install the required Python packages into the current Python environment (PyTorch and TorchVision are not included):
pip3 install opencv-python numba colorcet plyfile
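Before running the exercises, a quick check like the following can confirm that the packages are importable and that PyTorch sees a GPU. Note that some import names differ from the pip package names (opencv-python is imported as cv2). The script is a convenience sketch, not part of the repository.

```python
import importlib.util

# Import names of the required packages (not the pip package names).
REQUIRED_MODULES = ("torch", "torchvision", "cv2", "numba", "colorcet", "plyfile")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules found.")
    if "torch" not in missing:
        import torch
        # The exercises require a GPU, so this should print True.
        print("CUDA available:", torch.cuda.is_available())
```

Run it inside the same (virtual) environment that will run LocalTest.py.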
A video is available at https://youtu.be/lwX5S0MIFzs.
The sample input data are in the /SampleData sub-directory.
In this exercise, we use the pre-trained models that implement the 3D cost volume to perform disparity estimation. The steps are simple.
- Create a new sub-folder /PSMNU/Pretrained and place the pre-trained models in the newly created folder.
- Go to directory /PSMNU.
- Run
python3 LocalTest.py
LocalTest.py reads the cases listed in /PSMNU/Cases.json. An individual case can be disabled by setting its "enable" key to false.
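The case selection can be pictured as follows. The layout of Cases.json assumed here (a top-level object mapping case names to objects with an "enable" flag) is hypothetical, as is the function name; check the actual file before relying on it.

```python
import json


def enabled_cases(path):
    """Yield (name, case) pairs whose "enable" flag is true.

    Hypothetical layout: {"case_name": {"enable": true, ...}, ...}.
    """
    with open(path, "r", encoding="utf-8") as f:
        cases = json.load(f)
    for name, case in cases.items():
        # Assumption: a case without an "enable" key counts as enabled.
        if case.get("enable", True):
            yield name, case
```

Disabling a case then only requires flipping its flag, without deleting its entry from the file.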
If everything works smoothly, a set of results will be produced. For sample cases that have ground truth data, an error map is drawn alongside the disparity prediction. In the following figure, the first column contains the Ref. and Tst. images, the ground truth disparity is at the top center, and the prediction is at the bottom center.
The models also predict the per-pixel uncertainty of their disparity predictions. The uncertainty is shown in the lower right corner of the above image. The colormaps used in the above image can be found here; specifically, rainbow is used for disparity, coolwarm for the disparity error with respect to the ground truth, and CET_L19 for uncertainty.
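Applying a colormap amounts to normalizing each value to [0, 1] and picking the matching entry of a palette. The sketch below returns palette indices for a list of values; palette_size stands for the length of whichever palette is used (in the repository the palettes come from colorcet), and the function name is hypothetical. For a diverging map such as coolwarm, passing symmetric limits (vmin = -vmax) keeps zero error at the middle color; whether the repository does this is an assumption.

```python
def to_palette_indices(values, palette_size, vmin=None, vmax=None):
    """Map scalars linearly onto palette indices 0..palette_size-1.

    vmin/vmax default to the data range; values outside are clipped.
    """
    lo = min(values) if vmin is None else vmin
    hi = max(values) if vmax is None else vmax
    span = hi - lo if hi > lo else 1.0  # avoid division by zero
    indices = []
    for v in values:
        t = (min(max(v, lo), hi) - lo) / span  # normalize to [0, 1]
        indices.append(min(int(t * palette_size), palette_size - 1))
    return indices
```

For example, to_palette_indices([-2.0, 0.0, 2.0], 256, vmin=-4.0, vmax=4.0) places zero error at index 128, the center of a 256-entry palette.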
For the disparity error and uncertainty maps, the numbers in the top left corners represent the average error (A), the standard deviation of the error (S), the minimum uncertainty (min), and the maximum uncertainty (max).
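A minimal sketch of these annotations on 2D nested lists. The definitions are assumptions: the error is taken as prediction minus ground truth (matching the diverging coolwarm map), and A and S are computed over the absolute error. The repository may define them differently, for example with signed statistics or with invalid pixels masked out. All function names are hypothetical.

```python
import math


def error_map(pred, gt):
    """Signed per-pixel error, pred - gt (sign convention assumed)."""
    return [[p - g for p, g in zip(prow, grow)] for prow, grow in zip(pred, gt)]


def error_stats(err):
    """Return (A, S): mean and population std of the absolute error (assumed)."""
    flat = [abs(e) for row in err for e in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((e - mean) ** 2 for e in flat) / len(flat))
    return mean, std


def uncertainty_range(unc):
    """Return (min, max) of the per-pixel uncertainty map."""
    flat = [u for row in unc for u in row]
    return min(flat), max(flat)
```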
If the ground truth disparity is not available, then the result will look like the following image.
If the camera parameters (intrinsic and extrinsic parameters) are known for a sample case, the reconstructed 3D point cloud is saved as a PLY file.
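For a rectified stereo pair, a disparity d at pixel (u, v) gives depth Z = f * B / d, where f is the focal length in pixels and B the baseline, and the point is X = (u - cx) * Z / f, Y = (v - cy) * Z / f. The sketch below applies this and writes an ASCII PLY file using only the standard library; the function and parameter names are hypothetical, and the repository itself uses plyfile and may also store colors or handle the extrinsics differently.

```python
def disparity_to_points(disp, f, cx, cy, baseline):
    """Reproject a disparity map (nested lists [v][u]) to 3D points.

    Assumes a rectified pair: focal length f and principal point (cx, cy)
    in pixels; the baseline's unit sets the unit of the output points.
    """
    points = []
    for v, row in enumerate(disp):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # zero/negative disparity has no finite depth
            z = f * baseline / d
            points.append(((u - cx) * z / f, (v - cy) * z / f, z))
    return points


def write_ascii_ply(path, points):
    """Write (x, y, z) tuples as an ASCII PLY point cloud."""
    with open(path, "w", encoding="ascii") as fp:
        fp.write("ply\nformat ascii 1.0\n")
        fp.write(f"element vertex {len(points)}\n")
        for axis in "xyz":
            fp.write(f"property float {axis}\n")
        fp.write("end_header\n")
        for x, y, z in points:
            fp.write(f"{x} {y} {z}\n")
```

Because depth is inversely proportional to disparity, small disparity errors on distant points produce large depth errors, which is one reason a per-pixel uncertainty is useful when filtering the cloud.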
The cross-correlation layer has to be compiled and installed into the Python environment before the pre-trained models can be tested. The steps after the installation are essentially the same as in the 3D cost volume exercise.
- Go to /Correlation/CorrelationCUDA.
- Run
python3 setup.py build_ext
and wait for the compilation to finish.
- Run
python3 setup.py install --record InstalledFiles.txt
The cross-correlation layer will be installed into the current Python environment. For example, if you are using a virtual environment, the layer is installed to the location specified by that virtual environment. The --record option records the files installed during the installation. To remove the installed cross-correlation layer, use
cat InstalledFiles.txt | xargs rm -f
- Create a new sub-folder /Correlation/Pretrained and place the pre-trained models in it.
- Go to /Correlation.
- Run
python3 LocalTest.py
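If the shell pipeline is not convenient, the uninstall step can also be done from Python. This hypothetical helper deletes every file listed in the --record output, one path per line; unlike the xargs pipeline, it also handles paths that contain spaces. Like rm -f, it silently skips files that no longer exist.

```python
import os


def remove_recorded_files(record_path):
    """Delete each regular file listed (one per line) in a --record file.

    Missing paths and directories are skipped. Returns the removed paths.
    """
    removed = []
    with open(record_path, "r", encoding="utf-8") as f:
        for line in f:
            path = line.rstrip("\r\n")
            if path and os.path.isfile(path):
                os.remove(path)
                removed.append(path)
    return removed
```

Usage: remove_recorded_files("InstalledFiles.txt"), run from /Correlation/CorrelationCUDA where the record file was written.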
Similar to the 3D cost volume exercise, LocalTest.py reads the Cases.json file and processes all the enabled cases. A disparity error map and a 3D point cloud are generated if the associated ground truth data or camera parameters are available. The colormaps are the same as in the PSMNU exercise. The following figure shows the result for one of the sample cases. The four sections of this image, in top-to-bottom and left-to-right order, are the Ref. image, the true disparity, the disparity error map, and the predicted disparity.
[1] Chang, Jia-Ren, and Yong-Sheng Chen. "Pyramid stereo matching network." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5410-5418. 2018.
[2] Sun, Deqing, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. "PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8934-8943. 2018.
[3] Hu, Yaoyu, Weikun Zhen, and Sebastian Scherer. "Deep-Learning Assisted High-Resolution Binocular Stereo Depth Reconstruction." In 2020 IEEE International Conference on Robotics and Automation (ICRA). 2020.
[4] Mayer, Nikolaus, Eddy Ilg, Philip Hausser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. "A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4040-4048. 2016.