Official PyTorch implementation of paper:
Neural Markov Random Field for Stereo Matching, CVPR 2024
Tongfan Guan, Chen Wang, Yun-Hui Liu
[2024/07/18]: 🚀 NMRF-Stereo-SwinT ranks first on KITTI 2012 and KITTI 2015-NOC, with the ImageNet pretrained Swin-T as backbone.
Hand-crafted Markov Random Field (MRF) models for stereo matching lack sufficient modeling accuracy compared to end-to-end deep models. While deep learning representations have greatly improved the unary terms of MRF models, the overall accuracy is still severely limited by the hand-crafted pairwise terms and message passing. To address these issues, we propose a neural MRF model in which both the potential functions and the message passing are designed using data-driven neural networks. Our fully data-driven model is built on variational inference theory to prevent convergence issues and to retain the graph inductive bias of stereo MRFs. To make inference tractable and scale well to high-resolution images, we also propose a Disparity Proposal Network (DPN) that adaptively prunes the search space for every pixel.
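For readers unfamiliar with the baseline being replaced, the following is a minimal sketch of classic mean-field inference on a hand-crafted stereo MRF. It is not the neural model from the paper: the 1-D chain of pixels, the truncated-linear pairwise penalty, and the names `mean_field_stereo`, `weight`, and `trunc` are all assumptions chosen for illustration. The per-pixel candidate lists only loosely mirror the idea of pruning the disparity search space.

```python
import math


def mean_field_stereo(unary, candidates, weight=1.0, trunc=2.0, iters=10):
    """Mean-field inference on a 1-D chain MRF (illustrative, hand-crafted).

    unary[i][k]      : matching cost of candidate k at pixel i (lower is better)
    candidates[i][k] : disparity value of candidate k at pixel i
    Returns, per pixel, the candidate disparity with the highest belief.
    """
    n = len(unary)

    def softmax_neg(costs):
        # Convert costs to a probability distribution (numerically stable).
        m = min(costs)
        exps = [math.exp(-(c - m)) for c in costs]
        z = sum(exps)
        return [e / z for e in exps]

    # Initialise beliefs from the unary terms only.
    q = [softmax_neg(u) for u in unary]

    for _ in range(iters):
        new_q = []
        for i in range(n):
            costs = []
            for k, d_i in enumerate(candidates[i]):
                msg = 0.0
                for j in (i - 1, i + 1):
                    if 0 <= j < n:
                        # Expected truncated-linear penalty under neighbour j's belief.
                        msg += sum(
                            q[j][l] * weight * min(abs(d_i - d_j), trunc)
                            for l, d_j in enumerate(candidates[j])
                        )
                costs.append(unary[i][k] + msg)
            new_q.append(softmax_neg(costs))
        q = new_q  # synchronous update of all pixels

    return [
        candidates[i][max(range(len(q[i])), key=q[i].__getitem__)]
        for i in range(n)
    ]


if __name__ == "__main__":
    candidates = [[5, 9], [5, 9], [5, 9]]
    unary = [[0.0, 4.0], [1.0, 0.8], [0.0, 4.0]]  # middle pixel is ambiguous
    print(mean_field_stereo(unary, candidates))              # [5, 5, 5]
    print(mean_field_stereo(unary, candidates, weight=0.0))  # [5, 9, 5]
```

In this toy example the smoothness term pulls the ambiguous middle pixel toward its neighbours; with `weight=0.0` the pairwise term vanishes and each pixel follows its unary cost alone.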
- High accuracy & efficiency
  NMRF-Stereo reports state-of-the-art accuracy on Scene Flow and ranks first on the KITTI 2012 and KITTI 2015 leaderboards among all published methods at the time of submission. The model runs in 90 ms on an RTX 3090 for KITTI data (1242x375).
- Strong cross-domain generalization
  NMRF-Stereo generalizes well to other datasets and scenes, even though the model is trained only on synthetic Scene Flow data.
- Sharp depth boundaries
  NMRF-Stereo recovers sharp depth boundaries, which is key to downstream applications such as 3D reconstruction and object detection.
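Downstream applications such as 3D reconstruction usually need metric depth rather than disparity. For a rectified stereo pair the standard relation is depth = focal length x baseline / disparity. The sketch below applies it to a nested list of disparities; the function name `disparity_to_depth` and the calibration values in the usage example are hypothetical and are not part of this repository.

```python
def disparity_to_depth(disparity, focal_px, baseline_m, min_disparity=1e-6):
    """Convert a 2-D disparity map (in pixels) to depth (in metres).

    Uses the rectified-stereo relation depth = focal_px * baseline_m / disparity.
    Pixels whose disparity is not above `min_disparity` map to infinity.
    """
    return [
        [
            focal_px * baseline_m / d if d > min_disparity else float("inf")
            for d in row
        ]
        for row in disparity
    ]


if __name__ == "__main__":
    # Hypothetical calibration: 1000 px focal length, 0.5 m baseline.
    print(disparity_to_depth([[50.0, 25.0, 0.0]], focal_px=1000.0, baseline_m=0.5))
    # [[10.0, 20.0, inf]]
```

Use the focal length and baseline from your own camera calibration; the values above are placeholders.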
Our code is developed on Ubuntu 20.04 using Python 3.8 and PyTorch 1.13. Please note that the code has only been tested with these specified versions. We recommend using conda for the installation of dependencies:
- Create the NMRF conda environment and install all dependencies:
conda env create -f environment.yml
conda activate NMRF
- Build the deformable attention and superpixel-guided disparity downsample operators:
cd ops && sh make.sh && cd ..
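Because the code has only been tested with Python 3.8 and PyTorch 1.13, it can help to check the active environment before building the operators. The following is a small sketch using only the standard library; the helper name `version_warnings` is an assumption and is not part of this repository.

```python
import sys
from importlib import metadata

TESTED_PYTHON = (3, 8)   # versions stated in this README
TESTED_TORCH = (1, 13)


def version_warnings(python_version, torch_version):
    """Return human-readable warnings for versions that differ from the tested ones.

    python_version : tuple such as sys.version_info
    torch_version  : version string such as "1.13.1+cu117", or None if not installed
    """
    warnings = []
    if tuple(python_version[:2]) != TESTED_PYTHON:
        warnings.append(
            "Python %d.%d found, tested with %d.%d"
            % (python_version[0], python_version[1], *TESTED_PYTHON)
        )
    if torch_version is None:
        warnings.append("PyTorch is not installed in this environment")
    else:
        # Drop any local build tag ("+cu117") and compare major.minor only.
        major_minor = tuple(int(p) for p in torch_version.split("+")[0].split(".")[:2])
        if major_minor != TESTED_TORCH:
            warnings.append(
                "PyTorch %s found, tested with %d.%d" % (torch_version, *TESTED_TORCH)
            )
    return warnings


if __name__ == "__main__":
    try:
        installed_torch = metadata.version("torch")
    except metadata.PackageNotFoundError:
        installed_torch = None
    for message in version_warnings(sys.version_info, installed_torch):
        print("warning:", message)
```

A mismatch does not necessarily mean the code will fail; it only means the combination is outside what the authors report having tested.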
To train/evaluate NMRF-Stereo, you will need to download the required datasets.
- Scene Flow (Includes FlyingThings3D, Driving & Monkaa)
- Middlebury
- ETH3D
- KITTI 2012
- KITTI 2015
By default, datasets.py will search for the datasets in these locations. You can create symbolic links to wherever the datasets were downloaded in the $root/datasets folder:
ln -s $YOUR_DATASET_ROOT datasets
Our folder structure is as follows:
├── datasets
    ├── ETH3D
    │   ├── two_view_training
    │   └── two_view_training_gt
    ├── KITTI
    │   ├── KITTI_2012
    │   │   ├── testing
    │   │   └── training
    │   └── KITTI_2015
    │       ├── testing
    │       └── training
    ├── Middlebury
    │   ├── 2014
    │   └── MiddEval3
    └── SceneFlow
        ├── Driving
        │   ├── disparity
        │   └── frames_finalpass
        ├── FlyingThings3D
        │   ├── disparity
        │   └── frames_finalpass
        └── Monkaa
            ├── disparity
            └── frames_finalpass
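A quick way to confirm that the folder structure above is in place is to check each expected directory before training or evaluation. The sketch below is a hypothetical helper, not a script shipped with this repository; the directory names are copied from the tree above, and `Path.is_dir()` follows symbolic links, so a linked datasets folder is handled as well.

```python
from pathlib import Path

# Leaf directories taken from the folder structure listed in this README.
EXPECTED_DIRS = [
    "ETH3D/two_view_training",
    "ETH3D/two_view_training_gt",
    "KITTI/KITTI_2012/testing",
    "KITTI/KITTI_2012/training",
    "KITTI/KITTI_2015/testing",
    "KITTI/KITTI_2015/training",
    "Middlebury/2014",
    "Middlebury/MiddEval3",
    "SceneFlow/Driving/disparity",
    "SceneFlow/Driving/frames_finalpass",
    "SceneFlow/FlyingThings3D/disparity",
    "SceneFlow/FlyingThings3D/frames_finalpass",
    "SceneFlow/Monkaa/disparity",
    "SceneFlow/Monkaa/frames_finalpass",
]


def missing_dataset_dirs(root="datasets"):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs("datasets")
    if missing:
        print("Missing dataset directories:")
        for rel in missing:
            print("  datasets/" + rel)
    else:
        print("All expected dataset directories were found.")
```

Only the datasets you actually plan to use need to be present, so missing entries for unused datasets can be ignored.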
We provide a script to generate occlusion masks for the Scene Flow dataset. This may bring a marginal performance improvement.
python tools/generate_occlusion_map.py
Pretrained models can be downloaded from Google Drive.
We assume the downloaded weights are located under the pretrained directory.
You can demo a trained model on pairs of images. To predict disparity for ETH3D, run:
python inference.py --dataset-name eth3d --output $output_directory SOLVER.RESUME pretrained/sceneflow.pth
Or test on your own stereo pairs:
python inference.py --input $left_directory/*.png $right_directory/*.png --output $output_directory SOLVER.RESUME pretrained/$pretrained_model.pth
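Before running inference on your own data, it may be worth checking that every left image has a right counterpart. The sketch below assumes that the two images of a pair share the same file name in the left and right directories; this README does not state how inference.py pairs the files, so treat that as an assumption. The helper name `pair_stereo_images` is hypothetical.

```python
from pathlib import Path


def pair_stereo_images(left_dir, right_dir, pattern="*.png"):
    """Pair images by identical file name (an assumed convention).

    Returns (pairs, unmatched), where `pairs` is a list of (left_path, right_path)
    sorted by name and `unmatched` lists names present in only one directory.
    """
    left = {p.name: p for p in Path(left_dir).glob(pattern)}
    right = {p.name: p for p in Path(right_dir).glob(pattern)}
    common = sorted(left.keys() & right.keys())
    unmatched = sorted(left.keys() ^ right.keys())
    return [(left[name], right[name]) for name in common], unmatched


if __name__ == "__main__":
    import sys

    pairs, unmatched = pair_stereo_images(sys.argv[1], sys.argv[2])
    print(f"{len(pairs)} stereo pairs found")
    for name in unmatched:
        print("no counterpart for", name)
```

Run it with the left and right directories as the two arguments; an empty list of unmatched names suggests the directories are consistent under the assumed naming convention.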
To evaluate on the Scene Flow test set, run:
python main.py --num-gpus 4 --eval-only SOLVER.RESUME pretrained/sceneflow.pth
Or for cross-domain generalization:
python main.py --num-gpus 4 --eval-only --config-file configs/zero_shot_evaluation.yaml SOLVER.RESUME pretrained/sceneflow.pth
For submission to KITTI 2012 and 2015 online test sets, you can run:
python inference.py --dataset-name kitti_2015 SOLVER.RESUME pretrained/kitti.pth
and
python inference.py --dataset-name kitti_2012 SOLVER.RESUME pretrained/kitti.pth
To train on Scene Flow, run:
python main.py --checkpoint-dir checkpoints/sceneflow --num-gpus 4
To train on KITTI, run:
python main.py --checkpoint-dir checkpoints/kitti --config-file configs/kitti_mix_train.yaml --num-gpus 4 SOLVER.RESUME pretrained/sceneflow.pth
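When launching several training or evaluation runs from a script, the main.py commands above can be assembled programmatically. The sketch below only uses flags that appear in this README (`--checkpoint-dir`, `--config-file`, `--num-gpus`, `--eval-only`, and the `SOLVER.RESUME` override); the helper `build_main_command` is hypothetical, and whether main.py accepts GPU counts other than 4 is an assumption.

```python
import shlex


def build_main_command(num_gpus=4, checkpoint_dir=None, config_file=None,
                       resume=None, eval_only=False):
    """Assemble a main.py command line as a list of arguments."""
    cmd = ["python", "main.py"]
    if checkpoint_dir:
        cmd += ["--checkpoint-dir", checkpoint_dir]
    if config_file:
        cmd += ["--config-file", config_file]
    cmd += ["--num-gpus", str(num_gpus)]
    if eval_only:
        cmd.append("--eval-only")
    if resume:
        # Config overrides are passed as trailing KEY VALUE pairs.
        cmd += ["SOLVER.RESUME", resume]
    return cmd


if __name__ == "__main__":
    kitti = build_main_command(
        checkpoint_dir="checkpoints/kitti",
        config_file="configs/kitti_mix_train.yaml",
        resume="pretrained/sceneflow.pth",
    )
    print(shlex.join(kitti))
    # To actually launch the run: subprocess.run(kitti, check=True)
```

Building the command as a list avoids shell quoting problems if a path contains spaces.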
Training can be monitored and visualized with TensorBoard. First start a TensorBoard session with
tensorboard --logdir checkpoints
and then access http://localhost:6006 in your browser.
If you find our work useful in your research, please consider citing our paper:
@inproceedings{guan2024neural,
title={Neural Markov Random Field for Stereo Matching},
author={Guan, Tongfan and Wang, Chen and Liu, Yun-Hui},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={5459--5469},
year={2024}
}
This project would not have been possible without some awesome repos: RAFT-Stereo, Detectron2, and Swin. The code from these third-party projects is not covered by the MIT license. Please refer to the original authors for the licenses of the third-party code.