This repository contains the code associated with the following publications:
Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV
Jaime Spencer, Chris Russell, Simon Hadfield and Richard Bowden
ArXiv (ArXiv 2024)
Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV
Jaime Spencer, Chris Russell, Simon Hadfield and Richard Bowden
ArXiv (ICCV 2023)
Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter
Jaime Spencer, Chris Russell, Simon Hadfield and Richard Bowden
ArXiv (TMLR 2022)
We have organized several monocular depth prediction challenges around the proposed SYNS-Patches dataset. Check the MDEC website for details on previous editions!
- .git-hooks: Dir containing a pre-commit hook for ignoring Jupyter Notebook outputs.
- api: Dir containing main scripts for training, evaluating and data preparation.
- assets: Dir containing images used in README.
- cfg: Dir containing config files for training/evaluating.
- docker: Dir containing Dockerfile and Anaconda package requirements.
- data*: (Optional) Dir containing datasets.
- hpc: (Optional) Dir containing submission files to HPC clusters.
- models*: (Optional) Dir containing trained model checkpoints.
- results*: Dir containing the precomputed results used in the paper.
- src: Dir containing source code.
- .gitignore: File containing patterns ignored by Git.
- PATHS.yaml*: File containing additional data & model roots.
- README.md: This file!

Entries marked with * are not tracked by Git!
You can download the pretrained full models from the following Dropbox links:
- KBR: https://www.dropbox.com/s/o8j4wyhnhgvh4o7/kbr.ckpt?dl=0
- KBR++: https://www.dropbox.com/s/j0m77vthq51aaoy/kbr%2B%2B.ckpt?dl=0
We also provide a minimum-requirements script to load a pretrained model and compute predictions on a directory of images. This is probably what you want if you only intend to try out the model rather than train it yourself. Code illustrating how to align the predictions to a ground-truth depth map can be found here.
The only requirements for running the model are timm, torch and numpy.
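Monocular predictions are generally only defined up to an unknown scale (and, for some output parameterisations, a shift), so they are usually aligned to the ground truth before metrics are computed. The following is a minimal sketch of least-squares scale-and-shift alignment using NumPy. The helper name `align_scale_shift` and the default validity mask are assumptions made for illustration, and the arrays are aligned in whatever space (depth or disparity) they are passed in; the repository's own alignment code may differ.

```python
import numpy as np


def align_scale_shift(pred, target, mask=None):
    """Least-squares fit of `scale * pred + shift` to `target` over valid pixels.

    Hypothetical helper for illustration; not the repository's API.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if mask is None:
        # Assumption: non-finite or non-positive ground-truth values mark missing pixels.
        mask = np.isfinite(target) & (target > 0)

    # Solve min_{s,t} || s * pred + t - target ||^2 on the valid pixels.
    A = np.stack([pred[mask], np.ones(mask.sum())], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, target[mask], rcond=None)
    return scale * pred + shift, scale, shift


# Synthetic check: the "prediction" is the target under an unknown affine map,
# so the recovered parameters should invert it (target = 2 * pred + 0.5).
rng = np.random.default_rng(0)
target = rng.uniform(1.0, 10.0, size=(4, 5))
pred = (target - 0.5) / 2.0
aligned, scale, shift = align_scale_shift(pred, target)
print(round(float(scale), 3), round(float(shift), 3))
```

A scale-only variant (for example, matching medians) is another common choice; which one is appropriate depends on the evaluation protocol being followed.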
You can download the val/test MapFreeReloc predictions for our public models from:
- KBR: https://www.dropbox.com/scl/fi/xy95m4xl5qlqvu6bpn9ba/mapfree_kbr_depth.tar.gz?rlkey=tjg8xbgsowd9fbkvw7uusg53m&dl=0
- KBR++: https://www.dropbox.com/scl/fi/ua3p7726r7w8ccmk01cwx/mapfree_kbr-_depth.tar.gz?rlkey=qmxkqjf00vs8t2l2e3ftrns2w&dl=0
These can be used in your own MapFreeReloc submission to replace the baseline DPT+KITTI. Please remember to cite us if doing so!
Each section of the code has its own README file with more detailed instructions. Follow them only after having carried out the remaining steps in this section.
Remember to add the path to the repo to the PYTHONPATH in order to run the code.
# Example for `bash`. Can be added to `~/.bashrc`.
export PYTHONPATH=/path/to/slowtv_monodepth:$PYTHONPATH
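In a notebook or one-off script where editing the shell environment is inconvenient, a similar effect can be obtained by prepending the repository root to `sys.path` at runtime. This is only a sketch: `add_repo_to_path` is a hypothetical helper that is not part of the repository, and the path is the same placeholder used in the `bash` example.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Prepend `repo_root` to `sys.path` if it is not already there (hypothetical helper)."""
    root = str(Path(repo_root).expanduser())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Placeholder path, as in the `bash` example. Replace it with your clone location.
repo = add_repo_to_path("/path/to/slowtv_monodepth")
```

Note that this only affects the current interpreter; subprocesses launched from it still rely on the `PYTHONPATH` environment variable.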
First, set up a Git pre-commit hook that stops us from committing Jupyter Notebooks with outputs, since they may contain large images.
./.git-hooks/setup.sh
chmod +x .git/hooks/pre-commit # File sometimes isn't copied as executable. This should fix it.
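To illustrate what such a hook guards against, the following sketch checks whether a notebook file still contains cell outputs, using only the standard library. It relies on the Jupyter notebook JSON layout (a top-level `cells` list whose code cells carry an `outputs` list). The names `has_outputs` and `make_notebook` are hypothetical, and the hook shipped in `.git-hooks` may work differently.

```python
import json
import tempfile
from pathlib import Path


def has_outputs(notebook_path):
    """Return True if any code cell in the notebook has stored outputs."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    return any(
        cell.get("cell_type") == "code" and cell.get("outputs")
        for cell in nb.get("cells", [])
    )


def make_notebook(outputs):
    """Build a minimal nbformat-4 style notebook with a single code cell."""
    cell = {"cell_type": "code", "source": ["1 + 1"], "metadata": {},
            "execution_count": None, "outputs": outputs}
    return {"cells": [cell], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}


# Write one clean and one "dirty" notebook to a temporary directory and check both.
with tempfile.TemporaryDirectory() as tmp:
    clean = Path(tmp) / "clean.ipynb"
    dirty = Path(tmp) / "dirty.ipynb"
    stream = {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
    clean.write_text(json.dumps(make_notebook([])), encoding="utf-8")
    dirty.write_text(json.dumps(make_notebook([stream])), encoding="utf-8")
    clean_has, dirty_has = has_outputs(clean), has_outputs(dirty)
```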
If using Miniconda, create the environment and run commands as
ENV_NAME=slowtv
conda env create --file docker/environment.yml
conda activate $ENV_NAME
python api/train/train.py ...
To instead build the Docker image, run
docker build -t $ENV_NAME ./docker
docker run -it \
--shm-size=24gb \
--gpus all \
-v $(pwd -P):$(pwd -P) \
-v /path/to/dataroot1:/path/to/dataroot1 \
--user $(id -u):$(id -g) \
$ENV_NAME:latest \
/bin/bash
python api/train/train.py ...
The default locations for datasets and model checkpoints are ./data & ./models, respectively.
If you want to store them somewhere else, you can either create symlinks to them, or add additional roots.
This is done by creating the ./PATHS.yaml file with the following contents:
# -----------------------------------------------------------------------------
MODEL_ROOTS:
- /path/to/modelroot1
DATA_ROOTS:
- /path/to/dataroot1
- /path/to/dataroot2
- /path/to/dataroot3
# -----------------------------------------------------------------------------
NOTE: This file should not be tracked by Git, as it may contain sensitive information about your machine.
Multiple roots may be useful if training on an HPC cluster where data has to be copied locally.
Roots should be listed in order of preference, i.e. dataroot1/kitti_raw_syns will be given preference over dataroot2/kitti_raw_syns.
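The preference order can be pictured as a first-match search over the listed roots. The sketch below only illustrates that rule: `find_first` is a hypothetical helper, `other_dataset` is a made-up directory name, the roots are passed in as a plain list rather than parsed from `PATHS.yaml`, and the repository's own path-resolution code may behave differently.

```python
import tempfile
from pathlib import Path


def find_first(roots, name):
    """Return `root / name` for the first root (in listed order) that contains `name`."""
    for root in roots:
        candidate = Path(root) / name
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"'{name}' not found in any of: {[str(r) for r in roots]}")


with tempfile.TemporaryDirectory() as tmp:
    root1, root2 = Path(tmp) / "dataroot1", Path(tmp) / "dataroot2"
    # Both roots hold `kitti_raw_syns`; only the second holds `other_dataset`.
    (root1 / "kitti_raw_syns").mkdir(parents=True)
    (root2 / "kitti_raw_syns").mkdir(parents=True)
    (root2 / "other_dataset").mkdir()

    preferred = find_first([root1, root2], "kitti_raw_syns")
    fallback = find_first([root1, root2], "other_dataset")
    print(preferred.parent.name, fallback.parent.name)  # dataroot1 dataroot2
```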
We provide the YAML files containing the precomputed results used in the paper.
These should be copied over to the ./models directory (or any desired root) in order to follow the structure required by the evaluation and table-generating scripts.
cp -r ./results/* ./models
If you used the code in this repository or found the papers interesting, please cite them as
@inproceedings{spencer2024cribstv,
title={Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV},
author={Jaime Spencer and Chris Russell and Simon Hadfield and Richard Bowden},
booktitle={ArXiv Preprint},
year={2024}
}
@inproceedings{spencer2023slowtv,
title={Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV},
author={Jaime Spencer and Chris Russell and Simon Hadfield and Richard Bowden},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2023}
}
@article{spencer2022deconstructing,
title={Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter},
author={Jaime Spencer and Chris Russell and Simon Hadfield and Richard Bowden},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2022},
url={https://openreview.net/forum?id=GFK1FheE7F},
note={Reproducibility Certification}
}
We would also like to thank the authors of the papers below for their contributions and for releasing their code. Please consider citing them in your own work.
Tag | Title | Author | Conf | ArXiv | GitHub |
---|---|---|---|---|---|
Garg | Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue | Garg et al. | ECCV 2016 | ArXiv | GitHub |
Monodepth | Unsupervised Monocular Depth Estimation with Left-Right Consistency | Godard et al. | CVPR 2017 | ArXiv | GitHub |
Kuznietsov | Semi-Supervised Deep Learning for Monocular Depth Map Prediction | Kuznietsov et al. | CVPR 2017 | ArXiv | GitHub |
SfM-Learner | Unsupervised Learning of Depth and Ego-Motion from Video | Zhou et al. | CVPR 2017 | ArXiv | GitHub |
Depth-VO-Feat | Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction | Zhan et al. | CVPR 2018 | ArXiv | GitHub |
DVSO | Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry | Yang et al. | ECCV 2018 | ArXiv | |
Klodt | Supervising the new with the old: learning SFM from SFM | Klodt & Vedaldi | ECCV 2018 | CVF | |
MonoResMatch | Learning monocular depth estimation infusing traditional stereo knowledge | Tosi et al. | CVPR 2019 | ArXiv | GitHub |
DepthHints | Self-Supervised Monocular Depth Hints | Watson et al. | ICCV 2019 | ArXiv | GitHub |
Monodepth2 | Digging Into Self-Supervised Monocular Depth Estimation | Godard et al. | ICCV 2019 | ArXiv | GitHub |
SuperDepth | SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation | Pillai et al. | ICRA 2019 | ArXiv | GitHub |
Johnston | Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume | Johnston & Carneiro | CVPR 2020 | ArXiv | |
FeatDepth | Feature-metric Loss for Self-supervised Learning of Depth and Egomotion | Shu et al. | ECCV 2020 | ArXiv | GitHub |
CADepth | Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation | Yan et al. | 3DV 2021 | ArXiv | GitHub |
DiffNet | Self-Supervised Monocular Depth Estimation with Internal Feature Fusion | Zhou et al. | BMVC 2021 | ArXiv | GitHub |
HR-Depth | HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation | Lyu et al. | AAAI 2021 | ArXiv | GitHub |
MiDaS | Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer | Ranftl et al. | PAMI 2020 | ArXiv | GitHub |
DPT | Vision Transformers for Dense Prediction | Ranftl et al. | ICCV 2021 | ArXiv | GitHub |
NeWCRFs | NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation | Yuan et al. | CVPR 2022 | ArXiv | GitHub |
This project is licensed under the Commons Clause and GNU GPL licenses.
For commercial use, please contact the authors.