This repository is the official implementation of UCLID-Net: Single View Reconstruction in Object Space (NeurIPS 2020).
Code contact: Benoit Guillard
This code was tested on Ubuntu 18.04.3 LTS with CUDA 10.2 and PyTorch 1.2.0. We provide an environment description file to directly set up a conda env with:
conda env create -f conda_env.yml
conda activate uclidnet
To compile required CUDA kernels for Chamfer distance (credits to Thibault Groueix) and grid pooling, please do:
cd extensions
python setup.py install --user
Optionally, training metrics are tracked using Weights & Biases if a correct installation is detected.
Instructions to get datasets to place in the data/
directory are provided here. We also provide trained networks to place in the trained_models/
directory as explained here.
To train a UCLID-Net model, run train_UCLID_Net.py
, using the --test_split
and --train_split
options. We provide splits for the car category, and for the 13 main categories of ShapeNet in data/splits/
. For example, to train a model on the car subset, run:
python train_UCLID_Net.py --train_split data/splits/cars_train.json --test_split data/splits/cars_test.json
Hyperparameters are already set to their default value used in the main paper. Outputs (trained network, logfile and prediction samples) are stored in the output/
directory.
To evaluate the network, run test_UCLID_Net.py
specifying the path to the trained weights using the --model
option. For example, using the provided trained model on all categories
python test_UCLID_Net.py --test_split data/splits/all_13_classes_test.json --model trained_models/UCLID_Net_all_classes.pth
should approximately give the following metrics
Chamfer | Shell IoU | F-score @5% |
---|---|---|
6.5 | 37% | 95.4% |
The small discrepancy with metrics reported in the paper is due to the fact we here evaluate on our own ShapeNet RGB renderings, which have different random viewpoints as the ones provided by the authors of DISN. Latter renderings were used in the main paper.
In addition to computing metrics, the evaluation script also outputs reconstructed 3D points clouds for 10 instances of each category. These are to be found in a directory named after the trained model's weights file (trained_models/UCLID_Net_all_classes/
in the above example).
We provide the architectures, trained weights and training scripts for both auxiliary networks regressing depthmaps and cameras. They can respectively be trained using train_image2depth.py
and train_image2camera.py
. Depth estimation mainly comes from Haofeng Chen's awesome work.
Once trained, use generate_image2depth.py
and generate_image2camera.py
to regress depth and pose for all views of the dataset. For example, the following will generate cameras for all views of the 13 shapes categories using the provided pre-trained model, and store them in data/newly_inferred_camera :
python generate_image2camera.py --model trained_models/image2cam_all_classes.pth --test_split data/splits/all_13_classes_test.json --train_split data/splits/all_13_classes_train.json --output_dir data/newly_inferred_camera
These depthmaps and camera poses can be used for testing/training a UCLID-Net model in place of the ground truth ones, as described here.