Project Page | Paper | ArXiv | Full Dataset
Install PyTorch 1.12 or a higher version:
conda create -n sfd python=3.9
conda activate sfd
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
Install the other dependencies:
pip install -r requirements.txt
The sample dataset is included in /data. The model works on both Linux and Windows.
Tips:
- Currently we assume that all instances contribute to the reconstruction. If some instances fail during SfM, the preprocessing pipeline will not work; you can manually mask those failed images and re-run preprocessing. A future version will handle this situation.
- The original image should have a sufficiently high resolution; otherwise there may not be enough keypoints for SfM.
Create a new folder in /data for your custom input, e.g. /data/your_object, then create a /data/your_object/raw folder inside it. Put your RGB image and instance segmentation image in /data/your_object/raw and rename them "000_color.png" and "000_instance_seg.png".
The folder structure will be:
/data
/airplane
/your_object
/raw
-000_color.png
-000_instance_seg.png
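The layout above can be created from the repository root like this (a minimal sketch; the source image names below are placeholders for your own files):

```shell
# Create the expected folder layout for a custom object
mkdir -p data/your_object/raw

# Copy your own images into place (placeholder source paths):
# cp my_photo.png data/your_object/raw/000_color.png
# cp my_mask.png  data/your_object/raw/000_instance_seg.png
```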
The instance segmentation can be obtained from Segment Anything (not provided here) or by manual segmentation. The background value should be 0, and the value of the i-th instance region should be i/N×255 (i.e. 1/N×255, 2/N×255, ..., N/N×255), where N is the number of instances.
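As a quick sketch of this value convention (the exact rounding the pipeline expects is not specified here; this example uses integer truncation):

```python
def instance_values(n):
    """Gray level assigned to each instance label 1..n; background stays 0."""
    return [int(i * 255 / n) for i in range(1, n + 1)]

# For a 6-instance image, the mask regions would use these gray levels:
print(instance_values(6))  # [42, 85, 127, 170, 212, 255]
```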
0: crop each instance from the original image
1: find keypoints and match them for each pair
2-4: turn pair-wise matching into global matching
5: SfM
6-7: visualize and dump poses
8: dump surface normals from a pretrained network
For 5_sfm, please install COLMAP via 'pip install pycolmap==0.6.1'.
For 8_extract_monocular_cues.py, download the weights from Omnidata and put the pretrained normal-prediction network "omnidata_dpt_normal_v2.ckpt" in /preprocess/omnidata/omnidata_tools/torch/pretrained_models.
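For example (a sketch, assuming you run this from the repository root and have already downloaded the checkpoint; the download path below is a placeholder):

```shell
# Create the folder the script expects, then move the checkpoint into it
mkdir -p preprocess/omnidata/omnidata_tools/torch/pretrained_models
# mv ~/Downloads/omnidata_dpt_normal_v2.ckpt \
#    preprocess/omnidata/omnidata_tools/torch/pretrained_models/
```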
First, set the values for your own object:
object_name = 'your_object'  # set a name, same as the folder name
instance_num = 6  # change to the actual number of instances in the image
Then run the script:
cd preprocess
python run.py
The training data will then appear in /data/your_object.
Taking airplane as an example, we train the network in 3 stages. The checkpoints will be generated under /exps.
python exp_runner.py \
--conf configs/default.yaml \
--data_split_dir ./data/airplane \
--expname airplane \
--trainstage Geo \
--use_pretrain_normal \
--init_method SFM
python exp_runner.py \
--conf configs/default.yaml \
--data_split_dir ./data/airplane \
--expname airplane \
--trainstage Vis \
--init_method SFM
python exp_runner.py \
--conf configs/default.yaml \
--data_split_dir ./data/airplane \
--expname airplane \
--trainstage Mat \
--init_method SFM
Notes on the command-line flags:
- --is_continue : load from the previous checkpoint
- --use_pretrain_normal : add a normal constraint from MonoSDF. Model performance may decrease when the pretrained normals are of poor quality.
- --debug : disable visualization and run the experiment with a low sample count.
[√] release training code
[√] release sample data
[ ] release eval code
[ ] release full dataset
[√] release pre-process code
[ ] release pretrained weight
[ ] extract mesh and texture from network
You can decrease geo_num_pixels, vis_num_pixels, or mat_num_pixels if you run out of memory.
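For instance, such settings might look like the following in configs/default.yaml (a hypothetical excerpt; the actual section names and default values in the config may differ):

```yaml
# Hypothetical excerpt of configs/default.yaml -- halve these first on OOM
geo_num_pixels: 512
vis_num_pixels: 512
mat_num_pixels: 512
```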
Image | Instance mask
Appearance (500 iter/frame) | Surface Normal (500 iter/frame) | Rendering Error (500 iter/frame)
Diffuse (1000 iter/frame) | Roughness (1000 iter/frame) | Re-render (1000 iter/frame)
Part of our code is inherited from InvRender. We are grateful to the authors for releasing their code.
@inproceedings{cheng2023structure,
title={Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects},
author={Cheng, Tianhang and Ma, Wei-Chiu and Guan, Kaiyu and Torralba, Antonio and Wang, Shenlong},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023}
}