PyTorch implementation for training diffusion models to iteratively de-noise the pose of an object point cloud so that it satisfies a geometric relationship with a scene point cloud.
This is the reference implementation for our paper:
Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement
Anthony Simeonov, Ankit Goyal*, Lucas Manuelli*, Lin Yen-Chen, Alina Sarmiento, Alberto Rodriguez, Pulkit Agrawal**, Dieter Fox**
git clone --recurse git@github.com:anthonysimeonov/rpdiff.git
cd rpdiff
# make virtual environment
# conda below
conda create -n rpdiff-env python=3.8
conda activate rpdiff-env
# virtualenv/venv below
mkdir .envs
virtualenv -p `which python3.8` .envs/rpdiff-env
# python3.8 -m venv .envs/rpdiff-env
source .envs/rpdiff-env/bin/activate
# numpy (specific version), cython for compiled functions, and airobot for pybullet wrappers
pip install -r base_requirements.txt
# other packages + our repo (see pytorch install below for final installs)
pip install -e .
# install pytorch (see PyTorch website - we use v1.13.1 with CUDA 11.7, install command below)
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
Some post installation steps:
Install the torch-scatter/torch-cluster packages
# If torch version 1.13
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.0+cu117.html --no-index
pip install torch-cluster -f https://data.pyg.org/whl/torch-1.13.0+cu117.html --no-index
# If torch version 1.12 (below)
# pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.0+cu113.html --no-index
# pip install torch-cluster -f https://data.pyg.org/whl/torch-1.12.0+cu113.html --no-index
Install knn_cuda utils
pip install https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
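To quickly confirm that the CUDA-enabled PyTorch build and the compiled extensions installed correctly, an optional sanity check like the short Python snippet below can be run (this is not part of the repo, just a convenience):
# optional sanity check for the installs above
import torch
import torch_scatter
import torch_cluster
# from knn_cuda import KNN  # also worth checking; may require ninja on the system
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())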
Source the setup script to set environment variables (from the repo root directory -- this must be done in each terminal; consider adding it to your .bashrc/.zshrc):
source rpdiff_env.sh
For training data generation/eval/debugging, set up the meshcat visualizer in the background (i.e., run it in a background tmux terminal):
meshcat-server
Port 7000 needs to be forwarded (by default).
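If you want to send geometry to the visualizer manually (e.g., to debug the setup outside of the provided scripts), a minimal sketch with the meshcat Python client looks like the following; it assumes meshcat-server is running with its default ports (web viewer on 7000 as noted above, ZMQ socket on 6000):
# minimal sketch: connect to the running meshcat-server and display a dummy point cloud
import numpy as np
import meshcat
import meshcat.geometry as mcg

vis = meshcat.Visualizer(zmq_url="tcp://127.0.0.1:6000")  # default ZMQ port of meshcat-server
pts = np.random.rand(3, 1000).astype(np.float32)  # meshcat expects a 3 x N array
vis["debug/point_cloud"].set_object(mcg.PointCloud(pts, color=pts))
# then open http://localhost:7000/static/ in a browser to view the scene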
Download the necessary .obj files, pre-trained model weights, and procedurally-generated demonstration data
# separately (with separate download scripts)
bash scripts/dl_model_weights.bash # small download
bash scripts/dl_objects.bash # small download
bash scripts/dl_train_data.bash # large download (~80GB), slower
# all together (slower)
bash scripts/dl_rpdiff_all.bash
Config files for evaluation are found in the full_eval_cfgs directory inside the config folder.
For book/bookshelf
cd src/rpdiff/eval
python evaluate_rpdiff.py -c book_on_bookshelf/book_on_bookshelf_withsc.yaml
For mug/rack-multi
cd src/rpdiff/eval
python evaluate_rpdiff.py -c mug_on_rack_multi/mug_on_rack_multi_withsc.yaml
For can/cabinet
cd src/rpdiff/eval
python evaluate_rpdiff.py -c can_on_cabinet/can_on_cabinet_withsc.yaml
There are config files for evaluating the system both with and without the additional success classifier model.
Config files for training are found in the train_cfgs directory inside the config folder.
For book/bookshelf
cd src/rpdiff/training
python train_full.py -c book_on_bookshelf_cfgs/book_on_bookshelf_pose_diff_with_varying_crop_fixed_noise_var.yaml
For mug/rack-multi
cd src/rpdiff/training
python train_full.py -c mug_on_rack_multi_cfgs/mug_on_rack_multi_pose_diff_with_varying_crop_fixed_noise_var.yaml
For can/cabinet
cd src/rpdiff/training
python train_full.py -c can_on_cabinet_cfgs/can_on_cabinet_pose_diff_with_varying_crop_fixed_noise_var.yaml
To train the additional success classifier:
For book/bookshelf
cd src/rpdiff/training
python train_full.py -c book_on_bookshelf_cfgs/book_on_bookshelf_succ_cls.yaml
For mug/rack-multi
cd src/rpdiff/training
python train_full.py -c mug_on_rack_multi_cfgs/mug_on_rack_multi_succ_cls.yaml
For can/cabinet
cd src/rpdiff/training
python train_full.py -c can_on_cabinet_cfgs/can_on_cabinet_succ_cls.yaml
Config files for data generation are found in the full_demo_cfgs directory inside the config folder.
For book/bookshelf
cd src/rpdiff/data_gen
python object_scene_procgen_demos.py -c book_on_bookshelf/bookshelf_double_view.yaml
For mug/rack-multi
cd src/rpdiff/data_gen
python object_scene_procgen_demos.py -c mug_on_rack/mug_on_rack_multi.yaml
For can/cabinet
cd src/rpdiff/data_gen
python object_scene_procgen_demos.py -c can_on_cabinet/can_cabinet.yaml
Post-processing
These scripts are unfortunately not implemented to be easily run in parallel across many workers. Instead, we rely on the less elegant way of scaling up data generation by running the script in separate terminal windows with different seeds (which can be specified with the -s $SEED flag; see the sketch below). This creates multiple separate folders with the same root dataset name and different _seed$SEED suffixes. Before training, these separate folders must be merged together. We also create training splits (even though the objects are already split in the data generation script) and provide utilities to combine the separate .npz files that are saved together into larger "chunked" files that are a little easier on shared NFS filesystems (for file I/O during training).
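As a convenience, something like the hypothetical launcher below (not included in the repo) can start several data-generation workers with different seeds from src/rpdiff/data_gen; the config shown is just the book/bookshelf example from above:
# hypothetical launcher (not part of the repo): run several data-gen workers in parallel,
# each with a different seed, from src/rpdiff/data_gen
import subprocess

config = "book_on_bookshelf/bookshelf_double_view.yaml"  # any of the data-gen configs above
procs = []
for seed in range(4):  # number of parallel workers
    cmd = ["python", "object_scene_procgen_demos.py", "-c", config, "-s", str(seed)]
    procs.append(subprocess.Popen(cmd))
for p in procs:
    p.wait()  # block until all workers finish
After the workers finish, the resulting _seed$SEED folders still need to be merged and chunked, as described next.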
These post-processing steps are all packaged in the post_process_demos.py script, which takes a --dataset_dir flag that should use the same name as the experiment_name parameter in the config file used for data gen. For example, if we generate demos with the name bookshelf_demos, the resulting folders in the directory where the data is saved (data/task_demos by default) will look like:
bookshelf_demos_seed0
bookshelf_demos_seed1
...
To combine these into a single bookshelf_demos folder, we then run
python post_process_demos.py --dataset_dir bookshelf_demos
Config files for training, eval, and data generation are all found in the config folder. These consist of .yaml files that inherit from a base base.yaml file. Utility functions for config files can be found in the config_util.py file. Configuration parameters are loaded as nested dictionaries, which can be accessed using either standard value = dict['key'] syntax or value = dict.key syntax.
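As a rough illustration of that access pattern (the actual implementation in config_util.py may differ), a minimal attribute-accessible dictionary looks like this, with made-up keys:
# minimal illustration of the nested config access pattern; keys are illustrative only
class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes."""
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as exc:
            raise AttributeError(key) from exc

cfg = AttrDict({"optimizer": AttrDict({"lr": 1e-4, "batch_size": 16})})
assert cfg["optimizer"]["lr"] == cfg.optimizer.lr  # both syntaxes return the same value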
The full rearrangement prediction pipeline is implemented in the multistep_pose_regression.py file. This uses the trained models and the observed point clouds to iteratively update the pose of the object point cloud, while tracking the overall composed transform applied so far, until the final full transformation to execute is returned.
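Conceptually, the loop looks roughly like the sketch below; the function name and model interface here are illustrative assumptions, not the actual code in multistep_pose_regression.py:
# conceptual sketch of the iterative pose de-noising loop (interfaces are assumed, not the repo's)
import numpy as np

def iterative_pose_refinement(model, obj_pcd, scene_pcd, n_steps=50):
    total_tf = np.eye(4)  # composed object transform accumulated so far
    for step in range(n_steps):
        # the model predicts an incremental de-noising transform for the object
        # point cloud in its current pose, relative to the scene point cloud
        delta_tf = model.predict_transform(obj_pcd, scene_pcd, step)  # assumed to return a 4x4 matrix
        # apply the increment to the object point cloud (homogeneous coordinates)
        obj_h = np.concatenate([obj_pcd, np.ones((obj_pcd.shape[0], 1))], axis=1)
        obj_pcd = (delta_tf @ obj_h.T).T[:, :3]
        # track the overall composed transform
        total_tf = delta_tf @ total_tf
    return total_tf  # final full transform to execute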
When generating demonstration data, we give the set of demonstrations a name and post-process the demos into chunked files. During training, we must provide the path to the demos to load in the config file, and specify if these are the chunked demos or the original un-chunked demos. While training, model weights will be saved in the model_weights/rpdiff folder. When we evaluate the system, we similarly must specify the path to the model weights in the corresponding config file loaded when running the eval script.
If you find our paper or this code useful in your work, please cite our paper:
@article{simeonov2023rpdiff,
author = {Simeonov, Anthony
and Goyal, Ankit
and Manuelli, Lucas
and Yen-Chen, Lin
and Sarmiento, Alina
and Rodriguez, Alberto
and Agrawal, Pulkit
and Fox, Dieter},
title = {Shelving, Stacking, Hanging: Relational
Pose Diffusion for Multi-modal Rearrangement},
journal={arXiv preprint arXiv:2307.04751},
year={2023}
}
Parts of this code were built upon implementations found in the Relational NDF repo, the Neural Shape Mating repo, and the Convolutional Occupancy Networks repo. Check out their projects as well!