SINGAPO


SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects

Jiayi Liu, Denys Iliash, Angel X. Chang, Manolis Savva, Ali Mahdavi-Amiri

Preprint

Website | arXiv

(Teaser figure)

Environment Setup

We recommend using miniconda to manage the environment. The environment was tested on Ubuntu 20.04.4 LTS.

# Create a conda environment
conda create -n singapo python=3.10
conda activate singapo

# Install PyTorch
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia

# Install other packages
pip install -r requirement.txt

# Install PyTorch3D (for evaluation)
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

To use GPT-4o for graph extraction (during inference), you need to add your OpenAI API key by creating a .env file in the root directory of the project as follows:

# In the .env file
OPENAI_API_KEY=<YOUR_API_KEY>
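
To verify that the key is picked up, the one-liner below is a minimal sketch assuming the key is loaded from the environment via python-dotenv (the usual way a .env file is consumed; the exact loading code in this repo may differ):

# Sanity check (assumes python-dotenv): prints True if the key is found in .env
python -c "from dotenv import load_dotenv; import os; load_dotenv(); print(os.getenv('OPENAI_API_KEY') is not None)"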

Download Data

Data preprocessed from PartNet-Mobility dataset (train + eval)

We use data preprocessed from the PartNet-Mobility dataset for training, evaluation, and part retrieval. Please download the data here (~13GB); it is needed for the quick demo, evaluation, and training.

Our augmented data (for training only)

If you're interested in training our model from scratch, please also download our augmented data here (~76GB).

Data preprocessed from ACD dataset (for eval only)

If you'd like to run evaluation on the ACD dataset, you can download our preprocessed data here (~3GB).

File structure

The default directory for loading our data is ../data, at the same level as the project directory.

├── data
│   ├── StorageFurniture
│   ├── Table
│   │   ├── <model_id>
│   ├── ...
├── <project directory>
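
For example, placing the downloaded data could look like the sketch below (the archive name is a placeholder for the file you actually downloaded, and the archive is assumed to extract directly into the category folders):

# Sketch: put the preprocessed data next to the project directory (archive name is a placeholder)
mkdir -p ../data
unzip partnet_mobility_preprocessed.zip -d ../data
ls ../data   # should list category folders such as StorageFurniture, Table, ...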

For each object, the preprocessed data includes the following files:

<model_id>
├── imgs         # 20 renderings from random views (in the resting state)
├── features     # DinoV2 features and foreground masks on the patches
├── objs         # textured meshes for parts
├── plys         # part meshes for retrieval
├── object.json  # part hierarchy and articulation parameters

The preprocessing script for rendering, feature extraction, and mask computation can be found under scripts/preprocess.
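
If you want to inspect one of these annotation files, object.json is plain JSON; the sketch below only prints its top-level keys and does not assume a particular schema (<model_id> is a placeholder):

# Sketch: print the top-level keys of an object's annotation file (<model_id> is a placeholder)
python -c "import json; print(list(json.load(open('../data/Table/<model_id>/object.json')).keys()))"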

Download Checkpoints

You can download our pretrained model here (~40MB), extract it, and put it in the exps folder under the project directory.

<project directory>
├── exps
│   ├── singapo
│   │   ├── final

Usage

Quick Demo

We provide a quick demo that runs inference on an example input image located at demo/demo_input.png. The script takes the example image as input, predicts the part connectivity graph using GPT-4o, extracts image features using DinoV2, and generates an articulated object using our model. Please make sure that the model checkpoint and the preprocessed data (from PartNet-Mobility) are downloaded first.

# To run the whole package
python demo/demo.py

If you don't have an OpenAI API key yet, you can skip the graph prediction step by using the provided graph demo/example_graph.json, which was parsed from the GPT response.

# To skip the graph prediction using GPT-4o
python demo/demo.py --use_example_graph

If the script runs successfully, the output will be saved to demo/demo_output. By default, three objects are generated by initializing with different noise samples. For other configurations, please see the arguments in the script.

Evaluation

If you're interested in evaluating our model on the test set (see the data split in data/data_split.json for PartNet-Mobility, and in data/data_acd.json for the ACD dataset), you can run the test script as shown below.

# Evaluate on the test set (given GT graph, no object category label)
python test.py \
    --config exps/singapo/final/config/parsed.yaml \
    --ckpt exps/singapo/final/ckpts/last.ckpt \
    --label_free \
    --which_data pm

We also share the graph prediction results here so that you can run the evaluation using the graph predictions from GPT-4o as input. Once downloaded, put them under the exps directory, as shown in the following file structure.

<project directory>
├── exps
│   ├── predict_graph
│   │   ├── acd_test
│   │   ├── pm_test

To use these recorded graph predictions for evaluation, you need to specify the path to one of the prediction folders via --G_dir. For example:

# Evaluate on the test set (given predicted graph, no object category label)
python test.py \
    --config exps/singapo/final/config/parsed.yaml \
    --ckpt exps/singapo/final/ckpts/last.ckpt \
    --label_free \
    --which_data pm \
    --G_dir exps/predict_graph/pm_test

The evaluation is supported only on a single GPU; it was tested on an NVIDIA 3060 (12GB).
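
If your machine has more than one GPU, one way to pin the evaluation to a single device is the standard CUDA_VISIBLE_DEVICES environment variable (a generic CUDA mechanism, not a flag of test.py):

# Pin evaluation to GPU 0 using the standard CUDA_VISIBLE_DEVICES environment variable
CUDA_VISIBLE_DEVICES=0 python test.py \
    --config exps/singapo/final/config/parsed.yaml \
    --ckpt exps/singapo/final/ckpts/last.ckpt \
    --label_free \
    --which_data pm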

Training

To train our model from scratch, the preprocessed data from PartNet-Mobility (downloaded here) and our augmented data (downloaded here) are required.

We train our model on top of a CAGE model pretrained under our setting. This checkpoint can be downloaded here and should be placed under the pretrained folder by default.

<project directory>
├── pretrained
│   ├── cage_cfg.ckpt

Run the following command to train our model from scratch. The original model was trained on 4 NVIDIA A100s.

python train.py \
    --config configs/config.yaml \
    --pretrained_cage pretrained/cage_cfg.ckpt

Citation

@article{liu2024singapo,
  title={{SINGAPO}: Single Image Controlled Generation of Articulated Parts in Objects},
  author={Liu, Jiayi and Iliash, Denys and Chang, Angel X and Savva, Manolis and Mahdavi-Amiri, Ali},
  journal={arXiv preprint arXiv:2410.16499},
  year={2024}
}