Multi-View Transformer for 3D Visual Grounding [CVPR 2022]
Please refer to the installation and data preparation instructions from referit3d.
We adopt bert-base-uncased from Hugging Face; the required transformers library can be installed with pip:

```bash
pip install transformers
```
You can download the pretrained weights from this page and put them into a folder, referred to below as PATH_OF_BERT.
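Alternatively, here is a minimal sketch for fetching the weights through the transformers API instead of the download page (this assumes network access to the Hugging Face hub; the target folder name is your choice and stands in for PATH_OF_BERT):

```python
# Sketch: download bert-base-uncased and save it locally so the folder
# can be passed as --bert-pretrain-path.
from transformers import BertModel, BertTokenizer

path_of_bert = "./bert-base-uncased"  # any local folder; used as PATH_OF_BERT
BertModel.from_pretrained("bert-base-uncased").save_pretrained(path_of_bert)
BertTokenizer.from_pretrained("bert-base-uncased").save_pretrained(path_of_bert)
```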
- To train on either the Nr3d or Sr3d dataset, use the following command:
```bash
python referit3d/scripts/train_referit3d.py \
    -scannet-file $PATH_OF_SCANNET_FILE$ \
    -referit3D-file $PATH_OF_REFERIT3D_FILE$ \
    --bert-pretrain-path $PATH_OF_BERT$ \
    --log-dir logs/MVT_nr3d \
    --n-workers 8 \
    --model 'referIt3DNet_transformer' \
    --unit-sphere-norm True \
    --batch-size 24 \
    --encoder-layer-num 3 \
    --decoder-layer-num 4 \
    --decoder-nhead-num 8 \
    --gpu "0" \
    --view_number 4 \
    --rotate_number 4 \
    --label-lang-sup True
```
- To train on Nr3d jointly with Sr3d, add the following argument:
```bash
--augment-with-sr3d sr3d_dataset_file.csv
```
- After each training epoch, the program automatically evaluates the current model. The code saves the latest model as last_model.pth and, following the original ReferIt3D repo, the best-performing model as best_model.pth (a quick way to inspect these checkpoints is sketched below).
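A minimal sketch for inspecting a saved checkpoint with plain PyTorch; the file location under --log-dir and the stored keys follow the original ReferIt3D checkpointing code, so the path and keys here are assumptions rather than a documented interface:

```python
# Sketch: peek inside a saved checkpoint to see what was stored.
# The exact path and dict layout depend on the ReferIt3D checkpointing
# code (assumed here, not documented in this README).
import torch

ckpt = torch.load("logs/MVT_nr3d/best_model.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. model weights, optimizer state, epoch
else:
    print(type(ckpt))
```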
- At test time, analyze_predictions runs following the original ReferIt3D code.
- analyze_predictions tests the model multiple times, each time with a different random seed; with different seeds, the point clouds sampled for each object differ. The average accuracy and standard deviation are reported (see the sketch below).
- To test on either the Nr3d or Sr3d dataset, use the following command:
```bash
python referit3d/scripts/train_referit3d.py \
    --mode evaluate \
    -scannet-file $PATH_OF_SCANNET_FILE$ \
    -referit3D-file $PATH_OF_REFERIT3D_FILE$ \
    --bert-pretrain-path $PATH_OF_BERT$ \
    --log-dir logs/MVT_nr3d \
    --resume-path $the_path_to_the_model.pth$ \
    --n-workers 8 \
    --model 'referIt3DNet_transformer' \
    --unit-sphere-norm True \
    --batch-size 24 \
    --encoder-layer-num 3 \
    --decoder-layer-num 4 \
    --decoder-nhead-num 8 \
    --gpu "0" \
    --view_number 4 \
    --rotate_number 4 \
    --label-lang-sup True
```
- To test a jointly trained model, add the following argument to the above command:
```bash
--augment-with-sr3d sr3d_dataset_file.csv
```
For the ScanRefer dataset, please refer to MVT_ScanRefer.
```bibtex
@inproceedings{huang2022multi,
  title={Multi-View Transformer for 3D Visual Grounding},
  author={Huang, Shijia and Chen, Yilun and Jia, Jiaya and Wang, Liwei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15524--15533},
  year={2022}
}
```
This project is built on top of the ReferIt3D repository.