M3DBench: Let's Instruct Large Models with
Multi-modal 3D Prompts

💡 Project Page • 📃 arXiv Paper • 🗂 Dataset • 🤗 Checkpoint • 📖 Citation


🏃 Intro M3DBench

M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts, covering a variety of fundamental abilities in real-world 3D environments.

🚩 News

  • [2024/09] Uploaded the code.
  • [2024/08] Released M3DBench to boost MLLMs' 3D perception, reasoning, and planning. See datasets.
  • [2024/07] 🎉 M3DBench is accepted by ECCV 2024!

TODO:

  • Upload training and evaluation scripts.
  • Release pre-trained checkpoints.
  • Scale up models.

⚡ Set up

Environment Setup

Step 1. Build Dependencies. Our code is tested with CUDA 12.2 and Python 3.8.19. We recommend creating a virtual environment (optional):

conda create -n m3dbench python=3.8
conda activate m3dbench

Next, you should install the following packages:

pip install h5py
pip install scipy
pip install cython
pip install plyfile
pip install trimesh==2.35.39
pip install networkx==2.2
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.44.2
pip install numpy==1.19.5
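
A quick way to confirm the pinned versions and the CUDA build were picked up correctly is a short check like the following (a minimal sanity-check sketch, not part of the official setup):

# sanity_check.py -- verify pinned package versions and CUDA availability
import numpy
import torch
import torchvision
import transformers

print("numpy:", numpy.__version__)                # expected 1.19.5
print("torch:", torch.__version__)                # expected 2.2.1+cu121
print("torchvision:", torchvision.__version__)    # expected 0.17.1
print("transformers:", transformers.__version__)  # expected 4.44.2
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))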

After that, build pointnet2 and the accelerated GIoU ops from source:

cd third_party/pointnet2
python setup.py install
cd ../../utils
python cython_compile.py build_ext --inplace
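
To verify the build succeeded, try importing the compiled ops (a minimal sketch; the module name pointnet2._ext is an assumption based on the build path and may differ in your checkout):

# check_build.py -- quick import check after building the extensions
try:
    import pointnet2._ext as _ext  # compiled CUDA ops from third_party/pointnet2 (assumed name)
    print("pointnet2 extension loaded from", _ext.__file__)
except ImportError as err:
    print("pointnet2 extension not importable:", err)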

Data and Pre-trained Weights Preparation

Step 1. Prepare the 3D Data and Language Annotations.

Please refer to the instructions available here to download the pre-processed 3D data and language annotations from M3DBench.

Step 2. Download Pre-trained weights.

You'll need to download the following pre-trained weights for the scene encoder, image encoder, shape encoder, and LLM:

  1. Scene Encoder
    We offer two types of 3D scene encoders:

  2. Image Encoder
    Download the openai/clip-vit-large-patch14-336 checkpoint (or another image encoder) from Hugging Face.

  3. Shape Encoder
    Download the pre-trained checkpoint from 3D-VisTA.

  4. LLM
    If your server doesn't support auto-downloading from Hugging Face, manually download the meta-llama/Llama-2-7b checkpoint (or another decoder-only LLM) from Hugging Face; a sketch of loading the downloaded checkpoints from local paths follows below.
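
If you download the weights manually, they can be loaded from local paths with the pinned transformers version. The snippet below is a minimal loading sketch with placeholder paths, not the project's actual model-assembly code:

# load_pretrained.py -- load the image encoder and LLM from local checkpoints
# (paths are placeholders; the repository's own configs control how the encoders are wired into the model)
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    CLIPImageProcessor,
    CLIPVisionModel,
)

clip_path = "/path/to/clip-vit-large-patch14-336"  # local copy of the CLIP checkpoint
llm_path = "/path/to/Llama-2-7b"                   # local Hugging Face-format Llama-2 checkpoint

image_encoder = CLIPVisionModel.from_pretrained(clip_path)
image_processor = CLIPImageProcessor.from_pretrained(clip_path)

tokenizer = AutoTokenizer.from_pretrained(llm_path)
llm = AutoModelForCausalLM.from_pretrained(llm_path)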

💻 Train your own model

Training
Evaluation

📖 Citation

If you find our work helpful, please consider giving us a star ⭐ and citing:

@misc{li2023m3dbench,
      title={M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts}, 
      author={Mingsheng Li and Xin Chen and Chi Zhang and Sijin Chen and Hongyuan Zhu and Fukun Yin and Gang Yu and Tao Chen},
      year={2023},
      eprint={2312.10763},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgments

Thanks to DepthContrast, LL3DA, CLIP, 3D-VisTA, OPT, and Llama 2. We borrow some of their codes and checkpoints.

License

This code is distributed under an MIT license. If there are any problems regarding our project, please open an issue.
