This repository provides a reproduced implementation of LLMI3D, built upon the framework of LLaVA. The overall pipeline is illustrated below:
Please note that some implementation details differ from the original paper. Additionally, we have only conducted evaluation experiments on a mini-dataset. If you encounter any bugs or have questions, feel free to open an issue or reach out to me.
The dependencies are identical to those required by LLaVA. Follow the steps below to set up the environment:
- Clone this repository:

  ```bash
  git clone https://github.com/12e21/LLMI3D-LLaVA.git
  ```
- Install the necessary packages:

  ```bash
  conda create -n llmi3d python=3.10 -y
  conda activate llmi3d
  pip install --upgrade pip  # Enable PEP 660 support
  pip install -e .
  pip install -e ".[train]"
  pip install flash-attn --no-build-isolation
  ```
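  Optionally, you can verify the environment before moving on. This is a minimal sanity check (not part of the repository; it assumes the packages above installed successfully):

  ```python
  # Quick environment check (illustrative, not part of this repo).
  import torch
  import flash_attn  # raises ImportError if the flash-attn build failed

  print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
  ```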
- Download the SUNRGB-D dataset and place it in the `data` folder.
- Prepare the dataset using the SUNRGB-D data:

  ```bash
  python data_process/dataset_making/prepare_data.py
  ```

  You can modify the sample count and output path in the script:

  ```python
  dataset_size = 100
  output_dir = Path('IG3D-SUNRGBD-mini')
  ```
- Generate descriptive phrases for each data sample:

  ```bash
  python data_process/dataset_making/gen_phrase.py
  ```

  Note: replace the placeholders in the script with your own MLLM platform API key and base URL:

  ```python
  client = OpenAI(
      api_key="Your-API-Key",
      base_url="Your-API-Base-URL",
  )
  ```
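  Under the hood this is a standard chat-completion request. The sketch below is illustrative only; the model name and prompt are assumptions, not the exact ones used in `gen_phrase.py`:

  ```python
  from openai import OpenAI

  client = OpenAI(api_key="Your-API-Key", base_url="Your-API-Base-URL")

  # Hypothetical prompt and model name; gen_phrase.py may use different ones.
  response = client.chat.completions.create(
      model="gpt-4o",
      messages=[{
          "role": "user",
          "content": "Describe the chair near the window in one short referring phrase.",
      }],
  )
  print(response.choices[0].message.content)
  ```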
- Convert annotations into the JSON format used by LLaVA:

  ```bash
  python data_process/dataset_making/gen_llava_json.py
  ```
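  Each record in the resulting JSON follows LLaVA's conversation schema; the values below are invented for illustration:

  ```python
  import json

  # Illustrative record in LLaVA's training-JSON schema; paths and text are made up.
  record = {
      "id": "sunrgbd_000001",
      "image": "rgb/000001.jpg",
      "conversations": [
          {"from": "human", "value": "<image>\nGive the 3D bounding box of the chair near the window."},
          {"from": "gpt", "value": "..."},  # target answer filled in by the processing scripts
      ],
  }
  print(json.dumps(record, indent=2))
  ```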
- Process annotations into the prediction format:

  ```bash
  python data_process/anno_process/anno_process_offline.py
  ```
The final dataset structure should look like this:

```
IG3D-SUNRGBD-mini
├── depth
├── rgb
├── rgb_2dbox
├── train_llava.json
├── train_llava_processed.json
├── train.pkl
└── train_with_caption.pkl
```
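If you want to sanity-check the intermediate files, the pickles can be inspected directly. This is an illustrative snippet; the layout of each entry is defined by the processing scripts, so treat the printed fields as repository-specific:

```python
import pickle
from pathlib import Path

# Inspect the prepared annotations; the per-entry fields depend on prepare_data.py.
with Path("IG3D-SUNRGBD-mini/train.pkl").open("rb") as f:
    samples = pickle.load(f)

print(f"{len(samples)} samples")
print(samples[0])  # one raw annotation entry
```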
- Download the pre-trained models for LLaVA and ViT from Hugging Face and place them in the `weights` folder.
- Fine-tune the model using LoRA:

  ```bash
  bash scripts/llmi3d/finetune_lora_py.sh
  ```
- Merge the LoRA weights:

  ```bash
  python scripts/merge_lora_weights.py \
      --model-path "checkpoints/llava-v1.5-7b-task-lora" \
      --model-base "weights/llava-v1.5-7b" \
      --save-model-path "checkpoints/llava-v1.5-7b-task-merged"
  ```
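  The merged checkpoint can then be loaded like any other LLaVA model. A minimal loading sketch using LLaVA's own helpers (the evaluation scripts below handle this for you):

  ```python
  from llava.mm_utils import get_model_name_from_path
  from llava.model.builder import load_pretrained_model

  model_path = "checkpoints/llava-v1.5-7b-task-merged"
  # model_base is None because the LoRA weights are already merged.
  tokenizer, model, image_processor, context_len = load_pretrained_model(
      model_path, None, get_model_name_from_path(model_path)
  )
  ```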
- Test on a single sample:

  ```bash
  python test/eval_llava.py
  ```
- Test on the entire dataset:

  ```bash
  python test/eval_dataset.py
  ```

  This step will generate a `results.json` file containing the prediction results.
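  A quick way to inspect the output (illustrative; it assumes the file is saved as a JSON list, and the fields of each entry are defined by `eval_dataset.py`):

  ```python
  import json

  with open("results.json") as f:
      results = json.load(f)

  print(f"{len(results)} predictions")
  print(results[0])  # one prediction entry
  ```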
- Visualize the results:

  ```bash
  python anno_process/visualize_results_dataset.py
  ```

