
LLMI3D-LLaVA: A Reproduced Version of LLMI3D Based on LLaVA

This repository provides a reproduced implementation of LLMI3D, built upon the framework of LLaVA. The overall pipeline is illustrated below:

Pipeline

Please note that some implementation details differ from the original paper. Additionally, we have only conducted evaluation experiments on a mini-dataset. If you encounter any bugs or have questions, feel free to open an issue or reach out to me.


Installation

The dependencies are identical to those required by LLaVA. Follow the steps below to set up the environment:

  1. Clone this repository and enter its directory:

    git clone https://github.com/12e21/LLMI3D-LLaVA.git
    cd LLMI3D-LLaVA
  2. Install the necessary packages:

    conda create -n llmi3d python=3.10 -y
    conda activate llmi3d
    pip install --upgrade pip  # Enable PEP 660 support
    pip install -e .
    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation
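
After installation, a quick sanity check can confirm the environment. This is a minimal sketch that assumes the package installs under the llava module name, as in upstream LLaVA:

    # sanity_check.py -- minimal environment check (assumes upstream LLaVA's package layout)
    import torch
    import llava        # installed by `pip install -e .`
    import flash_attn   # installed by `pip install flash-attn`

    print("CUDA available:", torch.cuda.is_available())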

Generating the Dataset

  1. Download the SUNRGB-D dataset and place it in the data folder.

  2. Prepare the dataset using the SUNRGB-D data:

    python data_process/dataset_making/prepare_data.py

    You can modify the sample count and output path in the script:

    dataset_size = 100
    output_dir = Path('IG3D-SUNRGBD-mini')
  3. Generate descriptive phrases for each data sample:

    python data_process/dataset_making/gen_phrase.py

    Note: Replace the placeholders with your own MLLM platform API key and base URL (a sketch of the kind of request the script makes follows this list):

    client = OpenAI(
        api_key="Your-API-Key", 
        base_url="Your-API-Base-URL"
    )
  4. Convert the annotations into the conversation-style JSON format used by LLaVA (an example record is sketched after the directory tree below):

    python data_process/dataset_making/gen_llava_json.py
  5. Process annotations into the prediction format:

    python data_process/anno_process/anno_process_offline.py
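
Step 3 above calls an OpenAI-compatible chat API to generate the descriptive phrases. The sketch below shows the kind of request involved; the model name, prompt wording, and image path are illustrative assumptions rather than the exact contents of gen_phrase.py:

    # Illustrative request only; gen_phrase.py may build its prompt differently.
    import base64
    from openai import OpenAI

    client = OpenAI(api_key="Your-API-Key", base_url="Your-API-Base-URL")

    # Hypothetical sample image with its 2D box drawn on it.
    with open("IG3D-SUNRGBD-mini/rgb_2dbox/000001.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # replace with a vision-capable model on your platform
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Give a short descriptive phrase for the object in the marked box."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)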

The final dataset structure should look like this:

IG3D-SUNRGBD-mini
├── depth
├── rgb
├── rgb_2dbox
├── train_llava.json
├── train_llava_processed.json
├── train.pkl
└── train_with_caption.pkl
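
Among these files, train_llava.json (step 4) follows LLaVA's conversation format. The record below is only a sketch of that structure; the id, image path, and conversation text are placeholders rather than actual output of the scripts above:

    # Illustrative only: writes one placeholder record in LLaVA's conversation format.
    import json

    record = {
        "id": "sunrgbd_000001",          # hypothetical sample id
        "image": "rgb/000001.jpg",       # hypothetical image path
        "conversations": [
            {"from": "human",
             "value": "<image>\nDescribe the 3D box of the chair marked in the image."},
            {"from": "gpt",
             "value": "placeholder 3D answer produced by the annotation scripts"},
        ],
    }

    with open("IG3D-SUNRGBD-mini/train_llava.json", "w") as f:
        json.dump([record], f, indent=2)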

Training

  1. Download the pre-trained LLaVA and ViT weights from Hugging Face and place them in the weights folder.

  2. Fine-tune the model using LoRA:

    bash scripts/llmi3d/finetune_lora_py.sh
  3. Merge the LoRA weights:

    python scripts/merge_lora_weights.py --model-path "checkpoints/llava-v1.5-7b-task-lora" --model-base "weights/llava-v1.5-7b" --save-model-path "checkpoints/llava-v1.5-7b-task-merged"
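
To check that the merged checkpoint loads, the sketch below uses upstream LLaVA's model builder. It assumes this fork keeps the same loader API; the paths match the commands above:

    # Load check for the merged checkpoint (assumes upstream LLaVA's builder API).
    from llava.mm_utils import get_model_name_from_path
    from llava.model.builder import load_pretrained_model

    model_path = "checkpoints/llava-v1.5-7b-task-merged"
    tokenizer, model, image_processor, context_len = load_pretrained_model(
        model_path=model_path,
        model_base=None,  # merged weights do not need a separate base model
        model_name=get_model_name_from_path(model_path),
    )
    print("context length:", context_len)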

Testing

  1. Test on a single sample:

    python test/eval_llava.py
  2. Test on the entire dataset:

    python test/eval_dataset.py

    This step generates a results.json file containing the prediction results (a quick way to inspect it is sketched at the end of this section).

  3. Visualize the results:

    python anno_process/visualize_results_dataset.py

Example visualization:
example
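
The schema of results.json depends on test/eval_dataset.py, so the snippet below only inspects the file without assuming any particular field names:

    # Inspect results.json without assuming its schema.
    import json

    with open("results.json") as f:
        results = json.load(f)

    print("number of entries:", len(results))
    first = results[0] if isinstance(results, list) and results else results
    print("first entry:", list(first.keys()) if isinstance(first, dict) else first)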


Related Projects

  - LLaVA (https://github.com/haotian-liu/LLaVA), the visual instruction tuning framework this implementation is built on.
  - LLMI3D, the original method that this repository reproduces.
