
LLMI3D-LLaVA: A Reproduced Version of LLMI3D Based on LLaVA

This repository provides a reproduced implementation of LLMI3D, built upon the framework of LLaVA. The overall pipeline is illustrated below:

Pipeline

Please note that some implementation details differ from the original paper. Additionally, we have only conducted evaluation experiments on a mini-dataset. If you encounter any bugs or have questions, feel free to open an issue or reach out to me.


Installation

The dependencies are identical to those required by LLaVA. Follow the steps below to set up the environment:

  1. Clone this repository and enter its directory:

    git clone https://github.com/12e21/LLMI3D-LLaVA.git
    cd LLMI3D-LLaVA
  2. Install the necessary packages:

    conda create -n llmi3d python=3.10 -y
    conda activate llmi3d
    pip install --upgrade pip  # Enable PEP 660 support
    pip install -e .
    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation
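
After installation, a quick sanity check can confirm the environment. This is a minimal sketch that assumes the package installs under the llava module name, as in upstream LLaVA:

    # sanity_check.py -- minimal environment check (assumes upstream LLaVA's package layout)
    import torch
    import llava        # installed by `pip install -e .`
    import flash_attn   # installed by `pip install flash-attn`

    print("CUDA available:", torch.cuda.is_available())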

Generating the Dataset

  1. Download the SUNRGB-D dataset and place it in the data folder.

  2. Prepare the dataset using the SUNRGB-D data:

    python data_process/dataset_making/prepare_data.py

    You can modify the sample count and output path in the script:

    dataset_size = 100
    output_dir = Path('IG3D-SUNRGBD-mini')
  3. Generate descriptive phrases for each data sample:

    python data_process/dataset_making/gen_phrase.py

    Note: Replace the placeholders with your own MLLM platform API key and base URL (a sketch of the kind of request the script makes follows this list):

    client = OpenAI(
        api_key="Your-API-Key", 
        base_url="Your-API-Base-URL"
    )
  4. Convert the annotations into the conversation-style JSON format used by LLaVA (an example record is sketched after the directory tree below):

    python data_process/dataset_making/gen_llava_json.py
  5. Process annotations into the prediction format:

    python data_process/anno_process/anno_process_offline.py
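
Step 3 above calls an OpenAI-compatible chat API to generate the descriptive phrases. The sketch below shows the kind of request involved; the model name, prompt wording, and image path are illustrative assumptions rather than the exact contents of gen_phrase.py:

    # Illustrative request only; gen_phrase.py may build its prompt differently.
    import base64
    from openai import OpenAI

    client = OpenAI(api_key="Your-API-Key", base_url="Your-API-Base-URL")

    # Hypothetical sample image with its 2D box drawn on it.
    with open("IG3D-SUNRGBD-mini/rgb_2dbox/000001.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # replace with a vision-capable model on your platform
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Give a short descriptive phrase for the object in the marked box."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)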

The final dataset structure should look like this:

IG3D-SUNRGBD-mini
├── depth
├── rgb
├── rgb_2dbox
├── train_llava.json
├── train_llava_processed.json
├── train.pkl
└── train_with_caption.pkl
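
Among these files, train_llava.json (step 4) follows LLaVA's conversation format. The record below is only a sketch of that structure; the id, image path, and conversation text are placeholders rather than actual output of the scripts above:

    # Illustrative only: writes one placeholder record in LLaVA's conversation format.
    import json

    record = {
        "id": "sunrgbd_000001",          # hypothetical sample id
        "image": "rgb/000001.jpg",       # hypothetical image path
        "conversations": [
            {"from": "human",
             "value": "<image>\nDescribe the 3D box of the chair marked in the image."},
            {"from": "gpt",
             "value": "placeholder 3D answer produced by the annotation scripts"},
        ],
    }

    with open("IG3D-SUNRGBD-mini/train_llava.json", "w") as f:
        json.dump([record], f, indent=2)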

Training

  1. Download the pre-trained LLaVA and ViT weights from Hugging Face and place them in the weights folder.

  2. Fine-tune the model using LoRA:

    bash scripts/llmi3d/finetune_lora_py.sh
  3. Merge the LoRA weights:

    python scripts/merge_lora_weights.py --model-path "checkpoints/llava-v1.5-7b-task-lora" --model-base "weights/llava-v1.5-7b" --save-model-path "checkpoints/llava-v1.5-7b-task-merged"
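
To check that the merged checkpoint loads, the sketch below uses upstream LLaVA's model builder. It assumes this fork keeps the same loader API; the paths match the commands above:

    # Load check for the merged checkpoint (assumes upstream LLaVA's builder API).
    from llava.mm_utils import get_model_name_from_path
    from llava.model.builder import load_pretrained_model

    model_path = "checkpoints/llava-v1.5-7b-task-merged"
    tokenizer, model, image_processor, context_len = load_pretrained_model(
        model_path=model_path,
        model_base=None,  # merged weights do not need a separate base model
        model_name=get_model_name_from_path(model_path),
    )
    print("context length:", context_len)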

Testing

  1. Test on a single sample:

    python test/eval_llava.py
  2. Test on the entire dataset:

    python test/eval_dataset.py

    This step generates a results.json file containing the prediction results (a quick way to inspect it is sketched at the end of this section).

  3. Visualize the results:

    python anno_process/visualize_results_dataset.py

Example visualization:
example
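
The schema of results.json depends on test/eval_dataset.py, so the snippet below only inspects the file without assuming any particular field names:

    # Inspect results.json without assuming its schema.
    import json

    with open("results.json") as f:
        results = json.load(f)

    print("number of entries:", len(results))
    first = results[0] if isinstance(results, list) and results else results
    print("first entry:", list(first.keys()) if isinstance(first, dict) else first)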


Related Projects

  - LLaVA (https://github.com/haotian-liu/LLaVA), the visual instruction tuning framework this implementation is built on.
  - LLMI3D, the original method that this repository reproduces.
