Finetuning #70

Open

RohanR04 opened this issue May 28, 2024 · 7 comments

@RohanR04

Is it possible to finetune VILA through hugging face with a custom image dataset? I don't see any documentation about this.

RohanR04 closed this as completed Jun 4, 2024
@ZackBradshaw

Did you ever figure this out? I've been trying to do this with xtuner and I'm having some issues.

@Lyken17
Collaborator

Lyken17 commented Jul 12, 2024

You can follow https://github.com/NVlabs/VILA?tab=readme-ov-file#step-3-supervised-fine-tuning to get started. We also plan to add some tutorial docs later. Could you list the datasets you are going to finetune with?

Lyken17 reopened this Jul 12, 2024
@ZackBradshaw

ZackBradshaw commented Jul 15, 2024

I've been working through the supervised fine-tuning instructions and I'm running into some issues getting this to work.
The data mixture expects dictionary entries and then resolves the data path from them; however, I'm not sure where it reads its values from (Hugging Face, perhaps?). My custom dataset is on-device, and I'm just trying to get it to read that path, but when I pass the path into the data mixture I get:

KeyError: './data/data.json'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 22981) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
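
A note on the KeyError above: the string passed as the data mixture is looked up by name in VILA's dataset registry rather than treated as a file path, so a raw path such as ./data/data.json has no matching entry. Below is a minimal sketch of registering a local dataset first; the Dataset/add_dataset helpers and their field names are assumptions based on the llava/data/datasets_mixture.py layout in the repo at the time, so verify them against that file before copying.

    # Sketch: register a local llava-style dataset so it can be selected
    # by name. The Dataset fields below are assumptions -- check the
    # actual dataclass in llava/data/datasets_mixture.py.
    from llava.data.datasets_mixture import Dataset, add_dataset

    my_custom_dataset = Dataset(
        dataset_name="my_custom_dataset",  # the name the data mixture looks up
        dataset_type="torch",
        data_path="./data/data.json",      # llava-style JSON on local disk
        image_path="./data/images",        # root folder for the "image" paths
    )
    add_dataset(my_custom_dataset)

With the dataset registered, the training script would take the registered name (my_custom_dataset) as the data mixture instead of the raw path.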

@Lyken17
Collaborator

Lyken17 commented Jul 17, 2024

Which data are you going to add?

@ZackBradshaw

ZackBradshaw commented Jul 17, 2024

I have a llava dataset following this format. Do I need to define the splits in the JSON or something?

    data = {
        "id": link,            # unique identifier for the sample
        "image": image_path,   # path to the image file on disk
        "conversations": [
            {
                "from": "human",
                "value": metadata,   # the prompt / instruction turn
            },
            {
                "from": "gpt",
                "value": response,   # the expected model answer
            },
        ],
    }

The dataset is a custom one in the format shown above.
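
For reference, a llava-style training file is a single JSON array of such entries, and as far as this format goes there are no named splits inside the file; training typically points at one JSON file per dataset. Here is a minimal sketch of writing such a file, with placeholder values invented purely for illustration:

    import json

    # One dict per training sample, in the format shown above; the
    # values here are placeholders for illustration only.
    entries = [
        {
            "id": "sample-0001",
            "image": "images/sample-0001.jpg",
            "conversations": [
                {"from": "human", "value": "<image>\nDescribe this image."},
                {"from": "gpt", "value": "A short ground-truth description."},
            ],
        },
    ]

    # The whole file is one JSON list, not one object per line.
    with open("./data/data.json", "w") as f:
        json.dump(entries, f, indent=2)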

@groccy

groccy commented Aug 19, 2024

Hello @Lyken17, thanks for your awesome work on VILA1.5! I'm wondering if you have any updates on the tutorial docs for finetuning? I was planning to finetune VILA on infrared videos, so the docs/tutorials would be truly helpful. Any comments on this would be much appreciated! Thanks a lot! :)

gheinrich pushed a commit to gheinrich/VILA that referenced this issue Dec 16, 2024
@olibartfast

Hi, I'm very new to the world of VLMs. Can I find an example of a multimodal dataset (e.g., image sequences + prompts + ground truth) for fine-tuning?
Also, can I use the Hugging Face SFT Trainer, as described in this guide?
Thanks!
