
[Feature Request] Evaluation tools of the Few-shot VQA/Caption #5

Li-Qingyun opened this issue Mar 6, 2024 · 6 comments

Li-Qingyun commented Mar 6, 2024

Hi, I'm interested in your great work.

The ./scripts/v1_5/eval/eval_all.sh script is not available at the moment. Could you release the evaluation tools, especially those for few-shot VQA/Caption?

It would also be great if the mmc4 pretrained weights could be made available.

The dataset_mixture for new_vflan_sharegpt4v_sft is also not available.

Thank you very much!

Li-Qingyun (Author) commented:

Thanks for the authors' support.
I found that ./scripts/v1_5/eval/eval_all.sh is now available.

The evaluation tools for few-shot VQA/Caption are also essential for researchers following this work. Looking forward to the release of this part.

Thank you very much!

Lyken17 (Collaborator) commented Mar 7, 2024

Hi Qingyun,

Which evaluation scripts are you looking for, for VQA and captioning? The current eval_all.sh should cover all the metrics in the paper.

Li-Qingyun (Author) commented Mar 7, 2024

> Hi Qingyun,
>
> Which evaluation scripts are you looking for, for VQA and captioning? The current eval_all.sh should cover all the metrics in the paper.

@Lyken17

Thanks for your reply! I'm looking for the few-shot OKVQA/TextVQA/CocoCaption/FlickrCaption evaluations used in the ablation study of Tables 1/3. 🙏🙏
Best regards.

Li-Qingyun (Author) commented:

@Lyken17 I'm writing to request the evaluation tools for few-shot VQA/Caption (specifically, the 4-shot OKVQA/TextVQA/CocoCaption/FlickrCaption evaluations in the ablation study of VILA Tables 1/3).

The experimental results in the paper show that, when used for pre-training LLaVA-like MLLMs, interleaved image-text data (MMC4) achieves better few-shot VQA/Caption results than image-text pair data (COYO/LAION, ...). I tried to evaluate the few-shot VQA scores of the open-source VILA-7B weights, but I did not reach the same conclusion.

OKVQA:   0-shot 61.05 | 1-shot 56.93 | 2-shot 56.84 | 4-shot 56.47
TextVQA: 0-shot 62.64 | 1-shot 60.73 | 2-shot 60.45 | 4-shot 60.88
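
For reference, my setup follows the usual Flamingo-style in-context format, i.e. prepending n (image, question, answer) exemplars before the query. The sketch below is only illustrative: the placeholder token, answer template, and exemplar sampling are my own assumptions and may differ from the protocol used in the paper.

```python
import random

def build_few_shot_vqa_prompt(query_image, query_question, support_set, n_shot=4, seed=0):
    """Build a Flamingo-style few-shot VQA prompt.

    support_set: list of dicts with keys "image", "question", "answer".
    "<image>" is assumed to be the per-image placeholder the model expects;
    the actual placeholder and template in VILA may differ.
    """
    rng = random.Random(seed)
    shots = rng.sample(support_set, n_shot) if n_shot > 0 else []

    images, segments = [], []
    for shot in shots:
        images.append(shot["image"])
        segments.append(
            f"<image>\nQuestion: {shot['question']}\nShort answer: {shot['answer']}"
        )

    # The query example comes last, with the answer left blank for generation.
    images.append(query_image)
    segments.append(f"<image>\nQuestion: {query_question}\nShort answer:")

    return "\n\n".join(segments), images
```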

I realize that my implementation may not be valid for measuring the few-shot performance, so I hope you will consider releasing the evaluation tool, since you seem to be the main contributor to this open-source repository.
It would be a great help to my research, and I would be very grateful.

Best regards,
Qingyun.

Details of my implementation have been sent to your email.

Li-Qingyun changed the title from "[Feature Request] ./scripts/v1_5/eval/eval_all.sh is not availiable" to "[Feature Request] Evaluation tools of the Few-shot VQA/Caption" on Mar 16, 2024
Lyken17 (Collaborator) commented Mar 21, 2024

cc @kentang-mit and @Seerkfang, who are more familiar with the evaluation scripts.

Li-Qingyun (Author) commented:

> cc @kentang-mit and @Seerkfang, who are more familiar with the evaluation scripts.

@Lyken17 Okay, thanks for your reply!

Dear @kentang-mit and @Seerkfang:

Could you please share the few-shot evaluation scripts?

It would be a great help to my research, and I would be very grateful.

In the few-shot VQA/Caption results of the VILA paper, the improvement from interleaved image-text pre-training, in contrast to the decline seen with image-text pair pre-training, is an essential reason for VILA to add stage 2. Stage 2 seems to give the SFT model better few-shot learning ability, which can also serve as a rebuttal to the point raised in #12.

gheinrich pushed a commit to gheinrich/VILA that referenced this issue Dec 16, 2024