[Feature Request] Evaluation tools for few-shot VQA/Caption #5
Thanks for the authors' support. The evaluation tools for few-shot VQA/Caption are also essential for researchers following this work. Looking forward to the release of this part. Thank you very much!
Hi Qingyun, which evaluation scripts are you looking for, for VQA and caption?
Thanks for your reply! I'm looking for the few-shot OKVQA/TextVQA/CocoCaption/FlickrCaption evaluations from the ablation study in Tables 1/3. 🙏🙏
@Lyken17 I'm writing to request evaluation tools for few-shot VQA/Caption (specifically, the 4-shot OKVQA/TextVQA/CocoCaption/FlickrCaption results in the ablation study of VILA Tables 1/3). The experiments there validated that, when used to pre-train LLaVA-like MLLMs, interleaved image-text data (MMC4) achieves better few-shot VQA/Caption results than image-text pair data (COYO/LAION...). I tried to evaluate the few-shot VQA scores of the open-source VILA-7B weights, but I did not reach the same conclusion.

[okvqa results screenshot]

I realize that my implementation may not be valid for measuring few-shot performance, so I hope you will consider releasing the evaluation tools, since you seem to have become the main contributor to this open-source repository. Best regards. Details of my implementation have been sent to your email.
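For reference, a minimal sketch of the kind of 4-shot VQA evaluation described above, assuming a generic interleaved image-text interface. The `<image>` placeholder token, the prompt template, and all function names here are hypothetical illustrations, not the VILA repo's actual implementation:

```python
import random

def build_few_shot_prompt(support_set, query, num_shots=4):
    """Concatenate num_shots in-context (image, question, answer) examples,
    then append the query image and question with the answer left blank."""
    shots = random.sample(support_set, num_shots)
    segments = [
        f"<image>\nQuestion: {ex['question']} Short answer: {ex['answer']}\n"
        for ex in shots
    ]
    segments.append(f"<image>\nQuestion: {query['question']} Short answer:")
    images = [ex["image"] for ex in shots] + [query["image"]]
    return "".join(segments), images

def vqa_accuracy(prediction, gt_answers):
    """Standard VQA accuracy: a prediction is credited in proportion to how
    many of the 10 annotator answers it matches, capped at 3 matches."""
    pred = prediction.strip().lower()
    matches = sum(pred == a.strip().lower() for a in gt_answers)
    return min(matches / 3.0, 1.0)
```

Subtle choices such as how in-context examples are sampled, the exact prompt template, and the answer post-processing can all shift few-shot scores noticeably, which is why the official scripts matter here.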
`./scripts/v1_5/eval/eval_all.sh` is not available.
cc @kentang-mit and @Seerkfang, who are more familiar with the evaluation scripts.
@Lyken17 Dear @kentang-mit and @Seerkfang: could you please share the few-shot evaluation scripts? It would be a great help to my research and I would be very grateful. In the few-shot VQA/Caption results of the VILA paper, compared to the decline caused by image-text pair pre-training, the improvement from interleaved image-text pre-training is an essential reason for VILA to add stage 2. Stage 2 seems to give the SFT model better few-shot learning performance, which can also serve as a rebuttal to the point raised in #12.
fix dataset path
Hi, I'm interested in your great work.
`./scripts/v1_5/eval/eval_all.sh` is not available now. Could you release the evaluation tools, especially for few-shot VQA/Caption? We also hope the MMC4-pretrained weights can be made available.
The dataset_mixture `new_vflan_sharegpt4v_sft` is also not available.
Thank you very much!
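While waiting for the official scripts, here is a hedged sketch of how the caption side might be scored with the standard COCO caption metrics package (`pycocoevalcap`). The data layout is illustrative only; this is not the repo's own `eval_all.sh`, and the official pipeline additionally tokenizes with `PTBTokenizer` before scoring:

```python
# pip install pycocoevalcap
from pycocoevalcap.cider.cider import Cider

def cider_score(predictions, references):
    """predictions: {image_id: generated caption}
    references:  {image_id: [reference captions]}"""
    res = {k: [v] for k, v in predictions.items()}   # Cider expects lists
    gts = {k: list(v) for k, v in references.items()}
    corpus_score, per_image = Cider().compute_score(gts, res)
    return corpus_score, per_image

# Illustrative usage with toy data (real evaluation runs over the
# whole val split, since CIDEr's IDF weights are corpus-level):
preds = {1: "a dog running on the grass"}
refs = {1: ["a dog runs across a grassy field", "a brown dog on grass"]}
corpus_cider, _ = cider_score(preds, refs)
print(f"CIDEr: {corpus_cider:.3f}")
```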