Compare with Qwen2-VL #2
Comments
Great issue! Sorry for overlooking Qwen2-VL. However, if you want to know the performance of Qwen2-VL-7B on our reasoning benchmark now, I can give you the answer: 65.85. Our LLaVA-o1 (Llama-3.2-11B-Vision) scores 65.8. To demonstrate this, we will publish a model trained on Qwen2-VL soon.
Great job! I would like to ask how the LLaVA-o1 model and the o1-style model based on Qwen2-VL perform on Chinese language-image reasoning/understanding tasks? Thanks in advance!
+1 for a comparison with Qwen2-VL
Thanks for your interest! @zhangfaen
Hi @XuGW-Kevin
Yes, we've trained such a model on Qwen2-VL-7B-Instruct and tested it on a few benchmarks. We found that while performance improves on some benchmarks, it degrades on others, and overall performance doesn't improve much. I suspect that Qwen actually used the training questions from AI2D, ScienceQA, etc., so further finetuning Qwen causes the model to overfit to those questions. I'm not sure whether this suspicion is reasonable; perhaps finetuning the base Qwen model instead would help with this issue. I'm also happy to hear any ideas you have!