Classification evaluation for LLaVA #4
Comments
Hi, thanks for asking. We demonstrate zero-shot classification only for the CLIP models on their own, and consider LLaVA and OpenFlamingo for captioning/VQA tasks.
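For context, zero-shot classification with a CLIP model follows the usual recipe: embed the image and one text prompt per class, then pick the class with the highest cosine similarity. A rough sketch with open_clip (model choice, prompts, and class names are placeholders, not the exact evaluation setup):

```python
import torch
from PIL import Image
import open_clip

# Placeholder backbone/weights; any open_clip model works the same way
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

class_names = ["cat", "dog", "car"]  # hypothetical classes
texts = tokenizer([f"a photo of a {c}" for c in class_names])
image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # hypothetical image path

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(texts)
    # Normalize and compare image embedding against each class prompt
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(class_names[probs.argmax(dim=-1).item()])
```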
Thank you for the clarification. I have another question: why is the batch size hardcoded to 1? Is it just to avoid padding text tokens, or am I missing something?
You're right, it should definitely be possible to run with larger batch sizes. It's just hardcoded to batch_size 1 in a few places, since we couldn't fit much more on our devices anyway for adversarial evaluations.
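For anyone who wants to lift that restriction, batching variable-length prompts mainly requires a pad token and left padding, so that generation continues from the end of each prompt. A rough sketch in HuggingFace terms (the model name and prompts are placeholders, not the repository's actual code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder text-only model, just to show the padding mechanics
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Decoder-only models should be left-padded for batched generation
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompts = ["Describe the image.", "What is the main object in the picture?"]
batch = tokenizer(prompts, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model.generate(
        **batch,
        max_new_tokens=32,
        pad_token_id=tokenizer.pad_token_id,
    )

print(tokenizer.batch_decode(out, skip_special_tokens=True))
```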
Hi, thank you so much for clarifying everything. Just one last question: does the code use beam search to generate the outputs?
No problem :) We basically stick to how the models are evaluated in their respective papers, so greedy decoding without beam search for LLaVA, and beam search with 3 beams for OpenFlamingo.
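In HuggingFace `generate` terms, the two decoding setups look roughly like this (a small text-only placeholder model stands in for LLaVA/OpenFlamingo here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("A photo of", return_tensors="pt")

with torch.no_grad():
    # Greedy decoding (LLaVA-style): deterministic, single beam
    greedy = model.generate(**inputs, do_sample=False, num_beams=1, max_new_tokens=20)
    # Beam search with 3 beams (OpenFlamingo-style)
    beams = model.generate(**inputs, do_sample=False, num_beams=3, max_new_tokens=20)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(beams[0], skip_special_tokens=True))
```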
Hi, currently, the code throws a NotImplementedError for LLaVA, but I believe the paper demonstrates zero-shot classification on LLaVA. When will the code be updated to include this feature? Alternatively, could you point out the main parts that would need significant changes to incorporate LLaVA?
Thank you.