Inference Time · Issue #32
Appreciate your efforts in maintaining this project!
When I ran the zero-shot VQA inference (result generation) on the MSRVTT dataset, it took 28 hours to finish on 4 A5000 GPUs. I understand this is mostly due to the large number of video-question pairs (~70K), but have you addressed it, e.g., with a better dataloader? Alternatively, did you experiment with a small subset during development?
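For context, this is roughly how I'm sharding the evaluation across the 4 GPUs (a minimal sketch; `MSRVTTQADataset`, `load_model`, and `collate_fn` are placeholder names, not this repo's API):

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler

# Placeholder imports -- stand-ins for this repo's own dataset/model code.
from my_eval_utils import MSRVTTQADataset, load_model, collate_fn

dist.init_process_group("nccl")  # launched via torchrun --nproc_per_node=4
rank, world = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(rank)

dataset = MSRVTTQADataset("data/msrvtt_qa_test.json")  # ~70K video-question pairs
# DistributedSampler splits the pairs across the 4 processes so each GPU
# only decodes and evaluates its own shard of the dataset.
sampler = DistributedSampler(dataset, num_replicas=world, rank=rank, shuffle=False)
loader = DataLoader(dataset, batch_size=8, sampler=sampler,
                    num_workers=8, pin_memory=True, collate_fn=collate_fn)

model = load_model("checkpoints/7b").to(rank).eval()
results = []
with torch.inference_mode():
    for batch in loader:
        out = model.generate(**{k: v.to(rank) for k, v in batch.items()})
        results.extend(out)

# Each rank writes its own shard; the 4 files are merged afterwards.
torch.save(results, f"results_rank{rank}.pt")
```

Even with this setup, video decoding in the dataloader seems to dominate the wall-clock time, which is why I'm asking whether you found a faster approach.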
Also, a minor question: why is ZeRO-2 used for fine-tuning instead of ZeRO-3, while the pre-training stage uses ZeRO-3? This is the reverse of LLaVA's setup, which uses ZeRO-2 for pre-training and ZeRO-3 for fine-tuning.
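For clarity, this is the difference I'm asking about (minimal illustrative DeepSpeed configs, not the repo's actual files):

```python
# Minimal DeepSpeed ZeRO configs (illustrative values only).
# ZeRO-2 shards optimizer states and gradients across GPUs; ZeRO-3
# additionally shards the model parameters themselves, trading extra
# communication for lower per-GPU memory.
zero2 = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

zero3 = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
```

My understanding is that ZeRO-3 saves more memory by also sharding the parameters, so I would have expected it in the memory-heavier fine-tuning stage, as in LLaVA.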
Finally, may I ask about memory consumption when fine-tuning the 7B model? Even a batch size of 1 runs out of memory on 4 A100 40GB GPUs. If you fine-tuned with LoRA, could you share the configuration you used (e.g., lora_r, lora_alpha, etc.), and whether the same learning rate was applied to the mm_projector?
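For reference, this is the kind of configuration I mean (a sketch using HuggingFace peft; the r/alpha values are illustrative, borrowed from LLaVA's public LoRA recipe, and may differ from what this project uses):

```python
from peft import LoraConfig, get_peft_model

# Illustrative values -- LLaVA's released LoRA recipe uses r=128, alpha=256
# and a separate learning rate for the mm_projector; I'm asking whether
# this project follows the same settings.
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Typically only the LLM's attention projections are adapted, while
    # the mm_projector is trained fully with its own (smaller) lr.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base_model, lora_config)  # base_model: the 7B LLM
```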
Thanks!
I see. Thank you for your answer!