Updated paper on the latest model (video understanding, etc.) #38
Comments
The main benefits come from training-data improvements during pre-training. We are working on technical papers and plan to reveal more details once ready :)
@Lyken17, Great work! Looking forward to the technical paper!
@Lyken17 Hi, I noticed that the paper was updated a few days ago, but it still does not mention the video-understanding capability. Comparing VILA's initial submission with version 1.5, I found that the pre-training data only added ShareGPT4V, while the SFT stage added video-related datasets such as Shot2Story and ShareGPT4Video. Moreover, the model was switched from Llama-2 + CLIP to Llama-3 + SigLIP/InternViT. Could you elaborate on these changes in more detail?
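For readers unfamiliar with why such a swap is even possible, here is a minimal, purely illustrative sketch of the common vision-tower + projector + LLM layout that many VLMs (VILA included) follow, under which the components the commenter names are pluggable. None of the class, method, or attribute names below are taken from the actual VILA codebase; they are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class VisionLanguageModel(nn.Module):
    """Illustrative VLM skeleton: vision tower -> projector -> LLM.
    Hypothetical names; not VILA's actual classes or modules."""

    def __init__(self, vision_tower: nn.Module, projector: nn.Module, llm: nn.Module):
        super().__init__()
        self.vision_tower = vision_tower  # e.g. CLIP in early VILA; SigLIP/InternViT in 1.5
        self.projector = projector        # maps vision features into the LLM token space
        self.llm = llm                    # e.g. Llama-2 early on; Llama-3 in 1.5

    def encode_video(self, frames: torch.Tensor) -> torch.Tensor:
        """frames: (T, C, H, W) for one video clip."""
        feats = self.vision_tower(frames)   # (T, N, D_vision) patch features per frame
        tokens = self.projector(feats)      # (T, N, D_llm)
        # Naive temporal handling: concatenate all frame tokens into one sequence.
        return tokens.reshape(1, -1, tokens.shape[-1])

    def forward(self, frames: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        """text_embeds: (1, L, D_llm) already-embedded prompt tokens."""
        video_tokens = self.encode_video(frames)
        return self.llm(torch.cat([video_tokens, text_embeds], dim=1))
```

Under this layout, swapping the vision tower or LLM backbone leaves the interface unchanged, which is consistent with the observation above that the 1.0-to-1.5 diff is mostly data plus component swaps rather than a new architecture.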
We will release the arXiv paper sometime in July. Stay tuned :)
Congrats on adding video-understanding support to VILA; it looks super cool!
Just curious: is there an updated or new paper with more technical detail on how video understanding was added to the VILA model?
Thanks!