
Updated paper on the latest model (video understanding, etc.) #38

Open
thecooltechguy opened this issue May 5, 2024 · 4 comments

@thecooltechguy

Congrats on adding support for video understanding to VILA, looks super cool!

Just curious, is there an updated or new paper with more technical details on how improved video understanding was added to the VILA model?

Thanks!

@Lyken17
Collaborator

Lyken17 commented May 7, 2024

Hi @thecooltechguy

The main benefit comes from training-data improvements during pre-training.

We are working on technical papers and plan to reveal more details once ready :)

@hkunzhe

hkunzhe commented May 10, 2024

@Lyken17 Great work! Looking forward to the technical paper!

@hkunzhe

hkunzhe commented May 23, 2024

@Lyken17 Hi, I noticed that the paper was updated a few days ago, but it still does not mention the video-understanding capability. After comparing VILA's initial submission with version 1.5, I found that the pre-training data only added ShareGPT4V, while video-related datasets such as Shot2Story and ShareGPT4Video were added at the SFT stage. Moreover, the model was switched from Llama 2 + CLIP to Llama 3 + SigLIP/InternViT. Could you elaborate on these changes in more detail?

@Lyken17
Collaborator

Lyken17 commented Jun 21, 2024

We will release the arXiv paper sometime in July. Stay tuned :)

gheinrich pushed a commit to gheinrich/VILA that referenced this issue Dec 16, 2024