
Some questions about pretraining #43

Open
Chuan-shanjia opened this issue Jul 10, 2024 · 3 comments

Comments

@Chuan-shanjia

Hello! I'm very interested in your great work! I have two questions about pretraining.
  1. Does the generalization ability of UMT come from CLIP? If so, then regardless of which pretraining dataset is used, training is essentially about matching the effectiveness of the open-source CLIP weights. Is the choice of pretraining dataset in stage 1 therefore important?
  2. Here's another question. Is the pretraining in stage 2 helpful for visual-only tasks? If we fine-tune on a visual-only dataset starting from the stage 2 pretrained model, will it outperform the stage 1 pretrained model?
Looking forward to your reply!

@Andy1621
Collaborator

  1. Dataset quality matters: high-quality videos work better. I tried WebVid, which is ~10x larger than K400, with 1/10 of the epochs, but the result was worse. That's why I only use videos from action recognition datasets; see InternVideo2.
  2. Good question! Under a full-tuning setting, stage 2's checkpoint performs similarly to stage 1's. But under a frozen-tuning setting, the multi-modal training helps and performs much better.
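The frozen-tuning setting mentioned in point 2 can be illustrated with a toy sketch. Here the "backbone" is just a fixed random projection standing in for a pretrained encoder, and only a linear head is trained on top of its frozen features. All names and the setup are hypothetical for illustration, not taken from the UMT codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pretrained backbone: a fixed random projection.
# Under frozen tuning, these weights are never updated.
W_backbone = rng.normal(size=(16, 8))

def backbone(x):
    return np.tanh(x @ W_backbone)

# The only trainable parameters: a linear head on frozen features.
W_head = np.zeros((8, 1))

# Toy regression data whose target is linear in the backbone features,
# so the head alone can fit it.
X = rng.normal(size=(64, 16))
y = backbone(X) @ rng.normal(size=(8, 1))

lr = 0.1
for _ in range(300):
    feats = backbone(X)              # frozen: no update to W_backbone
    pred = feats @ W_head
    grad = feats.T @ (pred - y) / len(X)
    W_head -= lr * grad              # only the head is updated

mse = float(np.mean((backbone(X) @ W_head - y) ** 2))
print("frozen-tuning MSE:", round(mse, 4))
```

The point of the comparison in the thread is that when the backbone is frozen like this, everything depends on how good the frozen features are, which is where stage 2's multi-modal training pays off.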

@Chuan-shanjia
Author

Your answer is really helpful, thank you!
If I want to apply the model to video domains other than action recognition, would it help to perform continued pretraining (stage 1) on videos from those domains? Or do you have any other suggestions for improving performance in other video domains?
Looking forward to your reply!

@Andy1621
Collaborator

Sorry for the late response. You can use the models with masked pretraining.
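For reference, a common masking scheme in masked video pretraining is "tube" masking: one spatial patch mask is sampled and repeated across all frames. The sketch below is a generic version of that idea (UMT's actual masking is guided by CLIP attention, so this is an illustrative assumption, not the exact scheme); the function name and parameters are hypothetical.

```python
import numpy as np

def tube_mask(num_frames, num_patches, mask_ratio, seed=0):
    """Sample one boolean mask over spatial patches and tile it across
    frames, so the same patches are hidden in every frame (a "tube")."""
    rng = np.random.default_rng(seed)
    num_masked = int(num_patches * mask_ratio)
    mask = np.zeros(num_patches, dtype=bool)
    # Choose which spatial patches to mask, without replacement.
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    # Repeat the same spatial mask for every frame: shape (T, N).
    return np.tile(mask, (num_frames, 1))

# e.g. 8 frames, 14x14 = 196 patches per frame, 75% masking
m = tube_mask(num_frames=8, num_patches=196, mask_ratio=0.75)
print(m.shape, int(m.sum()))
```

Because the mask is identical across frames, the visible patches form consistent spatiotemporal tubes, which is what makes the reconstruction task hard enough to be useful.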
