Validation loss on pretraining? [Feature request] #20

Open
james20141606 opened this issue Jun 21, 2024 · 6 comments
@james20141606

Hi, I am trying to reproduce the pretraining step as described in the README. The training loss converges pretty fast. Looking at the wandb logs, they only contain the training loss. Could you add other metrics, such as validation loss and perplexity?
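
For concreteness, here is roughly what I have in mind: averaging the loss over a held-out set and exponentiating it for perplexity. This is only a sketch, not code from this repo; the model and dataloader here are placeholders, and it assumes an HF-style model that returns `.loss` when `labels` are in the batch:

```python
import math
import torch

@torch.no_grad()
def eval_loss_and_ppl(model, eval_dataloader, device="cuda"):
    """Mean next-token loss on a held-out set, plus perplexity = exp(loss)."""
    model.eval()
    total_loss, num_batches = 0.0, 0
    for batch in eval_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        out = model(**batch)          # `.loss` is returned when `labels` is in the batch
        total_loss += out.loss.item()
        num_batches += 1
    mean_loss = total_loss / max(num_batches, 1)
    return mean_loss, math.exp(mean_loss)
```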

Thanks a lot!

@mu-cai
Collaborator

mu-cai commented Jun 24, 2024

Thanks for the question. However, I did not incorporate a validation dataset during training. Feel free to try it on your own!

@james20141606
Author

Thanks for your reply! By the way, do you have validation data in the fine-tuning stage?

@james20141606
Author

I also have two additional questions:

  • I tried pretraining ViP-LLaVA using both your provided data and my own custom data, and both converge very fast: the loss plateaus within 5 hours on a single A100. Did that happen in your experiments as well?
  • To build a ViP-LLaVA model for a specific domain, for example satellite imagery: do you think we should pretrain ViP-LLaVA on satellite data and then fine-tune with instructions, or is it enough to load your pretrained checkpoint and fine-tune on custom data? Do you have any intuition about this?

I would appreciate it a lot if you could answer my questions. Thanks!

@mu-cai
Collaborator

mu-cai commented Jun 27, 2024

  1. Yes, the LLM loss decreases very fast.
  2. I think either works; it all depends on the quality and quantity of your data!

@james20141606
Author

Thanks for your reply! I'd like to confirm: in the pretraining stage, do you freeze the LLM weights?

@mu-cai
Collaborator

mu-cai commented Jul 17, 2024

For pretraining, I never freeze the LLM weights.
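
If you ever want to change that, it comes down to toggling `requires_grad` on the language-model parameters. A rough sketch, not the exact code in this repo (the `vision_tower` / `mm_projector` name filters are just my guess at the usual LLaVA-style module names; check the actual ones in the codebase):

```python
def set_llm_trainable(model, trainable: bool):
    """Toggle requires_grad on the LLM backbone only, leaving the
    vision tower and multimodal projector flags untouched."""
    for name, param in model.named_parameters():
        if "vision_tower" not in name and "mm_projector" not in name:
            param.requires_grad = trainable

# Pretraining as described above keeps the LLM trainable:
# set_llm_trainable(model, True)
```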
