support evaluation data during ppo training #504

Closed
akk-123 opened this issue Jul 7, 2023 · 3 comments

Comments

@akk-123

akk-123 commented Jul 7, 2023

Please support evaluation data and saving the best checkpoint (the one with the highest reward) during PPO training.

@younesbelkada
Contributor

Hi @akk-123
Thanks for the issue. You can currently push the best model (the one with the highest reward) to the Hub; this has been possible since #275.
Let us know whether this works for your use case.

@akk-123
Author

akk-123 commented Jul 10, 2023

Thanks for your reply. Could you support passing evaluation data during PPO training, like trlx does, and saving the best model (the one with the highest reward on the evaluation data) to the local machine?

@younesbelkada
Contributor

@akk-123 I believe this is not supported yet; however, you can implement your own loop to do so. After calling ppo_trainer.step(), you can manually compute the model's reward on the evaluation data and save the model if that reward is higher than at any previous step.
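
For illustration, the manual loop described above could be sketched roughly as follows. This is only a sketch under stated assumptions, not an API that TRL provides: `evaluate_mean_reward`, `BestCheckpointSaver`, `generate_fn`, `reward_fn`, and `save_fn` are hypothetical names introduced here. You would supply your own generation, reward, and saving logic (for example, a `save_fn` that writes the policy model to a local directory).

```python
from statistics import mean
from typing import Callable, Iterable, Optional


def evaluate_mean_reward(
    eval_prompts: Iterable[str],
    generate_fn: Callable[[str], str],
    reward_fn: Callable[[str, str], float],
) -> float:
    """Average reward of the current policy over a held-out prompt set.

    generate_fn and reward_fn are hypothetical callables: the first produces
    a response for a prompt, the second scores a (prompt, response) pair.
    """
    rewards = [reward_fn(prompt, generate_fn(prompt)) for prompt in eval_prompts]
    if not rewards:
        raise ValueError("eval_prompts is empty")
    return mean(rewards)


class BestCheckpointSaver:
    """Calls save_fn only when the evaluation reward beats the best seen so far."""

    def __init__(self, save_fn: Callable[[], None]) -> None:
        self.save_fn = save_fn  # hypothetical: e.g. writes the model to disk
        self.best_reward: Optional[float] = None

    def update(self, eval_reward: float) -> bool:
        """Save and return True if eval_reward is a new best, else return False."""
        if self.best_reward is None or eval_reward > self.best_reward:
            self.best_reward = eval_reward
            self.save_fn()
            return True
        return False


# Sketch of where this would sit in a training loop (not executed here):
#
#   saver = BestCheckpointSaver(save_fn=my_save_fn)
#   for batch in my_training_batches:
#       ...  # generate responses and compute training rewards
#       ppo_trainer.step(query_tensors, response_tensors, rewards)
#       eval_reward = evaluate_mean_reward(eval_prompts, generate_fn, reward_fn)
#       saver.update(eval_reward)
```

Evaluating after every step can be slow on a large evaluation set, so one option is to call `saver.update(...)` only every few steps.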

@akk-123 akk-123 closed this as completed Jul 10, 2023