Hi @akk-123
Thanks for the issue. Since #275, you can push the best model (the one with the highest reward) to the Hub.
Let us know whether this works for your use case.
Thanks for your reply. Can you support passing evaluation data during PPO training, like trlx does, and saving the best model (highest reward on the evaluation data) to the local machine?
@akk-123 I believe this is not supported yet, however you can implement your own loop to do so. After calling ppo_trainer.step() you can manually compute the reward of the model on the evaluation data and save it if the reward is higher than in the previous steps.
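A minimal sketch of that idea, not an official TRL feature: the `BestCheckpointSaver` class and its `save_fn` argument are hypothetical names introduced here for illustration. It only tracks the best mean evaluation reward and calls whatever save function you hand it. How you compute the evaluation rewards and where you save the model are up to you.

```python
from typing import Callable, Iterable, Optional


class BestCheckpointSaver:
    """Tracks the best mean evaluation reward and saves when it improves.

    Hypothetical helper, not part of TRL.
    """

    def __init__(self, save_fn: Callable[[], None]) -> None:
        # save_fn is supplied by the caller; it should write the current
        # model to disk (assumption: e.g. a wrapper around save_pretrained).
        self.save_fn = save_fn
        self.best_reward: Optional[float] = None

    def update(self, eval_rewards: Iterable[float]) -> bool:
        """Return True if a new best checkpoint was saved."""
        rewards = list(eval_rewards)
        if not rewards:
            raise ValueError("eval_rewards must not be empty")
        mean_reward = sum(rewards) / len(rewards)
        # Save on the first evaluation, or whenever the mean reward improves.
        if self.best_reward is None or mean_reward > self.best_reward:
            self.best_reward = mean_reward
            self.save_fn()
            return True
        return False


# Intended use inside your own training loop (placeholders, not run here):
#
#   saver = BestCheckpointSaver(lambda: model.save_pretrained("best_model"))
#   for batch in dataloader:
#       ppo_trainer.step(queries, responses, rewards)
#       eval_rewards = score_on_eval_set(model)  # hypothetical: your own eval
#       saver.update(eval_rewards)
```

Comparing the mean reward over the whole evaluation set, rather than a single batch, makes the "best" checkpoint less sensitive to noise. You may also want to evaluate only every few steps, since generation on the evaluation data can be slow.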
Support evaluation data and save the best checkpoint (highest reward) during PPO training