support evaluation data during ppo training #504

akk-123 · 2023-07-07T06:55:30Z

Support evaluation data, and saving the best checkpoint (the one with the highest reward), during PPO training.

younesbelkada · 2023-07-07T10:36:23Z

Hi @akk-123
Thanks for the issue. Currently you can push the best model (highest reward) to the Hub, as of #275.
Let us know whether this works for your use case.

akk-123 · 2023-07-10T02:07:27Z

Thanks for your reply. Can you support passing evaluation data during PPO training, like trlx does, and saving the best model (highest reward on the evaluation data) to the local machine?

younesbelkada · 2023-07-10T07:05:35Z

@akk-123 I believe this is not supported yet; however, you can implement your own loop to do so. After calling ppo_trainer.step(), you can manually compute the model's reward on the evaluation data and save the model if that reward is higher than in the previous steps.
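A minimal sketch of that manual loop, kept library-agnostic so it runs on its own. The names `mean_reward`, `BestCheckpointSaver`, `generate_fn`, `reward_fn`, and `save_fn` are hypothetical helpers introduced for illustration and are not part of TRL; only `ppo_trainer.step()` comes from the comment above.

```python
from typing import Callable, Optional, Sequence


def mean_reward(
    generate_fn: Callable[[str], str],
    reward_fn: Callable[[str, str], float],
    eval_prompts: Sequence[str],
) -> float:
    """Average reward of the current policy over held-out prompts.

    generate_fn and reward_fn are assumptions: wrap your own model
    generation and reward model calls in them.
    """
    if not eval_prompts:
        raise ValueError("eval_prompts is empty")
    rewards = [float(reward_fn(p, generate_fn(p))) for p in eval_prompts]
    return sum(rewards) / len(rewards)


class BestCheckpointSaver:
    """Calls save_fn only when the evaluation reward improves."""

    def __init__(self, save_fn: Callable[[int], None]) -> None:
        self.save_fn = save_fn  # e.g. writes the model to a local directory
        self.best_reward: Optional[float] = None

    def update(self, reward: float, step: int) -> bool:
        """Save and return True if reward beats the best seen so far."""
        if self.best_reward is None or reward > self.best_reward:
            self.best_reward = reward
            self.save_fn(step)
            return True
        return False


# Hypothetical wiring inside a training loop (not executed here):
#
# saver = BestCheckpointSaver(save_fn=my_save_to_disk)
# for step, batch in enumerate(dataloader):
#     ...  # generate responses and compute training rewards
#     ppo_trainer.step(queries, responses, rewards)
#     if step % eval_every == 0:
#         score = mean_reward(generate_fn, reward_fn, eval_prompts)
#         saver.update(score, step)
```

Passing the save logic in as a callable keeps the bookkeeping separate from however the model is actually written to disk, so the same tracker works whether you save locally or push to the Hub.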

akk-123 closed this as completed Jul 10, 2023
