feat: Expose remove_previous_ckpt option to training entry point an… #274

Merged: 5 commits into volcengine:main, Feb 15, 2025

@zwhe99 (Contributor) commented Feb 14, 2025

Related issue: #273

  • Add remove_previous_ckpt_in_save and del_local_ckpt_after_load configuration options in ppo_trainer.yaml
  • Update RayPPOTrainer to support optional checkpoint deletion during loading
  • Modify ActorRolloutRefWorker and CriticWorker to pass checkpoint removal flag
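
For readers unfamiliar with the two flags, here is a minimal sketch of the behavior their names suggest: `remove_previous_ckpt_in_save` drops the previously written checkpoint when a new one is saved, and `del_local_ckpt_after_load` deletes the local copy once it has been read. `SketchCheckpointManager`, its method signatures, and the `state.txt` file are hypothetical stand-ins for illustration only; they are not verl's `FSDPCheckpointManager` or its actual API.

```python
import shutil
import tempfile
from pathlib import Path


class SketchCheckpointManager:
    """Hypothetical stand-in for illustration; not verl's FSDPCheckpointManager."""

    def __init__(self):
        # Directory written by the most recent save() call, if any.
        self.previous_save_path = None

    def save(self, path, payload, remove_previous_ckpt=False):
        path = Path(path)
        # Optionally delete the checkpoint written by the previous save.
        if remove_previous_ckpt and self.previous_save_path is not None:
            shutil.rmtree(self.previous_save_path, ignore_errors=True)
        path.mkdir(parents=True, exist_ok=True)
        (path / "state.txt").write_text(payload)
        self.previous_save_path = path

    def load(self, path, del_local_after_load=False):
        path = Path(path)
        payload = (path / "state.txt").read_text()
        # Optionally delete the local copy once it has been read.
        if del_local_after_load:
            shutil.rmtree(path, ignore_errors=True)
        return payload


def demo():
    root = Path(tempfile.mkdtemp())
    try:
        mgr = SketchCheckpointManager()
        step1, step2 = root / "global_step_1", root / "global_step_2"
        mgr.save(step1, "one", remove_previous_ckpt=True)
        mgr.save(step2, "two", remove_previous_ckpt=True)
        step1_kept = step1.exists()
        loaded = mgr.load(step2, del_local_after_load=False)
        step2_kept = step2.exists()
        return step1_kept, loaded, step2_kept
    finally:
        shutil.rmtree(root, ignore_errors=True)
```

With both flags left at `False`, nothing is deleted on save or load, which matches the defaults this PR settles on.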

…d user configuration

- Add `remove_previous_ckpt` configuration option in `ppo_trainer.yaml`
- Update `RayPPOTrainer` to support optional checkpoint deletion during loading
- Modify `ActorRolloutRefWorker` and `CriticWorker` to pass checkpoint removal flag
- Rename `remove_previous_ckpt` in config file to more specific `remove_previous_ckpt_in_save`
- Add new `del_local_ckpt_after_load` configuration option in config file
- Update checkpoint loading and saving methods to use new configuration names
- Change default value of `remove_previous_ckpt_in_save` to `False` in `ppo_trainer.yaml`
- Change default value of `del_local_ckpt_after_load` to `False` in `ppo_trainer.yaml`
- Update `FSDPCheckpointManager` and `ActorRolloutRefWorker` to use new default values
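
As a rough illustration of the defaults described above, the sketch below resolves the two option names against `False` defaults when a user config omits them. The `resolve_ckpt_options` helper and the plain-dict config are assumptions made for this example; the PR text does not say where in `ppo_trainer.yaml` the options live or how verl reads them.

```python
# Option names come from the PR; the False defaults match the commit messages.
CKPT_DEFAULTS = {
    "remove_previous_ckpt_in_save": False,
    "del_local_ckpt_after_load": False,
}


def resolve_ckpt_options(user_cfg=None):
    """Hypothetical helper: merge user-supplied values over the defaults."""
    user_cfg = dict(user_cfg or {})
    return {
        name: bool(user_cfg.get(name, default))
        for name, default in CKPT_DEFAULTS.items()
    }


# A user who only opts in to pruning old checkpoints on save:
options = resolve_ckpt_options({"remove_previous_ckpt_in_save": True})
```

Keeping both defaults at `False` means checkpoints are only deleted when a user explicitly asks for it.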
…point saving methods

- Modify `ActorRolloutRefWorker` and `CriticWorker` to accept optional `remove_previous_ckpt` parameter

@PeterSH6 merged commit f3afdb3 into volcengine:main Feb 15, 2025
12 checks passed
as12138 pushed a commit to as12138/verl that referenced this pull request Feb 20, 2025
volcengine#274)
