Add deepspeed experiment #795
Conversation
/benchmark-trl-experiments benchmark/benchmark_level1.sh
Benchmark on Comment: succeeded ✅
The documentation is not available anymore as the PR was closed or merged.
/benchmark-trl-experiments benchmark/benchmark_level2.sh
Benchmark on Comment: succeeded ✅
/benchmark-trl-experiments benchmark/benchmark_level1.sh
/benchmark-trl-experiments benchmark/benchmark_level2.sh
Benchmark on Comment: succeeded ✅
Benchmark on Comment: succeeded ✅
Cerebras results are expected: it's training against a random reward model, so its reward learning curve should be more chaotic.
Thanks a lot for adding this sweet benchmark 🚀! I left a comment about adding a benchmark for ZeRO-3, but that can also be a separate PR if you prefer.
benchmark/benchmark_level2.sh (outdated)
@@ -1,4 +1,4 @@
-# compound
+# compound: gpt2xl + grad_accu
For my own understanding, is this compound arg documented somewhere?
The compound comment simply means we are using more features at once (e.g., in this case, we are using a larger model and gradient accumulation at the same time) :)
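For reference, a minimal sketch of what such a compound entry looks like, modeled on the ZeRO-2 entry below; the exp_name, mini_batch_size, and gradient_accumulation_steps values here are illustrative assumptions, not necessarily the exact values in the script:

# compound: gpt2xl + grad_accu (illustrative values, modeled on the entry below)
python benchmark/benchmark.py \
    --command "python examples/scripts/sentiment_tuning.py --ppo_config.exp_name sentiment_tuning_gpt2xl_grad_accu --ppo_config.model_name gpt2-xl --ppo_config.mini_batch_size 16 --ppo_config.gradient_accumulation_steps 8 --ppo_config.log_with wandb" \
    --slurm-template-path benchmark/trl.slurm_template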
# compound: Cerebras-GPT-6.7B + deepspeed zero2 + grad_accu
python benchmark/benchmark.py \
    --command "accelerate launch --config_file examples/accelerate_configs/deepspeed_zero2.yaml examples/scripts/sentiment_tuning.py --ppo_config.exp_name sentiment_tuning_Cerebras-GPT-6.7B_grad_accu_deepspeed_stage2 --ppo_config.batch_size 32 --ppo_config.mini_batch_size 32 --ppo_config.log_with wandb --ppo_config.model_name cerebras/Cerebras-GPT-6.7B --ppo_config.reward_model sentiment-analysis:cerebras/Cerebras-GPT-6.7B" \
Eventually I think we should do the "proper" thing and fine-tune these models on IMDB so we have a genuinely good policy / reward model. Of course, not necessary for this PR, but perhaps good to be as realistic as possible for the benchmark.
I think that sounds good. Perhaps we can set up an end-to-end example where we train the reward model and then the policy model in the same run.
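Such an end-to-end setup could be sketched as two chained steps. Both the reward-training script name and the local-path form of the reward_model spec below are hypothetical placeholders, not files or formats confirmed by this PR:

# Step 1: fine-tune a reward model on IMDB (hypothetical script name)
python examples/scripts/train_reward_model_imdb.py \
    --model_name cerebras/Cerebras-GPT-6.7B \
    --output_dir ./reward_model_imdb
# Step 2: PPO-tune the policy against the freshly trained reward model
# (assumes the sentiment-analysis: spec accepts a local path)
python examples/scripts/sentiment_tuning.py \
    --ppo_config.model_name cerebras/Cerebras-GPT-6.7B \
    --ppo_config.reward_model sentiment-analysis:./reward_model_imdb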
benchmark/benchmark_level2.sh (outdated)
--slurm-template-path benchmark/trl.slurm_template

# compound: Cerebras-GPT-6.7B + deepspeed zero2 + grad_accu
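For context, benchmark.py hands each --command off to Slurm through this template. A sketch of what such a template might contain; every directive and the placeholder syntax below are assumptions, not the actual contents of benchmark/trl.slurm_template:

#!/bin/bash
#SBATCH --job-name=trl-benchmark          # assumption
#SBATCH --gpus-per-task={{gpus_per_task}} # placeholder syntax is an assumption
#SBATCH --ntasks={{ntasks}}
#SBATCH --output=slurm/logs/%x_%j.out
# benchmark.py would substitute the benchmarked command here
{{command}}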
Should we also benchmark ZeRO-3?
Let's probably do this in a separate PR.
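If a ZeRO-3 run is added in that follow-up PR, it would presumably be the same entry with the config swapped. A sketch, mirroring the ZeRO-2 command above and assuming a deepspeed_zero3.yaml config exists alongside the ZeRO-2 one:

# compound: Cerebras-GPT-6.7B + deepspeed zero3 + grad_accu (hypothetical follow-up)
python benchmark/benchmark.py \
    --command "accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml examples/scripts/sentiment_tuning.py --ppo_config.exp_name sentiment_tuning_Cerebras-GPT-6.7B_grad_accu_deepspeed_stage3 --ppo_config.batch_size 32 --ppo_config.mini_batch_size 32 --ppo_config.log_with wandb --ppo_config.model_name cerebras/Cerebras-GPT-6.7B --ppo_config.reward_model sentiment-analysis:cerebras/Cerebras-GPT-6.7B" \
    --slurm-template-path benchmark/trl.slurm_template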
* Add deepspeed experiment
* add deepspeed pip install
* update hello world.sh
* update comments
* remove cleanup