Standardise example scripts #842
The documentation is not available anymore as the PR was closed or merged.
/benchmark-trl-experiments benchmark/benchmark_level1.sh
Benchmark on Comment: succeeded ✅
Love the standardization! Very nice change. I assume multi_adapter_rl.py is deprecated in favor of multi_adapter_rl_v2.py (now run_ppo_multi_adapter.py)?
Yes, that's correct!
/benchmark-trl-experiments benchmark/benchmark_level1.sh
Benchmark on Comment: failed ❌
/benchmark-trl-experiments benchmark/benchmark_level1.sh
Benchmark on Comment: succeeded ✅
Generally looks great, thanks! Small nit: I don't like the run_xxx.py naming that much. I think just xxx.py would do the job and be less redundant.
Good idea! Done in a6d1d90. I'll merge if all the tests still pass.
LG!
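The rename agreed in this thread (run_xxx.py to xxx.py) amounts to stripping a leading run_ prefix from each script name. A minimal sketch of that mapping, assuming the rename is a plain prefix strip with no other changes; the helpers new_script_name and plan_renames are hypothetical and not part of TRL:

```python
import re

# Matches the old "run_<name>.py" pattern and captures "<name>.py".
_RUN_PREFIX = re.compile(r"^run_(?P<name>.+\.py)$")


def new_script_name(filename: str) -> str:
    """Return the filename with a leading "run_" prefix removed.

    Names that do not follow the run_xxx.py pattern are returned unchanged.
    """
    match = _RUN_PREFIX.match(filename)
    return match.group("name") if match else filename


def plan_renames(filenames: list[str]) -> dict[str, str]:
    """Map each old filename to its new name, skipping names that would not change."""
    plan = {}
    for old in filenames:
        new = new_script_name(old)
        if new != old:
            plan[old] = new
    return plan


if __name__ == "__main__":
    # Dry run over the script names mentioned in this PR; nothing is renamed on disk.
    names = [
        "run_sft.py",
        "run_reward_modeling.py",
        "run_ppo.py",
        "run_dpo.py",
        "run_ppo_multi_adapter.py",
    ]
    for old, new in plan_renames(names).items():
        print(f"{old} -> {new}")
```

Keeping this as a pure mapping means the plan can be reviewed (or fed to git mv) before any file is touched.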
* Standardise example scripts
* fix plotting script
* Rename run_xxx to xxx
* Fix doc

---------

Co-authored-by: Costa Huang <costa.huang@outlook.com>
* enable xpu support
* fix bug
* review commits
* fix style
* add xou decorator
* refactor review commit
* fix test
* review commit
* fix test
* Update benchmark.yml (#856)
* Standardise example scripts (#842)
  * Standardise example scripts
  * fix plotting script
  * Rename run_xxx to xxx
  * Fix doc

  ---------

  Co-authored-by: Costa Huang <costa.huang@outlook.com>
* Fix version check in import_utils.py (#853)
* dont use get_peft_model if model is already peft (#857)
* merge conflict
* add xou decorator
* resolve
* resolves
* upstream
* refactor and precommit
* fix new tests
* add device mapping for xpu

---------

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
Co-authored-by: Adam Pauls <adpauls@gmail.com>
Co-authored-by: abhishek thakur <1183441+abhishekkrthakur@users.noreply.github.com>
This PR standardises all the example scripts to follow the run_xxx.py convention, where xxx typically refers to the algorithm instead of the task (e.g. there is just one PPO example instead of one called "sentiment tuning"). The resulting structure is as follows:

IMO this makes it a bit easier for newcomers to know what each script does from its filename, instead of guessing whether e.g. multi adapter RL refers to PPO or something else.
I also deleted an old, duplicate multi adapter RL script, multi_adapter_rl.py, which seems to be outdated.

Eventually, we could harmonize the scripts so that the SFT and reward models produced by run_sft.py and run_reward_modeling.py are the same ones that feed into run_ppo.py and run_dpo.py. This would give a true end-to-end pipeline that is maintained and solid for many people to work from :)
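One way to picture the end-to-end pipeline proposed above is as a small dependency graph between the scripts. The sketch below is hypothetical: it uses the post-rename filenames (sft.py, reward_modeling.py, ppo.py, dpo.py) and assumes that PPO consumes both the SFT policy and the reward model while DPO starts from the SFT policy only. It merely computes a valid run order with Python's standard graphlib module (3.9+) and does not launch any training; the names SCRIPTS, DEPENDS_ON, run_order and describe_plan are illustrative, not part of TRL.

```python
from graphlib import TopologicalSorter

# Hypothetical stage -> script mapping (post-rename filenames; an assumption).
SCRIPTS = {
    "sft": "sft.py",
    "reward_modeling": "reward_modeling.py",
    "ppo": "ppo.py",
    "dpo": "dpo.py",
}

# Each stage maps to the stages whose output models it consumes.
# Assumption: PPO needs the SFT policy and the reward model; DPO needs only the SFT policy.
DEPENDS_ON = {
    "sft": set(),
    "reward_modeling": set(),
    "ppo": {"sft", "reward_modeling"},
    "dpo": {"sft"},
}


def run_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return the stages in an order where every stage follows its dependencies."""
    # TopologicalSorter takes a mapping of node -> predecessors and raises
    # graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(depends_on).static_order())


def describe_plan(depends_on: dict[str, set[str]], scripts: dict[str, str]) -> list[str]:
    """For each stage in run order, list which script outputs feed into it."""
    lines = []
    for stage in run_order(depends_on):
        upstream = ", ".join(scripts[d] for d in sorted(depends_on[stage])) or "base model"
        lines.append(f"{scripts[stage]} <- {upstream}")
    return lines


if __name__ == "__main__":
    for line in describe_plan(DEPENDS_ON, SCRIPTS):
        print(line)
```

Encoding the hand-offs explicitly like this would make it obvious which checkpoints each script must accept, which is the harmonization the description asks for.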