TextEnvironments #424
* add v1 masking
* working v1
* adapt from suggestion
* avoid warning ``Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.``
* fix masking - mask the responses from API call only
* quality
* address comments
* Update trl/environment/base.py

  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* adapt a bit
* wip on tokenization/masking in textenv
* small fixes
* update viz
* add example
* print debug text and pass masks
* style
* format and move tensor to device
* update example
* update example
* This seems to work
* fix masking
* fix rich output to console

---------

Co-authored-by: Costa Huang <costa.huang@outlook.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: leandro <leandro.vonwerra@spoud.io>
* add v1 masking
* working v1
* adapt from suggestion
* avoid warning ``Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.``
* fix masking - mask the responses from API call only
* quality
* address comments
* Update trl/environment/base.py

  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* adapt a bit
* wip on tokenization/masking in textenv
* small fixes
* update viz
* add example
* print debug text and pass masks
* style
* format and move tensor to device
* update example
* update example
* This seems to work
* fix masking
* fix rich output to console
* fix batched generation
* improve stopping criteria
* improve error handling in tool call

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Thanks @younesbelkada for the comments; I have addressed most of them. @lvwerra, the PR looks good. Quick question: do we want to merge
I thought we could have

Also, @vwxyzjn, would you mind fixing the scripts so they pass the quality checks? In my opinion we could also exclude them from the quality tests; no strong opinion.
Looking great! Thanks for all your efforts. I left 4 nits; otherwise LGTM!
* WIP skeleton
* minimal working poc
* cleanup
* rename variables
* quick typo fix
* add v1 masking (huggingface#429)
  * add v1 masking
  * working v1
  * adapt from suggestion
  * avoid warning ``Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.``
  * fix masking - mask the responses from API call only
  * quality
  * address comments
  * Update trl/environment/base.py

    Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
  * adapt a bit
  * wip on tokenization/masking in textenv
  * small fixes
  * update viz
  * add example
  * print debug text and pass masks
  * style
  * format and move tensor to device
  * update example
  * update example
  * This seems to work
  * fix masking
  * fix rich output to console

  ---------

  Co-authored-by: Costa Huang <costa.huang@outlook.com>
  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
  Co-authored-by: leandro <leandro.vonwerra@spoud.io>
* Add masking (huggingface#461)
  * add v1 masking
  * working v1
  * adapt from suggestion
  * avoid warning ``Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.``
  * fix masking - mask the responses from API call only
  * quality
  * address comments
  * Update trl/environment/base.py

    Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
  * adapt a bit
  * wip on tokenization/masking in textenv
  * small fixes
  * update viz
  * add example
  * print debug text and pass masks
  * style
  * format and move tensor to device
  * update example
  * update example
  * This seems to work
  * fix masking
  * fix rich output to console
  * fix batched generation
  * improve stopping criteria
  * improve error handling in tool call

  ---------

  Co-authored-by: younesbelkada <younesbelkada@gmail.com>
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
  Co-authored-by: Costa Huang <costa.huang@outlook.com>
* fix unknown tool
* fix rewards and increase bs
* remove unused script
* ugly WIP fix
* do not return modified obj for in-place operations
* do not return modified obj for in-place operations
* clean up stopping criterion
* push updates
* push update
* format, add docs
* rename file
* add kwargs to reward fn
* simplify example
* simplify example
* bug fix
* add a trivia example
* pre-commit
* max tool response length
* fix regex for multi-line
* refactor tool exceptions
* fix exceptions in tool
* add docs
* fix style
* make rich optional
* add docstrings
* add tests
* add TextEnv tests (WIP)
* update triviaqa code
* update docs
* refactor text env
* update tests (WIP)
* add end2end test
* update docs
* upload tool demo
* refactor
* customizable system prompt
* add text env docs
* update index and toc
* fix `TextHistory` show methods
* add max length
* fix style
* fix typo
* refactor to kwargs in init and tasks to queries
* kwargs for reward docs
* Update examples/triviaqa.py

  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update examples/tool_demo.py

  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update docs/source/learning_tools.mdx

  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update docs/source/learning_tools.mdx

  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update docs/source/learning_tools.mdx

  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update docs/source/text_environments.md

  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update examples/triviaqa.py

  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update examples/triviaqa.py

  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* move to tool folder
* remove assets
* remove tool demo
* move rich import test to import utils
* add copyright
* fixes for masks in ppo trainer
* add text env api docs
* make precommit + add ppo test with mask
* move examples and add python
* fix style
* update triviaqa example
* add more docs
* update docs
* Update docs/source/learning_tools.mdx
* Apply suggestions from code review
* precommit

---------

Co-authored-by: Costa Huang <costa.huang@outlook.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: leandro von werra <leandro@hf.co>
This PR adds multi-turn text environments to TRL.
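To make the idea of a multi-turn text environment concrete, here is a minimal sketch of one episode: the model's turn may end in a tool call, the environment runs the tool, appends the tool output, and hands control back to the model. This is not TRL's `TextEnvironment` API. Only the `<response>` marker is visible in this PR's example output; the `<request>`/`<call>` markers, the `toy_calculator` tool, and the functions `run_tool_call`/`run_episode` are hypothetical, and the scripted `model_turns` list stands in for text sampled from the policy.

```python
import operator
import re
from typing import Callable, Dict, List, Tuple

# Assumed segment markers: only "<response>" appears in this PR's example
# output; "<request>" and "<call>" are hypothetical choices for this sketch.
TOOL_CALL = re.compile(r"<request><(\w+)>(.*?)<call>\s*$", re.DOTALL)
RESPONSE_END = "<response>"

Tool = Callable[[str], str]
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def toy_calculator(query: str) -> str:
    """Hypothetical tool: evaluates 'a <op> b' with a single binary operator."""
    a, op, b = query.split()
    return str(OPS[op](float(a), float(b)))


def run_tool_call(turn: str, tools: Dict[str, Tool]) -> Tuple[str, bool]:
    """If the model turn ends in a tool call, run the tool.

    Returns (tool_segment, called). The tool output is terminated with the
    response marker so the model can see where the tool's text ends.
    """
    match = TOOL_CALL.search(turn)
    if match is None:
        return "", False  # no tool call: the model has finished its answer
    name, query = match.group(1), match.group(2).strip()
    if name not in tools:
        return f"Unknown tool {name}.{RESPONSE_END}", True
    try:
        result = tools[name](query)
    except Exception as exc:  # report tool failures to the model as text
        result = f"Tool error: {exc}"
    return f"{result}{RESPONSE_END}", True


def run_episode(prompt: str, model_turns: List[str], tools: Dict[str, Tool],
                max_turns: int = 4) -> List[Tuple[str, bool]]:
    """Interleave (scripted) model turns with tool outputs.

    Each returned segment is (text, is_from_model); prompt and tool outputs
    are marked False because the policy did not generate them.
    """
    segments: List[Tuple[str, bool]] = [(prompt, False)]
    for turn in model_turns[:max_turns]:
        segments.append((turn, True))
        tool_segment, called = run_tool_call(turn, tools)
        if not called:
            break
        segments.append((tool_segment, False))
    return segments


segments = run_episode(
    "Q: What is 12 times 7?\n",
    ["Let me compute. <request><Calculator>12 * 7<call>", " The answer is 84."],
    {"Calculator": toy_calculator},
)
for text, from_model in segments:
    print("MODEL" if from_model else "ENV  ", repr(text))
```

Keeping an `is_from_model` flag per segment is the design choice that matters here: it is what later lets the training step distinguish sampled tokens from text injected by the environment. Unknown tools and tool exceptions are returned as text rather than raised, so a bad call ends up in the transcript instead of crashing the rollout.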
Target API
Todos
* PPOTrainer
* after fwd pass (add v1 masking #429)

Current working example
Result:
The `88236.0<response>` segment in the result was generated by the tool call, not by the model.
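Because a segment like `88236.0<response>` comes from the tool rather than from the policy, it should not contribute to the PPO update; one of the squashed commits describes this as "mask the responses from API call only". A minimal sketch of that masking idea follows, in plain Python with made-up token ids. It is not TRL's `PPOTrainer` code, which operates on batched tensors; `build_response_mask` and `masked_mean` are hypothetical helpers for illustration.

```python
from typing import List, Sequence, Tuple


def build_response_mask(
    segments: Sequence[Tuple[List[int], bool]],
) -> Tuple[List[int], List[int]]:
    """Concatenate tokenized segments and mark which tokens the policy generated.

    Each segment is (token_ids, is_from_model). Prompt and tool-response tokens
    get mask 0 so they do not contribute to the policy update.
    """
    input_ids: List[int] = []
    mask: List[int] = []
    for token_ids, from_model in segments:
        input_ids.extend(token_ids)
        mask.extend([1 if from_model else 0] * len(token_ids))
    return input_ids, mask


def masked_mean(values: Sequence[float], mask: Sequence[int]) -> float:
    """Average per-token values (e.g. log-probs or losses) over mask == 1 only."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    count = sum(mask)
    if count == 0:
        raise ValueError("mask selects no tokens")
    return sum(v * m for v, m in zip(values, mask)) / count


# Hypothetical token ids: prompt, model turn, tool output, model turn.
segments = [([101, 102], False), ([5, 6, 7], True), ([42, 43], False), ([8], True)]
input_ids, mask = build_response_mask(segments)
# The 9.0 entries sit on masked-out tokens and should be ignored.
per_token_values = [9.0, 9.0, 1.0, 2.0, 3.0, 9.0, 9.0, 2.0]
print(input_ids)  # [101, 102, 5, 6, 7, 42, 43, 8]
print(mask)       # [0, 0, 1, 1, 1, 0, 0, 1]
print(masked_mean(per_token_values, mask))  # 2.0
```

One caveat for a real causal language model: the log-prob of token t is computed from the logits at position t-1, so a mask aligned with `input_ids` has to be shifted by the same one position as the log-probs before it is applied.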