Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* WIP skeleton
* minimal working poc
* cleanup
* rename variables
* quick typo fix
* add v1 masking (huggingface#429)
* add v1 masking
* working v1
* adapt from suggestion
* avoid warning `Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.`
* fix masking - mask the responses from API call only
* quality
* address comments
* Update trl/environment/base.py
  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* adapt a bit
* wip on tokenization/masking in textenv
* small fixes
* update viz
* add example
* print debug text and pass masks
* style
* format and move tensor to device
* update example
* update example
* This seems to work
* fix masking
* fix rich output to console
---------
Co-authored-by: Costa Huang <costa.huang@outlook.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: leandro <leandro.vonwerra@spoud.io>
* Add masking (huggingface#461)
* add v1 masking
* working v1
* adapt from suggestion
* avoid warning `Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.`
* fix masking - mask the responses from API call only
* quality
* address comments
* Update trl/environment/base.py
  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* adapt a bit
* wip on tokenization/masking in textenv
* small fixes
* update viz
* add example
* print debug text and pass masks
* style
* format and move tensor to device
* update example
* update example
* This seems to work
* fix masking
* fix rich output to console
* fix batched generation
* improve stopping criteria
* improve error handling in tool call
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
* fix uknown tool
* fix rewards and increase bs
* remove unused script
* ugly WIP fix
* do not return modified obj for in-place operations
* do not return modified obj for in-place operations
* clean up stopping criterium
* push updates
* push update
* format, add docs
* rename file
* add kwargs to reward fn
* simplify example
* simplify example
* bug fix
* add a trivia example
* pre-commit
* max tool response length
* fix regex for multi-line
* refactor tool exceptions
* fix exceptions in tool
* add docs
* fix style
* make rich optional
* add docstrings
* add tests
* add TextEnv tests (WIP)
* update triviaqa code
* update docs
* refactor text env
* update tests (WIP)
* add end2end test
* update docs
* upload tool demo
* refactor
* customizable system prompt
* add text env docs
* update index and toc
* fix `TextHistory` show methods
* add max length
* fix style
* fix typo
* refactor to kwargs in init and tasks to queries
* kwargs for reward docs
* Update examples/triviaqa.py
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update examples/tool_demo.py
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update docs/source/learning_tools.mdx
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update docs/source/learning_tools.mdx
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update docs/source/learning_tools.mdx
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update docs/source/text_environments.md
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update examples/triviaqa.py
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update examples/triviaqa.py
  Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* move to tool folder
* remove assets
* remove tool demo
* move rich import test to import utils
* add copyright
* fixes for masks in ppo trainer
* add text env api docs
* make precommit + add ppo test with mask
* move examples and add python
* fix style
* update triviaqa example
* add more docs
* update docs
* Update docs/source/learning_tools.mdx
* Apply suggestions from code review
* precommit
---------
Co-authored-by: Costa Huang <costa.huang@outlook.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: leandro von werra <leandro@hf.co>