Issues: huggingface/trl

Issues list

DDPO checkpoint 🐛 bug Something isn't working 🏋 DPPO Related to DDPO 🙋 help from community wanted Open invitation for community members to contribute ⏳ needs more info Additional information or clarification is required to proceed
#2505 opened Dec 20, 2024 by nguyenhoa-uit
5 of 9 tasks
Spectrum training support ✨ enhancement New feature or request 🏋 SFT Related to SFT
#2504 opened Dec 19, 2024 by ggbetz
[bug] objective/entropy < 0 when using rlootrainer and ppotrainer 🙋 help from community wanted Open invitation for community members to contribute 🏋 PPO Related to PPO ❓ question Seeking clarification or more information 🏋 RLOO Related to RLOO
#2496 opened Dec 17, 2024 by macheng6
[Tracking issue] Integrate native liger-kernel losses ✨ enhancement New feature or request 🧒 good second issue Good for contributors with basic project familiarity
#2495 opened Dec 17, 2024 by qgallouedec
5 tasks
DeepSpeed with trl 🐛 bug Something isn't working 🚀 deepspeed Related to deepspeed 🏋 DPO Related to DPO ⏳ needs more info Additional information or clarification is required to proceed
#2490 opened Dec 16, 2024 by sagie-dekel
7 of 9 tasks
RewardConfig's max_length argument docstring should indicate that it filters out dataset, rather than truncating it 📚 documentation Improvements or additions to documentation 👶 good first issue Good for newcomers 🙋 help from community wanted Open invitation for community members to contribute 🏋 Reward Related to Reward modelling
#2488 opened Dec 16, 2024 by Kallinteris-Andreas
Trainer forces the use of a specific collator 🏋 GKD Related to GKD ❓ question Seeking clarification or more information
#2481 opened Dec 14, 2024 by hteague-qti
KeyError in DPO Trainer, evaluation_loop 🐛 bug Something isn't working 🏋 DPO Related to DPO
#2473 opened Dec 13, 2024 by qingjianbuyi
7 of 9 tasks
A question about rlootrainer 🙋 help from community wanted Open invitation for community members to contribute ❓ question Seeking clarification or more information 🏋 RLOO Related to RLOO
#2472 opened Dec 13, 2024 by macheng6
1 of 3 tasks
Provide Descriptions (READMEs) for trl-lib/dataset 🗃️ data Related to data 📚 documentation Improvements or additions to documentation ✨ enhancement New feature or request 👶 good first issue Good for newcomers 🙋 help from community wanted Open invitation for community members to contribute
#2470 opened Dec 13, 2024 by Kallinteris-Andreas
Packing in DPOTrainer 🏋 DPO Related to DPO ✨ enhancement New feature or request
#2469 opened Dec 13, 2024 by zhc7
DPOTrainer log metrics are not gathered and meaned across ranks 🐛 bug Something isn't working 🏋 DPO Related to DPO
#2468 opened Dec 13, 2024 by zhc7
Probably a more reasonable method of packing ✨ enhancement New feature or request 🧒 good second issue Good for contributors with basic project familiarity 🙋 help from community wanted Open invitation for community members to contribute 🏋 SFT Related to SFT
#2466 opened Dec 12, 2024 by AIR-hl
Why isn't Soft-Actor Critic (SAC) Available for RLHF? ❓ question Seeking clarification or more information
#2465 opened Dec 11, 2024 by AMindToThink
3 tasks
Evaluation with OnlineDPO 🐛 bug Something isn't working 🏋 Online DPO Related to Online DPO
#2464 opened Dec 11, 2024 by MohamedAliRashad
7 of 9 tasks
Probably mistake in DPOTrainer when compute/log grad_norm 🏋 DPO Related to DPO ❓ question Seeking clarification or more information
#2456 opened Dec 10, 2024 by AIR-hl
7 of 9 tasks
Out of Memory Error: DPO Trainer 🏋 DPO Related to DPO ❓ question Seeking clarification or more information
#2452 opened Dec 9, 2024 by gp-1108
7 of 9 tasks
Custom reward model for PPOTrainer 🙋 help from community wanted Open invitation for community members to contribute 🏋 PPO Related to PPO ❓ question Seeking clarification or more information
#2451 opened Dec 8, 2024 by hwhyyds
Loading a previous PPO checkpoint in middle of training 🙋 help from community wanted Open invitation for community members to contribute 🏋 PPO Related to PPO ❓ question Seeking clarification or more information
#2444 opened Dec 6, 2024 by kooryan