@lewtun lewtun commented Oct 20, 2025

What does this PR do?

Simplified version of #3469 that exposes a rollout_func so users can define custom logic for tool calling and for environments such as those from OpenEnv. Very much WIP :)

Example scripts included in examples/scripts/openenv

TODO

  • Propagate reward from envs in rollout_func to _calculate_rewards
  • Streamline rollout_func signature
  • Find a better way to handle text environments in rollout_func, because the /generate endpoint only returns prompt_ids and completion_ids
  • Mask tool calls for multi-step
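
For orientation, here is a minimal sketch of what a rollout function could look like. It takes the keyword arguments used at the call site in this PR (prompts, images, args, processing_class) and returns the three keys the trainer treats as required (prompt_ids, completion_ids, logprobs) plus one extra field. The names toy_tokenizer and toy_rollout_func, the SimpleNamespace standing in for the training arguments, and the placeholder reward are assumptions made for illustration. They are not part of TRL or OpenEnv, and the signature may still change given the TODO above.

```python
from types import SimpleNamespace


def toy_tokenizer(text):
    """Hypothetical stand-in for processing_class: one fake token id per word."""
    return [len(word) for word in text.split()]


def toy_rollout_func(prompts, images, args, processing_class):
    """Hypothetical rollout function; generation and the environment are stubbed out."""
    prompt_ids, completion_ids, logprobs, env_reward = [], [], [], []
    for prompt in prompts:
        p_ids = processing_class(prompt)
        # Stand-in for a call to a generation server.
        c_ids = [0] * args.max_completion_length
        prompt_ids.append(p_ids)
        completion_ids.append(c_ids)
        logprobs.append([0.0] * len(c_ids))  # placeholder per-token log-probs
        # Stand-in for stepping an environment and reading its reward.
        env_reward.append(float(len(p_ids) % 2))
    return {
        "prompt_ids": prompt_ids,
        "completion_ids": completion_ids,
        "logprobs": logprobs,
        "env_reward": env_reward,  # extra field, not one of the required keys
    }


output = toy_rollout_func(
    prompts=["move left", "stay"],
    images=None,
    args=SimpleNamespace(max_completion_length=2),
    processing_class=toy_tokenizer,
)
print(sorted(output))
```

Every list in the returned dict has one entry per prompt, so any extra field lines up with the completions it describes.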

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

)
# Extract required fields and collect any extra fields for reward functions
required_keys = {"prompt_ids", "completion_ids", "logprobs"}
extra_fields = {k: v for k, v in output.items() if k not in required_keys}
Member Author

Here is how I propagate the reward from the environment in rollout_func to the reward functions. I could have done it separately for the rollout_func branch, but then I would have had to duplicate all the broadcasting logic.
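
A small sketch of the mechanism described in this comment, under the assumption that the trainer forwards the extra fields to reward functions as keyword arguments. The rollout output values and the completions list are made up for illustration. The key split and the reward function mirror the code in this PR.

```python
# Hypothetical rollout output; the values are invented for illustration.
output = {
    "prompt_ids": [[1, 2, 3], [1, 2, 3]],
    "completion_ids": [[4, 5], [6]],
    "logprobs": [[-0.1, -0.2], [-0.3]],
    "env_reward": [1.0, 0.0],  # extra field produced by the environment
}

# Same split as in the diff: required keys versus everything else.
required_keys = {"prompt_ids", "completion_ids", "logprobs"}
extra_fields = {k: v for k, v in output.items() if k not in required_keys}


def reward_from_env(completions, **kwargs):
    """Return the environment reward when present, else zeros."""
    env_rewards = kwargs.get("env_reward", [])
    if env_rewards:
        return [float(reward) for reward in env_rewards]
    return [0.0] * len(completions)


# Assumption: extra fields reach the reward function as keyword arguments.
completions = ["left", "right"]  # placeholder decoded completions
rewards = reward_from_env(completions, **extra_fields)
print(rewards)  # [1.0, 0.0]
```

Because the split is key-based, any additional per-completion field returned by the rollout function would be forwarded in the same way.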

completion_ids = all_completion_ids
logprobs = all_logprobs

extra_fields = {} # No extra fields for colocate mode
Member Author

For now I am focusing on server mode only. I can extend this to the other modes later.

Comment on lines 1132 to 1138
if self.rollout_func is not None:
    output = self.rollout_func(
        prompts=ordered_set_of_prompts,
        images=ordered_set_of_images,
        args=self.args,
        processing_class=self.processing_class,
    )
Member

rollout_func then requires use_vllm=True and vllm_mode="server", right?

Member Author

Comment on lines +219 to +227
def reward_from_env(completions, **kwargs):
    """Reward function that uses the environment reward from the catch game."""
    # Extract environment rewards from kwargs (propagated via extra_fields)
    env_rewards = kwargs.get("env_reward", [])
    if env_rewards:
        return [float(reward) for reward in env_rewards]
    else:
        # Fallback if env_reward is not available
        return [0.0] * len(completions)
Member

Ideally, we shouldn't have this. We could:

  • allow reward_func=None in GRPOTrainer.__init__
  • check if the rollout function returns a "reward" key, and if so, we use this as a column in rewards_per_func (I might be missing something here)

Member Author

I'm going to leave this for a follow-up refactoring.
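
The suggestion in this thread could be sketched roughly as follows. This is not how the trainer is implemented: build_rewards_per_func and length_reward are hypothetical names, plain Python lists stand in for the tensors the trainer would use, and the "reward" key is the one proposed in the review comment.

```python
def build_rewards_per_func(reward_funcs, completions, rollout_output):
    """Return one row per completion and one column per reward source."""
    columns = [func(completions) for func in reward_funcs]
    # If the rollout function returned a "reward" key, use it as an extra column.
    if "reward" in rollout_output:
        columns.append([float(r) for r in rollout_output["reward"]])
    if not columns:
        raise ValueError("No reward functions and no 'reward' key in the rollout output.")
    # Transpose the columns into per-completion rows.
    return [list(row) for row in zip(*columns)]


def length_reward(completions):
    """Hypothetical reward function: completion length in characters."""
    return [float(len(c)) for c in completions]


rollout_output = {"reward": [1.0, 0.0]}  # invented environment rewards
rows = build_rewards_per_func([length_reward], ["ab", "c"], rollout_output)
print(rows)  # [[2.0, 1.0], [1.0, 0.0]]
```

With an empty list of reward functions, the environment reward becomes the only column, which is the case the first bullet (allowing no reward function at all) is aiming at.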

@lewtun lewtun marked this pull request as ready for review October 22, 2025 21:45
@lewtun lewtun changed the title [WIP] Add rollout function for multi-step RL 🕹️ Add rollout function for OpenEnv integration Oct 22, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec qgallouedec merged commit 2819a8f into main Oct 23, 2025
10 of 12 checks passed
@qgallouedec qgallouedec deleted the add-rollouts branch October 23, 2025 07:36
qgallouedec added a commit that referenced this pull request Oct 30, 2025
commit 9925469
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Wed Oct 29 22:09:47 2025 +0000

    Support chat_template_kwargs (#4350)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 4e9ab9f
Author: Kashif Rasul <kashif.rasul@gmail.com>
Date:   Wed Oct 29 18:20:15 2025 +0100

    👑 [experimental] GOLD Trainer (#4349)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit b82a8f4
Author: Kamran Bigdely <kamran@rapidfire.ai>
Date:   Wed Oct 29 10:16:22 2025 -0700

    🔥 docs: Add RapidFire AI integration guide (#4340)

    Co-authored-by: kamran bigdely <kamranbigdely@gmail.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 29fb69f
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 29 17:45:06 2025 +0100

    Align make test_experimental with make test (#4371)

commit ac6cea8
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 29 17:25:16 2025 +0100

    Fix add_generation_prompt arg for paged transformers in GRPO and RLOO trainers (#4370)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 1e39eb6
Author: Taha Yassine <40228615+taha-yassine@users.noreply.github.com>
Date:   Wed Oct 29 16:59:49 2025 +0100

    Add support for Trackio completions logging in GRPOTrainer (#4359)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 97830a3
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 29 11:13:54 2025 +0100

    Replace deprecated list with tuple indexing in PPOTrainer (#4356)

commit d275418
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 29 11:13:33 2025 +0100

    Remove ignored max_length parameter from PRMTrainer data collator (#4355)

commit 61bf96c
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 29 11:13:04 2025 +0100

    Move tests of BCO trainer args to tests/experimental (#4354)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit b8f23ef
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 29 08:00:50 2025 +0100

    Replace deprecated AutoModelForVision2Seq with AutoModelForImageTextToText (#4353)

commit f8073cb
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 29 07:53:13 2025 +0100

    Implement CI test workflow for experimental module (#4330)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 55854c8
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 29 07:42:46 2025 +0100

    Move tests of experimental GRPO with replay buffer to tests/experimental (#4329)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 4352074
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 29 07:41:43 2025 +0100

    Use explicit tiny-Qwen2_5_VL model_id parameter in CI tests (#4325)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 928f589
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Tue Oct 28 18:12:24 2025 -0600

    Fix: `add_generation_prompt=True` for conversational only (#4362)

commit b0889d2
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Tue Oct 28 18:00:27 2025 -0600

    Add `add_generation_prompt` to processor_kwargs in GRPO and RLOO trainer (#4361)

commit a9d33d0
Author: kaixuanliu <kaixuan.liu@intel.com>
Date:   Wed Oct 29 05:13:59 2025 +0800

    fix CI issue for vlm_gemma_3n model (#4278)

    Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 34fdb61
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Tue Oct 28 20:51:22 2025 +0100

    Fixed links inside Tips in docs (#4360)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit a23e91c
Author: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Date:   Tue Oct 28 19:48:42 2025 +0000

    Add missing license in `tests/experimental/__init__.py`

commit 5e691d1
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Mon Oct 27 22:01:31 2025 +0100

    Fix GRPO and RLOO trainers for continuous batching (#4348)

commit fa644b1
Author: Kashif Rasul <kashif.rasul@gmail.com>
Date:   Mon Oct 27 14:01:34 2025 +0100

    [vllm] update comment about communication group host ip (#4337)

commit fda88c6
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Mon Oct 27 10:29:24 2025 +0100

    Added custom `prepare_model_for_kbit_training` to save VRAM (#4335)

    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit 2a138c7
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Mon Oct 27 10:26:09 2025 +0100

    Update Reducing Memory Consumption guide with more details (#4332)

    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit 05a1feb
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Fri Oct 24 11:48:30 2025 -0700

    🗞️ Update "What's New" (#4338)

commit d8543c0
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Fri Oct 24 11:27:25 2025 +0200

    Add OpenEnv blog to landing (#4333)

commit 23c0062
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Oct 24 09:48:37 2025 +0200

    Hotfix: Fall back to config.text_config._name_or_path if missing config._name_or_path (#4324)

commit 47b1aa7
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Thu Oct 23 12:04:46 2025 +0200

    Move BCO tests to tests/experimental (#4326)

commit a4872d9
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Thu Oct 23 11:42:13 2025 +0200

    Update OpenEnv docs (#4328)

commit 3f66564
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Thu Oct 23 10:45:23 2025 +0200

    Highlight OpenEnv in landing docs (#4327)

commit 9b80e33
Author: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Date:   Thu Oct 23 07:45:54 2025 +0000

    Update documentation openenv

commit 2819a8f
Author: lewtun <lewis.c.tunstall@gmail.com>
Date:   Thu Oct 23 09:36:35 2025 +0200

    🕹️ Add rollout function for OpenEnv integration (#4310)

    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit e1c87e3
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 22 18:21:44 2025 +0200

    Fix attn_implementation name in OnlineDPO for transformers v5 (#4322)

commit 7c547a3
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Wed Oct 22 09:16:25 2025 +0200

    Add notebooks to Examples docs and restructure (#4317)

    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit bfd6f49
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 22 08:43:31 2025 +0200

    Replace unittest skipTest from transformers with pytest.skip (#4297)

commit 712f6a9
Author: Hsiang-Yu Tsou <rjun0729@gmail.com>
Date:   Wed Oct 22 12:04:13 2025 +0800

    💤 Switch to sleep level=2 and split wake-ups in GRPO and RLOO trainers (#4296)

commit 1382e56
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Tue Oct 21 15:41:42 2025 -0700

    🧺 [5/N] Refactor `_generate` in GRPO/RLOO: Insert images in the prompt (#4155)

commit cb9bc2a
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Tue Oct 21 12:51:48 2025 -0700

    🚚 Move BCO to `trl.experimental` (#4312)

commit 475c732
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Tue Oct 21 17:17:07 2025 +0200

    Update notebooks README with latest additions (#4316)

commit 0dc4d53
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Tue Oct 21 15:54:59 2025 +0200

    Remove parameterized as test extra dependency (#4315)

commit e2ab435
Author: Kashif Rasul <kashif.rasul@gmail.com>
Date:   Tue Oct 21 12:34:18 2025 +0200

    [Activation-checkpointing] add tensor dedup and param offloading (#4247)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

commit 46a53cd
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Tue Oct 21 10:23:00 2025 +0200

    Filter expected setup_chat_format deprecation warning in CI (#4306)

commit 6105040
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Tue Oct 21 10:22:42 2025 +0200

    Silence TRL experimental warnings in CI (#4307)

commit 5eae44a
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Mon Oct 20 13:27:21 2025 -0600

    ⚰️ Remove deprecated (#4301)

commit 28bba8c
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Mon Oct 20 11:24:54 2025 +0200

    Added SFT LoRA notebook (#4244)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
    Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

commit 2f1802b
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Mon Oct 20 08:03:48 2025 +0200

    Fix missing CI slow tests: ImportError: vLLM is not installed (#4304)

commit e0eec05
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Fri Oct 17 15:36:13 2025 -0600

    🧺 [4/N] Refactor `_generate` in GRPO/RLOO: Move `forward_kwargs` outside generation method (#4154)

    Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
    Co-authored-by: YonatanGideoni <yonatan.gideoni@gmail.com>
    Co-authored-by: burtenshaw <ben.burtenshaw@gmail.com>
    Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
    Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit f4c554d
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Fri Oct 17 16:06:40 2025 +0200

    Update links to docs in README to latest packaged version (#4084)

commit a932e27
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Oct 15 18:11:52 2025 -0600

    ⬆️ Bump dev version (#4293)

commit 04fd120
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Oct 15 18:10:10 2025 -0600

    Release: v0.24 (#4292)

commit 19d2f97
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Oct 15 18:06:34 2025 -0600

    Deprecate `BestOfNSampler` (#4291)

    Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
    Co-authored-by: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>

commit 31caf64
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Wed Oct 15 17:01:50 2025 -0700

    Remove unused commands directory (#4258)

    Co-authored-by: behroozazarkhalili <ermiaazarkhalili>

commit 8e2d551
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Thu Oct 16 01:01:07 2025 +0100

    Add accuracy reward (#4270)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 94aac4a
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Wed Oct 15 16:49:04 2025 -0700

    Remove how_to_train.md: outdated training FAQ (#4267)

    Co-authored-by: behroozazarkhalili <ermiaazarkhalili>

commit 26b7c25
Author: Alexander Weers <mail@aweers.de>
Date:   Thu Oct 16 01:33:35 2025 +0200

    Add support for `token_type_ids` in `DPOTrainer` (#4285)

    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit aa25c26
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Wed Oct 15 14:13:27 2025 -0700

    Remove using_llama_models.md: outdated Llama2-specific documentation (#4268)

    Co-authored-by: behroozazarkhalili <ermiaazarkhalili>

commit 93c7d88
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Wed Oct 15 14:12:32 2025 -0700

    Remove logging.md: trainer-specific metrics documentation (#4269)

    Co-authored-by: behroozazarkhalili <ermiaazarkhalili>

commit c7c041e
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 15 18:15:36 2025 +0200

    Fix CI slow tests: ImportError: vLLM is not installed (#4287)

commit ef40c04
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 15 18:15:28 2025 +0200

    Replace unittest skipTest with pytest.skip (#4263)

commit 7e0adbc
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 15 18:14:49 2025 +0200

    Fix CI dev test TypeError: unexpected keyword argument 'load_in_4bit' (#4262)

commit 773afd9
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Oct 15 09:39:17 2025 -0600

    💰 `RichProgressCallback` enhancement (#4245)

commit 966b397
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 15 16:11:11 2025 +0200

    Fix CI slow test OSError: You are trying to access a gated repo (#4283)

commit 927cf6b
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 15 10:39:12 2025 +0200

    Fix docstrings with Sphinx 'deprecated' directive (#4279)

commit 56cb6cc
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Tue Oct 14 18:51:17 2025 +0200

    Fix typo in Colab link (#4276)

commit 49c8f14
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Tue Oct 14 18:45:01 2025 +0200

    Add Qwen3-VL notebooks (SFT, GRPO) (#4275)

    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

commit cefbacb
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Tue Oct 14 12:13:15 2025 +0200

    Fix style with make precommit (#4265)

commit fae245a
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Tue Oct 14 12:12:03 2025 +0200

    Use FutureWarning instead of DeprecationWarning (#4266)

commit 2aa9506
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Mon Oct 13 13:40:24 2025 +0200

    Fix docstring interlinks (#4221)

commit d6eeb29
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Mon Oct 13 11:06:09 2025 +0200

    Raise deprecation warning for Python 3.9 (#4226)

commit 1684ef2
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Oct 10 17:41:24 2025 +0200

    Fix Python version check for skipping tests on Python 3.13.8 (#4246)

commit aab21eb
Author: Carlos Miguel Patiño <carlos.patino@huggingface.co>
Date:   Fri Oct 10 17:39:29 2025 +0200

    Include `chat_template_kwargs` in `apply_chat_template` (#4233)

    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit b997a31
Author: Kashif Rasul <kashif.rasul@gmail.com>
Date:   Fri Oct 10 17:21:01 2025 +0200

    [Online-DPO] fix the completion_len == max_new_tokens crash (#4193)

    Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
    Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

commit 86d1963
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Oct 10 17:19:53 2025 +0200

    Fix CI slow test AttributeError: 'TestSFTTrainerSlow' object has no attribute 'addCleanup' (#4255)

commit 039d526
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Fri Oct 10 08:16:18 2025 -0700

    Deprecate unused dataset_formatting module (#4242)

    Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit bcd059a
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Fri Oct 10 08:15:47 2025 -0700

    Remove obsolete research_projects directory (#4243)

    Co-authored-by: behroozazarkhalili <ermiaazarkhalili>
    Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 0e57b4a
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Fri Oct 10 10:02:11 2025 -0500

    🧺 [3/N] Refactor `_generate` in GRPO/RLOO: Rely on generator for prompt truncation (#4153)

commit 98488e0
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Oct 10 16:37:02 2025 +0200

    Fix CI slow test ValueError: Unknown loss type: dapo (#4254)

commit f45e865
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Oct 10 16:13:22 2025 +0200

    Fix CI ImportError for 'require_torch_gpu_if_bnb_not_multi_backend_enabled' (#4253)

commit f582792
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Oct 10 16:12:15 2025 +0200

    Install peft from main for CI tests with dev dependencies (#4250)

commit f853e09
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Fri Oct 10 09:49:45 2025 +0200

    Fix CI CUDA out of memory errors by improving GPU memory management (#4238)

commit 803ec0d
Author: Wang, Yi <yi.a.wang@intel.com>
Date:   Fri Oct 10 15:28:34 2025 +0800

    Fix CI slow test ValueError: Backward pass should have cleared tracker of all tensors (#4236)

    Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
    Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

commit 7a0a615
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Thu Oct 9 17:05:36 2025 -0600

    Warnings pointing to RFC (#4224)

commit c38cb69
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Thu Oct 9 12:49:44 2025 -0600

    🧘 Enhance markdown style (#4235)

commit 68ef15c
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date:   Thu Oct 9 09:18:48 2025 -0700

    Remove unused log_example_reports.py script (#4241)

    Co-authored-by: behroozazarkhalili <ermiaazarkhalili>

commit 3dd7fc2
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Thu Oct 9 15:46:41 2025 +0200

    Fix CI IndentationError for Python 3.13.8 (#4240)

commit 51ced65
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Thu Oct 9 09:35:08 2025 +0200

    Replace setup with pyproject in CI tests paths (#4230)

commit 4bb883a
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Thu Oct 9 08:09:15 2025 +0200

    Update CI Docker image to pytorch/pytorch:2.8.0 (#4232)

commit f784632
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 8 21:30:54 2025 +0200

    Remove unused Path import in __init__.py (#4227)

commit a944890
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 8 21:21:21 2025 +0200

    Fix callable annotations (#4216)

commit 521db35
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 8 21:18:41 2025 +0200

    Fix CI unittest asserts (#4234)

commit e2c97a8
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Wed Oct 8 18:14:23 2025 +0200

    Exclude vllm dependencies from dev extra (#4229)

commit d1d0407
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Wed Oct 8 09:34:48 2025 -0600

    🏷️ Account for `token_type_ids` in `DataCollatorForVisionLanguageModeling` (#4190)

commit 824ff8c
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date:   Wed Oct 8 12:59:04 2025 +0200

    Add Efficient Online Training with GRPO and vLLM in TRL to community tutorials (#4219)

commit f15399d
Author: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Date:   Wed Oct 8 09:42:19 2025 +0100

    Fix entropy and accuracy calculation for prompt_tuning techniques. (#4196)

commit cc578b6
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Tue Oct 7 12:11:34 2025 -0600

    🧺 [2/N] Refactor `_generate` in GRPO/RLOO: Use `prompt_ids` from generation (#4152)

commit 30cf68a
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date:   Tue Oct 7 10:21:10 2025 -0600

    🎨 Support mixing image+text and text-only examples (#4203)

    Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>