Closed
Labels
🏋 GRPO, 🐛 bug, 📚 documentation
Description
Reproduction
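The code example in the docs for "GRPO with replay buffer" has several problems:
- It imports GRPOWithReplayBufferTrainer but never uses it; the trainer is built with GRPOTrainer instead, which is itself never imported.
- It uses GRPOWithReplayBufferConfig but never imports it.
- The code is not executable as written: torch is never imported, and output_dir=self.tmp_dir refers to a self that does not exist outside a test class.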
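For comparison, the points above suggest a corrected example along the following lines. This is only a sketch, not the maintainers' fix. It assumes that both `GRPOWithReplayBufferConfig` and `GRPOWithReplayBufferTrainer` can be imported from `trl.experimental.grpo_with_replay_buffer` and that the trainer accepts the same arguments as in the doc example. The `main` wrapper and the `output_dir` default are hypothetical names added for illustration. The reward function uses the standard-library `random` module in place of `torch.rand`, a substitution that keeps it dependency-free. The unused `previous_trainable_params` line is dropped.

```python
import random


def custom_reward_func(completions, **kwargs):
    """Return one reward per completion; roughly 25% of calls return all zeros."""
    # An all-zero batch has zero reward std, the case the doc example wants to hit.
    if random.random() < 0.25:
        return [0.0] * len(completions)
    return [random.random() for _ in completions]


def main(output_dir="grpo-replay-buffer-out"):  # hypothetical wrapper and default path
    # Heavy imports live here so the reward function can be checked on its own.
    # Assumption: both classes are exported by this experimental module.
    from datasets import load_dataset
    from trl.experimental.grpo_with_replay_buffer import (
        GRPOWithReplayBufferConfig,
        GRPOWithReplayBufferTrainer,
    )

    dataset = load_dataset(
        "trl-internal-testing/zen", "standard_prompt_only", split="train"
    )

    training_args = GRPOWithReplayBufferConfig(
        output_dir=output_dir,  # replaces self.tmp_dir from the doc example
        learning_rate=1e-4,
        per_device_train_batch_size=4,
        num_generations=4,
        max_completion_length=8,
        replay_buffer_size=8,
        report_to="none",
    )
    # Use the replay-buffer trainer that the doc example imports but never uses.
    trainer = GRPOWithReplayBufferTrainer(
        model="trl-internal-testing/tiny-Qwen2ForCausalLM-2.5",
        reward_funcs=[custom_reward_func],
        args=training_args,
        train_dataset=dataset,
    )
    trainer.train()


# Call main() to launch training; this downloads the tiny model and the dataset.
```

Keeping the TRL and `datasets` imports inside `main` means the reward function can be exercised without either package installed; in the docs themselves, top-level imports would be the more natural layout.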
Below is the code example given in the doc:
from trl.experimental.grpo_with_replay_buffer import GRPOWithReplayBufferTrainer
from datasets import load_dataset

dataset = load_dataset("trl-internal-testing/zen", "standard_prompt_only", split="train")

# Guarantee that some rewards have 0 std
def custom_reward_func(completions, **kwargs):
    if torch.rand(1).item() < 0.25:
        return [0] * len(completions)  # simulate some None rewards
    else:
        return torch.rand(len(completions)).tolist()

training_args = GRPOWithReplayBufferConfig(
    output_dir=self.tmp_dir,
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    num_generations=4,
    max_completion_length=8,
    replay_buffer_size=8,
    report_to="none",
)
trainer = GRPOTrainer(
    model="trl-internal-testing/tiny-Qwen2ForCausalLM-2.5",
    reward_funcs=[custom_reward_func],
    args=training_args,
    train_dataset=dataset,
)
previous_trainable_params = {n: param.clone() for n, param in trainer.model.named_parameters()}
trainer.train()
System Info
NA
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
- Any traceback provided is complete