
Refactor RewardTrainer hyperparameters into dedicated dataclass #726

Merged: 10 commits from wrap-rm-args into main on Sep 5, 2023

Conversation

lewtun (Member) commented Sep 1, 2023

This PR migrates the max_length arg of the RewardTrainer into a dedicated RewardTrainingArguments class that can also be used to store future hyperparameters.

To remain backwards compatible, I've kept the argument in the trainer's init, with a warning that it will be removed in a future version.
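
In code terms, the shim is roughly the following (a minimal sketch rather than the exact diff; the signature is trimmed for illustration and only the deprecation path is shown):

import warnings

from transformers import Trainer


class RewardTrainer(Trainer):
    def __init__(self, *trainer_args, args=None, max_length=None, **kwargs):
        if max_length is not None:
            # Still accept the old argument, but point users at the new dataclass.
            warnings.warn(
                "The `max_length` argument is deprecated and will be removed in a future version."
                " Please use the `RewardTrainingArguments` to set `max_length` instead."
            )
        super().__init__(*trainer_args, args=args, **kwargs)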

Tested with:

accelerate launch --multi_gpu --num_processes 2 examples/scripts/reward_trainer.py --batch_size 1

@@ -94,18 +95,21 @@ def __init__(
The optimizer and scheduler to use for training.
preprocess_logits_for_metrics (`Callable[[torch.Tensor, torch.Tensor], torch.Tensor]`):
The function to use to preprocess the logits before computing the metrics.
max_length (`int`, defaults to `None`):
lewtun (Member Author):
I followed the transformers convention of removing deprecated args from public docstrings.

peft_config (`Dict`, defaults to `None`):
The PEFT configuration to use for training. If you pass a PEFT configuration, the model will be wrapped in a PEFT model.
"""
if max_length is not None:
warnings.warn(
"The `max_length` argument is deprecated and will be removed in a future version. Please use the `RewardTrainingArguments` to set `max_length` instead.",
lewtun (Member Author):
Should we specify a precise version, e.g. 0.9.0?

Member:
No, let's just keep the warning as it is :)

HuggingFaceDocBuilderDev commented Sep 1, 2023

The documentation is not available anymore as the PR was closed or merged.



@dataclass
class RewardTrainingArguments(TrainingArguments):
lewtun (Member Author):
Another name for this could be RewardConfig to be more aligned with PPOConfig

Member:
No strong opinion, whatever you prefer.

lewtun (Member Author):
I opted for RewardConfig in e6e346d

I've also placed this in a training_configs.py module; let me know if you'd prefer a dedicated module per config (currently PPOConfig lives in ppo_config.py).
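
For readers skimming the thread, here is a rough sketch of what such a config class looks like (the docstring and the help text are illustrative, not the exact contents of e6e346d):

from dataclasses import dataclass, field
from typing import Optional

from transformers import TrainingArguments


@dataclass
class RewardConfig(TrainingArguments):
    """Training arguments for the RewardTrainer, extending transformers.TrainingArguments."""

    max_length: Optional[int] = field(
        default=None,
        metadata={"help": "Maximum length of the sequences in the batch; used by the data collator."},
    )

A caller would then build a RewardConfig(output_dir=..., max_length=512) and pass it to the trainer via args, instead of passing max_length to RewardTrainer directly.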

lewtun (Member Author) commented Sep 4, 2023

I think this is good to go, so gently pinging @lvwerra @younesbelkada for a final review 🙏

younesbelkada (Contributor) left a comment:

Thanks a lot @lewtun! This looks great to me. I just have one suggestion about the name of the class: what about RewardTrainingArguments? I am also happy with the current naming if @lvwerra agrees.

Comment on lines 125 to 142
                    "When using RewardDataCollatorWithPadding, you should set `max_length` in the RewardTrainer's init"
                    " it will be set to `512` by default, but you should do it yourself in the future.",
                    "When using RewardDataCollatorWithPadding, you should set `max_length` in RewardConfig."
                    " It will be set to `512` by default, but you should do it yourself in the future.",
                    UserWarning,
                )
                max_length = 512
            elif args.max_length is None:
                warnings.warn(
                    "When using RewardDataCollatorWithPadding, you should set `max_length` in RewardConfig."
                    " It will be set to `512` by default, but you should do it yourself in the future.",
                    UserWarning,
                )
                max_length = 512
            else:
                max_length = args.max_length
Member:
I think in the case where max_length is not None then it is still overwritten when args.max_length is None, right? I think in that case we should keep the original value to be backwards compatible.

lewtun (Member Author):
Oh yes, good catch - I'll fix that

lewtun (Member Author):
Fixed in 5b67e0d
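
To make the intended behaviour concrete, here is a hypothetical helper (resolve_max_length is not part of the PR) capturing the precedence discussed above: an explicitly passed, deprecated max_length is kept for backwards compatibility, the config value is used otherwise, and 512 is the warned-about fallback.

import warnings
from typing import Optional


def resolve_max_length(deprecated_max_length: Optional[int], config_max_length: Optional[int]) -> int:
    # A value passed directly to the trainer (the deprecated path) wins, so
    # existing callers keep their behaviour.
    if deprecated_max_length is not None:
        return deprecated_max_length
    # Otherwise fall back to the value set in RewardConfig, if any.
    if config_max_length is not None:
        return config_max_length
    # Nothing was set anywhere: warn and use the historical default of 512.
    warnings.warn(
        "When using RewardDataCollatorWithPadding, you should set `max_length` in RewardConfig."
        " It will be set to `512` by default, but you should do it yourself in the future.",
        UserWarning,
    )
    return 512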

lewtun (Member Author) commented Sep 4, 2023

> Thanks a lot @lewtun! This looks great to me. I just have one suggestion about the name of the class: what about RewardTrainingArguments? I am also happy with the current naming if @lvwerra agrees.

Are you suggesting we use RewardTrainingArguments instead of RewardConfig? I originally had the former, but switched to the latter to be aligned with PPOConfig. I don't have a strong opinion, but whatever choice we make should stay consistent for all the other trainers, e.g. SFTTrainingArguments vs SFTConfig etc.

One advantage of XConfig classes is that it's fewer keystrokes :D

lvwerra (Member) left a comment:

LGTM! 🚀

younesbelkada (Contributor) left a comment:

Thanks @lewtun, makes sense, let's merge it! 🚀

lewtun merged commit d484dc2 into main on Sep 5, 2023
lewtun deleted the wrap-rm-args branch on September 5, 2023 at 07:05
kushal-tri pushed a commit to kushalarora/trl that referenced this pull request on Sep 19, 2023:

Refactor RewardTrainer hyperparameters into dedicated dataclass (huggingface#726)

* Refactor RewardTrainer hyperparameters into dedicated dataclass
* Revert
* Add doc string
* Fix warning
* Handle backwards compat
* Fix tests
* Add docs
* Refactor to RewardConfig
* Fix case conditions
* Fix
lapp0 pushed a commit to lapp0/trl that referenced this pull request on May 10, 2024:

Refactor RewardTrainer hyperparameters into dedicated dataclass (huggingface#726)
