Check tokenize params on DPOTrainer #1197

pablovicente · 2024-01-09T01:38:32Z

Since we tokenize the dataset in DPOTrainer.__init__ regardless of whether a custom data_collator is provided, we need to ensure that a tokenizer is passed and warn about the need to set max_length, max_prompt_length and max_target_length. This PR therefore moves those checks outside of the data_collator check.

In older implementations, these parameters were only needed if the user did not provide a data_collator, but they are now required for all DPO runs.
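
For readers skimming the thread, here is a rough sketch of the check order described above. It is not the actual diff: the helper name check_tokenize_params, the message wording, and the fallback values are all assumptions made for illustration. The point is only that the tokenizer check and the length warnings run unconditionally, before any data_collator branch.

```python
import warnings

# Illustrative fallback values; these are assumptions, not taken from the PR.
DEFAULT_MAX_LENGTH = 512
DEFAULT_MAX_PROMPT_LENGTH = 128
DEFAULT_MAX_TARGET_LENGTH = 128


def check_tokenize_params(tokenizer, max_length=None, max_prompt_length=None,
                          max_target_length=None):
    """Hypothetical helper: validate tokenization settings up front.

    Runs regardless of whether a custom data_collator was supplied,
    because the dataset is tokenized either way.
    """
    # A tokenizer is mandatory: without it the dataset cannot be tokenized.
    if tokenizer is None:
        raise ValueError("A tokenizer must be passed to tokenize the DPO dataset.")

    # Missing length limits only trigger a warning and fall back to a default.
    if max_length is None:
        warnings.warn("max_length is not set; falling back to a default.", UserWarning)
        max_length = DEFAULT_MAX_LENGTH
    if max_prompt_length is None:
        warnings.warn("max_prompt_length is not set; falling back to a default.", UserWarning)
        max_prompt_length = DEFAULT_MAX_PROMPT_LENGTH
    if max_target_length is None:
        warnings.warn("max_target_length is not set; falling back to a default.", UserWarning)
        max_target_length = DEFAULT_MAX_TARGET_LENGTH

    return max_length, max_prompt_length, max_target_length
```

With this shape, a missing tokenizer fails fast with a ValueError, while missing length parameters produce one warning each and the run continues with the fallback values.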

@kashif

HuggingFaceDocBuilderDev · 2024-01-09T12:19:42Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

pablovicente added 2 commits January 8, 2024 20:25

Check if tokenizer and max len params are None

f0db54b

Update warning messages for missing parameters

693ad40

younesbelkada requested a review from kashif January 9, 2024 05:50

kashif approved these changes Jan 9, 2024

lvwerra approved these changes Jan 9, 2024

lvwerra merged commit 26da9e8 into huggingface:main Jan 9, 2024
9 checks passed

lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024

Check tokenize params on DPOTrainer (huggingface#1197)

36d47cc

* Check if tokenizer and max len params are None
* Update warning messages for missing parameters
