Fix handling of f_divergence_type in DPO #4171
Merged
+25
−5
Fix handling of `f_divergence_type` in DPO.

This PR improves the handling of the `f_divergence_type` configuration in the DPO trainer by consistently using the `FDivergenceType` enum internally, while allowing flexibility in input types. It also adds new tests to ensure correct normalization and serialization of the `f_divergence_type` field.

Note that currently:
- `f_divergence_type` in `DPOConfig` is a `str`, since the trainer logic compares its value against the string representations (`.value`) of the `FDivergenceType` enum (`trl/trainer/dpo_trainer.py`, line 1068 and line 1090 in 864e593).
- Its default value is `FDivergenceType.REVERSE_KL`, of `FDivergenceType` enum type (`trl/trainer/dpo_config.py`, lines 399 to 400 in 864e593).
This inconsistency can lead to confusion and potential type mismatches during usage. The proposed changes standardize the handling of `f_divergence_type`, ensuring consistent type normalization and comparison throughout the codebase.
Follow-up to:
Changes

Config normalization and input handling:
- Changed the `f_divergence_type` field in `DPOConfig` to accept both `FDivergenceType` enum members and strings, improving flexibility for users. (`trl/trainer/dpo_config.py`)
- Updated the `__post_init__` method of `DPOConfig` to always convert `f_divergence_type` to an `FDivergenceType` enum member, ensuring consistent internal usage. (`trl/trainer/dpo_config.py`)

Loss function logic update:
- Updated the `dpo_loss` function to compare `f_divergence_type` directly with enum members instead of their string values, leveraging the normalized config.

Testing improvements:
- Added tests in `TestDPOConfig` to verify normalization and serialization of `f_divergence_type`, including parameterized tests for both enum and string inputs. (`tests/test_dpo_trainer.py`)
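The loss-logic change above can be illustrated with a hypothetical branch selector (`select_divergence_branch` is not TRL's API; the enum values are illustrative). The point is that once the config guarantees an enum member, comparisons no longer need `.value`:

```python
from enum import Enum


class FDivergenceType(Enum):
    # Illustrative values; the real enum lives in TRL
    REVERSE_KL = "reverse_kl"
    JS_DIVERGENCE = "js_divergence"
    ALPHA_DIVERGENCE = "alpha_divergence"


def select_divergence_branch(f_divergence_type: FDivergenceType) -> str:
    # Before: comparisons against string values, e.g.
    #   if f_divergence_type == FDivergenceType.ALPHA_DIVERGENCE.value: ...
    # After: compare directly against enum members, relying on the
    # config having normalized the field in __post_init__
    if f_divergence_type == FDivergenceType.ALPHA_DIVERGENCE:
        return "alpha"
    elif f_divergence_type == FDivergenceType.JS_DIVERGENCE:
        return "js"
    return "reverse_kl"


assert select_divergence_branch(FDivergenceType.JS_DIVERGENCE) == "js"
```

Comparing enum members also plays well with serialization: writing out `f_divergence_type.value` yields the plain string, and round-tripping it through `FDivergenceType(...)` recovers the same member, which is what the new parameterized tests exercise.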