PEFT support for Online DPO #2041
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Can you kindly add the above PEFT/LoRA usage command to the script?
Particular attention is needed when reviewing this one. I didn't use the code from DPO, mostly because I don't understand it, so I'm afraid I may have missed something.
Overall, I've chosen to go with code that I understand, even if it means reintroducing bugs that have already been fixed in other trainers.
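For context, here is a minimal sketch of the pattern TRL trainers commonly use for PEFT support; the helper and argument names below are illustrative and not necessarily what this PR implements:

```python
from peft import LoraConfig, get_peft_model

def wrap_policy_with_peft(model, peft_config=None):
    """Illustrative helper: wrap the policy in a LoRA adapter when a config is given.

    In DPO-style trainers, the PEFT-wrapped policy can also stand in for the
    reference model, since disabling the adapter recovers the frozen base weights.
    """
    if peft_config is not None:
        model = get_peft_model(model, peft_config)
    return model

# Example LoRA configuration one might pass to the trainer
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
```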
Thanks for adding PEFT support! Overall LGTM - have you run an experiment on e.g. TLDR to see if it looks OK?
Edit: sorry, just saw your comment!
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
This is needed if we have e.g. an SFT LoRA that needs to be merged into the base model, before finally inserting an adapter for DPO. See here for some considerations.
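As a rough sketch of that flow (the adapter repo id is a placeholder and the exact integration point in the trainer may differ):

```python
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model

# Load a model that already has an SFT LoRA attached (placeholder repo id).
sft_model = AutoPeftModelForCausalLM.from_pretrained("my-org/sft-lora-adapter")

# Fold the SFT adapter into the base weights; the merged model becomes the
# starting point (and implicit reference) for Online DPO.
base_model = sft_model.merge_and_unload()

# Attach a fresh adapter that will actually be trained during DPO.
dpo_peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = get_peft_model(base_model, dpo_peft_config)
```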
Sure, let's leave this as a follow-up PR.
Fine with me. We can add it later if people open an issue.
Good question. This was added by Younes, but maybe @BenjaminBossan can help clarify why PEFT models need upcasting in 4-bit?
The original addition of this function was from #1110 (which in turn references this repo). I don't know the exact context, but I think the reason is to avoid indiscriminately casting all layers to bf16 -- specifically, layer norm stays in float32. This is probably based on some empirical finding that this is better for training, but after skimming the QLoRA paper, I could not find any mention of that, so I'm unsure.
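For reference, the kind of selective upcasting being discussed looks roughly like this; this is a sketch of the pattern rather than the exact helper from #1110:

```python
import torch

def upcast_layernorms_to_fp32(model):
    """Sketch: keep normalization layers in float32 while the rest of the
    4-bit PEFT model stays in lower precision."""
    for name, module in model.named_modules():
        # Heuristic: LayerNorm modules, plus RMSNorm-style modules whose name
        # contains "norm" (e.g. "input_layernorm" in Llama-like models).
        if isinstance(module, torch.nn.LayerNorm) or "norm" in name.lower():
            module.to(torch.float32)
    return model
```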
Thanks a lot @BenjaminBossan and @lewtun. I'll further investigate in a dedicated branch.
What does this PR do?
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.