[DPO] use ref model logprobs if it exists in the data #885
Conversation
kashif commented on Oct 17, 2023 (edited)
- Refactor the trainer so that it uses reference-model log-probs from the dataset, if they are present, instead of recomputing them with a reference model (see the sketch below)
- Add a flag that precomputes the reference-model log-probs and adds them to the dataset in the dataloader creation phase, before training starts
- Fix the confusion between padding_value and label_pad_token_id
- Fix from "fix DPO data collator" #932
- Fix for "DPO models generate multiple / corrupted responses" #1025
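To make the first point concrete, here is a minimal sketch of how a sigmoid DPO loss could consume reference log-probs that are already stored in the batch instead of running a reference-model forward pass. The column names reference_chosen_logps / reference_rejected_logps and the fallback behaviour are assumptions for illustration, not necessarily the exact names used in this PR.

```python
import torch
import torch.nn.functional as F

def dpo_loss_from_batch(policy_chosen_logps, policy_rejected_logps, batch, beta=0.1):
    """Sigmoid DPO loss that reads precomputed reference log-probs from the batch.

    Column names are illustrative; if they are absent, the caller would fall back
    to scoring the pairs with a reference model (the previous behaviour).
    """
    if "reference_chosen_logps" not in batch or "reference_rejected_logps" not in batch:
        raise ValueError("No precomputed reference log-probs; run the reference model instead.")

    ref_chosen_logps = batch["reference_chosen_logps"]
    ref_rejected_logps = batch["reference_rejected_logps"]

    # DPO objective: -log sigmoid(beta * (policy margin - reference margin))
    policy_margin = policy_chosen_logps - policy_rejected_logps
    reference_margin = ref_chosen_logps - ref_rejected_logps
    losses = -F.logsigmoid(beta * (policy_margin - reference_margin))
    return losses.mean()
```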
Force-pushed from 90c470a to e1acfb3
I understand this only affects internals, except for the new flag? Looks good to me; maybe @lewtun wants to have a look too since he's been using this quite a bit recently.
Hey @kashif, a couple of questions:
Thanks a lot for optimising the DPO trainer @kashif 🔥 !
Overall the PR looks great, but I have a few questions about whether the log probs are precomputed on CPU vs GPU.
It would also be great to see some docs on how to use this feature, e.g. showing the 3 main cases:
- Log probs are already precomputed and stored in a dataset
- Log probs are precomputed at the start of training
- Log probs are computed on the fly (previous behaviour)
Would it also make sense if I run this PR through a DPO training run to check we don't have any major regressions?
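For reference, a rough sketch of what such docs could show for the three cases. It assumes the flag is exposed as a DPOTrainer argument named precompute_ref_log_probs (the name that appears in the commit history below), that model, ref_model, training_args, tokenizer and the datasets are already defined, and that the precomputed columns are named as in the earlier sketch; treat it as an assumption-laden illustration rather than the final API.

```python
from trl import DPOTrainer

# Case 1: reference log-probs are already stored in the dataset
# (hypothetical columns reference_chosen_logps / reference_rejected_logps),
# so no reference model is needed during training.
trainer = DPOTrainer(
    model=model,
    ref_model=None,
    args=training_args,
    train_dataset=dataset_with_ref_logps,
    tokenizer=tokenizer,
)

# Case 2: precompute the reference log-probs once, before training starts.
trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    precompute_ref_log_probs=True,
)

# Case 3: previous behaviour, compute reference log-probs on the fly at each step.
trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
```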
trl/trainer/dpo_trainer.py
# tokenize the dataset and compute reference logps for training datasets
self.train_dataset = self.train_dataset.map(self.tokenize_batch_element)
if self.precompute_ref_logps:
    self.train_dataset = self.train_dataset.map(
Does the map() run on CPU or GPU? I seem to recall that batched maps like this were best done with a torch dataloader, but perhaps this is no longer true. I'm mostly worried that running Llama 70B on CPU will blow up the RAM :)
Do you happen to know what the answer to the above question is (i.e. do we run inference on CPU or GPU)?
It should run wherever the ref_model is at this point, which is already prepared via accelerate, no?
Yes, however if a ref_model is passed explicitly, then (since it is not being prepared by accelerate) we have to move it to the accelerator device ourselves, which is what I am doing.
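To spell that out, here is a rough sketch of the flow being described: an explicitly passed ref_model is moved to the accelerator device, scored batch-wise with a plain torch DataLoader, and the results are copied back to CPU before being attached to the dataset. The helper concatenated_forward and all column names are stand-ins, not the PR's actual API.

```python
import torch
from torch.utils.data import DataLoader

def precompute_reference_log_probs(ref_model, dataset, data_collator, accelerator,
                                   batch_size, concatenated_forward):
    """Score every chosen/rejected pair once with the reference model.

    `concatenated_forward` is a stand-in for whatever helper returns
    (chosen_logps, rejected_logps) for a batch; all names here are illustrative.
    """
    # An explicitly passed ref_model is not prepared by accelerate, so move it
    # to the accelerator device before running inference.
    ref_model = ref_model.to(accelerator.device)
    ref_model.eval()

    loader = DataLoader(dataset, batch_size=batch_size,
                        collate_fn=data_collator, shuffle=False)

    chosen, rejected = [], []
    with torch.no_grad():
        for batch in loader:
            # Move tensors to the accelerator device for the forward pass.
            batch = {k: (v.to(accelerator.device) if isinstance(v, torch.Tensor) else v)
                     for k, v in batch.items()}
            chosen_logps, rejected_logps = concatenated_forward(ref_model, batch)
            # Copy results back to CPU so precomputation does not hold GPU memory.
            chosen.append(chosen_logps.cpu())
            rejected.append(rejected_logps.cpu())

    # Store the scores as new dataset columns so training can read them directly.
    dataset = dataset.add_column("reference_chosen_logps", torch.cat(chosen).tolist())
    dataset = dataset.add_column("reference_rejected_logps", torch.cat(rejected).tolist())
    return dataset
```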
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
LGTM, what do you think @lvwerra @lewtun @edbeeching?
Revert "Tokenize row now expects sample to have space on chosen/rejected for llama" (this reverts commit dd07a10).
This looks good to go, let's merge it 🔥!
* use logprobs if it exists in the batch
* add features to tokenized batch if in data
* make get_batch_logps a static method
* add tokenize_batch_element dataset mapper
* Remove tokenize_batch method from DPODataCollator
* Initial sketch to precompute reference_logps
* run ref model via pytorch dataloader
* add a padding helper
* clean up the helper
* use logprob item()
* default behaviour
* clean up collator
* add docstring
* copy data back to cpu if needed
* use get_train_dataloader methods
* fix tests
* rename: more explicit variable name precompute_ref_log_probs
* improve comment
* update comment
* Update trl/trainer/dpo_trainer.py (Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>)
* refactor models into setup parameters
* parametrize precompute_ref_log_probs flag
* remove useless test
* Update trl/trainer/dpo_trainer.py (Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>)
* Update tests/test_dpo_trainer.py (Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>)
* Update tests/test_dpo_trainer.py (Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>)
* Update trl/trainer/dpo_trainer.py (Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>)
* Update trl/trainer/dpo_trainer.py (Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>)
* update function arg name
* distinguish between pad token_id and mask values
* fix tokenization huggingface#932 by @nrailg
* fix test
* undo test refactor
* new line
* undo breaking change
* Update token counter condition to allow Llama tokenizer
* Acount for merged tokens on certain tokenizers such Llama-2 tokenizer
* Update variable name to match list value when truncating response
* map function on multi-gpu and gather
* Add test cases for DPOTrainer tokenization step
* revert since we need the prepeared model
* Use gather_with_metrics on ref_logps precomputation to keep original dataset size
* Add flag to keep track of when ref_logps are precomputed
* make variable names private
* formatting
* if precompute_ref_log_probs is true one can use non-peft to populate log-probs
* Use tokenizer padding token unless padding_value is set
* Move dataset.map(tokenize_batch) outside dataloader to avoid serialization errors
* eval can be none
* move to cpu to avoid gpu oom
* remove unneeded cast to float32
* remove unneeded
* fix merge
* fix merge
* fix merge
* add precompute log-prob status via tqdm
* Truncate answer if too longer once prompt has been truncated
* Add prompt_input_ids to batch to enable generation
* formatting and add lora example
* fix formatting
* Tokenize row now expects sample to have space on chosen/rejected for llama
* Revert "Tokenize row now expects sample to have space on chosen/rejected for llama" (this reverts commit dd07a10)
* raise error when using zero-3 with precompute_ref_log_probs

---------

Co-authored-by: Pablo Vicente Juan <p.vicente.juan@gmail.com>
Co-authored-by: Shoaib Burq <saburq@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>