DPOTrainer Problem: trl/trainer/utils.py:456 #1073

Closed
xzqxnet0990 opened this issue Dec 8, 2023 · 6 comments

xzqxnet0990 commented Dec 8, 2023

The problem happens in trl/trainer/utils.py at line 456:

else:
    # adapted from https://stackoverflow.com/questions/73256206
    if "prompt" in k:
        to_pad = [torch.LongTensor(ex[k][::-1]) for ex in batch]
    else:
        to_pad = [torch.LongTensor(ex[k]) for ex in batch]  # line 456
    if k.endswith("_input_ids"):
        padding_value = self.tokenizer.pad_token_id

I am using the Qwen/Qwen-1_8B-Chat model and the official finetune.py to run DPO training.
My training dataset looks like this:

{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}

If I directly run the DPO code, I hit this error:

File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 678, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/trl/trainer/utils.py", line 490, in call
return self.collate(tokenized_batch)
File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/trl/trainer/utils.py", line 449, in collate
to_pad = [torch.LongTensor(ex[k]) for ex in batch]
File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/trl/trainer/utils.py", line 449, in
to_pad = [torch.LongTensor(ex[k]) for ex in batch]
TypeError: an integer is required (got type NoneType)

If I debug the code at line 483:

for feature in features:
    prompt = feature["prompt"]
    chosen = feature["chosen"]
    rejected = feature["rejected"]

    batch_element = self.tokenize_batch_element(prompt, chosen, rejected)  # line 483
    print(batch_element)
    tokenized_batch.append(batch_element)

If I print batch_element out, there is an extra None at the end of the token arrays:

batch_element:{'chosen_input_ids': [16, 10, 17, 28, 19, None], 'chosen_attention_mask': [1, 1, 1, 1, 1, 1], 'chosen_labels': [-100, -100, -100, -100, 19, None], 'rejected_input_ids': [16, 10, 17, 28, 18, None], 'rejected_attention_mask': [1, 1, 1, 1, 1, 1], 'rejected_labels': [-100, -100, -100, -100, 18, None], 'prompt_input_ids': [16, 10, 17, 28], 'prompt_attention_mask': [1, 1, 1, 1], 'prompt': '1+2=', 'chosen': '1+2=4', 'rejected': '1+2=3', 'chosen_response_only': '4', 'rejected_response_only': '3'}

My chosen sequence 1+2=4 should tokenize to length 5, but after self.tokenize_batch_element the 'chosen_input_ids' is [16, 10, 17, 28, 19, None] with length 6, and the extra None leads to the TypeError: an integer is required (got type NoneType).
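
For reference, the failure reduces to this (a minimal sketch reusing the token ids printed above):

import torch

# A list containing None cannot be converted to a LongTensor, which is
# exactly what the collator tries to do with the ids printed above.
ids_with_none = [16, 10, 17, 28, 19, None]
try:
    torch.LongTensor(ids_with_none)
except TypeError as e:
    print(e)  # an integer is required (got type NoneType)
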
So I changed line 456 from to_pad = [torch.LongTensor(ex[k]) for ex in batch] to to_pad = [torch.LongTensor(ex[k][:-1]) for ex in batch], and it worked:

{'loss': 0.2599, 'learning_rate': 0.0003, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -0.21053116023540497, 'logps/chosen': -4.585531234741211, 'logits/rejected': -2.686852216720581, 'logits/chosen': -2.6731910705566406, 'epoch': 1.0}
{'loss': 0.2599, 'learning_rate': 0.00015, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -0.21053116023540497, 'logps/chosen': -4.585531234741211, 'logits/rejected': -2.686852216720581, 'logits/chosen': -2.6731910705566406, 'epoch': 2.0}
{'loss': 0.1227, 'learning_rate': 0.0, 'rewards/chosen': 0.27905863523483276, 'rewards/rejected': -0.17719139158725739, 'rewards/accuracies': 1.0, 'rewards/margins': 0.45625001192092896, 'logps/rejected': -1.9824450016021729, 'logps/chosen': -1.7949450016021729, 'logits/rejected': -2.546565055847168, 'logits/chosen': -2.5510566234588623, 'epoch': 2.67}
{'train_runtime': 2.826, 'train_samples_per_second': 3.185, 'train_steps_per_second': 1.062, 'train_loss': 0.2141884664694468, 'epoch': 2.67}
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.06it/s]
***** train metrics *****
epoch = 2.67
train_loss = 0.2142
train_runtime = 0:00:02.82
train_samples = 3
train_samples_per_second = 3.185
train_steps_per_second = 1.062
Training metrics: {'train_runtime': 2.826, 'train_samples_per_second': 3.185, 'train_steps_per_second': 1.062, 'train_loss': 0.2141884664694468, 'epoch': 2.67, 'train_samples': 3}

I do not know whether my fix is right, or whether I simply did not use the trainer the right way.
I think the problem may happen because Qwen has its own tokenizer.
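
A quick way to test this suspicion is to check whether the special tokens DPOTrainer inserts are defined at all (a sketch; the model name is written out instead of model_args.model_name_or_path from my setup below):

from transformers import AutoTokenizer

# If the suspicion is right, both ids print as None for Qwen-chat.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True)
print(tokenizer.bos_token_id, tokenizer.eos_token_id)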

My prompt dict:

return {
    "prompt": ["Question: " + question + "\n\nAnswer: " for question in examples["question"]],
    "chosen": examples["response_chosen"],
    "rejected": examples["response_rejected"],
}

DPOTrainer:

trainer = DPOTrainer(
    model,
    ref_model=deepcopy(model),
    args=training_args,
    beta=0.1,
    tokenizer=tokenizer,
    peft_config=lora_config,
    max_prompt_length=training_args.model_max_length,
    max_length=training_args.model_max_length,
    train_dataset=data_module['train_dataset'],
    eval_dataset=data_module['eval_dataset'],
)

tokenizer:

tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    model_max_length=training_args.model_max_length,
    padding_side="right",
    use_fast=False,
    trust_remote_code=True,
)
tokenizer.pad_token_id = tokenizer.eod_id
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = 0  # set as the <unk> token
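
A defensive check right after this setup would surface the problem before it reaches the collator (a sketch; the three attributes are the ones the DPO data pipeline relies on):

# Fail fast if any special token needed for DPO tokenization is unset,
# instead of hitting the NoneType error deep inside the collator.
for name in ("bos_token_id", "eos_token_id", "pad_token_id"):
    if getattr(tokenizer, name) is None:
        raise ValueError(f"tokenizer.{name} is None; set it before DPO training")
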
kashif (Collaborator) commented Dec 10, 2023

@xzqxnet0990 I believe we have fixed the tokenization in PR #885, if you want to give that branch a try?

xzqxnet0990 (Author) commented

@kashif Thanks, I will try.


github-actions bot commented Jan 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

lvwerra closed this as completed Jan 8, 2024

wen2cheng commented Jan 18, 2024

The main reason for the error is that “tokenize_row” in dpo_trainer adds bos and eos tokens to the prompt and answer. If these two tokens are not configured for the Qwen-chat model in advance, they default to None, and errors occur later.
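
In other words (a simplified sketch of the relevant behavior, not the exact trl source):

# bos is prepended to the prompt and eos is appended to each answer; if
# either id is None, None silently ends up inside the token lists.
def tokenize_row_sketch(prompt_ids, answer_ids, bos_token_id, eos_token_id):
    full_prompt = [bos_token_id] + prompt_ids
    full_answer = answer_ids + [eos_token_id]
    return full_prompt, full_answer

print(tokenize_row_sketch([16, 10, 17, 28], [19], None, None))
# ([None, 16, 10, 17, 28], [19, None]) -- matches the batch_element shown above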

LuJunru commented Jan 24, 2024

The main reason for the error is that “tokenize_row” in dpo_trainer adds bos and eos tokens to the prompt and answer. If these two tokens are not configured for the Qwen-chat model in advance, they default to None, and errors occur later.

Hi, @wen2cheng

Your suggestion is correct. I fixed this issue by adding the following when the model belongs to the Qwen series:

tokenizer.add_special_tokens({"bos_token": tokenizer.eos_token})
tokenizer.bos_token_id = tokenizer.eos_token_id

I am not sure about the final results since I am still training, but the issue is indeed fixed.


Update:
Another way would be to simply add a condition here: https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py#L632

if self.tokenizer.bos_token is not None:
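
For example (a self-contained sketch of the guarded behavior, not an exact diff against trl):

from typing import List, Optional

# Only prepend bos / append eos when the tokenizer actually defines them.
def add_special_tokens_safely(
    prompt_ids: List[int],
    answer_ids: List[int],
    bos_token_id: Optional[int],
    eos_token_id: Optional[int],
):
    if bos_token_id is not None:
        prompt_ids = [bos_token_id] + prompt_ids
    if eos_token_id is not None:
        answer_ids = answer_ids + [eos_token_id]
    return prompt_ids, answer_ids

print(add_special_tokens_safely([16, 10, 17, 28], [19], None, None))
# ([16, 10, 17, 28], [19]) -- no None sneaks in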

Junru

RonanKMcGovern commented

Thanks, just had the same issue.

Since the chosen (and rejected) completions are tokenized on their own, is there a risk that a bos_token is being added there (which wouldn't happen when tokenizing a complete prompt+completion)?
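
One way to check (a sketch; the model name is taken from earlier in this thread, and whether Qwen's tokenizer adds special tokens at all is an assumption to verify):

from transformers import AutoTokenizer

# Compare tokenizing the completion alone vs. prompt+completion: a
# spurious bos would show up at the front of the standalone completion.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True)
completion_ids = tokenizer("4").input_ids
full_ids = tokenizer("1+2=4").input_ids
print(completion_ids, full_ids)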
