DPOTrainer Problem: trl/trainer/utils.py:456 #1073

Closed
xzqxnet0990 opened this issue Dec 8, 2023 · 6 comments

xzqxnet0990 commented Dec 8, 2023

The problem happens in trl/trainer/utils.py at line 456:

else:
    # adapted from https://stackoverflow.com/questions/73256206
    if "prompt" in k:
        to_pad = [torch.LongTensor(ex[k][::-1]) for ex in batch]
    else:
        to_pad = [torch.LongTensor(ex[k]) for ex in batch]  # line 456
    if k.endswith("_input_ids"):
        padding_value = self.tokenizer.pad_token_id

I am using the Qwen/Qwen-1_8B-Chat model and the official finetune.py to run DPO training.
My training dataset looks like this:

{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}
{"question": "1+2=", "response_chosen": "4", "response_rejected": "3"}

If I directly run the DPO code, I hit this error:

File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 678, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/trl/trainer/utils.py", line 490, in call
return self.collate(tokenized_batch)
File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/trl/trainer/utils.py", line 449, in collate
to_pad = [torch.LongTensor(ex[k]) for ex in batch]
File "/data/ketadb/condaenv/envs/qwen/lib/python3.9/site-packages/trl/trainer/utils.py", line 449, in
to_pad = [torch.LongTensor(ex[k]) for ex in batch]
TypeError: an integer is required (got type NoneType)

If I debug the code at line 483:

for feature in features:
    prompt = feature["prompt"]
    chosen = feature["chosen"]
    rejected = feature["rejected"]

    batch_element = self.tokenize_batch_element(prompt, chosen, rejected)  # line 483
    print(batch_element)
    tokenized_batch.append(batch_element)

If I print batch_element out, there is an extra None at the end of the token arrays:

batch_element:{'chosen_input_ids': [16, 10, 17, 28, 19, None], 'chosen_attention_mask': [1, 1, 1, 1, 1, 1], 'chosen_labels': [-100, -100, -100, -100, 19, None], 'rejected_input_ids': [16, 10, 17, 28, 18, None], 'rejected_attention_mask': [1, 1, 1, 1, 1, 1], 'rejected_labels': [-100, -100, -100, -100, 18, None], 'prompt_input_ids': [16, 10, 17, 28], 'prompt_attention_mask': [1, 1, 1, 1], 'prompt': '1+2=', 'chosen': '1+2=4', 'rejected': '1+2=3', 'chosen_response_only': '4', 'rejected_response_only': '3'}

My chosen sequence 1+2=4 should tokenize to length 5, but after self.tokenize_batch_element the 'chosen_input_ids' is [16, 10, 17, 28, 19, None] with length 6, and the extra None leads to the TypeError: an integer is required (got type NoneType).
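
For reference, the failure reduces to this (a minimal sketch reusing the token ids printed above):

import torch

# A list containing None cannot be converted to a LongTensor, which is
# exactly what the collator tries to do with the ids printed above.
ids_with_none = [16, 10, 17, 28, 19, None]
try:
    torch.LongTensor(ids_with_none)
except TypeError as e:
    print(e)  # an integer is required (got type NoneType)
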
So I changed line 456 from to_pad = [torch.LongTensor(ex[k]) for ex in batch] to to_pad = [torch.LongTensor(ex[k][:-1]) for ex in batch], and it worked:

{'loss': 0.2599, 'learning_rate': 0.0003, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -0.21053116023540497, 'logps/chosen': -4.585531234741211, 'logits/rejected': -2.686852216720581, 'logits/chosen': -2.6731910705566406, 'epoch': 1.0}
{'loss': 0.2599, 'learning_rate': 0.00015, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -0.21053116023540497, 'logps/chosen': -4.585531234741211, 'logits/rejected': -2.686852216720581, 'logits/chosen': -2.6731910705566406, 'epoch': 2.0}
{'loss': 0.1227, 'learning_rate': 0.0, 'rewards/chosen': 0.27905863523483276, 'rewards/rejected': -0.17719139158725739, 'rewards/accuracies': 1.0, 'rewards/margins': 0.45625001192092896, 'logps/rejected': -1.9824450016021729, 'logps/chosen': -1.7949450016021729, 'logits/rejected': -2.546565055847168, 'logits/chosen': -2.5510566234588623, 'epoch': 2.67}
{'train_runtime': 2.826, 'train_samples_per_second': 3.185, 'train_steps_per_second': 1.062, 'train_loss': 0.2141884664694468, 'epoch': 2.67}
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.06it/s]
***** train metrics *****
epoch = 2.67
train_loss = 0.2142
train_runtime = 0:00:02.82
train_samples = 3
train_samples_per_second = 3.185
train_steps_per_second = 1.062
Training metrics: {'train_runtime': 2.826, 'train_samples_per_second': 3.185, 'train_steps_per_second': 1.062, 'train_loss': 0.2141884664694468, 'epoch': 2.67, 'train_samples': 3}

I do not know whether my fix is right, or whether I simply did not use the trainer the right way.
I think the problem may happen because Qwen has its own tokenizer.
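
A quick way to test this suspicion is to check whether the special tokens DPOTrainer inserts are defined at all (a sketch; the model name is written out instead of model_args.model_name_or_path from my setup below):

from transformers import AutoTokenizer

# If the suspicion is right, both ids print as None for Qwen-chat.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True)
print(tokenizer.bos_token_id, tokenizer.eos_token_id)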

My prompt dict:

return {
    "prompt": ["Question: " + question + "\n\nAnswer: " for question in examples["question"]],
    "chosen": examples["response_chosen"],
    "rejected": examples["response_rejected"],
}

DPOTrainer:

trainer = DPOTrainer(
    model,
    ref_model=deepcopy(model),
    args=training_args,
    beta=0.1,
    tokenizer=tokenizer,
    peft_config=lora_config,
    max_prompt_length=training_args.model_max_length,
    max_length=training_args.model_max_length,
    train_dataset=data_module['train_dataset'],
    eval_dataset=data_module['eval_dataset'],
)

tokenizer:

tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    model_max_length=training_args.model_max_length,
    padding_side="right",
    use_fast=False,
    trust_remote_code=True,
)
tokenizer.pad_token_id = tokenizer.eod_id
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = 0  # set as the <unk> token
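
A defensive check right after this setup would surface the problem before it reaches the collator (a sketch; the three attributes are the ones the DPO data pipeline relies on):

# Fail fast if any special token needed for DPO tokenization is unset,
# instead of hitting the NoneType error deep inside the collator.
for name in ("bos_token_id", "eos_token_id", "pad_token_id"):
    if getattr(tokenizer, name) is None:
        raise ValueError(f"tokenizer.{name} is None; set it before DPO training")
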
kashif (Collaborator) commented Dec 10, 2023

@xzqxnet0990 I believe we have fixed the tokenization in PR #885, if you want to give that branch a try?

xzqxnet0990 (Author) commented

@kashif Thanks, I will try.


github-actions bot commented Jan 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

lvwerra closed this as completed Jan 8, 2024

wen2cheng commented Jan 18, 2024

The main reason for the error is that “tokenize_row” in dpo_trainer adds bos and eos tokens to the prompt and answer. If these two tokens are not configured for the Qwen-chat model in advance, they default to None, and errors occur later.
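
In other words (a simplified sketch of the relevant behavior, not the exact trl source):

# bos is prepended to the prompt and eos is appended to each answer; if
# either id is None, None silently ends up inside the token lists.
def tokenize_row_sketch(prompt_ids, answer_ids, bos_token_id, eos_token_id):
    full_prompt = [bos_token_id] + prompt_ids
    full_answer = answer_ids + [eos_token_id]
    return full_prompt, full_answer

print(tokenize_row_sketch([16, 10, 17, 28], [19], None, None))
# ([None, 16, 10, 17, 28], [19, None]) -- matches the batch_element shown above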

LuJunru commented Jan 24, 2024

The main reason for the error is that “tokenize_row” in dpo_trainer adds bos and eos tokens to the prompt and answer. If these two tokens are not configured for the Qwen-chat model in advance, they default to None, and errors occur later.

Hi, @wen2cheng

Your suggestion is correct. I fixed this issue by adding the following when the model belongs to the Qwen series:

tokenizer.add_special_tokens({"bos_token": tokenizer.eos_token})
tokenizer.bos_token_id = tokenizer.eos_token_id

I am not sure about the final results since I am still training, but the issue is indeed fixed.


Update:
Another way would be to simply add a condition here: https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py#L632

if self.tokenizer.bos_token is not None:
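
For example (a self-contained sketch of the guarded behavior, not an exact diff against trl):

from typing import List, Optional

# Only prepend bos / append eos when the tokenizer actually defines them.
def add_special_tokens_safely(
    prompt_ids: List[int],
    answer_ids: List[int],
    bos_token_id: Optional[int],
    eos_token_id: Optional[int],
):
    if bos_token_id is not None:
        prompt_ids = [bos_token_id] + prompt_ids
    if eos_token_id is not None:
        answer_ids = answer_ids + [eos_token_id]
    return prompt_ids, answer_ids

print(add_special_tokens_safely([16, 10, 17, 28], [19], None, None))
# ([16, 10, 17, 28], [19]) -- no None sneaks in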

Junru

RonanKMcGovern commented

Thanks, just had the same issue.

Since the chosen (and rejected) completions are tokenized on their own, is there a risk that a bos_token is being added there (which wouldn't happen when tokenizing a complete prompt+completion)?
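
One way to check (a sketch; the model name is taken from earlier in this thread, and whether Qwen's tokenizer adds special tokens at all is an assumption to verify):

from transformers import AutoTokenizer

# Compare tokenizing the completion alone vs. prompt+completion: a
# spurious bos would show up at the front of the standalone completion.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True)
completion_ids = tokenizer("4").input_ids
full_ids = tokenizer("1+2=4").input_ids
print(completion_ids, full_ids)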
