DPOTrainer Problem: trl/trainer/utils.py:456 #1073
Comments
@xzqxnet0990 I believe we have fixed the tokenization in PR #885, if you want to give that branch a try?
@kashif Thanks, I will try.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
The main reason for the error is that `tokenize_row` in dpo_trainer adds bos and eos to the prompt and answer. If these two tokens are not configured for the Qwen-chat model in advance, they default to None, and errors occur later.
Hi @wen2cheng, your suggestion is correct. I fixed this issue by adding:
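(The exact snippet was not preserved in this thread; below is a minimal sketch of the kind of change described, assuming the Qwen tokenizer exposes the id of its `<|endoftext|>` token as `eod_id`.)

```python
# Sketch, not the original snippet: give the Qwen tokenizer the special
# token ids that DPOTrainer's tokenize_row expects. For Qwen-chat models,
# bos_token_id and eos_token_id default to None, which later produces the
# None entries in the batch.
tokenizer.pad_token_id = tokenizer.eod_id  # eod_id: id of <|endoftext|>
tokenizer.bos_token_id = tokenizer.eod_id
tokenizer.eos_token_id = tokenizer.eod_id
```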
when the model belongs to the Qwen series. Not sure about the final results since I am still training; however, the issue does appear fixed. Update:
Junru
Thanks, just had the same issue. Since the chosen (and rejected) completions are tokenized on their own, is there a risk that a bos_token is being added there (which wouldn't happen when tokenizing a complete prompt+completion)?
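(Not part of the original comment: one way to check this, using the public `hf-internal-testing/llama-tokenizer` as a stand-in for a tokenizer that does add a bos token.)

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")
prompt, completion = "1+2=", "4"

ids_joint = tok(prompt + completion)["input_ids"]
ids_prompt = tok(prompt)["input_ids"]
ids_completion = tok(completion)["input_ids"]

# If the stand-alone completion gets its own bos token, concatenating the
# two pieces will not reproduce the jointly tokenized sequence.
print(ids_completion[0] == tok.bos_token_id)     # True here: bos was added
print(ids_prompt + ids_completion == ids_joint)  # False here
```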
The problem happens in trl/trl/trainer/utils.py at line 456.
I am using the Qwen/Qwen-1_8B-Chat model and the official finetune.py to do the DPO training.
My training datasets are like this:
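(The actual samples were not preserved; reconstructing from the "1+2=4" example below, a record presumably looks something like this:)

```json
{"prompt": "1+2=", "chosen": "4", "rejected": "5"}
```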
If I directly run the DPO code, I meet this problem:
If I debug the code at line 483:
If I print the batch_element out, there is an extra None at the end of the array:
My chosen_input_ids for "1+2=4" should have length 5, but after self.tokenize_batch_element, 'chosen_input_ids' is [16, 10, 17, 28, 19, None] with length 6; the extra None leads to the TypeError: an integer is required (got type NoneType).
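(A minimal reproduction of that failure, independent of TRL: `torch.LongTensor` cannot convert a Python list that contains `None`.)

```python
import torch

# The trailing None is the unset eos_token_id that tokenize_batch_element
# appended; tensor construction then fails on it.
ids = [16, 10, 17, 28, 19, None]
to_pad = torch.LongTensor(ids)  # TypeError: an integer is required (got type NoneType)
```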
So, I changed line 456

```python
to_pad = [torch.LongTensor(ex[k]) for ex in batch]
```

to

```python
to_pad = [torch.LongTensor(ex[k][:-1]) for ex in batch]
```

and it worked. I do not know whether I am right, or whether I just did not use it the right way.
I think the problem may have happened because Qwen has its own tokenizer.
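(Not in the original post: a quick way to confirm this, assuming hub access; per this report both ids come back as None for Qwen-chat.)

```python
from transformers import AutoTokenizer

# Qwen ships a custom tokenizer implementation, hence trust_remote_code.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True)
print(tok.bos_token_id, tok.eos_token_id)  # reportedly: None None
```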
My prompt dict:
DPOTrainer:
tokenizer: