BlenderBot RuntimeError: CUDA error: device-side assert triggered #9046
Comments
Hey @manzar96, it would be awesome if you could provide a full code snippet that I can copy-paste and run to reproduce the error. I am not able to do so with your code above. Thanks a lot!
I made an example:
If you try printing outputs['loss'], the error occurs. However, if you replace the -100 values in the labels with the tokenizer's pad token id, the loss is computed without error.
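A minimal reproduction in this spirit might look like the following sketch (not the original example: the checkpoint and small tokenizer follow the issue description below, while the example sentences and variable names are invented for illustration):

```python
# Hypothetical reproduction sketch (not the original poster's code).
from transformers import BlenderbotSmallTokenizer, BlenderbotForConditionalGeneration

tokenizer = BlenderbotSmallTokenizer.from_pretrained("facebook/blenderbot-90M")
model = BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-90M")

inputs = tokenizer(["hello, how are you today?"], return_tensors="pt", padding=True)
targets = tokenizer(["i am fine, thank you!"], return_tensors="pt", padding=True).input_ids

# Mark padded label positions with -100 so the loss ignores them.
labels = targets.masked_fill(targets == tokenizer.pad_token_id, -100)

outputs = model(**inputs, labels=labels, return_dict=True)
print(outputs["loss"])  # reported to trigger the device-side assert on GPU
```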
This is a bug in Bart: shift_tokens_right in transformers/src/transformers/models/bart/modeling_bart.py, lines 65 to 73 (at 6587cf9), does not handle the -100 values in the labels before they are shifted into the decoder inputs. In T5 we automatically replace -100 with the pad token id, see transformers/src/transformers/models/t5/modeling_t5.py, lines 740 to 756 (at 6587cf9).
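In other words, the label-to-decoder-input shift needs the same -100 replacement that T5 already performs. A simplified sketch of such a shift (an illustration of the idea, not the exact code at either of the referenced locations; the function name is made up) could look like this:

```python
import torch

def shift_tokens_right_with_ignore(labels: torch.Tensor,
                                   pad_token_id: int,
                                   decoder_start_token_id: int) -> torch.Tensor:
    # Shift the labels one position to the right to build decoder_input_ids,
    # prepending the decoder start token.
    shifted = labels.new_zeros(labels.shape)
    shifted[:, 1:] = labels[:, :-1].clone()
    shifted[:, 0] = decoder_start_token_id
    # The step Bart was missing at the time of this issue: -100 is only a
    # loss-ignore index and must never reach the embedding lookup.
    shifted.masked_fill_(shifted == -100, pad_token_id)
    return shifted
```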
You're right @patil-suraj - do you want to open a PR to fix it in Bart? :-)
Yeah!
Environment info
transformers version: 4.0.0
Who can help
@patrickvonplaten
Information
Model I am using (Bert, XLNet ...): I am using the BlenderbotForConditionalGeneration ('facebook/blenderbot-90M') along with the relevant small tokenizer.
The problem arises when using:
I am using my own trainer implementation. I think the problem has to do with the indices of the labels. More specifically, when I use:
outputs = self.model(input_ids=inputs, attention_mask=inputs_att, labels=pad_targets, return_dict=True)
everything works fine, as "pad_targets" are the targets that use 0 as the index for masked (padded) tokens.
However, when I use:
outputs = self.model(input_ids=inputs, attention_mask=inputs_att, labels=repl_targets, return_dict=True)
and then print outputs['loss'], the following error occurs:
RuntimeError: CUDA error: device-side assert triggered
as the "repl_targets" are the targets using the -100 as the index for masked (padded) tokens.
The aforementioned error also occurs when using the argument:
decoder_input_ads=repl_targets
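Until the shift inside modeling_bart.py handles -100, one possible workaround is to build the decoder inputs yourself from the pad-padded targets and keep -100 only in the labels. The sketch below reuses the variable names from the snippets above, assumes tokenizer is the small Blenderbot tokenizer, and assumes the shift_tokens_right(input_ids, pad_token_id) signature at the referenced commit:

```python
# Workaround sketch: keep -100 in the labels so the loss ignores padding,
# but construct decoder_input_ids from pad_targets (padded with the pad
# token, never -100) so no invalid index reaches the embedding layer.
from transformers.models.bart.modeling_bart import shift_tokens_right

decoder_input_ids = shift_tokens_right(pad_targets, tokenizer.pad_token_id)

outputs = self.model(
    input_ids=inputs,
    attention_mask=inputs_att,
    decoder_input_ids=decoder_input_ids,
    labels=repl_targets,
    return_dict=True,
)
print(outputs["loss"])  # loss still ignores positions where repl_targets == -100
```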
The task I am working on is:
Dialogue generation on the Empathetic Dialogues dataset.
Expected behavior
I think there is a problem with how the -100 padding index is handled, but I am not sure :)