Mistral with flash attention 2 and right padding #26877
Comments
Having the same issues.
Indeed, forward should be supported but not generation; will raise a patch soon for this.
Hey @younesbelkada, I am still seeing this error even after setting the tokenizer to left padding. Not sure if this is because of trl or something wrong within transformers? transformers 4.36.2
I am having the same issue. Even after I set the tokenizer padding side = left, this error still occurs during training.
Thanks everyone for reporting. This might be an issue with TRL, I think; let me have a deeper look and get back ASAP.
I opened an issue in trl as well: huggingface/trl#1217 (comment)
You need to set use_cache=False.
I tried your solution and it works like a charm, but does setting use_cache to False make tokenizer.padding_side = 'left' during evaluation and 'right' during training? I read the doc about use_cache here: link, but it seems like it just reduces inference time. Can you explain why it works like a charm?
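For reference, a minimal sketch of the workaround discussed above (left padding plus use_cache=False), assuming a recent transformers release and an FA2-capable GPU; the checkpoint name, dtype, and the attn_implementation argument are placeholders here, not the exact setup used in this thread:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token
tokenizer.padding_side = "left"            # left padding avoids the FA2 padding error

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    attn_implementation="flash_attention_2",  # on older releases: use_flash_attention_2=True
    torch_dtype=torch.bfloat16,
)
model.config.use_cache = False  # the use_cache=False suggestion from the thread
```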
System Info
transformers version: 4.34.0
Who can help?
@younesbelkada
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
If you run a batch through Mistral with Flash Attention 2 and right padding, you get an error saying that batched generation with right padding is not supported and that the tokenizer should be set to left padding.
I am not doing generation, just calling forward. Is the error message incorrect, and you actually meant to prevent all usage of right padding here? Or is the implementation wrong, and this was meant to only prevent right padding in generate? Or perhaps I am missing something else. Thanks!
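For completeness, a rough sketch of the forward-only call described above; the checkpoint, prompts, and dtype are placeholders, and on the reported version this path raises the padding-side error even though generate() is never called:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # right padding is what triggers the error

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    attn_implementation="flash_attention_2",  # on 4.34: use_flash_attention_2=True
    torch_dtype=torch.bfloat16,
).to("cuda")

batch = tokenizer(
    ["short prompt", "a somewhat longer prompt"],
    padding=True,
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    outputs = model(**batch)  # forward only, no generate()
print(outputs.logits.shape)
```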
Expected behavior
Either right padding is ok for calling forward, or the error message correctly states the problem.