Trainer save_model: ValueError "You are trying to save a non contiguous tensor" #28293
Comments
Hmm, do you know what might be happening here, @Narsil? This is with mT5.
Might be fixed by #28414?
Happy to take a look if I can get access either to the fine-tuned model (even a dummy one; I just need to look at those tensors) or to a reproducer. I have no idea what makes some tensors non-contiguous, or what kind of non-contiguity this is.
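To see which tensors are affected before sharing a checkpoint, a quick check over the state dict can list every non-contiguous tensor together with its shape and strides. This is a minimal sketch assuming a PyTorch state dict; the helper name `find_non_contiguous` is hypothetical, not a transformers API.

```python
import torch


def find_non_contiguous(state_dict):
    """Return (name, shape, stride) for every tensor that is not contiguous.

    Hypothetical diagnostic helper: these are the tensors safetensors
    refuses to serialize.
    """
    report = []
    for name, tensor in state_dict.items():
        if isinstance(tensor, torch.Tensor) and not tensor.is_contiguous():
            report.append((name, tuple(tensor.shape), tensor.stride()))
    return report


if __name__ == "__main__":
    # Toy state dict: "b" is a transposed view of "a", so its strides are swapped.
    a = torch.zeros(3, 5)
    sd = {"a": a, "b": a.t()}
    print(find_non_contiguous(sd))  # [('b', (5, 3), (1, 5))]
```

On a real model this would be called as `find_non_contiguous(model.state_dict())`; the reported strides show what kind of non-contiguity is involved (a transpose, a slice, and so on).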
Non-contiguous parameters/buffers can be saved with
What I was trying to ask is: which library is actually creating non-contiguous tensors? It seems odd to me that we would need non-contiguous tensors for training. DeepSpeed, for instance, does not really create non-contiguous tensors; rather, it abuses the storage system to force memory locality across several matmuls (which I think is meant to optimize network transport). That was easy to fix once identified, because in that situation it's easy to rework the tensors on behalf of users: the non-contiguity is not really important for the model.
I ran into this issue due to a custom weight tying scheme (output layer is a transpose of the vocabulary embedding, so the former is not contiguous). I got around the error by turning off safe serialization as noted above. |
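Why a transposed tie ends up non-contiguous can be modeled in a few lines of plain Python. A tensor's strides say how many elements to skip per step along each dimension; the tensor is contiguous when its strides match the default row-major strides for its shape, and a transpose is a view that only swaps sizes and strides without moving any data. This is a simplified sketch of PyTorch's rule (empty tensors count as contiguous and size-1 dimensions are ignored), not its actual implementation, and the sizes are illustrative.

```python
from math import prod


def default_strides(shape):
    """Row-major (C-order) strides, in elements, of a freshly allocated tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """Simplified contiguity check modeled on PyTorch's rule.

    Empty tensors count as contiguous, and size-1 dimensions are skipped
    because their stride never changes which element is addressed.
    """
    if prod(shape) == 0:
        return True
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size != 1:
            if stride != expected:
                return False
            expected *= size
    return True


def transpose(shape, strides):
    """2-D transpose as a view: swap sizes and strides, copy no data."""
    return shape[::-1], strides[::-1]


if __name__ == "__main__":
    # Embedding with vocab_size=10 and hidden size 4 (illustrative sizes).
    emb_shape = (10, 4)
    emb_strides = default_strides(emb_shape)  # (4, 1)
    out_shape, out_strides = transpose(emb_shape, emb_strides)
    print(out_shape, out_strides)  # (4, 10) (1, 4)
    print(is_contiguous(emb_shape, emb_strides))  # True
    print(is_contiguous(out_shape, out_strides))  # False
```

Calling `.contiguous()` on such a view allocates a new buffer with the default strides, so the resulting copy no longer shares memory with the embedding it was transposed from.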
Hi, thanks all for the comments. I have no idea why there are non-contiguous tensors in the first place. I think making them contiguous makes more sense? It solves the problem, and the model seems to train well. I find it odd that the error doesn't occur when training T5 models, only mT5 models, since mT5 is built on top of T5 in the transformers scripts.
I believe this is related to PR #28898, which was reverted!
@ZhanGHanG9991 This is crude, but you can add this to your code right after you initialise your model.
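The snippet this comment refers to did not survive in this copy of the thread. A minimal sketch of the kind of workaround being discussed (making every parameter and buffer contiguous right after the model is created) could look like this; the helper name `make_contiguous_` is hypothetical, and this is not necessarily the commenter's exact code.

```python
import torch
from torch import nn


def make_contiguous_(model: nn.Module) -> int:
    """Make every parameter and buffer of `model` contiguous, in place.

    Hypothetical helper (not a transformers API). Only `.data` is swapped,
    so existing references to the Parameter objects stay valid.
    Returns the number of tensors that had to be copied.
    """
    copied = 0
    for tensor in list(model.parameters()) + list(model.buffers()):
        if not tensor.is_contiguous():
            # .contiguous() allocates a packed row-major copy of the values.
            tensor.data = tensor.data.contiguous()
            copied += 1
    return copied


if __name__ == "__main__":
    class Toy(nn.Module):
        def __init__(self):
            super().__init__()
            # A transposed view wrapped as a Parameter keeps its swapped strides.
            self.weight = nn.Parameter(torch.randn(5, 3).t())

    model = Toy()
    print(model.weight.is_contiguous())  # False
    print(make_contiguous_(model))  # 1
    print(model.weight.is_contiguous())  # True
```

One caveat: because `.contiguous()` copies, a parameter that was a transposed view of another one (as in the tied-weight case above) would stop sharing memory with it, which effectively unties the two weights.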
Amazing, it works for me! I added this in the init() right after initializing the model.
It worked! Thanks!
It works! Thank you very much!
Thanks!!
System Info
Transformers version: 4.36.2
pytorch version: 2.1.1
Python version: 3.10.13
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Fine-tune an mT5 model on a task with the transformers Trainer, then try to save the model; the ValueError from the title ("You are trying to save a non contiguous tensor") is raised.
Expected behavior
Fine-tuning an mT5 model and then saving it raises the above error. Modifying the transformers/modeling_utils.py file with
state_dict = {k: v.contiguous() for k, v in state_dict.items()}
solves the problem.