TypeError: cannot pickle '_LazyModule' object #12549
Comments
Could you please attach the final script you used, or a branch that we can use to reproduce your code exactly? Thanks. Note: I took the liberty of editing your OP to use code formatting, which is much easier to read. If possible, please use a similar approach in future reports. Thank you!
These are my scripts. Thanks very much!
Thank you. The attached script fails for me. You also didn't supply the data, but I assume it doesn't matter. In the future, please supply everything or adapt your runtime so that we can run it out of the box without spending a lot of time trying to make things work.
Same failure with distributed.
So sorry, it's my fault: I gave you the wrong version.
I'm able to reproduce the problem - great! Let's see what the culprit is.
So the trigger is:
and the minimal reproduction command is:
It happens only with your version of the script. I tested with the one in
The problem is unrelated to the change in #11168, as you have discovered yourself, since your code removed my changes and you're just passing:
So we need to look elsewhere for the cause.
From a quick look I suspect that perhaps this is an issue in
inside the multi-proc modifications you made. E.g. the above is enough to trigger the same error in the script, so removing most of the code should
OK, here is the minimal reproducible script. Totally unrelated to
this still fails with the same error.
But if you either:
@lhoestq, @albertvillanova - does this ring any bells? Clearly
Thank you so much for your time. I hope other experts can offer some tips on this problem.
Hi @stas00, thanks for pinging. I'm having a look, and after a first search I think you are right and the problem comes from the fact that
cc: @lhoestq
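For readers unfamiliar with the error in the issue title: Python's `pickle` refuses to serialize module objects, and an instance of a `types.ModuleType` subclass fails in the same way, with the subclass name in the message. The snippet below is a minimal sketch of that behaviour only. The `_LazyModule` class here is a hypothetical stand-in that shares nothing with the transformers implementation except its name, and `try_pickle` is a helper invented for this illustration.

```python
import pickle
import types


class _LazyModule(types.ModuleType):
    """Hypothetical stand-in; only the class name matches the error in the title."""


def try_pickle(obj):
    """Return the TypeError raised while pickling obj, or None if pickling works."""
    try:
        pickle.dumps(obj)
    except TypeError as err:
        return err
    return None


# A plain module object and an instance of the subclass are both rejected.
plain_error = try_pickle(types.ModuleType("plain_module"))
lazy_error = try_pickle(_LazyModule("lazy_module"))

print(plain_error)
print(lazy_error)
```

Any code path that pickles an object graph containing such a module object, for example when handing work to other processes, hits this error.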
Hi albertvillanova, I removed the import of transformers as shown in the following code, but it still doesn't work.
Note that we can easily make
EDIT: here it is: #12552
This is just a way to easily fix this issue, but I think we should definitely keep trying to figure out why it tried to pickle
Linking to the new PR: #12567
Should be closed by #12567, please let us know if the problem persists.
Hi, a new problem has arisen:
Traceback (most recent call last):
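The comment above is cut off in this copy, so what #12552 actually changes is not shown here. Purely as a generic illustration, and not as a description of that PR, one way to make a `types.ModuleType` subclass survive pickling is to define `__reduce__` so that the object is rebuilt by re-importing the module by name. `ReimportableModule` is a hypothetical class, and the sketch assumes the wrapped name (`"json"` here) is importable wherever the pickle is loaded.

```python
import importlib
import pickle
import types


class ReimportableModule(types.ModuleType):
    """Hypothetical module subclass that pickles as 'import me again by name'."""

    def __reduce__(self):
        # Ask pickle to rebuild this object at load time by calling
        # importlib.import_module(<module name>).
        return (importlib.import_module, (self.__name__,))


# Assumption: "json" is importable in the process that loads the pickle.
proxy = ReimportableModule("json")
payload = pickle.dumps(proxy)      # succeeds because __reduce__ is defined
restored = pickle.loads(payload)   # the regular json module, re-imported

print(restored.__name__)
```

Note the trade-off in this sketch: the restored object is the ordinary imported module rather than an instance of the subclass, so any lazy-loading state held by the original object is not preserved.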
@stas00 edit: please see #12549 (comment) for the short reproduction script.
Environment info
transformers version: 4.9.0.dev0
Who can help
@stas00 @patrickvonplaten, @LysandreJik
Information
Model I am using (Bert, XLNet ...): GPT2
The problem arises when using:
The task I am working on is:
To reproduce
I am running the minimal command:
and I modified the following parts of the script `run_clm.py`, including the rank parameter passed in `training_args.local_rank`:
The traceback is:
When I run the following command with the original script, it works well. The reason I don't use this command is that our cluster doesn't support passing parameters this way: "-m torch.distributed.launch --nproc_per_node=4 "
Expected behavior