Odd (compilation?) error #2376
Hi, looks like the real error is here:
The 236 limit should be higher IIRC (though I'm not entirely sure whether 787 is within limits). I don't know how you're updating your TPU runtime, but can you follow what our example Colab notebooks do? Ex. #1963 (comment) |
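For reference, a minimal sketch like the one below (assuming the standard `torch` and `torch_xla` packages in the conda environment) shows which client-side builds and XRT endpoint are in use; note that the TPU-side runtime version discussed in this thread is separate and is not reported by the client:

```python
# Sketch: report the client-side PyTorch / XLA builds and the XRT endpoint
# the environment points at. The TPU-side runtime version (the one tied to
# the parameter limit discussed here) is not visible from the client.
import os

import torch
import torch_xla

print("torch:", torch.__version__)
print("torch_xla:", torch_xla.__version__)
print("XRT_TPU_CONFIG:", os.environ.get("XRT_TPU_CONFIG", "<not set>"))
```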
@jysohn23 Sorry, I forgot to put in the information about my environment. This was done with a GCP TPU VM, so everything should be set up automatically. If I do have to run the snippet in the linked issue, do I run it inside the conda environment? |
What TPU did you create? |
Yes... just tested this again right now and got the same error, using the torch-xla-nightly conda environment on an n1-standard-16 with a v3-8 running pytorch-nightly... |
@jysohn23 Let me know if you need more information about the bug from me... |
Hmm, I believe that "(limit 236)" seems like you're somehow using our old runtime. Do you have the TPU name and zone of the TPU you're currently using? I can take a look and verify whether the runtime is the newer one. However, 787 parameters actually looks like it goes over our current hard limit, which is an issue @taylanbil had also reproduced. First, let's try to find out why the limit is so low. |
@jysohn23 Here is the information: |
I'll take a look once I get back to my laptop, but did you run any `curl -X POST http://TPU_IP:8475/requestversion/...` command from the GCE VM? If so, what was the exact command you ran?
|
@jysohn23 I haven't run any command of that sort. Is that a required command? I have not heard of it being required for PyTorch XLA before... |
No, you shouldn't. In case you did, the runtime could have been swapped to an incorrect one, which is why I was asking. Looking at the logs. |
Ah, it looks like your TPU was created a very, very long time ago. Could you try recreating your TPU? That should at least bump the parameter limit past 236. |
@jysohn23 Yes, I created a new TPU node and tried with that. The error is gone, so I think this is solved. Is there a reason this error came from the old TPU node? |
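As an aside, a small smoke test along these lines (a sketch using the standard `torch_xla` device API; the shapes are arbitrary) is a quick way to confirm that a freshly created node compiles and runs a graph before re-running the full workload:

```python
# Sketch: verify the TPU backend can compile and execute a trivial graph.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # acquire the XLA (TPU) device
x = torch.randn(4, 4, device=device)
y = (x @ x).sum()
xm.mark_step()            # flush the pending graph: compile and execute
print("result:", y.item())
```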
Yeah, we updated our runtime entirely on the TPU side around February, and your TPU seems to have been created back in January. |
@jysohn23 Thank you for the clarification! |
I was running some PyTorch XLA code (trying to add TPU support to fastai2) and obtained the attached error. What could this be due to?
odd_error.txt
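For readers without the attachment, the failing pattern is an ordinary PyTorch training step moved onto the XLA device; the sketch below illustrates the shape of such code (the model, data, and hyperparameters are placeholders, not taken from odd_error.txt):

```python
# Sketch of the kind of training step that triggers XLA compilation on TPU.
# Model and data are illustrative placeholders, not from the attached log.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = nn.Linear(10, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

data = torch.randn(8, 10, device=device)
target = torch.randint(0, 2, (8,), device=device)

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(data), target)
loss.backward()
xm.optimizer_step(optimizer, barrier=True)  # barrier=True marks the step on a single device
print("loss:", loss.item())
```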