
Odd (compilation?) error #2376

Closed
tmabraham opened this issue Jul 27, 2020 · 14 comments
@tmabraham

I was running some PyTorch XLA code (trying to add TPU support to fastai2) and obtained the attached error. What could this be due to?
odd_error.txt

@jysohn23
Collaborator

Hi,

Looks like the real error is here:

  (0) Invalid argument: Computation requires more parameters (787) than supported (limit 236).
	 [[{{node XRTCompile}}]]
  (1) Invalid argument: Computation requires more parameters (787) than supported (limit 236).
	 [[{{node XRTCompile}}]]
	 [[XRTCompile_G3]]

The 236 limit should be higher, IIRC (though I'm not entirely sure whether 787 is within limits). I don't know how you're updating your TPU runtime, but can you follow what our example Colab notebooks do? E.g. #1963 (comment)
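For reference, one way to check which runtime a TPU node is running is to inspect the node with the gcloud CLI. This is a hedged sketch, not a command from the thread; the TPU name and zone are placeholders, and if memory serves the version field is named `tensorflowVersion` in the legacy TPU Node API even for PyTorch software versions:

```shell
# Hypothetical example: inspect the software version a TPU node was
# created with. Replace my-tpu-node / us-central1-a with your own
# TPU name and zone.
gcloud compute tpus describe my-tpu-node \
    --zone=us-central1-a \
    --format="value(tensorflowVersion,state,health)"
```

A node created with the pytorch-nightly software version should report that here; an older version string would point to a stale runtime.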

@tmabraham
Author

@jysohn23 Sorry, I forgot to put in the information about my environment. This was done with a GCP TPU VM. So everything should be set up automatically in the torch-xla-nightly environment, no? I had run . ./scripts/update_nightly_torch_wheels.sh because I had accidentally downgraded the PyTorch version and it was not working, but I don't think that would affect this, would it?

So if I do have to run the snippet in the linked issue, do I run it inside the conda environment?

@jysohn23
Collaborator

What TPU did you create? Did you select pytorch-nightly as the TPU software version?

@tmabraham
Author

Yes... I just tested this again right now and got the same error, using the torch-xla-nightly conda environment on an n1-standard-16 with a v3-8 running pytorch-nightly...

@tmabraham
Author

@jysohn23 Let me know if you need more information about the bug from me...

@jysohn23
Collaborator

Hmm, I believe the "(limit 236)" suggests you're somehow using our old runtime. Do you have the name and zone of the TPU you're currently using? I can take a look and verify whether the runtime is the newer one.

That said, 787 parameters actually looks like it goes over our current hard limit, which is an issue @taylanbil has also reproduced. But first, let's find out why the limit is so low.

@tmabraham
Author

@jysohn23 Here is the information:
[screenshot of TPU details attached]

@jysohn23
Collaborator

jysohn23 commented Jul 28, 2020 via email

@tmabraham
Author

@jysohn23 I haven't run any command of that sort. Is that a required command? I have not heard of it being required for PyTorch XLA before...

@jysohn23
Collaborator

No, you shouldn't have to. But in case you did, the runtime could have been swapped to an incorrect one, which is why I was asking. Looking at the logs now.

jysohn23 self-assigned this Jul 28, 2020
@jysohn23
Collaborator

Ah, it looks like your TPU was created a very long time ago. Could you try recreating it? That should at least bump the parameter limit past 236.
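For anyone hitting this later: recreating the node can also be done from the gcloud CLI. This is a sketch rather than commands from the thread; the TPU name, zone, accelerator type, and network are placeholders you would replace with your own values:

```shell
# Hypothetical example: delete the stale TPU node and recreate it with
# the current pytorch-nightly runtime. Name, zone, accelerator type,
# and network are placeholders.
gcloud compute tpus delete my-tpu-node --zone=us-central1-a --quiet
gcloud compute tpus create my-tpu-node \
    --zone=us-central1-a \
    --accelerator-type=v3-8 \
    --version=pytorch-nightly \
    --network=default
```

Recreating the node matters because the software version (and thus the runtime, including limits like the maximum parameter count) is baked in when the node is created; it is not updated by upgrading wheels on the client VM.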

@tmabraham
Author

@jysohn23 Yes, I created a new TPU node and tried with that. The error is gone, so I think this is solved. Is there a reason this error came from the old TPU node?

@jysohn23
Collaborator

Yeah, we updated the runtime entirely on the TPU side around February, and your TPU seems to have been created back in January.

@tmabraham
Author

@jysohn23 Thank you for the clarification!
