
Odd (compilation?) error #2376

Closed
tmabraham opened this issue Jul 27, 2020 · 14 comments
@tmabraham

I was running some PyTorch XLA code (trying to add TPU support to fastai2) and obtained the attached error. What could this be due to?
odd_error.txt

@jysohn23
Collaborator

Hi,

Looks like the real error is here:

  (0) Invalid argument: Computation requires more parameters (787) than supported (limit 236).
	 [[{{node XRTCompile}}]]
  (1) Invalid argument: Computation requires more parameters (787) than supported (limit 236).
	 [[{{node XRTCompile}}]]
	 [[XRTCompile_G3]]

The 236 limit should be higher, IIRC (though I'm not entirely sure whether 787 is within limits). I don't know how you're updating your TPU runtime, but can you follow what our example Colab notebooks do? E.g. #1963 (comment)
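For reference, one way to check which runtime a TPU node is running is to inspect the node with the gcloud CLI. This is a hedged sketch, not a command from the thread; the TPU name and zone are placeholders, and if memory serves the version field is named `tensorflowVersion` in the legacy TPU Node API even for PyTorch software versions:

```shell
# Hypothetical example: inspect the software version a TPU node was
# created with. Replace my-tpu-node / us-central1-a with your own
# TPU name and zone.
gcloud compute tpus describe my-tpu-node \
    --zone=us-central1-a \
    --format="value(tensorflowVersion,state,health)"
```

A node created with the pytorch-nightly software version should report that here; an older version string would point to a stale runtime.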

@tmabraham
Author

@jysohn23 Sorry, I forgot to put in the information about my environment. This was done with a GCP TPU VM. So everything should be set up automatically in the torch-xla-nightly environment, no? I had run . ./scripts/update_nightly_torch_wheels.sh because I had accidentally downgraded the PyTorch version and it was not working, but I don't think that would affect this, would it?

So if I do have to run the snippet in the linked issue, do I run it inside the conda environment?

@jysohn23
Collaborator

What TPU did you create? Did you select pytorch-nightly as the TPU software version?

@tmabraham
Author

Yes... I just tested this again right now and got the same error, using the torch-xla-nightly conda environment on an n1-standard-16 with a v3-8 running pytorch-nightly...

@tmabraham
Author

@jysohn23 Let me know if you need more information about the bug from me...

@jysohn23
Collaborator

Hmm, I believe the "(limit 236)" suggests you're somehow using our old runtime. Do you have the name and zone of the TPU you're currently using? I can take a look and verify whether the runtime is the newer one.

That said, 787 parameters actually looks like it goes over our current hard limit, which is an issue @taylanbil has also reproduced. But first, let's find out why the limit is so low.

@tmabraham
Author

@jysohn23 Here is the information:
[screenshot of TPU details attached]

@jysohn23
Collaborator

jysohn23 commented Jul 28, 2020 via email

@tmabraham
Author

@jysohn23 I haven't run any command of that sort. Is that a required command? I have not heard of it being required for PyTorch XLA before...

@jysohn23
Collaborator

No, you shouldn't have to. But in case you did, the runtime could have been swapped to an incorrect one, which is why I was asking. Looking at the logs now.

jysohn23 self-assigned this Jul 28, 2020
@jysohn23
Collaborator

Ah, it looks like your TPU was created a very long time ago. Could you try recreating it? That should at least bump the parameter limit past 236.
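For anyone hitting this later: recreating the node can also be done from the gcloud CLI. This is a sketch rather than commands from the thread; the TPU name, zone, accelerator type, and network are placeholders you would replace with your own values:

```shell
# Hypothetical example: delete the stale TPU node and recreate it with
# the current pytorch-nightly runtime. Name, zone, accelerator type,
# and network are placeholders.
gcloud compute tpus delete my-tpu-node --zone=us-central1-a --quiet
gcloud compute tpus create my-tpu-node \
    --zone=us-central1-a \
    --accelerator-type=v3-8 \
    --version=pytorch-nightly \
    --network=default
```

Recreating the node matters because the software version (and thus the runtime, including limits like the maximum parameter count) is baked in when the node is created; it is not updated by upgrading wheels on the client VM.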

@tmabraham
Author

@jysohn23 Yes, I created a new TPU node and tried with that. The error is gone, so I think this is solved. Is there a reason this error came from the old TPU node?

@jysohn23
Collaborator

Yeah, we updated the runtime entirely on the TPU side around February, and your TPU seems to have been created back in January.

@tmabraham
Author

@jysohn23 Thank you for the clarification!
