Google Colab TPU support #189
Hi @dsmic! Thanks for posting this. On a practical level, my first suspicion would be this:

Set up backend probably calls this code, leading to a call of

If that is not it, just to be sure: are you sure the problem is within torchquad? Could you use a different torch / torch_xla version to check whether you get more verbose feedback there?
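For context, the public entry point mentioned above is usually called like this (a minimal sketch of torchquad's documented usage; the linked internals are not reproduced here):

```python
from torchquad import set_up_backend

# Configures torch as the numerical backend and sets the default dtype;
# on GPU machines this also enables CUDA by default.
set_up_backend("torch", data_type="float64")
```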
Thanks for the response. I did some digging, and it seems it is just awfully slow, taking 20 seconds to prepare the next call to my function:

My function returns nearly immediately :( So I am not sure what is so expensive with the TPU...
Can you check which device your tensors are on? I suspect you are using the CPU and not the TPU. If neither works, we might need a dedicated backend type for TPUs. Not sure if we ever tried them before.
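For reference, checking the device can look like this (a minimal sketch assuming torch_xla's `xla_device` helper is available on the Colab runtime):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # the Colab TPU device
x = torch.ones(3, device=device)
print(x.device)  # should report an xla device, e.g. xla:1, not cpu
```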
Yes, the tensors are on the device (my debug print prints the device). I increased the log level, and the time seems to be spent within torchquad:
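For reference, raising the log level can be done like this (a sketch assuming torchquad's `set_log_level` utility):

```python
from torchquad import set_log_level

# TRACE is the most verbose level and includes per-step output
set_log_level("TRACE")
```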
Hmmmm, okay, that's good. Then it could be that the problem is specific to VEGAS. I noticed you are using a comparatively small number of evaluation points, which is usually quite inefficient with VEGAS (those evaluations are split across a number of iterations, so in the end you parallelize over a small number of points). Could you try a different integrator, as sketched below, to see if that is better?
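A minimal sketch of swapping in plain Monte Carlo (the integrand, `dim`, `N`, and the domain are placeholders for the values from the original notebook):

```python
import torch
from torchquad import MonteCarlo

def fn(x):
    # toy integrand standing in for the real one; x has shape (N, dim)
    return torch.exp(-x[:, 0] ** 2)

mc = MonteCarlo()
result = mc.integrate(fn, dim=1, N=1000000, integration_domain=[[0.0, 1.0]])
print(result)
```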
Yeah, the small number of evaluations was just during testing, as I thought it might have fallen back to the CPU; usually I use much, much bigger numbers... MonteCarlo is also not convincing: it is very slow (much slower than on the CPU) and even threw some weird exceptions at times... As I have to pay for the TPU usage, I might not test too much. I am using it with the V100 NVIDIA card at the moment, which works quite fine... Thanks for your support...
Okay, one final thought maybe: I noticed you are using float64; could this be the problem? TPUs are targeted at float16, if I am not mistaken?
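If the float64 default were the issue, one way to test it would be re-initializing the backend with a smaller float type (a sketch using torchquad's `set_up_backend`):

```python
from torchquad import set_up_backend

# Switch the default dtype from float64 to float32 before integrating
set_up_backend("torch", data_type="float32")
```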
Good tip, but it does not help :(
Feature
Desired Behavior / Functionality
Setting the default device of torch to the TPU should work, but it hangs.
How Can It Be Tested
I have a not-totally-minimal example that can be tested in Google Colab. If you run it, it shows that the TPU is set up correctly, but the integration hangs. If you interrupt it, you get:
In Google Colab there are two cells: the first installs TPU support for torch and the needed libs, and the second runs the program.
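Roughly, the two cells could look like the sketch below (an illustration under assumptions, not the original notebook: the install lines are placeholders, the matching torch_xla wheel depends on the Colab runtime, and the integrand is a toy stand-in):

```python
# Cell 1: install TPU support for torch plus torchquad
# (placeholder install lines; the exact torch_xla wheel is runtime-dependent)
!pip install torch_xla
!pip install torchquad

# Cell 2: put the tensors on the TPU and integrate
import torch
import torch_xla.core.xla_model as xm
from torchquad import VEGAS, set_up_backend

set_up_backend("torch", data_type="float64")
device = xm.xla_device()

def fn(x):
    # toy integrand standing in for the real function
    return torch.exp(-x[:, 0] ** 2)

# Passing the domain as a tensor on the XLA device so the sample points
# are created there (an assumption about how the notebook selects the TPU)
domain = torch.tensor([[0.0, 1.0]], device=device)
result = VEGAS().integrate(fn, dim=1, N=10000, integration_domain=domain)
print(result)  # in the reported issue, this call hangs on the TPU
```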