timeout for tpu check #4340

Merged: awaelchli merged 10 commits into master from bugfix/4324_xla_timeout on Nov 1, 2020.

lezwon (Contributor) commented Oct 25, 2020

What does this PR do?

Fixes #4324
This PR adds a 10-second timeout to the process that fetches the XLA device type. The timeout ensures that the process cannot hang indefinitely and block the import of Lightning, as happened in #4324.
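
The idea of bounding a device probe with a timeout can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `run_probe_with_timeout` and `DEVICE_CHECK_TIMEOUT` are hypothetical names, and the probe runs as a child Python interpreter through the standard-library `subprocess` module rather than through Lightning's own utilities.

```python
import subprocess
import sys

# Hypothetical constant mirroring the 10-second limit described above.
DEVICE_CHECK_TIMEOUT = 10.0


def run_probe_with_timeout(probe_source, timeout=DEVICE_CHECK_TIMEOUT):
    """Run `probe_source` in a child Python process with a time limit.

    Returns the probe's stripped stdout, or None if the child exceeds
    `timeout` seconds or exits with a non-zero status.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", probe_source],
            capture_output=True,  # collect stdout/stderr instead of inheriting them
            text=True,            # decode output as str
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so the caller is never blocked by a hung probe.
        return None
    if completed.returncode != 0:
        return None
    return completed.stdout.strip()
```

A probe that finishes quickly, such as `run_probe_with_timeout("print('TPU')")`, returns its output, while a probe that hangs, such as `run_probe_with_timeout("import time; time.sleep(60)", timeout=0.5)`, returns `None` once the limit is reached, so the caller can fall back to treating the device as unavailable instead of blocking the import.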

Before submitting

  • Was this discussed/approved via a GitHub issue? (Not needed for typos and docs improvements.)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov bot commented Oct 25, 2020

Codecov Report

Merging #4340 into master will decrease coverage by 0%.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #4340   +/-   ##
======================================
- Coverage      93%     93%   -0%     
======================================
  Files         113     113           
  Lines        8192    8192           
======================================
- Hits         7624    7621    -3     
- Misses        568     571    +3     

@lezwon lezwon force-pushed the bugfix/4324_xla_timeout branch from 183023e to 30d7a94 Compare October 25, 2020 06:04
@lezwon lezwon self-assigned this Oct 25, 2020
@lezwon lezwon added the labels "bug" (Something isn't working) and "accelerator: tpu" (Tensor Processing Unit) Oct 25, 2020
@lezwon lezwon added this to the 1.0.x milestone Oct 25, 2020
Resolved review threads:
pytorch_lightning/utilities/xla_device_utils.py (1 thread, outdated)
tests/utilities/test_xla_device_utils.py (3 threads, 2 outdated)
@mergify mergify bot requested a review from a team October 25, 2020 09:31

mergify bot commented Oct 25, 2020

This pull request is now in conflict... :(

edenlightning (Contributor) commented

@lezwon mind rebasing?

@lezwon lezwon force-pushed the bugfix/4324_xla_timeout branch from 7999b31 to 7283d25 Compare October 27, 2020 16:50
@awaelchli awaelchli merged commit 839813e into master Nov 1, 2020
@awaelchli awaelchli deleted the bugfix/4324_xla_timeout branch November 1, 2020 00:04
Borda pushed a commit that referenced this pull request Nov 4, 2020
* timeout for tpu check

* added tests

* updated CHANGELOG.md

* fixed windows tests

* Update pytorch_lightning/utilities/xla_device_utils.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* requested changes

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
(cherry picked from commit 839813e)

Linked issue: Hanging when importing pytorch_lightning on google cloud vm.
6 participants