
Added check to verify xla device is TPU #3274

Merged: 23 commits into Lightning-AI:master from TPU_device_check, Oct 6, 2020

Conversation

lezwon
Contributor

@lezwon lezwon commented Aug 30, 2020

What does this PR do?

Fixes #3104

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If your PR wasn't discussed in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team August 30, 2020 19:00
@lezwon lezwon marked this pull request as draft August 30, 2020 19:01
@codecov

codecov bot commented Aug 30, 2020

Codecov Report

Merging #3274 into master will increase coverage by 4%.
The diff coverage is 59%.

@@           Coverage Diff           @@
##           master   #3274    +/-   ##
=======================================
+ Coverage      84%     87%    +4%     
=======================================
  Files         119     118     -1     
  Lines        9764    9184   -580     
=======================================
- Hits         8169    8013   -156     
+ Misses       1595    1171   -424     

@Borda Borda added the bug (Something isn't working) and accelerator: tpu (Tensor Processing Unit) labels Aug 30, 2020
tests/models/test_tpu.py (outdated, resolved)
@pep8speaks

pep8speaks commented Aug 31, 2020

Hello @lezwon! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-05 17:56:08 UTC

@lezwon lezwon force-pushed the TPU_device_check branch 4 times, most recently from a4fa477 to f315261 Compare September 1, 2020 13:32
@lezwon
Contributor Author

lezwon commented Sep 1, 2020

@Borda @justusschock @awaelchli I need some help fixing the Windows tests. It seems to have something to do with pickling.

@awaelchli
Contributor

@lezwon I think this is a common subprocess issue: when the subprocess is launched, it has to import everything before it runs, so we need to guard the call from the import. That is, you would have to wrap the TPU_AVAILABLE constant in an if __name__ == "__main__" block, remove the decorator, and apply it directly to the call. This will work:

def tpu_device_exists():
    if xm is not None:
        device = xm.xla_device()
        device_type = fetch_xla_device_type(device)
        return device_type == "TPU"
    else:
        return False


if __name__ == "__main__":
    TPU_AVAILABLE = pl_multi_process(tpu_device_exists)()
    print(TPU_AVAILABLE)

but it's obviously not how we want it.

My suggestion is to move towards a functional check, like I propose in #2877. Then we shouldn't get the pickle error, since the value is only computed at runtime.
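
For illustration, a functional check along these lines only touches the XLA device when it is actually called, so nothing TPU-related has to be evaluated (or pickled) at import time on Windows. This is a minimal sketch, assuming the usual conditional torch_xla import pattern; the helper names are illustrative and not necessarily the merged implementation:

try:
    import torch_xla.core.xla_model as xm
    XLA_AVAILABLE = True
except ImportError:
    xm = None
    XLA_AVAILABLE = False


def fetch_xla_device_type(device) -> str:
    # xm.xla_device_hw reports the hardware backing an xla device: "CPU", "GPU" or "TPU".
    if xm is not None:
        return xm.xla_device_hw(device)


def tpu_device_exists() -> bool:
    # Evaluated only when called, so importing this module never touches the XLA runtime.
    if not XLA_AVAILABLE:
        return False
    device = xm.xla_device()
    return fetch_xla_device_type(device) == "TPU"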

@lezwon lezwon force-pushed the TPU_device_check branch 3 times, most recently from dabcf7f to 17867e2 Compare September 6, 2020 02:29
@edenlightning
Contributor

Hey @lezwon, mind taking a look at this again? We would love to have this issue resolved.

@lezwon
Contributor Author

lezwon commented Sep 16, 2020

@edenafek I'll continue on it by the weekend.
Might need some help from @awaelchli on this.

@Borda
Member

Borda commented Sep 16, 2020

@lezwon mind resolving the conflicts so we know what the actual state is...

@lezwon lezwon force-pushed the TPU_device_check branch 2 times, most recently from ef11533 to 8460e0d Compare September 19, 2020 06:55
@lezwon lezwon marked this pull request as ready for review September 19, 2020 09:05
@mergify mergify bot requested a review from a team September 19, 2020 09:06
@awaelchli
Contributor

@lezwon do you still need help with this? Is there a problem with Windows?

@lezwon
Contributor Author

lezwon commented Sep 19, 2020

@lezwon do you still need help with this? Is there a problem with Windows?

Hey, I got it working with a minor workaround. Not sure about the cause of that issue though :)

@awaelchli awaelchli self-requested a review September 19, 2020 14:54
@awaelchli
Contributor

awaelchli commented Sep 19, 2020

Is it ready for review, @lezwon?

@lezwon
Contributor Author

lezwon commented Sep 19, 2020

Is it ready for review, @lezwon?

Yep, it is.

tests/utilities/test_xla_device_utils.py (outdated, resolved)
pytorch_lightning/utilities/xla_device_utils.py (outdated, resolved)
queue.put(None)


def pl_multi_process(func):
Contributor

There is a similar, if not identical, function in tests/base/develop_utils.py.
Do we need both? I cannot see an obvious difference besides inner_f being defined outside.

Contributor Author

I basically used the other function as a reference. The one in develop_utils is just for tests, right?

Member

Yeah, but couldn't you delete the one from tests and then import it from here?

Contributor Author

pl_multi_process_test varies a bit compared to this function. It has an assert statement within and returns either 1 or -1 for the test. This one is meant to return the device type or None.
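
To make the distinction concrete, here is a rough sketch of the kind of multiprocess wrapper being discussed: the wrapped probe runs in a separate process with a bounded join, and its result (or None on timeout/failure) comes back through a queue. The sketch keeps inner_f at module level so it stays picklable under Windows spawn; names and timeout are illustrative, not necessarily the exact merged code.

import functools
import multiprocessing
import queue


def inner_f(result_queue, func, *args, **kwargs):
    # Run the wrapped probe and report its result (or None on failure) via the queue.
    try:
        result_queue.put(func(*args, **kwargs))
    except Exception:
        result_queue.put(None)


def pl_multi_process(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result_queue = multiprocessing.Queue()
        proc = multiprocessing.Process(target=inner_f, args=(result_queue, func, *args), kwargs=kwargs)
        proc.start()
        proc.join(20)  # bound how long the device probe may block
        try:
            return result_queue.get_nowait()
        except queue.Empty:
            return None

    return wrapper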

@lezwon
Contributor Author

lezwon commented Sep 30, 2020

@lezwon I have rebased it, mind check if it is correct... :]

Thank you :) It looks good to me 👍

@mergify
Contributor

mergify bot commented Sep 30, 2020

This pull request is now in conflict... :(

@mergify
Contributor

mergify bot commented Oct 1, 2020

This pull request is now in conflict... :(

@Borda Borda force-pushed the TPU_device_check branch 2 times, most recently from 0a1a435 to 3a71c87 Compare October 2, 2020 10:58
@mergify mergify bot requested a review from a team October 2, 2020 11:44
@mergify mergify bot requested a review from a team October 2, 2020 11:48
Contributor

@rohitgr7 rohitgr7 left a comment

LGTM

@mergify mergify bot requested a review from a team October 2, 2020 14:40
@mergify
Contributor

mergify bot commented Oct 4, 2020

This pull request is now in conflict... :(

# Conflicts:
#	CHANGELOG.md
#	pytorch_lightning/accelerators/tpu_backend.py
#	pytorch_lightning/trainer/data_loading.py
#	tests/models/test_tpu.py
@mergify
Contributor

mergify bot commented Oct 5, 2020

This pull request is now in conflict... :(

@Borda Borda merged commit 69833da into Lightning-AI:master Oct 6, 2020
@Borda Borda added this to the 0.10.0 milestone Oct 7, 2020
Labels
accelerator: tpu (Tensor Processing Unit), bug (Something isn't working), ready (PRs ready to be merged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TPU available: true when there are no TPUs
9 participants