
TPU Spawn Rank & root device Error #7074

Merged: 4 commits into Lightning-AI:master on Apr 18, 2021

Conversation

@kaushikb11 (Contributor) commented Apr 17, 2021

What does this PR do?

Fixes #7086
Fixes #7088

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@codecov bot commented Apr 17, 2021

Codecov Report

❗ No coverage uploaded for pull request base (master@7b0b0d2).
The diff coverage is 25%.

@@           Coverage Diff            @@
##             master   #7074   +/-   ##
========================================
  Coverage          ?     87%           
========================================
  Files             ?     196           
  Lines             ?   12573           
  Branches          ?       0           
========================================
  Hits              ?   10942           
  Misses            ?    1631           
  Partials          ?       0           

@kaushikb11 kaushikb11 marked this pull request as ready for review April 18, 2021 15:25
@kaushikb11 kaushikb11 self-assigned this Apr 18, 2021
@kaushikb11 kaushikb11 added the accelerator: tpu (Tensor Processing Unit), bug (Something isn't working), and priority: 0 (High priority task) labels Apr 18, 2021
@awaelchli awaelchli added this to the v1.3 milestone Apr 18, 2021
@kaushikb11 kaushikb11 changed the title TPU Spawn Rank Error TPU Spawn Rank & root device Error Apr 18, 2021
@awaelchli (Contributor) left a comment


Awesome, thanks for the fast fix!

@@ -65,6 +65,10 @@ def local_rank(self) -> int:
     def world_size(self) -> int:
         return self.num_processes
 
+    @property
+    def root_device(self) -> torch.device:
+        return self.device
@awaelchli (Contributor) commented on this diff:

Is it correct that we aren't allowed to call xm.xla_device() here before we have spawned the processes?

@kaushikb11 (Contributor, Author) replied:

@awaelchli I don't think root_device would need to be called outside the spawned processes.
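
For context, a minimal sketch of the pattern being discussed. The names MyTPUSpawnPlugin and worker_fn are illustrative stand-ins, not the actual Lightning API: the idea is that xm.xla_device() is only queried inside a process started by xmp.spawn, while root_device in the parent simply returns the device the plugin was configured with.

import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


class MyTPUSpawnPlugin:
    """Illustrative stand-in for the plugin touched by this PR (not the real class)."""

    def __init__(self, device: torch.device):
        # Set in the parent process, before anything is spawned.
        self.device = device

    @property
    def root_device(self) -> torch.device:
        # No XLA runtime query here, so this is safe to access pre-spawn.
        return self.device


def worker_fn(rank: int) -> None:
    # Only inside a spawned worker is it valid to ask XLA for a device.
    device = xm.xla_device()
    print(f"rank {rank} running on {device}")


if __name__ == "__main__":
    xmp.spawn(worker_fn, nprocs=8)

As the reviewer's question suggests, the general rule with xmp.spawn is that the XLA device should not be initialized in the parent process, which is why the property defers to self.device rather than calling xm.xla_device() directly.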

@awaelchli awaelchli added the ready (PRs ready to be merged) label Apr 18, 2021
@georgesoteriou left a comment

Works for me! Thanks

@Borda (Member) left a comment

lgtm 🐰

@Borda Borda merged commit 30b7440 into Lightning-AI:master Apr 18, 2021
kaushikb11 added a commit to kaushikb11/pytorch-lightning that referenced this pull request Apr 20, 2021
* TPU Spawn Rank Error

* Update tpu spawn

* Fix root device property for tpu spawn

* Update changelog
lexierule pushed a commit that referenced this pull request Apr 22, 2021
* TPU Spawn Rank Error

* Update tpu spawn

* Fix root device property for tpu spawn

* Update changelog
Labels
accelerator: tpu (Tensor Processing Unit), bug (Something isn't working), priority: 0 (High priority task), ready (PRs ready to be merged)
Projects
None yet
5 participants