
XLA cluster environment plugin #11321

Closed
awaelchli opened this issue Jan 5, 2022 · 1 comment · Fixed by #11330
Labels: accelerator: tpu (Tensor Processing Unit), environment, feature (Is an improvement or enhancement)

Comments

awaelchli (Contributor) commented Jan 5, 2022

🚀 Feature

Add an environment for the TPU.
The TPUSpawnStrategy will use this cluster environment by default.

Motivation

Decoupling the environment lookup from the strategy will reduce the code divergence between the generic DDPSpawnStrategy and the TPUSpawnStrategy.

Pitch

class TPUEnvironment(ClusterEnvironment):
    # access the environment variables set by XLA directly here
    ...

List of environment variables: https://github.com/pytorch/xla/blob/master/torch_xla/core/xla_env_vars.py
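A rough sketch of what such a plugin could look like, assuming the `XRT_*` variable names from the file linked above. This is illustrative only: the `ClusterEnvironment` stub below stands in for Lightning's real base class, and the exact set of methods to override would follow the actual `ClusterEnvironment` interface.

```python
import os


class ClusterEnvironment:
    """Stub standing in for pytorch_lightning's ClusterEnvironment base class."""


class XLAEnvironment(ClusterEnvironment):
    """Reads the process-group information that XLA sets for each TPU process.

    The XRT_* variable names are taken from torch_xla's xla_env_vars.py and
    are assumptions for the purpose of this sketch.
    """

    @property
    def creates_processes_externally(self) -> bool:
        # Processes are launched by xmp.spawn(), not by an external launcher.
        return False

    def world_size(self) -> int:
        return int(os.environ["XRT_SHARD_WORLD_SIZE"])

    def global_rank(self) -> int:
        return int(os.environ["XRT_SHARD_ORDINAL"])

    def local_rank(self) -> int:
        return int(os.environ["XRT_LOCAL_ORDINAL"])

    def node_rank(self) -> int:
        return int(os.environ["XRT_HOST_ORDINAL"])
```

With this in place, TPUSpawnStrategy would no longer need its own rank/world-size lookup code; it would simply default to this environment the way DDPSpawnStrategy defaults to LightningEnvironment.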

Open question: should it be named XLAEnvironment or TPUEnvironment?

Alternatives

Additional context

Discussion sparked in #11163
Proposed by @tchaton



cc @Borda @kaushikb11 @rohitgr7 @awaelchli @ananthsub

@awaelchli awaelchli added the feature, accelerator: tpu, and environment labels Jan 5, 2022
@awaelchli awaelchli added this to the 1.6 milestone Jan 5, 2022
@awaelchli awaelchli changed the title TPU cluster environment TPU cluster environment plugin Jan 5, 2022
@awaelchli awaelchli changed the title TPU cluster environment plugin XLA cluster environment plugin Jan 5, 2022
kaushikb11 (Contributor) commented
Sounds good to me!

XLAEnvironment would be the ideal name, as it could be extended to other hardware in the future, and we already have a checkpointing plugin called XLACheckpointIO.

@awaelchli awaelchli self-assigned this Jan 5, 2022
@carmocca carmocca assigned kaushikb11 and unassigned awaelchli Feb 16, 2022
@carmocca carmocca moved this to In Progress in Frameworks Planning Feb 16, 2022
@Borda Borda modified the milestones: 1.6, 1.7 Mar 21, 2022
Repository owner moved this from In Progress to Done in Frameworks Planning Jun 22, 2022
3 participants