
Feature request: FSDP for TPUs #422

Open
OhadRubin opened this issue Jun 2, 2022 · 7 comments
Assignees: muellerzr
Labels: feature request (Request for a new feature to be added to Accelerate), TPU (Bug or feature on TPU platforms)

Comments

OhadRubin commented Jun 2, 2022

A recent contribution to the pytorch_xla repo (pytorch/xla#3431) allows using FSDP in PyTorch XLA to shard module parameters across data-parallel workers.
Some motivation behind this: it may become possible to perform inference with OPT 30B on Google Colab without needing a Pro subscription, which I think many people will appreciate.
What will be needed to add it to accelerate?
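
For context, here is a minimal sketch of what the new PyTorch/XLA API looks like, assuming a pytorch_xla build that includes pytorch/xla#3431 (the `torch_xla.distributed.fsdp` module). The toy model, data, and hyperparameters are placeholders for illustration only, not part of any accelerate integration:

```python
# Sketch of XLA FSDP usage (API from pytorch/xla#3431).
# Model, data, and hyperparameters below are illustrative placeholders.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP


def _mp_fn(index):
    device = xm.xla_device()

    # Wrapping the module shards its parameters across the data-parallel workers.
    model = FSDP(nn.Linear(1024, 1024).to(device))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for _ in range(10):
        inputs = torch.randn(8, 1024, device=device)
        loss = model(inputs).sum()
        loss.backward()
        # FSDP already reduce-scatters gradients, so call optimizer.step()
        # directly instead of xm.optimizer_step(optimizer).
        optimizer.step()
        optimizer.zero_grad()
        xm.mark_step()


if __name__ == "__main__":
    xmp.spawn(_mp_fn, args=())
```

An accelerate integration would presumably hide the wrapping and the per-step XLA details behind the usual `Accelerator` workflow, which is what this request is about.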

@muellerzr (Collaborator)

Once the next release of PyTorch XLA is out, we'll start taking a look at this.

muellerzr self-assigned this on Jun 14, 2022
muellerzr added the TPU (Bug or feature on TPU platforms) label on Jun 21, 2022
muellerzr added the feature request (Request for a new feature to be added to Accelerate) label on Jul 16, 2022
huggingface deleted a comment from github-actions bot on Jul 16, 2022
Vatshank commented Nov 3, 2022

Hey @muellerzr, is there ongoing work on adding XLA support for FSDP? We, on the AWS SageMaker Training Compiler side, have started looking into XLA-FSDP and might be able to contribute to adding such support to accelerate.

@muellerzr (Collaborator)

@Vatshank not yet! It's the next thing on my list to get to after TPU pod support, so I'd love the help if you guys can pitch in! 🙏

Vatshank commented Nov 3, 2022

Okay cool @muellerzr! Although our focus is on GPUs, I am sure there will be significant overlap in the code for adding support for either device type.

What do you think would be a good way to discuss some of these implementation details? A shared Slack group for development, for instance? I'm also happy to continue to bug you on GitHub, if that's preferred :)

@muellerzr (Collaborator)

@Vatshank this GitHub issue should be fine!

@JackCaoG

@AlexWertheim With your recent PR, can we call this request done?

@AlexWertheim

> @AlexWertheim With your recent PR, can we call this request done?

Yeah, I think so. For reference, the PR in question can be seen here. @muellerzr can say better than I can whether this fulfills all requirements where accelerate is concerned.
