Feature request: FSDP for TPUs #422
Comments
Once the next release of PyTorch XLA is out, we'll start taking a look at this.
Hey @muellerzr, is there ongoing work for adding XLA support to FSDP? We, on the AWS SageMaker training compiler side, have started looking into XLA-FSDP and might be able to contribute to adding such support to accelerate.
@Vatshank not yet! It's the next thing on my list to get to after TPU pod support, so would love the help if you guys can! 🙏
Okay cool @muellerzr! Although our focus is on GPUs, I am sure there will be significant overlap in the code for adding support for either device type. What do you think would be a good way to discuss some of these implementation details? If you guys have a shared Slack group for development, for instance. Also happy to continue to bug you on GitHub, if that's preferred :)
@Vatshank this GitHub issue should be fine!
@AlexWertheim With your recent PR, can we call this request done?
Yeah, I think so. For reference, the PR in question can be seen here. @muellerzr can say better than I can whether this fulfills all requirements where accelerate is concerned. |
A recent contribution to the pytorch_xla repo allows using FSDP in PyTorch XLA for sharding Module parameters across data-parallel workers. pytorch/xla#3431
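For context, here is a minimal sketch of what the XLA FSDP wrapper from that PR looks like from user code. The class and module path follow the torch_xla implementation; the toy model, shapes, and hyperparameters are placeholders, so check the pytorch_xla docs for your release.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

device = xm.xla_device()

# Toy model stands in for a real network; the FSDP wrapper shards its
# parameters across the data-parallel XLA workers.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
model = FSDP(model)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on dummy data.
inputs = torch.randn(8, 512, device=device)
targets = torch.randint(0, 10, (8,), device=device)

loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
optimizer.step()   # per the torch_xla FSDP notes, call step() directly rather than
                   # xm.optimizer_step(), since FSDP already reduces sharded gradients
xm.mark_step()     # materialize the lazy XLA graph
```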
Some motivation behind this: it may be possible to perform inference with OPT 30B on Google Colab without needing a Pro subscription, which I think many people will appreciate.
What will be needed to add it to accelerate?
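To make the question concrete, one hypothetical shape the integration could take from the user's side is sketched below, reusing accelerate's existing FullyShardedDataParallelPlugin and prepare() flow. Routing this through the XLA wrapper on TPUs is exactly the open work this issue asks about, so none of it is settled API.

```python
# Hypothetical sketch only: assumes accelerate's existing FSDP plugin could be
# taught to wrap models with XlaFullyShardedDataParallel when running on TPUs.
import torch
import torch.nn as nn
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin()      # existing plugin; XLA routing is assumed
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

model = nn.Linear(512, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# prepare() is where accelerate would have to apply the XLA FSDP wrapper
# instead of (or in addition to) its current device placement logic.
model, optimizer = accelerator.prepare(model, optimizer)

inputs = torch.randn(8, 512, device=accelerator.device)
targets = torch.randint(0, 10, (8,), device=accelerator.device)

loss = nn.functional.cross_entropy(model(inputs), targets)
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
```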