Raise if fsdp plugin is unset #30

alex-jw-brooks · 2024-02-01T15:21:53Z

When we run multi-GPU training, we explicitly set:

trainer.accelerator.state.fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(model)

before launching the trainer. However, the Accelerator state doesn't always define the fsdp_plugin attribute, so this can crash in a cryptic way if we launch a multi-GPU run with torchrun using the wrong settings.

The Accelerate source sets the fsdp_plugin of the Accelerator state for multi-GPU configurations only when ACCELERATE_USE_FSDP is set. If you don't set ACCELERATE_USE_FSDP, the attribute won't be defined.

This PR explicitly checks whether the accelerator state has an fsdp_plugin before trying to update the auto wrap policy. If it doesn't, it raises an AttributeError with a hint that you probably need to set ACCELERATE_USE_FSDP to run with a multi-GPU configuration.
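As a rough illustration of the check described above (a minimal sketch, not the PR's actual diff), the guard could look something like the following. The helper name `set_auto_wrap_policy` and the `make_trainer` stand-in are hypothetical and exist only so the example runs without Accelerate installed; treating a plugin that is set to None the same as a missing one is also an assumption of this sketch.

```python
from types import SimpleNamespace


def set_auto_wrap_policy(trainer, auto_wrap_policy):
    """Hypothetical helper: set the FSDP auto wrap policy or fail with a hint."""
    state = trainer.accelerator.state
    # getattr with a default covers a missing attribute; the `is None` check
    # below also covers an attribute that exists but is set to None.
    fsdp_plugin = getattr(state, "fsdp_plugin", None)
    if fsdp_plugin is None:
        raise AttributeError(
            "Accelerator state has no fsdp_plugin; you probably need to set "
            "ACCELERATE_USE_FSDP to run with a multi-GPU configuration."
        )
    fsdp_plugin.auto_wrap_policy = auto_wrap_policy


def make_trainer(**state_attrs):
    """Hypothetical stand-in for a trainer with an accelerator state."""
    state = SimpleNamespace(**state_attrs)
    return SimpleNamespace(accelerator=SimpleNamespace(state=state))


if __name__ == "__main__":
    # No fsdp_plugin on the state: the helper raises with the hint.
    try:
        set_auto_wrap_policy(make_trainer(), auto_wrap_policy="some_policy")
    except AttributeError as err:
        print(err)
```

In real use, `trainer` would be the actual trainer object and `auto_wrap_policy` the result of `fsdp_auto_wrap_policy(model)`.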

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

fabianlim · 2024-03-08T00:39:08Z

@alex-jw-brooks @Ssukriti I don't think this change is needed if #53 is merged. This is because we now explicitly check trainer.is_fsdp_enabled, which will be True only if FSDP is enabled.
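For illustration only, a guard of that kind might look roughly like this. This is a sketch under assumptions, not the code from #53: the helper name `maybe_set_auto_wrap_policy` is hypothetical, and the trainer here is a plain stand-in object rather than a real trainer.

```python
from types import SimpleNamespace


def maybe_set_auto_wrap_policy(trainer, auto_wrap_policy):
    """Hypothetical helper: only touch fsdp_plugin when FSDP is enabled.

    Returns True if the policy was set, False if FSDP is not enabled.
    """
    # Defaulting to False keeps the sketch safe on stand-in objects
    # that do not define is_fsdp_enabled at all.
    if not getattr(trainer, "is_fsdp_enabled", False):
        return False
    trainer.accelerator.state.fsdp_plugin.auto_wrap_policy = auto_wrap_policy
    return True


if __name__ == "__main__":
    # Stand-in trainer with FSDP disabled and no fsdp_plugin on the state.
    trainer = SimpleNamespace(
        is_fsdp_enabled=False,
        accelerator=SimpleNamespace(state=SimpleNamespace()),
    )
    # The missing fsdp_plugin is never accessed, so nothing crashes.
    maybe_set_auto_wrap_policy(trainer, "some_policy")
```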

alex-jw-brooks · 2024-03-08T23:01:53Z

Hey @fabianlim, that sounds good - I'm not super familiar with the Accelerate codebase around the plugin, but I assume that will handle the current attribute error and that the fsdp_plugin will always be defined.

I know that a lot of default args are set to None in Accelerate, including the fsdp_plugin in the AcceleratorState. It might be a good idea to take a closer look at some point to make sure we can't end up with fsdp_plugin defined as a state attribute but set to None, since that would break in a similar way. For now, let's close this PR in favor of the other one though!

Raise if fsdp plugin is unset

dcab584

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

alex-jw-brooks marked this pull request as ready for review February 1, 2024 15:21

Ssukriti requested a review from lchu6 February 1, 2024 17:20

alex-jw-brooks closed this Mar 8, 2024
