fix fsdp_auto_wrap_policy #2167
Thanks for working on this fix. Could you give an example of a model that would currently fail but work with this fix? Ideally, we can build a unit test based on this.
I could not replicate for Donut:
>>> from transformers import DonutSwinPreTrainedModel, Pix2StructForConditionalGeneration
>>> from peft.utils.other import fsdp_auto_wrap_policy
>>> model = DonutSwinPreTrainedModel.from_pretrained('naver-clova-ix/donut-base')
>>> model._no_split_modules
['DonutSwinStage']
>>> fsdp_auto_wrap_policy(model)  # works
For Pix2struct, I do get:
>>> model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-base")
>>> model._no_split_modules
None
>>> fsdp_auto_wrap_policy(model)
Exception: Could not find the transformer layer class to wrap in the model.
but of course I don't need to use PEFT's
In the Transformers |
Thanks a lot for the pointer, it makes sense that Let's add a small test to ensure that the function does not fail in such cases. We already have a test class here: peft/tests/test_gpu_examples.py Line 3737 in e8259ff
We can just add the new test in there. As to the model, I think it's sufficient to just create a custom model and check that calling
def test_fsdp_auto_wrap_policy_does_not_raise_on_custom_model(self):
    # See #2167
    # Avoid raising on custom models since Trainer uses fsdp_auto_wrap_policy automatically for PEFT + FSDP
    class MyModule(nn.Module):
        def __init__(self):
            super().__init__()
            self.lin = nn.Linear(2, 3)

    fsdp_auto_wrap_policy(MyModule())  # does not raise
Thanks for investigating the issue and providing a fix. LGTM.
You already have a toy model called SimpleModel. I am using it for testing.
Good catch.
Fix fsdp_auto_wrap_policy failing when both FSDP_TRANSFORMER_CLS_TO_WRAP and the model's _no_split_modules are None.
@BenjaminBossan @sayakpaul
Fixes #2166
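For readers who want to see the failure mode in isolation, here is a minimal sketch. It is not the code from this PR. It assumes FSDP_TRANSFORMER_CLS_TO_WRAP is read as a comma-separated environment variable and that the model's _no_split_modules serves as the default; the helper name resolve_cls_names_to_wrap is hypothetical. The point it illustrates is plain Python behavior: "".split(",") returns [""], so an unset variable combined with a missing _no_split_modules can produce one empty class name unless it is filtered out.

```python
import os
from typing import List, Optional


def resolve_cls_names_to_wrap(
    no_split_modules: Optional[List[str]],
    env_var: str = "FSDP_TRANSFORMER_CLS_TO_WRAP",  # assumed to be an environment variable
) -> List[str]:
    """Hypothetical helper: return the layer class names to wrap (possibly empty)."""
    # Treat a missing _no_split_modules (None) the same as an empty list.
    default = ",".join(no_split_modules or [])
    raw = os.environ.get(env_var, default)
    # "".split(",") yields [""], so drop empty entries; otherwise a later
    # lookup by class name would be asked to find a class called "".
    return [name.strip() for name in raw.split(",") if name.strip()]


if __name__ == "__main__":
    print(resolve_cls_names_to_wrap(None))
    print(resolve_cls_names_to_wrap(["DonutSwinStage"]))
```

With this shape, a custom model that defines no _no_split_modules simply yields no transformer classes to wrap, rather than an error about a class that could not be found.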