
use duck-typing to ensure underlying optimizer supports schedulefree hooks #3055

Merged
merged 2 commits into huggingface:main on Sep 2, 2024

Conversation

@tmm1 (Contributor) commented on Aug 27, 2024

The new duck-typing approach in huggingface/transformers#30079 is causing build failures over there because of the code path through accelerate:

/usr/local/lib/python3.10/site-packages/transformers/trainer.py:3448: in training_step
    self.optimizer.train()
/usr/local/lib/python3.10/site-packages/accelerate/optimizer.py:128: in train
    return self.optimizer.train()
E   AttributeError: 'AdamW' object has no attribute 'train'

We need to replicate that duck-typing logic here, ensuring this code path is safe to call for all optimizers.
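
For reference, a minimal sketch of the kind of duck-typed guard being added to accelerate's optimizer wrapper; this is simplified, and the merged code in accelerate/optimizer.py may differ in detail:

    class AcceleratedOptimizer:
        """Simplified stand-in for accelerate's optimizer wrapper (sketch only)."""

        def __init__(self, optimizer):
            self.optimizer = optimizer

        def train(self):
            # Forward the hook only if the wrapped optimizer actually defines it
            # (e.g. schedule-free optimizers). A plain torch.optim.AdamW does not,
            # so this becomes a no-op instead of raising AttributeError.
            if hasattr(self.optimizer, "train") and callable(self.optimizer.train):
                return self.optimizer.train()

        def eval(self):
            if hasattr(self.optimizer, "eval") and callable(self.optimizer.eval):
                return self.optimizer.eval()

With a guard like this, Trainer can call self.optimizer.train() and self.optimizer.eval() unconditionally, and the calls only reach optimizers that implement the schedule-free hooks.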

cc @amyeroberts @muellerzr @winglian #2631

@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tmm1 (Contributor, Author) commented on Aug 29, 2024

friendly ping cc @amyeroberts @muellerzr

@muellerzr (Collaborator) left a comment

Thanks, solution makes sense to me. cc @BenjaminBossan

@BenjaminBossan (Member) left a comment

Thanks, LGTM.

Just for my understanding: This is necessary because in transformers, we want to do stuff like:

    if hasattr(self.optimizer, "eval") and callable(self.optimizer.eval):
        self.optimizer.eval()

In accelerate, it was assumed that optimizer.train() and optimizer.eval() would only be called if the underlying optimizer supports them, but with the proposed change to transformers they are called whenever the method exists on the wrapper, which breaks that assumption.

IMO this could be confusing to debug and it would be better if there were a dedicated method to check this, like: if optimizer.supports_train_eval_mode() or so. But overall, the proposed solution is also okay.
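
For illustration, a rough sketch of the dedicated check being suggested; supports_train_eval_mode() is hypothetical and does not exist in accelerate:

    # Hypothetical sketch only; the method name follows the suggestion above.
    class AcceleratedOptimizer:
        def __init__(self, optimizer):
            self.optimizer = optimizer

        def supports_train_eval_mode(self) -> bool:
            # True only for wrapped optimizers (e.g. schedule-free) that expose
            # both hooks; callers can then branch explicitly instead of relying
            # on hasattr checks against the wrapper.
            return callable(getattr(self.optimizer, "train", None)) and callable(
                getattr(self.optimizer, "eval", None)
            )

Trainer could then write "if self.optimizer.supports_train_eval_mode(): self.optimizer.eval()", which makes the intent explicit.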

@amyeroberts commented

IMO this could be confusing to debug and it would be better if there were a dedicated method to check this, like: if optimizer.supports_train_eval_mode() or so. But overall, the proposed solution is also okay.

Agreed - this would be a nicer way to handle it!

@muellerzr (Collaborator) commented

@BenjaminBossan while that's a good idea, there's the problem of minimum accelerate versions. We can do this: in a follow-up I'll add supports_train_eval(), and then on the Trainer side we can do some version-dependent if/else until the minimum Trainer requirement is accelerate 1.0+ (since that's probably the release it would land in).
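
A hypothetical sketch of the version-gated path described here; supports_train_eval() and the 1.0.0 cutoff are illustrative only and not part of any released API:

    from packaging import version

    import accelerate


    def optimizer_eval(optimizer):
        # Once the minimum accelerate requirement reaches the release that ships
        # the dedicated check, use it; until then, fall back to duck-typing.
        if version.parse(accelerate.__version__) >= version.parse("1.0.0"):
            if optimizer.supports_train_eval():  # hypothetical follow-up API
                optimizer.eval()
        elif hasattr(optimizer, "eval") and callable(optimizer.eval):
            optimizer.eval()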

@muellerzr merged commit 1d09a20 into huggingface:main on Sep 2, 2024
25 checks passed