Recommended Adafactor settings for T5 cause error #7789
Comments
I think the docs should recommend Adafactor(model.parameters(), relative_step=True, warmup_init=True, lr=None). Want to fix it?
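With `relative_step=True` and `lr=None`, Adafactor computes its step size from the step count instead of a fixed learning rate. The sketch below restates that relative-step schedule under stated assumptions: a relative step of min(1e-2, 1/sqrt(t)), or min(1e-6·t, 1/sqrt(t)) when `warmup_init=True`, optionally scaled by max(eps2, RMS of the parameter) to mimic `scale_parameter=True`. The helper `adafactor_step_size` and its arguments are hypothetical and are not part of the transformers API.

```python
import math


def adafactor_step_size(step, warmup_init=False, param_rms=None, eps2=1e-3):
    """Hypothetical helper: relative step size used when relative_step=True.

    step: 1-based optimizer step count.
    param_rms: RMS of the parameter tensor; when given, the step is scaled
    by max(eps2, param_rms), mimicking scale_parameter=True.
    """
    # With warmup_init the step grows linearly (1e-6 * t) before decaying;
    # otherwise it is capped at 1e-2.
    min_step = 1e-6 * step if warmup_init else 1e-2
    rel_step = min(min_step, 1.0 / math.sqrt(step))
    # No parameter scaling unless an RMS value is supplied.
    scale = 1.0 if param_rms is None else max(eps2, param_rms)
    return scale * rel_step


for t in (1, 100, 10_000, 1_000_000):
    print(t, adafactor_step_size(t), adafactor_step_size(t, warmup_init=True))
```

Under these assumptions the warmup variant ramps up to about 1e-2 at step 10,000, after which both schedules coincide and decay as 1/sqrt(t). Since `warmup_init` only alters this relative schedule, it has nothing to act on when a fixed `lr` is used.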
I think what corresponds to the original T5 training code is
Hello @OyvindTafjord, have you been able to fine-tune T5 with Adafactor? Thanks, Sonali
No, I haven't looked any further into the slowness and the NaNs I was getting.
This issue persists (i.e., the suggested defaults still produce the error). I can confirm that
Environment info
transformers version: 3.3.1
Who can help
@sshleifer (from activity on Adafactor PRs)
Information
Model I am using (Bert, XLNet ...): T5
The problem arises when using:
The task I am working on is:
To reproduce
The Adafactor docs recommend the following settings for T5:
Adafactor(model.parameters(), lr=1e-3, relative_step=False, warmup_init=True)
However, the init code then has:
which makes this setting impossible, and simply switching to relative_step=True fails as well. So something seems to be missing, either in the recommendations or in the implementation. Thanks!
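The `__init__` excerpt did not survive in this copy of the issue. The sketch below restates the two constraints the report implies: `warmup_init=True` requires `relative_step=True`, and a manual `lr` cannot be combined with `relative_step=True`. The names `check_adafactor_args` and `is_valid` are hypothetical, and the error messages are not the library's exact wording.

```python
def check_adafactor_args(lr=None, relative_step=True, warmup_init=False):
    """Hypothetical restatement of the two checks implied by this issue."""
    # A manual lr conflicts with the internally computed relative step size.
    if lr is not None and relative_step:
        raise ValueError("Cannot combine a manual lr with relative_step=True")
    # warmup_init only modifies the relative-step schedule.
    if warmup_init and not relative_step:
        raise ValueError("warmup_init=True requires relative_step=True")


def is_valid(**kwargs):
    """Return True if the sketched checks accept the given settings."""
    try:
        check_adafactor_args(**kwargs)
        return True
    except ValueError:
        return False


settings = {
    "docs recommendation": dict(lr=1e-3, relative_step=False, warmup_init=True),
    "relative_step=True with lr": dict(lr=1e-3, relative_step=True, warmup_init=True),
    "suggested in comments": dict(lr=None, relative_step=True, warmup_init=True),
    "fixed lr, no warmup": dict(lr=1e-3, relative_step=False, warmup_init=False),
}
for name, kwargs in settings.items():
    print(f"{name}: {'OK' if is_valid(**kwargs) else 'ValueError'}")
```

Under this reading, the documented setting fails the warmup check, and flipping only `relative_step` to True then fails the manual-`lr` check. Of the four settings listed, only the last two pass.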