Fix Gemma RMSNorm weight init #40449
|
Should this be here instead of in the Gemma modeling files? Leaving it to @ArthurZucker |
I asked exactly that in #40224, and @ArthurZucker suggested doing it that way. |
Oops! Sorry @albertz! The best way to do it:
- you just update the `_init_weights` of gemma3 and gemma
- normally this should set `module._is_hf_initialized = True`, thus preventing the initialization from being done twice
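A minimal sketch of the guard described above, in plain Python with no dependencies. Only the flag name `_is_hf_initialized` comes from the comment; `ToyRMSNorm` and `init_weights` are hypothetical stand-ins and do not reproduce the actual transformers code.

```python
class ToyRMSNorm:
    """Hypothetical stand-in for a Gemma-style RMSNorm module."""

    def __init__(self, dim):
        # Placeholder values, as if the module had not been initialized yet.
        self.weight = [1.0] * dim


def init_weights(module):
    """Initialize a module once; return True only if work was done."""
    # Skip modules that were already initialized (assumed flag semantics).
    if getattr(module, "_is_hf_initialized", False):
        return False
    if isinstance(module, ToyRMSNorm):
        # Gemma-style norms scale by (1 + weight), so start from zero.
        module.weight = [0.0] * len(module.weight)
    # Mark the module so a second pass leaves it untouched.
    module._is_hf_initialized = True
    return True


norm = ToyRMSNorm(3)
first_pass = init_weights(norm)   # initializes the weight to zeros
norm.weight = [5.0] * 3           # e.g. values loaded afterwards
second_pass = init_weights(norm)  # skipped because of the flag
```

With the flag in place, the second call leaves the weights that were set after the first pass unchanged.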
|
[For maintainers] Suggested jobs to run (before merge) run-slow: gemma |
|
I updated the code for Gemma, i.e. I did not update the code in Similarly, I also did not update Similarly, I also did not update As far as I understand the code, I don't need to set |
…com/albertz/transformers into albert-fix-gemma-rmsnorm-init-40224
|
I just noticed that the changes I proposed have already been done in the meantime? |
|
Sorry about that @albertz, I didn't know that there was already a different PR for this :/ I got to it because of a different issue I noticed. Thanks nonetheless for the PR 🤗 |
What does this PR do?
The Gemma RMSNorm weight is used additively, as in `... * (1 + weight)`, thus it should be initialized with zero.
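To illustrate why zero is the neutral starting value, here is a dependency-free sketch of an RMSNorm that scales by `(1 + weight)`. The function name `gemma_rms_norm`, the `eps` default, and the sample values are assumptions made for illustration; this is not the transformers implementation.

```python
import math


def gemma_rms_norm(x, weight, eps=1e-6):
    """Normalize x by its root mean square, then scale by (1 + weight)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]


x = [1.0, -2.0, 3.0, -4.0]

# Zero weight: the effective scale is 1, so the output is just the normalized input.
zeros_out = gemma_rms_norm(x, [0.0] * len(x))

# Weight of ones (the usual init for a plain `* weight` norm): the effective scale is 2.
ones_out = gemma_rms_norm(x, [1.0] * len(x))
```

In this sketch, initializing the weight to ones doubles every output relative to the zero-initialized case, which is the kind of unintended scaling the PR is meant to avoid.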
Fixes #40224
Who can review?
@ArthurZucker