Conversation


@albertz albertz commented Aug 26, 2025

What does this PR do?

The Gemma RMSNorm weight is applied additively, as `... * (1 + weight)`,
so it should be initialized to zero rather than one.
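
For context, a minimal sketch of a Gemma-style RMSNorm (illustrative, not the exact transformers implementation): the learned weight scales the normalized activations as `(1 + weight)`, so a zero-valued weight is the identity.

```python
import torch
from torch import nn


class GemmaStyleRMSNorm(nn.Module):
    """Sketch of a Gemma-style RMSNorm whose weight acts as (1 + weight)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Zero init: (1 + 0) leaves the normalized activations unchanged.
        self.weight = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Root-mean-square normalization over the last dimension.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        # The weight enters additively: output = x_norm * (1 + weight).
        return x * rms * (1.0 + self.weight)
```

With the usual LayerNorm/RMSNorm convention of initializing `weight` to one, the effective scale here would start at 2 instead of 1, which is the mis-initialization this PR addresses.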

Fixes #40224

Who can review?

@ArthurZucker

@Rocketknight1
Member

Should this be here instead of in the Gemma modeling files? Leaving it to @ArthurZucker

@albertz
Author

albertz commented Aug 26, 2025

Should this be here instead of in the Gemma modeling files? Leaving it to @ArthurZucker

I asked exactly about that in #40224 and @ArthurZucker suggested to do it that way.

Collaborator

@ArthurZucker ArthurZucker left a comment


Oops! Sorry @albertz! The best way to do it:

  1. you just update the _init_weights of gemma3 and gemma
  2. normally this should set module._is_hf_initialized = True,
     thus preventing it from being done twice
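
The two steps above could look roughly like this (a hedged sketch following transformers-style conventions; `GemmaRMSNorm` and the surrounding class are minimal stand-ins, not the merged code):

```python
import torch
from torch import nn


class GemmaRMSNorm(nn.Module):
    """Stand-in norm whose weight is used as (1 + weight) in forward."""

    def __init__(self, dim: int):
        super().__init__()
        # Deliberately non-zero start, to show _init_weights at work.
        self.weight = nn.Parameter(torch.ones(dim))


class TinyGemmaLike(nn.Module):
    """Minimal model illustrating step 1: zero-init the RMSNorm weight."""

    def __init__(self):
        super().__init__()
        self.norm = GemmaRMSNorm(8)
        self.apply(self._init_weights)

    def _init_weights(self, module: nn.Module) -> None:
        if isinstance(module, GemmaRMSNorm):
            # Because forward uses (1 + weight), zero is the identity init.
            module.weight.data.zero_()
            # Step 2: in transformers, marking the module as initialized
            # prevents it from being re-initialized a second time.
            module._is_hf_initialized = True
```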

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma

@albertz
Author

albertz commented Sep 30, 2025

I updated the code for Gemma, i.e. GemmaModel in .../gemma/modular_gemma.py.

I did not update the code in .../gemma/modeling_gemma.py, as that file is automatically generated? How would I update it?

Similarly, I also did not update Gemma2Model, as this derives from GemmaModel, and should already be correct? Or not?

Similarly, I also did not update Gemma3TextModel.

As far as I understand the code, I don't need to set module._is_hf_initialized = True, as the _initialize_weights method already does this?

@albertz
Author

albertz commented Sep 30, 2025

I just noticed that the changes I proposed have already been done in the meantime?

@albertz
Author

albertz commented Sep 30, 2025

#40796 (b4ba4e1) by @vasqu.

I guess this PR here can be closed then.

@albertz albertz closed this Sep 30, 2025
@albertz albertz deleted the albert-fix-gemma-rmsnorm-init-40224 branch September 30, 2025 12:18
@vasqu
Contributor

vasqu commented Sep 30, 2025

Sorry about that @albertz, I didn't know there was already a PR for this :/ I got to it due to a different issue I noticed.

Thanks nonetheless for the PR 🤗

