Conversation


@albertz albertz commented Aug 26, 2025

What does this PR do?

The Gemma RMSNorm weight is applied additively, as `... * (1 + weight)`,
so it should be initialized to zero rather than one.
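
For context, a minimal sketch of a Gemma-style RMSNorm (illustrative, not the exact transformers implementation): the learned weight scales the normalized activations as `(1 + weight)`, so a zero-valued weight is the identity.

```python
import torch
from torch import nn


class GemmaStyleRMSNorm(nn.Module):
    """Sketch of a Gemma-style RMSNorm whose weight acts as (1 + weight)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Zero init: (1 + 0) leaves the normalized activations unchanged.
        self.weight = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Root-mean-square normalization over the last dimension.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        # The weight enters additively: output = x_norm * (1 + weight).
        return x * rms * (1.0 + self.weight)
```

With the usual LayerNorm/RMSNorm convention of initializing `weight` to one, the effective scale here would start at 2 instead of 1, which is the mis-initialization this PR addresses.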

Fixes #40224

Who can review?

@ArthurZucker

@Rocketknight1
Member

Should this be here instead of in the Gemma modeling files? Leaving it to @ArthurZucker

@albertz
Author

albertz commented Aug 26, 2025

Should this be here instead of in the Gemma modeling files? Leaving it to @ArthurZucker

I asked exactly about that in #40224 and @ArthurZucker suggested to do it that way.

Collaborator

@ArthurZucker ArthurZucker left a comment


Oops! Sorry @albertz! The best way to do it:

  1. you just update the _init_weights of gemma3 and gemma
  2. normally this should set module._is_hf_initialized = True,
     thus preventing it from being done twice
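
The two steps above could look roughly like this (a hedged sketch following transformers-style conventions; `GemmaRMSNorm` and the surrounding class are minimal stand-ins, not the merged code):

```python
import torch
from torch import nn


class GemmaRMSNorm(nn.Module):
    """Stand-in norm whose weight is used as (1 + weight) in forward."""

    def __init__(self, dim: int):
        super().__init__()
        # Deliberately non-zero start, to show _init_weights at work.
        self.weight = nn.Parameter(torch.ones(dim))


class TinyGemmaLike(nn.Module):
    """Minimal model illustrating step 1: zero-init the RMSNorm weight."""

    def __init__(self):
        super().__init__()
        self.norm = GemmaRMSNorm(8)
        self.apply(self._init_weights)

    def _init_weights(self, module: nn.Module) -> None:
        if isinstance(module, GemmaRMSNorm):
            # Because forward uses (1 + weight), zero is the identity init.
            module.weight.data.zero_()
            # Step 2: in transformers, marking the module as initialized
            # prevents it from being re-initialized a second time.
            module._is_hf_initialized = True
```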

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma

@albertz
Author

albertz commented Sep 30, 2025

I updated the code for Gemma, i.e. GemmaModel in .../gemma/modular_gemma.py.

I did not update the code in .../gemma/modeling_gemma.py, as that file is automatically generated? How would I update it?

Similarly, I also did not update Gemma2Model, as this derives from GemmaModel, and should already be correct? Or not?

Similarly, I also did not update Gemma3TextModel.

As far as I understand the code, I don't need to set module._is_hf_initialized = True, as the _initialize_weights method already does this?

@albertz
Author

albertz commented Sep 30, 2025

I just noticed that the changes I proposed have already been done in the meantime?

@albertz
Author

albertz commented Sep 30, 2025

#40796 (b4ba4e1) by @vasqu.

I guess this PR here can be closed then.

@albertz albertz closed this Sep 30, 2025
@albertz albertz deleted the albert-fix-gemma-rmsnorm-init-40224 branch September 30, 2025 12:18
@vasqu
Contributor

vasqu commented Sep 30, 2025

Sorry about that @albertz, I didn't know there was already a PR for this :/ I got to it due to a different issue I noticed.

Thanks nonetheless for the PR 🤗

