
layernorm1p fix #7523

Merged
merged 10 commits into r1.21.0 from dpykhtar/layernorm1p_fix on Sep 28, 2023

Conversation

dimapihtar
Collaborator

What does this PR do ?

Passes the layernorm_zero_centered_gamma param properly, which fixes the layernorm1p issue.
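For context, a minimal reference sketch of what "layernorm1p" (zero-centered gamma) computes; the shapes and eps below are illustrative assumptions, not the NeMo/TE implementation:

import torch

def layernorm1p(x, gamma, beta, eps=1e-5):
    # The learned weight is stored zero-centered and applied as (1 + gamma), so a freshly
    # initialized gamma of 0 behaves like a standard LayerNorm gamma of 1.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    return (1 + gamma) * (x - mean) / torch.sqrt(var + eps) + beta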

Collection: [Note which collection this PR will affect]

Changelog

  • Map normalization: 'layernorm1p' to layernorm_zero_centered_gamma=True when building the megatron-core TransformerConfig (build_transformer_config), and update the GPT config accordingly.

Usage

  • A usage example is sketched below.
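A minimal sketch of how the fixed option might be selected, assuming the standard megatron GPT config layout; the YAML path and override style are illustrative, not part of this PR:

from omegaconf import OmegaConf

# Hypothetical example: load a model config and request layernorm1p normalization.
# The config path below is an assumption about where the GPT config lives.
cfg = OmegaConf.load("examples/nlp/language_modeling/conf/megatron_gpt_config.yaml")
cfg.model.normalization = "layernorm1p"  # with this fix, zero-centered gamma is passed through properly
print(OmegaConf.to_yaml(cfg.model))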

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Collaborator

gshennvm left a comment

Can you add an option to pass normalization == "layernorm1p"? If we use layernorm1p, this code still crashes.

dimapihtar and others added 3 commits September 27, 2023 19:15
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
dimapihtar requested a review from gshennvm on September 27, 2023 16:28
@@ -72,6 +72,7 @@ model:
kv_channels: null # Projection weights dimension in multi-head attention. Set to hidden_size // num_attention_heads if null
apply_query_key_layer_scaling: False # scale Q * K^T by 1 / layer-number.
normalization: 'layernorm' # Normalization layer to use. Options are 'layernorm', 'rmsnorm'
layernorm_zero_centered_gamma: True
Collaborator

I don't think we need this flag. When normalization is 'layernorm1p' we will overwrite this flag to be True anyway, so we can hide it from the user to keep it consistent with the previous NeMo activation selection.
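A minimal sketch of the mapping described in this comment (not the exact PR diff; the helper name and defaults are assumptions):

def resolve_normalization(cfg: dict):
    # Fold the NeMo-style 'layernorm1p' option into a normalization kind plus a
    # zero-centered-gamma flag, as discussed above.
    normalization = cfg.get('normalization', 'layernorm')
    zero_centered_gamma = cfg.get('layernorm_zero_centered_gamma', False)
    if normalization == 'layernorm1p':
        # 'layernorm1p' is regular LayerNorm with a zero-centered (1 + gamma) weight
        normalization = 'layernorm'
        zero_centered_gamma = True
    return normalization, zero_centered_gamma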

@@ -1483,10 +1483,14 @@ def build_transformer_config(self) -> TransformerConfig:
activation_func = activation_to_func(activation)

normalization = self.cfg.get('normalization', 'layernorm')
layernorm_zero_centered_gamma = self.cfg.get('layernorm_zero_centered_gamma', 'False')
Collaborator

See the above comment. This can be set to False instead.

Collaborator

gshennvm, Sep 27, 2023

Can you change this line to layernorm_zero_centered_gamma = False?

Collaborator (Author)

dimapihtar, Sep 27, 2023

@gshennvm I think it's better to leave it as it is, since we have the layernorm_zero_centered_gamma param in Launcher configs for improved models. What do you think?

Collaborator

I see, OK with me. Users can customize it if needed.

Contributor

Why do we need both configs, layernorm1p and layernorm_zero_centered_gamma, in the launcher?

Collaborator

@MaximumEntropy Given that we're moving to mcore, a user coming from mcore is likely not going to be exposed to the layernorm1p argument. I was thinking maybe we can have both options (i.e., if the user is more used to mcore they can say LayerNorm + layernorm_zero_centered_gamma). WDYT?
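A hedged illustration of the two equivalent ways a user could express this, per the discussion above (keys mirror the model config shown in this PR; values are illustrative):

# Two ways to request the same zero-centered-gamma LayerNorm.
nemo_style = {"normalization": "layernorm1p"}
mcore_style = {"normalization": "layernorm", "layernorm_zero_centered_gamma": True}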

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
ericharper merged commit 84bbad2 into r1.21.0 on Sep 28, 2023
ericharper deleted the dpykhtar/layernorm1p_fix branch on September 28, 2023 22:53
github-actions bot pushed a commit that referenced this pull request Sep 28, 2023
* layernorm1p fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add layernorm1p to if statement

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* config changes

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* gpt config changes

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* remove layernorm_zero_centered_gamma from gpt config

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* change line

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
dimapihtar added a commit that referenced this pull request on Oct 10, 2023, carrying the same commit message as above.
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request on Oct 10, 2023, carrying the same commit message as above.
erastorgueva-nv pushed a commit to erastorgueva-nv/NeMo that referenced this pull request on Oct 11, 2023, carrying the same commit message as above.
yaoyu-33 pushed a commit that referenced this pull request on Oct 16, 2023, carrying the same commit message as above.
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request on Jun 25, 2024, carrying the same commit message as above.