Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GPT-NeoX-20B past handling, attention computation #17811

Merged
merged 2 commits into from
Jun 30, 2022

Conversation

zphang
Copy link
Contributor

@zphang zphang commented Jun 21, 2022

What does this PR do?

  • Fixes GPT-NeoX-20B handing of the past object to correctly be used in .generate
  • Swaps attention computation for one more similar in the original training code, to hopefully avoid NaNs
  • Update docstring, removed unnecessary dropout configs in config object

Fixes # (issue)

#17632
#17452 (Hopefully)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@sgugger

@zphang zphang force-pushed the neox_20b_fixes_220621_v2 branch from f6c9561 to a84811d Compare June 21, 2022 22:42
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, I can confirm that with this PR, I get the same generations in float32 and float16 (whereas before, I get either a crappy one or some Nans in float16) for EleutherAI/gpt-neox-20b.

The cleaning up in the config LGTM, thanks for making the docstring on par with the defaults, and the two attributes you remove are not used anywhere.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jun 21, 2022

The documentation is not available anymore as the PR was closed or merged.

@sgugger
Copy link
Collaborator

sgugger commented Jun 21, 2022

There are a few equivalence tests failing with the PR, if you can dive into it. Let us know if you need any help!

@@ -635,7 +648,7 @@ def prepare_inputs_for_generation(self, input_ids, past=None, attention_mask=Non
attention_mask = input_ids.new_ones(input_shape)

# cut decoder_input_ids if past is used
if past is not None:
if past is not None and past[0] is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity why is the second statement needed here? The past[0] is not None part?

@@ -38,32 +38,28 @@ class GPTNeoXConfig(PretrainedConfig):


Args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning this up!

Copy link
Contributor

@patrickvonplaten patrickvonplaten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing @zphang !

@zphang zphang force-pushed the neox_20b_fixes_220621_v2 branch from ce7c60e to a84811d Compare June 29, 2022 01:03
@zphang
Copy link
Contributor Author

zphang commented Jun 29, 2022

I've run the tests locally and they pass, so I can't seem to reproduce the test errors. Can someone else give them a try?

@sgugger
Copy link
Collaborator

sgugger commented Jun 29, 2022

The tests pass on GPU but not on CPU on my side. So doing

CUDA_VISIBLE_DEVICES="" pytest tests/models/gpt_neox/test_modeling_gpt_neox.py

reproduces the failure.

@zphang zphang force-pushed the neox_20b_fixes_220621_v2 branch from c946908 to d2e9de9 Compare June 30, 2022 03:55
@sgugger sgugger merged commit 205bc41 into huggingface:main Jun 30, 2022
@sgugger
Copy link
Collaborator

sgugger commented Jun 30, 2022

Thanks again! Nice to be able to use GPT-Neo-X in float16 for generations :-)

viclzhu pushed a commit to viclzhu/transformers that referenced this pull request Jul 18, 2022
)

* Fix GPT-NeoX-20B past handling, swap attention computation to hopefully avoid NaN, update docs

* 20B tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants