Fix GPT-NeoX-20B past handling, attention computation #17811
f6c9561 to a84811d
Thanks for the fix. I can confirm that with this PR I get the same generations in float32 and float16 for EleutherAI/gpt-neox-20b, whereas before I got either a crappy generation or some NaNs in float16.
The cleaning up in the config LGTM, thanks for making the docstring on par with the defaults, and the two attributes you remove are not used anywhere.
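For readers unfamiliar with why float16 can produce NaNs in attention, here is a minimal, standard-library-only sketch of the general mechanism. It does not reproduce the change made in this PR, which is not shown in this thread; `to_fp16` and `softmax` are hypothetical helpers written for illustration. The only facts relied on are that half precision cannot represent values much above 65504 and that `inf - inf` is NaN.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to half precision; overflow becomes +/-inf.

    Hypothetical helper: struct's "e" format is IEEE 754 binary16 and
    raises OverflowError for values that do not fit.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def softmax(scores):
    """Plain softmax with the usual max-subtraction step."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# An attention score that is fine in float32 but too large for float16.
raw_scores = [70000.0, 1.0]
fp16_scores = [to_fp16(s) for s in raw_scores]  # first entry overflows to inf

print(softmax(raw_scores))   # finite probabilities
print(softmax(fp16_scores))  # inf - inf yields NaN, which spreads to every entry
```

Once a single score overflows, the max-subtraction step computes `inf - inf`, and the resulting NaN contaminates the normalizing sum, so every probability in the row becomes NaN.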
A few equivalence tests are failing with this PR; could you look into them? Let us know if you need any help!
@@ -635,7 +648,7 @@ def prepare_inputs_for_generation(self, input_ids, past=None, attention_mask=Non
             attention_mask = input_ids.new_ones(input_shape)
 
         # cut decoder_input_ids if past is used
-        if past is not None:
+        if past is not None and past[0] is not None:
Out of curiosity, why is the second condition needed here? The `past[0] is not None` part?
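The thread does not record an answer, so the following is only a sketch of what the extra condition changes, under the assumption that a caller might pass a tuple of empty per-layer placeholders such as `(None, None)` before any cache has been built. `cut_input_ids` is a hypothetical stand-in for the relevant branch of `prepare_inputs_for_generation`, and plain lists stand in for tensors.

```python
def cut_input_ids(input_ids, past=None):
    """Keep only the last token per sequence when a populated cache exists.

    Hypothetical stand-in for the guarded branch in the diff above.
    """
    if past is not None and past[0] is not None:
        # A real cache is present: earlier tokens are already encoded in it.
        return [row[-1:] for row in input_ids]
    # No cache, or only empty placeholders: the full prompt is still needed.
    return input_ids


prompt = [[11, 12, 13]]

print(cut_input_ids(prompt))                       # no past: full prompt
print(cut_input_ids(prompt, past=(None, None)))    # placeholders: full prompt
print(cut_input_ids(prompt, past=(("k", "v"),)))   # populated cache: last token only
```

With the original check (`past is not None` alone), the placeholder case would also trim the prompt to its last token, even though nothing has been cached yet.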
@@ -38,32 +38,28 @@ class GPTNeoXConfig(PretrainedConfig):

     Args:
Thanks for cleaning this up!
Thanks for fixing this, @zphang!
ce7c60e to a84811d
I've run the tests locally and they pass, so I can't reproduce the test errors. Can someone else give them a try?
The tests pass on GPU but not on CPU on my side. So doing
reproduces the failure.
…ly avoid NaN, update docs
c946908 to d2e9de9
Thanks again! Nice to be able to use GPT-NeoX in float16 for generations :-)
What does this PR do?
Fixes #17632 and (hopefully) #17452
Who can review?
@sgugger