[GPTQ] Fix test #28018
Conversation
Thanks for fixing!
Do you know why this breaking change wasn't caught on the optimum side?
Can you explain a bit more how these changes fix the problem? The changes look OK but it's not clear how they link to the issue description.
# we need to put it directly to the gpu. Otherwise, we won't be able to initialize the exllama kernel
quantized_model_from_saved = AutoModelForCausalLM.from_pretrained(
    tmpdirname, quantization_config=GPTQConfig(use_exllama=True, bits=4), device_map={"": 0}
)
self.assertEqual(quantized_model_from_saved.config.quantization_config.use_exllama, True)
Why remove this line?
With this PR, we no longer save all the arguments (only those in self.serialization_keys), by modifying to_dict(). The issue with that is that we update the quantization_config based on the one from optimum: config.quantization_config = GPTQConfig.from_dict_optimum(quantizer.to_dict()).
This line was needed since some args, like use_exllama, could change in optimum.
I was thinking of opening a PR to remove this line, and maybe to stop saving args related to inference (use_exllama, ...), or to revert the PR on optimum. What are your thoughts?
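To make the mechanism concrete, here is a minimal sketch (hypothetical names, not the actual optimum GPTQQuantizer code) of how whitelisting serialized keys in to_dict() drops runtime-only arguments such as use_exllama:

```python
# Illustrative sketch only -- not the actual optimum implementation.
class GPTQQuantizerSketch:
    # Hypothetical whitelist of attributes that get persisted.
    serialization_keys = ["bits", "group_size", "desc_act", "sym"]

    def __init__(self, bits=4, group_size=128, desc_act=False, sym=True, use_exllama=True):
        self.bits = bits
        self.group_size = group_size
        self.desc_act = desc_act
        self.sym = sym
        self.use_exllama = use_exllama  # inference-only setting, not in the whitelist

    def to_dict(self):
        # Only whitelisted keys end up in the saved config, so use_exllama
        # never reaches the serialized quantization_config.
        return {key: getattr(self, key) for key in self.serialization_keys}


quantizer = GPTQQuantizerSketch(use_exllama=False)
print(quantizer.to_dict())  # {'bits': 4, 'group_size': 128, 'desc_act': False, 'sym': True}
```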
This line was needed since some args, like use_exllama, could change in optimum.
Sorry, I don't completely follow. Does this mean that use_exllama will no longer change and the test check is no longer required?
I was thinking of opening a PR to remove this line, and maybe to stop saving args related to inference (use_exllama, ...)
It depends. This can be considered a breaking change, as users might now expect these values in their configs. The most important thing is for old configs to still be loadable and produce the same result.
Sorry, I don't completely follow. Does this mean that use_exllama will no longer change and the test check is no longer required?
Basically, I mean that the user can set use_exllama=True in transformers and this value can change in optimum (use_exllama=False). However, since we don't serialize it anymore in the optimum GPTQ config, use_exllama will be set to the default value through: config.quantization_config = GPTQConfig.from_dict_optimum(quantizer.to_dict()).
It depends. This can be considered a breaking change, as users might now expect these values in their configs. The most important thing is for old configs to still be loadable and produce the same result.
Yes, the old configs will still work. However, new users will have to pass these args each time.
I will probably work on the second option then, since we should not have had the user select the kernel in the first place, given that we can switch from one kernel to another.
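For illustration, here is a minimal sketch (hypothetical class, not the actual GPTQConfig implementation) of the fallback described above: when a key such as use_exllama is missing from the serialized dict, rebuilding the config reverts it to the constructor default, so a runtime value of use_exllama=False is lost on reload:

```python
from dataclasses import dataclass


# Illustrative stand-in for rebuilding a config from a serialized dict.
@dataclass
class GPTQConfigSketch:
    bits: int = 4
    use_exllama: bool = True  # constructor default

    @classmethod
    def from_dict(cls, config_dict):
        # Keys absent from the dict (e.g. use_exllama, no longer serialized
        # by optimum) silently fall back to the defaults above.
        known = {k: v for k, v in config_dict.items() if k in cls.__dataclass_fields__}
        return cls(**known)


saved = {"bits": 4}  # use_exllama=False was set at runtime but never saved
restored = GPTQConfigSketch.from_dict(saved)
print(restored.use_exllama)  # True -- reverted to the default
```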
@@ -242,12 +244,11 @@ def test_change_loading_attributes(self):
        with tempfile.TemporaryDirectory() as tmpdirname:
            self.quantized_model.save_pretrained(tmpdirname)
            if not self.use_exllama:
                self.assertEqual(self.quantized_model.config.quantization_config.use_exllama, False)
Why remove this line?
see above
Thanks for fixing the tests and explaining!
I'm approving as this is effectively just fixing the tests: the changes to the GPTQ config have already been made and we haven't experienced lots of users complaining. However, making changes like this can be breaking and cause a lot of issues downstream. Because of the coupling between optimum and the respective configs in transformers, I'd expect changes to the API to consider backwards compatibility, and sufficient tests to be run so that changes in optimum that affect transformers are caught before they're merged in.
@SunMarc Do you have permissions to merge? If not, I can merge this in if it's good to go
I'll merge it! Thanks for the reminder
* fix test
* reduce length
* smaller model
What does this PR do?
This PR fixes failing tests related to GPTQ quantization. The breaking tests are caused by a modification on the optimum side and by OOMs from the new runner. I've also switched to a smaller model. Related optimum PR