
[GPTQ] Fix test #28018

Merged · 3 commits · Jan 15, 2024
Conversation

@SunMarc (Member) commented on Dec 13, 2023:

What does this PR do?

This PR fixes failing tests related to GPTQ quantization. The failing tests are caused by a modification on the optimum side and by OOMs on the new runner. I've also switched to a smaller model. See the related optimum PR.
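To make the OOM fix concrete, here is a minimal, hedged sketch of what loading a smaller pre-quantized checkpoint onto a single GPU for the tests could look like. The model id below is a placeholder assumption, not necessarily the one this PR switches to.

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# Placeholder small GPTQ checkpoint; the PR's actual replacement model may differ.
model_id = "TheBloke/TinyLlama-1.1B-Chat-v0.3-GPTQ"

quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=GPTQConfig(bits=4, use_exllama=True),  # override inference options at load time
    device_map={"": 0},  # exllama kernels need the weights on a GPU
)
```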

@SunMarc requested a review from @amyeroberts on December 13, 2023 at 20:36
@amyeroberts (Collaborator) left a comment:

Thanks for fixing!

Do you know why this breaking change wasn't caught on the optimum side?
Can you explain a bit more how these changes fix the problem? The changes look OK but it's not clear how they link to the issue description.

# we need to put it directly on the GPU. Otherwise, we won't be able to initialize the exllama kernel
quantized_model_from_saved = AutoModelForCausalLM.from_pretrained(
    tmpdirname, quantization_config=GPTQConfig(use_exllama=True, bits=4), device_map={"": 0}
)
self.assertEqual(quantized_model_from_saved.config.quantization_config.use_exllama, True)
Collaborator:

Why remove this line?

SunMarc (Member Author):

With the optimum PR, we no longer save all the arguments (only those in self.serialization_keys), because to_dict() was modified. The issue is that we update the quantization_config based on the one from optimum: config.quantization_config = GPTQConfig.from_dict_optimum(quantizer.to_dict()). This line was needed since some args, like use_exllama, could change in optimum.

I was thinking of opening a PR to remove this line and maybe stop saving inference-related args (use_exllama, ...), or of reverting the PR on optimum. What are your thoughts?
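For clarity, here is a toy, self-contained sketch (not the actual optimum or transformers code) of the behaviour described above: once to_dict() only serializes the whitelisted keys, use_exllama is dropped from the round trip and the transformers-side default wins.

```python
# Toy sketch of why use_exllama falls back to its default after round-tripping
# through a to_dict() that only keeps whitelisted serialization keys.

class ToyOptimumQuantizer:
    # Stand-in for optimum's GPTQ quantizer after the serialization change.
    serialization_keys = ["bits", "group_size", "desc_act"]

    def __init__(self, bits=4, group_size=128, desc_act=False, use_exllama=False):
        self.bits = bits
        self.group_size = group_size
        self.desc_act = desc_act
        self.use_exllama = use_exllama  # inference-only arg, no longer serialized

    def to_dict(self):
        # Only the whitelisted keys survive serialization.
        return {key: getattr(self, key) for key in self.serialization_keys}


class ToyGPTQConfig:
    # Stand-in for transformers' GPTQConfig, with use_exllama defaulting to True.
    def __init__(self, bits=4, group_size=128, desc_act=False, use_exllama=True):
        self.bits = bits
        self.group_size = group_size
        self.desc_act = desc_act
        self.use_exllama = use_exllama

    @classmethod
    def from_dict_optimum(cls, config_dict):
        return cls(**config_dict)


quantizer = ToyOptimumQuantizer(use_exllama=False)  # value the user effectively chose
rebuilt = ToyGPTQConfig.from_dict_optimum(quantizer.to_dict())
print(rebuilt.use_exllama)  # True: use_exllama was dropped, so the default wins
```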

@amyeroberts (Collaborator) commented on Dec 15, 2023:

> This line was needed since some args, like use_exllama, could change in optimum.

Sorry, I don't completely follow. Does this mean that use_exllama will no longer change and the test check is no longer required?

> I was thinking of opening a PR to remove this line and maybe stop saving inference-related args (use_exllama, ...)

It depends. This can be considered a breaking change, as users might now expect these values in their configs. The most important thing is for old configs to still be loadable and produce the same result.

SunMarc (Member Author):

> Sorry, I don't completely follow. Does this mean that use_exllama will no longer change and the test check is no longer required?

Basically, I mean that the user can set use_exllama=True in transformers, and this value can change in optimum (use_exllama=False). However, since we no longer serialize it in the optimum GPTQ config, use_exllama will be reset to its default value through config.quantization_config = GPTQConfig.from_dict_optimum(quantizer.to_dict()).

> It depends. This can be considered a breaking change, as users might now expect these values in their configs. The most important thing is for old configs to still be loadable and produce the same result.

Yes, old configs will still work. However, new users will have to pass these args each time.

I will probably work on the second option then, since from the start we should not have needed to let the user select the kernel, given that we can switch from one to the other.
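As a usage note (a hedged sketch, assuming inference-related args are no longer persisted in the saved config), a user who cares about the kernel would pass it explicitly every time they load; the path below is a placeholder.

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# "path/to/quantized-model" is a placeholder for a locally saved or Hub checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/quantized-model",
    quantization_config=GPTQConfig(bits=4, use_exllama=True),  # passed explicitly at load time
    device_map={"": 0},
)
```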

@@ -242,12 +244,11 @@ def test_change_loading_attributes(self):
        with tempfile.TemporaryDirectory() as tmpdirname:
            self.quantized_model.save_pretrained(tmpdirname)
            if not self.use_exllama:
                self.assertEqual(self.quantized_model.config.quantization_config.use_exllama, False)
Collaborator:

Why remove this line?

SunMarc (Member Author):

See above.

@SunMarc requested a review from @amyeroberts on December 13, 2023 at 21:13
@amyeroberts (Collaborator) left a comment:

Thanks for fixing the tests and explaining!

I'm approving as this is effectively just fixing the tests: the changes to the GPTQ config have already been made, and we haven't seen lots of users complaining. However, changes like this can be breaking and cause many issues downstream. Because of the coupling between optimum and the corresponding configs in transformers, I'd expect API changes to consider backwards compatibility, and sufficient tests to be run so that changes in optimum that affect transformers are caught before they're merged.

@amyeroberts (Collaborator):
@SunMarc Do you have permissions to merge? If not, I can merge this in if it's good to go

@SunMarc (Member Author) commented on Jan 15, 2024:

I'll merge it! Thanks for the reminder.

@SunMarc merged commit 7c8dd88 into huggingface:main on Jan 15, 2024
18 checks passed
MadElf1337 pushed a commit to MadElf1337/transformers that referenced this pull request Jan 15, 2024
* fix test

* reduce length

* smaller model
wgifford pushed a commit to wgifford/transformers that referenced this pull request Jan 21, 2024
* fix test

* reduce length

* smaller model
AjayP13 pushed a commit to AjayP13/transformers that referenced this pull request Jan 22, 2024
* fix test

* reduce length

* smaller model