Remove QuantizationScheme.default_scheme #202

Merged: 1 commit merged into main on Nov 22, 2024

Conversation

@kylesayrs (Contributor) commented Nov 1, 2024

Purpose

  • Remove unused function. I believe this function was initially intended to provide defaults to QuantizationModifier without overwriting existing configs, but since all of these values are now written to configs, and QuantizationModifier now also provides the defaults for configs which do not specify values, this function is no longer necessary (see the sketch below).
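
For reference, a minimal sketch of constructing a scheme explicitly rather than relying on a class-level default. The field names follow compressed-tensors' QuantizationScheme / QuantizationArgs models; the specific bit-widths and strategies are illustrative assumptions, not project defaults:

```python
# Illustrative only: an explicit scheme carrying all of its values,
# which is why a QuantizationScheme.default_scheme() fallback is no
# longer needed. Chosen num_bits/strategy values are assumptions.
from compressed_tensors.quantization import QuantizationArgs, QuantizationScheme

scheme = QuantizationScheme(
    targets=["Linear"],
    weights=QuantizationArgs(num_bits=8, type="int", symmetric=True, strategy="channel"),
    input_activations=QuantizationArgs(num_bits=8, type="int", symmetric=True, dynamic=True),
)
```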

Prerequisites

Changes

  • Remove QuantizationScheme.default_scheme

Testing

  • Grepped codebases for uses

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

@dsikka (Contributor) left a comment

Can you verify behaviour for cases where we're only running kv_cache quantization? That is the only case I can recall where this function was relevant.

@kylesayrs (Contributor, Author) commented Nov 19, 2024

@dsikka Good idea. I tested a modified examples/quantization_kv_cache/llama3_fp8_kv_example.py with only kv-cache quantization, end-to-end with vLLM, using:
LC (llm-compressor) = kylesayrs/quantization-modifier-defaults
CT (compressed-tensors) = this branch
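
For reference, a minimal sketch of the kv-cache-only variant of that example. The recipe layout and oneshot arguments are adapted from the llm-compressor FP8 kv-cache example and are assumptions, not the exact script that was run:

```python
# Sketch of the kv-cache-only test described above (not the exact script):
# the example's recipe with the weight/input quantization sections removed,
# applied via llm-compressor's oneshot entrypoint.
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # model used in the example (assumed)

# Only kv_cache_scheme is specified; weights/input_activations are omitted
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            kv_cache_scheme:
                num_bits: 8
                type: float
                strategy: tensor
                dynamic: false
                symmetric: true
"""

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

# Dataset name and calibration sizes are assumptions, not the example's exact values
oneshot(
    model=model,
    dataset="ultrachat_200k",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="Meta-Llama-3-8B-Instruct-FP8-KV",
)
```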

@kylesayrs self-assigned this Nov 19, 2024
@kylesayrs requested a review from dsikka on November 19, 2024 22:41
@dsikka (Contributor) commented Nov 21, 2024

For completeness, could we validate that the example loads properly in vLLM?
Alternatively, just make sure the keys/values are attached to the attention block in the state dict.
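
A quick way to do the second check might look like this sketch; the k_scale/v_scale parameter names and the checkpoint path are assumptions about how the fp8 kv-cache scales are serialized:

```python
# Sanity-check sketch (assumed names/paths): confirm the serialized
# checkpoint attaches kv-cache scale parameters to the attention blocks.
from safetensors import safe_open

CHECKPOINT = "Meta-Llama-3-8B-Instruct-FP8-KV/model.safetensors"  # hypothetical path

with safe_open(CHECKPOINT, framework="pt") as f:
    kv_keys = [k for k in f.keys() if k.endswith(("k_scale", "v_scale"))]

assert kv_keys, "no kv-cache scales found in the state dict"
print("\n".join(sorted(kv_keys)[:8]))  # e.g. model.layers.0.self_attn.k_scale
```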

@kylesayrs (Contributor, Author) commented

@dsikka Yep, I validated that kv-cache quantization, both with and without weight & input quantization, loads and produces valid results in vLLM.
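
A minimal sketch of that vLLM smoke test (the checkpoint path, kv-cache dtype flag, and prompt are assumptions, not the exact commands used):

```python
# Load the quantized checkpoint in vLLM and generate a short completion.
from vllm import LLM, SamplingParams

llm = LLM(model="Meta-Llama-3-8B-Instruct-FP8-KV", kv_cache_dtype="fp8")
outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```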

@horheynm (Member) commented

I remember hitting issues with this since it was setting a default scheme, which was not intended.
I do think we can get rid of this.

@dsikka merged commit 7103a27 into main on Nov 22, 2024
1 check passed
@dsikka deleted the default-QuantizationScheme branch on November 22, 2024 14:42