[Bugfix] Update expected shape for per token strategy #210

Open
kylesayrs wants to merge 2 commits into main

Conversation

@kylesayrs (Contributor) commented Nov 19, 2024

Background

  • While implementing Accelerate Utilities #193, a bug was discovered where per-token scales were being initialized with an incorrect shape

Changes

  • Change the shape of initialized quantization parameters in the per-token case (see the sketch below this list)
  • Unrelated: compare tuples rather than lists in tests/test_quantization/test_configs/test_strategies.py
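
A minimal sketch of what the change amounts to (illustrative only, not the actual compressed-tensors code; the parameter names `input_scale` / `input_zero_point` are assumptions):

```python
import torch

# Illustrative only: shows the shape change for per-token quantization parameters.
linear = torch.nn.Linear(16, 32)

# Before: per-token scale / zero-point were initialized with shape (1,).
# After this PR: they are initialized with shape (1, 1), matching the shape
# computed later when the quantization parameters are updated.
linear.register_parameter(
    "input_scale",
    torch.nn.Parameter(torch.empty(1, 1), requires_grad=False),
)
linear.register_parameter(
    "input_zero_point",
    torch.nn.Parameter(torch.zeros(1, 1, dtype=torch.int8), requires_grad=False),
)

print(linear.input_scale.shape)  # torch.Size([1, 1])
```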

Testing

  • Added new tests in tests/test_quantization/lifecycle/test_initialize.py

@kylesayrs kylesayrs self-assigned this Nov 19, 2024
@dsikka (Contributor) left a comment

Where are you seeing the failures?

The only relevant case is dynamic per-token, which shouldn't be initializing anything since the scales are determined on the fly, so I don't think this change is required.
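
For context, a generic sketch of dynamic per-token quantization (not compressed-tensors code): the scales are derived from the activation itself at runtime, so no pre-initialized scale parameter is needed.

```python
import torch

def dynamic_per_token_int8(x: torch.Tensor):
    # Generic sketch: one scale per token (row), computed on the fly from x,
    # so nothing needs to be pre-initialized.
    scales = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0  # (num_tokens, 1)
    q = torch.clamp(torch.round(x / scales), -128, 127).to(torch.int8)
    return q, scales

x = torch.randn(4, 16)        # 4 tokens, hidden size 16
q, scales = dynamic_per_token_int8(x)
print(q.shape, scales.shape)  # torch.Size([4, 16]) torch.Size([4, 1])
```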

@kylesayrs (Contributor, Author)

@dsikka There is a per-token non-dynamic test case in the tests.

I discovered this bug while implementing #193, which uses copy_ rather than out-of-place assignment when updating quantization parameters. Because copy_ requires the original shape (which used to be (1, )) to match the newly computed shape (which is (1, 1)), an error was thrown.

In general, the initialized shape should be the same as the shape computed by q_params(). While I agree this bug only appears in special cases and that this change isn't necessary outside of the context of #193, I believe that it is correct.
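
A minimal repro of that failure mode (a sketch, not the actual code from #193):

```python
import torch

old_scale = torch.empty(1)    # shape as previously initialized: (1,)
new_scale = torch.rand(1, 1)  # shape computed in the per-token case: (1, 1)

# Out-of-place assignment just rebinds the name, so the mismatch goes unnoticed:
old_scale = new_scale

# In-place copy_ requires the source to broadcast to the destination's shape,
# so copying a (1, 1) tensor into a (1,) tensor raises a RuntimeError:
try:
    torch.empty(1).copy_(torch.rand(1, 1))
except RuntimeError as err:
    print(err)  # e.g. "output with shape [1] doesn't match the broadcast shape [1, 1]"
```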

@dsikka (Contributor) commented Nov 21, 2024

> @dsikka There is a per-token non-dynamic test case in the tests.
>
> I discovered this bug while implementing #193, which uses copy_ rather than out-of-place assignment when updating quantization parameters. Because copy_ requires the original shape (which used to be (1, )) to match the newly computed shape (which is (1, 1)), an error was thrown.
>
> In general, the initialized shape should be the same as the shape computed by q_params(). While I agree this bug only appears in special cases and that this change isn't necessary outside of the context of #193, I believe that it is correct.

Yeah, I agree that the shape is correct.
What I'm thinking about now is when the shape is ever just (1, ). I guess that's the static per-tensor case, but we actually update this in vLLM to be (1, 1) anyway (I have to confirm whether this is still the case), so we could potentially make (1, 1) the default.

@kylesayrs (Contributor, Author)

@dsikka From my reading, it looks like it's just (1, ): https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py#L121-L124

This is my understanding of the different expected shapes

| Strategy | Scale Shape | ZP Shape |
| --- | --- | --- |
| Tensor | (1, ) | (1, ) |
| Channel | (out_dims, 1) | (out_dims, 1) |
| Group | (in_dims, out_dims // group_size) | (in_dims, out_dims // group_size) |
| Block | (1, ) | (1, ) |
| Token | (1, 1) | (1, 1) |

I haven't explored block and token quantization much, but the (1, ) shape for block quantization seems suspicious to me. @rahul-tuli, maybe you're familiar?
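
For quick reference, here is the table above encoded as a small sketch (placeholder dimensions; not part of compressed-tensors):

```python
# Illustrative encoding of the expected scale / zero-point shapes per strategy.
out_dims, in_dims, group_size = 32, 16, 8  # placeholder dimensions

EXPECTED_SHAPES = {
    "tensor":  ((1,), (1,)),
    "channel": ((out_dims, 1), (out_dims, 1)),
    "group":   ((in_dims, out_dims // group_size), (in_dims, out_dims // group_size)),
    "block":   ((1,), (1,)),  # flagged above as possibly suspicious
    "token":   ((1, 1), (1, 1)),
}

for strategy, (scale_shape, zp_shape) in EXPECTED_SHAPES.items():
    print(f"{strategy}: scale {scale_shape}, zero_point {zp_shape}")
```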

@horheynm (Member)

> @dsikka From my reading, it looks like it's just (1, ): https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py#L121-L124
>
> This is my understanding of the different expected shapes
>
> | Strategy | Scale Shape | ZP Shape |
> | --- | --- | --- |
> | Tensor | (1, ) | (1, ) |
> | Channel | (out_dims, 1) | (out_dims, 1) |
> | Group | (in_dims, out_dims // group_size) | (in_dims, out_dims // group_size) |
> | Block | (1, ) | (1, ) |
> | Token | (1, 1) | (1, 1) |
>
> I haven't explored block and token quantization much, but the (1, ) shape for block quantization seems suspicious to me. @rahul-tuli, maybe you're familiar?

This is helpful; maybe we should include it in the docs somewhere, since it's hard to navigate to here.

```python
),
],
)
def test_initialize_quantization_parameters(weights, input_activations):
```
Member

General question: don't we need this for output activations too?

@kylesayrs (Contributor, Author)

We could test output activations, but I decided not to, for test simplicity.

@rahul-tuli mentioned this pull request Nov 27, 2024
@dsikka (Contributor) left a comment

I would also make sure we're mocking correctly, since the test case you pointed out applies the config to initialize scales/zp (which would be impacted by your change), but the mock doesn't seem to care about the shape:
https://github.com/neuralmagic/compressed-tensors/blob/main/tests/conftest.py
