Fix nncf quantization for decoder models #727

Merged
echarlaix merged 10 commits into main from fix-quantizer on May 24, 2024
Conversation

echarlaix (Collaborator) commented on May 23, 2024

Added a fix so that instances of OVModelForCausalLM can be quantized using the quantizer.

cc @nikita-savelyevv
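
For context, a minimal sketch of the flow this PR fixes, using the public OVQuantizer API; the model id, dataset, and preprocessing below are illustrative, not taken from this PR:

```python
from functools import partial

from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM, OVQuantizer

# hypothetical small decoder model, chosen only for illustration
model_id = "hf-internal-testing/tiny-random-gpt2"
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers ship without a pad token

def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["sentence"], padding=True, truncation=True, max_length=64)

quantizer = OVQuantizer.from_pretrained(model)
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=10,
    dataset_split="train",
)
# before this PR, quantize() failed for OVModelForCausalLM instances
quantizer.quantize(calibration_dataset=calibration_dataset, save_directory="ov_model_int8")
```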

HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

echarlaix marked this pull request as ready for review on May 24, 2024 09:40
echarlaix requested a review from AlexKoff88 on May 24, 2024 09:49
```diff
@@ -419,7 +420,7 @@ def prepare_inputs(
                     shape[2] = 0
                 else:
                     shape[1] = 0
-                inputs[input_name] = Tensor(model_inputs.get_element_type(), shape.get_shape())
+                inputs[input_name] = np.empty([dim.get_length() for dim in shape], dtype=dtype)
```
echarlaix (Collaborator, Author) commented:

It looks like instances of openvino.runtime.Tensor cannot be pickled, and pickling is needed for the fix added in #632 for the InferRequestWrapper.
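
A minimal sketch of why the change helps; the Tensor pickling failure is as reported in this thread, and the shape below is illustrative:

```python
import pickle

import numpy as np
from openvino.runtime import PartialShape

# mirror what the patched line does: pin the dynamic axis to 0, then
# allocate an empty numpy array instead of an openvino.runtime.Tensor
shape = PartialShape([1, -1])  # batch of 1, dynamic sequence length (illustrative)
shape[1] = 0
dummy_input = np.empty([dim.get_length() for dim in shape], dtype=np.float32)

# numpy arrays round-trip through pickle, which the InferRequestWrapper
# caching introduced in #632 relies on
restored = pickle.loads(pickle.dumps(dummy_input))
assert restored.shape == (1, 0)

# by contrast, pickling an openvino.runtime.Tensor reportedly raises:
# pickle.dumps(Tensor(...))  -> TypeError
```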

A collaborator commented:

I hope this will not break the tracing of models. cc @eaidova, @slyalin.

Resolved review threads:
optimum/intel/openvino/quantization.py (outdated)
optimum/intel/openvino/quantization.py
tests/openvino/test_quantization.py (outdated)
echarlaix merged commit e22b2fd into main on May 24, 2024
10 of 13 checks passed
echarlaix deleted the fix-quantizer branch on May 24, 2024 15:59
faaany pushed a commit to faaany/optimum-intel that referenced this pull request May 26, 2024
* Fix nncf quantization for decoder models

* add test

* update op quant op

* remove deprecated warning

* update expected quantized

* enable stateful

* style
echarlaix mentioned this pull request on May 29, 2024