Fix nncf quantization for decoder models #727

Merged
echarlaix merged 10 commits into main from fix-quantizer on May 24, 2024
Conversation

echarlaix (Collaborator) commented on May 23, 2024

Added a fix so that instances of OVModelForCausalLM can be quantized using the quantizer.

cc @nikita-savelyevv
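
For context, a minimal sketch of the flow this PR fixes, using the public OVQuantizer API; the model id, dataset, and preprocessing below are illustrative, not taken from this PR:

```python
from functools import partial

from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM, OVQuantizer

# hypothetical small decoder model, chosen only for illustration
model_id = "hf-internal-testing/tiny-random-gpt2"
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers ship without a pad token

def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["sentence"], padding=True, truncation=True, max_length=64)

quantizer = OVQuantizer.from_pretrained(model)
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=10,
    dataset_split="train",
)
# before this PR, quantize() failed for OVModelForCausalLM instances
quantizer.quantize(calibration_dataset=calibration_dataset, save_directory="ov_model_int8")
```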

HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

echarlaix marked this pull request as ready for review on May 24, 2024 09:40
echarlaix requested a review from AlexKoff88 on May 24, 2024 09:49
```diff
@@ -419,7 +420,7 @@ def prepare_inputs(
                     shape[2] = 0
                 else:
                     shape[1] = 0
-                inputs[input_name] = Tensor(model_inputs.get_element_type(), shape.get_shape())
+                inputs[input_name] = np.empty([dim.get_length() for dim in shape], dtype=dtype)
```
echarlaix (Collaborator, Author) commented:

It looks like instances of openvino.runtime.Tensor cannot be pickled, and pickling is needed for the fix added in #632 for the InferRequestWrapper.
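
A minimal sketch of why the change helps; the Tensor pickling failure is as reported in this thread, and the shape below is illustrative:

```python
import pickle

import numpy as np
from openvino.runtime import PartialShape

# mirror what the patched line does: pin the dynamic axis to 0, then
# allocate an empty numpy array instead of an openvino.runtime.Tensor
shape = PartialShape([1, -1])  # batch of 1, dynamic sequence length (illustrative)
shape[1] = 0
dummy_input = np.empty([dim.get_length() for dim in shape], dtype=np.float32)

# numpy arrays round-trip through pickle, which the InferRequestWrapper
# caching introduced in #632 relies on
restored = pickle.loads(pickle.dumps(dummy_input))
assert restored.shape == (1, 0)

# by contrast, pickling an openvino.runtime.Tensor reportedly raises:
# pickle.dumps(Tensor(...))  -> TypeError
```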

A collaborator commented:

I hope this will not break the tracing of models. cc @eaidova, @slyalin.

Resolved review threads:
optimum/intel/openvino/quantization.py (outdated)
optimum/intel/openvino/quantization.py
tests/openvino/test_quantization.py (outdated)
echarlaix merged commit e22b2fd into main on May 24, 2024
10 of 13 checks passed
echarlaix deleted the fix-quantizer branch on May 24, 2024 15:59
faaany pushed a commit to faaany/optimum-intel that referenced this pull request May 26, 2024
* Fix nncf quantization for decoder models

* add test

* update op quant op

* remove deprecated warning

* update expected quantized

* enable stateful

* style
echarlaix mentioned this pull request on May 29, 2024