
Support backup precision option for WC #2978

Open · wants to merge 12 commits into develop from lt/wc_backup_precision

Conversation

@l-bat (Collaborator) commented Sep 20, 2024

Changes

Add functionality to choose the backup precision used for layers that are not quantized to the primary precision; the backup precision defaults to INT8_ASYM.

Example: compress weights to INT4_ASYM channel-wise (group_size=-1), except embeddings, convolutions, and the last linear layers, which remain in their original floating-point precision.

from nncf import compress_weights, BackupMode, CompressWeightsMode

# model is an openvino.Model object
compressed_model = compress_weights(model, mode=CompressWeightsMode.INT4_ASYM, group_size=-1, backup_mode=BackupMode.NONE)
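# BackupMode options discussed in this PR: NONE, INT8_SYM, INT8_ASYM (the default)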

[screenshot: compression output]
statistics:
[screenshot: bitwidth distribution statistics]

For mode=CompressWeightsMode.INT4_ASYM, backup_mode=BackupMode.INT8_ASYM, and a non-empty ignored_scope, the statistics string contains three different precisions:

[screenshot: statistics showing three precisions]

Reason for changes

To define a backup mode for compress_weights.

Related tickets

  • 152056

Tests

  • test_data_free_compression_with_backup_mode
  • test_data_based_compression_with_backup_mode
  • tinyllama_awq_backup_mode_none

@l-bat requested a review from a team as a code owner — September 20, 2024 09:35
@github-actions bot added the labels documentation, NNCF PT, NNCF OpenVINO, NNCF PTQ, and API — Sep 20, 2024
@l-bat (Collaborator, Author) commented Sep 20, 2024

With the introduction of backup_precision, we should reformat the Statistics of the bitwidth distribution representation.

  1. Add fp16/fp32 to Num bits
    [screenshot]
  2. Change Num bits to Data type (a mock-up of this option follows below)
    [screenshot]
    [screenshot]

cc @AlexKoff88, @alexsu52, @ljaljushkin
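
For illustration, option 2 could render roughly like this (a hypothetical mock-up; the values are made up for this example and are not taken from the PR):

Data type   % all parameters (layers)   % ratio-defining parameters (layers)
int4_asym   78% (110 / 168)             100% (110 / 110)
int8_asym   22% (58 / 168)              0% (0 / 110)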

@AlexKoff88 (Contributor) commented:

> With the introduction of backup_precision, we should reformat the Statistics of the bitwidth distribution representation. […]

Data type looks better to me.

@l-bat (Collaborator, Author) commented Sep 20, 2024

I also extended the Statistics of the bitwidth distribution with nodes from the ignored scope.
Output of tests/openvino/native/quantization/test_weights_compression.py::test_awq_with_ignored_scope:
[screenshot: statistics including ignored-scope nodes]

@alexsu52 (Contributor) commented:

> With the introduction of backup_precision, we should reformat the Statistics of the bitwidth distribution representation. […]

> Data type looks better to me.

IMHO: the postfixes "asym" and "sym" do not look consistent with the column name "Data type". I would expect to see int8, int4, uint4, fp8_*, fp16, fp32 in a data type column. Note that the quantization scheme can also be inferred from the data type. If you want to keep int4_asym, int4_sym, int8_sym, etc., I would suggest naming the column "weight compression mode".

@@ -394,6 +395,7 @@ def compress_weights(
scale_estimation: Optional[bool] = None,
gptq: Optional[bool] = None,
lora_correction: Optional[bool] = None,
backup_precision: Optional[BackupPrecision] = BackupPrecision.INT8_ASYM,
Contributor:

Suggested change:
- backup_precision: Optional[BackupPrecision] = BackupPrecision.INT8_ASYM,
+ backup_mode: Optional[CompressWeightsMode] = CompressWeightsMode.INT8_ASYM,

I would like to propose this change. It is more general and makes clear which compression mode will be used for backup operations.
cc' @AlexKoff88, @MaximProshin

Collaborator Author:

In this case we should add CompressWeightsMode.FP, which doesn't make sense

Contributor:

If CompressWeightsMode.FP means that NNCF does not compress backup weights, then None can be used.

@l-bat what do you think about renaming backup_precision -> backup_mode and introducing a BackupMode or BackupCompressWeightsMode class:

class BackupMode(StrEnum):
    NONE = "none"
    INT8_SYM = "int8_sym"
    INT8_ASYM = "int8_asym"

cc' @AlexKoff88
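
For illustration, such an enum would let a caller choose a symmetric 8-bit backup (a hypothetical sketch, assuming the rename is adopted and BackupMode is exported from nncf):

from nncf import compress_weights, BackupMode, CompressWeightsMode
# 4-bit primary precision; non-quantized layers fall back to symmetric INT8
compressed_model = compress_weights(model, mode=CompressWeightsMode.INT4_SYM, backup_mode=BackupMode.INT8_SYM)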

nncf/parameters.py — outdated review thread, resolved
nncf/quantization/quantize_model.py — outdated review thread, resolved
)

if node.node_name in ignored_names or self._backup_precision == BackupPrecision.FP:
if node.node_name in ignored_names or self._backup_mode == BackupMode.NONE:
Collaborator:

This code leads to a different number of nodes in nodes_to_compress and all_weight_params, which causes an error in the AWQ algorithm. The bug can also be reproduced by adding Embeddings to the ignored_scope.

Collaborator Author:

Aligned all_weight_params and nodes_to_compress below

@@ -310,6 +313,7 @@ def apply(
dataset: Optional[Dataset] = None,
) -> TModel:
self._set_backend_entity(model)
# nodes_to_compress includes nodes from the ignored scope to be added to bitwidth_distribution_str
nodes_to_compress = self._get_nodes_to_compress(graph)
Contributor:

I'd rename it to avoid confusion with the actual list of nodes to compress.

Suggested change:
- nodes_to_compress = self._get_nodes_to_compress(graph)
+ candidates_to_compress = self._get_nodes_to_compress(graph)

# Filter the weight parameters that should remain in their original floating-point precision
all_weight_params = [w_params for w_params in all_weight_params if w_params.compression_config is not None]
# Remove nodes in the ignored scope from nodes_to_compress
nodes_to_compress = [node for node in nodes_to_compress if node.node_name not in ignored_names]
Contributor:

Then it makes sense to collect statistics for this limited list of nodes:

if dataset is not None and self._sensitivity_metric != SensitivityMetric.WEIGHT_QUANTIZATION_ERROR:
    activations = self._get_activations(dataset, self._subset_size, nodes_to_compress, graph, model)

Contributor:

To be more consistent, I'd extract the list of nodes to compress from all_weight_params. It would guarantee the same content in both lists.
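
A minimal sketch of this suggestion (assuming each entry of all_weight_params keeps a reference to its graph node, e.g. via a node_with_weight attribute; the attribute name is an assumption):

nodes_to_compress = [w_params.node_with_weight for w_params in all_weight_params]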

num_int8: 290
tinyllama_awq_backup_mode_none_backend_OV:
metric_value: 0.85679
Contributor:

@alexsu52 @l-bat is this what we get on developer hosts: CLX (Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz)?

@l-bat force-pushed the lt/wc_backup_precision branch 2 times, most recently from c23fe73 to bd407a8 — September 27, 2024 14:35
@@ -394,6 +395,7 @@ def compress_weights(
scale_estimation: Optional[bool] = None,
gptq: Optional[bool] = None,
lora_correction: Optional[bool] = None,
backup_mode: Optional[BackupMode] = BackupMode.INT8_ASYM,
Contributor:

Optional[BackupMode] is the same as Union[BackupMode, None]. What is the behavior if the user passes None to backup_mode?

Perhaps the same behavior should be used here as for ratio.

Collaborator Author:

I aligned behavior with subset_size: Optional[int] = 128.

if backup_mode is None:
    raise ValueError("Invalid backup_mode specified. Please choose from the available BackupMode options.")

@alexsu52 (Contributor) commented Sep 30, 2024:

If you still throw an error, then there is no point in indicating in the type hints that such a value can be used. The same applies to subset_size.

Suggested change:
- backup_mode: Optional[BackupMode] = BackupMode.INT8_ASYM,
+ backup_mode: BackupMode = BackupMode.INT8_ASYM,

"Default values of `ratio` (1) and `group_size` (-1) parameters can not be overridden"
)

if backup_mode != BackupMode.INT8_ASYM:
@alexsu52 (Contributor) commented Sep 30, 2024:

As far as I understand, if the user calls the following: compressed_model = compress_weights(model, mode=CompressWeightsMode.INT8_SYM, backup_mode=BackupMode.INT8_ASYM), then no error will occur, am I right?

Collaborator:

But an error will occur if the user calls compressed_model = compress_weights(model, mode=CompressWeightsMode.INT8_SYM, backup_mode=BackupMode.INT8_SYM).

Why do we have this restriction?

Collaborator Author:

An error occurs for all backup_mode values except for the default value (BackupMode.INT8_ASYM). This indicates that the other options are not supported for 8-bit compression.
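
A sketch of the validation being described (an approximation for illustration, not the exact NNCF source):

if mode in (CompressWeightsMode.INT8_ASYM, CompressWeightsMode.INT8_SYM):
    if backup_mode != BackupMode.INT8_ASYM:
        raise ValueError("backup_mode can not be overridden for INT8 compression modes.")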

@l-bat requested a review from alexsu52 — October 2, 2024 07:35
Labels: API, documentation, experimental, NNCF OpenVINO, NNCF PT, NNCF PTQ
5 participants