
Why is LayerNorm not quantized to int8 in PTQ? #1963

@zhangxu223

Description


I'm using Intel Neural Compressor for post-training static quantization of my model. Below is the configuration I'm using:

from neural_compressor.config import PostTrainingQuantConfig

# tuning_criterion and accuracy_criterion are defined earlier in my script (omitted here)
conf = PostTrainingQuantConfig(
    approach="static",
    tuning_criterion=tuning_criterion,
    accuracy_criterion=accuracy_criterion,
    device="gpu",  # perform quantization on GPU
    backend="onnxrt_cuda_ep",
    op_type_dict={
        "Conv": {
            "weight": {"dtype": ["int8"]},
            "activation": {"dtype": ["int8"]},
        },
        "Linear": {
            "weight": {"dtype": ["int8"]},
            "activation": {"dtype": ["int8"]},
        },
        "BatchNorm": {
            "weight": {"dtype": ["int8"]},
            "activation": {"dtype": ["int8"]},
        },
        "ReLU": {
            "activation": {"dtype": ["int8"]},
        },
    },
)
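
For reference, the config is passed to the quantization entry point roughly as sketched below; model, calib_dataloader, and eval_func are placeholders for the actual FP32 model, calibration data loader, and evaluation function in my script:

from neural_compressor.quantization import fit

# Minimal sketch of the quantization call; model, calib_dataloader and
# eval_func stand in for the actual objects used in my script.
q_model = fit(
    model=model,
    conf=conf,
    calib_dataloader=calib_dataloader,
    eval_func=eval_func,
)
q_model.save("./quantized_model")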

Terminal Output:

2024-08-01 11:19:24 [INFO] Quantize the model with default config.
2024-08-01 11:19:24 [WARNING] None of the submodule got qconfig applied. Make sure you passed correct configuration through `qconfig_dict` or by assigning the `.qconfig` attribute directly on submodules.
2024-08-01 11:22:05 [INFO] Ignore LayerNorm, InstanceNorm3d and Embedding quantizable ops due to accuracy issue in PyTorch.
2024-08-01 11:22:05 [INFO] |******Mixed Precision Statistics******|
2024-08-01 11:22:05 [INFO] +-----------------+----------+---------+
2024-08-01 11:22:05 [INFO] |     Op Type     |  Total   |   FP32  |
2024-08-01 11:22:05 [INFO] +-----------------+----------+---------+
2024-08-01 11:22:05 [INFO] |    LayerNorm    |    6     |    6    |
2024-08-01 11:22:05 [INFO] +-----------------+----------+---------+

Environment Information

  • Operating System: Windows 10
  • Python Version: 3.8
  • Neural Compressor Version: v2.6
  • Torch Version: 2.2.0+cu118
  • Hardware: NVIDIA GPU (CUDA 11.8)

It seems that LayerNorm is not quantized to int8. Why does this happen, and how can I ensure that all layers, including LayerNorm, are quantized to int8?
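
Would adding an explicit LayerNorm entry to op_type_dict, as sketched below, be expected to take effect, or does the adaptor's built-in exclusion (the "Ignore LayerNorm, InstanceNorm3d and Embedding" log line above) take precedence regardless of the config? The "LayerNorm" key name here is my assumption:

# Sketch of an op_type_dict extension I considered; the "LayerNorm" key is
# my guess at the op type name, and the log above suggests the PyTorch
# adaptor may ignore this entry anyway.
op_type_dict = {
    # ... existing Conv / Linear / BatchNorm / ReLU entries ...
    "LayerNorm": {
        "weight": {"dtype": ["int8"]},
        "activation": {"dtype": ["int8"]},
    },
}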
