
Why is LayerNorm not quantized to int8 in PTQ? #1963

@zhangxu223

Description


I'm using Intel Neural Compressor for post-training static quantization of my model. Below is the configuration I'm using:

from neural_compressor.config import PostTrainingQuantConfig

# tuning_criterion and accuracy_criterion are defined earlier in my script (omitted here)
conf = PostTrainingQuantConfig(
    approach="static",
    tuning_criterion=tuning_criterion,
    accuracy_criterion=accuracy_criterion,
    device="gpu",  # perform quantization on GPU
    backend="onnxrt_cuda_ep",
    op_type_dict={
        "Conv": {
            "weight": {"dtype": ["int8"]},
            "activation": {"dtype": ["int8"]},
        },
        "Linear": {
            "weight": {"dtype": ["int8"]},
            "activation": {"dtype": ["int8"]},
        },
        "BatchNorm": {
            "weight": {"dtype": ["int8"]},
            "activation": {"dtype": ["int8"]},
        },
        "ReLU": {
            "activation": {"dtype": ["int8"]},
        },
    },
)
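
For reference, the config is passed to the quantization entry point roughly as sketched below; model, calib_dataloader, and eval_func are placeholders for the actual FP32 model, calibration data loader, and evaluation function in my script:

from neural_compressor.quantization import fit

# Minimal sketch of the quantization call; model, calib_dataloader and
# eval_func stand in for the actual objects used in my script.
q_model = fit(
    model=model,
    conf=conf,
    calib_dataloader=calib_dataloader,
    eval_func=eval_func,
)
q_model.save("./quantized_model")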

Terminal Output:

2024-08-01 11:19:24 [INFO] Quantize the model with default config.
2024-08-01 11:19:24 [WARNING] None of the submodule got qconfig applied. Make sure you passed correct configuration through `qconfig_dict` or by assigning the `.qconfig` attribute directly on submodules.
2024-08-01 11:22:05 [INFO] Ignore LayerNorm, InstanceNorm3d and Embedding quantizable ops due to accuracy issue in PyTorch.
2024-08-01 11:22:05 [INFO] |******Mixed Precision Statistics******|
2024-08-01 11:22:05 [INFO] +-----------------+----------+---------+
2024-08-01 11:22:05 [INFO] |     Op Type     |  Total   |   FP32  |
2024-08-01 11:22:05 [INFO] +-----------------+----------+---------+
2024-08-01 11:22:05 [INFO] |    LayerNorm    |    6     |    6    |
2024-08-01 11:22:05 [INFO] +-----------------+----------+---------+

Environment Information

  • Operating System: Windows 10
  • Python Version: 3.8
  • Neural Compressor Version: v2.6
  • Torch Version: 2.2.0+cu118
  • Hardware: NVIDIA GPU (CUDA 11.8)

It seems that LayerNorm is not quantized to int8. Why does this happen, and how can I ensure that all layers, including LayerNorm, are quantized to int8?
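
Would adding an explicit LayerNorm entry to op_type_dict, as sketched below, be expected to take effect, or does the adaptor's built-in exclusion (the "Ignore LayerNorm, InstanceNorm3d and Embedding" log line above) take precedence regardless of the config? The "LayerNorm" key name here is my assumption:

# Sketch of an op_type_dict extension I considered; the "LayerNorm" key is
# my guess at the op type name, and the log above suggests the PyTorch
# adaptor may ignore this entry anyway.
op_type_dict = {
    # ... existing Conv / Linear / BatchNorm / ReLU entries ...
    "LayerNorm": {
        "weight": {"dtype": ["int8"]},
        "activation": {"dtype": ["int8"]},
    },
}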
