Closed
Description
I'm using the Neural Compressor for post-training quantization on my model. Below is the configuration I'm using:
conf = PostTrainingQuantConfig(
    approach="static",
    tuning_criterion=tuning_criterion,
    accuracy_criterion=accuracy_criterion,
    device="gpu",  # set to perform quantization on GPU
    backend="onnxrt_cuda_ep",
    op_type_dict={
        "Conv": {
            "weight": {"dtype": ["int8"]},
            "activation": {"dtype": ["int8"]},
        },
        "Linear": {
            "weight": {"dtype": ["int8"]},
            "activation": {"dtype": ["int8"]},
        },
        "BatchNorm": {
            "weight": {"dtype": ["int8"]},
            "activation": {"dtype": ["int8"]},
        },
        "ReLU": {
            "activation": {"dtype": ["int8"]},
        },
    },
)
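For context, the `tuning_criterion` and `accuracy_criterion` objects referenced above are built beforehand, roughly like this (the thresholds shown are illustrative, not the exact values from my run, and `model`, `calib_dataloader`, and `eval_func` are placeholders for my own model and data pipeline):

```python
from neural_compressor import quantization
from neural_compressor.config import (
    AccuracyCriterion,
    PostTrainingQuantConfig,
    TuningCriterion,
)

# Illustrative criteria; the exact values in my run may differ.
tuning_criterion = TuningCriterion(max_trials=100)          # cap the number of tuning trials
accuracy_criterion = AccuracyCriterion(tolerable_loss=0.01) # allow up to 1% relative accuracy drop

# The config above is then passed to the tuning loop:
# q_model = quantization.fit(model, conf,
#                            calib_dataloader=calib_dataloader,
#                            eval_func=eval_func)
```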
Terminal Output:
2024-08-01 11:19:24 [INFO] Quantize the model with default config.
2024-08-01 11:19:24 [WARNING] None of the submodule got qconfig applied. Make sure you passed correct configuration through `qconfig_dict` or by assigning the `.qconfig` attribute directly on submodules.
2024-08-01 11:22:05 [INFO] Ignore LayerNorm, InstanceNorm3d and Embedding quantizable ops due to accuracy issue in PyTorch.
2024-08-01 11:22:05 [INFO] |******Mixed Precision Statistics******|
2024-08-01 11:22:05 [INFO] +-----------------+----------+---------+
2024-08-01 11:22:05 [INFO] | Op Type | Total | FP32 |
2024-08-01 11:22:05 [INFO] +-----------------+----------+---------+
2024-08-01 11:22:05 [INFO] | LayerNorm | 6 | 6 |
2024-08-01 11:22:05 [INFO] +-----------------+----------+---------+
Environment Information
- Operating System: Windows 10
- Python Version: 3.8
- Neural Compressor Version: v2.6
- Torch Version: 2.2.0+cu118
- Hardware: NVIDIA GPU (CUDA 11.8)
As the statistics table shows, the LayerNorm ops were left in FP32 rather than quantized to int8, even though int8 was requested for every op type. Why does this happen, and how can I ensure that all layers, including LayerNorm, are quantized to int8?