
Quantized inference accuracy of the Bert model is wrong on Intel CPU #36962

Closed

yghstill opened this issue Nov 3, 2021 · 20 comments
@yghstill (Contributor) commented Nov 3, 2021

  • Version and environment info:
       1) PaddlePaddle version: please provide your PaddlePaddle version number (e.g. 1.1) or commit ID
       2) CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
       3) GPU: -
       4) System environment: Ubuntu 16, Python 3.7

  • Reproduction info (for errors, please provide the reproduction environment and steps):
    (1) After obtaining the post-training quantized model, use save_quant_model.py from the attachment to convert and optimize it and export a quantized model that can run on Intel CPU.
    (2) Deploy the quantized model with Paddle Inference: run cpu_infer.py to execute prediction (note: adjust the model paths set in the code). The resulting accuracy is shown in the screenshot below; the correct acc should be above 0.5, so this does not meet expectations.

  • Problem description: While adding support for the Bert model in PaddlePaddle, we converted the model from dynamic to static graph and then applied post-training quantization. Accuracy is normal with TRT int8, and on Intel CPU the inference pipeline runs, but the accuracy is wrong.
    [screenshot: accuracy results]

The code to run is in code.zip.

code.zip

@paddle-bot-old bot commented Nov 3, 2021

Hi! We've received your issue; please be patient while waiting for a response. We will arrange technicians to answer your question as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment & version, and error messages. You may also look for an answer in the official API documentation, FAQ, historical issues, and the AI community. Have a nice day!

@lidanqing-intel (Contributor) commented:

@yghstill Received.

@wozna (Contributor) commented Nov 3, 2021

@yghstill In cpu_infer.py, the module task_distill_zh is imported, but it is not available. Where can I find it?

@lidanqing-intel (Contributor) commented Nov 4, 2021

@wanghaoshuang
We cannot reproduce the issue; it fails with ModuleNotFoundError: No module named 'task_distill_zh'. Could you please help? Thanks!

(py3.7) li@c3gpo:~/repo/Issue_Bert_36962$ python cpu_infer.py --model_path=$PWD/tnews_quant_models/model
Traceback (most recent call last):
  File "cpu_infer.py", line 26, in <module>
    from task_distill_zh import convert_example, METRIC_CLASSES, MODEL_CLASSES
ModuleNotFoundError: No module named 'task_distill_zh'
(py3.7) li@c3gpo:~/repo/Issue_Bert_36962$ pip install task_distill_zh
ERROR: Could not find a version that satisfies the requirement task_distill_zh
ERROR: No matching distribution found for task_distill_zh

@yghstill (Contributor, Author) commented Nov 4, 2021

@lidanqing-intel (Contributor) commented Nov 4, 2021

@yghstill
I followed LiuChiachi's repo, but the following link is still not accessible to us from outside Baidu: https://paddlenlp.bj.bcebos.com/models/transformers/community/./quant_models/vocab.txt

INFO:paddle.utils.download:unique_endpoints {''}
[2021-11-04 03:03:37,987] [    INFO] - Downloading https://paddlenlp.bj.bcebos.com/models/transformers/community/./quant_models/vocab.txt and saved to /home/li/.paddlenlp/models/./quant_models
[2021-11-04 03:03:37,987] [    INFO] - Downloading vocab.txt from https://paddlenlp.bj.bcebos.com/models/transformers/community/./quant_models/vocab.txt
[2021-11-04 03:03:40,990] [   ERROR] - Downloading from https://paddlenlp.bj.bcebos.com/models/transformers/community/./quant_models/vocab.txt failed with code 404!
Traceback (most recent call last):

wget also could not download it.

wget https://paddlenlp.bj.bcebos.com/models/transformers/community/./quant_models/vocab.txt
2021-11-04 08:43:29 ERROR 404: Not Found.

but wget could download this link

wget https://paddle-inference-lib.bj.bcebos.com/2.1.0-cpu-avx-mkl/paddle_inference.tgz

Any suggestions? Maybe consult the PaddlePaddle framework team; we have had more conversations with them, so maybe they know how to deal with it.

@yghstill (Contributor, Author) commented Nov 4, 2021

@lidanqing-intel Please modify quant_models to tnewst_quant_models in cpu_infer.py

@lidanqing-intel (Contributor) commented Nov 5, 2021

@yghstill Reproduced the issue.
@wozna

python save_quant_model.py --quant_model_path=$PWD/tnewst_quant_models/ --int8_model_save_path=INT8_PATH --ops_to_quantize="matmul,reshape,transpose"
matmul,reshape,transpose

In the IR passes, only this pattern gets fused, so the error can only be here.

Fused 12 ReshapeTransposeMatmulMkldnn patterns with reshape's xshape with transpose's xshape

Inference results

acc: 0.1053, time:  44.53691840171814

Maybe:

  1. the scales are collected and calculated wrongly
  2. [Check first] whether the original model with mkldnn ON also has the same wrong accuracy; if so, the fuse pass is wrong
  3. matmul_v2 is also quantized, but we haven't really verified its accuracy yet

lidanqing-intel added this to the Q4 milestone Nov 5, 2021
@lidanqing-intel
Copy link
Contributor

lidanqing-intel commented Nov 8, 2021

Update:
Tested FP32 with oneDNN ON: it works with good accuracy and performance, so it is the step of collecting and propagating int8 scales for matmul_v2 that goes wrong.
Note: turning oneDNN on requires ir_optim to be turned on as well, otherwise the fuse passes are not executed; the demo config sets it to false, which it should not.
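
In other words, a minimal sketch of the required combination using the Paddle Inference Python API (the model paths are placeholders):

```python
import paddle.inference as paddle_infer

config = paddle_infer.Config("model.pdmodel", "model.pdiparams")  # placeholder paths
config.enable_mkldnn()        # turn oneDNN on ...
config.switch_ir_optim(True)  # ... and keep IR optimization on, or the oneDNN fuse passes never run
```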

@sfraczek (Contributor) commented:

We have found that we need to add dequantization of those matmul_v2 weights in quant2_int8_mkldnn_pass.

Can we assume that the matmul_v2 weights are always assigned to input Y and never to input X? Or should we assume that either X or Y can be weights?
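
As an illustration of the weight dequantization mentioned above, here is a generic NumPy sketch, not the actual quant2_int8_mkldnn_pass code; the symmetric [-127, 127] scheme and per-output-channel scale layout are assumptions:

```python
import numpy as np

def dequantize_weights(quantized_weights: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Map weights stored in the int8 range back to fp32.

    Assumes symmetric quantization where the stored values lie in [-127, 127]
    and `scales` holds one abs-max value per output channel (last axis), so
    fp32_weight = stored_value * scale / 127.
    """
    return quantized_weights.astype(np.float32) * scales / 127.0
```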

@lidanqing-intel (Contributor) commented:

@yghstill Please answer sfraczek's question above, because we may need to deliver the model before the 20th.

@lidanqing-intel (Contributor) commented:

@yghstill We also need a quant model that was quant-aware trained with elementwise_add included in quantize_op_types. We will update this page and add elementwise_add to the README.md:
https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/mkldnn_quant

[screenshot of the current README sentence]
The current sentence is wrong (inadequate); it should read:

"So, when using PaddleSlim quantization-aware training, only depthwise_conv2d, conv2d, mul, matmul, and elementwise_add need to be quantized; other ops are not supported."
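
A minimal sketch of what that looks like in a quant-aware training config (the config keys are the standard PaddleSlim ones; the program and place are placeholders for the real BERT training setup):

```python
import paddle
import paddleslim

paddle.enable_static()
place = paddle.CPUPlace()
train_program = paddle.static.Program()  # stands in for the real BERT training program

# Quantize exactly the op types listed in the corrected README sentence.
quant_config = {
    'weight_quantize_type': 'channel_wise_abs_max',
    'activation_quantize_type': 'moving_average_abs_max',
    'quantize_op_types': ['depthwise_conv2d', 'conv2d', 'mul', 'matmul', 'elementwise_add'],
}
quant_program = paddleslim.quant.quant_aware(train_program, place, quant_config)
```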

@lidanqing-intel (Contributor) commented:

@sfraczek
matmul_v2's Y input can be either weights or the output of another op. X is always another op's output.
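
A hypothetical sketch of the branching this answer implies (the helper names are illustrative stand-ins, not the actual pass API):

```python
def matmul_v2_weight_input(op, is_persistable_param):
    """Return which matmul_v2 input holds weights, if any.

    `op` is assumed to expose input("X")/input("Y") like a Paddle OpDesc, and
    `is_persistable_param(name)` is a hypothetical predicate that is True for
    persistable parameters (weights) and False for activations.
    """
    # X is always another op's output, so only Y can be a weight tensor
    # that needs dequantization.
    y_name = op.input("Y")[0]
    return "Y" if is_persistable_param(y_name) else None
```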

@lidanqing-intel (Contributor) commented:

  1. Should we enable all the fuses of mul, matmul_v1, and fc for matmul_v2?
  2. Or could we keep matmul_v2 as the only interface and map it to different ops (mul, matmul_v1, fc) according to the situation?

@lidanqing-intel (Contributor) commented Dec 1, 2021

Will those mappings, like matmul_v2->matmul_v1, matmul_v2->mul, matmul_v2->fc, remain?
Yes.

@lidanqing-intel (Contributor) commented Dec 1, 2021

matmul_v2 only needs to consider all of matmul_v1's passes, not those of mul and the like. The matmul_v1 operator will not be deprecated, but the matmul_v2->matmul_v1 pass was going to be deprecated.
Final solution: whatever can go through matmul_v1 goes through v1 directly; whatever cannot goes through matmul_v2. The matmul_v2->matmul_v1 pass will be kept.

@sfraczek (Contributor) commented Dec 9, 2021

Intel i9 accuracy and performance
tnewst_quant_models: acc: 0.5351, time: 71.88971424102783
bert_fp32: acc: 0.5467, time: 46.44249224662781
bert_int8 acc: 0.5383, time: 36.17305397987366
fc_gelu acc: 0.5383, time: 34.68397045135498 (Perf on i9, should be faster on 6271)

@lidanqing-intel (Contributor) commented Dec 17, 2021

6 transformers
Bert FP32/INT8 tested on Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz

Performance (fps) on 6271 | Native FP32 | oneDNN FP32 | oneDNN INT8 | oneDNN INT8 / oneDNN FP32 | oneDNN INT8 / Native FP32
--- | --- | --- | --- | --- | ---
thread 1 | 313.53 | 280.13 | 128.82 | 2.17X | 2.43X
thread 6 | 98.05 | 55.25 | 30.16 | 1.83X | 3.25X

Accuracy on 6271 | tnewst_quant_model | native without oneDNN FP32 | oneDNN FP32 | oneDNN INT8
--- | --- | --- | --- | ---
thread 1/6 | 0.5351 | 0.5567 | 0.5567 | 0.5376
  • Paddle commit b0d12d9931f34294cdb50e9893287dc3deea5c60
  • config.switch_ir_optim(True) // Note: in the original cpu_infer.py this is False, which is wrong.
  • How to save the int8 model (these ops will be added to Paddle as defaults in the next PR):
python3.7 save_quant_model.py --quant_model_path=$PWD/tnewst_quant_models/ --int8_model_save_path=INT8_PATH --ops_to_quantize="matmul,reshape,transpose,fc,matmul_v2"
  • oneDNN config settings in cpu_infer.py
    def create_predictor(cls, args):
        config = paddle.inference.Config(args.model_path + ".pdmodel",
                                         args.model_path + ".pdiparams")
        # config = paddle.inference.Config(args.model_path + "__model__",
        #                                  args.model_path + "__params__")
        # set CPU configs accordingly,
        # such as enable_mkldnn, set_cpu_math_library_num_threads
        print("--------------------> choose cpu mode.")
        config.disable_gpu()
        config.enable_mkldnn()
        config.set_cpu_math_library_num_threads(6)
        # Note: this is True here, not False.
        config.switch_ir_optim(True)
        config.enable_memory_optim()
        config.switch_use_feed_fetch_ops(False)
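
For completeness, a sketch of how such a config is used for one prediction; the model path, input names, dtypes, and shapes below are placeholders, not values taken from the actual BERT model:

```python
import numpy as np
import paddle.inference as paddle_infer

# Build the config as above (placeholder paths) and create the predictor.
config = paddle_infer.Config("model.pdmodel", "model.pdiparams")
config.disable_gpu()
config.enable_mkldnn()
config.set_cpu_math_library_num_threads(6)
config.switch_ir_optim(True)
predictor = paddle_infer.create_predictor(config)

# Feed dummy int64 token ids; the real names and shapes come from the exported model.
for name in predictor.get_input_names():
    handle = predictor.get_input_handle(name)
    handle.copy_from_cpu(np.zeros((1, 128), dtype="int64"))

predictor.run()

# Fetch the first output (e.g. the classification logits used to compute acc).
out = predictor.get_output_handle(predictor.get_output_names()[0])
logits = out.copy_to_cpu()
```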

@lidanqing-intel (Contributor) commented Dec 17, 2021

@yghstill @sfraczek Hi, performance and accuracy have been benchmarked; @yghstill, you can verify.

Because we are approaching the end of Q4, any further requirements will be handled next quarter. Some possible directions:

  1. fc + elementwise_add
  2. further checks

@yghstill (Contributor, Author) commented Dec 23, 2021

@lidanqing-intel @sfraczek Verified. Thanks for the above optimization.
