Bert model quantized inference accuracy is abnormal on Intel CPU #36962
Comments
Hi! We've received your issue; please be patient while we respond. We will arrange for technicians to answer your questions as soon as possible. Please make sure you have provided a clear problem description, reproduction code, environment & version info, and error messages. You may also check the API docs, FAQ, historical GitHub issues, and the AI community for answers. Have a nice day! |
@yghstill Received. |
@yghstill In the cpu_infer.py file, the module |
@wanghaoshuang
|
@lidanqing-intel @wozna Please clone the code from: https://github.com/LiuChiachi/PaddleNLP/tree/add-task-distill-zh then enter the following directory and run cpu_infer.py: https://github.com/LiuChiachi/PaddleNLP/tree/add-task-distill-zh/examples/model_compression/test_chinese_distillation |
@yghstill
wget could not download it either,
but wget could download this link.
Any suggestions? Maybe consult the PaddlePaddle framework team; we have had more conversations with them, so maybe they know how to deal with it? |
@lidanqing-intel Please modify |
@yghstill Reproduced the issue.
In the IR passes, only this one is fused, so the error could only be here.
Inference results
Maybe |
Update: |
We have found that we need to add dequantization of those matmul_v2 weights in quant2_int8_mkldnn_pass. Can we assume that matmul_v2 weights are always assigned to input Y, never to input X? Or should we assume that either X or Y can be the weights? |
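For illustration, here is a minimal numpy sketch of the kind of weight dequantization being discussed. The helper name and the per-channel scale layout are assumptions for this example, not the actual internals of quant2_int8_mkldnn_pass:

```python
import numpy as np

# Hypothetical helper (sketch only): fold int8 weights back to fp32
# given per-channel scales, assumed to be stored per output column.
# The real pass operates on the inference program's variables instead
# of raw arrays.
def dequantize_weight(w_int8: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return w_int8.astype(np.float32) * (scales / 127.0)

w_int8 = np.random.randint(-127, 128, size=(768, 768), dtype=np.int8)
scales = np.abs(np.random.randn(768)).astype(np.float32)  # one per column
w_fp32 = dequantize_weight(w_int8, scales)
print(w_fp32.dtype, w_fp32.shape)  # float32 (768, 768)
```

If either X or Y may carry the weights, the pass would presumably need to check which input variable is a persistable parameter rather than hard-coding Y.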
@yghstill Please answer sfraczek's question above, because we need to deliver the model before the 20th. |
@yghstill We also need a quant_model that is quant-aware trained with
So, when doing quantization-aware training with PaddleSlim, it is only necessary to |
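For context, a minimal sketch of what covering matmul_v2 in PaddleSlim quantization-aware training might look like. The config keys follow PaddleSlim's documented quantization config, but treat the exact keys and values here as assumptions that may differ by version:

```python
# Sketch only: a quantization config that explicitly lists matmul_v2
# among the op types to quantize. train_program and place would come
# from the user's training script and are omitted here.
from paddleslim.quant import quant_aware

quant_config = {
    'weight_quantize_type': 'channel_wise_abs_max',
    'activation_quantize_type': 'moving_average_abs_max',
    'weight_bits': 8,
    'activation_bits': 8,
    # ensure matmul_v2 (not only mul/matmul) is covered
    'quantize_op_types': ['matmul_v2', 'matmul', 'mul'],
}
# quant_program = quant_aware(train_program, place, quant_config, for_test=False)
```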
@sfraczek |
Will those mappings, like matmul_v2->matmul_v1, matmul_v2->mul, and matmul_v2->fc, remain? |
For matmul_v2, only the matmul_v1 passes need to be considered, not the mul-related ones. The matmul_v1 operator will not be deprecated, but the matmul_v2->matmul_v1 pass will be deprecated. |
Intel i9 accuracy and performance |
6 transformers |
@lidanqing-intel @sfraczek Verified. Thanks for the above optimization. |
Version & environment information:
1) PaddlePaddle version: please provide your PaddlePaddle version number (e.g. 1.1) or CommitID
2) CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
3) GPU: -
4) System environment: Ubuntu 16, Python 3.7
Reproduction information: if it is an error, please provide the reproduction environment and steps
(1) After obtaining the post-training quantized model, use save_quant_model.py from the attachment to convert and optimize it, exporting a quantized model that can be used on Intel CPU
(2) Deploy the quantized model with Paddle Inference: run cpu_infer.py to perform prediction (note: modify the model paths set in the code; a minimal sketch of this setup is included at the end of this issue body). The prediction accuracy is shown in the figure below; the correct accuracy should be above 0.5, so this result does not meet expectations:
Problem description: While adapting the Bert model in PaddlePaddle, after dynamic-to-static conversion followed by post-training quantization, the model's accuracy is normal with TRT int8, and the inference pipeline runs to completion on Intel CPU, but the accuracy is abnormal
![image](https://user-images.githubusercontent.com/15628872/140010481-76fcb127-50e0-42b6-8139-3f1a36324f58.png)
The execution code is in the attachment: code.zip
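For reference, a minimal sketch of the CPU int8 inference setup along the lines of cpu_infer.py. The model paths, input shape, and dtype below are placeholders, not the actual values from the attached script:

```python
import numpy as np
from paddle.inference import Config, create_predictor

# Placeholder paths: point these at the converted quant model exported
# by save_quant_model.py.
config = Config("quant_model/model.pdmodel", "quant_model/model.pdiparams")
config.disable_gpu()
config.enable_mkldnn()        # use the oneDNN (MKL-DNN) kernels on CPU
config.switch_ir_optim(True)  # run the IR fusion passes discussed above
predictor = create_predictor(config)

# Feed dummy token ids to every input just to exercise the pipeline;
# a real run would feed tokenized evaluation data.
for name in predictor.get_input_names():
    handle = predictor.get_input_handle(name)
    handle.copy_from_cpu(np.ones((1, 128), dtype="int64"))

predictor.run()
out = predictor.get_output_handle(predictor.get_output_names()[0])
print(out.copy_to_cpu())
```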