
gpu: nvidia: conv: Fix int8 convolution primitive fails #1760

Merged · 1 commit · May 5, 2024

Conversation

@kala855 (Contributor) commented Nov 28, 2023

Description

When destination scaling is applied to int8 data types, some benchmarks fail. This happened because the scaling was applied over the int8 result, causing saturation issues.

Fixes #1749

Solution:

We create a temporary buffer to hold the values in f32 and keep the computation as oneDNN expects; the scratchpad size is modified because of this change. We launch cudnnConvolutionForward so that its result is produced in f32 and apply the scaling parameters for src/wei at this stage. After the post-operations, we apply the scaling for dst over the f32 result to avoid saturation issues. Finally, the result is converted to s8.
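The steps above can be illustrated with a minimal scalar sketch (plain Python, not the actual cuDNN/oneDNN code; the function names and the single-accumulator model are hypothetical, and it assumes oneDNN's convention that the f32 result is divided by dst_scale before conversion to s8):

```python
def s8_saturate(x):
    # Convert f32 -> s8 with rounding and saturation.
    return max(-128, min(127, int(round(x))))

def conv_int8_fixed(acc, src_scale, wei_scale, dst_scale, post_op=None):
    # New flow: keep the conv result in a temporary f32 value
    # (a scratchpad buffer in the real code) until the very end.
    val = float(acc) * src_scale * wei_scale  # apply src/wei scales in f32
    if post_op is not None:
        val = post_op(val)                    # post-ops run in f32
    val = val / dst_scale                     # dst scaling, still in f32
    return s8_saturate(val)                   # convert to s8 exactly once

def conv_int8_buggy(acc, src_scale, wei_scale, dst_scale, post_op=None):
    # Old behavior: the result was saturated to s8 first, then dst
    # scaling was applied over the already-saturated int8 value.
    val = float(acc) * src_scale * wei_scale
    if post_op is not None:
        val = post_op(val)
    s8 = s8_saturate(val)                     # premature saturation
    return s8_saturate(s8 / dst_scale)

# With a large accumulator and dst_scale > 1, the old path saturates:
print(conv_int8_fixed(400, 1.0, 1.0, 4.0))  # 400 / 4 = 100
print(conv_int8_buggy(400, 1.0, 1.0, 4.0))  # 127 / 4 rounds to 32
```

The hypothetical example shows why the order matters: dividing 400 by the dst scale first yields 100, which fits in s8, while saturating to 127 first destroys information that no later scaling can recover.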

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

Bug fixes

  • Have you included information on how to reproduce the issue (either in a GitHub issue or in this PR)?

@vpirogov vpirogov added this to the v3.4 milestone Jan 12, 2024
@vpirogov (Member)

Just a reminder that oneDNN v3.4 code freeze is coming on January 26.

@kala855 (Contributor, Author) commented Jan 16, 2024

We have updated the PR. We now follow the quantization equation and apply the destination scaling after the post-operations. Since cudnnConvolutionForward does not support producing its output in s32 format, we obtain the result as f32 when the inputs are s8. Then the scaling parameters for src/wei are applied. Finally, after the post-operations, the scaling for dst is applied over the f32 value, which is then converted to s8.

@kala855 kala855 requested a review from igorsafo February 8, 2024 09:26
@vpirogov vpirogov modified the milestones: v3.4, v3.5 Feb 13, 2024
@igorsafo (Contributor) left a comment

There is another class of test cases that fail: no dst scale, but dst is s8:
./build/tests/benchdnn/benchdnn --conv --engine=gpu --skip-impl=ref --dir=FWD_I --dt=s8:s8:s8 --attr-scales=src:common:2 --attr-post-ops=sum mb1ic512iw121oc512ow122kw6pw3nconv1d:21

Even if the dst scale is not set, the post-ops computation should happen in f32 and then the dst should be quantized; the dst_scale will simply be equal to 1 (the default scale value).
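This failing class of cases can be sketched with a scalar model of the benchdnn reproducer above (hypothetical plain-Python functions, not the actual implementation; it models the s8 conv with a src scale of 2, a sum post-op, and no user-set dst scale):

```python
def quantize_s8(x):
    # Round and saturate an f32 value to s8.
    return max(-128, min(127, int(round(x))))

def conv_fwd_s8(acc, src_scale, wei_scale, sum_dst, dst_scale=1.0):
    # Correct flow: post-ops stay in f32 even when no dst scale is set;
    # dst_scale just defaults to 1 and quantization happens once, at the end.
    val = float(acc) * src_scale * wei_scale  # dequantized conv result, f32
    val += float(sum_dst)                     # sum post-op computed in f32
    return quantize_s8(val / dst_scale)       # single quantization step

def conv_fwd_s8_buggy(acc, src_scale, wei_scale, sum_dst):
    # Failing behavior: the conv result is saturated to s8 before
    # the sum post-op, so the intermediate value is already clipped.
    clipped = quantize_s8(float(acc) * src_scale * wei_scale)
    return quantize_s8(clipped + float(sum_dst))

# src_scale=2 pushes the intermediate past 127; the sum post-op would
# bring it back into range, but only if nothing was clipped yet:
print(conv_fwd_s8(90, 2.0, 1.0, -100))        # 2*90 - 100 = 80
print(conv_fwd_s8_buggy(90, 2.0, 1.0, -100))  # 127 - 100 = 27
```

In this model the two paths agree whenever the f32 intermediate stays within s8 range, which is why the bug only shows up in cases like the reproducer above where the src-scaled accumulation exceeds 127 before the sum post-op.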

@kala855 kala855 requested a review from igorsafo March 26, 2024 08:55
@vpirogov vpirogov added the platform:gpu-nvidia Codeowner: @oneapi-src/onednn-gpu-nvidia label Mar 29, 2024
@kala855 (Contributor, Author) commented Apr 3, 2024

> There is another class of test cases that fail: no dst scale, but dst is s8: ./build/tests/benchdnn/benchdnn --conv --engine=gpu --skip-impl=ref --dir=FWD_I --dt=s8:s8:s8 --attr-scales=src:common:2 --attr-post-ops=sum mb1ic512iw121oc512ow122kw6pw3nconv1d:21
>
> Even if the dst scale is not set, the post-ops computation should happen in f32 and then the dst should be quantized; the dst_scale will simply be equal to 1 (the default scale value).

I addressed this comment in the last commit. Just a kind reminder to take a look at it. Thanks @igorsafo

@igorsafo (Contributor)

@kala855 FYI, I pushed a minor change to reduce code duplication.

@dzarukin (Contributor) commented May 1, 2024

Please squash all the changes and remove all merge commits - the history must be linear.

@kala855 (Contributor, Author) commented May 2, 2024

@dzarukin I did the squash as you mentioned. Thank you very much. Let me know if something else needs to be done.

@dzarukin dzarukin merged commit a986231 into main May 5, 2024
11 checks passed
@dzarukin dzarukin deleted the kala855/1749 branch May 5, 2024 01:12
@dzarukin (Contributor) commented May 5, 2024

Thanks for the contribution.

Successfully merging this pull request may close these issues.

[nvidia] int8 convolution primitive fails correctness check
5 participants