
[GPU] Conv and Gemm fix #2290

Open · wants to merge 2 commits into main
Conversation

@dyoussif (Contributor) commented on Dec 18, 2024

  1. Fix the data type for the conv zero-point mask: use s16 instead of s32. This avoids an invalid data type combination for the mad instruction (mad dst:d src0:d src1:d src2:d, i.e. dword in every operand position); see the sketch after this list.

Failing case

--mode-modifier=P --conv --engine=gpu --skip-impl=ref --allow-enum-tags-only=false --check-ref-impl=true -- dt=u8:s8:u8 --attr-scales=wei:per_oc --attr-zero-points=src:per_dim_1 --attr-post-ops=hardswish:0.271:0.314+linear:0.271:0.314 g240mb32ic240ih28oc240oh14kh3sh2ph0n"f191c263e53dbb3ce0c02a13f311a72a*1"

  2. Fix a gemm hang for the following case on Xe2:

--matmul --engine=gpu --dt=f16:s4:f16 --wtag=acb --attr-scales=wei:per_ocic:f16:128x1 --attr-zero-points=wei:per_ocic:u4:128x1 --attr-fpmath=f16:true --skip-impl=ref 3x96x512:3x512x64

It seems the lookahead should match reqLoad (see the changed lines quoted in the review below).
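For item 1, a minimal sketch of the kind of change described, assuming the nGEN data-type enums used by oneDNN's GPU jit generators; the variable name and header path are hypothetical, not taken from the actual patch:

```cpp
#include "ngen.hpp"  // nGEN header used by oneDNN's GPU generators (path assumed)

// Hypothetical sketch, not the actual patch: the conv zero-point mask was
// held as dword (s32), so the generated multiply-add ended up with dword
// operands in every position (mad dst:d src0:d src1:d src2:d), which is not
// a valid operand combination for mad. Holding the mask as word (s16)
// narrows the multiply sources to a supported combination.
ngen::DataType zp_mask_dt = ngen::DataType::w;  // s16; previously DataType::d (s32)
```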

@dyoussif requested a review from a team as a code owner on Dec 18, 2024, 23:46
@github-actions bot added the platform:gpu-intel (Codeowner: @oneapi-src/onednn-gpu-intel) label on Dec 18, 2024
@dyoussif (Contributor, Author) commented:
make test
disable device_cpu
enable device_gpu
disable benchdnn_all
enable benchdnn_matmul

Comment on lines +963 to +964
auto reqLoadAq = every(kaq_load) | lookahead(kaq_load);
auto reqLoadBq = every(kbq_load) | lookahead(kbq_load);
Contributor:
@dyoussif, similar to the lines below, the lookahead should depend on whether we are repacking this data. If so -- yes, this patch is right. Otherwise, the original code is what's needed; with this patch we would load Aq/Bq too early if the group size exceeds the load chunk size for A/B.
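A minimal sketch of the conditional the reviewer describes, assuming repack flags analogous to the logic on the nearby lines; repackAq/repackBq, ka_load, and kb_load are assumed names, not taken from the source:

```cpp
// Sketch of the reviewer's suggestion (assumed names, not the actual code):
// extend the lookahead to the quantization-parameter load distance only when
// that data is repacked; otherwise keep the shorter lookahead so that Aq/Bq
// are not loaded too early when the quantization group size exceeds the
// A/B load chunk size.
auto reqLoadAq = every(kaq_load) | lookahead(repackAq ? kaq_load : ka_load);
auto reqLoadBq = every(kbq_load) | lookahead(repackBq ? kbq_load : kb_load);
```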

Labels: platform:gpu-intel (Codeowner: @oneapi-src/onednn-gpu-intel)

3 participants