Blocked memory format tag (any vs BA16a64b4a) and kernel selection with INT8 #2730
Thank you for reaching out. Let me answer one by one. For Case 1, according to the flag definitions (https://github.com/oneapi-src/oneDNN/blob/main/src/common/verbose.cpp#L386-L412), these are memory extra flags: the `f#` part of a verbose line shows the extra flags of a dnnl memory format. This memory layout (wei_s8::blocked:BA16a64b4a:f8:zpm2) looks like it comes from the user's definition. Are you using a user-defined operation for matmul?
[Update] I have tested benchdnn matmul without the zero-point mapping option, and no issue comes up.

DNNL_VERBOSE=1 ./tests/benchdnn/benchdnn --matmul --wtag=any --dt=u8:s8:u8 700x512:512x1024
onednn_verbose,v1,primitive,exec,cpu,matmul,brg_matmul:avx512_core_vnni,undef,src:u8:a:blocked:ab::f0 wei:s8:a:blocked:BA16a64b4a::f0 dst:u8:a:blocked:ab::f0,,,700x512:512x1024,0.60791

DNNL_VERBOSE=1 ./tests/benchdnn/benchdnn --matmul --wtag=BA16a64b4a --dt=u8:s8:u8 700x512:512x1024
Could you provide us with steps to reproduce your issue?
The difference is which side controls the amount of memory to allocate. When the user passes a fixed format tag, the user controls it; when `tag::any` is passed, the library does.

You can read this behavior as "oneDNN can't make an optimized version of the matmul algorithm with a src zero point other than through a special trick". Technically we could, but performance-wise it would make no sense, since it requires upcasting the int8 buffers to s32, and the whole point of int8 just evaporates.

The only option, if you really want to force that format on B, is to make a per-K reduction of B (i.e., sum over all K values to get N values total) and apply that result through a binary post-op to the original matmul output, dropping the src zero point from it. That would give an identical outcome (under some compliance of data types).
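For illustration, here is a minimal sketch of that compensation trick with the oneDNN C++ API, assuming placeholder shapes, a scalar src zero point `zp_src`, and that `format_tag::BA16a64b4a` is exposed in the build (none of these values come from this issue):

```cpp
#include <cstdint>
#include <vector>
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Placeholder problem sizes and a scalar src zero point (assumptions, not from the issue).
    const memory::dim M = 700, K = 512, N = 1024;
    const int32_t zp_src = 1;

    auto src_md = memory::desc({M, K}, memory::data_type::u8, memory::format_tag::ab);
    auto dst_md = memory::desc({M, N}, memory::data_type::u8, memory::format_tag::ab);
    // Force the blocked layout on B (assumed available as a format_tag here), as in Case 2.
    auto wei_md = memory::desc({K, N}, memory::data_type::s8, memory::format_tag::BA16a64b4a);

    // Compensation: comp[n] = zp_src * sum_k B[k][n], shaped {1, N} so it broadcasts over M.
    // Subtracting it reproduces (src - zp_src) * B = src * B - zp_src * colsum(B).
    auto comp_md = memory::desc({1, N}, memory::data_type::s32, memory::format_tag::ab);
    std::vector<int32_t> comp(N, 0); // fill from the plain B buffer on the host
    memory comp_mem(comp_md, eng, comp.data());

    // Attach the compensation as a binary post-op instead of a src zero point attribute.
    post_ops po;
    po.append_binary(algorithm::binary_sub, comp_md);
    primitive_attr attr;
    attr.set_post_ops(po);

    auto pd = matmul::primitive_desc(eng, src_md, wei_md, dst_md, attr);
    auto mm = matmul(pd);

    memory src_mem(src_md, eng), wei_mem(wei_md, eng), dst_mem(dst_md, eng);
    mm.execute(strm, {
            {DNNL_ARG_SRC, src_mem}, {DNNL_ARG_WEIGHTS, wei_mem}, {DNNL_ARG_DST, dst_mem},
            {DNNL_ARG_ATTR_MULTIPLE_POST_OP(0) | DNNL_ARG_SRC_1, comp_mem}});
    strm.wait();
    return 0;
}
```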
Nope. Only the library can initialize memory descriptors to that. Recommendations are provided above. Feel free to follow up. Thank you.
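For reference, a minimal sketch of the library-driven flow under placeholder shapes and hypothetical buffer names: pass `format_tag::any` for the weights together with the int8 attributes, query the descriptor the library selected, and reorder the plain user weights into it. This is only an illustration, not code from the issue:

```cpp
#include <cstdint>
#include <vector>
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Placeholder shapes (assumptions, not taken from the logs above).
    const memory::dim M = 700, K = 1024, N = 512;

    auto src_md = memory::desc({M, K}, memory::data_type::u8, memory::format_tag::ab);
    auto dst_md = memory::desc({M, N}, memory::data_type::u8, memory::format_tag::ab);
    // Let the library choose the weights layout (it may reserve extra space for compensation).
    auto wei_any_md = memory::desc({K, N}, memory::data_type::s8, memory::format_tag::any);

    // int8 attributes: scalar src zero point and a common weights scale (runtime values
    // would be passed at execution time).
    primitive_attr attr;
    attr.set_zero_points_mask(DNNL_ARG_SRC, 0);
    attr.set_scales_mask(DNNL_ARG_WEIGHTS, 0);

    auto pd = matmul::primitive_desc(eng, src_md, wei_any_md, dst_md, attr);

    // Query the layout the library actually selected for B ...
    auto opt_wei_md = pd.weights_desc();

    // ... and reorder the user's plain (row-major) s8 weights into it when they differ.
    std::vector<int8_t> user_wei(K * N, 0);
    auto user_wei_md = memory::desc({K, N}, memory::data_type::s8, memory::format_tag::ab);
    memory user_wei_mem(user_wei_md, eng, user_wei.data());
    memory opt_wei_mem(opt_wei_md, eng);
    if (opt_wei_md != user_wei_md)
        reorder(user_wei_mem, opt_wei_mem).execute(strm, user_wei_mem, opt_wei_mem);
    strm.wait();
    return 0;
}
```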
Thanks for the response. Yes, as mentioned above, we do not observe this behavior with benchdnn because the zero point is not passed. Does benchdnn have an option to pass zero points and scales when dealing with INT8?
There's a converter from a verbose log to a benchdnn line. This is the benchdnn line for the main branch for the verbose you shared:
Though I'm not sure which version you are on, or whether you modified the line before posting it, the one posted is definitely not a fresh one...
Thanks @dzarukin. Hi @SriAlavandar,

DNNL_VERBOSE=1 ./tests/benchdnn/benchdnn --matmul --dt=u8:s8:u8 --stag=ab --dtag=ab --wtag=any --bia_dt=f32 --attr-scales=src:common:0+wei:common:2+dst:common:0 --attr-zero-points=src:common:1+dst:common:1 700x1024:1024x512 | grep matmul

DNNL_VERBOSE=1 ./tests/benchdnn/benchdnn --matmul --dt=u8:s8:u8 --stag=ab --dtag=ab --wtag=BA16a64b4a --bia_dt=f32 --attr-scales=src:common:0+wei:common:2+dst:common:0 --attr-zero-points=src:common:1+dst:common:1 700x1024:1024x512 | grep matmul
I am trying to run a standalone matmul with a blocked format and the INT8 data type.
Here is the combination that I am running with:
Case 1: I am creating the memory descriptor for the B matrix with the memory format tag any (tag::any).
Here is the log that we are observing:
Case 2: I am creating the memory descriptor for the B matrix with a static format tag (tag::BA16a64b4a).
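For orientation only, roughly how the two B-matrix descriptor declarations being compared could look; this is a hedged sketch with placeholder K/N values, and it assumes format_tag::BA16a64b4a is available in the build:

```cpp
#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

// Placeholder dimensions for illustration only.
const memory::dim K = 512, N = 1024;

// Case 1: tag::any, so the library picks the actual layout of B at primitive creation.
const auto wei_md_any = memory::desc({K, N}, memory::data_type::s8, memory::format_tag::any);

// Case 2: the blocked layout is forced by the user up front.
const auto wei_md_fixed = memory::desc({K, N}, memory::data_type::s8, memory::format_tag::BA16a64b4a);
```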
Here are the differences we can observe from the above experiments:
Questions: