Kernel verification failure during execution of axpby with complex float inputs #473
Comments
@maleadt This looks like a driver issue, or a driver mismatch with the base toolkit, to me. Are you using the same driver version for 2024.0 and 2024.1? I see this line: Is this the version used for both base toolkits? Can you try a newer driver with 2024.1?
Yes, we build the driver separately from the MKL bits we use from Conda. Are there driver requirements for MKL as of 2024.1? If so, are these documented anywhere?
Correct; that's what I meant with
How do I do that?
NEO/IGC/SYCL is the Level Zero backend, but since you're selecting the GPU, maybe the question was whether it also happens when selecting the CPU backend (with the same NEO/IGC version)?
I'm confused; does NEO/IGC have a CPU back-end? FWIW, using In any case, upgrading to NEO 24.13.29138.7 seems to have fixed this.
Scratch that, I'm still seeing the error with It doesn't reproduce consistently though; is it possible the validation isn't deterministic, or that compiled kernels are cached somewhere?
Sorry, I was confused myself...
One way to check that it's always the same kernel code could be dumping it to disk:
@maleadt I just tried to reproduce this with newly released compiler 2024.2 and oneMKL 2024.2, and I cannot reproduce the issue (I am not matching your environment exactly so it's possible I'm missing something).
How often does it reproduce? Are you recompiling between attempts or just re-running the compiled executable?
@maleadt JuliaGPU/oneAPI.jl#467 please confirm whether this works after the toolchain update.
Sorry for the delay in response. We've recently upgraded oneAPI.jl to v2024.2.1, and cannot reproduce this anymore.
Summary
When executing `oneapi::mkl::blas::column_major::axpby` with `std::complex<float>` inputs under `ZE_ENABLE_VALIDATION_LAYER=1`, the kernel generated by MKL fails verification.
Version
MKL from oneAPI 2024.1.0 as downloaded from Conda:
This is a regression, and AFAICT not present on 2024.0.0
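For reference when comparing results between the two versions, `axpby` computes `y = alpha * x + beta * y` element-wise. A minimal host-only sketch of that operation for `std::complex<float>` follows. The helper name `axpby_reference` and the alias `cfloat` are hypothetical; the code assumes unit strides and makes no oneMKL, SYCL, or Level Zero calls, so it only illustrates the expected arithmetic and does not exercise the failing kernel.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;  // hypothetical alias for readability

// Hypothetical helper (not part of oneMKL): host-side reference for
// y[i] = alpha * x[i] + beta * y[i], assuming unit strides.
// Processes min(x.size(), y.size()) elements and updates y in place.
void axpby_reference(cfloat alpha, const std::vector<cfloat>& x,
                     cfloat beta, std::vector<cfloat>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + beta * y[i];
    }
}
```

Output copied back from the device after a run could be compared element by element against this host result, with a small tolerance for floating-point rounding.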
Environment
NEO v24.9.28717 with IGC v1.0.16238
`clinfo`
Steps to reproduce
C++ reproducer:
Observed behavior
Full symbolized backtrace:
cc @pengtu