
Segmentation fault with the release 1.6.0 #455

Closed
amontoison opened this issue Aug 6, 2024 · 6 comments

Comments

@amontoison
Member

I have a Buildkite build in Krylov.jl that now fails with a segmentation fault since the new release of oneAPI.jl.

https://buildkite.com/julialang/krylov-dot-jl/builds/916#01912422-58e7-4616-8cd3-c791d29c6abd

I'm only using dense BLAS and LAPACK routines in this package.

@maleadt
Member

maleadt commented Aug 7, 2024

Is the crash reproducible? Could you reduce it to the code that fails (or don't you have a system to try this on)?

@amontoison
Member Author

It seems to be related to broadcasting.

using Krylov, oneAPI

T = Float32
m = 20
n = 10
A_cpu = rand(T, m, n)
b_cpu = rand(T, m)
A_gpu = oneMatrix(A_cpu)
b_gpu = oneVector(b_cpu)
x, stats = lsqr(A_gpu, b_gpu)

ZeError: error occurred when building module, see build log for details (code 1879048196, ZE_RESULT_ERROR_MODULE_BUILD_FAILURE)
  Stacktrace:
    [1] throw_api_error(res::oneAPI.oneL0._ze_result_t)
      @ oneAPI.oneL0 ~/.julia/packages/oneAPI/z4Axk/lib/level-zero/libze.jl:8
    [2] oneAPI.oneL0.ZeModule(ctx::oneAPI.oneL0.ZeContext, dev::oneAPI.oneL0.ZeDevice, image::Vector{UInt8}; build_flags::String, log::Bool)
      @ oneAPI.oneL0 ~/.julia/packages/oneAPI/z4Axk/lib/level-zero/module.jl:58
    [3] ZeModule
      @ ~/.julia/packages/oneAPI/z4Axk/lib/level-zero/module.jl:11 [inlined]
    [4] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{UInt8}, entry::String})
      @ oneAPI ~/.julia/packages/oneAPI/z4Axk/src/compiler/compilation.jl:91
    [5] actual_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.SPIRVCompilerTarget, oneAPI.oneAPICompilerParams}, compiler::typeof(oneAPI.compile), linker::typeof(oneAPI.link))
      @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/execution.jl:257
    [6] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.SPIRVCompilerTarget, oneAPI.oneAPICompilerParams}, compiler::Function, linker::Function)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/execution.jl:151
    [7] macro expansion
      @ ~/.julia/packages/oneAPI/z4Axk/src/compiler/execution.jl:203 [inlined]
    [8] macro expansion
      @ ./lock.jl:267 [inlined]
    [9] zefunction(f::GPUArrays.var"#34#36", tt::Type{Tuple{oneAPI.oneKernelContext, oneDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{1, oneAPI.oneL0.DeviceBuffer}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{Float32}}, Int64}}; kwargs::@Kwargs{})
      @ oneAPI ~/.julia/packages/oneAPI/z4Axk/src/compiler/execution.jl:198
   [10] zefunction
      @ ~/.julia/packages/oneAPI/z4Axk/src/compiler/execution.jl:195 [inlined]
   [11] macro expansion
      @ ~/.julia/packages/oneAPI/z4Axk/src/compiler/execution.jl:66 [inlined]
   [12] #launch_heuristic#93
      @ ~/.julia/packages/oneAPI/z4Axk/src/gpuarrays.jl:17 [inlined]
   [13] launch_heuristic
      @ ~/.julia/packages/oneAPI/z4Axk/src/gpuarrays.jl:15 [inlined]
   [14] _copyto!
      @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:78 [inlined]
   [15] materialize!
      @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:38 [inlined]
   [16] materialize!
      @ ./broadcast.jl:911 [inlined]
   [17] lsqr!(solver::LsqrSolver{Float32, Float32, oneArray{Float32, 1, oneAPI.oneL0.DeviceBuffer}}, A::oneArray{Float32, 2, oneAPI.oneL0.DeviceBuffer}, b::oneArray{Float32, 1, oneAPI.oneL0.DeviceBuffer}; M::UniformScaling{Bool}, N::UniformScaling{Bool}, ldiv::Bool, sqd::Bool, λ::Float32, radius::Float32, etol::Float32, axtol::Float32, btol::Float32, conlim::Float32, atol::Float32, rtol::Float32, itmax::Int64, timemax::Float64, verbose::Int64, history::Bool, callback::Krylov.var"#461#466", iostream::Core.CoreSTDOUT)
      @ Krylov ~/Bureau/git/Krylov.jl/src/lsqr.jl:206
   [18] lsqr!
      @ ~/Bureau/git/Krylov.jl/src/lsqr.jl:168 [inlined]
   [19] lsqr(A::oneArray{Float32, 2, oneAPI.oneL0.DeviceBuffer}, b::oneArray{Float32, 1, oneAPI.oneL0.DeviceBuffer}; window::Int64, M::UniformScaling{Bool}, N::UniformScaling{Bool}, ldiv::Bool, sqd::Bool, λ::Float32, radius::Float32, etol::Float32, axtol::Float32, btol::Float32, conlim::Float32, atol::Float32, rtol::Float32, itmax::Int64, timemax::Float64, verbose::Int64, history::Bool, callback::Krylov.var"#461#466", iostream::Core.CoreSTDOUT)
      @ Krylov ~/Bureau/git/Krylov.jl/src/lsqr.jl:163
   [20] lsqr(A::oneArray{Float32, 2, oneAPI.oneL0.DeviceBuffer}, b::oneArray{Float32, 1, oneAPI.oneL0.DeviceBuffer})
      @ Krylov ~/Bureau/git/Krylov.jl/src/lsqr.jl:158

@amontoison
Member Author

amontoison commented Aug 7, 2024

@maleadt
I isolated this small code snippet:

b_cpu = rand(Float32, 10)
b_gpu = oneVector(b_cpu)
b_gpu .= zero(Float32)

I'm wondering if it's related to the "old" oneAPI_Support_jll (0.5), generated for oneAPI 2024.1.0.

@maleadt
Member

maleadt commented Aug 13, 2024

> I isolated this small code snippet:
>
> b_cpu = rand(Float32, 10)
> b_gpu = oneVector(b_cpu)
> b_gpu .= zero(Float32)

Hmm, that works for me.

> I'm wondering if it's related to the "old" oneAPI_Support_jll (0.5), generated for oneAPI 2024.1.0.

oneAPI_Support_jll isn't used for the operations above; that JLL is only used for oneMKL.


Can you check whether #445 is related, i.e., by running with SYCL_PI_LEVEL_ZERO_BATCH_SIZE=1 set in your environment?


> ZeError: error occurred when building module, see build log for details (code 1879048196, ZE_RESULT_ERROR_MODULE_BUILD_FAILURE)

Did that not report anything else? oneAPI.jl should fetch the build log mentioned in the error and print it as an @error message the first time the failure occurs.

@amontoison
Member Author

With SYCL_PI_LEVEL_ZERO_BATCH_SIZE=1, I get the same error.
Nothing else was reported:

1-element ExceptionStack:
ZeError: error occurred when building module, see build log for details (code 1879048196, ZE_RESULT_ERROR_MODULE_BUILD_FAILURE)
Stacktrace:
  [1] throw_api_error(res::oneAPI.oneL0._ze_result_t)
    @ oneAPI.oneL0 ~/.julia/packages/oneAPI/z4Axk/lib/level-zero/libze.jl:8
  [2] oneAPI.oneL0.ZeModule(ctx::oneAPI.oneL0.ZeContext, dev::oneAPI.oneL0.ZeDevice, image::Vector{UInt8}; build_flags::String, log::Bool)
    @ oneAPI.oneL0 ~/.julia/packages/oneAPI/z4Axk/lib/level-zero/module.jl:58
  [3] ZeModule
    @ ~/.julia/packages/oneAPI/z4Axk/lib/level-zero/module.jl:11 [inlined]
  [4] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{UInt8}, entry::String})
    @ oneAPI ~/.julia/packages/oneAPI/z4Axk/src/compiler/compilation.jl:91
  [5] actual_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.SPIRVCompilerTarget, oneAPI.oneAPICompilerParams}, compiler::typeof(oneAPI.compile), linker::typeof(oneAPI.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/execution.jl:257
  [6] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.SPIRVCompilerTarget, oneAPI.oneAPICompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/Y4hSX/src/execution.jl:151
  [7] macro expansion
    @ ~/.julia/packages/oneAPI/z4Axk/src/compiler/execution.jl:203 [inlined]
  [8] macro expansion
    @ ./lock.jl:267 [inlined]
  [9] zefunction(f::GPUArrays.var"#34#36", tt::Type{Tuple{oneAPI.oneKernelContext, oneDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{1, oneAPI.oneL0.DeviceBuffer}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{Float32}}, Int64}}; kwargs::@Kwargs{})
    @ oneAPI ~/.julia/packages/oneAPI/z4Axk/src/compiler/execution.jl:198
 [10] zefunction
    @ ~/.julia/packages/oneAPI/z4Axk/src/compiler/execution.jl:195 [inlined]
 [11] macro expansion
    @ ~/.julia/packages/oneAPI/z4Axk/src/compiler/execution.jl:66 [inlined]
 [12] #launch_heuristic#93
    @ ~/.julia/packages/oneAPI/z4Axk/src/gpuarrays.jl:17 [inlined]
 [13] launch_heuristic
    @ ~/.julia/packages/oneAPI/z4Axk/src/gpuarrays.jl:15 [inlined]
 [14] _copyto!
    @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:78 [inlined]
 [15] materialize!
    @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:38 [inlined]
 [16] materialize!(dest::oneArray{Float32, 1, oneAPI.oneL0.DeviceBuffer}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(identity), Tuple{Float32}})
    @ Base.Broadcast ./broadcast.jl:911
 [17] top-level scope
    @ REPL[7]:1

@amontoison
Member Author

amontoison commented Sep 30, 2024

The issue was fixed by release 0.6.0 of oneAPI_Support_jll.jl.
I suggest making a new release.


2 participants