Empty profile information on v1.11 #56327
Cannot reproduce this on 1.11.1 with
Everything is working as expected there. Edit: Can reproduce if I use 8 threads.
Looks like the dgemm kernels in OpenBLAS are missing correct unwind info now, since C=true shows the missing samples:
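One way to check a claim like this is to profile a workload dominated by matrix multiplication and print the result twice, once with Julia frames only and once with `C=true`. The following is a minimal sketch under stated assumptions: the function name `matmul_work` and the matrix sizes are made up for illustration, and the products are assumed to dispatch to OpenBLAS dgemm as they do in a default Julia build.

```julia
using Profile, LinearAlgebra

# Hypothetical workload (not from the thread): repeated Float64 matrix
# products, which a default Julia build forwards to OpenBLAS dgemm.
function matmul_work(n::Integer, reps::Integer)
    A = rand(n, n)
    B = rand(n, n)
    C = similar(A)
    for _ in 1:reps
        mul!(C, A, B)   # in-place C = A * B
    end
    return C
end

matmul_work(8, 1)        # warm up so compilation is not profiled
Profile.clear()
@profile matmul_work(300, 50)

Profile.print()          # Julia frames only (C=false is the default)
Profile.print(C = true)  # additionally show C and assembly frames
```

If samples taken inside the BLAS kernel only appear in the second printout, that would be consistent with unwinding failing to get from the kernel back to the Julia caller.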
FYI @gbaraldi given we also encountered this and couldn't figure it out.
OpenBLAS uses hand-written assembly for this, so it is missing the required CFI info in the PROLOGUE macro that would normally be auto-generated by the compiler (https://github.com/OpenMathLib/OpenBLAS/blob/ac736820d7001a715395b01412823fe712566eb6/common_x86.h#L300)
It's not limited to
If you can share equivalent code it'd help debug this.
Example from readme of
julia> versioninfo();
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × AMD Ryzen 7 7840U w/ Radeon 780M Graphics
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 8 default, 0 interactive, 4 GC (on 16 virtual cores)
Environment:
JULIA_NUM_THREADS = 8
julia> using Profile
julia> function profile_test(n)
           for i = 1:n
               A = randn(100,100,20)
               m = maximum(A)
               Am = mapslices(sum, A; dims=2)
               B = A[:,:,5]
               Bsort = mapslices(sort, B; dims=1)
               b = rand(100)
               C = B.*b
           end
       end
profile_test (generic function with 1 method)
julia> @profile profile_test(2);
julia> Profile.clear(); @profile profile_test(10);
julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
╎12 @Base/task.jl:694; task_done_hook(t::Task)
╎ 12 @Base/task.jl:1021; wait()
11╎ 12 @Base/task.jl:1012; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
Total snapshots: 12. Utilization: 0% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.
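The `groupby` hint in the last output line can be followed up as shown below. This is a minimal sketch, assuming a made-up CPU-bound function `busy_work` that exists only to give the sampler something to hit. It splits the report by thread, and then by thread and task, which helps show whether the idle `poptask` samples come from worker threads rather than from the thread running the profiled code.

```julia
using Profile

# Hypothetical CPU-bound workload, used only to have something to sample.
function busy_work(iters::Integer)
    acc = 0.0
    for i in 1:iters
        acc += sin(i) * cos(i)
    end
    return acc
end

busy_work(10)        # warm up so compilation is not profiled
Profile.clear()
@profile busy_work(20_000_000)

Profile.print(groupby = :thread)            # one tree per thread
Profile.print(groupby = [:thread, :task])   # threads, then tasks within each
```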
Can you show
Here it is:
julia> versioninfo();
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
julia> using Profile
julia> function profile_test(n)
           for i = 1:n
               A = randn(100,100,20)
               m = maximum(A)
               Am = mapslices(sum, A; dims=2)
               B = A[:,:,5]
               Bsort = mapslices(sort, B; dims=1)
               b = rand(100)
               C = B.*b
           end
       end
profile_test (generic function with 1 method)
julia> @profile profile_test(2);
julia> Profile.clear(); @profile profile_test(10);
julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
╎57 @REPL/src/LineEdit.jl:2835; #281
╎ 57 @Base/channels.jl:597; wait
╎ 57 @Base/condition.jl:125; wait
╎ 57 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; …
╎ 57 @Base/task.jl:1021; wait()
56╎ 57 @Base/task.jl:1012; poptask(W::Base.IntrusiveLinkedListSynchronized{…
Total snapshots: 57. Utilization: 0% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.
julia> Profile.clear(); @profile profile_test(10);
julia> Profile.print(C=true)
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
╎22 …julia-master/src/task.c:1213; start_task
╎ 22 …julia-master/src/task.c:319; jl_finish_task
╎ 22 …lia-master/src/julia.h:2157; jl_apply
╎ 22 …ux.gnu/lib/julia/sys.so:?; jfptr_task_done_hook_65631.1
╎ 22 @Base/task.jl:694; task_done_hook(t::Task)
╎ 22 @Base/task.jl:1021; wait()
╎ ╎ 22 @Base/task.jl:1012; poptask(W::Base.IntrusiveLinkedListSynchronized…
╎ ╎ 22 …ster/src/scheduler.c:584; ijl_task_get_next
╎ ╎ 22 …uv/src/unix/thread.c:822; uv_cond_wait
╎ ╎ 22 …-linux-gnu/libc.so.6:?; pthread_cond_wait
21╎ ╎ 22 …-linux-gnu/libc.so.6:?;
Total snapshots: 22. Utilization: 0% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.
I think there's a bigger issue than failed unwinds happening.
@gdalle it'd be good to rule out your code executing too quickly. Can you try with 1000? On 1.11.1 on macOS:
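A rough way to judge whether a run is long enough to be sampled at all is to compare its wall-clock time with the sampling interval reported by `Profile.init()`. The sketch below reuses `profile_test` from this thread; the helper `expected_snapshots` is a hypothetical name introduced here, and the estimate is approximate because it ignores sampler overhead.

```julia
using Profile

# Workload copied from the thread.
function profile_test(n)
    for i = 1:n
        A = randn(100,100,20)
        m = maximum(A)
        Am = mapslices(sum, A; dims=2)
        B = A[:,:,5]
        Bsort = mapslices(sort, B; dims=1)
        b = rand(100)
        C = B.*b
    end
end

# Hypothetical helper: approximate number of snapshots a single busy thread
# could collect in `elapsed` seconds at a sampling interval of `delay` seconds.
expected_snapshots(elapsed::Real, delay::Real) = round(Int, elapsed / delay)

n, delay = Profile.init()   # with no arguments, returns the current settings

profile_test(2)             # warm up so compilation is not timed
elapsed = @elapsed profile_test(10)

println("elapsed: ", elapsed, " s, sampling delay: ", delay, " s")
println("approximate snapshots expected per busy thread: ",
        expected_snapshots(elapsed, delay))
```

If the estimate is only a handful of snapshots, increasing the iteration count (for example to 1000, as suggested above) makes an empty profile much less likely to be a timing artifact.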
Same behavior:
julia> Profile.clear(); @profile profile_test(1000); Profile.print();
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
╎1453 @Base/task.jl:694; task_done_hook(t::Task)
╎ 1453 @Base/task.jl:1021; wait()
1452╎ 1453 @Base/task.jl:1012; poptask(W::Base.IntrusiveLinkedListSynchronized…
Total snapshots: 1453. Utilization: 0% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.
julia> Profile.clear(); @profile profile_test(1000); Profile.print(C=true);
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
╎1468 …lia-master/src/task.c:1213; start_task
╎ 1468 …lia-master/src/task.c:319; jl_finish_task
╎ 1468 …a-master/src/julia.h:2157; jl_apply
╎ 1468 ….gnu/lib/julia/sys.so:?; jfptr_task_done_hook_65631.1
╎ 1468 @Base/task.jl:694; task_done_hook(t::Task)
╎ 1468 @Base/task.jl:1021; wait()
╎ ╎ 1468 @Base/task.jl:1012; poptask(W::Base.IntrusiveLinkedListSynchron…
╎ ╎ 1468 …ter/src/scheduler.c:584; ijl_task_get_next
╎ ╎ 1468 …/src/unix/thread.c:822; uv_cond_wait
╎ ╎ 1468 …linux-gnu/libc.so.6:?; pthread_cond_wait
1467╎ ╎ 1468 …inux-gnu/libc.so.6:?;
Total snapshots: 1468. Utilization: 0% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.
Did you run your test with
@gdalle does nightly have the same issue?
Indeed this looks fixed on nightly, so we're probably missing a backport.
julia> versioninfo();
Julia Version 1.12.0-DEV.1502
Commit ee09ae70d9f (2024-10-26 01:01 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 8 GC (on 8 virtual cores)
julia> using Profile
julia> function profile_test(n)
           for i = 1:n
               A = randn(100,100,20)
               m = maximum(A)
               Am = mapslices(sum, A; dims=2)
               B = A[:,:,5]
               Bsort = mapslices(sort, B; dims=1)
               b = rand(100)
               C = B.*b
           end
       end
profile_test (generic function with 1 method)
julia> @profile profile_test(2);
julia> Profile.clear(); @profile profile_test(10);
julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line Function
=========================================================
╎16 @Base/client.jl:568 _start()
╎ 16 @Base/client.jl:593 repl_main
╎ 16 @Base/client.jl:511 run_main_repl(interactive::Bool, quiet::Bool, ba…
╎ 16 @Base/essentials.jl:1046 invokelatest
╎ 16 @Base/essentials.jl:1049 #invokelatest#1
╎ 16 @Base/client.jl:490 run_std_repl
╎ ╎ 16 @REPL/src/REPL.jl:676 run_repl
╎ ╎ 16 @REPL/src/REPL.jl:690 #run_repl#48
╎ ╎ 16 @REPL/src/REPL.jl:464 start_repl_backend
╎ ╎ 16 @REPL/src/REPL.jl:467 #start_repl_backend#41
╎ ╎ 16 @REPL/src/REPL.jl:482 repl_backend_loop
╎ ╎ ╎ 16 @REPL/src/REPL.jl:370 eval_user_input
╎ ╎ ╎ 16 @REPL/src/REPL.jl:345 toplevel_eval_with_hooks
╎ ╎ ╎ 16 @REPL/src/REPL.jl:352 toplevel_eval_with_hooks
╎ ╎ ╎ 16 @REPL/src/REPL.jl:352 toplevel_eval_with_hooks
╎ ╎ ╎ 16 @REPL/…rc/REPL.jl:352 toplevel_eval_with_hooks
╎ ╎ ╎ ╎ 16 @REPL/…rc/REPL.jl:352 toplevel_eval_with_hooks
╎ ╎ ╎ ╎ 16 @REPL/…c/REPL.jl:348 toplevel_eval_with_hooks
╎ ╎ ╎ ╎ 16 @REPL/…c/REPL.jl:341 __repl_entry_eval_expanded_with…
╎ ╎ ╎ ╎ 6 REPL[3]:3 profile_test(n::Int64)
╎ ╎ ╎ ╎ 6 @Random/…mal.jl:278 randn
╎ ╎ ╎ ╎ ╎ 6 @Random/…mal.jl:272 randn
╎ ╎ ╎ ╎ ╎ 2 @Base/boot.jl:642 Array
╎ ╎ ╎ ╎ ╎ 2 @Base/boot.jl:629 Array
╎ ╎ ╎ ╎ ╎ 2 @Base/boot.jl:579 new_as_memoryref
2╎ ╎ ╎ ╎ ╎ 2 @Base/boot.jl:562 GenericMemory
╎ ╎ ╎ ╎ ╎ 1 @Random/…al.jl:257 randn!(rng::Random.TaskLocalR…
╎ ╎ ╎ ╎ ╎ 1 @Random/…om.jl:269 rand!
╎ ╎ ╎ ╎ ╎ 1 @Random/…om.jl:269 rand!
╎ ╎ ╎ ╎ ╎ 1 @Random/…d.jl:303 rand!
╎ ╎ ╎ ╎ ╎ ╎ 1 @Random/…d.jl:169 xoshiro_bulk
╎ ╎ ╎ ╎ ╎ ╎ 1 @Random/….jl:170 xoshiro_bulk
╎ ╎ ╎ ╎ ╎ ╎ 1 @Random/….jl:266 xoshiro_bulk_simd(rng::Ra…
╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…er.jl:180 unsafe_store!
1╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…er.jl:180 unsafe_store!
╎ ╎ ╎ ╎ ╎ 3 @Random/…al.jl:260 randn!(rng::Random.TaskLocalR…
╎ ╎ ╎ ╎ ╎ 2 @Random/…al.jl:89 _randn
╎ ╎ ╎ ╎ ╎ 1 @Base/int.jl:627 rem
╎ ╎ ╎ ╎ ╎ 1 @Base/…ors.jl:321 !=
╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/int.jl:518 ==
1╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…on.jl:639 ==
╎ ╎ ╎ ╎ ╎ 1 @Base/…tion.jl:430 *
1╎ ╎ ╎ ╎ ╎ 1 @Base/…oat.jl:493 *
╎ ╎ ╎ ╎ ╎ 1 @Random/…al.jl:90 _randn
╎ ╎ ╎ ╎ ╎ 1 @Base/int.jl:519 <
1╎ ╎ ╎ ╎ ╎ 1 @Base/int.jl:513 <
╎ ╎ ╎ ╎ 4 REPL[3]:4 profile_test(n::Int64)
╎ ╎ ╎ ╎ 4 @Base/…cedim.jl:979 maximum
╎ ╎ ╎ ╎ ╎ 4 @Base/…cedim.jl:979 #maximum#730
╎ ╎ ╎ ╎ ╎ 4 @Base/…edim.jl:983 _maximum
╎ ╎ ╎ ╎ ╎ 4 @Base/…edim.jl:983 #_maximum#732
╎ ╎ ╎ ╎ ╎ 4 @Base/…edim.jl:984 _maximum
╎ ╎ ╎ ╎ ╎ 4 @Base/…dim.jl:984 #_maximum#733
╎ ╎ ╎ ╎ ╎ ╎ 4 @Base/…dim.jl:326 mapreduce
╎ ╎ ╎ ╎ ╎ ╎ 4 @Base/…im.jl:326 #mapreduce#715
╎ ╎ ╎ ╎ ╎ ╎ 4 @Base/…im.jl:334 _mapreduce_dim
╎ ╎ ╎ ╎ ╎ ╎ 4 @Base/…ce.jl:436 _mapreduce(f::typeof(ide…
╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…ce.jl:642 mapreduce_impl(f::typeo…
╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…ce.jl:615 _fast
2╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…ls.jl:790 ifelse
╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…ce.jl:643 mapreduce_impl(f::typeo…
╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…ce.jl:615 _fast
2╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…ls.jl:790 ifelse
╎ ╎ ╎ ╎ 5 REPL[3]:5 profile_test(n::Int64)
╎ ╎ ╎ ╎ 5 @Base/…array.jl:3300 mapslices
╎ ╎ ╎ ╎ ╎ 5 @Base/…rray.jl:3361 mapslices(f::typeof(sum), A::…
1╎ ╎ ╎ ╎ ╎ 3 @Base/…rray.jl:3372 _inner_mapslices!(R::Array{F…
╎ ╎ ╎ ╎ ╎ 2 @Base/…onal.jl:987 _unsafe_getindex!(::Vector{F…
╎ ╎ ╎ ╎ ╎ 2 @Base/…onal.jl:977 macro expansion
╎ ╎ ╎ ╎ ╎ 2 @Base/…sian.jl:64 macro expansion
╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…nal.jl:979 macro expansion
╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…ay.jl:1000 setindex!
2╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…ay.jl:1005 _setindex!
╎ ╎ ╎ ╎ ╎ 2 @Base/…rray.jl:3373 _inner_mapslices!(R::Array{F…
╎ ╎ ╎ ╎ ╎ 2 @Base/…edim.jl:979 sum
╎ ╎ ╎ ╎ ╎ 2 @Base/…edim.jl:979 #sum#722
╎ ╎ ╎ ╎ ╎ 2 @Base/…dim.jl:983 _sum
╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…dim.jl:983 #_sum#724
╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…im.jl:984 _sum
╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…im.jl:984 #_sum#725
╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…im.jl:326 mapreduce
╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…im.jl:326 #mapreduce#715
╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…im.jl:334 _mapreduce_dim
╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…ce.jl:436 _mapreduce
╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…ce.jl:269 mapreduce_impl
╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…ce.jl:255 mapreduce_impl(f::t…
1╎ ╎ ╎ ╎ ╎ ╎ ╎ 2 @Base/…op.jl:75 macro expansion
1╎ ╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/int.jl:83 <
╎ ╎ ╎ ╎ 1 REPL[3]:7 profile_test(n::Int64)
╎ ╎ ╎ ╎ 1 @Base/…array.jl:3300 mapslices
╎ ╎ ╎ ╎ ╎ 1 @Base/…rray.jl:3361 mapslices(f::typeof(sort), A:…
╎ ╎ ╎ ╎ ╎ 1 @Base/…rray.jl:3373 _inner_mapslices!(R::Matrix{…
╎ ╎ ╎ ╎ ╎ 1 @Base/sort.jl:1737 sort
╎ ╎ ╎ ╎ ╎ 1 @Base/sort.jl:1737 #sort#24
╎ ╎ ╎ ╎ ╎ 1 @Base/sort.jl:1704 sort!
╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…rt.jl:1711 #sort!#23
╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…rt.jl:561 _sort!
╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…rt.jl:683 _sort!
╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…rt.jl:744 _sort!
╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…rt.jl:799 _sort!
╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…rt.jl:713 _sort!(v::Vector{Float…
╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…rt.jl:775 _sort!
╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…rt.jl:799 _sort!
╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…rt.jl:877 _sort!(v::Base.Rein…
1╎ ╎ ╎ ╎ ╎ ╎ ╎ 1 @Base/…rt.jl:898 _sort!
╎98 @Base/task.jl:839 task_done_hook(t::Task)
╎ 98 @Base/task.jl:1167 wait()
98╎ 98 @Base/task.jl:1158 poptask(W::Base.IntrusiveLinkedListSynchronized{T…
╎16 @REPL/src/LineEdit.jl:2868 #prompt!##2
╎ 16 @Base/lock.jl:294 macro expansion
╎ 16 @REPL/src/LineEdit.jl:2878 macro expansion
╎ 16 @REPL/src/LineEdit.jl:1728 (::REPL.LineEdit.var"#match_input##0#mat…
╎ 16 @Base/essentials.jl:1046 invokelatest
╎ 16 @Base/essentials.jl:1049 #invokelatest#1
╎ ╎ 16 @OhMyREPL/…rc/repl.jl:256 (::OhMyREPL.Prompt.var"#create_keybind…
╎ ╎ 16 @REPL/src/REPL.jl:1205 do_respond
╎ ╎ 16 @REPL/src/REPL.jl:1191 eval_with_backend
╎ ╎ 16 @Base/channels.jl:487 take!
╎ ╎ 16 @Base/channels.jl:493 take_buffered(c::Channel{Any})
╎ ╎ ╎ 16 @Base/condition.jl:136 wait
╎ ╎ ╎ 16 @Base/condition.jl:141 wait(c::Base.GenericCondition{Reent…
╎ ╎ ╎ 16 @Base/task.jl:1167 wait()
16╎ ╎ ╎ 16 @Base/task.jl:1158 poptask(W::Base.IntrusiveLinkedListSy…
Total snapshots: 246. Utilization: 54% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.
I haven't marked those for backport, as it'd be good to get confirmation that it's a good idea, but I have prepared the backports for testing in #56358.
Fixed in #56228
whereas on julia-1.10.5: