Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty profile information on v1.11 #56327

Closed
kalmarek opened this issue Oct 25, 2024 · 17 comments
Closed

Empty profile information on v1.11 #56327

kalmarek opened this issue Oct 25, 2024 · 17 comments
Labels
profiler regression 1.11 Regression in the 1.11 release

Comments

@kalmarek
Copy link
Contributor

kalmarek commented Oct 25, 2024

julia> @profile peakflops();

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
    ╎6708 @Base/task.jl:694; task_done_hook(t::Task)
    ╎ 6708 @Base/task.jl:1021; wait()
67076708 @Base/task.jl:1012; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
Total snapshots: 6708. Utilization: 0% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.

julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 7 7840U w/ Radeon  780M Graphics
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 8 default, 0 interactive, 4 GC (on 16 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8

whereas on julia-1.10.5:

julia> @profile peakflops();

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
     ╎1937  @Base/Base.jl:608; (::Base.var"#1055#1056")()
     ╎ 1937  @Base/Base.jl:572; profile_printing_listener()
     ╎  1937  @Base/asyncevent.jl:159; wait
     ╎   1937  @Base/asyncevent.jl:142; _trywait(t::Base.AsyncCondition)

...... (lots of lines here)

     ╎    ╎    ╎    ╎    1233  a/src/matmul.jl:113; *(A::Matrix{Float64}, B::Matrix
     ╎    ╎    ╎    ╎     18    @Base/array.jl:420; similar
     ╎    ╎    ╎    ╎    ╎ 18    @Base/boot.jl:487; Array
   18╎    ╎    ╎    ╎    ╎  18    @Base/boot.jl:479; Array
     ╎    ╎    ╎    ╎     1215  a/src/matmul.jl:237; mul!
     ╎    ╎    ╎    ╎    ╎ 1215  /src/matmul.jl:263; mul!
     ╎    ╎    ╎    ╎    ╎  1215  /src/matmul.jl:352; generic_matmatmul!
     ╎    ╎    ╎    ╎    ╎   1215  src/matmul.jl:605; gemm_wrapper!(C::Matrix{Float
 1215╎    ╎    ╎    ╎    ╎    1215  a/src/blas.jl:1524; gemm!(transA::Char, transB:
    3╎    ╎    ╎    ╎  3     LinearAlgebra.jl:616; peakflops(n::Int64; eltype::Data
     ╎11622 @Base/task.jl:682; task_done_hook(t::Task)
     ╎ 11622 @Base/task.jl:1008; wait()
1162211622 @Base/task.jl:999; poptask(W::Base.IntrusiveLinkedListSynchronized{Tas
Total snapshots: 21307. Utilization: 36% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.

julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 7 7840U w/ Radeon  780M Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 8 default, 0 interactive, 4 GC (on 16 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8
@thofma
Copy link
Contributor

thofma commented Oct 25, 2024

Cannot reproduce this on 1.11.1 with

julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 48 × Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, broadwell)
Threads: 1 default, 0 interactive, 1 GC (on 48 virtual cores)

Everything is working as expected there.

Edit: Can reproduce if I use 8 threads.

@vtjnash
Copy link
Member

vtjnash commented Oct 25, 2024

Looks like dgemm kernels in openblas are missing correct unwind info now, since C=true shows the missing samples:

  127╎127   @julialib/libopenblas64_.so:?  dgemm_beta_ZEN
 5910╎5910  @julialib/libopenblas64_.so:?  dgemm_kernel_ZEN

@IanButterworth
Copy link
Member

FYI @gbaraldi given we also encountered this and couldn't figure it out.

@IanButterworth IanButterworth added profiler regression 1.11 Regression in the 1.11 release labels Oct 25, 2024
@vtjnash
Copy link
Member

vtjnash commented Oct 25, 2024

openblas uses hand written assembly for this, so it is missing the required CFI info in the PROLOGUE macro that would normally be auto-generated by the compiler (https://github.com/OpenMathLib/OpenBLAS/blob/ac736820d7001a715395b01412823fe712566eb6/common_x86.h#L300)

@kalmarek
Copy link
Contributor Author

It's not limited to peakflops() unfortunately, I see empty profile also for my own functions which don't even call blas

@IanButterworth
Copy link
Member

If you can share equivalent code it'd help debug this.

@kalmarek
Copy link
Contributor Author

Example from readme of ProfileView.jl:

julia> versioninfo();
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 7 7840U w/ Radeon  780M Graphics
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 8 default, 0 interactive, 4 GC (on 16 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8

julia> using Profile

julia> function profile_test(n)
           for i = 1:n
               A = randn(100,100,20)
               m = maximum(A)
               Am = mapslices(sum, A; dims=2)
               B = A[:,:,5]
               Bsort = mapslices(sort, B; dims=1)
               b = rand(100)
               C = B.*b
           end
       end
profile_test (generic function with 1 method)

julia> @profile profile_test(2);

julia> Profile.clear(); @profile profile_test(10);

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
  ╎12 @Base/task.jl:694; task_done_hook(t::Task)
  ╎ 12 @Base/task.jl:1021; wait()
1112 @Base/task.jl:1012; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
Total snapshots: 12. Utilization: 0% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.

@IanButterworth
Copy link
Member

Can you show Profile.print(C=true) as well. Thanks

@gdalle
Copy link
Contributor

gdalle commented Oct 26, 2024

Here it is:

julia> versioninfo();
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)

julia> using Profile

julia> function profile_test(n)
           for i = 1:n
               A = randn(100,100,20)
               m = maximum(A)
               Am = mapslices(sum, A; dims=2)
               B = A[:,:,5]
               Bsort = mapslices(sort, B; dims=1)
               b = rand(100)
               C = B.*b
           end
       end
profile_test (generic function with 1 method)

julia> @profile profile_test(2);

julia> Profile.clear(); @profile profile_test(10);

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
  ╎57 @REPL/src/LineEdit.jl:2835; #28157 @Base/channels.jl:597; wait
  ╎  57 @Base/condition.jl:125; wait
  ╎   57 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; 57 @Base/task.jl:1021; wait()
5657 @Base/task.jl:1012; poptask(W::Base.IntrusiveLinkedListSynchronized{
Total snapshots: 57. Utilization: 0% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.

julia> Profile.clear(); @profile profile_test(10);

julia> Profile.print(C=true)
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
  ╎22 julia-master/src/task.c:1213; start_task
  ╎ 22 julia-master/src/task.c:319; jl_finish_task
  ╎  22 lia-master/src/julia.h:2157; jl_apply
  ╎   22 ux.gnu/lib/julia/sys.so:?; jfptr_task_done_hook_65631.122 @Base/task.jl:694; task_done_hook(t::Task)
  ╎     22 @Base/task.jl:1021; wait()
  ╎    ╎ 22 @Base/task.jl:1012; poptask(W::Base.IntrusiveLinkedListSynchronized
  ╎    ╎  22 ster/src/scheduler.c:584; ijl_task_get_next
  ╎    ╎   22 uv/src/unix/thread.c:822; uv_cond_wait
  ╎    ╎    22 -linux-gnu/libc.so.6:?; pthread_cond_wait
21╎    ╎     22 -linux-gnu/libc.so.6:?; 
Total snapshots: 22. Utilization: 0% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.

@IanButterworth
Copy link
Member

I think there's a bigger issue than failed unwinds happening.

@IanButterworth
Copy link
Member

@gdalle it'd be good to rule out your code executing too quickly. Can you try with 1000?

on 1.11.1 on MacOS:

julia> @profile profile_test(2);

julia> Profile.clear(); @time @profile profile_test(1000);
  0.894447 seconds (2.25 M allocations: 1.848 GiB, 7.67% gc time)

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
  ╎457 @Base/client.jl:541; _start()
  ╎ 457 @Base/client.jl:567; repl_main
  ╎  457 @Base/client.jl:430; run_main_repl(interactive::Bool, quiet::Bool, banner::Symbol, history_file::Bool, color_set::Bool)
  ╎   457 @Base/essentials.jl:1052; invokelatest
  ╎    457 @Base/essentials.jl:1055; #invokelatest#2
  ╎     457 @Base/client.jl:446; (::Base.var"#1139#1141"{Bool, Symbol, Bool})(REPL::Module)
  ╎    ╎ 457 @REPL/src/REPL.jl:469; run_repl(repl::REPL.AbstractREPL, consumer::Any)
  ╎    ╎  457 @REPL/src/REPL.jl:483; run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::Any)
  ╎    ╎   457 @REPL/src/REPL.jl:324; kwcall(::NamedTuple, ::typeof(REPL.start_repl_backend), backend::REPL.REPLBackend, consumer::Any)
  ╎    ╎    457 @REPL/src/REPL.jl:327; start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
  ╎    ╎     457 @REPL/src/REPL.jl:342; repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
  ╎    ╎    ╎ 457 @REPL/src/REPL.jl:245; eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
 6╎    ╎    ╎  457 @Base/boot.jl:430; eval
  ╎    ╎    ╎   300 REPL[2]:3; profile_test(n::Int64)
  ╎    ╎    ╎    300 @Random/src/normal.jl:278; randn
  ╎    ╎    ╎     300 @Random/src/normal.jl:272; randn
  ╎    ╎    ╎    ╎ 11  @Base/boot.jl:598; Array
  ╎    ╎    ╎    ╎  11  @Base/boot.jl:585; Array
  ╎    ╎    ╎    ╎   11  @Base/boot.jl:535; new_as_memoryref
10╎    ╎    ╎    ╎    11  @Base/boot.jl:516; GenericMemory
  ╎    ╎    ╎    ╎ 113 @Random/src/normal.jl:257; randn!(rng::Random.TaskLocalRNG, A::Array{Float64, 3})
  ╎    ╎    ╎    ╎  113 @Random/src/Random.jl:269; rand!
  ╎    ╎    ╎    ╎   113 @Random/src/Random.jl:269; rand!
  ╎    ╎    ╎    ╎    113 @Random/src/XoshiroSimd.jl:280; rand!
  ╎    ╎    ╎    ╎     113 @Random/src/XoshiroSimd.jl:141; xoshiro_bulk
  ╎    ╎    ╎    ╎    ╎ 113 @Random/src/XoshiroSimd.jl:142; xoshiro_bulk
  ╎    ╎    ╎    ╎    ╎  1   @Random/src/XoshiroSimd.jl:229; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int64, ::Type{UInt8}, ::Val{8}, f::typeof(Random…
 1╎    ╎    ╎    ╎    ╎   1   @Base/int.jl:514; <=
  ╎    ╎    ╎    ╎    ╎  10  @Random/src/XoshiroSimd.jl:230; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int64, ::Type{UInt8}, ::Val{8}, f::typeof(Random…
10╎    ╎    ╎    ╎    ╎   10  @Random/src/XoshiroSimd.jl:58; _rotl23
  ╎    ╎    ╎    ╎    ╎  8   @Random/src/XoshiroSimd.jl:231; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int64, ::Type{UInt8}, ::Val{8}, f::typeof(Random…
 8╎    ╎    ╎    ╎    ╎   8   @Random/src/XoshiroSimd.jl:65; _shl17
  ╎    ╎    ╎    ╎    ╎  6   @Random/src/XoshiroSimd.jl:233; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int64, ::Type{UInt8}, ::Val{8}, f::typeof(Random…
 6╎    ╎    ╎    ╎    ╎   6   @Random/src/XoshiroSimd.jl:77; _xor
  ╎    ╎    ╎    ╎    ╎  3   @Random/src/XoshiroSimd.jl:234; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int64, ::Type{UInt8}, ::Val{8}, f::typeof(Random…
 3╎    ╎    ╎    ╎    ╎   3   @Random/src/XoshiroSimd.jl:77; _xor
  ╎    ╎    ╎    ╎    ╎  7   @Random/src/XoshiroSimd.jl:235; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int64, ::Type{UInt8}, ::Val{8}, f::typeof(Random…
 7╎    ╎    ╎    ╎    ╎   7   @Random/src/XoshiroSimd.jl:77; _xor
  ╎    ╎    ╎    ╎    ╎  4   @Random/src/XoshiroSimd.jl:236; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int64, ::Type{UInt8}, ::Val{8}, f::typeof(Random…
 4╎    ╎    ╎    ╎    ╎   4   @Random/src/XoshiroSimd.jl:77; _xor
  ╎    ╎    ╎    ╎    ╎  27  @Random/src/XoshiroSimd.jl:237; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int64, ::Type{UInt8}, ::Val{8}, f::typeof(Random…
27╎    ╎    ╎    ╎    ╎   27  @Random/src/XoshiroSimd.jl:55; _rotl45
  ╎    ╎    ╎    ╎    ╎  47  @Random/src/XoshiroSimd.jl:238; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int64, ::Type{UInt8}, ::Val{8}, f::typeof(Random…
  ╎    ╎    ╎    ╎    ╎   47  @Base/pointer.jl:180; unsafe_store!
47╎    ╎    ╎    ╎    ╎    47  @Base/pointer.jl:180; unsafe_store!
  ╎    ╎    ╎    ╎ 171 @Random/src/normal.jl:260; randn!(rng::Random.TaskLocalRNG, A::Array{Float64, 3})
23╎    ╎    ╎    ╎  23  @Base/array.jl:976; setindex!
 8╎    ╎    ╎    ╎  93  @Random/src/normal.jl:90; _randn
85╎    ╎    ╎    ╎   85  @Base/essentials.jl:917; getindex
14╎    ╎    ╎    ╎  55  @Random/src/normal.jl:91; _randn
12╎    ╎    ╎    ╎   12  @Random/src/normal.jl:0; randn_unlikely(rng::Random.TaskLocalRNG, idx::Int64, rabs::Int64, x::Float64)
 1╎    ╎    ╎    ╎   1   @Random/src/normal.jl:96; randn_unlikely(rng::Random.TaskLocalRNG, idx::Int64, rabs::Int64, x::Float64)
  ╎    ╎    ╎    ╎   1   @Random/src/normal.jl:104; randn_unlikely(rng::Random.TaskLocalRNG, idx::Int64, rabs::Int64, x::Float64)
  ╎    ╎    ╎    ╎    1   @Random/src/Random.jl:258; rand
  ╎    ╎    ╎    ╎     1   @Random/src/Random.jl:258; rand
  ╎    ╎    ╎    ╎    ╎ 1   @Random/src/Xoshiro.jl:305; rand
  ╎    ╎    ╎    ╎    ╎  1   @Random/src/Random.jl:258; rand
  ╎    ╎    ╎    ╎    ╎   1   @Random/src/Xoshiro.jl:274; rand
  ╎    ╎    ╎    ╎    ╎    1   @Random/src/Xoshiro.jl:215; setstate!
  ╎    ╎    ╎    ╎    ╎     1   @Base/task.jl:201; setproperty!
 1╎    ╎    ╎    ╎    ╎    ╎ 1   @Base/Base.jl:53; setproperty!
22╎    ╎    ╎    ╎   27  @Random/src/normal.jl:107; randn_unlikely(rng::Random.TaskLocalRNG, idx::Int64, rabs::Int64, x::Float64)
 1╎    ╎    ╎    ╎    1   @Random/src/normal.jl:0; randn(rng::Random.TaskLocalRNG)
  ╎    ╎    ╎    ╎    2   @Random/src/normal.jl:73; randn(rng::Random.TaskLocalRNG)
  ╎    ╎    ╎    ╎     2   @Random/src/Random.jl:255; rand
  ╎    ╎    ╎    ╎    ╎ 2   @Random/src/Xoshiro.jl:296; rand
  ╎    ╎    ╎    ╎    ╎  2   @Random/src/Random.jl:258; rand
  ╎    ╎    ╎    ╎    ╎   1   @Random/src/Xoshiro.jl:264; rand
  ╎    ╎    ╎    ╎    ╎    1   @Random/src/Xoshiro.jl:227; getstate
 1╎    ╎    ╎    ╎    ╎     1   @Base/task.jl:193; getproperty
  ╎    ╎    ╎    ╎    ╎   1   @Random/src/Xoshiro.jl:270; rand
 1╎    ╎    ╎    ╎    ╎    1   @Base/int.jl:373; xor
  ╎    ╎    ╎    ╎    2   @Random/src/normal.jl:80; randn(rng::Random.TaskLocalRNG)
 2╎    ╎    ╎    ╎     2   @Random/src/normal.jl:0; randn_unlikely(rng::Random.TaskLocalRNG, idx::Int64, rabs::Int64, x::Float64)
 4╎    ╎    ╎    ╎ 4   @Random/src/normal.jl:261; randn!(rng::Random.TaskLocalRNG, A::Array{Float64, 3})
 1╎    ╎    ╎    ╎ 1   @Random/src/normal.jl:0; randn_unlikely(rng::Random.TaskLocalRNG, idx::Int64, rabs::Int64, x::Float64)
  ╎    ╎    ╎   97  REPL[2]:4; profile_test(n::Int64)
  ╎    ╎    ╎    97  @Base/reducedim.jl:982; maximum
  ╎    ╎    ╎     97  @Base/reducedim.jl:982; #maximum#941
  ╎    ╎    ╎    ╎ 97  @Base/reducedim.jl:986; _maximum
  ╎    ╎    ╎    ╎  97  @Base/reducedim.jl:986; #_maximum#943
  ╎    ╎    ╎    ╎   97  @Base/reducedim.jl:987; _maximum
  ╎    ╎    ╎    ╎    97  @Base/reducedim.jl:987; #_maximum#944
  ╎    ╎    ╎    ╎     97  @Base/reducedim.jl:329; mapreduce
  ╎    ╎    ╎    ╎    ╎ 97  @Base/reducedim.jl:329; #mapreduce#926
  ╎    ╎    ╎    ╎    ╎  97  @Base/reducedim.jl:337; _mapreduce_dim
  ╎    ╎    ╎    ╎    ╎   97  @Base/reduce.jl:444; _mapreduce(f::typeof(identity), op::typeof(max), ::IndexLinear, A::Array{Float64, 3})
  ╎    ╎    ╎    ╎    ╎    7   @Base/reduce.jl:651; mapreduce_impl(f::typeof(identity), op::typeof(max), A::Array{Float64, 3}, first::Int64, last::Int64)
  ╎    ╎    ╎    ╎    ╎     7   @Base/reduce.jl:623; _fast
 7╎    ╎    ╎    ╎    ╎    ╎ 7   @Base/essentials.jl:796; ifelse
  ╎    ╎    ╎    ╎    ╎    32  @Base/reduce.jl:652; mapreduce_impl(f::typeof(identity), op::typeof(max), A::Array{Float64, 3}, first::Int64, last::Int64)
 6╎    ╎    ╎    ╎    ╎     6   @Base/essentials.jl:917; getindex
  ╎    ╎    ╎    ╎    ╎     26  @Base/reduce.jl:623; _fast
26╎    ╎    ╎    ╎    ╎    ╎ 26  @Base/essentials.jl:796; ifelse
  ╎    ╎    ╎    ╎    ╎    46  @Base/reduce.jl:653; mapreduce_impl(f::typeof(identity), op::typeof(max), A::Array{Float64, 3}, first::Int64, last::Int64)
  ╎    ╎    ╎    ╎    ╎     46  @Base/reduce.jl:623; _fast
46╎    ╎    ╎    ╎    ╎    ╎ 46  @Base/essentials.jl:796; ifelse
  ╎    ╎    ╎    ╎    ╎    12  @Base/reduce.jl:654; mapreduce_impl(f::typeof(identity), op::typeof(max), A::Array{Float64, 3}, first::Int64, last::Int64)
  ╎    ╎    ╎    ╎    ╎     12  @Base/range.jl:908; iterate
12╎    ╎    ╎    ╎    ╎    ╎ 12  @Base/promotion.jl:639; ==
  ╎    ╎    ╎   25  REPL[2]:5; profile_test(n::Int64)
  ╎    ╎    ╎    25  @Base/abstractarray.jl:3265; mapslices
  ╎    ╎    ╎     1   @Base/abstractarray.jl:3280; mapslices(f::typeof(sum), A::Array{Float64, 3}; dims::Int64)
  ╎    ╎    ╎    ╎ 1   @Base/abstractarray.jl:1312; getindex
  ╎    ╎    ╎    ╎  1   @Base/multidimensional.jl:915; _getindex(::IndexLinear, ::Array{Float64, 3}, ::Int64, ::Base.Slice{Base.OneTo{Int64}}, ::Vararg{Union{Real, AbstractArr…
  ╎    ╎    ╎    ╎   1   @Base/multidimensional.jl:927; _unsafe_getindex(::IndexLinear, ::Array{Float64, 3}, ::Int64, ::Base.Slice{Base.OneTo{Int64}}, ::Int64)
  ╎    ╎    ╎    ╎    1   @Base/abstractarray.jl:822; similar
  ╎    ╎    ╎    ╎     1   @Base/array.jl:361; similar
  ╎    ╎    ╎    ╎    ╎ 1   @Base/boot.jl:591; Array
  ╎    ╎    ╎    ╎    ╎  1   @Base/boot.jl:578; Array
 1╎    ╎    ╎    ╎    ╎   1   @Base/boot.jl:516; GenericMemory
  ╎    ╎    ╎     24  @Base/abstractarray.jl:3326; mapslices(f::typeof(sum), A::Array{Float64, 3}; dims::Int64)
  ╎    ╎    ╎    ╎ 1   @Base/abstractarray.jl:3336; _inner_mapslices!(R::Array{Float64, 3}, indices::Base.Iterators.Drop{CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneT…
  ╎    ╎    ╎    ╎  1   @Base/broadcast.jl:867; materialize
  ╎    ╎    ╎    ╎   1   @Base/broadcast.jl:1097; copy
  ╎    ╎    ╎    ╎    1   @Base/ntuple.jl:50; ntuple
  ╎    ╎    ╎    ╎     1   @Base/broadcast.jl:1097; #17
  ╎    ╎    ╎    ╎    ╎ 1   @Base/broadcast.jl:645; _broadcast_getindex
  ╎    ╎    ╎    ╎    ╎  1   @Base/broadcast.jl:669; _getindex
  ╎    ╎    ╎    ╎    ╎   1   @Base/broadcast.jl:669; _getindex
  ╎    ╎    ╎    ╎    ╎    1   @Base/broadcast.jl:646; _broadcast_getindex
  ╎    ╎    ╎    ╎    ╎     1   @Base/broadcast.jl:673; _broadcast_getindex_evalf
 1╎    ╎    ╎    ╎    ╎    ╎ 1   @Base/indices.jl:383; Slice
 7╎    ╎    ╎    ╎ 7   @Base/abstractarray.jl:3337; _inner_mapslices!(R::Array{Float64, 3}, indices::Base.Iterators.Drop{CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneT…
  ╎    ╎    ╎    ╎ 14  @Base/abstractarray.jl:3338; _inner_mapslices!(R::Array{Float64, 3}, indices::Base.Iterators.Drop{CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneT…
  ╎    ╎    ╎    ╎  14  @Base/reducedim.jl:982; sum
  ╎    ╎    ╎    ╎   14  @Base/reducedim.jl:982; #sum#933
  ╎    ╎    ╎    ╎    14  @Base/reducedim.jl:986; _sum
  ╎    ╎    ╎    ╎     14  @Base/reducedim.jl:986; #_sum#935
  ╎    ╎    ╎    ╎    ╎ 14  @Base/reducedim.jl:987; _sum
  ╎    ╎    ╎    ╎    ╎  14  @Base/reducedim.jl:987; #_sum#936
  ╎    ╎    ╎    ╎    ╎   14  @Base/reducedim.jl:329; mapreduce
  ╎    ╎    ╎    ╎    ╎    14  @Base/reducedim.jl:329; #mapreduce#926
  ╎    ╎    ╎    ╎    ╎     14  @Base/reducedim.jl:337; _mapreduce_dim
  ╎    ╎    ╎    ╎    ╎    ╎ 14  @Base/reduce.jl:444; _mapreduce
  ╎    ╎    ╎    ╎    ╎    ╎  14  @Base/reduce.jl:277; mapreduce_impl
 5╎    ╎    ╎    ╎    ╎    ╎   5   @Base/reduce.jl:0; mapreduce_impl(f::typeof(identity), op::typeof(Base.add_sum), A::Vector{Float64}, ifirst::Int64, ilast::Int64, blksize::I…
  ╎    ╎    ╎    ╎    ╎    ╎   9   @Base/reduce.jl:263; mapreduce_impl(f::typeof(identity), op::typeof(Base.add_sum), A::Vector{Float64}, ifirst::Int64, ilast::Int64, blksize:…
  ╎    ╎    ╎    ╎    ╎    ╎    3   @Base/simdloop.jl:71; macro expansion
  ╎    ╎    ╎    ╎    ╎    ╎     3   @Base/simdloop.jl:51; simd_inner_length
  ╎    ╎    ╎    ╎    ╎    ╎    ╎ 3   @Base/range.jl:768; length
 3╎    ╎    ╎    ╎    ╎    ╎    ╎  3   @Base/int.jl:86; -
 2╎    ╎    ╎    ╎    ╎    ╎    2   @Base/simdloop.jl:75; macro expansion
  ╎    ╎    ╎    ╎    ╎    ╎    4   @Base/simdloop.jl:77; macro expansion
  ╎    ╎    ╎    ╎    ╎    ╎     4   @Base/reduce.jl:265; macro expansion
  ╎    ╎    ╎    ╎    ╎    ╎    ╎ 4   @Base/reduce.jl:27; add_sum
 4╎    ╎    ╎    ╎    ╎    ╎    ╎  4   @Base/float.jl:491; +
  ╎    ╎    ╎    ╎ 1   @Base/abstractarray.jl:3344; _inner_mapslices!(R::Array{Float64, 3}, indices::Base.Iterators.Drop{CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneT…
  ╎    ╎    ╎    ╎  1   @Base/array.jl:982; setindex!
  ╎    ╎    ╎    ╎   1   @Base/abstractarray.jl:699; checkbounds
  ╎    ╎    ╎    ╎    1   @Base/abstractarray.jl:681; checkbounds
  ╎    ╎    ╎    ╎     1   @Base/abstractarray.jl:98; axes
 1╎    ╎    ╎    ╎    ╎ 1   @Base/array.jl:194; size
  ╎    ╎    ╎    ╎ 1   @Base/abstractarray.jl:3346; _inner_mapslices!(R::Array{Float64, 3}, indices::Base.Iterators.Drop{CartesianIndices{3, Tuple{Base.OneTo{Int64}, Base.OneT…
  ╎    ╎    ╎    ╎  1   @Base/iterators.jl:847; iterate
  ╎    ╎    ╎    ╎   1   @Base/multidimensional.jl:422; iterate
  ╎    ╎    ╎    ╎    1   @Base/multidimensional.jl:446; __inc
 1╎    ╎    ╎    ╎     1   @Base/int.jl:87; +
  ╎    ╎    ╎   5   REPL[2]:6; profile_test(n::Int64)
  ╎    ╎    ╎    5   @Base/abstractarray.jl:1312; getindex
  ╎    ╎    ╎     5   @Base/multidimensional.jl:915; _getindex
  ╎    ╎    ╎    ╎ 2   @Base/multidimensional.jl:927; _unsafe_getindex(::IndexLinear, ::Array{Float64, 3}, ::Base.Slice{Base.OneTo{Int64}}, ::Base.Slice{Base.OneTo{Int64}}, ::…
  ╎    ╎    ╎    ╎  2   @Base/abstractarray.jl:822; similar
  ╎    ╎    ╎    ╎   2   @Base/array.jl:361; similar
  ╎    ╎    ╎    ╎    2   @Base/boot.jl:592; Array
  ╎    ╎    ╎    ╎     2   @Base/boot.jl:582; Array
  ╎    ╎    ╎    ╎    ╎ 2   @Base/boot.jl:535; new_as_memoryref
 2╎    ╎    ╎    ╎    ╎  2   @Base/boot.jl:516; GenericMemory
  ╎    ╎    ╎    ╎ 3   @Base/multidimensional.jl:929; _unsafe_getindex(::IndexLinear, ::Array{Float64, 3}, ::Base.Slice{Base.OneTo{Int64}}, ::Base.Slice{Base.OneTo{Int64}}, ::…
  ╎    ╎    ╎    ╎  3   @Base/multidimensional.jl:948; _unsafe_getindex!
  ╎    ╎    ╎    ╎   3   @Base/multidimensional.jl:938; macro expansion
  ╎    ╎    ╎    ╎    3   @Base/cartesian.jl:64; macro expansion
  ╎    ╎    ╎    ╎     3   @Base/multidimensional.jl:940; macro expansion
 3╎    ╎    ╎    ╎    ╎ 3   @Base/array.jl:976; setindex!
  ╎    ╎    ╎   15  REPL[2]:7; profile_test(n::Int64)
  ╎    ╎    ╎    15  @Base/abstractarray.jl:3265; mapslices
  ╎    ╎    ╎     1   @Base/abstractarray.jl:3281; mapslices(f::typeof(sort), A::Matrix{Float64}; dims::Int64)
  ╎    ╎    ╎    ╎ 1   @Base/sort.jl:1720; sort
  ╎    ╎    ╎    ╎  1   @Base/sort.jl:1720; #sort#33
  ╎    ╎    ╎    ╎   1   @Base/sort.jl:1687; sort!
  ╎    ╎    ╎    ╎    1   @Base/sort.jl:1694; #sort!#32
  ╎    ╎    ╎    ╎     1   @Base/sort.jl:554; _sort!
  ╎    ╎    ╎    ╎    ╎ 1   @Base/sort.jl:676; _sort!
  ╎    ╎    ╎    ╎    ╎  1   @Base/sort.jl:737; _sort!
  ╎    ╎    ╎    ╎    ╎   1   @Base/sort.jl:792; _sort!
  ╎    ╎    ╎    ╎    ╎    1   @Base/sort.jl:704; _sort!(v::Vector{Float64}, a::Base.Sort.IEEEFloatOptimization{Base.Sort.IsUIntMappable{Base.Sort.Small{40, Base.Sort.Insertio…
  ╎    ╎    ╎    ╎    ╎     1   @Base/sort.jl:768; _sort!
  ╎    ╎    ╎    ╎    ╎    ╎ 1   @Base/sort.jl:792; _sort!
  ╎    ╎    ╎    ╎    ╎    ╎  1   @Base/sort.jl:860; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, a::Base.Sort.CheckSorted{Base.Sort.ComputeExt…
  ╎    ╎    ╎    ╎    ╎    ╎   1   @Base/sort.jl:886; _sort!
 1╎    ╎    ╎    ╎    ╎    ╎    1   @Base/sort.jl:905; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, a::Base.Sort.ConsiderCountingSort{Base.Sort…
  ╎    ╎    ╎     7   @Base/abstractarray.jl:3308; mapslices(f::typeof(sort), A::Matrix{Float64}; dims::Int64)
  ╎    ╎    ╎    ╎ 7   @Base/abstractarray.jl:822; similar
  ╎    ╎    ╎    ╎  7   @Base/array.jl:361; similar
  ╎    ╎    ╎    ╎   7   @Base/boot.jl:592; Array
  ╎    ╎    ╎    ╎    7   @Base/boot.jl:582; Array
  ╎    ╎    ╎    ╎     7   @Base/boot.jl:535; new_as_memoryref
 7╎    ╎    ╎    ╎    ╎ 7   @Base/boot.jl:516; GenericMemory
  ╎    ╎    ╎     7   @Base/abstractarray.jl:3326; mapslices(f::typeof(sort), A::Matrix{Float64}; dims::Int64)
  ╎    ╎    ╎    ╎ 1   @Base/abstractarray.jl:3337; _inner_mapslices!(R::Matrix{Float64}, indices::Base.Iterators.Drop{CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{…
  ╎    ╎    ╎    ╎  1   @Base/multidimensional.jl:938; _unsafe_getindex!
  ╎    ╎    ╎    ╎   1   @Base/cartesian.jl:64; macro expansion
  ╎    ╎    ╎    ╎    1   @Base/multidimensional.jl:940; macro expansion
 1╎    ╎    ╎    ╎     1   @Base/array.jl:976; setindex!
  ╎    ╎    ╎    ╎ 6   @Base/abstractarray.jl:3338; _inner_mapslices!(R::Matrix{Float64}, indices::Base.Iterators.Drop{CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{…
  ╎    ╎    ╎    ╎  6   @Base/sort.jl:1720; sort
  ╎    ╎    ╎    ╎   6   @Base/sort.jl:1720; #sort#33
  ╎    ╎    ╎    ╎    4   @Base/abstractarray.jl:1192; copymutable
  ╎    ╎    ╎    ╎     4   @Base/array.jl:356; similar
  ╎    ╎    ╎    ╎    ╎ 1   @Base/boot.jl:578; Array
 1╎    ╎    ╎    ╎    ╎  1   @Base/boot.jl:516; GenericMemory
 3╎    ╎    ╎    ╎    ╎ 3   @Base/boot.jl:579; Array
  ╎    ╎    ╎    ╎    2   @Base/sort.jl:1687; sort!
  ╎    ╎    ╎    ╎     2   @Base/sort.jl:1694; #sort!#32
  ╎    ╎    ╎    ╎    ╎ 2   @Base/sort.jl:554; _sort!
  ╎    ╎    ╎    ╎    ╎  2   @Base/sort.jl:676; _sort!
  ╎    ╎    ╎    ╎    ╎   2   @Base/sort.jl:737; _sort!
  ╎    ╎    ╎    ╎    ╎    2   @Base/sort.jl:792; _sort!
  ╎    ╎    ╎    ╎    ╎     1   @Base/sort.jl:704; _sort!(v::Vector{Float64}, a::Base.Sort.IEEEFloatOptimization{Base.Sort.IsUIntMappable{Base.Sort.Small{40, Base.Sort.Inserti…
  ╎    ╎    ╎    ╎    ╎    ╎ 1   @Base/sort.jl:768; _sort!
  ╎    ╎    ╎    ╎    ╎    ╎  1   @Base/sort.jl:792; _sort!
  ╎    ╎    ╎    ╎    ╎    ╎   1   @Base/sort.jl:851; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, a::Base.Sort.CheckSorted{Base.Sort.ComputeEx…
  ╎    ╎    ╎    ╎    ╎    ╎    1   @Base/sort.jl:1428; _issorted
  ╎    ╎    ╎    ╎    ╎    ╎     1   @Base/abstractarray.jl:699; checkbounds
  ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1   @Base/abstractarray.jl:689; checkbounds
  ╎    ╎    ╎    ╎    ╎    ╎    ╎  1   @Base/abstractarray.jl:389; eachindex
  ╎    ╎    ╎    ╎    ╎    ╎    ╎   1   @Base/abstractarray.jl:137; axes1
  ╎    ╎    ╎    ╎    ╎    ╎    ╎    1   @Base/reinterpretarray.jl:359; axes
  ╎    ╎    ╎    ╎    ╎    ╎    ╎     1   @Base/abstractarray.jl:98; axes
 1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1   @Base/array.jl:194; size
  ╎    ╎    ╎    ╎    ╎     1   @Base/sort.jl:706; _sort!(v::Vector{Float64}, a::Base.Sort.IEEEFloatOptimization{Base.Sort.IsUIntMappable{Base.Sort.Small{40, Base.Sort.Inserti…
  ╎    ╎    ╎    ╎    ╎    ╎ 1   @Base/sort.jl:768; _sort!
  ╎    ╎    ╎    ╎    ╎    ╎  1   @Base/sort.jl:792; _sort!
  ╎    ╎    ╎    ╎    ╎    ╎   1   @Base/sort.jl:860; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, a::Base.Sort.CheckSorted{Base.Sort.ComputeEx…
 1╎    ╎    ╎    ╎    ╎    ╎    1   @Base/sort.jl:881; _sort!
  ╎    ╎    ╎   9   REPL[2]:9; profile_test(n::Int64)
  ╎    ╎    ╎    9   @Base/broadcast.jl:867; materialize
  ╎    ╎    ╎     9   @Base/broadcast.jl:892; copy
  ╎    ╎    ╎    ╎ 7   @Base/broadcast.jl:920; copyto!
  ╎    ╎    ╎    ╎  7   @Base/broadcast.jl:967; copyto!
  ╎    ╎    ╎    ╎   7   @Base/simdloop.jl:77; macro expansion
  ╎    ╎    ╎    ╎    7   @Base/broadcast.jl:968; macro expansion
  ╎    ╎    ╎    ╎     5   @Base/broadcast.jl:605; getindex
  ╎    ╎    ╎    ╎    ╎ 3   @Base/broadcast.jl:645; _broadcast_getindex
  ╎    ╎    ╎    ╎    ╎  3   @Base/broadcast.jl:669; _getindex
  ╎    ╎    ╎    ╎    ╎   3   @Base/broadcast.jl:639; _broadcast_getindex
  ╎    ╎    ╎    ╎    ╎    3   @Base/multidimensional.jl:702; getindex
  ╎    ╎    ╎    ╎    ╎     3   @Base/array.jl:919; getindex
  ╎    ╎    ╎    ╎    ╎    ╎ 3   @Base/abstractarray.jl:1347; _to_linear_index
  ╎    ╎    ╎    ╎    ╎    ╎  3   @Base/abstractarray.jl:3048; _sub2ind
  ╎    ╎    ╎    ╎    ╎    ╎   3   @Base/abstractarray.jl:3064; _sub2ind
  ╎    ╎    ╎    ╎    ╎    ╎    3   @Base/abstractarray.jl:3080; _sub2ind_recurse
 3╎    ╎    ╎    ╎    ╎    ╎     3   @Base/int.jl:87; +
  ╎    ╎    ╎    ╎    ╎ 2   @Base/broadcast.jl:646; _broadcast_getindex
  ╎    ╎    ╎    ╎    ╎  2   @Base/broadcast.jl:673; _broadcast_getindex_evalf
 2╎    ╎    ╎    ╎    ╎   2   @Base/float.jl:493; *
  ╎    ╎    ╎    ╎     2   @Base/multidimensional.jl:704; setindex!
 2╎    ╎    ╎    ╎    ╎ 2   @Base/array.jl:983; setindex!
  ╎    ╎    ╎    ╎ 2   @Base/broadcast.jl:223; similar
  ╎    ╎    ╎    ╎  2   @Base/broadcast.jl:224; similar
  ╎    ╎    ╎    ╎   2   @Base/abstractarray.jl:867; similar
  ╎    ╎    ╎    ╎    2   @Base/abstractarray.jl:868; similar
  ╎    ╎    ╎    ╎     2   @Base/boot.jl:599; Array
  ╎    ╎    ╎    ╎    ╎ 2   @Base/boot.jl:592; Array
  ╎    ╎    ╎    ╎    ╎  2   @Base/boot.jl:582; Array
  ╎    ╎    ╎    ╎    ╎   2   @Base/boot.jl:535; new_as_memoryref
 2╎    ╎    ╎    ╎    ╎    2   @Base/boot.jl:516; GenericMemory
 1╎51  @Base/multidimensional.jl:948; _unsafe_getindex!(::Vector{Float64}, ::Array{Float64, 3}, ::Int64, ::Base.Slice{Base.OneTo{Int64}}, ::Int64)
  ╎ 50  @Base/multidimensional.jl:938; macro expansion
 1╎  1   @Base/cartesian.jl:0; macro expansion
 1╎  49  @Base/cartesian.jl:66; macro expansion
  ╎   48  @Base/indices.jl:401; iterate
  ╎    48  @Base/range.jl:908; iterate
48╎     48  @Base/promotion.jl:639; ==
  ╎5   @Base/sort.jl:701; _sort!(v::Vector{Float64}, a::Base.Sort.IEEEFloatOptimization{Base.Sort.IsUIntMappable{Base.Sort.Small{40, Base.Sort.InsertionSortAlg, Base.Sort.Chec…
  ╎ 5   @Base/sort.jl:637; send_to_end!
  ╎  5   @Base/sort.jl:637; #send_to_end!#15
  ╎   5   @Base/sort.jl:612; send_to_end!
 2╎    2   @Base/sort.jl:0; send_to_end!(f::ComposedFunction{typeof(!), typeof(isnan)}, v::SubArray{Float64, 1, Vector{Float64}, Tuple{StepRange{Int64, Int64}}, true}; lo::Int…
  ╎    2   @Base/sort.jl:619; send_to_end!(f::ComposedFunction{typeof(!), typeof(isnan)}, v::SubArray{Float64, 1, Vector{Float64}, Tuple{StepRange{Int64, Int64}}, true}; lo::I…
  ╎     2   @Base/subarray.jl:343; getindex
 2╎    ╎ 2   @Base/essentials.jl:917; getindex
  ╎    1   @Base/sort.jl:625; send_to_end!(f::ComposedFunction{typeof(!), typeof(isnan)}, v::SubArray{Float64, 1, Vector{Float64}, Tuple{StepRange{Int64, Int64}}, true}; lo::I…
 1╎     1   @Base/int.jl:86; -
  ╎29  @Base/sort.jl:703; _sort!(v::Vector{Float64}, a::Base.Sort.IEEEFloatOptimization{Base.Sort.IsUIntMappable{Base.Sort.Small{40, Base.Sort.InsertionSortAlg, Base.Sort.Chec…
  ╎ 29  @Base/sort.jl:612; send_to_end!
18╎  18  @Base/sort.jl:0; send_to_end!(f::Base.Sort.var"#19#22"{Base.Order.ForwardOrdering}, v::Vector{Float64}; lo::Int64, hi::Int64)
  ╎  1   @Base/sort.jl:615; send_to_end!(f::Base.Sort.var"#19#22"{Base.Order.ForwardOrdering}, v::Vector{Float64}; lo::Int64, hi::Int64)
 1╎   1   @Base/int.jl:87; +
  ╎  9   @Base/sort.jl:620; send_to_end!(f::Base.Sort.var"#19#22"{Base.Order.ForwardOrdering}, v::Vector{Float64}; lo::Int64, hi::Int64)
 9╎   9   @Base/essentials.jl:917; getindex
  ╎  1   @Base/sort.jl:625; send_to_end!(f::Base.Sort.var"#19#22"{Base.Order.ForwardOrdering}, v::Vector{Float64}; lo::Int64, hi::Int64)
 1╎   1   @Base/int.jl:86; -
  ╎1   @Base/sort.jl:704; _sort!(v::Vector{Float64}, a::Base.Sort.IEEEFloatOptimization{Base.Sort.IsUIntMappable{Base.Sort.Small{40, Base.Sort.InsertionSortAlg, Base.Sort.Chec…
  ╎ 1   @Base/sort.jl:768; _sort!
  ╎  1   @Base/sort.jl:790; _sort!
  ╎   1   @Base/sort.jl:828; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, ::Base.Sort.InsertionSortAlg, o::Base.Order.ReverseOrdering{Base.Orde…
  ╎    1   @Base/reinterpretarray.jl:548; setindex!
  ╎     1   @Base/reinterpretarray.jl:590; _setindex_ra!
 1╎    ╎ 1   @Base/array.jl:976; setindex!
  ╎1   @Base/sort.jl:706; _sort!(v::Vector{Float64}, a::Base.Sort.IEEEFloatOptimization{Base.Sort.IsUIntMappable{Base.Sort.Small{40, Base.Sort.InsertionSortAlg, Base.Sort.Chec…
  ╎ 1   @Base/sort.jl:768; _sort!
  ╎  1   @Base/sort.jl:790; _sort!
 1╎   1   @Base/sort.jl:825; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, ::Base.Sort.InsertionSortAlg, o::Base.Order.ForwardOrdering, kw::@Nam…
  ╎91  @Base/sort.jl:912; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, a::Base.Sort.ConsiderCountingSort{Base.Sort.CountingSort, Base.Sort.Cons…
  ╎ 91  @Base/sort.jl:972; _sort!
  ╎  91  @Base/sort.jl:790; _sort!
 6╎   6   @Base/sort.jl:0; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, ::Base.Sort.InsertionSortAlg, o::Base.Order.ForwardOrdering, kw::@Named…
  ╎   1   @Base/sort.jl:822; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, ::Base.Sort.InsertionSortAlg, o::Base.Order.ForwardOrdering, kw::@Nam…
  ╎    1   @Base/reinterpretarray.jl:402; getindex
  ╎     1   @Base/reinterpretarray.jl:439; _getindex_ra
 1╎    ╎ 1   @Base/essentials.jl:917; getindex
  ╎   19  @Base/sort.jl:824; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, ::Base.Sort.InsertionSortAlg, o::Base.Order.ForwardOrdering, kw::@Nam…
  ╎    19  @Base/reinterpretarray.jl:402; getindex
  ╎     19  @Base/reinterpretarray.jl:439; _getindex_ra
19╎    ╎ 19  @Base/essentials.jl:917; getindex
44╎   44  @Base/sort.jl:825; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, ::Base.Sort.InsertionSortAlg, o::Base.Order.ReverseOrdering{Base.Orde…
  ╎   12  @Base/sort.jl:828; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, ::Base.Sort.InsertionSortAlg, o::Base.Order.ForwardOrdering, kw::@Nam…
  ╎    12  @Base/reinterpretarray.jl:548; setindex!
  ╎     12  @Base/reinterpretarray.jl:590; _setindex_ra!
12╎    ╎ 12  @Base/array.jl:976; setindex!
  ╎   8   @Base/sort.jl:831; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, ::Base.Sort.InsertionSortAlg, o::Base.Order.ForwardOrdering, kw::@Nam…
  ╎    8   @Base/reinterpretarray.jl:548; setindex!
  ╎     8   @Base/reinterpretarray.jl:590; _setindex_ra!
 8╎    ╎ 8   @Base/array.jl:976; setindex!
 1╎   1   @Base/sort.jl:832; _sort!(v::Base.ReinterpretArray{UInt64, 1, Float64, Vector{Float64}, false}, ::Base.Sort.InsertionSortAlg, o::Base.Order.ReverseOrdering{Base.Orde…
  ╎7   @Random/src/normal.jl:104; randn_unlikely(rng::Random.TaskLocalRNG, idx::Int64, rabs::Int64, x::Float64)
  ╎ 7   @Base/special/exp.jl:327; exp(x::Float64)
  ╎  1   @Base/special/exp.jl:215; exp_impl
  ╎   1   @Base/special/exp.jl:182; table_unpack
 1╎    1   @Base/essentials.jl:730; reinterpret
  ╎  1   @Base/special/exp.jl:216; exp_impl
  ╎   1   @Base/special/exp.jl:78; expm1b_kernel
  ╎    1   @Base/math.jl:94; evalpoly
  ╎     1   @Base/math.jl:95; macro expansion
 1╎    ╎ 1   @Base/float.jl:496; muladd
  ╎  4   @Base/special/exp.jl:218; exp_impl
 4╎   4   @Base/float.jl:692; abs
  ╎  1   @Base/special/exp.jl:229; exp_impl
 1╎   1   @Base/essentials.jl:730; reinterpret
Total snapshots: 667. Utilization: 100% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.

@gdalle
Copy link
Contributor

gdalle commented Oct 26, 2024

Same behavior:

julia> Profile.clear(); @profile profile_test(1000); Profile.print();
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
    ╎1453 @Base/task.jl:694; task_done_hook(t::Task)
    ╎ 1453 @Base/task.jl:1021; wait()
14521453 @Base/task.jl:1012; poptask(W::Base.IntrusiveLinkedListSynchronized
Total snapshots: 1453. Utilization: 0% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.

julia> Profile.clear(); @profile profile_test(1000); Profile.print(C=true);
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
    ╎1468 lia-master/src/task.c:1213; start_task
    ╎ 1468 lia-master/src/task.c:319; jl_finish_task
    ╎  1468 a-master/src/julia.h:2157; jl_apply
    ╎   1468 ….gnu/lib/julia/sys.so:?; jfptr_task_done_hook_65631.11468 @Base/task.jl:694; task_done_hook(t::Task)
    ╎     1468 @Base/task.jl:1021; wait()
    ╎    ╎ 1468 @Base/task.jl:1012; poptask(W::Base.IntrusiveLinkedListSynchron
    ╎    ╎  1468 ter/src/scheduler.c:584; ijl_task_get_next
    ╎    ╎   1468 /src/unix/thread.c:822; uv_cond_wait
    ╎    ╎    1468 linux-gnu/libc.so.6:?; pthread_cond_wait
1467╎    ╎     1468 inux-gnu/libc.so.6:?; 
Total snapshots: 1468. Utilization: 0% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.

Did you run your test with julia +1.11 -t 8? In single-threaded mode, the profile is full as usual. The problem is only with more than one thread.

@IanButterworth
Copy link
Member

IanButterworth commented Oct 26, 2024

With -t6 the profile looks healthy to me on MacOS.

Seems like the round-robin thread sampler is broken on linux?

@vtjnash I see that #55500 was backported but a fix for that PR wasn't #55622 (maybe also #55704 should be?)

Could that be the issue?

@IanButterworth
Copy link
Member

@gdalle does nightly have the same issue?

@gdalle
Copy link
Contributor

gdalle commented Oct 26, 2024

Indeed this looks fixed on nightly, so we're probably missing a backport.

julia> versioninfo();
Julia Version 1.12.0-DEV.1502
Commit ee09ae70d9f (2024-10-26 01:01 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 8 GC (on 8 virtual cores)

julia> using Profile

julia> function profile_test(n)
           for i = 1:n
               A = randn(100,100,20)
               m = maximum(A)
               Am = mapslices(sum, A; dims=2)
               B = A[:,:,5]
               Bsort = mapslices(sort, B; dims=1)
               b = rand(100)
               C = B.*b
           end
       end
profile_test (generic function with 1 method)

julia> @profile profile_test(2);

julia> Profile.clear(); @profile profile_test(10);

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line  Function
=========================================================
   ╎16  @Base/client.jl:568  _start()
   ╎ 16  @Base/client.jl:593  repl_main
   ╎  16  @Base/client.jl:511  run_main_repl(interactive::Bool, quiet::Bool, ba16  @Base/essentials.jl:1046  invokelatest
   ╎    16  @Base/essentials.jl:1049  #invokelatest#116  @Base/client.jl:490  run_std_repl
   ╎    ╎ 16  @REPL/src/REPL.jl:676  run_repl
   ╎    ╎  16  @REPL/src/REPL.jl:690  #run_repl#48
   ╎    ╎   16  @REPL/src/REPL.jl:464  start_repl_backend
   ╎    ╎    16  @REPL/src/REPL.jl:467  #start_repl_backend#41
   ╎    ╎     16  @REPL/src/REPL.jl:482  repl_backend_loop
   ╎    ╎    ╎ 16  @REPL/src/REPL.jl:370  eval_user_input
   ╎    ╎    ╎  16  @REPL/src/REPL.jl:345  toplevel_eval_with_hooks
   ╎    ╎    ╎   16  @REPL/src/REPL.jl:352  toplevel_eval_with_hooks
   ╎    ╎    ╎    16  @REPL/src/REPL.jl:352  toplevel_eval_with_hooks
   ╎    ╎    ╎     16  @REPL/rc/REPL.jl:352  toplevel_eval_with_hooks
   ╎    ╎    ╎    ╎ 16  @REPL/rc/REPL.jl:352  toplevel_eval_with_hooks
   ╎    ╎    ╎    ╎  16  @REPL/c/REPL.jl:348  toplevel_eval_with_hooks
   ╎    ╎    ╎    ╎   16  @REPL/c/REPL.jl:341  __repl_entry_eval_expanded_with
   ╎    ╎    ╎    ╎    6   REPL[3]:3  profile_test(n::Int64)
   ╎    ╎    ╎    ╎     6   @Random/mal.jl:278  randn
   ╎    ╎    ╎    ╎    ╎ 6   @Random/mal.jl:272  randn
   ╎    ╎    ╎    ╎    ╎  2   @Base/boot.jl:642  Array
   ╎    ╎    ╎    ╎    ╎   2   @Base/boot.jl:629  Array
   ╎    ╎    ╎    ╎    ╎    2   @Base/boot.jl:579  new_as_memoryref
  2╎    ╎    ╎    ╎    ╎     2   @Base/boot.jl:562  GenericMemory
   ╎    ╎    ╎    ╎    ╎  1   @Random/al.jl:257  randn!(rng::Random.TaskLocalR
   ╎    ╎    ╎    ╎    ╎   1   @Random/om.jl:269  rand!
   ╎    ╎    ╎    ╎    ╎    1   @Random/om.jl:269  rand!
   ╎    ╎    ╎    ╎    ╎     1   @Random/d.jl:303  rand!
   ╎    ╎    ╎    ╎    ╎    ╎ 1   @Random/d.jl:169  xoshiro_bulk
   ╎    ╎    ╎    ╎    ╎    ╎  1   @Random/….jl:170  xoshiro_bulk
   ╎    ╎    ╎    ╎    ╎    ╎   1   @Random/….jl:266  xoshiro_bulk_simd(rng::Ra
   ╎    ╎    ╎    ╎    ╎    ╎    1   @Base/er.jl:180  unsafe_store!
  1╎    ╎    ╎    ╎    ╎    ╎     1   @Base/er.jl:180  unsafe_store!
   ╎    ╎    ╎    ╎    ╎  3   @Random/al.jl:260  randn!(rng::Random.TaskLocalR
   ╎    ╎    ╎    ╎    ╎   2   @Random/al.jl:89  _randn
   ╎    ╎    ╎    ╎    ╎    1   @Base/int.jl:627  rem
   ╎    ╎    ╎    ╎    ╎     1   @Base/ors.jl:321  !=
   ╎    ╎    ╎    ╎    ╎    ╎ 1   @Base/int.jl:518  ==
  1╎    ╎    ╎    ╎    ╎    ╎  1   @Base/on.jl:639  ==
   ╎    ╎    ╎    ╎    ╎    1   @Base/tion.jl:430  *
  1╎    ╎    ╎    ╎    ╎     1   @Base/oat.jl:493  *
   ╎    ╎    ╎    ╎    ╎   1   @Random/al.jl:90  _randn
   ╎    ╎    ╎    ╎    ╎    1   @Base/int.jl:519  <
  1╎    ╎    ╎    ╎    ╎     1   @Base/int.jl:513  <
   ╎    ╎    ╎    ╎    4   REPL[3]:4  profile_test(n::Int64)
   ╎    ╎    ╎    ╎     4   @Base/cedim.jl:979  maximum
   ╎    ╎    ╎    ╎    ╎ 4   @Base/cedim.jl:979  #maximum#730
   ╎    ╎    ╎    ╎    ╎  4   @Base/edim.jl:983  _maximum
   ╎    ╎    ╎    ╎    ╎   4   @Base/edim.jl:983  #_maximum#732
   ╎    ╎    ╎    ╎    ╎    4   @Base/edim.jl:984  _maximum
   ╎    ╎    ╎    ╎    ╎     4   @Base/dim.jl:984  #_maximum#733
   ╎    ╎    ╎    ╎    ╎    ╎ 4   @Base/dim.jl:326  mapreduce
   ╎    ╎    ╎    ╎    ╎    ╎  4   @Base/im.jl:326  #mapreduce#715
   ╎    ╎    ╎    ╎    ╎    ╎   4   @Base/im.jl:334  _mapreduce_dim
   ╎    ╎    ╎    ╎    ╎    ╎    4   @Base/ce.jl:436  _mapreduce(f::typeof(ide
   ╎    ╎    ╎    ╎    ╎    ╎     2   @Base/ce.jl:642  mapreduce_impl(f::typeo
   ╎    ╎    ╎    ╎    ╎    ╎    ╎ 2   @Base/ce.jl:615  _fast
  2╎    ╎    ╎    ╎    ╎    ╎    ╎  2   @Base/ls.jl:790  ifelse
   ╎    ╎    ╎    ╎    ╎    ╎     2   @Base/ce.jl:643  mapreduce_impl(f::typeo
   ╎    ╎    ╎    ╎    ╎    ╎    ╎ 2   @Base/ce.jl:615  _fast
  2╎    ╎    ╎    ╎    ╎    ╎    ╎  2   @Base/ls.jl:790  ifelse
   ╎    ╎    ╎    ╎    5   REPL[3]:5  profile_test(n::Int64)
   ╎    ╎    ╎    ╎     5   @Base/array.jl:3300  mapslices
   ╎    ╎    ╎    ╎    ╎ 5   @Base/rray.jl:3361  mapslices(f::typeof(sum), A::
  1╎    ╎    ╎    ╎    ╎  3   @Base/rray.jl:3372  _inner_mapslices!(R::Array{F
   ╎    ╎    ╎    ╎    ╎   2   @Base/onal.jl:987  _unsafe_getindex!(::Vector{F
   ╎    ╎    ╎    ╎    ╎    2   @Base/onal.jl:977  macro expansion
   ╎    ╎    ╎    ╎    ╎     2   @Base/sian.jl:64  macro expansion
   ╎    ╎    ╎    ╎    ╎    ╎ 2   @Base/nal.jl:979  macro expansion
   ╎    ╎    ╎    ╎    ╎    ╎  2   @Base/ay.jl:1000  setindex!
  2╎    ╎    ╎    ╎    ╎    ╎   2   @Base/ay.jl:1005  _setindex!
   ╎    ╎    ╎    ╎    ╎  2   @Base/rray.jl:3373  _inner_mapslices!(R::Array{F
   ╎    ╎    ╎    ╎    ╎   2   @Base/edim.jl:979  sum
   ╎    ╎    ╎    ╎    ╎    2   @Base/edim.jl:979  #sum#722
   ╎    ╎    ╎    ╎    ╎     2   @Base/dim.jl:983  _sum
   ╎    ╎    ╎    ╎    ╎    ╎ 2   @Base/dim.jl:983  #_sum#724
   ╎    ╎    ╎    ╎    ╎    ╎  2   @Base/im.jl:984  _sum
   ╎    ╎    ╎    ╎    ╎    ╎   2   @Base/im.jl:984  #_sum#725
   ╎    ╎    ╎    ╎    ╎    ╎    2   @Base/im.jl:326  mapreduce
   ╎    ╎    ╎    ╎    ╎    ╎     2   @Base/im.jl:326  #mapreduce#715
   ╎    ╎    ╎    ╎    ╎    ╎    ╎ 2   @Base/im.jl:334  _mapreduce_dim
   ╎    ╎    ╎    ╎    ╎    ╎    ╎  2   @Base/ce.jl:436  _mapreduce
   ╎    ╎    ╎    ╎    ╎    ╎    ╎   2   @Base/ce.jl:269  mapreduce_impl
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    2   @Base/ce.jl:255  mapreduce_impl(f::t
  1╎    ╎    ╎    ╎    ╎    ╎    ╎     2   @Base/op.jl:75  macro expansion
  1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1   @Base/int.jl:83  <
   ╎    ╎    ╎    ╎    1   REPL[3]:7  profile_test(n::Int64)
   ╎    ╎    ╎    ╎     1   @Base/array.jl:3300  mapslices
   ╎    ╎    ╎    ╎    ╎ 1   @Base/rray.jl:3361  mapslices(f::typeof(sort), A:
   ╎    ╎    ╎    ╎    ╎  1   @Base/rray.jl:3373  _inner_mapslices!(R::Matrix{
   ╎    ╎    ╎    ╎    ╎   1   @Base/sort.jl:1737  sort
   ╎    ╎    ╎    ╎    ╎    1   @Base/sort.jl:1737  #sort#24
   ╎    ╎    ╎    ╎    ╎     1   @Base/sort.jl:1704  sort!
   ╎    ╎    ╎    ╎    ╎    ╎ 1   @Base/rt.jl:1711  #sort!#23
   ╎    ╎    ╎    ╎    ╎    ╎  1   @Base/rt.jl:561  _sort!
   ╎    ╎    ╎    ╎    ╎    ╎   1   @Base/rt.jl:683  _sort!
   ╎    ╎    ╎    ╎    ╎    ╎    1   @Base/rt.jl:744  _sort!
   ╎    ╎    ╎    ╎    ╎    ╎     1   @Base/rt.jl:799  _sort!
   ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1   @Base/rt.jl:713  _sort!(v::Vector{Float
   ╎    ╎    ╎    ╎    ╎    ╎    ╎  1   @Base/rt.jl:775  _sort!
   ╎    ╎    ╎    ╎    ╎    ╎    ╎   1   @Base/rt.jl:799  _sort!
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    1   @Base/rt.jl:877  _sort!(v::Base.Rein
  1╎    ╎    ╎    ╎    ╎    ╎    ╎     1   @Base/rt.jl:898  _sort!
   ╎98  @Base/task.jl:839  task_done_hook(t::Task)
   ╎ 98  @Base/task.jl:1167  wait()
 9898  @Base/task.jl:1158  poptask(W::Base.IntrusiveLinkedListSynchronized{T
   ╎16  @REPL/src/LineEdit.jl:2868  #prompt!##216  @Base/lock.jl:294  macro expansion
   ╎  16  @REPL/src/LineEdit.jl:2878  macro expansion
   ╎   16  @REPL/src/LineEdit.jl:1728  (::REPL.LineEdit.var"#match_input##0#mat…
   ╎    16  @Base/essentials.jl:1046  invokelatest
   ╎     16  @Base/essentials.jl:1049  #invokelatest#1
   ╎    ╎ 16  @OhMyREPL/…rc/repl.jl:256  (::OhMyREPL.Prompt.var"#create_keybind…
   ╎    ╎  16  @REPL/src/REPL.jl:1205  do_respond
   ╎    ╎   16  @REPL/src/REPL.jl:1191  eval_with_backend
   ╎    ╎    16  @Base/channels.jl:487  take!
   ╎    ╎     16  @Base/channels.jl:493  take_buffered(c::Channel{Any})
   ╎    ╎    ╎ 16  @Base/condition.jl:136  wait
   ╎    ╎    ╎  16  @Base/condition.jl:141  wait(c::Base.GenericCondition{Reent
   ╎    ╎    ╎   16  @Base/task.jl:1167  wait()
 16╎    ╎    ╎    16  @Base/task.jl:1158  poptask(W::Base.IntrusiveLinkedListSy
Total snapshots: 246. Utilization: 54% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.

@IanButterworth
Copy link
Member

I've not marked those for backport as it'd be good to get confirmation that it's a good idea, but have prepared the backports here for testing #56358

@IanButterworth
Copy link
Member

Fixed in #56228

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiler regression 1.11 Regression in the 1.11 release
Projects
None yet
Development

No branches or pull requests

5 participants