Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dynamic dispatch with runtime branch on rev keyword in sortperm #47966

Merged
merged 9 commits into from
Dec 28, 2022

Conversation

LilithHafner
Copy link
Member

@LilithHafner LilithHafner commented Dec 22, 2022

This eliminates unexpected allocations in sortperm and sortperm!, sometimes substantially improving runtime for small inputs:

julia> v = rand(10); ix = similar(eachindex(v));

julia> @btime sortperm!(copyto!($ix, eachindex($v)), $v);
  571.896 ns (3 allocations: 48 bytes) => 106.496 ns (0 allocations: 0 bytes)

julia> @btime sortperm($v);
  598.118 ns (4 allocations: 192 bytes) => 139.678 ns (1 allocation: 144 bytes)

Fixes #47949.


Outdated OP:

I think this might avoid compiler heuristics that arise due to keyword arguments in sortperm, but I still don't understand why the allocations were there in the first place. Fixes #47949.

- add strange hacks to avoid compiler heuristics?
- add allocation tests

The situation is still not perfect (e.g. @test_broken 0 == @allocations sortperm!(i, v, rev=true)), but it is better according to allocation tests and some benchmarks.

@LilithHafner LilithHafner added performance Must go faster sorting Put things in order backport 1.9 Change should be backported to release-1.9 labels Dec 22, 2022
@LilithHafner
Copy link
Member Author

@nanosoldier runbenchmarks("sort", vs=":master")

@LilithHafner LilithHafner marked this pull request as ready for review December 22, 2022 12:20
@LilithHafner
Copy link
Member Author

This improves the situation with #47811, but a regression remains for sizes 32–256.

julia> for i in 1:12
           n = 2^i
           print(lpad(n, 4)); @btime sortperm(x) setup=(x=rand($n));
       end
   2  224.951 ns (2 allocations: 96 bytes)  => 481.651 ns (4 allocations: 128 bytes) => 52.206 ns (1 allocation: 80 bytes)
   4  236.767 ns (2 allocations: 112 bytes) => 494.113 ns (4 allocations: 144 bytes) => 64.007 ns (1 allocation: 96 bytes)
   8  261.952 ns (2 allocations: 144 bytes) => 523.632 ns (4 allocations: 176 bytes) => 87.860 ns (1 allocation: 128 bytes)
  16  315.431 ns (2 allocations: 208 bytes) => 726.031 ns (5 allocations: 432 bytes) => 246.363 ns (2 allocations: 384 bytes)
  32  482.479 ns (2 allocations: 352 bytes) => 1.218 μs (5 allocations: 720 bytes)   => 684.515 ns (2 allocations: 672 bytes)
  64  858.757 ns (2 allocations: 592 bytes) => 2.011 μs (5 allocations: 1.17 KiB)    => 1.609 μs (2 allocations: 1.12 KiB)
 128  2.150 μs (2 allocations: 1.08 KiB)    => 3.818 μs (5 allocations: 2.17 KiB)    => 3.446 μs (2 allocations: 2.12 KiB)
 256  5.314 μs (2 allocations: 2.14 KiB)    => 7.808 μs (5 allocations: 4.30 KiB)    => 7.536 μs (2 allocations: 4.25 KiB)
 512  15.903 μs (2 allocations: 4.14 KiB)   => 15.624 μs (5 allocations: 8.30 KiB)   => 15.077 μs (2 allocations: 8.25 KiB)
1024  45.238 μs (2 allocations: 8.14 KiB)   => 32.604 μs (5 allocations: 16.30 KiB)  => 33.118 μs (2 allocations: 16.25 KiB)
2048  99.861 μs (2 allocations: 16.14 KiB)  => 69.427 μs (5 allocations: 32.30 KiB)  => 70.779 μs (2 allocations: 32.25 KiB)
4096  219.132 μs (3 allocations: 32.06 KiB) => 150.960 μs (7 allocations: 64.14 KiB) => 152.937 μs (4 allocations: 64.09 KiB)

julia> VERSION
v"1.8.3" => v"1.10.0-DEV.181" => PR

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

@LilithHafner
Copy link
Member Author

Performance diff vs master on JuliaCI/BaseBenchmarks.jl#305 is free of regressions and shows some improvements

"insertionsort", "sortperm! reverse" => TrialJudgement(-23.35% => improvement)
"insertionsort", "sortperm forwards" => TrialJudgement(-18.31% => invariant)
"length = 3", "sortperm(rand(length))" => TrialJudgement(-80.03% => improvement)
"length = 10", "sortperm(rand(length))" => TrialJudgement(-68.33% => improvement)
"length = 10", "mixed eltype with by order" => TrialJudgement(-20.71% => improvement)
"length = 30", "sortperm(rand(length))" => TrialJudgement(-44.86% => improvement)
"length = 100", "sortperm(rand(length))" => TrialJudgement(-13.21% => invariant)
"issues", "sortperm on a view (Float64)" => TrialJudgement(-9.12% => invariant)

@LilithHafner
Copy link
Member Author

LilithHafner commented Dec 22, 2022

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

As expected, sortperm! allocations go to 0 for insertion & quick sorts when not reversed

@nanosoldier runbenchmarks("sort", vs = ":release-1.8")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@LilithHafner
Copy link
Member Author

@nanosoldier runbenchmarks("sort", vs = ":release-1.8")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@LilithHafner
Copy link
Member Author

There are no remaining regressions vs 1.8 in BaseBenchmarks' sortperm and sorpterm! benchmarks!

Some sort regressions are still detected and some sortperm regressions remain and are detected by the more diverse JuliaCI/BaseBenchmarks.jl#305 benchmarks.

@LilithHafner
Copy link
Member Author

Bump

@oscardssmith
Copy link
Member

ready to merge?

@LilithHafner
Copy link
Member Author

LilithHafner commented Dec 27, 2022

I think so; I'm not entirely sure how this PR improves the situation, but I'm pretty confident that it does. I imagine compiler heuristics about inlining and constant propagation are involved.

I'd be comfortable with merging this as is, but it would be better to have someone who understands the exact mechanisms at play here.

base/sort.jl Outdated
end

# TODO stop using these three hacks
# but check performance, especially unexpected allocations, when removing
Base.@assume_effects :nothrow very_unsafe_copyto!(a, b) = a .= b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is scary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... @aviatesk, is this sort of thing acceptable or is there another way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My local test shows that perhaps @noinline is enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why we do code review. Thanks!

I think I introduced these when they were necessary and then added branching on rev which is more impactful and fixes the root of the issue so now none of these hacks are necessary at all, and CI checks for unexpected allocations automatically :)

LilithHafner and others added 2 commits December 27, 2022 10:23
Thanks @N5N3 for pointing out that they are unnecessary
@LilithHafner LilithHafner added the needs nanosoldier run This PR should have benchmarks run on it label Dec 27, 2022
@LilithHafner LilithHafner changed the title Reduce unexpected allocations in sortperm Replace dynamic dispatch with runtime branch on rev keyword in sortperm Dec 27, 2022
@LilithHafner
Copy link
Member Author

@nanosoldier runbenchmarks("sort", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@KristofferC
Copy link
Member

This is good to go now, right?

@LilithHafner
Copy link
Member Author

Comparing to before the recent changes, when performance was good to go but there were still unnecessary unsafe operations.

@nanosoldier runbenchmarks("sort", vs="@4191e27f71ee7a32e5e6b2a932603ce341070ed1")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

@LilithHafner
Copy link
Member Author

lgtm

@LilithHafner LilithHafner removed the needs nanosoldier run This PR should have benchmarks run on it label Dec 27, 2022
@KristofferC KristofferC merged commit 6ac5159 into master Dec 28, 2022
@KristofferC KristofferC deleted the sortperm-allocs branch December 28, 2022 19:47
KristofferC pushed a commit that referenced this pull request Dec 28, 2022
@KristofferC KristofferC mentioned this pull request Dec 28, 2022
14 tasks
@KristofferC KristofferC removed the backport 1.9 Change should be backported to release-1.9 label Jan 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster sorting Put things in order
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sortperm! allocates due to inference/type stability issues
5 participants