Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Base.Ordering for heap, and other performance improvements #547

Merged
merged 5 commits into from
Aug 17, 2020

Conversation

milesfrain
Copy link
Contributor

@milesfrain milesfrain commented Oct 25, 2019

This merge sequence should be followed for generating useful benchmark comparisons:

  1. Use PkgBenchmark to detect performance regressions for heaps #531
  2. Use separate files for package benchmarks #545
  3. Expand benchmarking in anticipation of heap ordering change #546
  4. Use Base.Ordering for heap, and other performance improvements #547

@oxinabox This change introduces API changes, and deprecation warnings should be added. Could we replace the previous deprecation warnings with a version bump?

@milesfrain
Copy link
Contributor Author

milesfrain commented Oct 25, 2019

Here are the expected benchmark results once the first 2 PRs are merged. There are occasional regressions for very small heaps (n = 10), but I suspect this is mostly due to Travis noise. Additional discussion of results.

Here are the local results:

                                                             ID  time ratio memory ratio
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––– ––––––––––––
                    ["SparseIntSet", "iterate two exclude one"] 0.88 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "make", "Float64", "10^1", "Min"] 0.39 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "make", "Float64", "10^3", "Min"] 0.69 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "make", "Int64", "10^1", "Min"] 0.26 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "make", "Int64", "10^3", "Min"] 0.59 (5%) ✅    1.00 (1%)
        ["heap", "BinaryHeap", "pop", "Float64", "10^1", "Min"] 0.94 (5%) ✅    1.00 (1%)
          ["heap", "BinaryHeap", "pop", "Int64", "10^1", "Min"] 0.89 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^1", "Min"] 0.93 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^3", "Min"] 0.89 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "push", "Int64", "10^1", "Min"] 0.94 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "push", "Int64", "10^3", "Min"] 0.86 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMaxHeap", "make", "Float64", "10^1"] 1.14 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMaxHeap", "make", "Float64", "10^3"] 0.73 (5%) ✅    1.00 (1%)
            ["heap", "BinaryMaxHeap", "pop", "Float64", "10^1"] 0.92 (5%) ✅    1.00 (1%)
            ["heap", "BinaryMaxHeap", "pop", "Float64", "10^3"] 0.94 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMaxHeap", "push", "Float64", "10^1"] 0.90 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMaxHeap", "push", "Float64", "10^3"] 0.90 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^1"] 1.09 (5%) ❌    1.00 (1%)
           ["heap", "BinaryMinHeap", "make", "Float64", "10^3"] 0.72 (5%) ✅    1.00 (1%)
            ["heap", "BinaryMinHeap", "pop", "Float64", "10^1"] 0.87 (5%) ✅    1.00 (1%)
            ["heap", "BinaryMinHeap", "pop", "Float64", "10^3"] 0.95 (5%) ✅    1.00 (1%)
           ["heap", "BinaryMinHeap", "push", "Float64", "10^3"] 0.91 (5%) ✅    1.00 (1%)
["heap", "MutableBinaryHeap", "make", "Float64", "10^1", "Min"] 0.44 (5%) ✅    1.00 (1%)
  ["heap", "MutableBinaryHeap", "make", "Int64", "10^1", "Min"] 0.32 (5%) ✅    1.00 (1%)
                 ["heap", "nlargest", "a=rand(10^4)", "n=10^2"] 0.49 (5%) ✅  0.57 (1%) ✅
                ["heap", "nsmallest", "a=rand(10^4)", "n=10^2"] 0.59 (5%) ✅  0.57 (1%) ✅

@codecov
Copy link

codecov bot commented Oct 25, 2019

Codecov Report

Merging #547 into master will decrease coverage by 0.15%.
The diff coverage is 95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #547      +/-   ##
==========================================
- Coverage   87.82%   87.67%   -0.16%     
==========================================
  Files          32       32              
  Lines        2087     2053      -34     
==========================================
- Hits         1833     1800      -33     
+ Misses        254      253       -1
Impacted Files Coverage Δ
src/heaps.jl 100% <100%> (+8%) ⬆️
src/heaps/minmax_heap.jl 91.86% <100%> (+0.09%) ⬆️
src/heaps/arrays_as_heaps.jl 89.28% <100%> (+3.07%) ⬆️
src/heaps/mutable_binary_heap.jl 88.57% <100%> (ø) ⬆️
src/heaps/binary_heap.jl 88.88% <83.33%> (-11.12%) ⬇️
src/sparse_int_set.jl 95.78% <0%> (-0.13%) ⬇️
src/priorityqueue.jl 96.89% <0%> (-0.03%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ad2cf6...5bf4b0c. Read the comment docs.

@oxinabox
Copy link
Member

@oxinabox This change introduces API changes, and deprecation warnings should be added. Could we replace the previous deprecation warnings with a version bump?

Here is the rule for deprecations:

  1. Deprecations should be added in nonbreaking releases
  2. Deprecations should be removed in breaking releases.

So any time we bump the leading nonzero we can (and really should) remove all deprecations.

@milesfrain milesfrain changed the title Use Base.Ordering for heap, and other performance changes Use Base.Ordering for heap, and other performance improvements Oct 25, 2019
@milesfrain
Copy link
Contributor Author

@oxinabox I'm having trouble setting-up deprecation warnings, and could use some guidance.

Here's an example of the API change:

BinaryMinHeap{Int}() # old
BinaryMinHeap(Int)   # new

I was hoping something like this would work:

@deprecate BinaryMinHeap{T}() where {T} BinaryMinHeap(T)

Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the getting rid of the LessThan and GreaterThan,
much better to use the Base types.

it seems like this PR is undoing a lot of the work of
#472
which I don't understand.

src/heaps.jl Show resolved Hide resolved
src/heaps.jl Outdated Show resolved Hide resolved
src/heaps/binary_heap.jl Show resolved Hide resolved
src/heaps/binary_heap.jl Outdated Show resolved Hide resolved
src/heaps/binary_heap.jl Outdated Show resolved Hide resolved
src/heaps/binary_heap.jl Outdated Show resolved Hide resolved
src/heaps/minmax_heap.jl Outdated Show resolved Hide resolved
src/heaps/mutable_binary_heap.jl Outdated Show resolved Hide resolved
@oxinabox
Copy link
Member

@oxinabox I'm having trouble setting-up deprecation warnings, and could use some guidance.

I think you need @deprecate_binding maybe.
But lets hold off on deciding if we want to do that in the first place.

@milesfrain
Copy link
Contributor Author

@oxinabox @dillondaudert

I was unaware of #472, and don't want to undo previous changes, but types as arguments seems like a simpler way to construct heaps when passing either the element type or vector of elements and an optional ordering. Maybe ordering should be mandatory for BinaryHeap.

Here's an example of the API that I'm proposing:

BinaryMinHeap(Int)
BinaryMinHeap(rand(3))

BinaryHeap(rand(3))
BinaryHeap(Int)
BinaryHeap(rand(3), Base.Forward)
BinaryHeap(Int, Base.Forward)

Let me know what constructor API you'd like to see instead.

@oxinabox
Copy link
Member

oxinabox commented Oct 30, 2019

Let me know what constructor API you'd like to see instead.

I would like to see the current API,
but using Base.Forward etc rather than DataStructures.GreateThan etc

@dillondaudert
Copy link
Contributor

I agree that the API should match Base.

Also changing floating point numbers to a partial order could result in surprising behavior with NaNs, I think that should be considered when weighing the performance trade-off of isless.

@milesfrain
Copy link
Contributor Author

@dillondaudert
That's a good point about unexpected behavior with NaNs. I'll change this to use the slower and safer comparison by default, and document how users can substitute the riskier and faster comparison if they want better performance.

@oxinabox
Here's a branch which demonstrates a dilemma with replacing DataStructures.<GreaterThan/LessThan> with Base.<Forward/Reverse>.

A ForwardOrdering instance can easily be created from its type, but it's not straightforward to do this for ReverseOrdering. It seems that the instances Base.Forward and Base.Reverse should be passed into the constructors directly, but I don't think it is possible to tack on an extra ordering function parameter to BinaryHeap using the existing const Binary<Min/Max>Heap{T} type alias strategy, which leads me to believe that handling Base.Ordering requires undoing some of #472 to make Binary<Min/Max>Heap behave as a pseudoconstructor(?) and not as a parametric composite type constructor alias. Is it problematic for heap constructors to behave more like the array-constructing zeros(Int,3) and rand(Int,3), rather than Vector{Int}(undef,3)?

Here's a log showing the difficulties of constructing a ReverseOrdering instance from its type.

julia> Base.Forward
Base.Order.ForwardOrdering()

julia> Base.ForwardOrdering()
Base.Order.ForwardOrdering()


julia> Base.Reverse
Base.Order.ReverseOrdering{Base.Order.ForwardOrdering}(Base.Order.ForwardOrdering())

julia> Base.ReverseOrdering()
ERROR: MethodError: no method matching Base.Order.ReverseOrdering()

julia> Base.Order.ReverseOrdering{Base.Order.ForwardOrdering}()
ERROR: MethodError: no method matching Base.Order.ReverseOrdering{Base.Order.ForwardOrdering}()

@oxinabox
Copy link
Member

oxinabox commented Nov 1, 2019

Yeah, fair call.
To be clear, I think it is fine for BinaryHeap to take an ordering object instance, (it already does),
since then can have By orderings etc.
It isn't fine for it to take a Type{Int} etc argument.
As that puts us out of aligment for conventions for data structures.

We just need to add:
Base.Order.ReverseOrdering() = Base.Order.Reverse
like you say to fix this.
(I tested this on a local copy of your ord4 branch)
I will open a PR to Julia itself now to add that missing definition, and see how that goes.
Then if it is accepted we can add a compatability definition for it.

@oxinabox
Copy link
Member

@milesfrain sorry for the delay, was waiting to see if anyone upstream in JuliaLang objected to the addition of Reverse()

They did not.
So we can now proceed as if it were a thing.

I will have it added to Compat.jl
and then we can just load Compat.jl and it will be a thing in all julia versions

@milesfrain
Copy link
Contributor Author

@oxinabox No worries. I was also waiting to see what happens with the Reverse() PR. I'll continue working through the requested changes.

@oxinabox
Copy link
Member

Ok, @milesfrain you can now add a Compat v3 dependency, and start using the ReverseOrdering() constructor

@milesfrain
Copy link
Contributor Author

@oxinabox

I believe this latest changeset address all of your latest review feedback, but let me know if I missed anything, or if you have additional suggestions.

I need some help troubleshooting why Travis is failing with Compat V3 and Julia 0.7.

Local benchmark results:

                                                             ID  time ratio memory ratio
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––– ––––––––––––
                        ["SparseIntSet", "pop push worst case"] 0.93 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "make", "Float64", "10^1", "Min"] 1.14 (5%) ❌    1.00 (1%)
       ["heap", "BinaryHeap", "make", "Float64", "10^3", "Min"] 0.68 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "make", "Int64", "10^3", "Min"] 0.62 (5%) ✅    1.00 (1%)
        ["heap", "BinaryHeap", "pop", "Float64", "10^1", "Min"] 0.91 (5%) ✅    1.00 (1%)
        ["heap", "BinaryHeap", "pop", "Float64", "10^3", "Min"] 0.94 (5%) ✅    1.00 (1%)
          ["heap", "BinaryHeap", "pop", "Int64", "10^1", "Min"] 0.84 (5%) ✅    1.00 (1%)
          ["heap", "BinaryHeap", "pop", "Int64", "10^3", "Min"] 0.92 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^1", "Min"] 0.94 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^3", "Min"] 0.93 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "push", "Int64", "10^1", "Min"] 0.95 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "push", "Int64", "10^3", "Min"] 0.91 (5%) ✅    1.00 (1%)
["heap", "MutableBinaryHeap", "make", "Float64", "10^1", "Min"] 1.13 (5%) ❌    1.00 (1%)
   ["heap", "MutableBinaryHeap", "pop", "Int64", "10^3", "Min"] 1.09 (5%) ❌    1.00 (1%)
                 ["heap", "nlargest", "a=rand(10^4)", "n=10^2"] 0.44 (5%) ✅  0.57 (1%) ✅
                ["heap", "nsmallest", "a=rand(10^4)", "n=10^2"] 0.44 (5%) ✅  0.57 (1%) ✅

@oxinabox
Copy link
Member

We should just drop support for Julia 0.7.
For now can just delete it from Travis

@milesfrain
Copy link
Contributor Author

@oxinabox
I'm ready for a re-review.

Regarding the CI errors:

  • There's an incompatibility with nightly Julia for Travis and AppVeyor. It's passing on stable Julia.
  • Coveralls reports a lack of coverage for FasterForward, even though it's exercised in tests.

See the "construct heap" tests for examples of all the available heap constructors. Both immutable and mutable heaps match the requested API discussed earlier, but MinMax heap is missing a way to specify an element type along with an array of elements (e.g. BinaryMinMaxHeap{Int}(vs)). Should MinMax heap be changed as part of this PR? It is untouched otherwise.

@dillondaudert
Copy link
Contributor

I don't see any issue with adding BinaryMinMaxHeap{T}(xs::AbstractVector{T}) where T as a constructor.

@oxinabox
Copy link
Member

oxinabox commented Dec 3, 2019

Nightly is failing. because Julia nightly is I think a bit borked.
we should allow failures on nightly.

Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from a quick-look,
but I haven't time to review this properly in the next 2 weeks.

@ararslan @kmsquire @scls19fr
could one of you give this a final look and bring it home?

Last thing is deprecations.
SemVer does not require us to do deprecations, if we are tagging this as a breaking change.
Which I think we should because I don't think we can easily depprecate it fully.
Maybe we can (if so that would be great)

Best instructions that exist from how to do deprecations. is the source:
https://github.com/JuliaLang/julia/blob/master/base/deprecated.jl

And examples in old versions of that file:
https://github.com/JuliaLang/julia/blob/v0.7.0/base/deprecated.jl#L398

@deprecate_binding is really powerful.

benchmark/bench_heap.jl Show resolved Hide resolved
src/heaps.jl Outdated Show resolved Hide resolved
src/heaps/binary_heap.jl Show resolved Hide resolved
appveyor.yml Outdated Show resolved Hide resolved
src/heaps.jl Outdated Show resolved Hide resolved
src/heaps/binary_heap.jl Show resolved Hide resolved
src/heaps/binary_heap.jl Show resolved Hide resolved
@jzazo
Copy link

jzazo commented Jun 2, 2020

What's missing for this to be tagged?
Should I stick with this for the time being? Thanks!

@milesfrain
Copy link
Contributor Author

milesfrain commented Jun 2, 2020

What's missing for this to be tagged?

I was just waiting on additional feedback:

@ararslan @kmsquire @scls19fr
could one of you give this a final look and bring it home?

But perhaps I was being a bit too patient. When reviewers are ready, I'll rebase.

Last thing is deprecations.
SemVer does not require us to do deprecations, if we are tagging this as a breaking change.
Which I think we should because I don't think we can easily depprecate it fully.
Maybe we can (if so that would be great)

I was hoping we could just make this a breaking change, rather than add partial deprecations. I don't think everything in this new version will translate fully via deprecations.

@oxinabox oxinabox mentioned this pull request Jun 3, 2020
3 tasks
@oxinabox
Copy link
Member

oxinabox commented Jun 3, 2020

Ok, I have taken another look.
this still looks good to me.
Rebase it, set the version to 0.18.0-DEV
and I will look to making a breaking release this weekend with this change.

@milesfrain
Copy link
Contributor Author

This is ready for review.

Latest benchmark results show some minor regressions for the small (10^1 element) heaps. Might just be noise, since some differences are also reported for the untouched SparseIntSet.

                                                             ID  time ratio memory ratio
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––– ––––––––––– ––––––––––––
                                ["SparseIntSet", "create_fill"] 0.90 (5%) ✅    1.00 (1%)
                        ["SparseIntSet", "pop push worst case"] 1.11 (5%) ❌    1.00 (1%)
       ["heap", "BinaryHeap", "make", "Float64", "10^1", "Min"] 1.24 (5%) ❌    1.00 (1%)
       ["heap", "BinaryHeap", "make", "Float64", "10^3", "Min"] 0.79 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "make", "Int64", "10^1", "Min"] 1.16 (5%) ❌    1.00 (1%)
         ["heap", "BinaryHeap", "make", "Int64", "10^3", "Min"] 0.81 (5%) ✅    1.00 (1%)
        ["heap", "BinaryHeap", "pop", "Float64", "10^1", "Min"] 1.42 (5%) ❌    1.00 (1%)
          ["heap", "BinaryHeap", "pop", "Int64", "10^1", "Min"] 1.47 (5%) ❌    1.00 (1%)
          ["heap", "BinaryHeap", "pop", "Int64", "10^3", "Min"] 0.90 (5%) ✅    1.00 (1%)
       ["heap", "BinaryHeap", "push", "Float64", "10^3", "Min"] 0.94 (5%) ✅    1.00 (1%)
         ["heap", "BinaryHeap", "push", "Int64", "10^3", "Min"] 0.95 (5%) ✅    1.00 (1%)
["heap", "MutableBinaryHeap", "make", "Float64", "10^1", "Min"] 1.19 (5%) ❌    1.00 (1%)
["heap", "MutableBinaryHeap", "make", "Float64", "10^3", "Min"] 0.93 (5%) ✅    1.00 (1%)
["heap", "MutableBinaryHeap", "push", "Float64", "10^1", "Min"] 0.94 (5%) ✅    1.00 (1%)
                 ["heap", "nlargest", "a=rand(10^4)", "n=10^2"] 0.71 (5%) ✅  0.57 (1%) ✅
                ["heap", "nsmallest", "a=rand(10^4)", "n=10^2"] 0.51 (5%) ✅  0.57 (1%) ✅

@milesfrain
Copy link
Contributor Author

I'm also curious what the best local testing workflow is.
test/README suggests running julia runtests.jl binheap, but that command seems to be missing something. Best I could come up with is:

julia -e 'using Pkg; Pkg.activate("."); include("test/runtests.jl")'

With a temporary edit of the tests array in runtests.jl to just select binheap.

@oxinabox
Copy link
Member

oxinabox commented Jun 5, 2020

julia --project="." then ] test in the REPL is typical.
julia --project="." -e "using Pkg; Pkg.test()" should work if you don't want to enter the REPL

I think test/README is just wrong and very outdated

@oxinabox
Copy link
Member

Sorry for they delay in this, just hoping to get a few more deprecations in before making the breaking change

@jlumpe
Copy link
Contributor

jlumpe commented Aug 3, 2020

I didn't see this PR and essentially wrote a duplicate (#646) of it. Linking it here in case you would like to compare the two.

@milesfrain
Copy link
Contributor Author

Bummer about your duplicated effort @jlumpe . I've made that same mistake myself a few times.

@oxinabox
Copy link
Member

oxinabox commented Aug 4, 2020

Ok finally we are in a position in master that we can merge this.
Master is now 0.18-DEV.
so if you can rebase this against master and then we will merge.

@oxinabox
Copy link
Member

oh, i didn't see this had been rebased.
Yay, thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants