
Use safer computation of midpoint in sorting (fixes #33977) #34106

Merged · 1 commit · Dec 23, 2019

Conversation

@timholy (Member) commented Dec 15, 2019

Fixes #33977

I've tested that ÷2 is as fast as a shift locally, but let's see what nanosoldier says:
@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator)

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`make -j3 USECCACHE=1 USE_BINARYBUILDER_LLVM=0`, ProcessExited(2)) [2]

Logs and partial data can be found here
cc @ararslan

@timholy (Member, Author) commented Dec 15, 2019

Hmm, I am getting

```
$ make
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   501  100   501    0     0   1318      0 --:--:-- --:--:-- --:--:--  1314
100 1423k  100 1423k    0     0  1562k      0 --:--:-- --:--:-- --:--:-- 1562k
===============================================================================
  ERROR: sha512 checksum failure on SuiteSparse.v5.4.0-2.x86_64-linux-gnu.tar.gz, should be:
      841050c51b5965dc3ccb5b933df09d33e5912b435c39cf67098aed4d30386cf1
      9825433aea2b2af4668f7358f49ddc178444831eeeeb996bb35328c51bb2441d
  But `sha512sum /home/tim/src/julia-master/deps/srccache/SuiteSparse.v5.4.0-2.x86_64-linux-gnu.tar.gz | awk '{ print $1; }'` results in:
      7b49f0bd1a20104df6993e949ab0e8aa25b7ea64250a726e9c1b0312debdbb87
      c949e4449790de4eaffd072a671da32da8a5301ee62b589ce05ea350f65a3e3e
  This can happen due to bad downloads or network proxies, please check your
  network proxy/firewall settings and delete
  /home/tim/src/julia-master/deps/srccache/SuiteSparse.v5.4.0-2.x86_64-linux-gnu.tar.gz
  to force a redownload when you are ready
===============================================================================
/home/tim/src/julia-master/deps/suitesparse.mk:144: recipe for target '/home/tim/src/julia-master/usr/manifest/suitesparse' failed
make[1]: *** [/home/tim/src/julia-master/usr/manifest/suitesparse] Error 2
Makefile:60: recipe for target 'julia-deps' failed
make: *** [julia-deps] Error 2
```

Anyone else seeing this? (Yes, I tried deleting that file, but it happened again.) Even a `make cleanall` didn't fix it.

@StefanKarpinski (Member)

```julia
julia> lo, hi = typemax(Int) - 4, typemax(Int)
(9223372036854775803, 9223372036854775807)

julia> (lo + hi) >>> 1
9223372036854775805

julia> (lo + hi) ÷ 2
-3
```
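For comparison, a sketch of an overflow-safe variant (`safemid` is a hypothetical helper name, not the PR's final code): since `hi - lo` cannot overflow when `lo ≤ hi`, adding half the gap back to `lo` stays in range.

```julia
# Overflow-safe midpoint for lo <= hi: hi - lo never overflows in that
# case, and adding half the gap back to lo stays within the type's range.
safemid(lo::T, hi::T) where {T<:Integer} = lo + ((hi - lo) >>> 0x01)

safemid(typemax(Int) - 4, typemax(Int))  # 9223372036854775805, no wraparound
```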

@timholy (Member, Author) commented Dec 15, 2019

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator)

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`make -j3 USECCACHE=1 USE_BINARYBUILDER_LLVM=0`, ProcessExited(2)) [2]

Logs and partial data can be found here
cc @ararslan

@timholy (Member, Author) commented Dec 15, 2019

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@timholy (Member, Author) commented Dec 16, 2019

Some of those regressions were real. Let's see if Knuth can come to the rescue.

@nanosoldier runbenchmarks(ALL, vs=":master")

@timholy (Member, Author) commented Dec 16, 2019

@nanosoldier runbenchmarks(ALL, vs=":master")

@timholy (Member, Author) commented Dec 16, 2019

From local measurements I expect a slight regression (e.g., BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]] of 30%), but much better than the 2x from before. And if that's not tolerable, I think the only other option is to decree that 64 bits is enough to never overflow and make sure we're always doing this with 64-bit integers.

@nanosoldier (Collaborator)

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`make -j3 USECCACHE=1 USE_BINARYBUILDER_LLVM=0`, ProcessExited(2)) [2]

Logs and partial data can be found here
cc @ararslan

@StefanKarpinski (Member)

I wonder if it would be better to do something like this:

```julia
lo + ((hi - lo) >>> 1)
```

It does pass all the tests, for what it's worth.

@timholy (Member, Author) commented Dec 16, 2019

In my tests it's the same speed. But I guess the Knuth method is necessary in C, which lacks `>>>`, whereas we have it. Certainly the method you propose is more transparent.

I guess one advantage of the Knuth method is that it gives

```julia
julia> Base.Sort.midpoint(3, 1)
2
```

whereas `lo + ((hi - lo) >>> 1)` yields

```julia
julia> Base.Sort.midpoint(3, 1)
-9223372036854775806
```
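The contrast can be reproduced side by side (a sketch with hypothetical names `mid2` and `mid3`; `mid3` is the Knuth-style form):

```julia
# mid2 assumes lo <= hi; with swapped arguments, hi - lo goes negative
# and the unsigned shift plus addition wraps around.
mid2(lo::T, hi::T) where {T<:Integer} = lo + ((hi - lo) >>> 0x01)

# mid3 (Knuth): keep the common bits and add half the differing bits;
# symmetric in its arguments, so either order works.
mid3(a::T, b::T) where {T<:Integer} = (a & b) + ((a ⊻ b) >> 0x01)

mid3(3, 1)  # 2
mid2(3, 1)  # -9223372036854775806 (wraps)
```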

@timholy (Member, Author) commented Dec 16, 2019

Actually, I take that back about them being the same speed. The results are pretty variable, so I have to run it many times to be sure, but a fairly typical result of the Knuth method is

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     7.694 μs (0.00% GC)
  median time:      8.395 μs (0.00% GC)
  mean time:        8.450 μs (0.00% GC)
  maximum time:     216.727 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

but writing this as `lo + ((hi - lo) >>> 0x01)` (note the `0x01` instead of `1`) typically yields something like

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     5.884 μs (0.00% GC)
  median time:      6.393 μs (0.00% GC)
  mean time:        7.154 μs (0.00% GC)
  maximum time:     4.049 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

If I write the shift with `1` instead of `0x01`, the advantage disappears. Interestingly, the Knuth method doesn't benefit as much (or at all) from the `1` → `0x01` change.

@StefanKarpinski (Member) commented Dec 16, 2019

Yes, the version I posted does require that lo ≤ hi. It is kind of nice if the midpoint function can take the arguments in either order, though. Knuth's method is pretty clean, so 👍.

Another consideration, however, is whether the definition makes sense for any integer type—it's usually good to think about bigints. I'm not sure the lo ⊻ hi operation makes sense for bigints, since it would require an infinite number of leading ones. Perhaps only use that method for `BitInteger`s and use something like this for the most general definition:

```julia
function midpoint(a::Integer, b::Integer)
    lo, hi = minmax(a, b)
    lo + ((hi - lo) >> 1)
end
```
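A quick sanity check of this minmax-based definition (restated here so the snippet runs on its own):

```julia
# Most general definition: sort the arguments first, then use the
# difference trick; works for any Integer, including BigInt.
function midpoint(a::Integer, b::Integer)
    lo, hi = minmax(a, b)
    lo + ((hi - lo) >> 1)
end

midpoint(3, 1)                        # 2, independent of argument order
midpoint(big(2)^100, big(2)^100 + 2)  # exact BigInt midpoint, 2^100 + 1
```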

@timholy (Member, Author) commented Dec 16, 2019

Here's a breakdown of algorithms and timing. All 3 are overflow- and negative-safe, but only Algs 1 and 3 are safe for either ordering of inputs. master is overflow- but not negative-safe (and it's overflow-safe only for Signed integers).

master

```julia
midpoint(lo::T, hi::T) where T<:Integer = (lo + hi) >>> 1
```

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     5.820 μs (0.00% GC)
  median time:      7.173 μs (0.00% GC)
  mean time:        7.355 μs (0.00% GC)
  maximum time:     579.311 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

Alg 1

```julia
function midpoint(a::T, b::T) where T<:Integer
    lo, hi = minmax(a, b)
    return lo + ((hi - lo) >>> 0x01)
end
```

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     9.932 μs (0.00% GC)
  median time:      10.518 μs (0.00% GC)
  mean time:        10.589 μs (0.00% GC)
  maximum time:     331.517 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

Alg 2

```julia
midpoint(lo::T, hi::T) where T<:Integer = lo + ((hi - lo) >>> 0x01)
```

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     5.736 μs (0.00% GC)
  median time:      6.173 μs (0.00% GC)
  mean time:        6.530 μs (0.00% GC)
  maximum time:     303.521 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

Alg 3

```julia
midpoint(lo::T, hi::T) where T<:Integer = (lo & hi) + ((lo ⊻ hi) >> 0x01)
```

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     7.396 μs (0.00% GC)
  median time:      8.072 μs (0.00% GC)
  mean time:        8.642 μs (0.00% GC)
  maximum time:     2.117 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

As you can see, Alg 2 and master are roughly tied for fastest, followed by Alg 3, then Alg 1.

@StefanKarpinski (Member)

If it's only for internal use, it seems like we can safely assume that lo ≤ hi in all usages and use Alg 2 without losing any performance. If we're going to make this a public function, which may be a good idea (if we're having this problem, so will other users), then it might be better to go with Alg 3, which balances performance with argument-order independence.

@timholy (Member, Author) commented Dec 16, 2019

I tend to agree. Perhaps we should do both: implement Alg 2 in Base.Sort and, to prevent misuse, simultaneously add Alg 3 as a method in Base. Shall we export it?

Edit: there's also the issue, though, that so far we've only been worrying about Integer.

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@timholy (Member, Author) commented Dec 16, 2019

@nanosoldier runbenchmarks(ALL, vs=":master")

@StefanKarpinski (Member) commented Dec 17, 2019

I’m sure float midpoint is gonna be even more fun 😬

@timholy (Member, Author) commented Dec 17, 2019

I've already added a candidate implementation here. I kept it pretty simple.
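For floats, one simple overflow-avoiding form (a sketch with a hypothetical name `fmid`, not necessarily what the PR implements) halves each operand before adding, so the intermediate sum can't overflow to Inf:

```julia
# Hypothetical float midpoint sketch: a/2 + b/2 avoids the overflow
# that (a + b)/2 hits when a + b exceeds floatmax.
fmid(a::Float64, b::Float64) = a / 2 + b / 2

fmid(floatmax(Float64), floatmax(Float64))  # floatmax(Float64), not Inf
```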

@StefanKarpinski (Member)

Do we already have a middle function for floating point and other types? It seems weird that the integer version rounds down while the float version computes the exact middle. It smells a bit like these are actually two different functions.

@ararslan (Member)

> Do we already have a middle function for floating point and others?

Yes, but it lives in Statistics.

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@timholy (Member, Author) commented Dec 17, 2019

Benchmarks look clean, so if folks are happy with the code & tests then this seems good to go.

@StefanKarpinski (Member)

It seems like it would be best to separate the bug-fix part here, which has no API changes and would also be backportable, from the new API. The bug fix can be merged right away and backported, while the API change can be discussed further.

@timholy (Member, Author) commented Dec 22, 2019

I read your mind ahead of time 😄. The first commit is the bugfix, the second commit is the API change. You could merge just the first commit.

@StefanKarpinski (Member)

Ok, I’m not in a position to do that just now but if you want to separate them and merge the bug fix when it passes CI, that would be 👌🏼

@timholy timholy merged commit 4a8ea8c into master Dec 23, 2019
@timholy timholy deleted the teh/fix33977 branch December 23, 2019 14:25
@KristofferC (Member)

This looks like it was merged before branching for 1.4 so removing that label.

Merging this pull request closes: searchsorted segfaults with offset arrays (#33977)