
Use safer computation of midpoint in sorting (fixes #33977) #34106

Merged · 1 commit · Dec 23, 2019

Conversation

@timholy (Member) commented Dec 15, 2019

Fixes #33977

I've tested that ÷2 is as fast as a shift locally, but let's see what nanosoldier says:
@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator)

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`make -j3 USECCACHE=1 USE_BINARYBUILDER_LLVM=0`, ProcessExited(2)) [2]

Logs and partial data can be found here
cc @ararslan

@timholy (Member, Author) commented Dec 15, 2019

Hmm, I am getting

```
$ make
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   501  100   501    0     0   1318      0 --:--:-- --:--:-- --:--:--  1314
100 1423k  100 1423k    0     0  1562k      0 --:--:-- --:--:-- --:--:-- 1562k
===============================================================================
  ERROR: sha512 checksum failure on SuiteSparse.v5.4.0-2.x86_64-linux-gnu.tar.gz, should be:
      841050c51b5965dc3ccb5b933df09d33e5912b435c39cf67098aed4d30386cf1
      9825433aea2b2af4668f7358f49ddc178444831eeeeb996bb35328c51bb2441d
  But `sha512sum /home/tim/src/julia-master/deps/srccache/SuiteSparse.v5.4.0-2.x86_64-linux-gnu.tar.gz | awk '{ print $1; }'` results in:
      7b49f0bd1a20104df6993e949ab0e8aa25b7ea64250a726e9c1b0312debdbb87
      c949e4449790de4eaffd072a671da32da8a5301ee62b589ce05ea350f65a3e3e
  This can happen due to bad downloads or network proxies, please check your
  network proxy/firewall settings and delete
  /home/tim/src/julia-master/deps/srccache/SuiteSparse.v5.4.0-2.x86_64-linux-gnu.tar.gz
  to force a redownload when you are ready
===============================================================================
/home/tim/src/julia-master/deps/suitesparse.mk:144: recipe for target '/home/tim/src/julia-master/usr/manifest/suitesparse' failed
make[1]: *** [/home/tim/src/julia-master/usr/manifest/suitesparse] Error 2
Makefile:60: recipe for target 'julia-deps' failed
make: *** [julia-deps] Error 2
```

Anyone else seeing this? (Yes, I tried deleting that file, but it happened again.) Even a `make cleanall` didn't fix it.

@StefanKarpinski (Member)

```julia
julia> lo, hi = typemax(Int) - 4, typemax(Int)
(9223372036854775803, 9223372036854775807)

julia> (lo + hi) >>> 1
9223372036854775805

julia> (lo + hi) ÷ 2
-3
```
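For comparison, a sketch of an overflow-safe variant (`safemid` is a hypothetical helper name, not the PR's final code): since `hi - lo` cannot overflow when `lo ≤ hi`, adding half the gap back to `lo` stays in range.

```julia
# Overflow-safe midpoint for lo <= hi: hi - lo never overflows in that
# case, and adding half the gap back to lo stays within the type's range.
safemid(lo::T, hi::T) where {T<:Integer} = lo + ((hi - lo) >>> 0x01)

safemid(typemax(Int) - 4, typemax(Int))  # 9223372036854775805, no wraparound
```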

@timholy (Member, Author) commented Dec 15, 2019

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator)

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`make -j3 USECCACHE=1 USE_BINARYBUILDER_LLVM=0`, ProcessExited(2)) [2]

Logs and partial data can be found here
cc @ararslan

@timholy (Member, Author) commented Dec 15, 2019

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@timholy (Member, Author) commented Dec 16, 2019

Some of those regressions were real. Let's see if Knuth can come to the rescue.

@nanosoldier runbenchmarks(ALL, vs=":master")

@timholy (Member, Author) commented Dec 16, 2019

@nanosoldier runbenchmarks(ALL, vs=":master")

@timholy (Member, Author) commented Dec 16, 2019

From local measurements I expect a slight regression (e.g., BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]] of 30%), but much better than the 2x from before. And if that's not tolerable, I think the only other option is to decree that 64 bits is enough to never overflow and make sure we're always doing this with 64-bit integers.

@nanosoldier (Collaborator)

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`make -j3 USECCACHE=1 USE_BINARYBUILDER_LLVM=0`, ProcessExited(2)) [2]

Logs and partial data can be found here
cc @ararslan

@StefanKarpinski (Member)

I wonder if it would be better to do something like this:

```julia
lo + ((hi - lo) >>> 1)
```

It does pass all the tests, for what it's worth.

@timholy (Member, Author) commented Dec 16, 2019

In my tests it's the same speed. But I guess the Knuth method is necessary in C, which lacks `>>>`, whereas we have it. Certainly the method you propose is more transparent.

I guess one advantage of the Knuth method is that it gives

```julia
julia> Base.Sort.midpoint(3, 1)
2
```

whereas `lo + ((hi - lo) >>> 1)` yields

```julia
julia> Base.Sort.midpoint(3, 1)
-9223372036854775806
```
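The contrast can be reproduced side by side (a sketch with hypothetical names `mid2` and `mid3`; `mid3` is the Knuth-style form):

```julia
# mid2 assumes lo <= hi; with swapped arguments, hi - lo goes negative
# and the unsigned shift plus addition wraps around.
mid2(lo::T, hi::T) where {T<:Integer} = lo + ((hi - lo) >>> 0x01)

# mid3 (Knuth): keep the common bits and add half the differing bits;
# symmetric in its arguments, so either order works.
mid3(a::T, b::T) where {T<:Integer} = (a & b) + ((a ⊻ b) >> 0x01)

mid3(3, 1)  # 2
mid2(3, 1)  # -9223372036854775806 (wraps)
```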

@timholy (Member, Author) commented Dec 16, 2019

Actually, I take that back about them being the same speed. The results are pretty variable, so I have to run it many times to be sure, but a fairly typical result of the Knuth method is

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     7.694 μs (0.00% GC)
  median time:      8.395 μs (0.00% GC)
  mean time:        8.450 μs (0.00% GC)
  maximum time:     216.727 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

but writing this as `lo + ((hi - lo) >>> 0x01)` (note the `0x01` instead of `1`) typically yields something like

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     5.884 μs (0.00% GC)
  median time:      6.393 μs (0.00% GC)
  mean time:        7.154 μs (0.00% GC)
  maximum time:     4.049 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

If I write the shift with `1` instead of `0x01`, the advantage disappears. Interestingly, the Knuth method doesn't benefit as much (or at all) from the `1` → `0x01` change.

@StefanKarpinski (Member) commented Dec 16, 2019

Yes, the version I posted does require that lo ≤ hi. It is kind of nice if the midpoint function can take the arguments in either order, though. Knuth's method is pretty clean, so 👍.

Another consideration, however, is whether the definition makes sense for any integer type—it's usually good to think about bigints. I'm not sure the lo ⊻ hi operation makes sense for bigints, since it would require an infinite number of leading ones. Perhaps only use that method for `BitInteger`s and use something like this for the most general definition:

```julia
function midpoint(a::Integer, b::Integer)
    lo, hi = minmax(a, b)
    lo + ((hi - lo) >> 1)
end
```
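A quick sanity check of this minmax-based definition (restated here so the snippet runs on its own):

```julia
# Most general definition: sort the arguments first, then use the
# difference trick; works for any Integer, including BigInt.
function midpoint(a::Integer, b::Integer)
    lo, hi = minmax(a, b)
    lo + ((hi - lo) >> 1)
end

midpoint(3, 1)                        # 2, independent of argument order
midpoint(big(2)^100, big(2)^100 + 2)  # exact BigInt midpoint, 2^100 + 1
```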

@timholy (Member, Author) commented Dec 16, 2019

Here's a breakdown of algorithms and timing. All 3 are overflow- and negative-safe, but only Algs 1 and 3 are safe for either ordering of inputs. master is overflow- but not negative-safe (and it's overflow-safe only for Signed integers).

master

```julia
midpoint(lo::T, hi::T) where T<:Integer = (lo + hi) >>> 1
```

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     5.820 μs (0.00% GC)
  median time:      7.173 μs (0.00% GC)
  mean time:        7.355 μs (0.00% GC)
  maximum time:     579.311 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

Alg 1

```julia
function midpoint(a::T, b::T) where T<:Integer
    lo, hi = minmax(a, b)
    return lo + ((hi - lo) >>> 0x01)
end
```

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     9.932 μs (0.00% GC)
  median time:      10.518 μs (0.00% GC)
  mean time:        10.589 μs (0.00% GC)
  maximum time:     331.517 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

Alg 2

```julia
midpoint(lo::T, hi::T) where T<:Integer = lo + ((hi - lo) >>> 0x01)
```

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     5.736 μs (0.00% GC)
  median time:      6.173 μs (0.00% GC)
  mean time:        6.530 μs (0.00% GC)
  maximum time:     303.521 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

Alg 3

```julia
midpoint(lo::T, hi::T) where T<:Integer = (lo & hi) + ((lo ⊻ hi) >> 0x01)
```

```julia
julia> run(BaseBenchmarks.SUITE[["sparse", "index", ("spmat", "row", "logical", 1000)]])
BenchmarkTools.Trial: 
  memory estimate:  4.81 KiB
  allocs estimate:  11
  --------------
  minimum time:     7.396 μs (0.00% GC)
  median time:      8.072 μs (0.00% GC)
  mean time:        8.642 μs (0.00% GC)
  maximum time:     2.117 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

As you can see, Alg 2 and master are roughly tied for fastest, followed by Alg 3, then Alg 1.

@StefanKarpinski (Member)

If it's only for internal use, it seems like we can safely assume that lo ≤ hi in all usages and use Alg 2 without losing any performance. If we're going to make this a public function, which may be a good idea (if we're having this problem, so will other users), then it might be better to go with Alg 3, which balances performance with argument-order independence.

@timholy (Member, Author) commented Dec 16, 2019

I tend to agree. Perhaps we should do both: implement Alg 2 in Base.Sort and, to prevent misuse, simultaneously add Alg 3 as a method in Base. Shall we export it?

Edit: there's also the issue, though, that so far we've only been worrying about Integer.

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@timholy (Member, Author) commented Dec 16, 2019

@nanosoldier runbenchmarks(ALL, vs=":master")

@StefanKarpinski (Member) commented Dec 17, 2019

I’m sure float midpoint is gonna be even more fun 😬

@timholy (Member, Author) commented Dec 17, 2019

I've already added a candidate implementation here. I kept it pretty simple.
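For floats, one simple overflow-avoiding form (a sketch with a hypothetical name `fmid`, not necessarily what the PR implements) halves each operand before adding, so the intermediate sum can't overflow to Inf:

```julia
# Hypothetical float midpoint sketch: a/2 + b/2 avoids the overflow
# that (a + b)/2 hits when a + b exceeds floatmax.
fmid(a::Float64, b::Float64) = a / 2 + b / 2

fmid(floatmax(Float64), floatmax(Float64))  # floatmax(Float64), not Inf
```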

@StefanKarpinski (Member)

Do we already have a middle function for floating point and other types? It seems weird that the integer version rounds down while the float version computes the exact middle. It smells a bit like these are actually two different functions.

@ararslan (Member)

> Do we already have a middle function for floating point and others?

Yes, but it lives in Statistics.

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@timholy (Member, Author) commented Dec 17, 2019

Benchmarks look clean, so if folks are happy with the code & tests then this seems good to go.

@StefanKarpinski (Member)

It seems like it would be best to separate the bug-fix part here, which has no API changes and would also be backportable, from the new API. The bug fix can be merged right away and backported, while the API change can be discussed further.

@timholy (Member, Author) commented Dec 22, 2019

I read your mind ahead of time 😄. The first commit is the bugfix, the second commit is the API change. You could merge just the first commit.

@StefanKarpinski (Member)

Ok, I’m not in a position to do that just now but if you want to separate them and merge the bug fix when it passes CI, that would be 👌🏼

@timholy timholy merged commit 4a8ea8c into master Dec 23, 2019
@timholy timholy deleted the teh/fix33977 branch December 23, 2019 14:25
@KristofferC (Member)

This looks like it was merged before branching for 1.4 so removing that label.

Merging this pull request closes: searchsorted segfaults with offset arrays (#33977)