Correctness issues with `minimum` and `maximum` #45932
Thanks so much for the detailed report! The short circuiting has been entirely removed in #43604, so at least the incoming 1.8 would handle that. As for `minimum` and `maximum` …
I wonder if we can make `BigFloat` `min` and `max` never allocate --- surely either one argument or the other is always a correct answer?
I've been swamped with real life stuff lately. I hope to actually finish #45581 soon. This should restore correctness and improve performance across a range of cases. The optimizations fail to apply with … I hate function-specialized versions of general-purpose functions like …
Ah, so with #43604 one of the issues is fixed! If #45581 might be finished somewhat soon then maybe the best idea is to just wait for that? Indeed, with #43604 fixing the short circuiting, overloading … Regarding a non-allocating `min` and `max` for …
#41709 would take care of it.
Regarding signed zeros, my understanding is that `-0.0 == 0.0`, and they must be treated as the same number on every occasion. I'm surprised there's any handling of signed zeros in the codebase, other than perhaps in specialized functions. As for mapreduce, the discussion reminds me a bit of #45000, which I still consider a major issue despite it having been swiftly closed. My understanding is that the algorithm we use in general for mapreduce should only have been used for a specific application: accurate summations of arbitrary length. For moderate-sized arrays, the approach is actually less accurate today than a simple summation algorithm compiled for a modern processor, because of vectorization. I personally think there's a place for specialized algorithms, but using this "piecewise reduction" algorithm by default is not great. It's a specialized algorithm being used in general. A default reduction should simply translate into a basic for-loop. By using something else, we risk preventing compiler optimizations, and I believe this actually happens in many situations.
This addresses some of the issues discussed in JuliaLang/julia#45932. For Julia 1.8.0 it completely fixes `minimum` and `maximum`; for earlier versions of Julia it only partially fixes some of the issues. The change completely removes the specialised implementation of `extrema`. It also completely removes the specialised implementations of `minimum` and `maximum` for `Mag` and `Arf`. The default implementations work well, though they are slower due to more allocations. In the future it could maybe be an option to have a faster version using MutableArithmetics.jl. The Base implementation of `minimum` and `maximum` is fixed for `Arb` by overloading `Base._fast` to make it correct for `Arb`. Before 1.8.0 this is not enough to fix all problems, and the PR thus keeps the specialised implementation for Julia before 1.8.0-rc3 (the latest release candidate as of writing); for later versions the specialised implementation can be removed.
See also: #36287
#43604 removed short-circuiting in the presence of …
IIUC the only example in the OP that remains broken is

```julia
julia> minimum(abs, fill(-0.0, 16))
-0.0

julia> minimum(abs.(fill(-0.0, 16)))
0.0
```
I have some comments regarding correctness issues with `minimum` and `maximum`. One is about the handling of signed zeros for floating points and the other about short circuiting for `NaN` and `missing`. I believe the current implementation favours performance over correctness, something I think one should be extremely careful with.

Before discussing the issues I want to give some context about the `minimum` and `maximum` functions and why I think it is extra important that these methods are correctly implemented. The call `minimum(A::AbstractArray)` eventually calls `mapreduce(identity, min, A)`; similarly `minimum(f, A)` eventually calls `mapreduce(f, min, A)`. After some more indirection this calls `Base._mapreduce(f, min, IndexLinear(), A)`. This method has some special cases if the length of `A` is small; more precisely it handles `length(A) < 16` directly, otherwise it calls `Base.mapreduce_impl(f, min, A, firstindex(A), lastindex(A))`. So far everything uses the default implementations of the reduction functions, with no special handling of the `min` argument; the `Base.mapreduce_impl` function does however have a specialised method for `min` and `max`, found here.

The above indirection leads to some potential issues. The specialised method dispatches only on `min` and `max`, not on the element type or on `f`. This means that if `Base.mapreduce_impl(f, min, A, firstindex(A), lastindex(A))` makes some optimizations that are not valid for your custom type `T`, there is no way to handle this. For most Julia functions you can create a custom method by dispatching on your type, but here this is not possible. This calls for being extra careful that the implementation is correct in all possible cases. Moreover, any special behaviour could only be documented for `minimum` and `maximum`. However, since they call `mapreduce(f, min, A)`, you would get the same correctness issues if you for example call `reduce(min, A)`, and it is not really possible to add documentation to `reduce` and `mapreduce` about special handling of `min` and `max`. Again this calls for being extra careful with correctness, since it is not possible to document the behaviour fully.

Now to the actual issues. I give all examples using `minimum`, but `maximum` has exactly the same issues. Whenever I refer to timings they are for an AMD Ryzen 9 5900X.
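The length-16 threshold means the two code paths can be observed directly. A quick sketch (the second call takes the specialised `mapreduce_impl` path; on affected Julia versions it returns the wrong sign, as discussed below):

```julia
# Arrays shorter than 16 elements are reduced directly by Base._mapreduce;
# from 16 elements on, Base.mapreduce_impl with its specialised min/max
# method is used instead. The same call can therefore behave differently
# depending only on the input length (on Julia versions before the fix).
@show minimum(abs, fill(-0.0, 15))  # short path: 0.0 on all versions
@show minimum(abs, fill(-0.0, 16))  # long path: -0.0 on affected versions
```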
Handling of signed zeros

The main optimization done in the implementation of `Base.mapreduce_impl(f, min, A, firstindex(A), lastindex(A))` is in the way it handles signed zeros. It calls `_fast(min, x, y)`, which in general just calls `min` directly, and there is nothing to discuss. But if `x` and `y` are floats it uses a simplified version of `min` which doesn't take the ordering of `-0.0` and `0.0` into account. To still enforce the ordering of the signed zeros, the implementation ends with an extra pass looking for a zero of the right sign.

However, the issue with this code is that it doesn't apply the function `f` to `x` before the comparison, so it only works correctly for `f = identity` (or as long as `f` preserves the sign, I guess). For example, `minimum(abs, fill(-0.0, 16))` returns `-0.0` instead of `0.0`. Note that if you give fewer than 16 elements it uses another implementation and you get a correct result (so the example `maximum([0.0, -0.0]) === 0.0` in the comment doesn't actually use the code below the comment), which makes it harder to track down this issue.

From what I can tell there is no simple fix for this which doesn't sacrifice performance. One possibility is to add the application of `f` to the check for `-0.0`. It would give correct results but would mean applying `f` twice on each element, which I don't think is an option. The most "correct" solution would be to avoid using the `_fast` method and just use `min` directly. However, this comes with a fairly large performance penalty compared to the current implementation, about a 100% increase in runtime for `Float64` in some benchmarks I did. Another, slightly more involved, solution is to special case on `f = identity` and apply this optimization in that case but not otherwise.

Personally I would lean towards the correct but slower solution: avoid using `_fast` and call `min` directly. But I don't know what type of performance regressions people are willing to accept.

Lastly I can also mention that the current version gives a quite noticeable performance penalty if the minimum happens to be `0.0`, since the array is traversed twice.
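For reference, the pieces of Base's `reduce.jl` discussed above look roughly as follows (reproduced from memory for the Julia 1.7/1.8 era; treat the exact details as approximate):

```julia
# Simplified comparison used by mapreduce_impl for min/max.
# For floats it ignores the ordering of -0.0 and 0.0:
_fast(::typeof(min), x, y) = min(x, y)
_fast(::typeof(max), x, y) = max(x, y)
_fast(::typeof(min), x::AbstractFloat, y::AbstractFloat) =
    ifelse(isnan(x), x, ifelse(x < y, x, y))
_fast(::typeof(max), x::AbstractFloat, y::AbstractFloat) =
    ifelse(isnan(x), x, ifelse(x > y, x, y))

# The fixup pass at the end re-scans the array for a zero of the right
# sign. Note that it inspects A[i] directly, never f(A[i]), which is the
# bug discussed above:
isbadzero(::typeof(max), x::AbstractFloat) = (x == zero(x)) & signbit(x)
isbadzero(::typeof(min), x::AbstractFloat) = (x == zero(x)) & !signbit(x)
isgoodzero(::typeof(max), x) = isbadzero(min, x)
isgoodzero(::typeof(min), x) = isbadzero(max, x)

# _fast drops the signed-zero guarantee that min itself provides:
@show _fast(min, -0.0, 0.0)  # 0.0
@show min(-0.0, 0.0)         # -0.0
```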
Short circuiting for `NaN` and `missing`

The implementation has a short circuit meant for when it encounters `NaN` or `missing`: once an accumulated value is no longer equal to itself, it is returned immediately. The benefit of this is that we can avoid looking through the whole vector in some cases. However, it has some issues. The main issue is that it short circuits on BOTH `NaN` and `missing` and hence can return `NaN` in cases where it would be more correct to return `missing`; exactly which one is returned is also slightly involved, which makes this case very hard to test for.

Here I think the obvious solution is to just remove this short circuiting. As mentioned in #35939 it seems unlikely that this is used in the hot path in many cases. For the vast majority of cases this should not lead to any performance reductions; it might even lead to very slight performance improvements. If one really wants to keep this behaviour then one option could be to only short circuit on `missing`, in which case the behaviour would be correct. If one really wants to, I guess it might be possible to check whether the return type of `f` contains `missing` and short circuit on `NaN` if it does not; in that case the check should be updated to use `isnan`. But as I said, I would opt towards just removing the short circuiting altogether.
Behaviour for other types

Finally I wanted to give some observations on the behaviour for other types. The reason I took a look at this function in the first place was due to issues I encountered when developing Arblib.jl.

For `AbstractFloat` it uses `ifelse(isnan(x), x, ifelse(x < y, x, y))` instead of `min` directly. Except for not handling signed zeros this is equivalent for `Float64` (and most other `AbstractFloat` types in Base). For `BigFloat` it has the benefit of not allocating (which `min` does). In fact I don't know why `min(x::BigFloat, y::BigFloat)` is written to allocate a new variable for the result; in contrast, the implementation of `min(x::BigInt, y::BigInt)` does not allocate a new variable.

My issue with using this approach instead of `min` directly is that it is not valid for the `Arb` type that occurs in Arblib.jl. The `Arb` type can, slightly simplified, be described as a floating point number with some extra information keeping track of rounding errors; when taking the minimum these rounding errors have to be taken into account, and this is not done by the above check. Now, this might very well be my own fault for subtyping `AbstractFloat` even though the type doesn't satisfy the assumptions Julia makes, but in many cases it is very natural to treat an `Arb` as a floating point number and see the tracking of the rounding errors as a bonus. In most cases you can then just dispatch on the `Arb` type when you need a different implementation, but as mentioned in the beginning this is not possible for `minimum`.

The second observation is related to the current implementation of the short circuiting. It currently assumes that `(x == x) === true` is false only for `NaN` and `missing`. While this is true for all types in base Julia, I don't think it should be assumed for custom types. For example, the `Arb` type mentioned above doesn't satisfy this assumption (`x == x` is false if the rounding error is non-zero). It would be better to be more explicit and instead use something along the lines of `(x isa Number && isnan(x)) || ismissing(x)`, as mentioned in #35989; this should also be slightly faster in many cases. Though my stance would be that the short circuiting should be removed entirely.
What to do now?

I would be happy to prepare a PR to handle some of these issues. But exactly what approach to take depends on how much performance one is willing to sacrifice or how much code complexity one is willing to add. Below are some possible solutions; I can add more benchmark information if it is of interest.

The simplest possible implementation which handles all of the above mentioned issues would be a plain loop that always calls `min` directly.
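A sketch of what that could look like (names follow Base's internals; `Base.mapreduce_first` is assumed to behave as in current Base):

```julia
# Correctness-first specialisation: a plain left-to-right loop that always
# calls op (min/max) directly -- no _fast shortcut, no short circuit.
function mapreduce_impl(f, op::Union{typeof(max),typeof(min)},
                        A, first::Int, last::Int)
    a1 = @inbounds A[first]
    v = Base.mapreduce_first(f, op, a1)
    @inbounds for i in first+1:last
        v = op(v, f(A[i]))
    end
    return v
end

# Signed zeros, NaN and missing are all handled by min itself:
@show mapreduce_impl(abs, min, fill(-0.0, 16), 1, 16)      # 0.0
@show mapreduce_impl(identity, min, [0.0, -0.0], 1, 2)     # -0.0
@show mapreduce_impl(identity, min, [1.0, missing], 1, 2)  # missing
```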
The drawbacks of this approach are no short circuiting and worse performance for floating points. For `Float64` it is about 100% slower. For `BigFloat` it is about 300% slower, due to `min` allocating whereas the previous version didn't allocate. The performance regression for `BigFloat` I think is acceptable; if you want to make it faster you can use MutableArithmetics.jl. The performance penalty for `Float64` I'm not sure is okay or not.

A solution which solves some of the issues but doesn't pay much in terms of performance would be to replace the `op` in the above implementation with the `_fast` version. This would mean that it doesn't handle signed zeros correctly, but at least it handles `NaN` and `missing` correctly. The only performance regression would be the removal of the short circuiting. In this case a note should probably be added to `minimum` about not handling signed zeros.

Finally, the most complicated solution would be to special case on `f = identity`, use `_fast` in that case and explicitly check at the end for the result being zero, falling back to `min` if `f` is not `identity`.

None of the solutions keep the short circuiting; this I think is a good trade-off in terms of correctness versus performance. But others might think differently.