Fix handling of -0.0 in histograms #768
`searchsortedfirst` and `searchsortedlast` use `isless` for comparisons and therefore consider `-0.0` to be different from `0.0`. This means that these two values do not end up in the same bin when an edge is 0. This does not make much sense statistically, but even worse is that when an extreme edge is 0, `-0.0` is not counted at all. Fix this by replacing `-0.0` with `0.0` before the search.
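A minimal sketch of the behaviour described above, using only Base functions. The edge vector and the variable names are hypothetical, and the left-closed case (where the bin index comes from `searchsortedlast`) is used for illustration:

```julia
# Hypothetical bin edges whose lowest (extreme) edge is 0.
edges = [0.0, 1.0, 2.0]

# `==` treats the two zeros as equal, but `isless` orders -0.0 before 0.0.
zeros_equal   = (-0.0 == 0.0)        # true
zeros_ordered = isless(-0.0, 0.0)    # true

# Index of the last edge that is <= x under `isless` ordering.
# An index of 0 means "below the first edge", i.e. not counted.
idx_pos = searchsortedlast(edges, 0.0)    # 1
idx_neg = searchsortedlast(edges, -0.0)   # 0

# Replacing -0.0 with 0.0 before the search puts both zeros in the same bin.
x = -0.0
x_norm = isequal(x, -0.0) ? zero(x) : x
idx_fixed = searchsortedlast(edges, x_norm)   # 1

println((zeros_equal, zeros_ordered, idx_pos, idx_neg, idx_fixed))
```

With an interior edge at 0 the same mechanism sends `-0.0` to the bin below the one that receives `0.0`.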
src/hist.jl
@@ -226,11 +226,17 @@ binindex(h::AbstractHistogram{T,1}, x::Real) where {T} = binindex(h, (x,))[1]
binindex(h::Histogram{T,N}, xs::NTuple{N,Real}) where {T,N} =
    map((edge, x) -> _edge_binindex(edge, h.closed, x), h.edges, xs)

_normalize_zero(x::AbstractFloat) = isequal(x, -0.0) ? oftype(x, 0.0) : x
- _normalize_zero(x::AbstractFloat) = isequal(x, -0.0) ? oftype(x, 0.0) : x
+ _normalize_zero(x::AbstractFloat) = ifelse(isequal(x, -0.0), oftype(x, 0.0), x)
for the performance? also I hope it's automatically inlined.
AFAICT LLVM is smart enough that this gives exactly the same code.
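A small sketch comparing the two spellings discussed above. The names `normalize_ternary`, `normalize_ifelse`, and `samples` are hypothetical stand-ins for the `_normalize_zero` variants in the diff; the check only covers behaviour, not generated code:

```julia
# Ternary form, as in the diff.
normalize_ternary(x::AbstractFloat) = isequal(x, -0.0) ? oftype(x, 0.0) : x
normalize_ternary(x) = x   # fallback: non-float values pass through unchanged

# `ifelse` form, as in the suggestion.
normalize_ifelse(x::AbstractFloat) = ifelse(isequal(x, -0.0), oftype(x, 0.0), x)
normalize_ifelse(x) = x

samples = (-0.0, 0.0, 1.5, -0.0f0, Float16(-0.0), NaN, 3)

# Both forms should agree on every sample, bit for bit and type for type.
agree = all(normalize_ternary(s) === normalize_ifelse(s) for s in samples)

# `oftype` keeps the input's float type, so the result stays type-stable.
keeps_f32 = normalize_ternary(-0.0f0) isa Float32

println((agree, keeps_f32, signbit(normalize_ternary(-0.0))))
```

To compare the compiled code directly, one could inspect both definitions with `@code_llvm` from `InteractiveUtils`; the output depends on the Julia version, so none is reproduced here.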
@@ -226,11 +226,17 @@ binindex(h::AbstractHistogram{T,1}, x::Real) where {T} = binindex(h, (x,))[1]
binindex(h::Histogram{T,N}, xs::NTuple{N,Real}) where {T,N} =
    map((edge, x) -> _edge_binindex(edge, h.closed, x), h.edges, xs)

_normalize_zero(x::AbstractFloat) = isequal(x, -0.0) ? oftype(x, 0.0) : x
_normalize_zero(x::Any) = x
- _normalize_zero(x::Any) = x
+ _normalize_zero(x) = x
I thought we need to `ifelse(isequal(x, -0.0), zero(x), x)` for type stability of
Do you think the spirit of this PR is right? Have you encountered this problem in FHist.jl too?
Yes I think zero is the only tricky case apart from
Unfortunately, passing
yeah totally. For physics we don't run into -0.0 much and also we don't care about losing a few events... (they are weighted to 0.0001) or something.
yeah I don't completely understand the performance model of
Thanks for this!
I've found a solution though it's unfortunately a bit complex. It turns out that the overhead of normalizing Luckily it's almost impossible to construct a range which includes
src/hist.jl
# so check the former just in case as it is cheap
foreach(edges) do e
    e isa AbstractRange &&
        (isequal(first(e), -0.0) || isequal(last(e), -0.0)) &&
Why not `any(isequal(-0.0), e)`, as this would be a bit safer and the cost is negligible, I think?
Yeah, I guess in normal use the number of bins is much lower than the number of observations so checking for all of them has a negligible cost. I've pushed a commit to do that.
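A rough sketch of the full scan suggested above, as opposed to checking only `first(e)` and `last(e)`. The helper name `has_negative_zero_edge` is hypothetical, and what the PR does once a `-0.0` edge is found is not shown in this thread, so the sketch only reports whether one exists:

```julia
# Hypothetical helper: true if any edge collection contains -0.0.
# `isequal(-0.0)` is the one-argument (curried) form, so 0.0 does not match.
has_negative_zero_edge(edges) = any(e -> any(isequal(-0.0), e), edges)

# Edges are assumed to be a tuple with one collection per dimension.
clean_edges = ([0.0, 0.5, 1.0], 0:3)          # contains only +0.0 and integers
dirty_edges = ([-1.0, -0.0, 1.0], [0.0, 2.0]) # first dimension contains -0.0

println(has_negative_zero_edge(clean_edges))
println(has_negative_zero_edge(dirty_edges))
```

The scan is linear in the number of edges per dimension, which matches the comment above that bins are usually far fewer than observations.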
Closes #766.