-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix grouped maximum, minimum, var and std with only missing values #2063
Conversation
src/groupeddataframe/grouping.jl
Outdated
check_aggregate(::typeof(mean∘skipmissing)) = Reduce(Base.add_sum, !ismissing, /) | ||
# Empty groups can only happen with skipmissing and are only a problem for min and max | ||
function check_notempty(res, count::Integer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach has the side effect of making an unnecessary copy of the result. Maybe I can find something better...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a new commit which avoids the copy.
src/groupeddataframe/grouping.jl
Outdated
@@ -1002,10 +1007,10 @@ function groupreduce!(res, f, op, condf, adjust, | |||
end | |||
outcol = adjust === nothing ? res : map(adjust, res, counts) | |||
# Undo pool sharing done by groupreduce_init | |||
if outcol isa CategoricalVector | |||
if outcol isa CategoricalVector && outcol.pool === incol.pool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see where this "unnecessary" copy is made. It seems that the check you do here is needed. Also it should not be very expensive - right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is actually just a consequence of the fact that a copy is made: the pool can be in a different order after map
. The unnecessary copy is due to passing adjust !== nothing
, which induces a call to map
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. The changes also look good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Indeed you did a very good design of this code initially, so the fix was simple.
Sorry for merging #2050. I have forgotten that it will introduce the conflict with this PR. I hope rebasing will not be that hard. |
When a group only contains missing values, grouped `maximum` and `minimum` have to throw an error like the corresponding functions. Otherwise they incorrectly return `typemin`/`typemax`. Also fix `var` and `std` to return `NaN` in that case instead of `-0.0`. This problem cannot happen without `skipmissing`, as empty groups are not allowed.
aca0015
to
4a79830
Compare
No worries, I had anticipated that. Actually the diff is so simple that git figured it out automatically! |
When a group only contains missing values, grouped
maximum
andminimum
have to throw an error like the corresponding functions. Otherwise they incorrectly returntypemin
/typemax
. Also fixvar
andstd
to returnNaN
in that case instead of-0.0
.This problem cannot happen without
skipmissing
, as empty groups are not allowed.Fixes #2061.