Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix grouped maximum, minimum, var and std with only missing values #2063

Merged
merged 2 commits into from
Dec 26, 2019

Conversation

nalimilan
Copy link
Member

When a group only contains missing values, grouped maximum and minimum have to throw an error like the corresponding functions. Otherwise they incorrectly return typemin/typemax. Also fix var and std to return NaN in that case instead of -0.0.

This problem cannot happen without skipmissing, as empty groups are not allowed.

Fixes #2061.

check_aggregate(::typeof(mean∘skipmissing)) = Reduce(Base.add_sum, !ismissing, /)
# Empty groups can only happen with skipmissing and are only a problem for min and max
function check_notempty(res, count::Integer)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach has the side effect of making an unnecessary copy of the result. Maybe I can find something better...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a new commit which avoids the copy.

@@ -1002,10 +1007,10 @@ function groupreduce!(res, f, op, condf, adjust,
end
outcol = adjust === nothing ? res : map(adjust, res, counts)
# Undo pool sharing done by groupreduce_init
if outcol isa CategoricalVector
if outcol isa CategoricalVector && outcol.pool === incol.pool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see where this "unnecessary" copy is made. It seems that the check you do here is needed. Also it should not be very expensive - right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is actually just a consequence of the fact that a copy is made: the pool can be in a different order after map. The unnecessary copy is due to passing adjust !== nothing, which induces a call to map.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. The changes also look good.

Copy link
Member

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Indeed you did a very good design of this code initially, so the fix was simple.

@bkamins
Copy link
Member

bkamins commented Dec 24, 2019

Sorry for merging #2050. I have forgotten that it will introduce the conflict with this PR. I hope rebasing will not be that hard.

When a group only contains missing values, grouped `maximum` and `minimum` have to
throw an error like the corresponding functions. Otherwise they incorrectly return
`typemin`/`typemax`. Also fix `var` and `std` to return `NaN` in that case instead
of `-0.0`.

This problem cannot happen without `skipmissing`, as empty groups are not allowed.
@nalimilan
Copy link
Member Author

No worries, I had anticipated that. Actually the diff is so simple that git figured it out automatically!

@nalimilan nalimilan merged commit 8902a08 into master Dec 26, 2019
@nalimilan nalimilan deleted the nl/aggregate branch December 26, 2019 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error in by when composing with skipmissing
2 participants