Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add handling of an empty iterator for mean and var #29033

Merged
merged 5 commits into from
Oct 11, 2018

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Sep 4, 2018

This is a follow up of #28777 fixing what is minimally needed to be consistent in Julia 1.0 between empty arrays and empty collections (needed for skipmissing).

Major cleanup of consistency of the outputs for different scenarios of application of functions from Statistics module are left for Julia 2.0.

@@ -86,6 +86,10 @@ end
@test ismissing(mean([missing, NaN]))
@test isequal(mean([missing 1.0; 2.0 3.0], dims=1), [missing 2.0])
@test mean(skipmissing([1, missing, 2])) === 1.5
@test isequal(mean(Complex{Float64}[]), NaN+NaN*im)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also make sense to test that an empty Int input gives a Float64 output, and that an error is thrown when the element type isn't known (e.g. using a generator).

BTW, you could probably use ===.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is in muddy waters :) You cannot use === as we have different NaN in Julia and here it would fail. Notice that:

julia> NaN === -NaN
false

and this is exactly the case here, that is why I had to use isequal.

I will add the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must be missing something: you really expect NaN::Float64 here, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The situation is more complex (and that is why I have used this as a test):

julia> x = mean(Complex{Float64}[])
NaN + NaN*im

julia> bitstring(x.re)
"1111111111111000000000000000000000000000000000000000000000000000"

julia> bitstring(x.im)
"1111111111111000000000000000000000000000000000000000000000000000"

julia> bitstring(NaN) # see that the bitstring is different
"0111111111111000000000000000000000000000000000000000000000000000"

also you have:

julia> x = mean(Int[])
NaN

julia> x === NaN
false

For the same reason that we have different NaNs in Julia (and as you see they actually happen in practice).

@@ -245,6 +249,9 @@ end
@test ismissing(f([missing, NaN], missing))
@test f(skipmissing([1, missing, 2]), 0) === f([1, 2], 0)
end

@test isequal(mean(Complex{Float64}[]), NaN)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you intended to test var?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Fixed.

@@ -148,7 +149,8 @@ var(iterable; corrected::Bool=true, mean=nothing) = _var(iterable, corrected, me
function _var(iterable, corrected::Bool, mean)
y = iterate(iterable)
if y === nothing
throw(ArgumentError("variance of empty collection undefined: $(repr(iterable))"))
T = eltype(iterable)
return typeof((abs2(zero(T)) + abs2(zero(T)))/2)(NaN)
Copy link
Member

@nalimilan nalimilan Sep 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this throws an error when calling zero if eltype returns Any. Maybe better call throw a more explicit error similar to the existing one?

Could use oftype.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave it as is, because this is how var for AbstractArray is implemented and we want to be non-breaking (so it should have 100% the same behavior - the same outputs and the same errors - so using array or iterator is transparent).

For sure in Julia 2.0 approach we could consider cleaning it up in both places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just mean that instead of letting zero throw the error, it would be clearer to check T === Any and throw a more explicit error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that T does not have to be Any. It can be an arbitrary type that is or is not a subtype of Number. And upfront we do not know if this type:

  • defines zero
  • has abs2 defined
  • implements a constructor from NaN

Any of those may or may not be true so the only way to be consistent is to repeat the same code that is used for a version of var defined for arrays (otherwise I will be able to create a situation where array and iterator var differs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. We could still print a nice error in the most common case, but then we'd lose consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the point for now. As discussed in Julia 2.0 we can probably clean it up in both methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally, would use oftype as @nalimilan suggested for clarity

@@ -86,6 +86,19 @@ end
@test ismissing(mean([missing, NaN]))
@test isequal(mean([missing 1.0; 2.0 3.0], dims=1), [missing 2.0])
@test mean(skipmissing([1, missing, 2])) === 1.5
@test isequal(mean(Complex{Float64}[]), NaN+NaN*im)
@test isequal(mean(skipmissing(Complex{Float64}[])), NaN+NaN*im)
@test mean(Complex{Float64}[]) isa Complex{Float64}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not also test the value? Same in other places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value is tested above in line 89 (unless you meant something else).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this one is, I originally spotted mean(Int[]). It would probably be clearer to have them side by side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All cleaned up now (hopefully 😄, as you have a keener eye for details 👍)

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me, but another review wouldn't hurt.

@bkamins
Copy link
Member Author

bkamins commented Sep 13, 2018

One CI error seems unrelated.

@StefanKarpinski
Copy link
Sponsor Member

Who would be a good person to review?

@nalimilan
Copy link
Member

@TotalVerb?

Copy link
Contributor

@TotalVerb TotalVerb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm

There's a mean(::AbstractRange{<:Real}) method on line 134. Might be worth changing that behavior too for consistency, even though ranges can't have missings.

@@ -148,7 +149,8 @@ var(iterable; corrected::Bool=true, mean=nothing) = _var(iterable, corrected, me
function _var(iterable, corrected::Bool, mean)
y = iterate(iterable)
if y === nothing
throw(ArgumentError("variance of empty collection undefined: $(repr(iterable))"))
T = eltype(iterable)
return typeof((abs2(zero(T)) + abs2(zero(T)))/2)(NaN)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally, would use oftype as @nalimilan suggested for clarity

@bkamins
Copy link
Member Author

bkamins commented Sep 14, 2018

Thank you. Changed to oftype in both places for consistency and changed mean of empty ranges (with added tests).

@@ -131,8 +131,8 @@ mean(A::AbstractArray; dims=:) = _mean(A, dims)
_mean(A::AbstractArray{T}, region) where {T} = mean!(Base.reducedim_init(t -> t/2, +, A, region), A)
_mean(A::AbstractArray, ::Colon) = sum(A) / length(A)

function mean(r::AbstractRange{<:Real})
isempty(r) && throw(ArgumentError("mean of an empty range is undefined"))
function Statistics.mean(r::AbstractRange{<:Real})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add a module prefix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo. Will fix.

@bkamins
Copy link
Member Author

bkamins commented Sep 28, 2018

How do we want to proceed with this PR?

@nalimilan
Copy link
Member

Just merge?

@StefanKarpinski StefanKarpinski added the status:triage This should be discussed on a triage call label Sep 28, 2018
@StefanKarpinski
Copy link
Sponsor Member

I think just go ahead with this. We can always revert it.

@nalimilan
Copy link
Member

I like when everybody approves but nobody actually merges... :-)

@nalimilan nalimilan merged commit b519147 into JuliaLang:master Oct 11, 2018
@nalimilan nalimilan removed the status:triage This should be discussed on a triage call label Oct 11, 2018
@bkamins bkamins deleted the empty_collections branch October 11, 2018 19:20
fredrikekre added a commit that referenced this pull request Dec 5, 2018
changes between Julia 1.0 and 1.1, including:

- Custom .css-style for compat admonitions.

- Information about compat annotations to CONTRIBUTING.md.

- NEWS.md entry for PRs #30090, #30035, #30022, #29978,
  #29969, #29858, #29845, #29754, #29638, #29636, #29615,
  #29600, #29506, #29469, #29316, #29259, #29178, #29153,
  #29033, #28902, #28761, #28745, #28708, #28696, #29997,
  #28790, #29092, #29108, #29782

- Compat annotation for PRs #30090, #30013, #29978,
  #29890, #29858, #29827, #29754, #29679, #29636, #29623,
  #29600, #29440, #29316, #29259, #29178, #29157, #29153,
  #29033, #28902, #28878, #28761, #28708, #28156, #29733,
  #29670, #29997, #28790, #29092, #29108, #29782, #25278

- Documentation for broadcasting CartesianIndices (#30230).
- Documentation for Base.julia_cmd().
- Documentation for colon constructor of CartesianIndices (#29440).
- Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix).

- Run NEWS-update.jl.

Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com>
Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
fredrikekre added a commit that referenced this pull request Dec 5, 2018
changes between Julia 1.0 and 1.1, including:

- Custom .css-style for compat admonitions.

- Information about compat annotations to CONTRIBUTING.md.

- NEWS.md entry for PRs #30090, #30035, #30022, #29978,
  #29969, #29858, #29845, #29754, #29638, #29636, #29615,
  #29600, #29506, #29469, #29316, #29259, #29178, #29153,
  #29033, #28902, #28761, #28745, #28708, #28696, #29997,
  #28790, #29092, #29108, #29782

- Compat annotation for PRs #30090, #30013, #29978,
  #29890, #29858, #29827, #29754, #29679, #29636, #29623,
  #29600, #29440, #29316, #29259, #29178, #29157, #29153,
  #29033, #28902, #28878, #28761, #28708, #28156, #29733,
  #29670, #29997, #28790, #29092, #29108, #29782, #25278

- Documentation for broadcasting CartesianIndices (#30230).
- Documentation for Base.julia_cmd().
- Documentation for colon constructor of CartesianIndices (#29440).
- Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix).

- Run NEWS-update.jl.

Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com>
Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
fredrikekre added a commit that referenced this pull request Dec 5, 2018
Addition of NEWS and compat admonitions for important changes between Julia 1.0 and 1.1, including:

- Custom .css-style for compat admonitions.

- Information about compat annotations to CONTRIBUTING.md.

- NEWS.md entry for PRs #30090, #30035, #30022, #29978,
  #29969, #29858, #29845, #29754, #29638, #29636, #29615,
  #29600, #29506, #29469, #29316, #29259, #29178, #29153,
  #29033, #28902, #28761, #28745, #28708, #28696, #29997,
  #28790, #29092, #29108, #29782

- Compat annotation for PRs #30090, #30013, #29978,
  #29890, #29858, #29827, #29754, #29679, #29636, #29623,
  #29600, #29440, #29316, #29259, #29178, #29157, #29153,
  #29033, #28902, #28878, #28761, #28708, #28156, #29733,
  #29670, #29997, #28790, #29092, #29108, #29782, #25278

- Documentation for broadcasting CartesianIndices (#30230).
- Documentation for Base.julia_cmd().
- Documentation for colon constructor of CartesianIndices (#29440).
- Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix).

- Run NEWS-update.jl.


Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com>
Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants