Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix in(x, r::Range{<:Integer}) when x is not numeric #21728

Merged
merged 2 commits into from
Jul 21, 2017
Merged

Conversation

nalimilan
Copy link
Member

@nalimilan nalimilan commented May 6, 2017

Previously, checking that a string was in() a container would work for Array
but not for Range.

Another (possibly cleaner) approach would be to restrict the current definitions to x::Number, and introduce fallbacks returning false.

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another (possibly cleaner) approach would be to restrict the current definitions to x::Number, and introduce fallbacks returning false.

The alternative you propose indeed sounds cleaner and clearer :). (Knowing nothing about applicable, I wonder whether the added applicable calls and associated branches impact performance or are optimized away?) Best!

@nalimilan
Copy link
Member Author

The alternative you propose indeed sounds cleaner and clearer :).

Yeah, maybe. I was hesitant because of the possibility of non-Number types supporting ranges, like Date. Not sure.

(Knowing nothing about applicable, I wonder whether the added applicable calls and associated branches impact performance or are optimized away?) Best!

It should be resolved at compile time like method_exists, so there should be no runtime cost.

Copy link
Contributor

@TotalVerb TotalVerb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like the alternative better than this. This seems brittle for whatever reason.

base/range.jl Outdated
@@ -861,13 +861,15 @@ end
median(r::Range{<:Real}) = mean(r)

function in(x, r::Range)
applicable(-, x, first(r)) || return false
n = step(r) == 0 ? 1 : round(Integer,(x-first(r))/step(r))+1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tangentially, does this branch potentially introduce type instability?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? It doesn't define any variable, and makes the function essentially equivalent to return false when the condition is false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (preexisting) branch in the n = ... line, rather than the new applicable(... line :). To expand, down one path n = 1, whereas down the other n = round(Integer, .... If the round yields some non-Int Integer, n's type would depend on the branch? Best!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm. You're right that when the condition is true, the rest of the body isn't optimized out as I would have expected. So n still exists and is inferred as Any; though the return value is correctly inferred as Bool.

More fundamentally, applicable/method_exists does not appear to be resolved at compile time. That's because #16422 hasn't been merged (I had forgotten about this). So I have no solution for now...

@Sacha0
Copy link
Member

Sacha0 commented May 7, 2017

I was hesitant because of the possibility of non-Number types supporting ranges, like Date.

Agreed, tricky that. I suppose this method is another instance where the ability to dispatch on provision of the necessary arithmetic operations would be useful.

@nalimilan
Copy link
Member Author

I think this illustrates the fact that the interfaces are not defined very clearly here. Ranges can be defined on any type which supports addition and subtraction. The ideal solution for this would be a trait.

I would be happy to use the cleaner Number approach, but that will force anybody willing to support ranges with a non-Number custom type to copy the very same definition of in, but with a different signature. Is that really what we want?

@Sacha0
Copy link
Member

Sacha0 commented May 7, 2017

I would be happy to use the cleaner Number approach, but that will force anybody willing to support ranges with a non-Number custom type to copy the very same definition of in, but with a different signature. Is that really what we want?

Perhaps for now keep the applicable(-, ... in the first method definition (where traits would be ideal), and in the second method definition (presently with applicable(isinteger, ...) use dispatch? (In the second method definition I see no disadvantage to using dispatch, given that checking applicable(isinteger, ... should be equivalent to dispatching on x::Integer?)

@nalimilan
Copy link
Member Author

Yes, for the second definition dispatch would work equally well. The first one is more tricky (especially if we can't get applicable to be resolved at compile time).

@Sacha0
Copy link
Member

Sacha0 commented May 7, 2017

Yes, for the second definition dispatch would work equally well. The first one is more tricky (especially if we can't get applicable to be resolved at compile time).

In that case, perhaps for now nix the change to the first method definition, use dispatch in the second, merge the result, and revisit the first method definition when relevant capabilities advance? :)

@kshyatt kshyatt added the collections Data structures holding multiple items, e.g. sets label May 7, 2017
@JeffBezanson
Copy link
Member

How did this come up? We do just give errors in some cases like this, for example searchsortedfirst([1,2,3], "") is similar.

applicable(isinteger, ... should be equivalent to dispatching on x::Integer

isinteger(2.0) is also true, so should probably dispatch on ::Number.

@nalimilan
Copy link
Member Author

How did this come up? We do just give errors in some cases like this, for example searchsortedfirst([1,2,3], "") is similar.

I discovered this when making a mistake (as often with corner cases like this). It's not a major issue, but I find it hard to justify the inconsistency with Array. A relatively credible use case would be this:

julia> Nullable() in [1, 2]
false

julia> Nullable() in 1:2
ERROR: MethodError: no method matching isinteger(::Nullable{Union{}})

This would become even more credible if we replace Nullable with Union{T, Null}, with == defined so that null in [1, 2, null] would be true (currently that's a NullException).

@nalimilan
Copy link
Member Author

I've dropped the problematic change for now. Though I still would be interested in suggestions about how to fix this for the non-integer case.

@nalimilan nalimilan changed the title Fix in(x, r::Range) when x is not numeric Fix in(x, r::Range{<:Integer}) when x is not numeric May 9, 2017
@JeffBezanson
Copy link
Member

Makes sense. Yes, it's a good question. Maybe we could dispatch as:

in(x::Union{Number,T}, r::Range{T}) where T

@nalimilan
Copy link
Member Author

But that wouldn't change anything for "a" in 1:3, right?

@JeffBezanson
Copy link
Member

If that definition doesn't match, we'd fall back to iterating over the range and comparing every element, or we could add a fallback that returns false.

@nalimilan
Copy link
Member Author

Clever. Indeed with #21256 the fallback based on any should be optimized at compile time if == isn't defined for a pair of immutable types (and falls back to ===).

Looking at this in more detail, it turns out we need to restrict the signature to Real, not Number, since isinteger(Complex(1, 0)) == true and yet < isn't supported. So I've added a specific method for Complex too. Also, the in(x::Union{Number,T}, r::Range{T}) where T trick doesn't work since -(::T, Number) is not always implemented, only -(::T, ::T) may be supported (e.g. for Date).

At least now the abstraction is correctly respected: the fallback is used for types for which we're not sure an optimized computation is possible given the interface.

base/range.jl Outdated
n = step(r) == 0 ? 1 : round(Integer,(x-first(r))/step(r))+1
n >= 1 && n <= length(r) && r[n] == x
end

in(x::Integer, r::AbstractUnitRange{<:Integer}) = (first(r) <= x) & (x <= last(r))
in(x, r::Range{T}) where {T<:Integer} =
in(x::Real, r::Range{T}) where {T<:Integer} =
Copy link
Member

@Sacha0 Sacha0 May 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this signature simplify to in(x::Real, r::Range{<:Integer})? (Whoops, missed the usage of T in the method body, though that usage could instead be an eltype.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I kept it because T is used.

base/range.jl Outdated
@@ -860,13 +860,19 @@ end

median(r::Range{<:Real}) = mean(r)

function in(x, r::Range)
function in(x::Real, r::Range{<:Real})
n = step(r) == 0 ? 1 : round(Integer,(x-first(r))/step(r))+1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps create a helper function to avoid replication of the body of this / the below method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, are these two lines worth the additional complexity?

Copy link
Member

@Sacha0 Sacha0 May 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single implementation in a helper function strikes me as less complex :). Moreover, having a single implementation guards against changes being made in one place but not the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the definitions are so close, I doubt it. Also it turns out it's harder than it seems, since their signatures are different: T doesn't need to be the same for both arguments with the Real method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the definitions are so close, I doubt it.

Even in the relatively short time I've been reviewing actively, IIRC I've seen such happen at least once :).

Also it turns out it's harder than it seems, since their signatures are different: T doesn't need to be the same for both arguments with the Real method.

Not certain I follow? Would e.g. the following not work?

in(x::Real, r::Range{<:Real}) = _therecanbeonlyone(x, r)
in(x::T, r::Range{<:T}) where {T} = _therecanbeonlyone(x, r)
function _therecanbeonlyone(x, r)
    n = step(r) == 0 ? 1 : round(Integer, (x - first(r))/step(r)) + 1
    n >= 1 && n <= length(r) && r[n] == x
end

Best! :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! I don't know why I wanted to use a loop with @eval...

base/complex.jl Outdated
@@ -178,6 +178,10 @@ bswap(z::Complex) = Complex(bswap(real(z)), bswap(imag(z)))

isequal(z::Complex, w::Complex) = isequal(real(z),real(w)) & isequal(imag(z),imag(w))

in(x::Complex, r::Range{T}) where {T<:Integer} =
Copy link
Member

@Sacha0 Sacha0 May 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this signature simplify to in(x::Complex, r::Range{<:Integer})? (Please ignore this comment if T turns out necessary in the method body.)

base/complex.jl Outdated
@@ -178,6 +178,10 @@ bswap(z::Complex) = Complex(bswap(real(z)), bswap(imag(z)))

isequal(z::Complex, w::Complex) = isequal(real(z),real(w)) & isequal(imag(z),imag(w))

in(x::Complex, r::Range{T}) where {T<:Integer} =
isinteger(x) && !isempty(r) && real(x) >= minimum(r) && real(x) <= maximum(r) &&
(mod(real(x), step(r)) - mod(first(r), step(r)) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The equivalent method below for x::Real converts x to type T prior to the mod. Should that be done here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I'll add it back since that must have been added for a reason (maybe to prevent overflow), but I don't know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps in case only mod(::T, ::T) (rather than also mod(::typeof(x), ::T)) exists?

@Sacha0
Copy link
Member

Sacha0 commented May 10, 2017

Could the separate methods for z::Complex and x::Real not be fused into a single method for x::Number that includes the real calls from the z::Complex method? For x::Real, those calls should be noops? Best!

@nalimilan
Copy link
Member Author

Could the separate methods for z::Complex and x::Real not be fused into a single method for x::Number that includes the real calls from the z::Complex method? For x::Real, those calls should be noops? Best!

Unfortunately, Complex isn't defined at that point in the bootstrap.

@Sacha0
Copy link
Member

Sacha0 commented May 10, 2017

Unfortunately, Complex isn't defined at that point in the bootstrap.

Hm, pity that. DRYing those methods would be nice, as evidenced by the inadvertent conversion discrepancy creeping in above. (Is there a reasonable place to touch real prior to range.jl in bootstrap?)

base/range.jl Outdated
end
# This method needs to be defined separately since only -(::T, ::T) needs to be supported
# to create a range, but not necessarily -(::T, ::Real)
function in(x::T, r::Range{<:T}) where {T}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that method generates ambiguities with other methods defined in this file if I change <:T to T (as it should be). So we need to decide what choice is the most painful: requiring people to define this method themselves when they implement a custom non-Real type supporting ranges; or define this generic fallback here (current state of the PR), but require people to fix ambiguities when they need to define a custom method like this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed <:T to T and it doesn't seem to generate ambiguities now, so let's go with that.

@nalimilan
Copy link
Member Author

Actually I found a very simple way to DRY the Complex definition: define it as in(x::Complex, r::Range{<:Real}) = isreal(x) && real(x) in r. This is even better than the previous one, which was limited to integers.

So the only remaining issue is the ambiguities in(x::T, r::Range{T}) introduces.

@@ -178,6 +178,8 @@ bswap(z::Complex) = Complex(bswap(real(z)), bswap(imag(z)))

isequal(z::Complex, w::Complex) = isequal(real(z),real(w)) & isequal(imag(z),imag(w))

in(x::Complex, r::Range{<:Real}) = isreal(x) && real(x) in r
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful! :)

base/range.jl Outdated
@@ -860,13 +860,18 @@ end

median(r::Range{<:Real}) = mean(r)

function in(x, r::Range)
function _in_range(x, r::Range)
n = step(r) == 0 ? 1 : round(Integer,(x-first(r))/step(r))+1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tangential discussion re. the potential type instability in this line got buried above. Any thoughts re. addressing this type instability while you're performing surgery? :)

Copy link
Member Author

@nalimilan nalimilan May 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. I'll fix that once we have sorted out the other issues since the discussion is already long. I wonder what's the best way of fixing this. Maybe compute the second operand unconditionally and call one on that? That would also avoid a branch if we use ifelse.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe compute the second operand unconditionally and call one on that?

I imagine the branch exists to avoid the potential division by zero (step(r)) in computing the second operand (and potential downstream issues)?

Perhaps continuing to branch on step(r) == 0, but eliminating n in the step(r) == 0 case would work well? For example, perhaps something along the lines of

function _in_range(x, r::Range)
    if step(r) == 0
        isempty(r) ? false : first(r) == x
    else
        n = round(Integer, (x - first(r)) / step(r)) + 1
        n >= 1 && n <= length(r) && r[n] == x
    end
end

Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a commit doing this.

base/range.jl Outdated
in(x::Real, r::Range{<:Real}) = _in_range(x, r)
# This method needs to be defined separately since -(::T, ::T) can be implemented
# even if -(::T, ::Real) is not
in(x::T, r::Range{<:T}) where {T} = _in_range(x, r)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JeffBezanson What do you think about using Range{<:T} rather than Range{T} just to prevent ambiguities here (see my previous comment)? This feels like a hack to me and I'd be inclined to remove that method even if that forces people implementing non-Real types which support ranges to define it manually.

@nalimilan nalimilan force-pushed the nl/in branch 2 times, most recently from 19ef827 to 8bdeb8d Compare June 15, 2017 07:14
@nalimilan
Copy link
Member Author

nalimilan commented Jun 15, 2017

@Sacha0 Good to go now?

EDIT: if somebody merges this for me, better not squash the commits since the second one is not really related to the PR.

@Sacha0
Copy link
Member

Sacha0 commented Jun 15, 2017

@nanosoldier runbenchmarks(ALL, vs = ":master")

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm modulo nanosoldier's signoff! :)

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@nalimilan
Copy link
Member Author

The only regression is in broadcast, and it doesn't seem related (though it's hard to tell).

@Sacha0
Copy link
Member

Sacha0 commented Jun 15, 2017

Imagine so, and @nanosoldier runbenchmarks("broadcast" && "dotop" , vs = ":master") can verify :).

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @jrevels


in(x::Integer, r::AbstractUnitRange{<:Integer}) = (first(r) <= x) & (x <= last(r))
in(x, r::Range{T}) where {T<:Integer} =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the fallback for in(x::Any, r::Range{<:Integer}) ?

Copy link
Member Author

@nalimilan nalimilan Jun 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the generic iterator fallback: in(x, itr) = any(y -> y == x, itr). With #21402, it will be optimized out at compile time when types cannot possibly be equal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's an open issue not a closed PR, so we don't have that yet? are any cases that would dispatch to this signature tracked in the benchmarks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it's not merged yet, but that code path failed previously (which is the point of this PR), so I don't think performance matters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tkelman, does @nalimilan's response address your concerns? If so this PR seems in good shape? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a deprecation would be a bit strange but would at least make the change visible. let me take one more look at this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really want to take this case into account, where isinteger is defined but the type isn't a subtype of Real, it would be possible to dispatch on the isinteger trait, no? But it appears overkill to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which trait? isinteger is value dependent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, let's say a runtime trait:

in(x, r::Range{T}) where {T<:Integer} = isinteger(x) ?
    _fast_real_method_below(x, r) :
    (x isa Real ? false : _slow_generic_iterator_method(x, r))

where _fast_real_method_below refers to in(x::Real, r::Range{T}) where {T<:Integer} defined below, where the isinteger test can be removed.
This would not work well for e.g. 1+0im, which is isinteger but not Real; then in _fast_real_method_below, the use of x can be replaced by Integer(x).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tkelman are you fine with the state of this PR on this point?

@@ -104,6 +104,7 @@ end
isfinite(::Irrational) = true
isinteger(::Irrational) = false
iszero(::Irrational) = false
isinteger(::Irrational) = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already defined two lines above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely redundant, should be removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's been added in another PR since I opened this one. I've just removed it.

end
in(x::Real, r::Range{<:Real}) = _in_range(x, r)
# This method needs to be defined separately since -(::T, ::T) can be implemented
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this makes the assumption that in this case, -(::T, ::T) at least returns a Real? as well as step Range(::T)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption is that -(::T, ::T) returns something which can be divided by the range's step, and the result be rounded to Integer. But that shouldn't be an issue since this method will only be called if a Range{T} has been constructed, which means these operations are supported (or should be). The problem with the current implementation is that arithmetic operations could be called on any type which isn't supposed to support them at all.

@StefanKarpinski
Copy link
Member

We've got two approvals and green tests. Time to merge?

Previously, failures would happen when checking whether a non-real or
irrational value was in a range, contrary to what happens with an Array.
This was inconsistent with the fallback definition any(y -> y == x, itr).

To fix this, restrict the signature of optimized methods to Real, which are
the only types for which we can be certain the operations are supported.
n would be an Int when step(r) == 0, but possibly of another type when
step(r) != 0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants