Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1 < missing < 3 fails #31171

Open
cstjean opened this issue Feb 25, 2019 · 20 comments
Open

1 < missing < 3 fails #31171

cstjean opened this issue Feb 25, 2019 · 20 comments
Labels
missing data Base.missing and related functionality

Comments

@cstjean
Copy link
Contributor

cstjean commented Feb 25, 2019

julia> 2 < missing < 3
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
 [1] top-level scope at none:0

Should be missing, no?

@Keno
Copy link
Member

Keno commented Feb 25, 2019

The syntax has short-circuiting semantics [1] (i.e. the upper bound doesn't get evaluated if the first condition fails. Of course there could be some special exception for missing, but it would certainly make the lowering significantly more complicated.

[1] i.e. a < b < c is equivalent to a < b && b < c.

@cstjean
Copy link
Contributor Author

cstjean commented Feb 25, 2019

It explains this:

julia> missing < 1 < 3
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
 [1] top-level scope at none:0

julia> 3 > 1 > missing
missing

@ararslan ararslan added the missing data Base.missing and related functionality label Feb 25, 2019
@nalimilan
Copy link
Member

nalimilan commented Feb 27, 2019

I agree it would be nice if this worked. For example a typical pattern would be filter(x -> 30 < x < 60, age) (or something similar on data frame rows) EDIT: this fails in the presence of missing values, it only makes sense with a special function which would skip missing values as in SQL or R.

But we probably don't want to special-case missing, so we would have to use & instead of && in all cases, which isn't possible before 2.0.

@nalimilan nalimilan added the triage This should be discussed on a triage call label Feb 27, 2019
@cstjean
Copy link
Contributor Author

cstjean commented Feb 28, 2019

FWIW this already works:

julia> 2 .< missing .< 3
missing

@ararslan
Copy link
Member

ararslan commented Mar 7, 2019

Triage discussed this for quite a while, and none of the options for implementing this are that appealing. We did not come to a conclusion in this discussion. Some of the options discussed were:

  • Special-case missing in the front-end (not really a good option)
  • Use & (integers have this return an integer, which is a problem)
  • Do nothing

Speaking on behalf of myself (not of triage), I think I'm beginning to favor doing nothing here. This may be one of the cases where users should be wary of missing values and use explicit ismissing checks as appropriate. It's not an ideal solution, but it's explicit.

@Keno
Copy link
Member

Keno commented Mar 7, 2019

I had proposed the following lowering of a < b < c, which some people liked, but didn't get much love otherwise.

#1 = a
#2 = b
r = #1 < #2
if r === true
     r = #2 < c
else
     r = &(r, true)
end
return r

@mbauman
Copy link
Member

mbauman commented Mar 7, 2019

I didn't quite catch it on the call; why is the last r = &(r, true) needed? Isn't the fact that < should really only return Bool (or missing-like singletons) enough?

@JeffBezanson
Copy link
Member

It at least requires the thing you return to support &, so it's similar to switching the lowering to use & except it's still lazy.

@mbauman
Copy link
Member

mbauman commented Mar 7, 2019

I get that, but it feels strange to me since we don't call & in the other branch. If we're going to do anything at all here I'd favor the simpler:

#1 = a
#2 = b
r = #1 < #2
if r === true
     r = #2 < c
end
return r

That said, I'm not wholly convinced we actually need to do something here. @nalimilan's filtering example would be compelling, except that filter wouldn't even work if its function returns missing (not even DataFrames' specialization). Typically if I do this as a scalar operation, I'll be branching on the result in short order. If I just want to store the result, I think it'd be much more common to just broadcast — which lowers to (a .< b) .& (b .< c).

@ararslan
Copy link
Member

ararslan commented Mar 7, 2019

If I just want to store the result, I think it'd be much more common to just broadcast

Not sure what you mean, could you elaborate?

@Keno
Copy link
Member

Keno commented Mar 7, 2019

I get that, but it feels strange to me since we don't call & in the other branch

Why? We force the LHS to return a boolean now, but not the RHS

@mbauman
Copy link
Member

mbauman commented Mar 7, 2019

If I just want to store the result, I think it'd be much more common to just broadcast

Not sure what you mean, could you elaborate?

There are two ways of using the result from a < b < c.

  • You can use it to make a decision. If we propagated a missing out of it, it would throw an error as soon as you tried to branch on if missing. Implementing this would just move the error one step down the road in an if a < b < c expression.
  • You can store the result or do some more computations on it later down the road (and don't branch on it). In my mind, this is much more common to be done on many values at once — and thus you'd typically just use broadcast.

Why? We force the LHS to return a boolean now, but not the RHS

It just gets back to my "what's the point" question. If it's to emulate & semantics, then it seems like we should actually call & on both branches. If it's to ensure we get a logicky value back from <, then that seems like the kind of thing we tend to say "don't do that" instead of adding complications like this — particularly complications in lowering.

@cstjean
Copy link
Contributor Author

cstjean commented Mar 7, 2019

For the sake of concreteness: IntervalSets.jl didn't support missing, and I found out that their methods:

in(v, I::TypedEndpointsInterval{:closed,:closed}) = leftendpoint(I)  v  rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:open}) = leftendpoint(I) < v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:closed,:open}) = leftendpoint(I)  v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:closed}) = leftendpoint(I) < v  rightendpoint(I)

would have worked fine already if this issue was fixed, but I had to add other methods to make it work:

in(::Missing, I::TypedEndpointsInterval{:closed,:closed}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:closed,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:closed}) = !isempty(I) && missing

It's not the end of the world of course. It does raise the issue of whether 3 < missing < 1 should be missing, or false.

@nalimilan
Copy link
Member

Sorry, my example with filter was indeed incorrect. I had a SQL- or R-like behavior in mind, where missing values are skipped, and which we should probably support for DataFrames (e.g. via an argument -- I can't find the place where we discussed this before). @cstjean's example is much better since it's not hypothetical.

@StefanKarpinski
Copy link
Member

It does raise the issue of whether 3 < missing < 1 should be missing, or false.

This is a really good point. If we're going to go out of our way to handle missing cleverly, should we try to handle this? Is that even a reasonable thing to try to do?

@nalimilan
Copy link
Member

I can't imagine returning anything other than missing, as that's consistent with what we do everywhere else. Incidentally that seems to be what @cstjean does in his example.

@mbauman
Copy link
Member

mbauman commented Mar 11, 2019

Nope, he elected to return false in the case akin to 3 < missing < 1 (isempty(I)) — and I agree with that choice.

@nalimilan
Copy link
Member

Ah, my bad, I hadn't realized the order of 3 and 1 was the question at hand here. Then I'd say either missing or false are reasonable. false would be clever, and consistent with the fact that we use all available information when computing the return value in three-valued logic (e.g. false & missing === false). But it would require a more complex lowering.

@cstjean
Copy link
Contributor Author

cstjean commented Mar 11, 2019

If we give up on the short-circuiting behaviour, couldn't a < b <= c lower to something like compare(a, <, b, <=, c), with compare being a generated function?

@oxinabox
Copy link
Contributor

3 < missing < 1 evaluting to false is similar to the interesting cases from #31066 (comment)

@test clamp(1, 52, missing) == 52
@test clamp(1000, missing, 101) == 101

@Keno Keno removed the triage This should be discussed on a triage call label Feb 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
missing data Base.missing and related functionality
Projects
None yet
Development

No branches or pull requests

8 participants