Should `allequal` use `==` instead of `isequal`? #49372

CameronBieganek · 2023-04-15T16:42:57Z

When I worked on the PR to add allequal (#43353), I used isequal instead of ==, because unique and allunique use isequal. However, in retrospect that seems like the wrong choice. It seems to me that users looking for allequal functionality will generally want to compare values with ==. In particular, it is unfortunate that allequal does not follow three-valued logic. So, we currently have the following behavior, which seems undesirable to me:

julia> allequal([NaN, NaN])
true

julia> allequal([missing, missing])
true

julia> allequal([1, missing])
false

I think for most use cases it would be better for those to return false, missing, and missing, respectively.

In the PR discussion, I mentioned the difference between using isequal and using ==, but nobody objected to using isequal. 😅

Unfortunately this would be a breaking change. I would almost consider the failure to propagate missingness a bug, but technically it's not a bug since the docstring does say that comparisons are made using isequal.

Would this qualify as one of those "technically breaking" changes that we could slip into a minor release?

The text was updated successfully, but these errors were encountered:

ericphanson · 2023-04-15T18:15:38Z

To me, it makes sense that allequal uses isequal, and those answers look reasonable. Do you have examples/situations where the other behavior would be better?

CameronBieganek · 2023-04-15T19:15:13Z

I don't have a particular use case in mind. I just have a gut feeling that comparison with == is more generally useful, especially since it propagates missingness.

Here's an example inspired by a question on Discourse. Suppose we have the following matrix:

m = [1        missing  NaN  3  4
     missing  missing  NaN  3  5]

And then we want to call findall(allequal, eachcol(m)). The problem is, there is no obvious right way to do it. It depends on what the user is looking for. We probably shouldn't have added allequal, because it's hard to predict which version of equality the user is looking for, whether it be ==, isequal, or something in between, like

function myequals(x, y)
    isnan(x) && isnan(y) && return true
    x == y
end

When I opened the issue and the PR, there were some objections to expanding the Base namespace, but I don't think anyone brought up the issue of ambiguity over the type of equality comparison to use.

ericphanson · 2023-04-15T19:56:58Z

The way I’ve used it is for sanity-checking data, as a complement to allunique. E.g. check the id column is all unique, check some supposedly constant column is allequal (or all-equal per group after grouping), etc. In this case, isequal semantics are generally what I want and are consistent with allunique. But I agree in general different uses will need different semantics.

aplavin · 2023-04-17T12:59:09Z

First of all, it would be extremely weird if a function named allequal used something other than isequal for comparisons.

Next is probably a matter of opinion, but would be nice not to pollute more and more fundamentally boolean function results with missings. For me, as someone who uses missings extremely rarely, it's already easy to forget and be surprised that == can return not a bool.

For those, presumably rare, cases when the difference actually matters, a slightly more verbose all(==(first(A)), A) should be fine, I think.

CameronBieganek · 2023-04-17T13:35:59Z

First of all, it would be extremely weird if a function named allequal used something other than isequal for comparisons.

all== is not a syntactically valid identifier, so what else could all(==(first(A)), A) possibly be named? The == symbol is read as "equals", after all. Also, many beginner Julia programmers are not even aware of isequal.

Next is probably a matter of opinion, but would be nice not to pollute more and more fundamentally boolean function results with missings.

all itself already propagates missing, so it shouldn't be too surprising if allequal also propagates missing.

julia> all([true, missing])
missing

Once you buy in to three-valued logic at the language level, it's hard to justify not using three-valued logic.

StefanKarpinski · 2023-04-17T14:05:48Z

I think this is a non-starter because == is not an equivalence relation, specifically for values like NaN it's non-reflexive. The only equality predicates that this can behave sanely with and be implemented efficiently with are the ones that are true equivalence relations and compatible with hashing, so isequal and ===.

aplavin · 2023-04-17T14:10:58Z

Once you buy in to three-valued logic at the language level, it's hard to justify not using three-valued logic.

The only, or at least the default, equality operator in Julia functions is most typically isequal -- due to the reasons outlined by @StefanKarpinski. For example, current allequal is the same as length(unique(A)) <= 1, and it's nice to have this consistency.

mikmoore · 2023-04-17T14:44:24Z

Would there be room for keyword arguments like our sorting functions have? I.e.,

allequal([(+0.0,1), (-0.0,2)]; by=first, eq=(==)) == true
allequal([(+0.0,1), (-0.0,2)]; by=first, eq=isequal) == false

I suppose this would also involve expanding isequal, unique, and maybe some others with matching keywords.

CameronBieganek · 2023-04-17T18:19:08Z

I'm fully aware that isequal is an equivalence relation and == is not. But I think the point is that the correct comparison to use depends highly on the context. In the context of data analysis and table operations, one usually expects 3VL to hold.

Let me sharpen the matrix example above to make it more concrete. I think this is the use case that originally motivated me to make the allequal PR. Suppose you are working on a predictive modeling task. You have assembled various data sources into a data frame containing all the features you wish to use. However, you typically don't want to use columns that are constant when training a machine learning model or performing a linear regression. So you use allequal to look for columns that are constant:

julia> df = DataFrame(a=[1, missing], b=[missing, missing], c=[NaN, NaN], d=[3, 3], e=[4, 5])
2×5 DataFrame
 Row │ a        b        c        d      e     
     │ Int64?   Missing  Float64  Int64  Int64 
─────┼─────────────────────────────────────────
   1 │       1  missing      NaN      3      4
   2 │ missing  missing      NaN      3      5

julia> allequal.(eachcol(df))
5-element BitVector:
 0
 1
 1
 1
 0

If you're not paying attention, you might think that column a is non-constant. But in fact the real answer (from the perspective of a data analyst) is missing. Here is what you would see if you used == instead:

julia> allequal3VL(itr) = all(==(first(itr)), itr);

julia> allequal3VL.(eachcol(df))
5-element Vector{Union{Missing, Bool}}:
      missing
      missing
 false
  true
 false

(Granted, in this scenario you probably also need to keep an eye out for columns that are all NaN. That would have to be a separate check.)

Would there be room for keyword arguments like our sorting functions have?

That seems reasonable to me. There is already an open PR to add a predicate version, allequal(f, itr), PR #47679. If we have the predicate version, we wouldn't need the by keyword argument, so your example would become allequal(first, itr; eq=(==)).

Are there any objections to adding an eq keyword argument? It does parallel nicely with the lt keyword argument in sort.

LilithHafner · 2023-04-18T17:40:28Z

This seems like a large API for a small feature. Is allequal(x; eq=(==)) that much better than all(==(first(x)), x)?

CameronBieganek · 2023-04-18T18:11:10Z

No, it's not much better. But it's a slippery slope. Once you have allequal, it starts to make sense to have an eq keyword argument.

I think I'm now of the opinion that we (I) shouldn't have added allequal to Base in the first place. 😬

ericphanson · 2023-04-18T18:39:07Z

This might not be very helpful, but I like allequal and didn't comment on the PR about which comparison because I thought isequal was obviously the right one to use, and I assume others did the same.

CameronBieganek · 2023-04-18T18:45:36Z

Ok, at least this issue has verified that some people prefer it the way it is now, which is good to know.

udohjeremiah added the equality Issues relating to equality relations: ==, ===, isequal label Apr 16, 2023

vtjnash closed this as completed Feb 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should `allequal` use `==` instead of `isequal`? #49372

Should `allequal` use `==` instead of `isequal`? #49372

CameronBieganek commented Apr 15, 2023

ericphanson commented Apr 15, 2023

CameronBieganek commented Apr 15, 2023

ericphanson commented Apr 15, 2023

aplavin commented Apr 17, 2023 •

edited

Loading

CameronBieganek commented Apr 17, 2023

StefanKarpinski commented Apr 17, 2023

aplavin commented Apr 17, 2023 •

edited

Loading

mikmoore commented Apr 17, 2023 •

edited

Loading

CameronBieganek commented Apr 17, 2023

LilithHafner commented Apr 18, 2023

CameronBieganek commented Apr 18, 2023

ericphanson commented Apr 18, 2023

CameronBieganek commented Apr 18, 2023

Should allequal use == instead of isequal? #49372

Should allequal use == instead of isequal? #49372

Comments

CameronBieganek commented Apr 15, 2023

ericphanson commented Apr 15, 2023

CameronBieganek commented Apr 15, 2023

ericphanson commented Apr 15, 2023

aplavin commented Apr 17, 2023 • edited Loading

CameronBieganek commented Apr 17, 2023

StefanKarpinski commented Apr 17, 2023

aplavin commented Apr 17, 2023 • edited Loading

mikmoore commented Apr 17, 2023 • edited Loading

CameronBieganek commented Apr 17, 2023

LilithHafner commented Apr 18, 2023

CameronBieganek commented Apr 18, 2023

ericphanson commented Apr 18, 2023

CameronBieganek commented Apr 18, 2023

Should `allequal` use `==` instead of `isequal`? #49372

Should `allequal` use `==` instead of `isequal`? #49372

aplavin commented Apr 17, 2023 •

edited

Loading

aplavin commented Apr 17, 2023 •

edited

Loading

mikmoore commented Apr 17, 2023 •

edited

Loading