Generalize == Documentation to not be Statistics-Specific #53024

Open
wants to merge 3 commits into
base: master

ChrisRackauckas
Member

The `==` documentation made a special case for the statistics community, specifically allowing a separate contract on `missing`. This prevents extensibility and serves the needs of only one small subset of the Julia community, which is generally frowned upon.

This change to the documentation instead describes the generalization of that rule, `(::T1 == ::T2)::Union{Bool, promote_type(T2)}`. This version of the contract includes the `missing` case as part of the general rule, covers other areas of the Julia community, including symbolic computing and statically-defined boolean types (Static.jl, FillArrays.jl), and gives a clear rule beyond just handling the `missing` special case.
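
For readers less familiar with the `missing` case discussed here, the following is a minimal sketch of how `==` and `isequal` behave in Base today. It uses only Base functions; the variable names are illustrative.

```julia
# `==` on ordinary values returns a Bool.
r1 = (1 == 1.0)

# `==` with a `missing` operand follows three-valued logic and returns `missing`.
r2 = (1 == missing)

# `isequal` always returns a Bool, even when `missing` is involved.
r3 = isequal(1, missing)
r4 = isequal(missing, missing)

println(r1)             # true
println(ismissing(r2))  # true
println(r3)             # false
println(r4)             # true
```

Code that needs a definite `Bool` can call `isequal`, while `==` may propagate `missing`.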
@giordano added the docs label (This change adds or pertains to documentation) on Jan 23, 2024
@giordano changed the title from "Generalize == Documentation to not be Statastics-Specific" to "Generalize == Documentation to not be Statistics-Specific" on Jan 23, 2024
@inkydragon added the merge me label (PR is reviewed. Merge when all tests are passing) on Jan 26, 2024
@nsajko
Contributor

nsajko commented Jan 26, 2024

I don't have an opinion on whether this change is a good idea, but, considering that it changes public API, surely it's worthy of some discussion before it gets merged, or at least several approvals or something.

@nsajko added the design (Design of APIs or of the language itself) and equality (Issues relating to equality relations: ==, ===, isequal) labels on Jan 26, 2024
@inkydragon removed the merge me label (PR is reviewed. Merge when all tests are passing) on Jan 26, 2024
@adienes
Contributor

adienes commented Jan 26, 2024

if we have types A and B from different packages, which "owns" ==(::A, ::B) ? both could lay a claim but also both could be considered piracy? should this have a documented recommendation to occur in extension packages?

while on the topic, should something be required like

  • if ==(a::A, b::B) returns true, and convert(A, b) is defined, then it must be true that hash(a) == hash(convert(A, b)) and vice-versa?
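
The rule asked about above can be sketched as a small check. `satisfies_hash_rule` is a hypothetical helper written for illustration only; it is not part of Base, and the rule itself is a question in this thread, not a documented requirement.

```julia
# Hypothetical helper: checks the proposed rule for one pair of values.
# If `a == b` is not `true` (for example `false` or `missing`), the rule
# does not apply and the check passes vacuously.
function satisfies_hash_rule(a::A, b) where {A}
    (a == b) === true || return true
    return hash(a) == hash(convert(A, b))
end

println(satisfies_hash_rule(1, 1.0))      # true
println(satisfies_hash_rule(1//2, 0.5))   # true

# Base already guarantees the related property that `isequal` values hash equally.
println(isequal(1, 1.0) && hash(1) == hash(1.0))  # true
```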

@vtjnash
Member

vtjnash commented Jan 26, 2024

> if we have types A and B from different packages

It depends on which package has the dependency on the other. Only one of A or B should be able to define this, since only one of them can access both type definitions, and thus that is the one that "owns" it (without an extension package). If there is an extension package, then they need to coordinate on where that lives, so that it doesn't try to define it multiple times.
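
A minimal sketch of that ownership rule, using two modules in one file as stand-ins for packages. `PkgA`, `PkgB`, `Meters`, and `Centimeters` are hypothetical names chosen for illustration.

```julia
module PkgA
    struct Meters
        value::Float64
    end
end

module PkgB
    import ..PkgA   # PkgB depends on PkgA, so only PkgB can see both types

    struct Centimeters
        value::Float64
    end

    # PkgB owns `Centimeters`, so adding these methods is not type piracy.
    Base.:(==)(a::PkgA.Meters, b::Centimeters) = a.value * 100 == b.value
    Base.:(==)(b::Centimeters, a::PkgA.Meters) = a == b
end

println(PkgA.Meters(1.0) == PkgB.Centimeters(100.0))  # true
println(PkgB.Centimeters(100.0) == PkgA.Meters(2.0))  # false
```

`PkgA` cannot define these methods because it has no access to `Centimeters`, which is the sense in which the dependent package "owns" the mixed comparison.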

@mbauman
Member

mbauman commented Jan 26, 2024

There's also the special consideration for collections with missings in the following sentence. See also #52495.
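
As a quick illustration of the collection case mentioned above, here is how `==` on arrays containing `missing` behaves in Base. The variable names are illustrative.

```julia
# All non-missing elements match, but a `missing` is present: the result is `missing`.
c1 = ([1, missing] == [1, missing])

# A definite mismatch decides the comparison: the result is `false`.
c2 = ([1, missing] == [2, missing])

# `isequal` treats `missing` as equal to itself and returns a Bool.
c3 = isequal([1, missing], [1, missing])

println(ismissing(c1))  # true
println(c2)             # false
println(c3)             # true
```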

@mikmoore
Contributor

mikmoore commented Jan 26, 2024

While this proposal loosens the written API, that API has already been profitably disregarded for years. As I recall, this issue was originally opened because Symbolics.jl violates the written API. There are plenty of other packages violating this as well. For example, Convex.jl uses something like Base.==(::Variable,::Variable)::Constraint (and the same for <=, >=). Note that this violates the newly-proposed API as well as the stricter existing one. JuMP.jl does something similar, although it only supports this within a macro (maybe? all the examples appear to use it within one, anyway).

If we're to have restrictions, shouldn't similar remarks be added to the inequality operators <=, >=, !=? I notice we document that != "Always gives the opposite answer as ==", which of course does not make any sense except for a Bool result (and so Missing does not obey this except for a specific interpretation where it is its own "opposite"). It should probably state that it gives the result of ! applied to the result of ==. If this is a MethodError, so what? At least the behavior is clear.
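
The point about `!=` can be seen in a short sketch. The Base behavior shown for `missing` is standard; `Var` and `Constraint` are hypothetical types written for illustration and are not taken from Convex.jl or JuMP.jl.

```julia
# In Base, `x != y` falls back to `!(x == y)`, and `!missing` is `missing`.
n1 = (1 != 2)
n2 = (missing != 1)

# Hypothetical DSL-style types: `==` builds a constraint object instead of a Bool.
struct Var
    name::Symbol
end

struct Constraint
    lhs::Var
    rhs::Var
end

Base.:(==)(a::Var, b::Var) = Constraint(a, b)

# No `!` method exists for `Constraint`, so `!=` throws a MethodError.
err = try
    Var(:x) != Var(:y)
catch e
    e
end

println(n1)                   # true
println(ismissing(n2))        # true
println(err isa MethodError)  # true
```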

I think the most we could specify is that == should "often" return a Bool, but anything stricter is unlikely to gain adoption anyway. And even that much seems unnecessary. We already document an exception for Missing, but it seems rather restrictive to prevent others from creating their own exceptions. Aren't such exceptions what multiple dispatch is all about? One can argue that these others (and maybe Missing) are punning the true meaning of ==, but infix operators are just so darned convenient to write and read. Even Base couldn't resist puns like * for string concatenation (better than + but is it really necessary?).

I would advocate to go further than this PR and remove any "formal API" regarding return types altogether. Actually insisting on an API would require breaking changes across numerous popular packages in the ecosystem and I don't see a real gain. What does an API achieve or improve? Has the varied use of == been problematic?

@Seelengrab
Contributor

Seelengrab commented Jan 27, 2024

IMO anything that returns a non-Bool for a comparison should be a different verb entirely. It's fine to pun on == in a macro based DSL (where the underlying call can be easily replaced by a dsl_equal), it's not fine to add actual methods to == disobeying its contract. The exception for missing is (from some POV) unfortunate, but not one that can be changed now. Still, loosening the contract of == to more or less "do whatever you want" is IMO the wrong move.

My reasoning for this is that loosening the expected contract of this means we're losing any expectation that == actually does a comparison with some logical result (either 2-valued or 3-valued logic).

@andrewjradcliffe
Contributor

> loosening the contract of == to more or less "do whatever you want" is the wrong move.

I prefer less ambiguity, so perhaps my $0.02 is misplaced.

That == may be other than a logical comparison, sanctioned by the documentation in Base, would make any and all code which involves == difficult to reason about.

Given the lack of static analysis tooling, a programmer has no way to check that uses of == actually perform a logical comparison other than running the program. Such an approach might be feasible if one writes programs under 10000 lines, but anything larger is unmaintainable. N.B. Cthulhu and JET will not solve the simple query: do all my uses of == perform 2 (or 3)-valued logical comparison?

@ChrisRackauckas
Member Author

Any breaking change is a Julia v2.0 discussion, which is not on topic here. Can we please move any discussion of potential breaking changes to a separate issue? Since there's no plan for a 2.0, that is pretty irrelevant here.

> That == may be other than a logical comparison, sanctioned by the documentation in Base, would make any and all code which involves == difficult to reason about.

Julia's documentation has stated that == is not guaranteed to give a Bool, and that isequal should be used if a Bool is required. This has been standard in the language since v0.6 IIRC. Many packages rely on the documented behavior, including all of the statistics and data science stack, many array libraries (FillArrays for example), AD libraries, DSL libraries (Symbolics.jl, Convex.jl, and many more), static computation libraries, etc., so changing this to require a Bool output would break at least hundreds of packages. Such a large breaking change is a much different topic than what is being proposed here and would require a v2.0.

> Such an approach might be feasible if one writes programs under 10000 lines, but anything larger is unmaintainable.

Some comments seem to be missing context on why it's common for packages to define == methods that do not return a Bool, so that is probably worth detailing. Since isequal is documented as requiring a Bool output, and it's documented that you should use it if you truly need a Bool, == is generally used throughout Julia as a hook that lets types change how the comparison is eventually lowered to a Bool. Different cases of that include:

  1. Lazy evaluation. Julia does not have lazy evaluation built into its system but one can use the type system in order to build a lazy evaluation system (which is what Broadcast does). Symbolics.jl is a case of lazy evaluation of == to equality, and IIUC Convex.jl is as well.
  2. Disconnection of representation from computation. For example, Trues(n) from FillArrays is a singleton that represents a vector of n bools. Trues(n) == Trues(n) outputs Trues(n), not Vector{Bool} for clear performance reasons.
  3. 3-valued logic. Allowing missing in the result for missing == true and similar cases.

These are cases which are "== in spirit", but which allow performance optimizations and delayed evaluation. When you then have to ensure lazy evaluation is made eager, that's isequal, but the point is that if the performance optimizations and delayed evaluations are only in representation, not in computation, then it's fairly rare that a user needs to force eager behavior. The user thus generally uses the more generic form except in those cases, and for any of these packages to work, the form that is most common to the user needs to be the one that is dispatched on. The discussion in this thread can thus be boiled down to essentially stating that lazy evaluation should not be allowed on boolean comparisons, something that I think may not actually be meant but is just implied by the deeper reasoning of why the non-boolean cases exist.
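
The lazy-evaluation case can be sketched in a few lines. `LazySym` and `LazyEq` are hypothetical types for illustration only; they are not how Symbolics.jl or Convex.jl are implemented.

```julia
# A symbolic variable and a deferred (unevaluated) equality between two of them.
struct LazySym
    name::Symbol
end

struct LazyEq
    lhs::LazySym
    rhs::LazySym
end

# `==` records the comparison instead of evaluating it.
Base.:(==)(a::LazySym, b::LazySym) = LazyEq(a, b)

# `isequal` is the eager form and always returns a Bool.
Base.isequal(a::LazySym, b::LazySym) = a.name === b.name
Base.hash(a::LazySym, h::UInt) = hash(a.name, h)

x = LazySym(:x)
y = LazySym(:y)

println((x == y) isa LazyEq)  # true
println(isequal(x, x))        # true
println(isequal(x, y))        # false
```

Under this design, generic code that expects `==` to return a `Bool` (for example in an `if` condition) would error on `LazySym` values, which is the trade-off debated in this thread.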

@Seelengrab
Contributor

To expand on what I wrote above, as far as I'm aware, it's only explicitly defined to give one non-Bool type, namely Missing iff one of the arguments isa Missing, and otherwise must be Bool:

help?> ==
search: == === !== = .= >= => <= !=

  ==(x, y)

[...]

The result is of type Bool, except when one of the operands is missing, in which case missing is returned (three-valued logic (https://en.wikipedia.org/wiki/Three-valued_logic)).

The fact that the ecosystem has ignored this and broken the contract in the standard execution/interpretation environment is IMO owed to the fact that Julia has been historically very lax in enforcing these kinds of contracts.

The proposed reading/interpretation from this PR would pretty explicitly move == from 3-valued logic to N-valued logic, since any user can define == on their types. That seems very far removed from the notion of "Generic equality operator".

> The discussions in this thread thus can be boiled down to essentially stating the lazy evaluation should not be allowed on boolean comparisons, something that I think may not be actually meant

I do mean exactly that, in the sense that I think there should be no method of == that has anything other than Bool or Missing as its return type for the regular method tables. If you're in a particular execution environment (i.e., non-standard compilation/interpretation, through overdubbing, other method tables...), you can of course go wild and lazily interpret everything 🤷

@andrewjradcliffe
Contributor

> Disconnection of representation from computation.

Lazy evaluation operates at the level of expressions (or partially-evaluated forms), which, indeed Julia enables quite well through the representation of expressions as parametric types (with multiple dispatch to glue everything together).

The Trues(n) == Trues(n) example illustrates the problem: in order for dispatch to continue, the return value must be Trues(n), otherwise an expression such as (Trues(n) == Trues(n)) == Trues(n) would fail to continue being lazily-evaluated. For other operations such as add, sub, mul, div, etc., returning an expression-type is uncontroversial -- e.g. that LazyMat * LazyMat returns a LazyMat is natural. However, if == is permitted such behavior, then there is a very good case for the rest of the logical operators (at least <, <=, >, and >=). e.g. that Zeros(n) < Ones(n) should return Trues(n) is justifiable under the same logic, no?

I am loath to pose questions as a response to a PR, but a clear answer is necessary:

  • should expressions-masquerading-as-types be part of the core language, or limited to DSLs?

@jishnub
Contributor

jishnub commented Feb 10, 2024

I'm a bit puzzled by the discussion above, since

julia> Trues(2) == Trues(2)
true

and not Trues(2). Is the broadcasted elementwise equality being referred to here?

julia> (Trues(2) .== Trues(2)) == Trues(2)
true
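
The same distinction between `==` and broadcasted `.==` can be reproduced with plain Base arrays, used here as a stand-in for the FillArrays types (an assumption made so the sketch needs no packages).

```julia
a = [true, true]

# `==` compares the arrays as a whole and yields a single Bool.
whole = (a == a)

# `.==` compares element by element and yields an array of Bools.
elementwise = (a .== a)

println(whole)             # true
println(elementwise isa AbstractVector{Bool})  # true
println(elementwise == a)  # true
```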

Labels
design Design of APIs or of the language itself docs This change adds or pertains to documentation equality Issues relating to equality relations: ==, ===, isequal