Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Equality and hash for self-referential recursive data-structures: Go Hopcroft? #26020

Open
chethega opened this issue Feb 12, 2018 · 5 comments
Labels
bug Indicates an unexpected problem or unintended behavior equality Issues relating to equality relations: ==, ===, isequal hashing

Comments

@chethega
Copy link
Contributor

hash, isequal and == are currently unable to handle self-referential recursive structures:

A=Any[]; push!(A,A); B=deepcopy(A);
isequal(A,B)
A==A
hash(A)

all give StackOverflowError.

What should we do?

We already have perfectly fine non-recursive isequal for everything but Array and Tuple; here I'll just discuss Array. Two arrays L and R are different under isequal if and only if one of there exists an index sequence i_1,..., i_n such that isequal(L[i_1][i_2]...[i_n], R[i_1][i_2]...[i_n])=false.

In other words: L and R are finite state machines over the alphabet Int and they are equality means "they accept the same language" (ok, strictly speaking they don't "accept", they are maps from integer sequences to Union{non_recursive, internal, out-of-bounds}).

There are various O(N) algorithms for testing equality of DFAs. We could define the hash for recursive structures by e.g. replacing each x with "recursive -k", where k>0 is minimal with x === y[i_1][i_2]...[i_k] and isequal(x,y). Hash computation should be O(N log(N)) using something like Hopcroft.

In principle, overhead for typical non-self-referential objects should be pretty small: If the types tell us that we are non-self-referential, then there is no overhead at all; otherwise, one would probably start and check self-referentiality on the way, and only switch to fancy union-find once we found the first loop.

So, what do you think?

(1) Is this the right notion of equality?
(2) If reasonably fast code existed, would it be worth the complexity to maintain/merge it?
(3) Is this worth the effort to implement?
(4) Could this be done post 1.0? Would moving from (maybe unreliable?) StackOverflowError to well-defined hash and equality be a breaking change?
(5) If this is a good plan for self-referentials, then there should be a good way for user-defined equality and hash to tie into the DFA equality/minimization algorithm; hence related to #4648. This could be easily done with an essence.

I don't propose to change the default isequal for user-defined types; that's a separate discussion.

@JeffBezanson
Copy link
Member

Yes I think it would be nice to implement this at some point. It could lead to hash values changing, but otherwise I believe is non-breaking and appropriate for a 1.x release.

Somewhat amusingly, femtolisp implements the union-find algorithm for this already (https://github.com/JeffBezanson/femtolisp/blob/master/equal.c).

@StefanKarpinski
Copy link
Member

It could lead to hash values changing,

To clarify, this means that hash values would change from Julia 1.x to 1.(x+1) right? Not that hash values would change during a running process.

@chethega
Copy link
Contributor Author

Somewhat amusingly, femtolisp implements the union-find algorithm for this already.
Nice!

To clarify, this means that hash values would change from Julia 1.x to 1.(x+1) right? Not that hash values would change during a running process.

The way I thought about it, the only change would be some hash values changing from StackOverflowError to some UInt64, in the update 1.x to 1.(x+1), and some objects that currently compare as StackOverflowError becoming either equal or non-equal in the update 1.x to 1.(x+1). [I think Jeff has the same maybe-plan for the maybe-eventual change, because there is no real choice what to return]

The price is that comparison/hashing becomes more complex and slightly slower for trees (DAGs would become maybe faster / maybe slower depending on details, and cyclic structures would be log-linear instead of StackOverflowError)

Even if this is considered non-breaking and postponed into the far future, then one should still probably keep this plan in mind for the API for user-defined types with non-default hash and equality: There needs to be some kind of hook to go from the user code into the future Base code that handles all the union-find fun (otherwise the cyclic user-type won't profit from the eventual enhancement).

@StefanKarpinski
Copy link
Member

We'll document that hash values may change in minor releases – people should not depend on hash values remaining the same or on dictionaries being iterated in the same order even if the same keys are inserted in the same order. We probably won't want to change this in patch version for the sake of being conservative but specific hash values are definitely not part of the stable API.

@JeffBezanson
Copy link
Member

this means that hash values would change from Julia 1.x to 1.(x+1) right?

Yes.

@StefanKarpinski StefanKarpinski added the equality Issues relating to equality relations: ==, ===, isequal label May 10, 2021
@nsajko nsajko added the bug Indicates an unexpected problem or unintended behavior label Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior equality Issues relating to equality relations: ==, ===, isequal hashing
Projects
None yet
Development

No branches or pull requests

5 participants