Equality and hash for self-referential recursive data-structures: Go Hopcroft? #26020

chethega · 2018-02-12T21:37:12Z

hash, isequal and == are currently unable to handle self-referential recursive structures:

A=Any[]; push!(A,A); B=deepcopy(A);
isequal(A,B)
A==A
hash(A)

all give StackOverflowError.

What should we do?

We already have perfectly fine non-recursive isequal for everything but Array and Tuple; here I'll just discuss Array. Two arrays L and R are different under isequal if and only if one of there exists an index sequence i_1,..., i_n such that isequal(L[i_1][i_2]...[i_n], R[i_1][i_2]...[i_n])=false.

In other words: L and R are finite state machines over the alphabet Int and they are equality means "they accept the same language" (ok, strictly speaking they don't "accept", they are maps from integer sequences to Union{non_recursive, internal, out-of-bounds}).

There are various O(N) algorithms for testing equality of DFAs. We could define the hash for recursive structures by e.g. replacing each x with "recursive -k", where k>0 is minimal with x === y[i_1][i_2]...[i_k] and isequal(x,y). Hash computation should be O(N log(N)) using something like Hopcroft.

In principle, overhead for typical non-self-referential objects should be pretty small: If the types tell us that we are non-self-referential, then there is no overhead at all; otherwise, one would probably start and check self-referentiality on the way, and only switch to fancy union-find once we found the first loop.

So, what do you think?

(1) Is this the right notion of equality?
(2) If reasonably fast code existed, would it be worth the complexity to maintain/merge it?
(3) Is this worth the effort to implement?
(4) Could this be done post 1.0? Would moving from (maybe unreliable?) StackOverflowError to well-defined hash and equality be a breaking change?
(5) If this is a good plan for self-referentials, then there should be a good way for user-defined equality and hash to tie into the DFA equality/minimization algorithm; hence related to #4648. This could be easily done with an essence.

I don't propose to change the default isequal for user-defined types; that's a separate discussion.

The text was updated successfully, but these errors were encountered:

JeffBezanson · 2018-02-12T22:22:52Z

Yes I think it would be nice to implement this at some point. It could lead to hash values changing, but otherwise I believe is non-breaking and appropriate for a 1.x release.

Somewhat amusingly, femtolisp implements the union-find algorithm for this already (https://github.com/JeffBezanson/femtolisp/blob/master/equal.c).

StefanKarpinski · 2018-02-12T22:30:59Z

It could lead to hash values changing,

To clarify, this means that hash values would change from Julia 1.x to 1.(x+1) right? Not that hash values would change during a running process.

chethega · 2018-02-12T23:15:28Z

Somewhat amusingly, femtolisp implements the union-find algorithm for this already.
Nice!

To clarify, this means that hash values would change from Julia 1.x to 1.(x+1) right? Not that hash values would change during a running process.

The way I thought about it, the only change would be some hash values changing from StackOverflowError to some UInt64, in the update 1.x to 1.(x+1), and some objects that currently compare as StackOverflowError becoming either equal or non-equal in the update 1.x to 1.(x+1). [I think Jeff has the same maybe-plan for the maybe-eventual change, because there is no real choice what to return]

The price is that comparison/hashing becomes more complex and slightly slower for trees (DAGs would become maybe faster / maybe slower depending on details, and cyclic structures would be log-linear instead of StackOverflowError)

Even if this is considered non-breaking and postponed into the far future, then one should still probably keep this plan in mind for the API for user-defined types with non-default hash and equality: There needs to be some kind of hook to go from the user code into the future Base code that handles all the union-find fun (otherwise the cyclic user-type won't profit from the eventual enhancement).

StefanKarpinski · 2018-02-12T23:56:22Z

We'll document that hash values may change in minor releases – people should not depend on hash values remaining the same or on dictionaries being iterated in the same order even if the same keys are inserted in the same order. We probably won't want to change this in patch version for the sake of being conservative but specific hash values are definitely not part of the stable API.

JeffBezanson · 2018-02-13T03:37:33Z

this means that hash values would change from Julia 1.x to 1.(x+1) right?

Yes.

nalimilan added the hashing label Feb 18, 2018

StefanKarpinski added the equality Issues relating to equality relations: ==, ===, isequal label May 10, 2021

nsajko added the bug Indicates an unexpected problem or unintended behavior label Jun 27, 2024

nsajko mentioned this issue Dec 7, 2024

deepcopy with non-trivial circular references fails in 1.11.2 #56775

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Equality and hash for self-referential recursive data-structures: Go Hopcroft? #26020

Equality and hash for self-referential recursive data-structures: Go Hopcroft? #26020

chethega commented Feb 12, 2018

JeffBezanson commented Feb 12, 2018

StefanKarpinski commented Feb 12, 2018

chethega commented Feb 12, 2018

StefanKarpinski commented Feb 12, 2018

JeffBezanson commented Feb 13, 2018

Equality and hash for self-referential recursive data-structures: Go Hopcroft? #26020

Equality and hash for self-referential recursive data-structures: Go Hopcroft? #26020

Comments

chethega commented Feb 12, 2018

JeffBezanson commented Feb 12, 2018

StefanKarpinski commented Feb 12, 2018

chethega commented Feb 12, 2018

StefanKarpinski commented Feb 12, 2018

JeffBezanson commented Feb 13, 2018