-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Equality and hash for self-referential recursive data-structures: Go Hopcroft? #26020
Comments
Yes I think it would be nice to implement this at some point. It could lead to hash values changing, but otherwise I believe is non-breaking and appropriate for a 1.x release. Somewhat amusingly, femtolisp implements the union-find algorithm for this already (https://github.com/JeffBezanson/femtolisp/blob/master/equal.c). |
To clarify, this means that hash values would change from Julia 1.x to 1.(x+1) right? Not that hash values would change during a running process. |
The way I thought about it, the only change would be some hash values changing from StackOverflowError to some UInt64, in the update 1.x to 1.(x+1), and some objects that currently compare as StackOverflowError becoming either equal or non-equal in the update 1.x to 1.(x+1). [I think Jeff has the same maybe-plan for the maybe-eventual change, because there is no real choice what to return] The price is that comparison/hashing becomes more complex and slightly slower for trees (DAGs would become maybe faster / maybe slower depending on details, and cyclic structures would be log-linear instead of StackOverflowError) Even if this is considered non-breaking and postponed into the far future, then one should still probably keep this plan in mind for the API for user-defined types with non-default hash and equality: There needs to be some kind of hook to go from the user code into the future Base code that handles all the union-find fun (otherwise the cyclic user-type won't profit from the eventual enhancement). |
We'll document that hash values may change in minor releases – people should not depend on hash values remaining the same or on dictionaries being iterated in the same order even if the same keys are inserted in the same order. We probably won't want to change this in patch version for the sake of being conservative but specific hash values are definitely not part of the stable API. |
Yes. |
hash
,isequal
and==
are currently unable to handle self-referential recursive structures:all give
StackOverflowError
.What should we do?
We already have perfectly fine non-recursive isequal for everything but
Array
andTuple
; here I'll just discussArray
. Two arraysL
andR
are different underisequal
if and only if one of there exists an index sequencei_1,..., i_n
such thatisequal(L[i_1][i_2]...[i_n], R[i_1][i_2]...[i_n])=false
.In other words: L and R are finite state machines over the alphabet
Int
and they are equality means "they accept the same language" (ok, strictly speaking they don't "accept", they are maps from integer sequences to Union{non_recursive, internal, out-of-bounds}).There are various O(N) algorithms for testing equality of DFAs. We could define the hash for recursive structures by e.g. replacing each
x
with "recursive -k", wherek>0
is minimal withx === y[i_1][i_2]...[i_k]
andisequal(x,y)
. Hash computation should be O(N log(N)) using something like Hopcroft.In principle, overhead for typical non-self-referential objects should be pretty small: If the types tell us that we are non-self-referential, then there is no overhead at all; otherwise, one would probably start and check self-referentiality on the way, and only switch to fancy union-find once we found the first loop.
So, what do you think?
(1) Is this the right notion of equality?
(2) If reasonably fast code existed, would it be worth the complexity to maintain/merge it?
(3) Is this worth the effort to implement?
(4) Could this be done post 1.0? Would moving from (maybe unreliable?) StackOverflowError to well-defined hash and equality be a breaking change?
(5) If this is a good plan for self-referentials, then there should be a good way for user-defined equality and hash to tie into the DFA equality/minimization algorithm; hence related to #4648. This could be easily done with an
essence
.I don't propose to change the default isequal for user-defined types; that's a separate discussion.
The text was updated successfully, but these errors were encountered: