-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relax Operations.isTotal() to work with a deterministic automaton #13707
Conversation
I feel like with this code there are only two options:
a third option: |
OK I took a third option here in the latest commit. Javadoc is unchanged, we just detect the kind of "total" automaton being created by RegExp, too, without involving any scary algorithms. It just relaxes the check and detects total automaton that looks like this (with a-z alphabet for demonstration):
This is what happens with |
iterating again, I changed the code here to @dweiss solution and added tests based on his examples. it runs in linear time and space (bitset), and should work generally for any DFA, so IMO it is safe. |
heh that empty case got found in a different way by the randomized check. it isn't enough to do will fix. |
… could be unreachable)
ok, the tests have found all the fun with empty cases and I think it is ready for review. for background, this problem is similar to lucene/lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java Lines 821 to 858 in 5b125f3
to keep it simple, for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!! LGTM.
( I filed an issue < 24hrs ago, had a brief exchange of messages, then went to a concern followed by a late and too short sleep, to now wakeup the delightful code in this PR. That is why I love this project. ❤️ )
I love the symmetry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the simplicity of this algo, yay! I just got confused on whether we claim to support NFAs?
lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java
Outdated
Show resolved
Hide resolved
I updated the javadocs: "The automaton must be deterministic, or this method may return false." |
Operations.isTotal currently returns
false
unless the DFA is minimal.This makes the method almost useless and we definitely don't want to encourage minimization just to make such a check.
Can we do a better job, e.g. return
true
for a non-minimal DFA?There's an example test added that fails without the change to demonstrate:
This is a draft PR because I still don't like that it uses
subsetOf
, the code literally makes a minimal "total DFA" and compares that the two automata recognize the same language. Because it is total, we only need to call subsetOf once, but I still don't like how heavy it is. Can we do better?See #13706 for more background