Appropriately fast LIKE pattern compilation #286
In two ways:
- Change fold/union operations to accumulate to a single list.
- Replace *ordered* sets and maps with hash sets and maps.

This results in a > 10x improvement in compiling large LIKE patterns (i.e. 1000 characters and up).
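To illustrate why accumulating into a single collection helps, here is a minimal sketch. The function names `unionByFold` and `unionByAccumulation` are hypothetical and are not taken from the PartiQL code base; the sketch only assumes that the slow path resembled a fold over `union`, as the description above suggests.

```kotlin
// Hypothetical helpers for illustration only; not code from this repository.

// Each union() call copies everything accumulated so far into a new set,
// so folding over n groups can copy O(n^2) elements in total.
fun <T> unionByFold(groups: List<Set<T>>): Set<T> =
    groups.fold(emptySet<T>()) { acc, next -> acc.union(next) }

// All elements are added to one mutable hash set, so each element is
// inserted once and no intermediate sets are created.
fun <T> unionByAccumulation(groups: List<Set<T>>): Set<T> {
    val result = HashSet<T>()
    for (group in groups) {
        result.addAll(group)
    }
    return result
}
```

Both functions return sets with the same elements; only the amount of intermediate copying differs. A hash set gives up iteration order, which is acceptable only if no caller relies on that order.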
Codecov Report
@@ Coverage Diff @@
## master #286 +/- ##
============================================
+ Coverage 82.44% 83.08% +0.64%
- Complexity 1202 1287 +85
============================================
Files 155 157 +2
Lines 9283 9745 +462
Branches 1522 1647 +125
============================================
+ Hits 7653 8097 +444
- Misses 1175 1190 +15
- Partials 455 458 +3
Added some of my own comments which I will address along with the other reviewer's comments.
lang/src/org/partiql/lang/eval/like/CodepointCheckpointIterator.kt
I've done a fair bit of checking on the evaluation-time performance and memory consumption. That part of it seems exactly on par with the original
LGTM.
Not really an issue for me, more of a limitation: the recursive call pattern for dealing with `%` can cause a stack overflow. The pathological example would be a pattern with a long series of leading `%` characters.
```kotlin
@Test
fun stressTest() {
    executePattern(parsePattern("%".repeat(4000) + "a", null), "a")
}
```
We could "compile" this into a pattern with a single leading `%`.
If users end up writing such a pattern (I hope they do not) they can do the rewrite to get around it. :)
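The rewrite mentioned above can be sketched as a small pre-processing step. `collapseWildcards` is a hypothetical helper, not a function from this PR, and it assumes the pattern uses no escape character (with an escape, an escaped `%` followed by a wildcard `%` must not be merged).

```kotlin
// Hypothetical helper for illustration; assumes no ESCAPE character is in use.
// Consecutive '%' wildcards match exactly what a single '%' matches, so a run
// of them can be reduced to one before the pattern is compiled.
fun collapseWildcards(pattern: String): String {
    val sb = StringBuilder(pattern.length)
    for (ch in pattern) {
        // Skip a '%' that directly follows another '%'.
        if (ch == '%' && sb.endsWith('%')) continue
        sb.append(ch)
    }
    return sb.toString()
}
```

For the stress-test pattern, `collapseWildcards("%".repeat(4000) + "a")` yields `%a`, so the recursion depth no longer depends on how many `%` characters the user wrote.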
I've added a change to consider multiple consecutive
Implements #284. Replaces the previous LIKE implementation, which is slow when compiling large patterns containing the wildcard character `%`, with another implementation that compiles patterns in linear time and has similar performance characteristics at evaluation time.
Implements #284
This is able to compile the LIKE pattern `%<n>%`, where `<n>` is 8000 `!` characters, on my local machine in ~60ms. The previous implementation took too long to measure on my local machine, even after #279, because it used an unoptimized state machine. I do not know if there is a name for the algorithm in this PR, but it does not use a state machine.

I haven't yet analysed the performance of evaluating the compiled pattern; I will need to do that before this can be merged.
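For readers wondering what LIKE matching without a state machine can look like, here is a minimal sketch of one well-known approach: scan the pattern and input in step, remembering a single checkpoint at the most recent `%`. This is not the algorithm from this PR. `likeMatches` is a hypothetical name, and the sketch ignores escape characters and works on UTF-16 `Char`s rather than code points.

```kotlin
// Illustrative sketch only; not the implementation in this PR.
// Supports '%' (zero or more characters) and '_' (exactly one character).
fun likeMatches(pattern: String, input: String): Boolean {
    var p = 0              // current position in the pattern
    var i = 0              // current position in the input
    var checkpointP = -1   // pattern position just after the most recent '%'
    var checkpointI = -1   // input position the most recent '%' resumes from

    while (i < input.length) {
        when {
            p < pattern.length && pattern[p] == '%' -> {
                // Record a checkpoint; initially the '%' matches nothing.
                p++
                checkpointP = p
                checkpointI = i
            }
            p < pattern.length && (pattern[p] == '_' || pattern[p] == input[i]) -> {
                p++
                i++
            }
            checkpointP >= 0 -> {
                // Mismatch: let the last '%' absorb one more character and retry.
                checkpointI++
                p = checkpointP
                i = checkpointI
            }
            else -> return false
        }
    }
    // The input is exhausted; only trailing '%' wildcards may remain.
    while (p < pattern.length && pattern[p] == '%') p++
    return p == pattern.length
}
```

Because the loop is iterative, a long run of leading `%` characters cannot overflow the call stack, and no compilation step is needed beyond reading the pattern string. The worst-case running time is still proportional to the pattern length times the input length.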
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.