Optimize LIKE pattern compilation and make LIKE pattern compilation interruptible. #279

Closed
wants to merge 2 commits into from

dlurton
Member

dlurton commented Sep 14, 2020

On master this expression takes on average 37 seconds to compile:

`'foo' like '%<n>%'` (where `<n>` is a run of 1500 `!` characters)

With this change, on average it takes ~2.6 seconds to compile.

This is still an order of magnitude slower than it really needs to be, but it is (comparatively speaking) much better and all that I can do in the short term.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

LIKE pattern compilation is optimized in two ways:

- Change fold/union operations to accumulate into a single list.
- Replace *ordered* sets and maps with hash sets and maps.

This results in a >10x improvement when compiling large LIKE patterns
(i.e., 1000 characters and up).
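The commit message names the two changes but does not show the code, so the following is only a hypothetical Kotlin illustration of the pattern: computing the set of NFA states reachable on one input character, first by folding with `union` into a fresh ordered set at every step, then by accumulating into a single hash set. `NfaState`, `byId`, `moveWithFold`, and `moveAccumulating` are invented names for this sketch, not the API in `LikeMatchingAutomata.kt`.

```kotlin
// Hypothetical NFA state for illustration; not the PartiQL API.
class NfaState(val id: Int) {
    // Outgoing transitions keyed by input character.
    private val transitions = HashMap<Char, MutableList<NfaState>>()

    fun addTransition(c: Char, target: NfaState) {
        transitions.getOrPut(c) { mutableListOf() }.add(target)
    }

    fun getOutgoingStates(c: Char): List<NfaState> = transitions[c] ?: emptyList()
}

// Orders states by id, standing in for an ordered (tree-based) set.
val byId = compareBy<NfaState> { it.id }

// Slower pattern: every fold step builds a brand-new ordered set from the
// accumulator plus the new targets, so earlier results are copied repeatedly.
fun moveWithFold(states: Set<NfaState>, c: Char): Set<NfaState> =
    states.fold(emptySet<NfaState>()) { acc, state ->
        (acc union state.getOutgoingStates(c)).toSortedSet(byId)
    }

// Faster pattern: one hash set is allocated once and every target is added to it.
fun moveAccumulating(states: Set<NfaState>, c: Char): Set<NfaState> {
    val result = HashSet<NfaState>()
    for (state in states) {
        result.addAll(state.getOutgoingStates(c))
    }
    return result
}

fun main() {
    val s0 = NfaState(0)
    val s1 = NfaState(1)
    val s2 = NfaState(2)
    s0.addTransition('!', s0) // self-loop, loosely modeling '%'
    s0.addTransition('!', s1)
    s1.addTransition('!', s2)

    val start = setOf(s0, s1)
    val viaFold = moveWithFold(start, '!').map { it.id }.sorted()
    val viaAccumulate = moveAccumulating(start, '!').map { it.id }.sorted()
    println(viaFold)                  // prints [0, 1, 2]
    println(viaFold == viaAccumulate) // prints true
}
```

Both functions return the same states; the difference is cost. The fold version copies the growing accumulator on every step (roughly quadratic in the size of the result, plus the ordered-set overhead), while the accumulating version makes one pass with amortized constant-time hash inserts.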
@codecov-commenter

codecov-commenter commented Sep 15, 2020

Codecov Report

Merging #279 into master will increase coverage by 0.01%.
The diff coverage is 84.61%.

```diff
@@             Coverage Diff              @@
##             master     #279      +/-   ##
============================================
+ Coverage     82.44%   82.45%   +0.01%
  Complexity     1202     1202
============================================
  Files           155      155
  Lines          9283     9293      +10
  Branches       1522     1524       +2
============================================
+ Hits           7653     7663      +10
- Misses         1175     1176       +1
+ Partials        455      454       -1
```
| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| #CLI | 18.11% <ø> (ø) | 19.00 <ø> (ø) |
| #EXAMPLES | 76.01% <ø> (ø) | 27.00 <ø> (ø) |
| #LANG | 85.14% <84.61%> (+0.01%) | 999.00 <0.00> (ø) |
| #PTS | 100.00% <ø> (ø) | 0.00 <ø> (ø) |
| #TEST_SCRIPT | 79.68% <ø> (ø) | 157.00 <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| .../src/org/partiql/lang/eval/LikeMatchingAutomata.kt | 79.45% <84.61%> (+1.51%) | 0.00 <0.00> (ø) |

Legend: `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
Powered by Codecov. Last update 67c0e1c...4954e9c.

Comment on lines +416 to +422
```kotlin
nfaStates.forEach { state ->
    addAll(state.getOutgoingStates(it))

    if (Thread.interrupted()) {
        throw InterruptedException()
    }
}
```
Member Author

This is the change in this PR that results in the biggest performance improvement. The other stuff is incremental.
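The `Thread.interrupted()` check in the loop above makes compilation cooperatively cancellable: the compiling thread polls its own interrupt flag and bails out with `InterruptedException`. A minimal sketch of how a caller might rely on that, assuming compilation runs on its own thread, is to submit the work to an executor and interrupt it with `Future.cancel(true)` once a deadline passes. `expensiveCompile` and `compileWithTimeout` are hypothetical names standing in for the real compilation entry point, not PartiQL APIs.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

// Hypothetical stand-in for an expensive compilation step. Like the loop in
// this PR, it polls the thread's interrupt flag on every iteration.
fun expensiveCompile(iterations: Long): Long {
    var acc = 0L
    for (i in 0L until iterations) {
        acc += i % 7
        // Thread.interrupted() reports and clears the interrupt flag.
        if (Thread.interrupted()) {
            throw InterruptedException()
        }
    }
    return acc
}

// Runs the work on a worker thread; returns null if it exceeds timeoutMs.
fun compileWithTimeout(iterations: Long, timeoutMs: Long): Long? {
    val executor = Executors.newSingleThreadExecutor()
    val future = executor.submit(Callable { expensiveCompile(iterations) })
    return try {
        future.get(timeoutMs, TimeUnit.MILLISECONDS)
    } catch (e: TimeoutException) {
        // cancel(true) interrupts the worker, so its next poll throws.
        future.cancel(true)
        null
    } finally {
        executor.shutdownNow()
    }
}

fun main() {
    println(compileWithTimeout(1_000L, 5_000L))       // prints 2997
    println(compileWithTimeout(Long.MAX_VALUE, 100L)) // prints null
}
```

Polling inside the inner loop, as the PR does, bounds cancellation latency to roughly one iteration. Without the check, `cancel(true)` would only set the flag and the worker would keep running until compilation finished on its own.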

@dlurton
Member Author

dlurton commented Sep 24, 2020

Superseded by #286.

dlurton closed this Sep 24, 2020
dlurton deleted the optimize-like-final branch September 29, 2020 22:01