Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Improvements #1439

Merged
merged 30 commits into from
Apr 14, 2022
Merged

Conversation

manticore-projects
Copy link
Contributor

@manticore-projects manticore-projects commented Dec 10, 2021

Fixes #1397
Fixes #1438
Try to parse simple and fast first, before parsing complex
Update libraries to its latest versions
Remove JUnit4 Leftover

Add a 6 second timeout to the Parser and let it fail gracefully when a Statement can't be parsed within that time
Add a special test assuring the cleanup of the Parser objects after the timeout (no stale background processes)

Parallel Test execution
Gradle Caching
Explicitly request for latest JavaCC 7.0.10
Parallel Test execution
Gradle Caching
Explicitly request for latest JavaCC 7.0.10
Simplify the Primary Expression Production
Try to simple parse without Complex Expressions first, before parsing complex and slow (if supported by max nesting depth)
Add Test cases for issues JSQLParser#1397 and JSQLParser#1438
Update Libraries to its latest version
Remove JUnit 4 from Gradle
Let Parser timeout after 6 seconds and fail gently
Add a special test verifying the clean up after timeout
Copy link
Member

@wumpz wumpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem lies deeper. So this PR does somehow covers the symptoms but not the reasons. So we need to dig deeper.

id 'pmd'
id 'checkstyle'

// download the RR tools which have no Maven Repository
id "de.undercouch.download" version "4.1.2"
id "de.undercouch.download" version "latest.release"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious. Does this work the same way maven dependency versioning works like [1.0,2.0)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know about Maven does (In fact I hate Maven. You can't even clean a project while offline. I went straight from ANT to Gradle, which "just understands and does what I want" instead of forcing me into any dogma). This latest.release automatically retrieves the Latest Release of a Gradle Plugin.

build.gradle Outdated
testImplementation 'org.apache.commons:commons-lang3:3.10'
testImplementation 'com.h2database:h2:1.4.200'
testImplementation 'commons-io:commons-io:2.11.0+'
testImplementation 'org.mockito:mockito-core:4.1.0+'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this version annotation 4.1.0+ mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is Gradle Sugar, which allows to use the latest update of a Library following "semantic versioning". In fact I applied it wrong. It should be either testImplementation 'commons-io:commons-io:2.11.+' for fetching latest release of the 2.11minor version or alternatively testImplementation 'commons-io:commons-io:2.+' for fetching the latest minor version of the 2 major version.

Found that also very handy for quickly updating Log4J2 recently :)


// first, try to parse fast and simple
try {
CCJSqlParser parser = newParser(sql).withAllowComplexParsing(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If complex parsing is initially switched off, it should not be switched on in this method. Right? However, this is only some kind of hack. The grammar itself is flawed if we need to do something like this. The porblem is the excessive use of Lookups, but if the grammar would be designed that a fixed lookup would be enough, then we do not need a switch like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct in your observation, but I respectfully do not share your conclusion:

  1. It is not a hack but actually a completely valid approach (to my knowledge the only available method) to implement a IF ... THEN ... condition in the parser

  2. While the grammar certainly could be rewritten/optimized, I still claim that SQL is a bit exceptional:
    a) we try to be RDBMS/Dialect agnostic and so have to deal with a great flexibility in the Grammar
    b) SQL lacks reliable terminated Blocks, which makes it difficult to define Stop tokens. What do I mean with this: Obviously everything inside brackets ( ) is a block and the closing brackets ) terminates this block and we know that we can stop parsing an expression at the closing bracket. Unfortunately SQL was "designed" for assistant secretaries, who were supposed to query data without programming skills.

Best illustration is issue #1444:

select doc.fullName from XWikiDocument doc where doc.fullName like :fullName escape '/' and doc.author like :author escape '!'

Is this

select doc.fullName from XWikiDocument doc where ( doc.fullName like :fullName escape '/' ) 
and ( doc.author like :author escape '!' )

Or is this

select doc.fullName from XWikiDocument doc where doc.fullName like :fullName escape ( '/' 
and doc.author like :author escape '!' )

JavaCC by its nature lacks a lot of functionality to deal with those requirements. JavaCC 21 added some features for a reason and I believe ANTLR is there for a reason too.

Long story short:
Yes, you are right: too many lookup harms us badly. But I am also right: we do need those for parsing certain valid statements (look how functionality has grown since 4.0 and how the "new issue" rate went down since!)

So the suggest approach, while not a beauty at all solves that in a pragmatic way:
It tries to parse first without those slow lookup and only when this fails, it would activate the lookup -- because they are needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, if anyone can point me on how to avoid those lookup without losing functionality, I am all in!

# Conflicts:
#	src/test/java/net/sf/jsqlparser/statement/select/SelectTest.java
@manticore-projects
Copy link
Contributor Author

Totally unrelated: We pull the complete ANTLR as dependency for PMD, while we work with JavaCC in order to not use ANTLR -- I find that quite funny :)

pmd - The PMD libraries to be used for this project.
\--- net.sourceforge.pmd:pmd-java:6.41.0
     +--- net.sourceforge.pmd:pmd-core:6.41.0
     |    +--- org.antlr:antlr4-runtime:4.7.2
     |    +--- com.beust:jcommander:1.48
     |    +--- commons-io:commons-io:2.6
     |    +--- net.sourceforge.saxon:saxon:9.1.0.8
     |    +--- org.apache.commons:commons-lang3:3.8.1
     |    +--- org.ow2.asm:asm:9.2
     |    \--- com.google.code.gson:gson:2.8.5
     +--- net.sourceforge.saxon:saxon:9.1.0.8
     +--- org.ow2.asm:asm:9.2
     +--- commons-io:commons-io:2.6
     \--- org.apache.commons:commons-lang3:3.8.1

Unused imports
# Conflicts:
#	build.gradle
#	src/test/java/net/sf/jsqlparser/statement/select/SelectTest.java
# Conflicts:
#	build.gradle
#	src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt
#	src/test/java/net/sf/jsqlparser/statement/select/SelectTest.java
Reduce test loops to fit intothe timeout
@wumpz wumpz merged commit 181a21a into JSQLParser:master Apr 14, 2022
@manticore-projects manticore-projects deleted the Issue1438 branch May 31, 2022 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants