Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dramatic performance degradation between JSqlParser 4.0 and 4.2 #1397

Closed
wumpz opened this issue Nov 2, 2021 · 10 comments · Fixed by #1439
Closed

dramatic performance degradation between JSqlParser 4.0 and 4.2 #1397

wumpz opened this issue Nov 2, 2021 · 10 comments · Fixed by #1439
Assignees
Labels

Comments

@wumpz
Copy link
Member

wumpz commented Nov 2, 2021

Some of our included PRs had a dramatic performance impact. So they need to be revisited. This log comes from the same SQL which uses non recursive large case when statements.

JSqlParser 4.2

466091 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_SpecialStringFunctionWithNamedParameters_4486_9_249())
869250 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_Column_1597_5_134())
870821 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_UserVariable_3831_5_215())
870821 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_UserVariable_3831_12_456())
1335347 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectNameList_1584_5_323())
1587076 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectName_1656_5_174())
1587076 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectNameExt_1686_7_409())
1587076 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectName_1656_6_396())
1587076 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectNameExt_1686_5_189())
1587076 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectNameWithoutValue_1612_5_601())
177987285 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_scan_token(int))

JSqlParser 4.0

75789 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3_28())
75995 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_Column_1307_5_114())
76148 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_UserVariable_3334_5_175())
76148 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_UserVariable_3334_12_358())
116525 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectNameList_1294_5_226())
154419 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectName_1350_5_133())
154419 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectName_1350_6_295())
154419 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectNameExt_1380_5_147())
154419 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectNameWithoutValue_1322_5_415())
154419 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_3R_RelObjectNameExt_1380_7_308())
9681645 - execution(private boolean net.sf.jsqlparser.parser.CCJSqlParser.jj_scan_token(int))

If you look at the methods call log, some methods are called 10 times and more of the version 4.0.

Maybe we have to rewind some PRs for this.

@wumpz wumpz added bug P1 high priority refactor labels Nov 2, 2021
@wumpz wumpz self-assigned this Nov 2, 2021
@wumpz
Copy link
Member Author

wumpz commented Nov 2, 2021

@manticore-projects We have to check those complex expression extension for case when constructs. I have a performance degradation of 10x and more here. At least I suspect those changes to be the problem.

@manticore-projects
Copy link
Contributor

Can you give me the particular SQL statements please, which you have used for the test and maybe point out how exactly you measured? Should we involve PMD as Test Dependency to measure impact of changes more reliably?
(Right now I can only use the "total build time" to estimate the impact of a certain change.)

Also, we would need to discuss the objectives "correctness" vs. "performance". In my own opinion, a parser must be as correct as possible at first, and only after that should be as fast as possible. I was reluctant to sacrifice correctness over performance. But of course we should look into optimizations, when a particular construct causes such performance deterioration.

@wumpz
Copy link
Member Author

wumpz commented Nov 2, 2021

Sure, but a degradation of 10x is huge. I am working on an example SQL. For statistics of this I wrote https://github.com/JSQLParser/javacc-perf-diag.

@manticore-projects
Copy link
Contributor

I was not aware, that there is another Performance related Test Suite aside our normal integration tests (with defined 2sec timeouts).

So far I was satisfied when the normal tests ran through, within 30 seconds in total and without a time out exception for a single test. (Just explaining where I come from.)

Of course I share your concern about the performance deterioration. In my opinion, it's a two step:

  1. we need a tool, which can run benchmarks across all GIT commits in order to establish a) a benchmark and b) identify a a commit which breaks a certain threshold (e.g. more than 20% slower) -- best case with identification of the statement

  2. when we have identified the commit and the statement, we can start to break our heads how we can get it faster

I believe investing into step 1 makes sense because it will be a re-occurring problem: Correctness unfortunately competes with performance and every single improvement on the Correctness will jeopardize the performance.

So we would need to run this benchmarking tool before we commit anything and make it part of the CI process.

It also means, that we should find a smart way to extract all SQL statements from the Test Suite. Currently they are burried inside the Java Code and not easy to access from outside.

@chensk0601
Copy link

any plans to fix this issue? I meet the same questions too.

@manticore-projects
Copy link
Contributor

any plans to fix this issue? I meet the same questions too.

With what query and which version on JSQLParser exactly please.

@wumpz
Copy link
Member Author

wumpz commented Nov 15, 2021

@chensk0601 The investigation is in progress ... I am right now collecting SQLs that have performance issues for version 4.2 and had no performance issues in version v4 and below. If you have some, feel free to add those at this issue.

@wumpz
Copy link
Member Author

wumpz commented Nov 16, 2021

Here is one SQL. The problem is, that parsing this using JSqlParser 4.2 uses nearly 10 times of token reads, lookaheads than version 4.0.

sanitized_biq_query_final.txt

Sure there could be a smaller example, but this is the start.

@manticore-projects
Copy link
Contributor

manticore-projects added a commit to manticore-projects/JSqlParser that referenced this issue Dec 10, 2021
Simplify the Primary Expression Production
Try to simple parse without Complex Expressions first, before parsing complex and slow (if supported by max nesting depth)
Add Test cases for issues JSQLParser#1397 and JSQLParser#1438
Update Libraries to its latest version
Remove JUnit 4 from Gradle
wumpz pushed a commit that referenced this issue Apr 14, 2022
* Adjust Gradle to JUnit 5

Parallel Test execution
Gradle Caching
Explicitly request for latest JavaCC 7.0.10

* Do not mark SpeedTest for concurrent execution

* Remove unused imports

* Adjust Gradle to JUnit 5

Parallel Test execution
Gradle Caching
Explicitly request for latest JavaCC 7.0.10

* Do not mark SpeedTest for concurrent execution

* Remove unused imports

* Performance Improvements

Simplify the Primary Expression Production
Try to simple parse without Complex Expressions first, before parsing complex and slow (if supported by max nesting depth)
Add Test cases for issues #1397 and #1438
Update Libraries to its latest version
Remove JUnit 4 from Gradle

* Appease PMD

* Update Gradle Plugins to its latest versions
Let Parser timeout after 6 seconds and fail gently
Add a special test verifying the clean up after timeout

* Revert unintended changes to the Test Resources

* Appease PMD/Codacy

* Correct the Gradle "+" dependencies

* Bump version to 4.4.-SNAPSHOT

* update build file

* revert unwarranted changes in test files

* strip the Exception Class Name from the Message

* maxDepth = 10 collides with the Parser Timeout = 6 seconds

* License Headers
Unused imports

* Bump version to 4.5-SNAPSHOT
Reduce test loops to fit intothe timeout
@chensk0601
Copy link

chensk0601 commented Apr 14, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants