-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Very slow performance of nested parenthesis #294
Comments
Could you try the code snippet with the sslr toolkit. This is to validate if this is a parser issue. |
I'm happy to give it a go, but could you point me in the right direction on how to do that? |
I found a cxx version at http://mvnrepository.com/artifact/org.codehaus.sonar-plugins.cxx/sslr-cxx-toolkit/0.9 |
If you are checking out the code and building it you will find sslr-cxx-toolkit-0.9.2-SNAPSHOT.jar in the folder ...\sonar-cxx\sslr-cxx-toolkit\target |
I did a test. It is also slow in the toolkit. Seems to be an issue somewhere here: private static void expressions(LexerfulGrammarBuilder b) {
b.rule(primaryExpression).is(
b.firstOf(LITERAL,
CxxKeyword.THIS,
par_expression,
idExpression,
lambdaExpression,
// EXTENSION: gcc's statement expression: a compound statement enclosed in parentheses may appear as an expression
b.sequence("(", compoundStatement, ")"))
).skipIfOneChild();
b.rule(par_expression).is(b.sequence("(", expression, ")")); First suspicion: Has something to do with the gcc's statement expression? |
What does make you think so? |
Wow, thats slow... Worse, the runtime complexity is exponential, something like O(3^n). |
Have made a couple of measurements trying the 'gcc-expressions theory'. Ive measured parsing of the function posted by the OP, the 11 nested parantheses case. On my machine it takes: current trunk: 35 s so while it helps somewhat, it doesnt solve the problem. Id expect the running time somewhere in the tenths of milliseconds, maybe. |
Ive tried tags 0.9.1. and 0.9. While 0.9.1 is has essentially the same behavior as trunk (running time for the test case above is 30s), the 0.9 parses this expression immediately. From the pure intuition having not looked at the commit in between, I'd intuitively suspect the builtin checkers. |
Should be a case for git bisect. Starting with
|
Just as a quick note: You want to make sure that memoization is enabled on your grammar. It remembers the result of parsing something, if it already tried before. Without that, you can have exponentially growing parse time, for example : Add 1 level of parenthesis, multiply the parsing time by 2. |
Bisecting finished with this:
Which is not good news, as the behaviour is caused by the upgrade to a new SSLR version and is not directly in our hand... How to handle this? One option is to roll forward to either 0.19.2 or 0.20. The last one is AFAIR not exactly a drop-in replacement for 0.19.1 ... |
@dbolkensteyn: Thanks for the hint. It sounds like a possible cause. I'll have a look. |
@dbolkensteyn:
should suffice, shoudnt it? |
I just did a sample analysis after modifying CxxGrammarImpl.create() and CppGrammar.create() to call LexerfulGrammarBuilder.buildWithMemoizationOfMatchesForAllRules() instead of LexerfulGrammarBuilder.build(), and performance seems greatly improved: analysis time is down from 43minutes to 9minutes. I can see CxxSquidSensor now takes just 2min 52s, but I don't have the detail on the time is took before the change, and the code under analysis and cxx plugin may have been slightly modified between the two: more tests would be needed to verify if/how much performance increases, as well as check there is no difference in the results of the analysis, but this looks quite promising! Here is the simple patch I applied: diff --git a/cxx-squid/src/main/java/org/sonar/cxx/parser/CxxGrammarImpl.java b/cxx-squid/src/main/java/org/s
index cf5a460..000026c 100644
--- a/cxx-squid/src/main/java/org/sonar/cxx/parser/CxxGrammarImpl.java
+++ b/cxx-squid/src/main/java/org/sonar/cxx/parser/CxxGrammarImpl.java
@@ -294,7 +294,7 @@ public enum CxxGrammarImpl implements GrammarRuleKey {
b.setRootRule(translationUnit);
- return b.build();
+ return b.buildWithMemoizationOfMatchesForAllRules();
}
diff --git a/cxx-squid/src/main/java/org/sonar/cxx/preprocessor/CppGrammar.java b/cxx-squid/src/main/java/org
index 4ef216b..f135c84 100644
--- a/cxx-squid/src/main/java/org/sonar/cxx/preprocessor/CppGrammar.java
+++ b/cxx-squid/src/main/java/org/sonar/cxx/preprocessor/CppGrammar.java
@@ -111,7 +111,7 @@ public enum CppGrammar implements GrammarRuleKey {
b.setRootRule(preprocessorLine);
- return b.build();
+ return b.buildWithMemoizationOfMatchesForAllRules();
} |
More precisely, I get the following results for CxxSquidSensor:
And it seems the analysis reports the same issues and metrics in both versions. |
Patch here: #299 |
Thank for the patch, I see forward to testing it next week. |
Fixed with #299 |
I don't know if this a known feature/limitation, or bug, but I haven't seen anything about it while searching.
I have seen various people complaining of the symptom, but haven't seen a solution. If I've missed one I apologize.
At work we've been having a problem where analysis seemed to "stop" at one point for 1h 40mins and then continue. During the stop the java process is at 100%.
After some playing I narrowed it down to one file, then one function, then one statement, and then eventually to the simplest example I could.
This evening I've just installed sonarqube and the plugin at home and recreated the sample and it still does it, so I can now post a simple example here, and run tests to get more information if necessary.
The problem is seen when doing this code
slow.cpp:
Actually this also shows it, but throws a com.sonar.sslr.api.RecognitionException
Obviously the real code has a reason for the nested parenthesis, but this is a simple example that shows the slowdown.
I've include the full console output (showing the command as well) below.
But for the example above it is taking the CxxSquidSensor 33150 ms (33.1s) to analyse the one line.
If I change the number of nested parenthesis the timing changes as follows:
7 sets parenthesis : 733 ms (0.7s)
8 sets parenthesis : 1586 ms (1.6s)
9 sets parenthesis : 4000 ms (4s)
10 sets parenthesis : 11506 ms (11.5s)
11 sets parenthesis : 33150 ms (33.1s)
12 sets parenthesis : 102547 ms (1.7m)
13 sets parenthesis : 305545 ms (5.1m)
14 sets parenthesis : 891094 ms (14.9m)
So it appears each set is worse than double the analysis, approximately 3 fold. This doesn't seem right to me.
The real code has quite a few, and has code in them which is taking it to the 1h 40m.
Is there any reason you know that could cause this to happen, or something I could try?
That is something to try other than re-factoring our code, which in this case is probably possible and I will look at doing on Monday.
Regards
Gary
The console output (with debug) is:
The text was updated successfully, but these errors were encountered: