Performance tips #1540
GitHub is not a place for performance and other questions.
Try to reduce the number of rules. For example, you can use a token instead of the rule operationType : 'query' ; But I'm not sure that you'll get a significant speedup with such small optimizations. At first glance this grammar looks good.
Within Java you can use the following code:

String code = readFile(args[0]);
ANTLRInputStream codeStream = new ANTLRInputStream(code);
SeparatedLexer lexer = new SeparatedLexer(codeStream);
// Start lexer benchmark
List<? extends Token> tokens = lexer.getAllTokens();
// End lexer benchmark
ListTokenSource tokensSource = new ListTokenSource(tokens);
CommonTokenStream tokensStream = new CommonTokenStream(tokensSource);
SeparatedParser parser = new SeparatedParser(tokensStream);
// Start parser benchmark
ParserRuleContext ast = parser.rule1();
// End parser benchmark
String stringTree = ast.toStringTree(parser);
System.out.print("Tree " + stringTree); |
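(For a Go target, a roughly equivalent split benchmark can be written with the standard testing package. This is only a sketch under assumptions: SeparatedLexer, SeparatedParser, the entry rule Rule1, the sample query, and the runtime import path are all stand-ins for whatever your generated code actually uses.)

package parsing

import (
	"testing"

	"github.com/antlr4-go/antlr/v4" // import path varies by ANTLR version
)

const code = `{ me { name } }` // hypothetical sample query

// Lexing only.
func BenchmarkLexerOnly(b *testing.B) {
	for i := 0; i < b.N; i++ {
		lexer := NewSeparatedLexer(antlr.NewInputStream(code))
		lexer.GetAllTokens()
	}
}

// Lexing + parsing; subtract BenchmarkLexerOnly to isolate the parser.
func BenchmarkLexerAndParser(b *testing.B) {
	for i := 0; i < b.N; i++ {
		lexer := NewSeparatedLexer(antlr.NewInputStream(code))
		parser := NewSeparatedParser(antlr.NewCommonTokenStream(lexer, 0))
		parser.Rule1()
	}
}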
Apologies for asking a performance question here; I was not aware of that policy. And thanks for the example of benchmarking the lexer and parser separately. |
I found that lexing is taking most of the time. Benchmark with only the lexer for ANTLR:

With both lexer and parser:

I think it is a well-known issue that lexing takes most of the time. |
@KvanTTT I personally think that it is totally appropriate to ask this type of performance question here. This does not look like the type of question that could be answered by reading an FAQ. Ashish has reproducible code sitting in a public repository. |
@KvanTTT I agree with @millergarym 100% here. Not only are performance issues bugs, Google Groups is a useless piece of crap in which it's impossible to not lose formatting of the information @ashishnegi so meticulously put into tables. So either accept GitHub issues like these, or use something that's not utterly broken as a forum (Discourse works) |
@ashishnegi Can you try the following to rule out some more things?
|
@sharwell thanks for looking into this.
Here is the result of running |
I "benchmarked" one query Only lexing on
and with lexing and parsing :
If someone can proof read the cpp benchmark, it would be a double check. |
It could be as simple as a hash function for ATNConfigSet that is not optimal for our usage. @sharwell had to put in a murmur hash to get Java speed. |
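(For context: ANTLR's MurmurHash is an incremental MurmurHash3 variant. A hedged sketch of the update/finish pattern, using MurmurHash3 constants but not the exact ANTLR code:)

package main

import "fmt"

// murmurUpdate mixes one 32-bit word into the running hash.
func murmurUpdate(hash, value uint32) uint32 {
	const c1, c2 = 0xCC9E2D51, 0x1B873593
	k := value * c1
	k = (k << 15) | (k >> 17) // rotate left 15
	k *= c2
	hash ^= k
	hash = (hash << 13) | (hash >> 19) // rotate left 13
	return hash*5 + 0xE6546B64
}

// murmurFinish applies the avalanche finalizer.
func murmurFinish(hash uint32, numWords int) uint32 {
	hash ^= uint32(numWords * 4)
	hash ^= hash >> 16
	hash *= 0x85EBCA6B
	hash ^= hash >> 13
	hash *= 0xC2B2AE35
	hash ^= hash >> 16
	return hash
}

func main() {
	h := uint32(0) // seed
	for _, w := range []uint32{42, 7, 1} { // e.g. state, alt, context ids
		h = murmurUpdate(h, w)
	}
	fmt.Printf("%08x\n", murmurFinish(h, 3))
}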
@parrt That would explain a case of slow behavior on the first pass. If the lexer remains slow on the exact same input after the first pass, it suggests that edges are getting dropped from the DFA somehow. Of course this always happens in cases where a semantic predicate is crossed, but there are no semantic predicates in the example grammar. |
Hmm...yeah, I'd have to compare the go/java runtime code. |
@sharwell (cc: @parrt, @pboyer),
Sam, I really need to pay more attention to your comments. Your comment above only made sense to me this morning. That was after spending more time than I should have improving the performance by 50% with murmur hash and then hitting a wall.
Before going into what I found, here is the result.

old = current antlr4 Go runtime (non murmur hash)
new = minor change to generated lexer (still no murmur)

benchmark         old ns/op     new ns/op     delta
BenchmarkLexer-4  99298         3188          -96.79%

benchmark         old allocs    new allocs    delta
BenchmarkLexer-4  457           22            -95.19%

benchmark         old bytes     new bytes     delta
BenchmarkLexer-4  16132         1200          -92.56%

The Go runtime is a naive port of the Java runtime. It is a good start, but in many places it leaves a bit to be desired. The hashing code was an example of this: the port basically used String() string (the equivalent of Java's String toString()) and then hashed the string. The result was a large number of memory allocs.

The "real" issue in the lexer is that static fields in the Java are ported to non-static fields in the Go, and initialization is done per new lexer. This is somewhat understandable, as Go doesn't have an explicit static keyword. To achieve static semantics in Go, the fields need to be at the package level and initialized in a func init() { .. } (the equivalent of a Java static initializer).

This will be a minor change in the template. I'll tidy up my murmur hash code, make this change, and hopefully turn it into a PR next week. It would be nice to look for all similar issues in the template and runtime. I'll probably open a new issue for this, consolidating this and #1705.

Cheers
Gary |
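(A minimal sketch of the fix Gary describes, with hypothetical names rather than the actual generated code: move immutable tables from per-instance fields to package-level variables initialized once in func init().)

package main

import "fmt"

// Naive port: every constructor call rebuilds the same immutable table,
// costing allocations for each new lexer instance.
type slowLexer struct {
	symbolicNames []string
}

func newSlowLexer() *slowLexer {
	return &slowLexer{
		symbolicNames: []string{"", "QUERY", "NAME", "LBRACE", "RBRACE"},
	}
}

// Static semantics: the table lives at package level and is built exactly
// once, like a Java static initializer.
var symbolicNames []string

func init() {
	symbolicNames = []string{"", "QUERY", "NAME", "LBRACE", "RBRACE"}
}

type fastLexer struct{} // shares the package-level table

func main() {
	_ = newSlowLexer()            // allocates the table yet again
	_ = fastLexer{}               // reuses the shared table
	fmt.Println(symbolicNames[1]) // QUERY
}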
Uh, wow!
|
@millergarym Do you have this in a branch? How can I help? I'll have some free time this weekend and it would be good to take a look. |
@pboyer Just saw your message. I don't have time right now, so I actually haven't tested this. It should work, as it is the changes I made to the generated code. |
@ashishnegi can you try latest HEAD? Just incorporated Go speed improvements. |
@parrt I benchmarked again on
Numbers are now within a factor of 5. This is definitely a very good improvement. 👍 Do I need to try any other branch? |
It seems that the Go runtime is still 1~2x slower than Java. I did some tests with the C grammar and a macro-expanded source file from the Lua project (~200KB). Both the Go version and the Java version just walk the C file. Noticeably, the Go version spent over 40% of its time on memory management. This is probably related to the binary-trees benchmark: the Go runtime eagerly scans all objects and pointers to free some garbage, but barely achieves anything in this case. Turning off GC is no good, either. Maybe some kind of memory pool? [1, 2] Among the 10 most expensive functions, 7 are related to GC:
In the cumulative list, GC work accounts for over 40% of execution time:
|
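(To illustrate the memory-pool idea floated above — purely a sketch with made-up types, not ANTLR runtime code — frequently allocated, short-lived objects can be recycled through sync.Pool so the collector has less to scan:)

package main

import (
	"fmt"
	"sync"
)

// atnConfig is a stand-in for a hot, frequently allocated type.
type atnConfig struct {
	state, alt int
}

var configPool = sync.Pool{
	New: func() interface{} { return new(atnConfig) },
}

func main() {
	c := configPool.Get().(*atnConfig) // reuse a freed object or allocate
	c.state, c.alt = 42, 1
	fmt.Println(c.state, c.alt)
	*c = atnConfig{}  // reset before returning to the pool
	configPool.Put(c) // make it available for reuse
}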
@wdscxsj Can you please provide some more details?
If possible, can you please include the profile listing for the hot spots?
I'm not sure if |
@millergarym Thanks for your reply! Please see this repo: https://github.com/wdscxsj/antlrperfcomp. |
@wdscxsj A couple of things to note.
What use case are you testing for? |
@millergarym Thanks again for the quick response. A little background may be helpful. I'm using ANTLR in a private project that does some on-and-off code analysis and conversion. The original Java implementation is to be ported to Go. We (my colleagues and I) found that the Go version was consistently slower than the original Java one. For example, for a middle-sized input, the lexing + parsing time (but dominantly lexing, as observed by ashishnegi) was 10.0s in Java and 27.9s in Go (or 125.2s for Go with ANTLR 4.6). Profiling indicates that the Go version spends too much time managing the memory. The memory consumption is significantly lower, but we just want it to run as fast. And by "1~2x slower", I meant "2 to 3 times as slow" as Java. Have you tried to compare both runtimes' performance? Don't you think Go should run at roughly the same (if not a bit higher) speed as Java? |
@wdscxsj If Go is 2-3x slower on a single run, then my guess would be that it would be even slower on a benchmark, because of Java's JIT. I agree: in general, if Go is slower than Java, it is likely an implementation issue (see http://benchmarksgame.alioth.debian.org/u64q/go.html). Can you please add the antlr jars (complete and source) you used to the repo? When I do another round of performance work in the future, this will be a valuable baseline. Cheers |
@millergarym The repo has been updated. Thanks. |
@millergarym I suppose this issue is (at least partially) caused by Go's GC design: golang/go#23044. There is discussion about adding a By the way, @sharwell's optimized Java version is lightning fast. Thank you very much! |
* Fixes #1154 * Generally helps catch trailing syntax errors * Performance-neutral relative to previous grammar * Recommended by antlr4 devs, can help performance in some cases * See antlr/antlr4#1540
I ran the original benchmark against the current version of Antlr just to see how far it has improved since then:
Antlr is now down to being only ~4 times slower than the handwritten parser. |
Cool. @jcking is working on the Go target as we speak. He just squeezed a big improvement out of the C++ target for 4.10.1. |
@ashishnegi Please recheck your performance using the dev branch of the Go runtime - you should see big improvements. I just submitted PRs that correct most of the performance issues, and I will be doing more to improve it over time. |
The problem was a number of bugs in the runtime, which are now fixed. The runtime no longer generates millions of allocations for a start :) |
If you folks can confirm, I'll close :) |
Confirmed. With https://github.com/wdscxsj/antlrperfcomp, the Go dev runtime now runs faster than the Java runtime. On an old laptop the time cost is 1.3s against 2.1s. I need to try some other data later, but this is a huge performance boost indeed! Hats off to @jimidle for the wonderful work. Update: It seems the Java runtime may scale better on a multicore system. The previous result was from a '16 ultrabook running Windows 10. On a 32-core Linux server, the Go version (with profiling turned off) is still slower than the Java version, and |
That's good to hear - we are getting somewhere with the go runtime :). I will look at your example grammar in case there is any low hanging fruit that I can find. |
I believe that there is a problem somewhere that is causing a lot of allocations of SingletonPredictionContext in closure checking... I need a little time to investigate that. GOMAXPROCS should not affect things per se, but it can indicate other problems in the code, of course. Let's not close this just yet. |
By fixing #2016, I may have found why all the SingletonPredictionContexts are being created, but I will need a little time to work through it and improve it. |
Should I leave this one open @jimidle or start a new one? |
I think we should leave this open, because I might have found the reason we are not quite there (though it is already a vast improvement, of course).
|
That is one of the interesting threads to follow. Any updates? |
Yes. I have solved all the algorithm issues. Well- to medium-well-formed grammars are now fast in Go. Poorly constructed grammars such as the MySQL grammar are still slower than Java, but I know why and am working on solving the memory pressure on the GC. I probably need this week to solve that issue. It's not a difficult solution, but the algorithm wasn't implemented in a way that allows easy allocation tracking, so I have had to write some analytic tools.
With that solved, I can finish some structural reconfiguration that will make quite a bit more impact on well-formed grammars. After that, it will be CPU work to finish it off.
Then finally, documentation and any remaining bugs. I'm leaving this open until I can review all the different parts of this thread and make sure all is good. The dev branch is very much better than before, but I intend that the Go runtime will be the most performant runtime.
|
@wdscxsj I have come back to this to test the idea that the Go runtime is somehow not able to scale when there are multiple cores. This seems to be a red herring: it is basically just a function of context switching etc. If I lock your driver to a single thread and then run it under hyperfine, we can see that the user and system time are affected by the context switching. So I am going to ask @parrt to close this now. However, there will still be performance improvements related to interface use and memory use (especially escapes to heap, I hope) down the line. The allocation of ATNConfigs and the CPU spent in adaptivePredict and execATN take all the time, and the GC has to track that. However, the GC runs on separate threads anyway, so it might take CPU time from the system and will cause some stopping to do the collection; but if execution times are small, then it doesn't really matter. Also, the benchmark here does not measure the difference between pre-cache-warmup and post-cache-warmup runs. So, let's close this, as I don't think there are now any performance issues in this thread that I have not fixed. For people's interest, here are the results using hyperfine, which is much more sophisticated than say
You can see here that though the wall clock appears to be longer without
If we play with garbage collection:
So, garbage collection is not really a burden, even at these fairly high parse times. The C grammar as written by Sam is fairly well constructed, but I am sure that there are lots of improvements that could be made. I will try the parse in SLL mode. Using a parameter range with hyperfine, we get this:
Which tells us:
Summary
'GOMAXPROCS=6 ./antlrperfcomp test/input.c' ran
1.00 ± 0.02 times faster than 'GOMAXPROCS=5 ./antlrperfcomp test/input.c'
1.02 ± 0.02 times faster than 'GOMAXPROCS=9 ./antlrperfcomp test/input.c'
1.02 ± 0.03 times faster than 'GOMAXPROCS=7 ./antlrperfcomp test/input.c'
1.03 ± 0.02 times faster than 'GOMAXPROCS=8 ./antlrperfcomp test/input.c'
1.03 ± 0.04 times faster than 'GOMAXPROCS=4 ./antlrperfcomp test/input.c'
1.04 ± 0.02 times faster than 'GOMAXPROCS=13 ./antlrperfcomp test/input.c'
1.04 ± 0.02 times faster than 'GOMAXPROCS=11 ./antlrperfcomp test/input.c'
1.04 ± 0.03 times faster than 'GOMAXPROCS=10 ./antlrperfcomp test/input.c'
1.04 ± 0.02 times faster than 'GOMAXPROCS=20 ./antlrperfcomp test/input.c'
1.04 ± 0.04 times faster than 'GOMAXPROCS=12 ./antlrperfcomp test/input.c'
1.04 ± 0.02 times faster than 'GOMAXPROCS=22 ./antlrperfcomp test/input.c'
1.04 ± 0.03 times faster than 'GOMAXPROCS=23 ./antlrperfcomp test/input.c'
1.04 ± 0.03 times faster than 'GOMAXPROCS=16 ./antlrperfcomp test/input.c'
1.04 ± 0.03 times faster than 'GOMAXPROCS=21 ./antlrperfcomp test/input.c'
1.04 ± 0.03 times faster than 'GOMAXPROCS=19 ./antlrperfcomp test/input.c'
1.05 ± 0.03 times faster than 'GOMAXPROCS=24 ./antlrperfcomp test/input.c'
1.05 ± 0.02 times faster than 'GOMAXPROCS=18 ./antlrperfcomp test/input.c'
1.05 ± 0.02 times faster than 'GOMAXPROCS=17 ./antlrperfcomp test/input.c'
1.05 ± 0.02 times faster than 'GOMAXPROCS=15 ./antlrperfcomp test/input.c'
1.05 ± 0.03 times faster than 'GOMAXPROCS=14 ./antlrperfcomp test/input.c'
1.05 ± 0.04 times faster than 'GOMAXPROCS=25 ./antlrperfcomp test/input.c'
1.05 ± 0.03 times faster than 'GOMAXPROCS=3 ./antlrperfcomp test/input.c'
1.11 ± 0.02 times faster than 'GOMAXPROCS=2 ./antlrperfcomp test/input.c'
1.57 ± 0.35 times faster than 'GOMAXPROCS=1 ./antlrperfcomp test/input.c'

My system would use GOMAXPROCS=12 by default, I think. So, please close this issue @parrt. |
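(For readers reproducing these runs: the same knobs can be set from inside a Go driver instead of through environment variables. A small sketch using only the standard library:)

package main

import (
	"runtime"
	"runtime/debug"
)

func main() {
	runtime.GOMAXPROCS(1)  // equivalent to the GOMAXPROCS=1 variable above
	debug.SetGCPercent(-1) // equivalent to GOGC=off: disables the collector
	// ... run the lexer/parser workload under test here ...
}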
Just as an added bonus, if you switch this driver into SLL mode, then it is quite a bit quicker:
source, err := antlr.NewFileStream("test/input.c")
if err != nil {
fmt.Println("Could not open file", err)
return
}
tokens := antlr.NewCommonTokenStream(parsing.NewCLexer(source), 0)
p := parsing.NewCParser(tokens)
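// Switch prediction from the default full-LL to the faster SLL mode.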
p.GetInterpreter().SetPredictionMode(antlr.PredictionModeSLL)
tree := p.CompilationUnit()
antlr.NewParseTreeWalker().Walk(new(parsing.BaseCListener), tree) |
We are planning to replace our hand-written parser with ANTLR4 in production, for parsing our graph database language spec. The ANTLR grammar looks elegant and precise.
Our language spec is a variant of GraphQL. The user queries that we will have to parse look like:
We started benchmarking from the simplest subset grammar.
Benchmarks:
We expected these numbers to be under 0.05 ms. They are currently around 1.5 ms.
Here are comparisons between the handwritten parser and the ANTLR Go parser over practical queries:
Benchmarks:
ANTLR4 is around 40x slower.
I have also tried SLL parsing. It also did not help.
Q: I did not get any SLL parsing failure over multiple inputs. In what kind of query and grammar can SLL fail, so that we have to move to LL? I am asking this to confirm whether I am doing something wrong.
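(The standard answer is the two-stage strategy sketched below: parse in fast SLL mode first, and retry with full LL only when SLL reports a syntax error. SLL can fail on inputs whose disambiguation needs full parser context; LL then gives the definitive answer. The names SeparatedLexer, SeparatedParser, and Rule1 are assumptions standing in for the generated code, and the runtime import path varies by version.)

package parsing

import (
	"errors"

	"github.com/antlr4-go/antlr/v4"
)

// bailListener records whether any syntax error was reported.
type bailListener struct {
	*antlr.DefaultErrorListener
	failed bool
}

func (l *bailListener) SyntaxError(recognizer antlr.Recognizer, offendingSymbol interface{},
	line, column int, msg string, e antlr.RecognitionException) {
	l.failed = true
}

// parseOnce runs one parse in either SLL or full-LL prediction mode.
func parseOnce(code string, sll bool) (antlr.ParseTree, bool) {
	lexer := NewSeparatedLexer(antlr.NewInputStream(code))
	parser := NewSeparatedParser(antlr.NewCommonTokenStream(lexer, 0))
	listener := &bailListener{DefaultErrorListener: antlr.NewDefaultErrorListener()}
	parser.RemoveErrorListeners()
	parser.AddErrorListener(listener)
	mode := antlr.PredictionModeLL
	if sll {
		mode = antlr.PredictionModeSLL
	}
	parser.GetInterpreter().SetPredictionMode(mode)
	tree := parser.Rule1()
	return tree, !listener.failed
}

// ParseTwoStage tries SLL first and falls back to LL on failure.
func ParseTwoStage(code string) (antlr.ParseTree, error) {
	if tree, ok := parseOnce(code, true); ok {
		return tree, nil // fast path: SLL was sufficient
	}
	if tree, ok := parseOnce(code, false); ok {
		return tree, nil // slow path: needed full-context prediction
	}
	return nil, errors.New("input is invalid under full LL as well")
}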
Can I get some performance tips?
Also, how can we benchmark the lexer and parser separately?