-
Notifications
You must be signed in to change notification settings - Fork 35
optin optimizations
CSLY comes with some optimizations that can be enabled for parsers. Theses optimizations are not always relevant according to your parser structure. An optimization may produce interesting results for an expression grammar and causes performance losses for a JSON parser (JSON is relatively easy to parse and does not need any special optimization to be fast enough)
Optimizations are enabled with the help of C# attributes on the parser class. Many optimizations can be used simultaneously.
For now 2 optimizations are available.
The idea of this optimization is to cache intermediary parsing result to limit backtracking, at the cost of memory use. When a grammar does not need too much bactracking this optimization might be a bad fit.
The C# attribute is UseMemoization
[UseMemoization]
public class MyParser {
// parser definition
}
Natively CSLY only use one token to choose production rules. With this optimization it can look ahead 2 tokens in some cases which might limit backtracking. Once again this optimization must be tested as it may not be a good choice
This optimization may be dwarfed by the Use Memoization optimization.
The C# attribute is BroadenTokenWindow
[BroadenTokenWindow]
public class MyParser {
// parser definition
}
here are the results of some benchmarking with both optimizations. It clearly shows that optimizations produce different results depending on grammar.
Method | UseMemoization | BroadenTokenWindow | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
TestJson | False | False | 88.18 ms | 1.689 ms | 1.579 ms | 9833.3333 | 4000.0000 | 1500.0000 | 56.26 MB |
TestJson | False | True | 87.36 ms | 1.686 ms | 2.524 ms | 10000.0000 | 3500.0000 | 1333.3333 | 58.68 MB |
TestJson | True | False | 152.28 ms | 5.975 ms | 17.143 ms | 12000.0000 | 4750.0000 | 2250.0000 | 68.88 MB |
TestJson | True | True | 164.18 ms | 4.283 ms | 12.495 ms | 12333.3333 | 4666.6667 | 2000.0000 | 71.29 MB |
JSON is a highly structured language. Some tokens like {
or [
uniquely denotes structures which leads to limited backtracking. Hence both Use Memoization and [toekn window broadening] do not add any performance. In fact they cause performance loss due to memory consumption (memoization) or processing time (token window)
Method | Memoize | Broaden | Mean | Error | StdDev | Median | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
TestWhile | False | False | 2.864 ms | 0.1012 ms | 0.2788 ms | 2.780 ms | 964.8438 | 304.6875 | - | 4.2 MB |
TestWhile | False | True | 2.963 ms | 0.1412 ms | 0.3816 ms | 2.883 ms | 949.2188 | 304.6875 | - | 4.24 MB |
TestWhile | True | False | 7.963 ms | 0.2302 ms | 0.6677 ms | 7.777 ms | 921.8750 | 453.1250 | 218.7500 | 5.62 MB |
TestWhile | True | True | 9.692 ms | 0.7434 ms | 2.1330 ms | 9.460 ms | 937.5000 | 421.8750 | 203.1250 | 5.65 MB |
As for JSON While language is a structured language. So the same as JSON is observed.
parser from #414
Method | UseMemoization | BroadenTokenWindow | Mean | Error | StdDev | Median | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
TestBackTrack | False | False | 927.96 ms | 33.831 ms | 94.307 ms | 898.35 ms | 183000.0000 | 39000.0000 | 7000.0000 | 978.48 MB |
TestBackTrack | False | True | 890.39 ms | 27.614 ms | 79.673 ms | 878.01 ms | 180000.0000 | 39000.0000 | 8000.0000 | 971.88 MB |
TestBackTrack | True | False | 32.14 ms | 1.680 ms | 4.849 ms | 31.75 ms | 2000.0000 | 2000.0000 | 1666.6667 | 23.81 MB |
TestBackTrack | True | True | 32.33 ms | 1.964 ms | 5.698 ms | 31.63 ms | 1785.7143 | 1714.2857 | 1428.5714 | 23.8 MB |
This parser is mainly an expression parser that does not use the expression parsing feature. It causes heavy backtracking because of the many precednece levels. Use Memoization limits the baktracking and toekn window broadening does not add evident benefice.