-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to .NET Core 2.2 and compiled regexes #14
Conversation
2.2 can be 3.0 now 😄 |
I'm sorry. I'll try to review it all soon. |
Thank you! I have merged it but I will update to 3.0. |
thanks @mariomka |
It would be great if you have time to update the numbers in the README using 3.0 -- although as I mentioned above, I do not think .NET looks great, it is less bad. We will look at what we can do in the next version. |
I have updated the benchmark using 3.0. |
Maybe I should add a benchmark with and without |
@mariomka does it make a significant difference? (is it slower?) I see it rarely used in .NET code, so I think "real world" is mostly without it. |
With With: C# .Net Core | 672.25 | 557.74 | 88.69 | 1318.68 Without: C# .Net Core | 1312.87 | 1080.66 | 94.72 | 2488.24 But if it isn't a real use case then I will remove it. |
Updated. |
cc @stephentoub for the interesting observation about ECMASCript setting perf above. |
The occurrences in the parser are more impactful, not because of parsing performance but because of choices made during parsing that affect execution. For example, when you write For .NET 5, the difference should be drastically reduced, with both getting much faster and the difference between them shrinking significantly. For example, when I run the benchmarks on my machine with .NET Core 3.1, I get:
and then when I set ECMAScript, I get:
However, with recent bits from master, I get this without ECMAScript:
and this with ECMAScript.
So, there's still a difference, because we still need to do slightly more work in the Unicode category case just in case the input isn't ASCII, but the gap is much smaller. (As an aside, the methodology used for the benchmarking isn't ideal if the goal is to measure steady-state performance. Right now it's invoking the benchmarked code just once, with the timing numbers including all of the time required to compile/JIT/etc. the regex. The numbers get much better on subsequent invocations.) (Also, we're still working on improving regex perf in .NET 5, so I expect we'll see these numbers get even better.) |
It seems the improvement in .NET is awesome! Please, ping me when version 5.0 is released for updating results. |
@mariomka thanks for helping us make it better! Would you consider updating your benchmark to allow creating/compiling the regex before measurement begins? This is more realistic: most apps will use the same regex many times so parsing/compilation is heavily amortized . |
I'm sorry. at this moment I don't consider update the benchmarks. It's not perfect but it's enough to get an idea. |
Sure! Sounds good. |
@stephentoub here are the results on my machine tfor 5.0 preview 3, with and without ECMAScript. PHP | 27.26 | 25.49 | 8.69 | 61.44 My machine is considerably slower than yours, based on the 3.1 numbers, so 5.0 seems to have improved a fair bit since your Jan numbers. But still a significant difference with and without ECMAScript |
Are each of these finding the same number of matches on all of the inputs? |
Yup, logged that:
|
BTW, I tried with/without ECMAScript on regexredux, and it has a negligible effect, as expected as it's using trivial ASCII character classes, instead of |
.NET Core 2.0 is out of support, so moving to latest 2.2
.NET Core 2.1 added support for compiled regexes (same as .NET Framework) so I'm reenabling that. Although .NET still trails the pack, this roughly doubles performance.