Disable Unicode support for C# to match Java, Rust, C, C++ #26

Closed
wants to merge 1 commit into from

danmoseley
Contributor

@danmoseley danmoseley commented May 4, 2020

The C benchmark does not enable Unicode support (it doesn't pass a PCRE_UTF* option), nor does the C++ Boost benchmark (it doesn't use u32regex), and neither does Java (it sets neither Pattern.UNICODE_CHARACTER_CLASS nor the embedded (?U) flag). Rust (since #21) explicitly passes .unicode(false).
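
As a rough illustration of the Java behavior described above, here is a minimal sketch showing that `\w` is ASCII-only by default and becomes Unicode-aware only when the flag is set. The class name `UnicodeClassDemo`, the helper `matchesWord`, and the sample string are made up for this example and are not taken from the benchmark code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the benchmark.
public class UnicodeClassDemo {

    // Returns true if the whole input matches the given pattern compiled with the given flags.
    static boolean matchesWord(String pattern, String input, int flags) {
        return Pattern.compile(pattern, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "caf\u00e9"; // "cafe" with a precomposed e-acute

        // Default: \w is [a-zA-Z_0-9], so the accented letter is not a word character.
        System.out.println(matchesWord("\\w+", input, 0)); // false

        // With the flag, \w follows the Unicode definition of a word character.
        System.out.println(matchesWord("\\w+", input, Pattern.UNICODE_CHARACTER_CLASS)); // true

        // The embedded (?U) flag has the same effect as the compile-time flag.
        System.out.println(matchesWord("(?U)\\w+", input, 0)); // true
    }
}
```

The first case is the configuration the Java benchmark runs with, which is why a Unicode-aware engine on the other side of the comparison does more work per character class.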

C#, however, is still doing the extra work of matching Unicode character classes. To make the comparison fair, this PR adds RegexOptions.ECMAScript. The flag has other effects, but disabling Unicode is the only one relevant to this benchmark; the others concern pattern constructs that this benchmark's patterns don't use.

This improves C# performance by about 45% on .NET Core 3.1. It's still some distance behind Rust/C/C++, but it brings it in line with Java (at least with the improvements in .NET 5.0), which is nice.

cc @stephentoub

@mariomka
Owner

mariomka commented May 4, 2020

In the end, the benchmark will stay with the default configuration. I merged the Rust update but removed the .unicode(false) statement.
However, I have decided to create a branch that allows code and settings optimizations. Please check #27.

@danmoseley
Contributor Author

Sounds good @mariomka
