System.Text.RegularExpressions performance regressions #65187

Closed
adamsitnik opened this issue Feb 11, 2022 · 5 comments

Comments

@adamsitnik (Member) commented Feb 11, 2022

The latest manual perf run (.NET 6 vs .NET 7 Preview 1) has discovered multiple regressions in System.Text.RegularExpressions. Since most of them were not reported previously, I am opening this issue so the owners of System.Text.RegularExpressions can triage them accordingly.

I've uploaded a full report here. It's sorted in descending order, from biggest regression to biggest improvement.

@performanceautofiler has not reported these regressions, as they were most likely introduced before the benchmarks were added in November 2021 (historical data).

Sample repro:

```shell
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 net7.0 --filter System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count
```

Sample results:

System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "Sherlock|Holmes|Watson|Irene|Adler|John|Baker", Options: Compiled)

| Result | Ratio | Operating System | Bit |
|--------|-------|------------------|-----|
| Slower | 0.43 | Windows 11 | X64 |
| Slower | 0.45 | Windows 11 | X64 |
| Slower | 0.46 | Windows 10 | X64 |
| Slower | 0.42 | Windows 11 | X64 |
| Slower | 0.46 | Windows 10 | X64 |
| Slower | 0.42 | Windows 11 | X64 |
| Slower | 0.43 | Windows 11 | X64 |
| Slower | 0.44 | Windows 11 | X64 |
| Slower | 0.43 | Windows 11 | X64 |
| Slower | 0.43 | ubuntu 20.04 | X64 |
| Slower | 0.45 | ubuntu 18.04 | X64 |
| Slower | 0.40 | centos 7 | X64 |
| Slower | 0.43 | ubuntu 18.04 | X64 |
| Slower | 0.44 | alpine 3.13 | X64 |
| Slower | 0.43 | ubuntu 18.04 | X64 |
| Slower | 0.44 | ubuntu 20.04 | X64 |
| Slower | 0.47 | Windows 10 | Arm64 |
| Slower | 0.32 | Windows 10 | X86 |
| Slower | 0.53 | Windows 10 | Arm |
| Slower | 0.43 | macOS Big Sur 11.6.3 | X64 |
| Slower | 0.43 | macOS Big Sur 11.4 | X64 |

System.Text.RegularExpressions.Tests.Perf_Regex_Common.CtorInvoke(Options: Compiled)

| Result | Ratio | Operating System | Bit |
|--------|-------|------------------|-----|
| Slower | 0.45 | Windows 11 | X64 |
| Slower | 0.48 | Windows 11 | X64 |
| Slower | 0.46 | Windows 10 | X64 |
| Slower | 0.48 | Windows 11 | X64 |
| Slower | 0.45 | Windows 10 | X64 |
| Slower | 0.46 | Windows 11 | X64 |
| Slower | 0.43 | Windows 11 | X64 |
| Slower | 0.44 | Windows 11 | X64 |
| Slower | 0.51 | Windows 11 | X64 |
| Slower | 0.51 | ubuntu 20.04 | X64 |
| Slower | 0.48 | ubuntu 18.04 | X64 |
| Slower | 0.53 | centos 7 | X64 |
| Slower | 0.51 | ubuntu 18.04 | X64 |
| Slower | 0.48 | alpine 3.13 | X64 |
| Slower | 0.47 | ubuntu 18.04 | X64 |
| Slower | 0.46 | ubuntu 20.04 | X64 |
| Slower | 0.57 | Windows 10 | Arm64 |
| Slower | 0.45 | Windows 10 | X86 |
| Slower | 0.59 | Windows 10 | Arm |
| Slower | 0.52 | macOS Big Sur 11.6.3 | X64 |
| Slower | 0.52 | macOS Big Sur 11.4 | X64 |

(cc @stephentoub )

@ghost commented Feb 11, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.


Author: adamsitnik
Assignees: -
Labels: area-System.Text.RegularExpressions, tenet-performance
Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 11, 2022
@EgorBo (Member) commented Feb 11, 2022

Perf_Regex_Industry_RustLang_Sherlock should be slightly better in .NET 7.0 Preview 2 with #64872

But overall, the benchmark hits the weakest spot in the new IndexOf(substring) algorithm: a large text (0.5 MB) with a low density of the (uppercase) characters 'S', 'H', 'I'. If we try to mitigate it, we might lose significant wins for other cases, e.g. #64381 (comment).

@stephentoub (Member) commented

> But overall the benchmark hits the weakest spot in the new IndexOf(substring) algorithm - large text (0.5Mb) with low density of chars (upercased) 'S', 'H', 'I'.

While I might enjoy passing the blame to you for this one, @EgorBo 😜, I don't think this is at all your fault.

First, for anyone reading, I'll call out that while this one specific test regressed, overall things have improved dramatically (and that's on top of all the previous improvements in .NET 5). Looking at @adamsitnik's cited report above, searching for "Slower" yields 474 results, "Same" yields 1,555 results, and "Faster" yields 2,318 results, so many more benchmarks improved than regressed. And some of those improvements are staggering, e.g.:

- 432x faster (screenshot of benchmark result)
- 150x faster (screenshot of benchmark result)
- 48x faster (screenshot of benchmark result)

Etc.

Regex is one of those areas where it's difficult to look at just one result and make claims about improvements or regressions, as many optimizations play a guessing game about which scenarios are most common and optimize for those, potentially at the expense of others (as with many optimizations, something common improves significantly in exchange for something expected to be less common regressing a bit).

In this particular case, the pattern "Sherlock|Holmes|Watson|Irene|Adler|John|Baker" suffers from a change that is typically beneficial but here is a step backwards. To find a match, we want to skip ahead as quickly as possible to the next position that could possibly match. Previously, we would do that by looking at the first character of each branch (so 'S', 'H', 'W', 'I', 'A', 'J', and 'B'). That set is too large to be vectorized by IndexOfAny as it stands today, so we instead build a bitmap lookup and walk the input character by character, looking for the first character that has a bit set.

However, .NET 7 now recognizes that, although there are 7 branches, other positions have overlap: it sees that if it were to instead focus on the second letter of each branch rather than the first, that would yield the set 'h', 'o', 'a', 'r', 'd', which is only 5 characters, and IndexOfAny does vectorize when there are 5 inputs. So in .NET 7, rather than going character by character through the input, it instead does an IndexOfAny("adhor"). This is typically a good choice, because it's able to process 8 or 16 characters at a time rather than 1 at a time. But in this particular case, there are lots of those letters in the input, so the vectorized loop ends up stopping frequently to examine whether it's at a good place to attempt a full match, and that ends up costing us. Hence the regression. I knew this at the time I made the change, which improved other things as much as and frequently more than it hurt this particular case.

There's very likely tweaking that could be done in the heuristics to improve this, but it'd still be a guessing game around all the tradeoffs involved.
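The position-selection analysis described above can be illustrated with a minimal sketch, written in Python for brevity. The function name and the simplified selection rule are assumptions for illustration, not the actual RegexCompiler logic:

```python
# Hypothetical sketch (NOT the actual .NET implementation) of the
# heuristic described above: across the branches of an alternation,
# find the character position whose set of distinct characters is
# smallest, since a small enough set can be fed to a vectorized
# IndexOfAny-style search. (A candidate found at position `pos` must
# be rewound by `pos` characters before attempting the full match.)

def best_search_position(branches):
    """Return (position, char_set) minimizing the number of distinct
    characters, considering only positions present in every branch."""
    min_len = min(len(b) for b in branches)
    best = None
    for pos in range(min_len):
        chars = {b[pos] for b in branches}
        if best is None or len(chars) < len(best[1]):
            best = (pos, chars)
    return best

branches = "Sherlock|Holmes|Watson|Irene|Adler|John|Baker".split("|")
pos, chars = best_search_position(branches)
print(pos, sorted(chars))  # 1 ['a', 'd', 'h', 'o', 'r']
```

For this pattern, the first-character set has 7 distinct characters, while the second-character set has only 5, which under the tradeoff described above is small enough for the vectorized IndexOfAny path.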

It'd be great if IndexOfAny could be improved further, as Regex now makes very heavy use of it. It'd also be really cool if someone could explore a .NET implementation of Teddy (#62447).

@joperezr (Member) commented

Thanks for the analysis and for the explanation for the specific case that was called out @stephentoub. Also, thanks @adamsitnik for running these tests and for the report.

It looks like, at least for the 2 specific tests called out above, we probably don't want to fix those, since, as @stephentoub points out, these are less common cases that regressed a bit in order to provide a much larger win for more common cases.

@stephentoub, is it feasible to plan a quick look at the biggest regressions of those 474 called out, to ensure that they are all cases of "optimizing for a more common case" as opposed to accidental regressions? Or are we confident enough that this is the case and that for most accidental regressions we already have a bug logged? I mostly ask in order to know what to do with this issue.

@stephentoub (Member) commented

> is it feasible to plan to take a quick look to the biggest regressions of those 474 called out and ensuring that they are all cases of "optimizing for a more common case" as opposed to accidental regressions?

Sure.

We can also probably invest in reducing Regex ctor overhead. We've driven down execution time in part by doing more analysis during construction, and we've generally considered that a worthwhile tradeoff, but we haven't spent much time seeing whether we could do the same analysis for less, or whether we might want to skip some of the analysis when using the interpreter, etc.

@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2022
@joperezr joperezr added this to the 7.0.0 milestone Jul 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 11, 2022