System.Text.RegularExpressions performance regressions #65187

Closed
adamsitnik opened this issue Feb 11, 2022 · 5 comments

Comments

@adamsitnik (Member) commented Feb 11, 2022

The latest manual perf run (.NET 6 vs .NET 7 Preview 1) has discovered multiple regressions in System.Text.RegularExpressions. Since most of them were not reported previously, I am opening this issue so the owners of System.Text.RegularExpressions can triage them accordingly.

I've uploaded a full report here. It's sorted in descending order, from biggest regression to biggest improvement.

@performanceautofiler has not reported these regressions, as they were most likely introduced before the benchmarks were added in November 2021 (historical data).

Sample repro:

```shell
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 net7.0 --filter System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count
```

Sample results:

System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "Sherlock|Holmes|Watson|Irene|Adler|John|Baker", Options: Compiled)

| Result | Ratio | Operating System | Bit |
|--------|-------|------------------|-----|
| Slower | 0.43 | Windows 11 | X64 |
| Slower | 0.45 | Windows 11 | X64 |
| Slower | 0.46 | Windows 10 | X64 |
| Slower | 0.42 | Windows 11 | X64 |
| Slower | 0.46 | Windows 10 | X64 |
| Slower | 0.42 | Windows 11 | X64 |
| Slower | 0.43 | Windows 11 | X64 |
| Slower | 0.44 | Windows 11 | X64 |
| Slower | 0.43 | Windows 11 | X64 |
| Slower | 0.43 | ubuntu 20.04 | X64 |
| Slower | 0.45 | ubuntu 18.04 | X64 |
| Slower | 0.40 | centos 7 | X64 |
| Slower | 0.43 | ubuntu 18.04 | X64 |
| Slower | 0.44 | alpine 3.13 | X64 |
| Slower | 0.43 | ubuntu 18.04 | X64 |
| Slower | 0.44 | ubuntu 20.04 | X64 |
| Slower | 0.47 | Windows 10 | Arm64 |
| Slower | 0.32 | Windows 10 | X86 |
| Slower | 0.53 | Windows 10 | Arm |
| Slower | 0.43 | macOS Big Sur 11.6.3 | X64 |
| Slower | 0.43 | macOS Big Sur 11.4 | X64 |

System.Text.RegularExpressions.Tests.Perf_Regex_Common.CtorInvoke(Options: Compiled)

| Result | Ratio | Operating System | Bit |
|--------|-------|------------------|-----|
| Slower | 0.45 | Windows 11 | X64 |
| Slower | 0.48 | Windows 11 | X64 |
| Slower | 0.46 | Windows 10 | X64 |
| Slower | 0.48 | Windows 11 | X64 |
| Slower | 0.45 | Windows 10 | X64 |
| Slower | 0.46 | Windows 11 | X64 |
| Slower | 0.43 | Windows 11 | X64 |
| Slower | 0.44 | Windows 11 | X64 |
| Slower | 0.51 | Windows 11 | X64 |
| Slower | 0.51 | ubuntu 20.04 | X64 |
| Slower | 0.48 | ubuntu 18.04 | X64 |
| Slower | 0.53 | centos 7 | X64 |
| Slower | 0.51 | ubuntu 18.04 | X64 |
| Slower | 0.48 | alpine 3.13 | X64 |
| Slower | 0.47 | ubuntu 18.04 | X64 |
| Slower | 0.46 | ubuntu 20.04 | X64 |
| Slower | 0.57 | Windows 10 | Arm64 |
| Slower | 0.45 | Windows 10 | X86 |
| Slower | 0.59 | Windows 10 | Arm |
| Slower | 0.52 | macOS Big Sur 11.6.3 | X64 |
| Slower | 0.52 | macOS Big Sur 11.4 | X64 |

(cc @stephentoub )

@ghost commented Feb 11, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.


Author: adamsitnik
Assignees: -
Labels: area-System.Text.RegularExpressions, tenet-performance
Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 11, 2022
@EgorBo (Member) commented Feb 11, 2022

Perf_Regex_Industry_RustLang_Sherlock should be slightly better in .NET 7.0 Preview 2 with #64872

But overall, the benchmark hits the weakest spot in the new IndexOf(substring) algorithm: a large text (0.5 MB) with a low density of the (uppercase) characters 'S', 'H', 'I'. If we try to mitigate it, we might lose significant wins for other cases, e.g. #64381 (comment).

@stephentoub (Member) commented

> But overall the benchmark hits the weakest spot in the new IndexOf(substring) algorithm - large text (0.5Mb) with low density of chars (upercased) 'S', 'H', 'I'.

While I might enjoy passing the blame to you for this one, @EgorBo 😜, I don't think this is at all your fault.

First, for anyone reading, I'll call out that while this one specific test regressed, overall things have improved dramatically (and that's on top of all the previous improvements in .NET 5). Looking at @adamsitnik's cited report above, searching for "Slower" yields 474 results, "Same" yields 1,555 results, and "Faster" yields 2,318 results, so many more benchmarks improved than regressed. And some of those improvements are staggering, e.g.:

- 432x faster (screenshot of benchmark result)
- 150x faster (screenshot of benchmark result)
- 48x faster (screenshot of benchmark result)

Etc.

Regex is one of those areas where it's difficult to look at just one result and make claims about improvements or regressions, as many optimizations play a guessing game about which scenarios are most common and optimize for those, potentially at the expense of others (as with many optimizations, something common improves significantly in exchange for something expected to be less common regressing a bit).

In this particular case, the pattern "Sherlock|Holmes|Watson|Irene|Adler|John|Baker" suffers from a change that is typically beneficial but here is a step backwards. To find a match, we want to skip ahead as quickly as possible to the next position that could possibly match. Previously, we would do that by looking at the first character of each branch (so 'S', 'H', 'W', 'I', 'A', 'J', and 'B'). That set is too large to be vectorized by IndexOfAny as it stands today, so we instead build a bitmap lookup and walk the input character by character, looking for the first character that has a bit set.

However, .NET 7 now recognizes that, although there are 7 branches, other positions have overlap: it sees that if it were to instead focus on the second letter of each branch rather than the first, that would yield the set 'h', 'o', 'a', 'r', 'd', which is only 5 characters, and IndexOfAny does vectorize when there are 5 inputs. So in .NET 7, rather than going character by character through the input, it instead does an IndexOfAny("adhor"). This is typically a good choice, because it's able to process 8 or 16 characters at a time rather than 1 at a time. But in this particular case, there are lots of those letters in the input, so the vectorized loop ends up stopping frequently to examine whether it's at a good place to attempt a full match, and that ends up costing us. Hence the regression. I knew this at the time I made the change, which improved other things as much as and frequently more than it hurt this particular case.

There's very likely tweaking that could be done in the heuristics to improve this, but it'd still be a guessing game around all the tradeoffs involved.
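The position-selection analysis described above can be illustrated with a minimal sketch, written in Python for brevity. The function name and the simplified selection rule are assumptions for illustration, not the actual RegexCompiler logic:

```python
# Hypothetical sketch (NOT the actual .NET implementation) of the
# heuristic described above: across the branches of an alternation,
# find the character position whose set of distinct characters is
# smallest, since a small enough set can be fed to a vectorized
# IndexOfAny-style search. (A candidate found at position `pos` must
# be rewound by `pos` characters before attempting the full match.)

def best_search_position(branches):
    """Return (position, char_set) minimizing the number of distinct
    characters, considering only positions present in every branch."""
    min_len = min(len(b) for b in branches)
    best = None
    for pos in range(min_len):
        chars = {b[pos] for b in branches}
        if best is None or len(chars) < len(best[1]):
            best = (pos, chars)
    return best

branches = "Sherlock|Holmes|Watson|Irene|Adler|John|Baker".split("|")
pos, chars = best_search_position(branches)
print(pos, sorted(chars))  # 1 ['a', 'd', 'h', 'o', 'r']
```

For this pattern, the first-character set has 7 distinct characters, while the second-character set has only 5, which under the tradeoff described above is small enough for the vectorized IndexOfAny path.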

It'd be great if IndexOfAny could be improved further, as Regex now makes very heavy use of it. It'd also be really cool if someone could explore a .NET implementation of Teddy (#62447).

@joperezr (Member) commented

Thanks for the analysis and for the explanation for the specific case that was called out @stephentoub. Also, thanks @adamsitnik for running these tests and for the report.

It looks like, at least for the 2 specific tests called out above, we probably don't want to fix those, since, as @stephentoub points out, these are less common cases that regressed a bit in order to provide a much larger win for more common cases.

@stephentoub, is it feasible to plan a quick look at the biggest regressions of those 474 called out, to ensure that they are all cases of "optimizing for a more common case" as opposed to accidental regressions? Or are we confident enough that this is the case and that for most accidental regressions we already have a bug logged? I mostly ask in order to know what to do with this issue.

@stephentoub (Member) commented

> is it feasible to plan to take a quick look to the biggest regressions of those 474 called out and ensuring that they are all cases of "optimizing for a more common case" as opposed to accidental regressions?

Sure.

We can also probably invest in reducing Regex ctor overhead. We've driven down execution time in part by doing more analysis during construction, and we've generally considered that a worthwhile tradeoff, but we haven't spent much time seeing whether we could do the same analysis for less, or whether we might want to skip some of the analysis when using the interpreter, etc.

@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2022
@joperezr joperezr added this to the 7.0.0 milestone Jul 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 11, 2022