-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arm64: Regressions in System.Text.RegularExpressions. #64381
Comments
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions Issue DetailsRun Information
Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock
Reprogit clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock*' PayloadsHistogramSystem.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "The", Options: Compiled)
Description of detection logic
Description of detection logic
Description of detection logic
Description of detection logic
DocsProfiling workflow for dotnet/runtime repository
Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig
Reprogit clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig*' PayloadsHistogramSystem.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: "Twain", Options: None)
Description of detection logic
DocsProfiling workflow for dotnet/runtime repository
|
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsRun Information
Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock
Reprogit clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock*' PayloadsHistogramSystem.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "The", Options: Compiled)
Description of detection logic
Description of detection logic
Description of detection logic
Description of detection logic
DocsProfiling workflow for dotnet/runtime repository
Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig
Reprogit clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig*' PayloadsHistogramSystem.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: "Twain", Options: None)
Description of detection logic
DocsProfiling workflow for dotnet/runtime repository
|
Introduced by #63285 |
cc: @EgorBo |
Looking at it now, might be |
@kunalspathak, is this only on ARM? I was actually expecting the cited benchmarks to get better, not worse. Do we see any differences on other platforms? |
We only triaged ARM issues today, usually it takes more days for auto-perf-filer to recognize regressions/improvements. We should wait till Tuesday's Perf-Triage (x64) |
I took a look at couple of regressions and it feels like it's the worst possible case for the new algorithm - huge input (594915 chars) and we search for a value that is never found there I'll check whether data alignment will help or will change the algorithm to always use UPD data alignment won't work, one of the vectors will almost always be unaligned |
Hmm. Isn't that part of the goal of this algorithm, to do well in this case? e.g. where there's frequently a 'T' in the input but rarely followed by 'n' four chars later? |
@stephentoub The old algorithm does stop 151 times and does SequenceEquals but it's not a lot for 0.5mb data. Unfortunately, we also can't just use the single-char |
So basically the new algorithm is almost always faster except very large data where the old one doesn't stop often. The best we can do is always use the old one for very large data, at least we won't regress anybody (but also won't benefit for cases where the old one will have to stop very often in that large data) |
In large data where the old algorithm wouldn't stop often, are there even then circumstances where the new algorithm would win - where the search pattern was long and ultimately did not match, perhaps? |
Why is the "very large" relevant here? isn't it really about the density of matches of the first element, regardless of length (above the vectorized threshold)? Also, some of the regressions involved characters that are presumably much more frequent, like 'S'? |
Yes, if the value is also large it gives the new algorithm an advantage.
You're right 👍
Well, there are 837 'S' symbols in that text (594915 char symbols) leading to 'S' per 711 chars on average = 88 LoadVectors for the new algorithm |
Btw, 'S' case is optimized with #64872
\Core_Root_PR\corerun.exe - is #64872 |
Run Information
Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock
Test Report
Repro
Payloads
Baseline
Compare
Histogram
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "The", Options: Compiled)
Description of detection logic
Description of detection logic
Description of detection logic
Description of detection logic
Docs
Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository
Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig
Test Report
Repro
Payloads
Baseline
Compare
Histogram
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: "Twain", Options: None)
Description of detection logic
Docs
Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository
The text was updated successfully, but these errors were encountered: