[Perf -26%] System.Tests.Perf_String (4) #37816

DrewScoggins · 2020-06-12T18:41:06Z

Run Information

Architecture	x64
OS	Windows 10.0.18362
Changes	diff

Regressions in System.Tests.Perf_String

Benchmark	Baseline	Test	Test/Base	Baseline Outlier
ToLower	19.36 ns	22.83 ns	1.18	False
IndexerCheckBoundCheckHoist	32.35 ns	47.41 ns	1.47	False
ToUpper	19.69 ns	22.59 ns	1.15	False
Replace_Char	12.59 ns	15.80 ns	1.25	False

Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Tests.Perf_String*';

Histogram

System.Tests.Perf_String.ToLower(s: "TEST")

[17.383 ; 18.296) | @@
[18.296 ; 19.756) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[19.756 ; 21.029) | @@@@@
[21.029 ; 22.525) | 
[22.525 ; 23.985) | @@@@@@@@@@@@@@@@@
[23.985 ; 25.194) | @
[25.194 ; 25.850) | 
[25.850 ; 27.310) | @

System.Tests.Perf_String.IndexerCheckBoundCheckHoist

[30.905 ; 35.815) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[35.815 ; 40.085) | 
[40.085 ; 45.827) | @@@@
[45.827 ; 50.738) | @@@@@@@@@@@@@@@@@@@@@@@@@

System.Tests.Perf_String.ToUpper(s: "TeSt")

[17.170 ; 18.265) | @@@
[18.265 ; 19.772) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[19.772 ; 21.112) | @@@@
[21.112 ; 22.508) | 
[22.508 ; 24.015) | @@@@@@@@@@@@@@@@@
[24.015 ; 25.273) | @
[25.273 ; 26.343) | 
[26.343 ; 27.850) | @

System.Tests.Perf_String.Replace_Char(text: "Hello", oldChar: 'l', newChar: '!')

[11.532 ; 13.074) | @@@@@@@@@@@@@@@@
[13.074 ; 14.615) | 
[14.615 ; 16.631) | @@@@@@@@@@@@@@@@@
[16.631 ; 18.420) | @@

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

The text was updated successfully, but these errors were encountered:

DrewScoggins · 2020-06-12T18:41:15Z

@adamsitnik

danmoseley · 2020-06-12T19:26:10Z

@DrewScoggins is there some way to close the gap in the graph? Are we missing all perf data from Dec-April or were just these tests broken?

Ideally we would see which build it regressed in.

DrewScoggins · 2020-06-13T00:14:34Z

The tests are not broken. We are also not missing any data. In general, for comparisons that span many months we try not to put every point in the graph. I will regenerate the graphs that do not show the actual move and update the issues accordingly.

danmoseley · 2020-06-30T16:47:27Z

Thanks @DrewScoggins. Looks like you did this. Can you clarify the Legend on the graph? The purple and orange lines have build numbers, but I think I would expect the build number to be changing every day. My mental picture is there's a single line, with a datapoint every day, is that wrong?

danmoseley · 2020-06-30T16:48:19Z

In the case of scenarios like this that are likely affected by the ICU change, a superimposed Linux measurement would likely be interesting - for example, we might see that Windows now matches Linux. (Whether we're OK with that is a separate issue)

DrewScoggins · 2020-07-01T22:57:14Z

Yeah, I should probably change the legend as it doesn't make sense here outside the context of the full report. You are correct in thinking that these are about one build per day. Also yeah, I can take a look at building out graphs that have the Windows vs Linux comparison so we can see how we look compared to that as well. I will post those in the relevant issues.

danmoseley · 2020-07-01T23:24:40Z

One other feedback, it would be good in these issue to clarify what the diff link represents. In the graphs above, it seems clear that there are 3 distinct regressions across these scenarios -- Dec, early Jan, and early March. The single diff link provided above covers 7 months , and it's out of the dotnet/installer repo. It's so huge that I get a Unicorn trying to view it. What might be ideal is 3 diff links, each of ~1 week or less out of the dotnet/runtime repo.

cc @adamsitnik @tannergooding since I know you guys are collectively tuning these issues.

danmoseley · 2020-08-12T23:36:26Z

@tarekgh @safern are you tracking this as part of ICU perf remediation work? It would be good to know how we stand with respect to NLS-ICU perf gap. I know @tarekgh you mentioned an effort to cache casing table.

tarekgh · 2020-08-13T00:15:15Z

@danmosemsft I am focusing on the ordinal ignore casing (the high priority issue). This issue is not ordinal. I cannot cache all possible casing for all possible cultures. The best we can do here is to look if there is a way to optimize when calling ICU.

adamsitnik · 2020-08-26T07:19:52Z

FWIW The Replace_Char benchmark regressed for most of the configs:

System.Tests.Perf_String.Replace_Char(text: "Hello", oldChar: 'l', newChar: '!')

Conclusion	Base	Diff	Base/Diff	Modality	Operating System	Bit	Processor Name	Base Runtime	Diff Runtime
Slower	26.14	28.24	0.93		Windows 10.0.18363.1016	Arm	Microsoft SQ1 3.0 GHz	.NET Core 3.1.6	5.0.100-rc.1.20413.9
Slower	12.01	14.30	0.84		Windows 10.0.18363.959	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20404.3
Slower	12.34	15.04	0.82		Windows 10.0.18363.959	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20418.3
Slower	14.66	17.56	0.83		Windows 10.0.19041.450	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	.NET Core 3.1.6	5.0.100-rc.1.20413.9
Same	13.48	14.02	0.96		Windows 10.0.19041.450	X64	Intel Core i7-6700 CPU 3.40GHz (Skylake)	.NET Core 3.1.6	5.0.100-rc.1.20419.9
Slower	11.52	13.17	0.87		Windows 10.0.19042	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	.NET Core 3.1.6	5.0.100-rc.1.20418.3
Same	14.34	15.29	0.94	bimodal	Windows 10.0.19041.450	X64	Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)	.NET Core 3.1.6	5.0.100-rc.1.20419.14
Slower	16.14	17.13	0.94		Windows 10.0.18363.959	X86	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20420.14
Slower	18.58	21.28	0.87		Windows 10.0.19041.450	X86	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	.NET Core 3.1.6	5.0.100-rc.1.20419.5
Slower	35.09	53.71	0.65	several?	ubuntu 18.04	Arm64	Unknown processor	.NET Core 3.1.6	6.0.100-alpha.1.20421.6
Slower	14.77	17.57	0.84		ubuntu 18.04	X64	Intel Xeon CPU E5-1650 v4 3.60GHz	.NET Core 3.1.6	5.0.100-rc.1.20403.23
Slower	16.83	21.49	0.78		macOS Mojave 10.14.5	X64	Intel Core i7-5557U CPU 3.10GHz (Broadwell)	.NET Core 3.1.6	5.0.100-rc.1.20404.2
Slower	11.55	21.53	0.54		alpine 3.11	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	.NET Core 3.1.6	6.0.100-alpha.1.20421.6
Slower	16.97	18.69	0.91		ubuntu 18.04	X64	Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)	.NET Core 3.1.6	5.0.100-rc.1.20418.3

jeffhandley · 2020-09-02T18:51:40Z

@tarekgh Do you want to leave this open for 5.0.0 or should we move to 6.0.0/Future?

tarekgh · 2020-09-02T19:26:13Z

@jeffhandley please move them to 6.0 for now and we can bring it back if we'll address it soon.

Gnbrkm41 · 2020-09-04T15:59:07Z

I suspect that #279 impacted the performance for this particular case - the minimum size of the string for entering the vectorised path is 8 chars (SSE2/NEON) or 16 chars (if AVX2 is available), and the additional checks / etc may have introduced additional operations that regresses the path for short strings.

Do we believe this change is worth it? While it definitely does benefit long strings, it might be useful to collect informations on the length of the string that this method usually gets called with, and see if the majority of the strings are short enough to not enter the vectorised path.

Another thing to consider might be whether we think the difference in actual nanoseconds are significant enough, since while those added initial set-up overheads are likely to overwhelm the difference in percentages it might be not that much of the time lost. I'm seeing regressions of between 0.6ns best case and 18ns worst case (assuming those are in nanoseconds).

I think this is something that's going to show up fairly frequently with vectorisations of variable-length data in general. For example, we had the same issue with BitArray: #37813

tannergooding · 2020-09-04T16:26:09Z

I wonder if there is something we could do for vectorized code paths that would allow the initial size check to be optimized away.

That is, much of the code is in a pattern such as:

if (Avx2.IsSupported && (data.length >= 32))
{
    VectorizedPath();
}
else
{
    SoftwareFallback();
}

If data.length is a known constant, then inlining the method would allow it to be a direct call to SoftwareFallback or to VectorizedPath which would lessen the impact that vectorization has on short inputs.
However, a non-constant length may make it undesirable to inline due to increased bloat....

C++ allows you to specialize for these cases using templates, but we don't have an equivalent in .NET today and so you can't really say "Do this if the input is 4 bytes and this if the input is greater than 128 bytes"

tarekgh · 2020-09-04T17:10:07Z

@Gnbrkm41 it may better we confirm first #279 is the culprit and then we can try to think how we fix it. Is this something you can help confirming first?

Gnbrkm41 · 2020-09-04T17:20:16Z

@tarekgh, Looks like I didn't forget to include a benchmark in the original PR fortunately so yes: dotnet/coreclr#27798

Specifically, dotnet/coreclr#27798 (comment) is the relevant part. Quoting the comment from below - you can see the exact same case regressing on my machine which has i7-8700. Not exactly sure which version of Windows I used at the time but it was one of the 10 Insider builds.

C:\Users\gotos\source\repos\performance\src\tools\ResultsComparer>dotnet run --base C:\Users\gotos\desktop\replace-before --diff C:\users\gotos\desktop\replace-after --threshold 5%
summary:
better: 4, geomean: 2.626
worse: 1, geomean: 1.204
total diff: 5

Slower	diff/base	Base Median (ns)	Diff Median (ns)	Modality
`"Hello".Replace('l', '!')`	1.20	10.49	12.63

Faster	base/diff	Base Median (ns)	Diff Median (ns)	Modality
`"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut id tempor metus.".Replace('z', 'y')`	5.19	46.50	8.97
`"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut id tempor metus.".Replace('e', 'E')`	2.79	66.68	23.93
`"This is a very nice sentence".Replace('z', 'y')`	2.21	14.65	6.62
`"This is a very nice sentence".Replace('i', 'I')`	1.49	28.97	19.47	several?

(Also ran "Hello".Replace('a', 'b'), but there was no difference at 5% threshold)

tarekgh · 2020-09-04T17:34:38Z

Thanks @Gnbrkm41 for the details.

Do we have some idea about the min string length that we start see the perf improvement with?

@tannergooding looks data.Length looks is not constant I guess which means inlining the method wouldn't help according to your comment. is there any other idea can be tried?

CC @GrabYourPitchforks @jkotas

jkotas · 2020-09-04T18:09:22Z

This method allocates new strings. It makes it unlikely to be used in the most performance critical scenarios. It is nice that it is vectorized and significantly faster on long strings now, but I do not think it is worth it to spend time on this or making the code more complex to match the earlier performance for small strings.

tarekgh · 2020-09-04T18:16:41Z

sounds good. I am closing this one and we can revisit if the issue raised in any customer scenario.

DrewScoggins added the tenet-performance-benchmarks Issue from performance benchmark label Jun 12, 2020

Dotnet-GitSync-Bot added area-System.Runtime untriaged New issue has not been triaged by the area owner labels Jun 12, 2020

danmoseley mentioned this issue Jul 2, 2020

Regression in System.Tests.Perf_String.ToLowerInvariant #31453

Closed

DrewScoggins added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark and removed tenet-performance-benchmarks Issue from performance benchmark labels Jul 7, 2020

joperezr removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2020

joperezr added this to the 5.0.0 milestone Jul 7, 2020

DrewScoggins added os-windows arch-x64 labels Jul 10, 2020

jeffhandley modified the milestones: 5.0.0, 6.0.0 Sep 2, 2020

adamsitnik mentioned this issue Sep 4, 2020

.NET 5.0 Microbenchmarks Performance Study Report #41871

Closed

21 tasks

tarekgh closed this as completed Sep 4, 2020

ghost locked as resolved and limited conversation to collaborators Dec 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Perf -26%] System.Tests.Perf_String (4) #37816

[Perf -26%] System.Tests.Perf_String (4) #37816

DrewScoggins commented Jun 12, 2020 •

edited

Loading

Histogram

System.Tests.Perf_String.ToLower(s: "TEST")

System.Tests.Perf_String.IndexerCheckBoundCheckHoist

System.Tests.Perf_String.ToUpper(s: "TeSt")

System.Tests.Perf_String.Replace_Char(text: "Hello", oldChar: 'l', newChar: '!')

Docs

DrewScoggins commented Jun 12, 2020

danmoseley commented Jun 12, 2020

DrewScoggins commented Jun 13, 2020

danmoseley commented Jun 30, 2020

danmoseley commented Jun 30, 2020

DrewScoggins commented Jul 1, 2020

danmoseley commented Jul 1, 2020

danmoseley commented Aug 12, 2020

tarekgh commented Aug 13, 2020

adamsitnik commented Aug 26, 2020

jeffhandley commented Sep 2, 2020

tarekgh commented Sep 2, 2020

Gnbrkm41 commented Sep 4, 2020 •

edited

Loading

tannergooding commented Sep 4, 2020

tarekgh commented Sep 4, 2020

Gnbrkm41 commented Sep 4, 2020

tarekgh commented Sep 4, 2020

jkotas commented Sep 4, 2020

tarekgh commented Sep 4, 2020

[Perf -26%] System.Tests.Perf_String (4) #37816

[Perf -26%] System.Tests.Perf_String (4) #37816

Comments

DrewScoggins commented Jun 12, 2020 • edited Loading

Run Information

Regressions in System.Tests.Perf_String

Repro

Histogram

System.Tests.Perf_String.ToLower(s: "TEST")

System.Tests.Perf_String.IndexerCheckBoundCheckHoist

System.Tests.Perf_String.ToUpper(s: "TeSt")

System.Tests.Perf_String.Replace_Char(text: "Hello", oldChar: 'l', newChar: '!')

Docs

DrewScoggins commented Jun 12, 2020

danmoseley commented Jun 12, 2020

DrewScoggins commented Jun 13, 2020

danmoseley commented Jun 30, 2020

danmoseley commented Jun 30, 2020

DrewScoggins commented Jul 1, 2020

danmoseley commented Jul 1, 2020

danmoseley commented Aug 12, 2020

tarekgh commented Aug 13, 2020

adamsitnik commented Aug 26, 2020

System.Tests.Perf_String.Replace_Char(text: "Hello", oldChar: 'l', newChar: '!')

jeffhandley commented Sep 2, 2020

tarekgh commented Sep 2, 2020

Gnbrkm41 commented Sep 4, 2020 • edited Loading

tannergooding commented Sep 4, 2020

tarekgh commented Sep 4, 2020

Gnbrkm41 commented Sep 4, 2020

tarekgh commented Sep 4, 2020

jkotas commented Sep 4, 2020

tarekgh commented Sep 4, 2020

DrewScoggins commented Jun 12, 2020 •

edited

Loading

Gnbrkm41 commented Sep 4, 2020 •

edited

Loading