-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Unroll Equals and StartsWith for constant strings and spans. [0..32] chars #65288
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis PR unrolls This PR handles the following APIs: String.Equals(string s1, string s2)
String.Equals(string s1, string s2, StringComparison.Ordinal)
strObj.Equals(string s1)
strObj.Equals(string s1, StringComparison.Ordinal)
strObj.StartsWith(string s1, StringComparison.Ordinal)
MemoryExtensions.SequenceEqual<char>(ROS<char> s1, ROS<char> s2)
MemoryExtensions.Equals(ROS<char> s1, ROS<char> s2, StringComparison.Ordinal)
MemoryExtensions.StartsWith(ROS<char> s1, ROS<char> s2, StringComparison.Ordinal) It unrolls & vectorizes them when either s1 or s2 are constants for [0..32] utf16 chars range using SWAR, SSE, AVX or AdvSimd (arm64). The existing code can be re-used to handle:
Motivation:I've seen quite a few Also, #65222 relies on this PR. Example:bool Test1(string s) => s.StartsWith("https://", StringComparison.Ordinal);
bool Test2(string s) => s == "Access-Control-Allow-Credentials"; New codegen: ; Method Tests:Test1(System.String):bool
vzeroupper
cmp dword ptr [rcx+8], 8 ;; <-- Length >= 8
jne SHORT G_M46665_IG04
vmovdqu xmm0, xmmword ptr [rcx+12]
vpxor xmm0, xmm0, xmmword ptr [reloc @RWD00]
vptest xmm0, xmm0
sete al
movzx rax, al
jmp SHORT G_M46665_IG05 ;; we end up doing 'movzx rax, al' twice :(
G_M46665_IG04:
xor eax, eax
G_M46665_IG05:
movzx rax, al
ret
RWD00 dq 0070007400740068h, 002F002F003A0073h
; Total bytes of code: 41
; Method Tests:Test2(System.String):bool
vzeroupper
test rcx, rcx ;; <-- is null?
je SHORT G_M48330_IG06
cmp dword ptr [rcx+8], 32 ;; <-- Length == 32
jne SHORT G_M48330_IG05
vmovdqu ymm0, ymmword ptr[rcx+12]
vpxor ymm0, ymm0, ymmword ptr[reloc @RWD00]
vmovdqu ymm1, ymmword ptr[rcx+44] ;; <- btw, the "pipeline" issue
vpxor ymm1, ymm1, ymmword ptr[reloc @RWD32]
vpor ymm0, ymm0, ymm1
vptest ymm0, ymm0
sete al
movzx rax, al
jmp SHORT G_M48330_IG07
G_M48330_IG05:
xor eax, eax
jmp SHORT G_M48330_IG07
G_M48330_IG06:
xor eax, eax
G_M48330_IG07:
movzx rax, al
vzeroupper
ret
RWD00 dq 0065006300630041h, 0043002D00730073h, 00720074006E006Fh, 0041002D006C006Fh
RWD32 dq 0077006F006C006Ch, 006500720043002Dh, 0074006E00650064h, 0073006C00610069h
; Total bytes of code: 70 There are minor block layout issues but they're unrelated. The algorithm has a pretty simple heuristic (budget) to avoid doing too many unrolls in a single method. Simple benchmark[Benchmark]
[Arguments("https://google.com")]
public bool StartsWith(string s) => s.StartsWith("https://", StringComparison.Ordinal);
NOTE: PS: Spans aren't enabled in the current commit - there is a small issue with them I am trying to resolve.
|
public static int Main() | ||
{ | ||
int testCount = 0; | ||
foreach (var method in typeof(Tests).GetMethods()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way (that avoids reflection and repeating Equals_# methods) could be by marking ValidateEquals
method public
and dup the string instances:
int testCount = 0;
foreach (string testStr in Tests.s_TestData)
{
testCount++;
string newStr = new string(testStr.ToCharArray());
Tests.ValidateEquals(testStr == newStr, testStr, newStr);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to use SG/T4 generation? Because I do need explicit literals in methods to test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicit literals
Ah, ok. 👍
(SG/T4 would pull in more infra code here than necessary. I was thinking if intention is to test other instance of same string, then we can clone it via char[] overload and reduce LOC)
I assume this is the heuristic? |
Remember that the "take advantage of the fact that strings are null-terminated" trick won't work with spans. Spans aren't guaranteed null-terminated. Edit: there are a few other patterns that can be optimized as well. For example, |
There are methods like this in which Roslyn ends up emitting hundreds of str == "...:" and it's not clear what to do with them. Anyway, maybe the limit is not worth it at all, I'll check the diffs.
Yeah I don't rely on it at all here - it's left up-for-grabs 🙂 Even Length = 3 case is handled via two Int32 loads.
That one is already optimized in Main via |
src/libraries/System.Private.CoreLib/src/System/StringComparison.cs
Outdated
Show resolved
Hide resolved
Neat! That'll teach me to look at main sources rather than 6.0 sources. 😁 |
I think I've added everything I wanted to add, it now handles Span too, full list of APIs: // 1) String.Equals(string, string) // which means `str == "..."` too
// 2) String.Equals(string, string, StringComparison.Ordinal)
// 3) str.Equals(string)
// 4) str.Equals(String, StringComparison.Ordinal)
// 5) str.StartsWith(string, StringComparison.Ordinal)
// 6) MemoryExtensions.SequenceEqual<char>(ROS<char>, ROS<char>)
// 7) MemoryExtensions.Equals(ROS<char>, ROS<char>, StringComparison.Ordinal)
// 8) MemoryExtensions.StartsWith<char>(ROS<char>, ROS<char>)
// 9) MemoryExtensions.StartsWith(ROS<char>, ROS<char>, StringComparison.Ordinal) all of them work when one of the arguments (left or right) is a string literal or I do plan to add This PR needs more tests for all kinds of cases and an additional test for cross-page boundary (just in case) and I need to investigate the diffs (SPMI ones are empty because of lack of needed context - |
Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>
Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>
/azp run runtime-coreclr outerloop, runtime-libraries-coreclr outerloop |
Azure Pipelines successfully started running 2 pipeline(s). |
Improvement on alpine-x64 dotnet/perf-autofiling-issues#3978 |
This PR unrolls
Equals
/StartsWith
for "half" constant strings and spans in [0..32] chars length.It does pretty much the same job as #64821 but in JIT this time to avoid problems listed there (e.g. low inliner's budget and throughput penalty). I decided to create a new PR to keep that as a reference ("what can be done in pure C#").
This PR handles the following APIs:
It unrolls & vectorizes them when either s1 or s2 are constants for [0..32] utf16 chars range using SWAR, SSE, AVX or AdvSimd (arm64).
The existing code can be re-used to handle:
Span.CopyTo
OrdinalIgnoreCase
for any case this PR already handles for just OrdinalEndsWith
Motivation:
I've seen quite a few
SequenceEqual
against constant data in high-perf scenarios e.g. "is input one of the well-known headers?" or e.g. TechEmpower: https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/CSharp/aspnetcore/PlatformBenchmarks/BenchmarkApplication.cs#L98 - mostly on UTF8 inputs (I plan to support those eventually), but UTF16 are also not rare.Also, #65222 relies on this PR (it introduced a couple of regressions in order to make code cleaner and BE-friendly and expects this PR to fix them)
Example:
New codegen:
There are minor block layout issues but they're unrelated. The algorithm has a pretty simple heuristic (budget) to avoid doing too many unrolls in a single method.
Simple benchmark