-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT output for various empty string test methods extremely suboptimal #63201
Comments
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsDescriptionThere is no 'standard' way to test if a string is empty. I've seen I tested the various ways of performing said comparisons, including against ConfigurationTested x86, x64, .NET Framework, and Roslyn. DataThe test code was as follows: using System.Linq;
public static class C {
public static bool Equality0(string s) => s == "";
public static bool Equality1(string s) => s.Equals("");
public static bool Equality2(string s) => object.Equals(s, "");
public static bool Equality3(string s) => string.Equals(s, "");
public static bool Equality4(string s) => s == string.Empty;
public static bool Equality5(string s) => s.Equals(string.Empty);
public static bool Equality6(string s) => object.Equals(s, string.Empty);
public static bool Equality7(string s) => string.Equals(s, string.Empty);
public static bool Equality8(string s) => s.Length == 0;
public static bool Equality9(string s) => !s.Any();
public static bool NullEquality0(string s) => string.IsNullOrEmpty(s);
public static bool NullEquality1(string s) => (s?.Length ?? 0) == 0;
public static bool NullEquality2(string s) => (s is null) ? true : s.Length == 0;
public static bool NullEquality3(string s) => !(s is string t && t.Length != 0);
} AnalysisThe output of these was as follows, for the x64 JIT, with an analysis of ; Core CLR 6.0.21.52210 on amd64
C.Equality0(System.String)
; It first checks if the object and String.Empty are the same object
L0000: push rax
L0001: mov rax, 0x2ec8b763020 ; String.Empty
L000b: cmp rcx, [rax]
L000e: je short L0053
; If they are not, it then checks if the object is null
L0010: test rcx, rcx
L0013: je short L001b
; If it is not, it then checks if its length is 0
L0015: cmp dword ptr [rcx+8], 0
L0019: je short L0022
L001b: xor eax, eax
L001d: add rsp, 8
L0021: ret
; If the length _is_ 0, it then uses `SequenceEqual` to check if it has the same (non-existent) data as String.Empty
L0022: lea rdx, [rcx+0xc]
L0026: mov r8, 0x2ec8b763020 ; String.Empty
L0030: mov r8, [r8]
L0033: add r8, 0xc
L0037: mov [rsp], r8
L003b: mov ecx, [rcx+8]
L003e: add ecx, ecx
L0040: mov r8d, ecx
L0043: mov rcx, rdx
L0046: mov rdx, [rsp]
L004a: add rsp, 8
L004e: jmp 0x00007ffae0dfafa0 ; System.SpanHelpers.SequenceEqual(Byte ByRef, Byte ByRef, UIntPtr)
L0053: mov eax, 1
L0058: jmp short L001d
C.Equality1(System.String)
; I'm not sure what this `cmp` is for, its result is clobbered immediately
L0000: cmp [rcx], ecx
; It checks if the object and String.Empty are the same object (inefficiently)
L0002: mov r8, 0x2ec8b763020 ; String.Empty
L000c: mov rdx, [r8]
L000f: cmp rcx, rdx
L0012: je short L0035
; If it is not, it then checks if its length is 0
L0014: cmp dword ptr [rcx+8], 0
L0018: je short L001e
L001a: xor eax, eax
L001c: jmp short L003a
; If the length _is_ 0, it then uses `SequenceEqual` to check if it has the same (non-existent) data as String.Empty
L001e: lea rax, [rcx+0xc]
L0022: add rdx, 0xc
L0026: mov r8d, [rcx+8]
L002a: add r8d, r8d
L002d: mov rcx, rax
L0030: jmp 0x00007ffae0dfafa0 ; System.SpanHelpers.SequenceEqual(Byte ByRef, Byte ByRef, UIntPtr)
L0035: mov eax, 1
L003a: ret
; largely the same
C.Equality2(System.String)
L0000: mov rax, 0x2ec8b763020 ; String.Empty
L000a: mov rdx, [rax]
L000d: cmp rcx, rdx
L0010: jne short L0019
L0012: mov eax, 1
L0017: jmp short L002d
L0019: test rcx, rcx
L001c: je short L0045
L001e: cmp rcx, rdx
L0021: je short L0049
L0023: mov eax, [rcx+8]
L0026: cmp eax, [rdx+8]
L0029: je short L002e
L002b: xor eax, eax
L002d: ret
L002e: lea rax, [rcx+0xc]
L0032: add rdx, 0xc
L0036: mov r8d, [rcx+8]
L003a: add r8d, r8d
L003d: mov rcx, rax
L0040: jmp 0x00007ffae0dfafa0 ; System.SpanHelpers.SequenceEqual(Byte ByRef, Byte ByRef, UIntPtr)
L0045: xor eax, eax
L0047: jmp short L002d
L0049: mov eax, 1
L004e: jmp short L002d
; largely the same
C.Equality3(System.String)
L0000: push rax
L0001: mov rax, 0x2ec8b763020 ; String.Empty
L000b: cmp rcx, [rax]
L000e: je short L0053
L0010: test rcx, rcx
L0013: je short L001b
L0015: cmp dword ptr [rcx+8], 0
L0019: je short L0022
L001b: xor eax, eax
L001d: add rsp, 8
L0021: ret
L0022: lea rdx, [rcx+0xc]
L0026: mov r8, 0x2ec8b763020 ; String.Empty
L0030: mov r8, [r8]
L0033: add r8, 0xc
L0037: mov [rsp], r8
L003b: mov ecx, [rcx+8]
L003e: add ecx, ecx
L0040: mov r8d, ecx
L0043: mov rcx, rdx
L0046: mov rdx, [rsp]
L004a: add rsp, 8
L004e: jmp 0x00007ffae0dfafa0 ; System.SpanHelpers.SequenceEqual(Byte ByRef, Byte ByRef, UIntPtr)
L0053: mov eax, 1
L0058: jmp short L001d
; Forwards to `System.String.Equals`
C.Equality4(System.String)
L0000: mov rdx, 0x2ec8b763020 ; String.Empty
L000a: mov rdx, [rdx]
L000d: jmp 0x00007ffae0df89f0 ; call System.String.Equals(System.String, System.String)
C.Equality5(System.String)
L0000: mov rdx, 0x2ec8b763020 ; String.Empty
L000a: mov rdx, [rdx]
L000d: mov rax, [0x7ffae0e9d770] ; I'm not sure
L0014: cmp [rcx], ecx
L0016: jmp rax
C.Equality6(System.String)
; Checks if they're literally the same object
L0000: mov rdx, 0x2ec8b763020 ; String.Empty
L000a: mov rdx, [rdx]
L000d: cmp rcx, rdx
L0010: jne short L0019
L0012: mov eax, 1
L0017: jmp short L002a
; null test
L0019: test rcx, rcx
L001c: je short L0028
; I'm guessing a jump to Object.Equals?
L001e: mov rax, [0x7ffae0e9d750]
L0025: jmp rax
L0028: xor eax, eax
L002a: ret
; Forwards to `System.String.Equals`
C.Equality7(System.String)
L0000: mov rdx, 0x2ec8b763020 ; String.Empty
L000a: mov rdx, [rdx]
L000d: jmp 0x00007ffae0df89f0 ; System.String.Equals(System.String, System.String)
; Just returns if length was zero; how I would expect _each_ of these to work.
C.Equality8(System.String)
L0000: cmp dword ptr [rcx+8], 0
L0004: sete al
L0007: movzx eax, al
L000a: ret
; Makes a call to `Linq.Enumerable.Any`, and returns 1 if it's zero, otherwise it returns 0.
C.Equality9(System.String)
L0000: sub rsp, 0x28
L0004: call 0x00007ffae62b6178 ; System.Linq.Enumerable.Any[[System.Char, System.Private.CoreLib]](System.Collections.Generic.IEnumerable`1<Char>)
L0009: test eax, eax
L000b: sete al
L000e: movzx eax, al
L0011: add rsp, 0x28
L0015: ret
; Checks if it's null, then checks the length for zero. Fairly optimal.
C.NullEquality0(System.String)
L0000: test rcx, rcx
L0003: je short L000e
L0005: cmp dword ptr [rcx+8], 0
L0009: je short L000e
L000b: xor eax, eax
L000d: ret
L000e: mov eax, 1
L0013: jmp short L000d ; I am unsure why this is a `jmp` to a `ret` instead of a `ret`.
; Very similar to NullEquality0
C.NullEquality1(System.String)
L0000: test rcx, rcx
L0003: jne short L0009
L0005: xor eax, eax
L0007: jmp short L000c
L0009: mov eax, [rcx+8]
L000c: test eax, eax
L000e: sete al
L0011: movzx eax, al
L0014: ret
; Very similar to NullEquality0
C.NullEquality2(System.String)
L0000: test rcx, rcx
L0003: je short L0010
L0005: cmp dword ptr [rcx+8], 0
L0009: sete al
L000c: movzx eax, al
L000f: ret
L0010: mov eax, 1
L0015: ret
; Very similar to NullEquality0
C.NullEquality3(System.String)
L0000: test rcx, rcx
L0003: je short L0010
L0005: cmp dword ptr [rcx+8], 0
L0009: sete al
L000c: movzx eax, al
L000f: ret
L0010: mov eax, 1
L0015: ret Basically, for any string test that isn't just For such common patterns which generates fairly predictable IL, I would expect those to be hardcoded in some fashion. There also appears to be a codegen pattern where it really likes generating public static bool EQ0Test(string s) {
s = "";
return Equality0(s);
} resulted in: C.EQ0Test(System.String)
L0000: jmp short L0003
L0002: ret
L0003: mov eax, 1
L0008: jmp short L0002 Whereas what I would expect to generate would be: C.EQ0Test(System.String)
L0000: mov eax, 1
L0005: ret It instead immediately jumps to the
|
Duplicates #41779 (I'll try to file a PR out of my prototype in #41779 (comment))
Thanks, this one is interesting, sharplab: https://sharplab.io/#v2:C4LghgzgtgPgAgBgARwIwDoAyBLAdgRwG4BYAKDLgGYVUA2FAJiQGEkBvMpLlategIwD2ggDZIAovgCuYEdmABPBAAo0yCAEokAXgB8SCDu1IARCZKluSTtyo0BwseICKCACoBTCMFWp1Gmy4OUgBIK0NjMwsw2wB2CWlZeSVlTQsrAF8yDKA=== UPD ^ is fixed in .NET 7.0 |
I was actually surprised that Since all empty strings are represented by the same object, why can't the jit use this information? |
That feels like rather a pretty fragile statement. @jkotas can we count on it? |
Yes, it feels rather fragile. I do not think it is a good idea to use this as an invariant in the JIT. |
Here's another couple patterns that also result in terrible codegen, in case it's useful 😄 static bool NullEquality1(string s) => s is not (null or "");
static bool NullEquality2(string s) => s is not null and not ""; L0000: push rsi
L0001: sub rsp, 0x30
L0005: test rcx, rcx
L0008: je short L002f
L000a: mov rax, 0x2ec8b763020
L0014: cmp rcx, [rax]
L0017: je short L006c
L0019: cmp dword ptr [rcx+8], 0
L001d: je short L0037
L001f: xor esi, esi
L0021: test esi, esi
L0023: sete al
L0026: movzx eax, al
L0029: add rsp, 0x30
L002d: pop rsi
L002e: ret
L002f: xor eax, eax
L0031: add rsp, 0x30
L0035: pop rsi
L0036: ret
L0037: lea rdx, [rcx+0xc]
L003b: mov r8, 0x2ec8b763020
L0045: mov r8, [r8]
L0048: add r8, 0xc
L004c: mov [rsp+0x28], r8
L0051: mov ecx, [rcx+8]
L0054: add ecx, ecx
L0056: mov r8d, ecx
L0059: mov rcx, rdx
L005c: mov rdx, [rsp+0x28]
L0061: call 0x00007ffae0dfafa0
L0066: movzx esi, al
L006a: jmp short L0021
L006c: mov esi, 1
L0071: jmp short L0021 |
I think we can close this one, for majority of cases it's already optimized to just There are a few cases which might need additional work like |
Description
There is no 'standard' way to test if a string is empty. I've seen
str == ""
,str == string.Empty
,str.Equals("")
,str.Length == 0
, and so on.I tested the various ways of performing said comparisons, including against
null
and empty, in SharpLab, and was rather aghast at the results.Configuration
Tested x86, x64, .NET Framework, and Roslyn.
Data
SharpLab link to test.
The test code was as follows:
Analysis
The output of these was as follows, for the x64 JIT, with an analysis of
Equality0
:Basically, for any string test that isn't just
s.Length == 0
, it generates rather atrocious code.s == ""
is one of the most common ways to check a string for emptiness, and it generates the worst code that in the case where it is empty, takes the slowest-possible path.For such common patterns which generates fairly predictable IL, I would expect those to be hardcoded in some fashion.
There also appears to be a codegen pattern where it really likes generating
jmp
instructions that go directly toret
instructions, instead of justret
ing. I was able to come up with code that made this obvious:resulted in:
Whereas what I would expect to generate would be:
It instead immediately jumps to the
mov
, and then jumps to theret
.The text was updated successfully, but these errors were encountered: