JIT perf: Span.Slice(0, ...) emits unnecessary 64-bit extensions #69200

GrabYourPitchforks · 2022-05-11T18:10:31Z

When calling Span<T>.Slice(offset, length), the internal implementation 64-bit extends offset and length, then compares their sum against span.Length. This is an optimization that allows us to get away with a single comparison.

public class C {
    public static void M(ReadOnlySpan<byte> span, int offset, int length) {
        DoIt(span.Slice(offset, length));
    }
    
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void DoIt(ReadOnlySpan<byte> span)
    {
    }
}

C.M(System.ReadOnlySpan`1<Byte>, Int32, Int32)
    L0000: sub rsp, 0x38
    L0004: xor eax, eax
    L0006: mov [rsp+0x28], rax
    L000b: mov eax, edx  ; rax := 64-bit zero-extension of offset
    L000d: mov edx, r8d  ; rdx := 64-bit zero-extension of length 
    L0010: add rdx, rax  ; rdx := sum of offset and length
    L0013: mov r9d, [rcx+8]  ; r9 := 64-bit zero-extension of span.Length
    L0017: cmp rdx, r9  ; bounds check
    L001a: ja short L003c
    L001c: mov rcx, [rcx]
    L001f: add rcx, rax
    L0022: mov [rsp+0x28], rcx
    L0027: mov [rsp+0x30], r8d
    L002c: lea rcx, [rsp+0x28]
    L0031: call 0x00007ffbf1df0020
    L0036: nop
    L0037: add rsp, 0x38
    L003b: ret
    L003c: call 0x00007ffbe86e5778
    L0041: int3

However, when start = 0 is a known constant, JIT still emits the 64-bit extension before performing the comparison. This is unnecessary because a comparison of two zero-extended values will yield the same result as a comparison of the original values. It also burns a register.

    public static void M(ReadOnlySpan<byte> span, int length) {
        DoIt(span.Slice(0, length));
    }

C.M(System.ReadOnlySpan`1<Byte>, Int32)
    L0000: sub rsp, 0x38
    L0004: xor eax, eax
    L0006: mov [rsp+0x28], rax
    L000b: mov eax, edx  ; rax := unnecessary 64-bit zero-extension of length
    L000d: mov r8d, [rcx+8]  ; r8 := unnecessary 64-bit zero-extension of span.Length
    L0011: cmp rax, r8  ; bounds check
    L0014: ja short L0032
    L0016: mov rcx, [rcx]
    L0019: mov [rsp+0x28], rcx
    L001e: mov [rsp+0x30], edx
    L0022: lea rcx, [rsp+0x28]
    L0027: call 0x00007ffbf1e10020
    L002c: nop
    L002d: add rsp, 0x38
    L0031: ret
    L0032: call 0x00007ffbe86e5778
    L0037: int3

In the above example, I would have expected the three annotated instructions to be collapsed into the single instruction cmp edx, dword ptr [rcx + 8].

The text was updated successfully, but these errors were encountered:

ghost · 2022-05-11T18:10:35Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

When calling Span<T>.Slice(offset, length), the internal implementation 64-bit extends offset and length, then compares their sum against span.Length. This is an optimization that allows us to get away with a single comparison.

public class C {
    public static void M(ReadOnlySpan<byte> span, int offset, int length) {
        DoIt(span.Slice(offset, length));
    }
    
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void DoIt(ReadOnlySpan<byte> span)
    {
    }
}

C.M(System.ReadOnlySpan`1<Byte>, Int32, Int32)
    L0000: sub rsp, 0x38
    L0004: xor eax, eax
    L0006: mov [rsp+0x28], rax
    L000b: mov eax, edx  ; rax := 64-bit zero-extension of offset
    L000d: mov edx, r8d  ; rdx := 64-bit zero-extension of length 
    L0010: add rdx, rax  ; rdx := sum of offset and length
    L0013: mov r9d, [rcx+8]  ; r9 := 64-bit zero-extension of span.Length
    L0017: cmp rdx, r9  ; bounds check
    L001a: ja short L003c
    L001c: mov rcx, [rcx]
    L001f: add rcx, rax
    L0022: mov [rsp+0x28], rcx
    L0027: mov [rsp+0x30], r8d
    L002c: lea rcx, [rsp+0x28]
    L0031: call 0x00007ffbf1df0020
    L0036: nop
    L0037: add rsp, 0x38
    L003b: ret
    L003c: call 0x00007ffbe86e5778
    L0041: int3

However, when start = 0 is a known constant, JIT still emits the 64-bit extension before performing the comparison. This is unnecessary because a comparison of two zero-extended values will yield the same result as a comparison of the original values. It also burns a register.

    public static void M(ReadOnlySpan<byte> span, int length) {
        DoIt(span.Slice(0, length));
    }

C.M(System.ReadOnlySpan`1<Byte>, Int32)
    L0000: sub rsp, 0x38
    L0004: xor eax, eax
    L0006: mov [rsp+0x28], rax
    L000b: mov eax, edx  ; rax := unnecessary 64-bit zero-extension of length
    L000d: mov r8d, [rcx+8]  ; r8 := unnecessary 64-bit zero-extension of span.Length
    L0011: cmp rax, r8  ; bounds check
    L0014: ja short L0032
    L0016: mov rcx, [rcx]
    L0019: mov [rsp+0x28], rcx
    L001e: mov [rsp+0x30], edx
    L0022: lea rcx, [rsp+0x28]
    L0027: call 0x00007ffbf1e10020
    L002c: nop
    L002d: add rsp, 0x38
    L0031: ret
    L0032: call 0x00007ffbe86e5778
    L0037: int3

In the above example, I would have expected the three annotated instructions to be collapsed into the single instruction cmp edx, dword ptr [rcx + 8].

Author:	GrabYourPitchforks
Assignees:	-
Labels:	`tenet-performance`, `area-CodeGen-coreclr`
Milestone:	-

jakobbotsch · 2022-05-11T19:51:27Z

Yes, this looks quite easy. We end up with

               [000021] N---G+-N-U-                         └──▌  LE        int   
               [000015] -----+---U-                            ├──▌  CAST      long <- ulong <- uint
               [000003] -----+-----                            │  └──▌  LCL_VAR   int    V01 arg1         
               [000020] ----G+---U-                            └──▌  CAST      long <- ulong <- uint
               [000092] n---G+-----                               └──▌  IND       int   
               [000094] -----+-----                                  └──▌  ADD       byref 
               [000019] -----+-----                                     ├──▌  LCL_VAR   byref  V00 arg0         
               [000093] -----+-----                                     └──▌  CNS_INT   long   8 field offset Fseq[_length]

in morph.
I'll see if I can make some time for this.

Fix dotnet#69200

Fix #69200

GrabYourPitchforks added tenet-performance Performance related issue area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels May 11, 2022

ghost added the untriaged New issue has not been triaged by the area owner label May 11, 2022

GrabYourPitchforks mentioned this issue May 11, 2022

Optimize Span.Slice by using IsKnownConstant for start = 0 #69193

Closed

jakobbotsch self-assigned this May 11, 2022

jakobbotsch removed the untriaged New issue has not been triaged by the area owner label May 11, 2022

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue May 12, 2022

Use IntegralRange in fgOptimizeRelationalComparisonWithCasts

92b31ec

Fix dotnet#69200

jakobbotsch mentioned this issue May 12, 2022

Use IntegralRange in fgOptimizeRelationalComparisonWithCasts #69247

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label May 12, 2022

JulieLeeMSFT added this to the 7.0.0 milestone May 20, 2022

jakobbotsch closed this as completed in #69247 May 23, 2022

jakobbotsch added a commit that referenced this issue May 23, 2022

Use IntegralRange in fgOptimizeRelationalComparisonWithCasts (#69247)

534cfe3

Fix #69200

ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 23, 2022

ghost locked as resolved and limited conversation to collaborators Jun 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT perf: Span.Slice(0, ...) emits unnecessary 64-bit extensions #69200

JIT perf: Span.Slice(0, ...) emits unnecessary 64-bit extensions #69200

GrabYourPitchforks commented May 11, 2022

ghost commented May 11, 2022

jakobbotsch commented May 11, 2022

JIT perf: Span.Slice(0, ...) emits unnecessary 64-bit extensions #69200

JIT perf: Span.Slice(0, ...) emits unnecessary 64-bit extensions #69200

Comments

GrabYourPitchforks commented May 11, 2022

ghost commented May 11, 2022

jakobbotsch commented May 11, 2022