Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT code optimization inconsistency #51587

Closed
giladfrid009 opened this issue Apr 20, 2021 · 5 comments · Fixed by #78874
Closed

JIT code optimization inconsistency #51587

giladfrid009 opened this issue Apr 20, 2021 · 5 comments · Fixed by #78874
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@giladfrid009
Copy link

giladfrid009 commented Apr 20, 2021

Description

Whenever we have a type switch in a method, and the type is known compile-time, then the wrong cases get jitted away.

Indeed this happens for the following implementations:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static T AddSwitch(T left, T right)
{       
    return (left, right) switch
    {
        (byte L, byte R) => (T)(object)(byte)(L + R),
        (sbyte L, sbyte R) => (T)(object)(sbyte)(L + R),
        (ushort L, ushort R) => (T)(object)(ushort)(L + R),
        (short L, short R) => (T)(object)(short)(L + R),
        (uint L, uint R) => (T)(object)(uint)(L + R),
        (int L, int R) => (T)(object)(int)(L + R),
        (ulong L, ulong R) => (T)(object)(ulong)(L + R),
        (long L, long R) => (T)(object)(long)(L + R),
        (float L, float R) => (T)(object)(float)(L + R),
        (double L, double R) => (T)(object)(double)(L + R),
        _ => throw new NotSupportedException(typeof(T).Name)
    };
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static T AddIfs(T left, T right)
{       
    if (typeof(T) == typeof(byte)) return (T)(object)(byte)((byte)(object)left + (byte)(object)right);
    if (typeof(T) == typeof(sbyte)) return (T)(object)(sbyte)((sbyte)(object)left + (sbyte)(object)right);
    if (typeof(T) == typeof(ushort)) return (T)(object)(ushort)((ushort)(object)left + (ushort)(object)right);
    if (typeof(T) == typeof(short)) return (T)(object)(short)((short)(object)left + (short)(object)right);
    if (typeof(T) == typeof(uint)) return (T)(object)((uint)(object)left + (uint)(object)right);
    if (typeof(T) == typeof(int)) return (T)(object)((int)(object)left + (int)(object)right);
    if (typeof(T) == typeof(ulong)) return (T)(object)((ulong)(object)left + (ulong)(object)right);
    if (typeof(T) == typeof(long)) return (T)(object)((long)(object)left + (long)(object)right);
    if (typeof(T) == typeof(float)) return (T)(object)((float)(object)left + (float)(object)right);
    if (typeof(T) == typeof(double)) return (T)(object)((double)(object)left + (double)(object)right);

    throw new NotSupportedException(typeof(T).Name);
}

JIT code:

SimpleSimd.NumOps`1[[System.Int32, System.Private.CoreLib]].AddSwitch(Int32, Int32)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: lea eax, [ecx+edx]
    L0006: pop ebp
    L0007: ret

SimpleSimd.NumOps`1[[System.Int32, System.Private.CoreLib]].AddIfs(Int32, Int32)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: lea eax, [ecx+edx]
    L0006: pop ebp
    L0007: ret
  • The problem is, that while AddIfs properly gets inlined, AddSwitch does not.

Version using AddIfs:

SimpleSimd.SimdOps`1[[System.Int32, System.Private.CoreLib]].Sum(System.ReadOnlySpan`1<Int32> ByRef)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push edi
    L0004: push esi
    L0005: vzeroupper
    L0008: mov eax, [ecx]
    L000a: mov edx, [ecx+4]
    L000d: xor ecx, ecx
    L000f: vxorps ymm0, ymm0, ymm0
    L0013: mov esi, edx
    L0015: sar esi, 0x1f
    L0018: and esi, 7
    L001b: add esi, edx
    L001d: sar esi, 3
    L0020: test esi, esi
    L0022: jle short L0037
    L0024: mov edi, ecx
    L0026: shl edi, 5
    L0029: vmovupd ymm1, [eax+edi]
    L002e: vpaddd ymm0, ymm0, ymm1
    L0032: inc ecx
    L0033: cmp ecx, esi
    L0035: jl short L0024
    L0037: vpmulld ymm0, ymm0, [SimpleSimd.SimdOps`1[[System.Int32, System.Private.CoreLib]].Sum(System.ReadOnlySpan`1<Int32> ByRef)]
    L0040: vphaddd ymm0, ymm0, ymm0
    L0045: vphaddd ymm0, ymm0, ymm0
    L004a: vextractf128 xmm1, ymm0, 1
    L0050: vpaddd xmm0, xmm0, xmm1
    L0054: vmovd esi, xmm0
    L0058: shl ecx, 3
    L005b: cmp ecx, edx
    L005d: jge short L0069
    L005f: mov edi, [eax+ecx*4]
    L0062: add esi, edi
    L0064: inc ecx
    L0065: cmp ecx, edx
    L0067: jl short L005f
    L0069: mov eax, esi
    L006b: vzeroupper
    L006e: pop esi
    L006f: pop edi
    L0070: pop ebp
    L0071: ret

Version using AddSwitch:

SimpleSimd.SimdOps`1[[System.Int32, System.Private.CoreLib]].Sum(System.ReadOnlySpan`1<Int32> ByRef)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push edi
    L0004: push esi
    L0005: push ebx
    L0006: vzeroupper
    L0009: mov esi, ecx
    L000b: mov edi, [esi]
    L000d: mov ecx, [esi+4]
    L0010: xor ebx, ebx
    L0012: vxorps ymm0, ymm0, ymm0
    L0016: mov edx, ecx
    L0018: sar edx, 0x1f
    L001b: and edx, 7
    L001e: add edx, ecx
    L0020: sar edx, 3
    L0023: test edx, edx
    L0025: jle short L003a
    L0027: mov eax, ebx
    L0029: shl eax, 5
    L002c: vmovupd ymm1, [edi+eax]
    L0031: vpaddd ymm0, ymm0, ymm1
    L0035: inc ebx
    L0036: cmp ebx, edx
    L0038: jl short L0027
    L003a: vpmulld ymm0, ymm0, [SimpleSimd.SimdOps`1[[System.Int32, System.Private.CoreLib]].Sum(System.ReadOnlySpan`1<Int32> ByRef)]
    L0043: vphaddd ymm0, ymm0, ymm0
    L0048: vphaddd ymm0, ymm0, ymm0
    L004d: vextractf128 xmm1, ymm0, 1
    L0053: vpaddd xmm0, xmm0, xmm1
    L0057: vmovd eax, xmm0
    L005b: shl ebx, 3
    L005e: cmp ebx, ecx
    L0060: jge short L0073
    L0062: mov edx, [edi+ebx*4]
    L0065: mov ecx, eax
    L0067: call dword ptr [0x822ec34]
    L006d: inc ebx
    L006e: cmp ebx, [esi+4]
    L0071: jl short L0062
    L0073: vzeroupper
    L0076: pop ebx
    L0077: pop esi
    L0078: pop edi
    L0079: pop ebp
    L007a: ret

The complete example can be found in SharpLab in the following link: Click me

Expected behavior:

both get inlined.

@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Apr 20, 2021
@AndyAyersMS
Copy link
Member

Same issue as in #41692. The method is large (~900+ bytes of IL) and runs afoul of the budget check.

BudgetCheck: for IL Size 927, timeDelta 1840 +  currentEstimate 357 > currentBudget 1678

INLINER: during 'fgInline' result 'failed this call site' reason 'inline exceeds budget' for 'SimdOps`1:Sum(byref):int' calling 'NumOps`1:AddSwitch(int,int):int'

@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Apr 21, 2021
@AndyAyersMS AndyAyersMS added this to the 6.0.0 milestone Apr 21, 2021
@AndyAyersMS AndyAyersMS self-assigned this Apr 21, 2021
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Apr 21, 2021
… opportunities

Look for IL patterns in an inlinee that are indicative of type folding. Also track
when an inlinee has calls that are intrinsics.

Use this to boost the inlinee profitability estimate.

Also, allow an aggressive inline inline candidate method to go over the budget when
it contains type folding or intrinsics (and so the method may be simpler than it appears).
Remove the old criteria (top-level inline) which was harder to reason about.

Closes dotnet#43761.
Closes dotnet#41692.
Closes dotnet#51587.
Closes dotnet#47434.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 21, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 10, 2021
@AndyAyersMS
Copy link
Member

@EgorBo perhaps you can follow up here?

@AndyAyersMS
Copy link
Member

Seems like we should now have the extended default policy override DefaultPolicy::BudgetCheck and allow force inlines with (sufficient) foldable opportunities.

One way to model this is to try and guestimate the effective IL size up front, and just use that for the existing budget check: assume each foldable reduces the amount of IL by some fixed heuristic fraction f. Then if there are N foldables, the effective IL size is something like (1 - f)^N. So, for instance, with IL size of 100, 3 foldables, f = 0.25; we'd have effective IL size of 100 * (1 - 0.25) ^ 3 = 42.

(Note we don't have a flow based model to project how much each folding opportunity would "save" in terms of imported IL, and building such models is somewhat tricky).

Another option would be to keep the budget check as is but allow going over budget under more conditions, related to the relative amount of foldables per byte of IL, or some other similar measure.

@EgorBo
Copy link
Member

EgorBo commented Jul 7, 2021

One way to model this is to try and guestimate the effective IL size up front, and just use that for the existing budget check: assume each foldable reduces the amount of IL by some fixed heuristic fraction f.

That is exactly what I wanted to try, I just didn't include it in my pr because "run out of budget" didn't pop up in TE benchmarks for me, but still worth doing of course.

Without analyzing the IL, we can at least count on "NonGeneric calls Generic" and const arg is passed.

@AndyAyersMS
Copy link
Member

Not sure we're going to get to this in 6.0. So moving to future.

@AndyAyersMS AndyAyersMS modified the milestones: 6.0.0, Future Jul 12, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 26, 2022
@EgorBo EgorBo modified the milestones: Future, 8.0.0 Nov 26, 2022
@EgorBo EgorBo assigned EgorBo and unassigned AndyAyersMS Nov 26, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Dec 1, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Archived in project
3 participants