JIT emitting unnecessary code for non-emitted method call #34641

michkot · 2020-04-07T16:29:45Z

Both x86 and x64 JIT emits unnecessary stack manipulation code (if I read it correctly) for the following (+- minimal example) code snippets of generic code with some asm-compile-time deduction. It seems that it emits the stack manipulation code required for method-call that is asm-compile-time unreachable.
Tested in Sharplab, Core CLR v4.700.19.57202.
sharplab.io link with more faulty variatons

Note: I stumbled opon this, when optimizing hot-path "is default" generic check, because I was not satisfied with this suggestion: https://stackoverflow.com/questions/65351/null-or-default-comparison-of-generic-argument-in-c-sharp/864860#864860

class C<TVal> and TVal can be any class at runtime

only wrong in x86:

  public static bool IsDefaultNormalDirect(TVal val)
  {
     bool isTValReferenceType = default(TVal) == null;
     if (isTValReferenceType)
     {
        return (val == null);
     }
     return val.Equals(default(TVal));
  }

x86:

C`1[[System.__Canon, System.Private.CoreLib]].IsDefaultNormalDirect(System.__Canon)
    L0000: push ebp <<---
    L0001: mov ebp, esp <<---
    L0003: test ecx, ecx
    L0005: setz al
    L0008: movzx eax, al
    L000b: pop ebp <<---
    L000c: ret

x64:

C`1[[System.__Canon, System.Private.CoreLib]].IsDefaultNormalDirect(System.__Canon)
    L0000: test rdx, rdx
    L0003: setz al
    L0006: movzx eax, al
    L0009: ret

======

wrong in both:

  public static bool IsDefaultValueTypes(TVal val)
  {
    return val.Equals(default(TVal));
  }
  public static bool IsDefaultNormalA(TVal val)
  {
     bool isTValReferenceType = default(TVal) == null;
     if (isTValReferenceType)
     {
        return (val == null);
     }
     return IsDefaultValueTypes(val);
  }

x86:

C`1[[System.__Canon, System.Private.CoreLib]].IsDefaultNormalA(System.__Canon)
    L0000: push ebp <<---
    L0001: mov ebp, esp <<---
    L0003: push eax <<---
    L0004: mov [ebp-0x4], edx <<---
    L0007: test ecx, ecx
    L0009: setz al
    L000c: movzx eax, al
    L000f: mov esp, ebp <<---
    L0011: pop ebp <<---
    L0012: ret

x64:

C`1[[System.__Canon, System.Private.CoreLib]].IsDefaultNormalA(System.__Canon)
    L0000: push rax <<---
    L0001: mov [rsp], rcx <<---
    L0005: test rdx, rdx
    L0008: setz al
    L000b: movzx eax, al
    L000e: add rsp, 0x8 <<---
    L0012: ret

The text was updated successfully, but these errors were encountered:

EgorBo · 2020-04-07T17:54:36Z

I guess it's #32094 (comment)

AndyAyersMS · 2020-04-07T18:01:11Z

I guess it's #32094 (comment)

Right, the "wrong in both case" seems to be the jit saving the generic context to report it to the VM.

The earlier cases are something different, likely we think we need a frame at one point early on and then optimize something, and don't realize we no longer need a frame.

michkot · 2020-04-09T18:11:03Z

In the #32094 (comment) it's a generic method body. Here, in the second example, it is method in a generic class that (might) call another method of the same class. Does it make a difference?

Why am I asking: The think that catches my eye is that the x64 version is short when the eliminated call is non-generic (first example). If the eliminated call is generic (second example), the "generic context" is stuck there.
Is there a contract saying the "generic context" must be there for the VM if there is a generic CALL inside the method body in IL? //edit: The generic parametr itself is used in both examples though (initobj !TVal).

I am sorry if I am wrong with reasoning.

AndyAyersMS · 2020-04-09T19:10:30Z

I took a closer look at the second case and we should actually be able to remove the context reporting. There are two conditions in which we report the context (store it to the stack) -- either if the context is used somewhere in the method, or if the VM requests that the jit report the context.

In the second example the VM does not request reporting. Initially the context looks like it is used but after some optimization we remove all the uses. Unfortunately we never re-check if the context is still used and go by our early impression. I think we can fix this.

AndyAyersMS · 2020-04-09T22:11:21Z

I have a fix, but it doesn't fire very often on FX (4 methods out of 240K).

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -36 (-0.00% of base)
    diff is an improvement.
Top file improvements (bytes):
          -9 : System.Collections.Concurrent.dasm (-0.00% of base)
          -9 : System.Collections.Immutable.dasm (-0.00% of base)
          -9 : System.Linq.Expressions.dasm (-0.00% of base)
          -9 : System.Linq.Parallel.dasm (-0.00% of base)
4 total files with Code Size differences (4 improved, 0 regressed), 260 unchanged.
Top method improvements (bytes):
          -9 (-17.65% of base) : System.Collections.Concurrent.dasm - Partitioner:GetDefaultChunkSize():int (7 methods)
          -9 (-18.75% of base) : System.Collections.Immutable.dasm - ImmutableExtensions:IsValueType():bool (7 methods)
          -9 (-1.11% of base) : System.Linq.Expressions.dasm - ReadOnlyCollectionBuilder`1:ValidateNullValue(Object,String) (7 methods)
          -9 (-17.65% of base) : System.Linq.Parallel.dasm - Scheduling:GetDefaultChunkSize():int (7 methods)
Top method improvements (percentages):
          -9 (-18.75% of base) : System.Collections.Immutable.dasm - ImmutableExtensions:IsValueType():bool (7 methods)
          -9 (-17.65% of base) : System.Collections.Concurrent.dasm - Partitioner:GetDefaultChunkSize():int (7 methods)
          -9 (-17.65% of base) : System.Linq.Parallel.dasm - Scheduling:GetDefaultChunkSize():int (7 methods)
          -9 (-1.11% of base) : System.Linq.Expressions.dasm - ReadOnlyCollectionBuilder`1:ValidateNullValue(Object,String) (7 methods)
4 total methods with Code Size differences (4 improved, 0 regressed), 243786 unchanged.

Some of the accounting we do (namely lvaGenericsContextUseCount) seems flawed in other ways; in particular if we clone trees containing context references we don't bump the count, so it can be an under-estimate.

Also there is some interesting codegen in the non-shared cases that I want to look at, seems like we ought to be able to get rid of some boxing there.

@davidwrighton @janvorli is there some clear picture of when the jit must report the generic context in gc info, outside of the cases where the VM requests the reporting? Comments in the jit code refer to collectable assemblies and suggest we're over-reporting:

runtime/src/coreclr/src/jit/importer.cpp

Lines 1976 to 1983 in 9e608fc

    
           GenTree* Compiler::getRuntimeContextTree(CORINFO_RUNTIME_LOOKUP_KIND kind) 
        
           { 
        
               GenTree* ctxTree = nullptr; 
        
               // Collectible types requires that for shared generic code, if we use the generic context parameter 
        
               // that we report it. (This is a conservative approach, we could detect some cases particularly when the 
        
               // context parameter is this that we don't need the eager reporting logic.) 
        
               lvaGenericsContextUseCount++;

When the context is passed as a separate arg it is fairly easy for us to accurately determine the use count after optimization. When the context comes from this we currently don't distinguish uses of this that are intended as the starting point for runtime lookups from other uses of this, so figuring out the use count after we've optimized is not as simple. Easiest thing would be to over-count and so over-report; if we could restrict this over-counting to just methods from collectable assemblies that would probably be reasonable.

Wondering if (a) we could flag methods that are in collectable assemblies for the jit, and relax reporting for assemblies that aren't collectable, and possibly (b) relax reporting of the this case in collectable assemblies, when the VM does not ask for reporting.

davidwrighton · 2020-04-09T23:54:10Z

(a) The collectible assemblies case is for scenarios like List<T>.Add compiled into List<Canon>.Add where T is some class that is a canonical type. So, the collectability of the generic context arg is unrelated to the collectability of the code the jit is producing. So, no, flagging collectible methods in the jit doesn't help.

(b) Relaxing reporting of this in these methods is only viable if the JIT can be taught to keep reporting this as live until the last use of the generic dictionary is complete (for collectible assembleis scenarios). This is a totally weird thing to do as I understand it, but could in theory be done.

Imagine a method that looks like

class C<T>
{
    void Method(object o)
    {
        T t = (T)o;
        Console.WriteLine(t);
    }
}

Now, the JIT will generate something like the following psuedocode for C<string>.Method

prolog
MethodTable *pMT = this->MethodTablePointer
if (pMT->GenericDictionarySlot(0) == 0)
{
    call FillInGenericDictionarySlot(pMT, 0);
}
MethodTable *pMT_T = pMT->GenericDictionarySlot(0)
call CastClass(pMT_T, t)
*** At this point the this pointer is no longer required to be reported to the GC
Console.WriteLine(t)
epilog

Note that the point at which the this pointer no longer needs to be reported isn't really related to any action performed on the this pointer, but is actually associated with the last use of a value pulled out of a data structure referenced by the this pointer. For simplicities sake, when implementing collectible types, I did not attempt to model this in the jit, and simply went the route of blanket reporting the this pointer, which was a far simpler model to validate for correctness. Since then, it appears that someone attempted to be a bit smarter by building the counting construct, but as you say it doesn't appear to work all that well.

I'll have to do some research to find out about the non-collectability reasons for reporting the generics context.

davidwrighton · 2020-04-10T00:00:03Z

One note, is that for the purposes of collectability there is no need for the JIT to use the generic context reporting stack slot for reporting the this pointer, and could simply report it via completely normal GC reporting, it just needs to be reported until after the last pointer pulled from a generic dictionary is no longer in use.

AndyAyersMS · 2020-04-10T00:23:49Z

Ok, thanks.

It appears that someone attempted to be a bit smarter

I added the "ref counting" aspect in dotnet/coreclr#9756 as often times after inlining or various opts we end up not needing the runtime lookup at all, and I wanted to fully dead code them. Didn't realize until now that this may sometimes under-count.

AndyAyersMS · 2020-04-10T17:51:18Z

Also there is some interesting codegen in the non-shared cases that I want to look at, seems like we ought to be able to get rid of some boxing there.

This turns out to be one of the multi-use box cases (see #9118) that we don't yet handle. Root method boxes, inlinee uses box in both type test and unboxing. We can clean some of this up, but are not able to undo the box.

The generics context is reported by the jit specially whenever it feeds into runtime lookups, as expansion of those lookups can expose pointers into runtime data structures, and we don't want those data structures to be collected if jitted code is still using them. Sometimes uses of the context are optimized away, and reporting costs code size and GC space, so we don't want to report the context unless there is an actual use. This change revises how the jit keeps track of context use -- instead of trying to incrementally ref count uses of the generics context, we now just leverage existing passes which do local accounting. Initial motivation for this came from dotnet#34641 where the context use was over-reported, but investigation showed we also some times under-report as the context var could be cloned without changing the ref count. So this change fixes both under and over reporting. Closes dotnet#34641.

AndyAyersMS · 2020-04-10T19:16:37Z

#34827 should fix the second case -- haven't looked at the first case yet, and may not get to it for a while, since it is x86 only.

The generics context is reported by the jit specially whenever it feeds into runtime lookups, as expansion of those lookups can expose pointers into runtime data structures, and we don't want those data structures to be collected if jitted code is still using them. Sometimes uses of the context are optimized away, and reporting costs code size and GC space, so we don't want to report the context unless there is an actual use. This change revises how the jit keeps track of context use -- instead of trying to incrementally ref count uses of the generics context, we now just leverage existing passes which do local accounting. Initial motivation for this came from #34641 where the context use was over-reported, but investigation showed we also some times under-report as the context var could be cloned without changing the ref count. So this change fixes both under and over reporting. Closes #34641.

Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Apr 7, 2020

BruceForstall added this to the Future milestone Apr 7, 2020

BruceForstall removed the untriaged New issue has not been triaged by the area owner label Apr 7, 2020

AndyAyersMS mentioned this issue Apr 10, 2020

JIT: revise how the jit tracks use of generics context #34827

Merged

michkot mentioned this issue Apr 13, 2020

JIT: "default(T) == null" does not emit a constant when T is Nullable<> #34906

Closed

AndyAyersMS closed this as completed in #34827 Apr 23, 2020

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT emitting unnecessary code for non-emitted method call #34641

JIT emitting unnecessary code for non-emitted method call #34641

michkot commented Apr 7, 2020

EgorBo commented Apr 7, 2020

AndyAyersMS commented Apr 7, 2020

michkot commented Apr 9, 2020 •

edited

Loading

AndyAyersMS commented Apr 9, 2020

AndyAyersMS commented Apr 9, 2020

davidwrighton commented Apr 9, 2020

davidwrighton commented Apr 10, 2020

AndyAyersMS commented Apr 10, 2020

AndyAyersMS commented Apr 10, 2020

AndyAyersMS commented Apr 10, 2020

JIT emitting unnecessary code for non-emitted method call #34641

JIT emitting unnecessary code for non-emitted method call #34641

Comments

michkot commented Apr 7, 2020

EgorBo commented Apr 7, 2020

AndyAyersMS commented Apr 7, 2020

michkot commented Apr 9, 2020 • edited Loading

AndyAyersMS commented Apr 9, 2020

AndyAyersMS commented Apr 9, 2020

davidwrighton commented Apr 9, 2020

davidwrighton commented Apr 10, 2020

AndyAyersMS commented Apr 10, 2020

AndyAyersMS commented Apr 10, 2020

AndyAyersMS commented Apr 10, 2020

michkot commented Apr 9, 2020 •

edited

Loading