Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT emitting unnecessary code for non-emitted method call #34641

Closed
michkot opened this issue Apr 7, 2020 · 10 comments · Fixed by #34827
Closed

JIT emitting unnecessary code for non-emitted method call #34641

michkot opened this issue Apr 7, 2020 · 10 comments · Fixed by #34827
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@michkot
Copy link

michkot commented Apr 7, 2020

Both x86 and x64 JIT emits unnecessary stack manipulation code (if I read it correctly) for the following (+- minimal example) code snippets of generic code with some asm-compile-time deduction. It seems that it emits the stack manipulation code required for method-call that is asm-compile-time unreachable.
Tested in Sharplab, Core CLR v4.700.19.57202.
sharplab.io link with more faulty variatons

Note: I stumbled opon this, when optimizing hot-path "is default" generic check, because I was not satisfied with this suggestion: https://stackoverflow.com/questions/65351/null-or-default-comparison-of-generic-argument-in-c-sharp/864860#864860

class C<TVal> and TVal can be any class at runtime

only wrong in x86:

  public static bool IsDefaultNormalDirect(TVal val)
  {
     bool isTValReferenceType = default(TVal) == null;
     if (isTValReferenceType)
     {
        return (val == null);
     }
     return val.Equals(default(TVal));
  }

x86:

C`1[[System.__Canon, System.Private.CoreLib]].IsDefaultNormalDirect(System.__Canon)
    L0000: push ebp <<---
    L0001: mov ebp, esp <<---
    L0003: test ecx, ecx
    L0005: setz al
    L0008: movzx eax, al
    L000b: pop ebp <<---
    L000c: ret

x64:

C`1[[System.__Canon, System.Private.CoreLib]].IsDefaultNormalDirect(System.__Canon)
    L0000: test rdx, rdx
    L0003: setz al
    L0006: movzx eax, al
    L0009: ret

======

wrong in both:

  public static bool IsDefaultValueTypes(TVal val)
  {
    return val.Equals(default(TVal));
  }
  public static bool IsDefaultNormalA(TVal val)
  {
     bool isTValReferenceType = default(TVal) == null;
     if (isTValReferenceType)
     {
        return (val == null);
     }
     return IsDefaultValueTypes(val);
  }

x86:

C`1[[System.__Canon, System.Private.CoreLib]].IsDefaultNormalA(System.__Canon)
    L0000: push ebp <<---
    L0001: mov ebp, esp <<---
    L0003: push eax <<---
    L0004: mov [ebp-0x4], edx <<---
    L0007: test ecx, ecx
    L0009: setz al
    L000c: movzx eax, al
    L000f: mov esp, ebp <<---
    L0011: pop ebp <<---
    L0012: ret

x64:

C`1[[System.__Canon, System.Private.CoreLib]].IsDefaultNormalA(System.__Canon)
    L0000: push rax <<---
    L0001: mov [rsp], rcx <<---
    L0005: test rdx, rdx
    L0008: setz al
    L000b: movzx eax, al
    L000e: add rsp, 0x8 <<---
    L0012: ret
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Apr 7, 2020
@EgorBo
Copy link
Member

EgorBo commented Apr 7, 2020

I guess it's #32094 (comment)

@AndyAyersMS
Copy link
Member

I guess it's #32094 (comment)

Right, the "wrong in both case" seems to be the jit saving the generic context to report it to the VM.

The earlier cases are something different, likely we think we need a frame at one point early on and then optimize something, and don't realize we no longer need a frame.

@BruceForstall BruceForstall added this to the Future milestone Apr 7, 2020
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label Apr 7, 2020
@michkot
Copy link
Author

michkot commented Apr 9, 2020

In the #32094 (comment) it's a generic method body. Here, in the second example, it is method in a generic class that (might) call another method of the same class. Does it make a difference?

Why am I asking: The think that catches my eye is that the x64 version is short when the eliminated call is non-generic (first example). If the eliminated call is generic (second example), the "generic context" is stuck there.
Is there a contract saying the "generic context" must be there for the VM if there is a generic CALL inside the method body in IL? //edit: The generic parametr itself is used in both examples though (initobj !TVal).

I am sorry if I am wrong with reasoning.

@AndyAyersMS
Copy link
Member

I took a closer look at the second case and we should actually be able to remove the context reporting. There are two conditions in which we report the context (store it to the stack) -- either if the context is used somewhere in the method, or if the VM requests that the jit report the context.

In the second example the VM does not request reporting. Initially the context looks like it is used but after some optimization we remove all the uses. Unfortunately we never re-check if the context is still used and go by our early impression. I think we can fix this.

@AndyAyersMS
Copy link
Member

I have a fix, but it doesn't fire very often on FX (4 methods out of 240K).

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -36 (-0.00% of base)
    diff is an improvement.
Top file improvements (bytes):
          -9 : System.Collections.Concurrent.dasm (-0.00% of base)
          -9 : System.Collections.Immutable.dasm (-0.00% of base)
          -9 : System.Linq.Expressions.dasm (-0.00% of base)
          -9 : System.Linq.Parallel.dasm (-0.00% of base)
4 total files with Code Size differences (4 improved, 0 regressed), 260 unchanged.
Top method improvements (bytes):
          -9 (-17.65% of base) : System.Collections.Concurrent.dasm - Partitioner:GetDefaultChunkSize():int (7 methods)
          -9 (-18.75% of base) : System.Collections.Immutable.dasm - ImmutableExtensions:IsValueType():bool (7 methods)
          -9 (-1.11% of base) : System.Linq.Expressions.dasm - ReadOnlyCollectionBuilder`1:ValidateNullValue(Object,String) (7 methods)
          -9 (-17.65% of base) : System.Linq.Parallel.dasm - Scheduling:GetDefaultChunkSize():int (7 methods)
Top method improvements (percentages):
          -9 (-18.75% of base) : System.Collections.Immutable.dasm - ImmutableExtensions:IsValueType():bool (7 methods)
          -9 (-17.65% of base) : System.Collections.Concurrent.dasm - Partitioner:GetDefaultChunkSize():int (7 methods)
          -9 (-17.65% of base) : System.Linq.Parallel.dasm - Scheduling:GetDefaultChunkSize():int (7 methods)
          -9 (-1.11% of base) : System.Linq.Expressions.dasm - ReadOnlyCollectionBuilder`1:ValidateNullValue(Object,String) (7 methods)
4 total methods with Code Size differences (4 improved, 0 regressed), 243786 unchanged.

Some of the accounting we do (namely lvaGenericsContextUseCount) seems flawed in other ways; in particular if we clone trees containing context references we don't bump the count, so it can be an under-estimate.

Also there is some interesting codegen in the non-shared cases that I want to look at, seems like we ought to be able to get rid of some boxing there.

@davidwrighton @janvorli is there some clear picture of when the jit must report the generic context in gc info, outside of the cases where the VM requests the reporting? Comments in the jit code refer to collectable assemblies and suggest we're over-reporting:

GenTree* Compiler::getRuntimeContextTree(CORINFO_RUNTIME_LOOKUP_KIND kind)
{
GenTree* ctxTree = nullptr;
// Collectible types requires that for shared generic code, if we use the generic context parameter
// that we report it. (This is a conservative approach, we could detect some cases particularly when the
// context parameter is this that we don't need the eager reporting logic.)
lvaGenericsContextUseCount++;

When the context is passed as a separate arg it is fairly easy for us to accurately determine the use count after optimization. When the context comes from this we currently don't distinguish uses of this that are intended as the starting point for runtime lookups from other uses of this, so figuring out the use count after we've optimized is not as simple. Easiest thing would be to over-count and so over-report; if we could restrict this over-counting to just methods from collectable assemblies that would probably be reasonable.

Wondering if (a) we could flag methods that are in collectable assemblies for the jit, and relax reporting for assemblies that aren't collectable, and possibly (b) relax reporting of the this case in collectable assemblies, when the VM does not ask for reporting.

@davidwrighton
Copy link
Member

(a) The collectible assemblies case is for scenarios like List<T>.Add compiled into List<Canon>.Add where T is some class that is a canonical type. So, the collectability of the generic context arg is unrelated to the collectability of the code the jit is producing. So, no, flagging collectible methods in the jit doesn't help.

(b) Relaxing reporting of this in these methods is only viable if the JIT can be taught to keep reporting this as live until the last use of the generic dictionary is complete (for collectible assembleis scenarios). This is a totally weird thing to do as I understand it, but could in theory be done.

Imagine a method that looks like

class C<T>
{
    void Method(object o)
    {
        T t = (T)o;
        Console.WriteLine(t);
    }
}

Now, the JIT will generate something like the following psuedocode for C<string>.Method

prolog
MethodTable *pMT = this->MethodTablePointer
if (pMT->GenericDictionarySlot(0) == 0)
{
    call FillInGenericDictionarySlot(pMT, 0);
}
MethodTable *pMT_T = pMT->GenericDictionarySlot(0)
call CastClass(pMT_T, t)
*** At this point the this pointer is no longer required to be reported to the GC
Console.WriteLine(t)
epilog

Note that the point at which the this pointer no longer needs to be reported isn't really related to any action performed on the this pointer, but is actually associated with the last use of a value pulled out of a data structure referenced by the this pointer. For simplicities sake, when implementing collectible types, I did not attempt to model this in the jit, and simply went the route of blanket reporting the this pointer, which was a far simpler model to validate for correctness. Since then, it appears that someone attempted to be a bit smarter by building the counting construct, but as you say it doesn't appear to work all that well.

I'll have to do some research to find out about the non-collectability reasons for reporting the generics context.

@davidwrighton
Copy link
Member

One note, is that for the purposes of collectability there is no need for the JIT to use the generic context reporting stack slot for reporting the this pointer, and could simply report it via completely normal GC reporting, it just needs to be reported until after the last pointer pulled from a generic dictionary is no longer in use.

@AndyAyersMS
Copy link
Member

Ok, thanks.

It appears that someone attempted to be a bit smarter

I added the "ref counting" aspect in dotnet/coreclr#9756 as often times after inlining or various opts we end up not needing the runtime lookup at all, and I wanted to fully dead code them. Didn't realize until now that this may sometimes under-count.

@AndyAyersMS
Copy link
Member

Also there is some interesting codegen in the non-shared cases that I want to look at, seems like we ought to be able to get rid of some boxing there.

This turns out to be one of the multi-use box cases (see #9118) that we don't yet handle. Root method boxes, inlinee uses box in both type test and unboxing. We can clean some of this up, but are not able to undo the box.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Apr 10, 2020
The generics context is reported by the jit specially whenever it feeds into
runtime lookups, as expansion of those lookups can expose pointers into runtime
data structures, and we don't want those data structures to be collected if
jitted code is still using them.

Sometimes uses of the context are optimized away, and reporting costs
code size and GC space, so we don't want to report the context unless there
is an actual use.

This change revises how the jit keeps track of context use -- instead of trying
to incrementally ref count uses of the generics context, we now just leverage
existing passes which do local accounting.

Initial motivation for this came from dotnet#34641 where the context use was
over-reported, but investigation showed we also some times under-report as
the context var could be cloned without changing the ref count.

So this change fixes both under and over reporting.

Closes dotnet#34641.
@AndyAyersMS
Copy link
Member

#34827 should fix the second case -- haven't looked at the first case yet, and may not get to it for a while, since it is x86 only.

AndyAyersMS added a commit that referenced this issue Apr 23, 2020
The generics context is reported by the jit specially whenever it feeds into
runtime lookups, as expansion of those lookups can expose pointers into runtime
data structures, and we don't want those data structures to be collected if
jitted code is still using them.

Sometimes uses of the context are optimized away, and reporting costs
code size and GC space, so we don't want to report the context unless there
is an actual use.

This change revises how the jit keeps track of context use -- instead of trying
to incrementally ref count uses of the generics context, we now just leverage
existing passes which do local accounting.

Initial motivation for this came from #34641 where the context use was
over-reported, but investigation showed we also some times under-report as
the context var could be cloned without changing the ref count.

So this change fixes both under and over reporting.

Closes #34641.
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants