Remove the `OBJ(ADDR)` wrapping created for calls by the importer #51569

SingleAccretion · 2021-04-20T16:51:55Z

Reproduction:

public static int ObjAddrWrapping(Vector<int> d)
{
    d = Vector<int>.Zero;

    if (d == Vector<int>.Zero)
        return 2;

    return 3;
}

The nodes in question:

    [ 2]  13 (0x00d) call 0A000006
In Compiler::impImportCall: opcode is call, kind=0, callRetType is bool, structSize is 0
Named Intrinsic System.Numerics.Vector`1.op_Equality: Recognized
    [ 1]  18 (0x012) brfalse.s

STMT00001 (IL 0x007...  ???)
               [000011] ------------              *  JTRUE     void  
               [000010] ------------              \--*  EQ        int   
               [000008] ------------                 +--*  HWINTRINSIC bool   int op_Equality
               [000007] n-----------                 |  +--*  OBJ       simd16<System.Numerics.Vector`1[Int32]>
               [000006] ------------                 |  |  \--*  ADDR      byref 
               [000004] -------N----                 |  |     \--*  LCL_VAR   simd16<System.Numerics.Vector`1[Int32]> V00 arg0         
               [000005] ------------                 |  \--*  HWINTRINSIC simd16 int get_Zero
               [000009] ------------                 \--*  CNS_INT   int    0

This happens because impSIMDPopStack calls impNormStructVal which wraps local variables in the OBJ(ADDR) sequence, presumably to avoid calls with mistyped arguments leading to bad codegen. The supposition of this issue is that this wrapping can be avoided when possible, as it blocks downstream optimizations.

category:cq
theme:importer
skill-level:expert
cost:medium
impact:small

The text was updated successfully, but these errors were encountered:

AndyAyersMS · 2021-04-22T19:40:09Z

@sandreenko perhaps this is something you can investigate?

sandreenko · 2021-04-22T21:05:30Z

I have investigated this in the past, there are 2 issues:

arm64 put arg is kind of primitive and can work only with "OBJ" nodes, not very hard to fix;
morph calls "morphArguments" before it sees which are passed by ref so for:

ASG(LCL_VAR 16byte struct V01, CNST_INT 0)
CALL(LCL_VAR 16 byte struct V01)

if we don't have OBJ(ADDR()) sugar we do const propagation:

CALL(CNST_INT 0)

and then there is no handling that the argument has to be passed by reference, a right solution was described here #1231 (comment) but has not been implemented yet

what we want after assertion prop:

CALL(INIT_VAL(CNS_INT(0)))

see that it is passed by ref, spill it:

ASG(LCL_VAR 16 byte temp V02, INIT_VAL(CNS_INT(0)))
CALL(ADDR(V02))

of course, we can change fgMorphArgs to catch this case without introducing INIT_VAL(CNS_INT(0)) but fgMorphArgs is kind of fragile nowadays.

@SingleAccretion could you please show IR after morph and what optimization does not happen because of OBJ(ADDR())?

SingleAccretion · 2021-04-22T21:36:59Z

could you please show IR after morph

@sandreenko Of course.

*************** Finishing PHASE Morph - Global
Trees after Morph - Global

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..014)-> BB03 ( cond )                     i 
BB02 [0001]  1       BB01                  1       [014..016)        (return)                     i 
BB03 [0002]  1       BB01                  1       [016..018)        (return)                     i 
-----------------------------------------------------------------------------------------------------------------------------------------

------------ BB01 [000..014) -> BB03 (cond), preds={} succs={BB02,BB03}

***** BB01
STMT00000 (IL 0x000...0x005)
               [000003] -A---+------              *  ASG       simd16 (copy)
               [000001] D----+-N----              +--*  LCL_VAR   simd16<System.Numerics.Vector`1[Int32]> V00 arg0         
               [000000] -----+------              \--*  HWINTRINSIC simd16 int get_Zero

***** BB01
STMT00001 (IL 0x007...0x012)
               [000011] -----+------              *  JTRUE     void  
               [000010] J----+-N----              \--*  EQ        int   
               [000008] -----+------                 +--*  HWINTRINSIC bool   int op_Equality
               [000007] n----+------                 |  +--*  OBJ       simd16<System.Numerics.Vector`1[Int32]>
               [000006] -----+------                 |  |  \--*  ADDR      byref 
               [000004] -----+-N----                 |  |     \--*  LCL_VAR   simd16<System.Numerics.Vector`1[Int32]> V00 arg0         
               [000005] -----+------                 |  \--*  HWINTRINSIC simd16 int get_Zero
               [000009] -----+------                 \--*  CNS_INT   int    0

------------ BB02 [014..016) (return), preds={BB01} succs={}

***** BB02
STMT00003 (IL 0x014...0x015)
               [000015] -----+------              *  RETURN    int   
               [000014] -----+------              \--*  CNS_INT   int    2

------------ BB03 [016..018) (return), preds={BB01} succs={}

***** BB03
STMT00002 (IL 0x016...0x017)
               [000013] -----+------              *  RETURN    int   
               [000012] -----+------              \--*  CNS_INT   int    3

-------------------------------------------------------------------------------------------------------------------

what optimization does not happen because of OBJ(ADDR())?

As far as I can tell, this doesn't impact anything today, because we don't really do folding for vectors, but it does mean the trees become somewhat opaque to VN.

N001 [000004]   LCL_VAR   V00 arg0         u:2 (last use) => $100 {HWI_Vector128_get_Zero()}
N002 [000006]   ADDR      => $180 {PtrToLoc($40, $0)}
N003 [000007]   OBJ       => $141 {141}
N004 [000005]   HWINTRINSIC => $100 {HWI_Vector128_get_Zero()}
N005 [000008]   HWINTRINSIC => $1c0 {HWI_Vector128_op_Equality($141, $100)}
N006 [000009]   CNS_INT   0 => $40 {IntCns 0}
N007 [000010]   EQ        => $200 {EQ($1c0, $40)}

***** BB01, STMT00001(after)
N008 ( 15, 13) [000011] ------------              *  JTRUE     void  
N007 ( 13, 11) [000010] J------N----              \--*  EQ        int    $200
N005 ( 11,  9) [000008] ------------                 +--*  HWINTRINSIC bool   int op_Equality $1c0
N003 (  9,  7) [000007] n-----------                 |  +--*  OBJ       simd16<System.Numerics.Vector`1[Int32]> $141
N002 (  3,  3) [000006] ------------                 |  |  \--*  ADDR      byref  $180
N001 (  3,  2) [000004] -------N----                 |  |     \--*  LCL_VAR   simd16<System.Numerics.Vector`1[Int32]> V00 arg0         u:2 (last use) $100
N004 (  1,  1) [000005] ------------                 |  \--*  HWINTRINSIC simd16 int get_Zero $100
N006 (  1,  1) [000009] ------------                 \--*  CNS_INT   int    0 $40

I think the issues you describe are much more general than this one, because it is quite specific to the exact setup I had for the reproduction (sorry for not calling this out, I did not have a good handle on this myself :(): parameter as a local + Unix ABI (so no implicit byrefs in the picture).

There is no such wrapping for locals:

public static int ObjAddrWrapping(int a)
{
    var c = Vector<int>.Zero;

    if (a is 0)
        return 0;
 
    if (c == Vector<int>.Zero)
        return 2;

    return 3;
}

is imported as one would expect:

    [ 2]  17 (0x011) call 0A000006
In Compiler::impImportCall: opcode is call, kind=0, callRetType is bool, structSize is 0
Named Intrinsic System.Numerics.Vector`1.op_Equality: Recognized
    [ 1]  22 (0x016) brfalse.s

STMT00002 (IL 0x00B...  ???)
               [000013] ------------              *  JTRUE     void  
               [000012] ------------              \--*  EQ        int   
               [000010] ------------                 +--*  HWINTRINSIC bool   int op_Equality
               [000008] ------------                 |  +--*  LCL_VAR   simd16<System.Numerics.Vector`1[Int32]> V01 loc0         
               [000009] ------------                 |  \--*  HWINTRINSIC simd16 int get_Zero
               [000011] ------------                 \--*  CNS_INT   int    0

And so it is really me noticing this unexpected IR shape.

ASG(LCL_VAR 16byte struct V01, CNST_INT 0)

By the way, I have been wondering about this (as it is the IR I am looking at in #51500) - what is the desirable shape (that we want to have for it moving forward) for this tree specifically when we have TYP_SIMD nodes on the lhs? GT_SIMD (that's what morph does today), HWINTRINSIC get_Zero, something else?

tannergooding · 2021-04-23T01:23:49Z

This looks to also be the root cause of #50939

Take code such as:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static float Distance(Vector2 value1, Vector2 value2)
{
    float distanceSquared = DistanceSquared(value1, value2);
    return MathF.Sqrt(distanceSquared);
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static float Distance2(Vector2 value1, Vector2 value2)
{
    Vector2 difference = value1 - value2;
    return difference.Length();
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static float DistanceSquared(Vector2 value1, Vector2 value2)
{
    Vector2 difference = value1 - value2;
    return difference.LengthSquared();
}

[MethodImpl(MethodImplOptions.AggressiveOptimization)]
static void Main(string[] args)
{
    var Vector2Value1 = new Vector2(-1.0f, 1.0f);
    var Vector2ValueInverted1 = new Vector2(1.0f, -1.0f);

    var result1 = Distance(Vector2Value1, Vector2ValueInverted1);
    Console.WriteLine(result1);

    var Vector2Value2 = new Vector2(-2.0f, 2.0f);
    var Vector2ValueInverted2 = new Vector2(2.0f, -2.0f);

    var result2 = Distance2(Vector2Value2, Vector2ValueInverted2);
    Console.WriteLine(result2);
}

JitDump.txt

Distance results in:

IN000e: 000049 vmovsd   qword ptr [V06 rsp+30H], xmm1
IN000f: 00004F vmovsd   qword ptr [V07 rsp+28H], xmm0
IN0010: 000055 vmovsd   xmm0, qword ptr [V06 rsp+30H]
IN0011: 00005B vmovsd   xmm1, qword ptr [V07 rsp+28H]
IN0012: 000061 vsubps   xmm0, xmm0, xmm1
IN0013: 000065 vdpps    xmm0, xmm0, xmm0, 49
IN0014: 00006B vsqrtss  xmm0, xmm0

While Distance2 results in:

IN0021: 0000AA vsubps   xmm0, xmm2, xmm0
IN0022: 0000AE vdpps    xmm0, xmm0, xmm0, 49
IN0023: 0000B4 vsqrtss  xmm0, xmm0

The ultimate difference between the two is that the former results in OBJ(ADDR(LCL_VAR)) nodes while the latter preserves everything as just LCL_VAR, which hinders our ability to write helper methods that are dependent on other helper methods.

tannergooding · 2021-04-23T03:03:54Z

@sandreenko, at least in the case I just listed, this looks to come down to ASG nodes being inserted across the inlining boundaries.

We then end up promoting the SIMD locals because we allow that for Vector2/3/4 in particular. It isn't clear to me why we allow promotion for Vector2/3/4 rather than keeping it in register and preferring extractps (or shufps on older hardware) via impSIMDGetFixed.

However, it also isn't clear why we aren't marking GT_ASG where the target is TYP_SIMD* as SetOpLclRelatedToSIMDIntrinsic. Because, for all intents and purposes simd = simd is a SIMD intrinsic, it is a movaps or movups (or movss for scalars).

tannergooding · 2021-04-23T03:21:05Z

Inserting:

#if FEATURE_SIMD
    if (varTypeIsSIMD(dst->gtType))
    {
        // We want to track SIMD assignments as being intrinsics since they
        // are functionally SIMD `mov` instructions and are more efficient
        // when we don't promote, particularly when it occurs due to inlining

        SetOpLclRelatedToSIMDIntrinsic(dst);
        SetOpLclRelatedToSIMDIntrinsic(src);
    }
#endif // FEATURE_SIMD

in gtNewAssignNode looks to resolve the issue I just mentioned. It doesn't resolve the issue @SingleAccretion mentioned since OBJ(ADDR(LCL)) still exists.

We are currently generating OBJ(ADDR(LCL)) for args since they are passed "byref" (this even looks to be the case on Linux, where some of them are actually meant to be passed "byval") and for this.
Seemingly, we should be able to replace the OBJ(ADDR(LCL)) with a direct copy in many scenarios (particularly for inlining), so we preserve the "byval" semantics, but remove the indirection that exists here.

tannergooding · 2021-04-23T03:27:01Z

Seemingly, we should be able to replace the OBJ(ADDR(LCL)) with a direct copy in many scenarios (particularly for inlining), so we preserve the "byval" semantics, but remove the indirection that exists here.

One potential idea here is to modify impPopCallArgs to insert a copy, rather than normalizing to OBJ(ADDR(LCL)). We can then make the appropriate ABI decision (actually byval vs byref) later if the call isn't inlined and we need to actually pass the value. Otherwise, the required copy has already been made (preserving the "pass by value" semantics that are important across call boundaries) and we never end up with an indirection that may harm downstream optimizations.

SingleAccretion · 2021-12-23T09:37:05Z

Essentially, this is a subset of #11057, the source of this wrapping is impNormStructVal, the primary purpose - carrying the signature type to morph. There are interesting side effects too, for example, it prevents morph from creating R-value TYP_STRUCT LCL_FLD (not completely though - they can still be generated under GT_RETURNs).

tannergooding · 2023-02-09T22:20:58Z

This was generally fixed by #81636

@SingleAccretion, do you want this kept open to track other OBJ wrapping we might be introducing?

SingleAccretion · 2023-02-09T22:51:49Z

No, we can close this assuming the performance impact of #81636 has been evaluated as acceptable.

tannergooding · 2023-02-09T23:14:59Z

Don't believe we've actually seen regressions in the perf triage yet. Will close and reopen if anything crops up.

dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Apr 20, 2021

SingleAccretion changed the title ~~Importer creates OBJ(ADDR(LCL_VAR)) trees when importing some static intrinsics.~~ Importer creates OBJ(ADDR(LCL_VAR)) trees when importing some static intrinsics Apr 20, 2021

sandreenko self-assigned this Apr 22, 2021

sandreenko added this to the Future milestone Apr 22, 2021

sandreenko removed the untriaged New issue has not been triaged by the area owner label Apr 22, 2021

tannergooding mentioned this issue Apr 23, 2021

Mark SIMD assignments as related to SIMD intrinsics #51731

Merged

SingleAccretion changed the title ~~Importer creates OBJ(ADDR(LCL_VAR)) trees when importing some static intrinsics~~ Remove the OBJ(ADDR) wrapping created for calls by the importer Dec 23, 2021

SingleAccretion assigned SingleAccretion and unassigned sandreenko Dec 23, 2021

SingleAccretion mentioned this issue Feb 2, 2022

Remove GT_ADDR nodes #11057

Closed

SingleAccretion mentioned this issue Feb 12, 2022

ZeroObj assertions #65257

Merged

SingleAccretion mentioned this issue Apr 25, 2022

Recognize more "local addr" trees as invariant #68484

Merged

jakobbotsch mentioned this issue Apr 27, 2022

Remove GTF_LATE_ARG #68617

Merged

tannergooding closed this as completed Feb 9, 2023

ghost locked as resolved and limited conversation to collaborators Mar 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove the `OBJ(ADDR)` wrapping created for calls by the importer #51569

Remove the `OBJ(ADDR)` wrapping created for calls by the importer #51569

SingleAccretion commented Apr 20, 2021 •

edited by BruceForstall

Loading

AndyAyersMS commented Apr 22, 2021

sandreenko commented Apr 22, 2021

SingleAccretion commented Apr 22, 2021 •

edited

Loading

tannergooding commented Apr 23, 2021

tannergooding commented Apr 23, 2021 •

edited

Loading

tannergooding commented Apr 23, 2021

tannergooding commented Apr 23, 2021 •

edited

Loading

SingleAccretion commented Dec 23, 2021

tannergooding commented Feb 9, 2023

SingleAccretion commented Feb 9, 2023

tannergooding commented Feb 9, 2023

Remove the OBJ(ADDR) wrapping created for calls by the importer #51569

Remove the OBJ(ADDR) wrapping created for calls by the importer #51569

Comments

SingleAccretion commented Apr 20, 2021 • edited by BruceForstall Loading

AndyAyersMS commented Apr 22, 2021

sandreenko commented Apr 22, 2021

SingleAccretion commented Apr 22, 2021 • edited Loading

tannergooding commented Apr 23, 2021

tannergooding commented Apr 23, 2021 • edited Loading

tannergooding commented Apr 23, 2021

tannergooding commented Apr 23, 2021 • edited Loading

SingleAccretion commented Dec 23, 2021

tannergooding commented Feb 9, 2023

SingleAccretion commented Feb 9, 2023

tannergooding commented Feb 9, 2023

Remove the `OBJ(ADDR)` wrapping created for calls by the importer #51569

Remove the `OBJ(ADDR)` wrapping created for calls by the importer #51569

SingleAccretion commented Apr 20, 2021 •

edited by BruceForstall

Loading

SingleAccretion commented Apr 22, 2021 •

edited

Loading

tannergooding commented Apr 23, 2021 •

edited

Loading

tannergooding commented Apr 23, 2021 •

edited

Loading