Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: cloning invariant analysis seems oddly limited #70100

Closed
AndyAyersMS opened this issue Jun 1, 2022 · 4 comments · Fixed by #70232
Closed

JIT: cloning invariant analysis seems oddly limited #70100

AndyAyersMS opened this issue Jun 1, 2022 · 4 comments · Fixed by #70232
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

Consider the three similar methods F, G, and H:

using System;
using System.Runtime.CompilerServices;

class X
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void S() { }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void F(int[] a, int low, int high, ref int z)
    {
        for (int i = low; i < high; i++)
        {
             z += a[i];
             S(); 
        }  
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void G(int[] a, int low, int high, ref int z)
    {
        for (int i = low; i < high; i++)
        {
             z += a[i];
        }  
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void H(int[] a, int low, int high, ref int z)
    {
        int r = 0;
        for (int i = low; i < high; i++)
        {
             r += a[i];
             S();
        }  
        z = r;
    }

    public static int Main()
    {
         int[] a = new int[] { 1, 2, 3, 4 };
         int z = 0;
         F(a, 2, 3, ref z);
         G(a, 2, 3, ref z);
         H(a, 2, 3, ref z);
         return z + 70;
    }
}

The JIT will clone the loops in G and H but not in F. This happens because optIsSetAssgLoop will always report locals as modified in any loop that contains both a user call and an indir store, even if the local in question (i in this case) is not address exposed.

I think the check done in the latter part of optIsSetAssgLoop is wrong, and it should be checking whether inds has the various possibilities, not loop->lpAsgInds (which we've already checked earlier in the method). And the loop invariant tests done by the cloner always pass an empty inds so that these latter checks become irrelevant.

int Compiler::optIsSetAssgLoop(unsigned lnum, ALLVARSET_VALARG_TP vars, varRefKinds inds)
{
noway_assert(lnum < optLoopCount);
LoopDsc* loop = &optLoopTable[lnum];
/* Do we already know what variables are assigned within this loop? */
if (!(loop->lpFlags & LPFLG_ASGVARS_YES))
{
isVarAssgDsc desc;
/* Prepare the descriptor used by the tree walker call-back */
desc.ivaVar = (unsigned)-1;
desc.ivaSkip = nullptr;
#ifdef DEBUG
desc.ivaSelf = &desc;
#endif
AllVarSetOps::AssignNoCopy(this, desc.ivaMaskVal, AllVarSetOps::MakeEmpty(this));
desc.ivaMaskInd = VR_NONE;
desc.ivaMaskCall = CALLINT_NONE;
desc.ivaMaskIncomplete = false;
/* Now walk all the statements of the loop */
for (BasicBlock* const block : loop->LoopBlocks())
{
for (Statement* const stmt : block->NonPhiStatements())
{
fgWalkTreePre(stmt->GetRootNodePointer(), optIsVarAssgCB, &desc);
if (desc.ivaMaskIncomplete)
{
loop->lpFlags |= LPFLG_ASGVARS_INC;
}
}
}
AllVarSetOps::Assign(this, loop->lpAsgVars, desc.ivaMaskVal);
loop->lpAsgInds = desc.ivaMaskInd;
loop->lpAsgCall = desc.ivaMaskCall;
/* Now we know what variables are assigned in the loop */
loop->lpFlags |= LPFLG_ASGVARS_YES;
}
/* Now we can finally test the caller's mask against the loop's */
if (!AllVarSetOps::IsEmptyIntersection(this, loop->lpAsgVars, vars) || (loop->lpAsgInds & inds))
{
return 1;
}
switch (loop->lpAsgCall)
{
case CALLINT_ALL:
/* Can't hoist if the call might have side effect on an indirection. */
if (loop->lpAsgInds != VR_NONE)
{
return 1;
}
break;
case CALLINT_REF_INDIRS:
/* Can't hoist if the call might have side effect on an ref indirection. */
if (loop->lpAsgInds & VR_IND_REF)
{
return 1;
}
break;
case CALLINT_SCL_INDIRS:
/* Can't hoist if the call might have side effect on an non-ref indirection. */
if (loop->lpAsgInds & VR_IND_SCL)
{
return 1;
}
break;
case CALLINT_ALL_INDIRS:
/* Can't hoist if the call might have side effect on any indirection. */
if (loop->lpAsgInds & (VR_IND_REF | VR_IND_SCL))
{
return 1;
}
break;
case CALLINT_NONE:
/* Other helpers kill nothing */
break;
default:
noway_assert(!"Unexpected lpAsgCall value");
}
return 0;
}
void Compiler::optPerformHoistExpr(GenTree* origExpr, BasicBlock* exprBb, unsigned lnum)

This impacts GDV-driven cloning of loops (#65206), since such loops inevitably involve calls; if the loop is the expansion of a foreach we often also see indir stores updating the iterator fields.

Fixing this leads to a fairly large total diff given the relatively small number of impacted methods. Presumably this combination of calls and indir stores only appears in larger loops.

We could perhaps be even more aggressive with the fix and consider actually setting the indir mask if the local we're interested in proving invariant is address exposed. Haven't tried this.

Longer term we really should find a way to leverage parts of the invariant logic we use for hoisting here. The fact that a local is assigned in a loop is an indication it might not be a loop invariant, but is by no means a proof.

96 total methods with Code Size differences (25 improved, 71 regressed), 9 unchanged.
91 total methods with Code Size differences (28 improved, 63 regressed), 8 unchanged.
79 total methods with Code Size differences (12 improved, 67 regressed), 3 unchanged.
246 total methods with Code Size differences (47 improved, 199 regressed), 35 unchanged.
383 total methods with Code Size differences (90 improved, 293 regressed), 36 unchanged.
529 total methods with Code Size differences (196 improved, 333 regressed), 22 unchanged.

Total bytes of delta: 26990 (0.12 % of base)
Total bytes of delta: 26591 (0.27 % of base)
Total bytes of delta: 16406 (0.01 % of base)
Total bytes of delta: 81656 (0.24 % of base)
Total bytes of delta: 125349 (0.25 % of base)
Total bytes of delta: 133325 (0.13 % of base)

cc @dotnet/jit-contrib

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 1, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed untriaged New issue has not been triaged by the area owner labels Jun 1, 2022
@ghost
Copy link

ghost commented Jun 1, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Consider the three similar methods F, G, and H:

using System;
using System.Runtime.CompilerServices;

class X
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void S() { }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void F(int[] a, int low, int high, ref int z)
    {
        for (int i = low; i < high; i++)
        {
             z += a[i];
             S(); 
        }  
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void G(int[] a, int low, int high, ref int z)
    {
        for (int i = low; i < high; i++)
        {
             z += a[i];
        }  
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void H(int[] a, int low, int high, ref int z)
    {
        int r = 0;
        for (int i = low; i < high; i++)
        {
             r += a[i];
             S();
        }  
        z = r;
    }

    public static int Main()
    {
         int[] a = new int[] { 1, 2, 3, 4 };
         int z = 0;
         F(a, 2, 3, ref z);
         G(a, 2, 3, ref z);
         H(a, 2, 3, ref z);
         return z + 70;
    }
}

The JIT will clone the loops in G and H but not in F. This happens because optIsSetAssgLoop will always report locals as modified in any loop that contains both a user call and an indir store, even if the local in question (i in this case) is not address exposed.

I think the check done in the latter part of optIsSetAssgLoop is wrong, and it should be checking whether inds has the various possibilities, not loop->lpAsgInds (which we've already checked earlier in the method). And the loop invariant tests done by the cloner always pass an empty inds so that these latter checks become irrelevant.

int Compiler::optIsSetAssgLoop(unsigned lnum, ALLVARSET_VALARG_TP vars, varRefKinds inds)
{
noway_assert(lnum < optLoopCount);
LoopDsc* loop = &optLoopTable[lnum];
/* Do we already know what variables are assigned within this loop? */
if (!(loop->lpFlags & LPFLG_ASGVARS_YES))
{
isVarAssgDsc desc;
/* Prepare the descriptor used by the tree walker call-back */
desc.ivaVar = (unsigned)-1;
desc.ivaSkip = nullptr;
#ifdef DEBUG
desc.ivaSelf = &desc;
#endif
AllVarSetOps::AssignNoCopy(this, desc.ivaMaskVal, AllVarSetOps::MakeEmpty(this));
desc.ivaMaskInd = VR_NONE;
desc.ivaMaskCall = CALLINT_NONE;
desc.ivaMaskIncomplete = false;
/* Now walk all the statements of the loop */
for (BasicBlock* const block : loop->LoopBlocks())
{
for (Statement* const stmt : block->NonPhiStatements())
{
fgWalkTreePre(stmt->GetRootNodePointer(), optIsVarAssgCB, &desc);
if (desc.ivaMaskIncomplete)
{
loop->lpFlags |= LPFLG_ASGVARS_INC;
}
}
}
AllVarSetOps::Assign(this, loop->lpAsgVars, desc.ivaMaskVal);
loop->lpAsgInds = desc.ivaMaskInd;
loop->lpAsgCall = desc.ivaMaskCall;
/* Now we know what variables are assigned in the loop */
loop->lpFlags |= LPFLG_ASGVARS_YES;
}
/* Now we can finally test the caller's mask against the loop's */
if (!AllVarSetOps::IsEmptyIntersection(this, loop->lpAsgVars, vars) || (loop->lpAsgInds & inds))
{
return 1;
}
switch (loop->lpAsgCall)
{
case CALLINT_ALL:
/* Can't hoist if the call might have side effect on an indirection. */
if (loop->lpAsgInds != VR_NONE)
{
return 1;
}
break;
case CALLINT_REF_INDIRS:
/* Can't hoist if the call might have side effect on an ref indirection. */
if (loop->lpAsgInds & VR_IND_REF)
{
return 1;
}
break;
case CALLINT_SCL_INDIRS:
/* Can't hoist if the call might have side effect on an non-ref indirection. */
if (loop->lpAsgInds & VR_IND_SCL)
{
return 1;
}
break;
case CALLINT_ALL_INDIRS:
/* Can't hoist if the call might have side effect on any indirection. */
if (loop->lpAsgInds & (VR_IND_REF | VR_IND_SCL))
{
return 1;
}
break;
case CALLINT_NONE:
/* Other helpers kill nothing */
break;
default:
noway_assert(!"Unexpected lpAsgCall value");
}
return 0;
}
void Compiler::optPerformHoistExpr(GenTree* origExpr, BasicBlock* exprBb, unsigned lnum)

This impacts GDV-driven cloning of loops (#65206), since such loops inevitably involve calls; if the loop is the expansion of a foreach we often also see indir stores updating the iterator fields.

Fixing this leads to a fairly large total diff given the relatively small number of impacted methods. Presumably this combination of calls and indir stores only appears in larger loops.

We could perhaps be even more aggressive with the fix and consider actually setting the indir mask if the local we're interested in proving invariant is address exposed. Haven't tried this.

Longer term we really should find a way to leverage parts of the invariant logic we use for hoisting here. The fact that a local is assigned in a loop is an indication it might not be a loop invariant, but is by no means a proof.

96 total methods with Code Size differences (25 improved, 71 regressed), 9 unchanged.
91 total methods with Code Size differences (28 improved, 63 regressed), 8 unchanged.
79 total methods with Code Size differences (12 improved, 67 regressed), 3 unchanged.
246 total methods with Code Size differences (47 improved, 199 regressed), 35 unchanged.
383 total methods with Code Size differences (90 improved, 293 regressed), 36 unchanged.
529 total methods with Code Size differences (196 improved, 333 regressed), 22 unchanged.

Total bytes of delta: 26990 (0.12 % of base)
Total bytes of delta: 26591 (0.27 % of base)
Total bytes of delta: 16406 (0.01 % of base)
Total bytes of delta: 81656 (0.24 % of base)
Total bytes of delta: 125349 (0.25 % of base)
Total bytes of delta: 133325 (0.13 % of base)

cc @dotnet/jit-contrib

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@SingleAccretion
Copy link
Contributor

FWIW, I have investigated this part of loop cloning in the past (diff), in an attempt to fix #61040, and the conclusion was that it is not just conservative, but incorrect, as it can miss locals defined in the loop via indirect or LCL_FLD-based stores.

@AndyAyersMS AndyAyersMS self-assigned this Jun 3, 2022
@AndyAyersMS AndyAyersMS added this to the 7.0.0 milestone Jun 3, 2022
@AndyAyersMS
Copy link
Member Author

Also wanted to fix the loop invariant analysis to use wider bit vectors, but we're upstream of lvaMarkLocalVars and so can't readily use VARSET_TP.

  • We could move (or more likely duplicate) lvaMarkLocalVars and run it earlier (too), but that might cause other issues. Might be worth a try though.
  • We could use a general BitVec but keeping it in sync with traits seems a bit clunky. Eg we may not want to use lvaCount (though object alloc does this, and perhaps we'd be likewise ok because we will only have a few BVs per loop). Would be more viable if we had sparse BVs.

Ultimately all this loop stuff should happen with SSA built, in which case the above is moot, we won't need bit vectors.

@AndyAyersMS
Copy link
Member Author

Looks like ALLVARSET is only really used by loop analysis (and one other small thing) so boosting it to behave more like VARSET is plausible.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Jun 4, 2022
Streamline call effects checks. Use wider bit vectors.

Closes dotnet#70100.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 4, 2022
AndyAyersMS added a commit that referenced this issue Jun 7, 2022
Streamline call effects checks. Use wider bit vectors.

Closes #70100.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 7, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants