Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arm64: Add return cases into optimise bools #112945

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
67 changes: 46 additions & 21 deletions src/coreclr/jit/optimizebools.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -962,23 +962,26 @@ bool OptBoolsDsc::optOptimizeRangeTests()
//
bool OptBoolsDsc::optOptimizeCompareChainCondBlock()
{
assert((m_b1 != nullptr) && (m_b2 != nullptr) && (m_b3 == nullptr));
assert((m_b1 != nullptr) && (m_b2 != nullptr));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As pointed by @jakobbotsch in #96819 (comment), should we still consider doing this based on the PGO information?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"That would require us to have reliable PGO data at the point of "optimize bools" though, which we don't have yet"

  • I think that's still the current state of PGO. So, not much we can do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think our PGO is in a good shape now (certainly much better than it was back then). cc @amanasifkhalid @AndyAyersMS for thoughts on using PGO information during optOptimizeBools

Copy link
Member

@amanasifkhalid amanasifkhalid Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we ought to use it as a matter of principle, and identify cases where the PGO data fails us as motivation for introducing another profile repair call site somewhere midway through compilation. I don't think it's all that common for the profile data to be completely wrecked post-morph, and in the cases where it is, there is usually a bigger underlying problem we've yet to solve (i.e. method call counts for OSR methods), but I don't think that should block us from leveraging PGO data at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to use PGO in optimize bools, then it should be used in if conversion too, given they are fairly interlinked.

I'd like to suggest doing that in a separate PR, as I suspect it'll require some back and too in deciding what the exact cut off points are. Whereas this PR should only be causing diffs to happen in return cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I view if-conversion and this optimization as orthogonal optimizations: converting short circuiting conditionals into bitwise ops can be beneficial even on platforms that do not benefit from if conversion.

With that said, I am ok with not switching to utilize PGO information in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an issue here: #113080

(including if conversion, to save opening two issues)

m_t3 = nullptr;

bool foundEndOfOrConditions = false;
if (m_b1->FalseTargetIs(m_b2) && m_b2->FalseTargetIs(m_b1->GetTrueTarget()))
{
// Found the end of two (or more) conditions being ORed together.
// The final condition has been inverted.
foundEndOfOrConditions = true;
}
else if (m_b1->FalseTargetIs(m_b2) && m_b1->TrueTargetIs(m_b2->GetTrueTarget()))
{
// Found two conditions connected together.
}
else
bool reverseFirstCondition = false;
if (m_b3 == nullptr)
{
return false;
if (m_b1->FalseTargetIs(m_b2) && m_b2->FalseTargetIs(m_b1->GetTrueTarget()))
{
// Found the end of two (or more) conditions being ORed together.
// The final condition has been inverted.
reverseFirstCondition = true;
}
else if (m_b1->FalseTargetIs(m_b2) && m_b1->TrueTargetIs(m_b2->GetTrueTarget()))
{
// Found two conditions connected together.
}
else
{
return false;
}
}

Statement* const s1 = optOptimizeBoolsChkBlkCond();
Expand All @@ -990,9 +993,21 @@ bool OptBoolsDsc::optOptimizeCompareChainCondBlock()

assert(m_testInfo1.testTree->OperIs(GT_JTRUE));
GenTree* cond1 = m_testInfo1.testTree->gtGetOp1();
assert(m_testInfo2.testTree->OperIs(GT_JTRUE));
assert(m_testInfo2.testTree->OperIs((m_b3 == nullptr) ? GT_JTRUE : GT_RETURN));
GenTree* cond2 = m_testInfo2.testTree->gtGetOp1();

if (m_b3 != nullptr)
{
assert(m_b1->FalseTargetIs(m_b2) && m_b1->TrueTargetIs(m_b3));
GenTree* root = m_b3->firstStmt()->GetRootNode();
assert(root->OperIs(GT_RETURN));
GenTree* op1 = root->gtGetOp1();
if (op1 && op1->IsIntegralConst(0))
{
reverseFirstCondition = true;
}
}

// Ensure both conditions are suitable.
if (!cond1->OperIsCompare() || !cond2->OperIsCompare())
{
Expand Down Expand Up @@ -1031,8 +1046,8 @@ bool OptBoolsDsc::optOptimizeCompareChainCondBlock()
int op1Cost = cond1->GetCostEx();
int op2Cost = cond2->GetCostEx();
// The cost of combing three simple conditions is 32.
int maxOp1Cost = op1IsCondChain ? 31 : 7;
int maxOp2Cost = op2IsCondChain ? 31 : 7;
int maxOp1Cost = op1IsCondChain ? 31 : 10;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't increase the maxOpCost values then these changes won't really have much of an effect.
This is one of the tests I added.

        [MethodImpl(MethodImplOptions.NoInlining)]
        static bool CmpOptimizeBoolsReturn(int a, int lower, int upper)
        {
            //ARM64-FULL-LINE: cmp {{w[0-9]+}}, {{w[0-9]+}}
            //ARM64-FULL-LINE: ccmp {{w[0-9]+}}, {{w[0-9]+}}, nzc, ge
            //ARM64-FULL-LINE: cset {{x[0-9]+}}, le
            return a >= lower && a <= upper;
        }

and when it gets to optOptimizeCompareChainCondBlock cond2 is this and it has a opCost of 8.

N003 (  8,  5) [000008] -----+-----                         *  EQ        int    $101
N001 (  3,  2) [000006] -----+-----                         +--*  LCL_VAR   int    V01 arg1         u:1 (last use) $81
N002 (  1,  2) [000007] -----+-----                         \--*  CNS_INT   int    1 $41

I'm increasing these values so that these bool optimisations can also happen on return.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I originally added these costs, they were just best guesses to ensures we don't optimise for long chains. Tweaking it should be fine.

int maxOp2Cost = op2IsCondChain ? 31 : 10;

// Cost to allow for chain size of three.
if (op1Cost > maxOp1Cost || op2Cost > maxOp2Cost)
Expand All @@ -1048,14 +1063,14 @@ bool OptBoolsDsc::optOptimizeCompareChainCondBlock()
m_comp->fgRemoveStmt(m_b1, s1 DEBUGARG(isUnlink));

// Invert the condition.
if (foundEndOfOrConditions)
if (reverseFirstCondition)
{
GenTree* revCond = m_comp->gtReverseCond(cond1);
assert(cond1 == revCond); // Ensure `gtReverseCond` did not create a new node.
}

// Join the two conditions together
genTreeOps chainedOper = foundEndOfOrConditions ? GT_AND : GT_OR;
genTreeOps chainedOper = reverseFirstCondition ? GT_AND : GT_OR;
GenTree* chainedConditions = m_comp->gtNewOperNode(chainedOper, TYP_INT, cond1, cond2);
cond1->gtFlags &= ~GTF_RELOP_JMP_USED;
cond2->gtFlags &= ~GTF_RELOP_JMP_USED;
Expand Down Expand Up @@ -1157,7 +1172,7 @@ Statement* OptBoolsDsc::optOptimizeBoolsChkBlkCond()
}
else
{
if (!testTree2->OperIs(GT_RETURN, GT_SWIFT_ERROR_RET))
if (!testTree2->OperIs(GT_RETURN))
{
return nullptr;
}
Expand All @@ -1169,7 +1184,7 @@ Statement* OptBoolsDsc::optOptimizeBoolsChkBlkCond()
}

GenTree* testTree3 = s3->GetRootNode();
if (!testTree3->OperIs(GT_RETURN, GT_SWIFT_ERROR_RET))
if (!testTree3->OperIs(GT_RETURN))
{
return nullptr;
}
Expand Down Expand Up @@ -2011,6 +2026,16 @@ PhaseStatus Compiler::optOptimizeBools()
change = true;
numReturn++;
}
#ifdef TARGET_ARM64
else if (optBoolsDsc.optOptimizeCompareChainCondBlock())
{
// The optimization will have merged b1 and b2. Retry the loop so that
// b1 and b2->bbNext can be tested.
change = true;
retry = true;
numCond++;
}
#endif
}
else
{
Expand Down
79 changes: 79 additions & 0 deletions src/tests/JIT/opt/InstructionCombining/Cmp.cs
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,31 @@ public static int CheckCompare()
fail = true;
}

if (!CmpOptimizeBoolsReturn(5, 3, 10))
{
fail = true;
}

if (!CmpOptimizeBoolsReturnOr(2, 3, 4))
{
fail = true;
}

if (CmpOptimizeBoolsIf(5, 3, 10) != 1)
{
fail = true;
}

if (CmpOptimizeBoolsIfOr(2, 3, 4) != 1)
{
fail = true;
}

if (!CmpMultipleOptimizeBools(1, 2, 3, 3, 5, 4))
{
fail = true;
}

if (fail)
{
return 101;
Expand Down Expand Up @@ -134,5 +159,59 @@ static bool CmpLargeShift64Bit(long a, long b)
}
return false;
}

[MethodImpl(MethodImplOptions.NoInlining)]
static bool CmpOptimizeBoolsReturn(int a, int lower, int upper)
{
//ARM64-FULL-LINE: cmp {{w[0-9]+}}, {{w[0-9]+}}
//ARM64-FULL-LINE: ccmp {{w[0-9]+}}, {{w[0-9]+}}, 0, ge
//ARM64-FULL-LINE: cset {{x[0-9]+}}, le
return a >= lower && a <= upper;
}

[MethodImpl(MethodImplOptions.NoInlining)]
static bool CmpOptimizeBoolsReturnOr(int a, int lower, int upper)
{
//ARM64-FULL-LINE: cmp {{w[0-9]+}}, {{w[0-9]+}}
//ARM64-FULL-LINE: ccmp {{w[0-9]+}}, {{w[0-9]+}}, nzc, lt
//ARM64-FULL-LINE: cset {{x[0-9]+}}, le
return a >= lower || a <= upper;
}

[MethodImpl(MethodImplOptions.NoInlining)]
static int CmpOptimizeBoolsIf(int a, int lower, int upper)
{
//ARM64-FULL-LINE: cmp {{w[0-9]+}}, {{w[0-9]+}}
//ARM64-FULL-LINE: ccmp {{w[0-9]+}}, {{w[0-9]+}}, 0, ge
//ARM64-FULL-LINE: csel {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, gt
if (a >= lower && a <= upper)
{
return 1;
}
return -1;
}

[MethodImpl(MethodImplOptions.NoInlining)]
static int CmpOptimizeBoolsIfOr(int a, int lower, int upper)
{
//ARM64-FULL-LINE: cmp {{w[0-9]+}}, {{w[0-9]+}}
//ARM64-FULL-LINE: ccmp {{w[0-9]+}}, {{w[0-9]+}}, nzc, lt
//ARM64-FULL-LINE: csel {{w[0-9]+}}, {{w[0-9]+}}, {{w[0-9]+}}, gt
if (a >= lower || a <= upper)
{
return 1;
}
return -1;
}

[MethodImpl(MethodImplOptions.NoInlining)]
static bool CmpMultipleOptimizeBools(int a, int b, int c, int d, int e, int f)
{
//ARM64-FULL-LINE: cmp {{w[0-9]+}}, {{w[0-9]+}}
//ARM64-FULL-LINE: ccmp {{w[0-9]+}}, {{w[0-9]+}}, 0, lt
//ARM64-FULL-LINE: ccmp {{w[0-9]+}}, {{w[0-9]+}}, nzc, le
//ARM64-FULL-LINE: cset {{x[0-9]+}}, gt
return (a < b) && (c <= d) && (e > f);
}
}
}
Loading