#78303 Add transformation ~v1 & v2 to VectorXxx.AndNot(v1, v2) #81993

Merged
merged 20 commits into from
Sep 4, 2023
64 changes: 64 additions & 0 deletions src/coreclr/jit/morph.cpp
@@ -10869,7 +10869,71 @@ GenTree* Compiler::fgOptimizeHWIntrinsic(GenTreeHWIntrinsic* node)
            INDEBUG(node->gtDebugFlags |= GTF_DEBUG_NODE_MORPHED);
            return node;
        }
#if defined(TARGET_XARCH)
        case NI_SSE_And:
        case NI_SSE2_And:
        case NI_AVX_And:
        case NI_AVX2_And:
Member

What about Vector128/256_And and AdvSimd?

Member

Vector64/128/256_And don't exist outside of import at the moment, so they don't need to be handled.

AdvSimd should be handled, since we want parity between xarch and arm.

        {
            if (node->GetOperandCount() != 2)
Member

When exactly might it not be 2?

Member

It should never be anything other than 2. If it were, we'd have a buggy node.

Contributor

Use an assert then instead?
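
To illustrate the difference being discussed, here is a minimal standalone sketch. `ToyNode`, `TryOptimizeDefensive`, and `TryOptimizeAsserting` are hypothetical names invented for this example and are not JIT types or functions; the sketch only contrasts an early return with an assertion on an invariant that should always hold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a JIT node; NOT the real GenTreeHWIntrinsic.
struct ToyNode
{
    std::vector<int> operands;

    std::size_t GetOperandCount() const
    {
        return operands.size();
    }
};

// Defensive style: a malformed node silently skips the optimization,
// which can hide a bug in whatever produced the node.
bool TryOptimizeDefensive(const ToyNode& node)
{
    if (node.GetOperandCount() != 2)
    {
        return false;
    }
    return true;
}

// Assert style: the invariant is documented and a violation fails
// loudly in builds where assertions are enabled.
bool TryOptimizeAsserting(const ToyNode& node)
{
    assert(node.GetOperandCount() == 2);
    return true;
}
```

With a well-formed two-operand node both functions behave the same; they differ only when the invariant is violated.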

            {
                return node;
            }

            GenTree*            op1      = node->Op(1);
            GenTree*            op2      = node->Op(2);
            GenTree*            lhs      = nullptr;
            GenTree*            rhs      = nullptr;
            GenTreeHWIntrinsic* inner_hw = nullptr;

            // Transforms ~v1 & v2 to VectorXxx.AndNot(v2, v1)
            if (op1->OperIs(GT_HWINTRINSIC))
            {
                rhs      = op2;
                inner_hw = op1->AsHWIntrinsic();
            }
            // Transforms v2 & (~v1) to VectorXxx.AndNot(v1, v2)
            else if (op2->OperIs(GT_HWINTRINSIC))
            {
                rhs      = op1;
                inner_hw = op2->AsHWIntrinsic();
            }
            else
            {
                return node;
            }
Member

This is going to miss the optimization for cases like `((x & y) & ~z)`, where op1 is a hwintrinsic but not the one we're looking for.

You're going to need to check, for each operand, that it is a hwintrinsic and that it is the relevant Xor (xarch and arm) or Not (arm only) node.
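
The check described above can be sketched on a toy expression tree. Everything here (`Op`, `Expr`, `MatchNot`, `MorphAnd`) is hypothetical and exists only for illustration; it is not the JIT's representation. The toy `AndNot(l, r)` is assumed to mean `l & ~r`. The point is that each operand is tested for the specific "Xor with all bits set" shape, rather than taking the first operand that happens to be an intrinsic.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy IR, not the JIT's GenTree.
enum class Op { Leaf, AllBitsSet, And, Xor, AndNot };

struct Expr
{
    Op                    op;
    std::string           name;
    std::shared_ptr<Expr> a;
    std::shared_ptr<Expr> b;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr Leaf(std::string n)
{
    return std::make_shared<Expr>(Expr{Op::Leaf, std::move(n), nullptr, nullptr});
}

ExprPtr AllBits()
{
    return std::make_shared<Expr>(Expr{Op::AllBitsSet, "", nullptr, nullptr});
}

ExprPtr Bin(Op op, ExprPtr a, ExprPtr b)
{
    return std::make_shared<Expr>(Expr{op, "", std::move(a), std::move(b)});
}

// Returns x when e is Xor(x, AllBitsSet), i.e. ~x; otherwise nullptr.
ExprPtr MatchNot(const ExprPtr& e)
{
    if ((e->op == Op::Xor) && (e->b->op == Op::AllBitsSet))
    {
        return e->a;
    }
    return nullptr;
}

// Rewrites And(~x, y) or And(y, ~x) to AndNot(y, x), meaning y & ~x.
// Both operands are checked, so And(And(x, y), ~z) is still matched.
ExprPtr MorphAnd(const ExprPtr& node)
{
    assert(node->op == Op::And);

    if (ExprPtr x = MatchNot(node->a))
    {
        return Bin(Op::AndNot, node->b, x);
    }
    if (ExprPtr x = MatchNot(node->b))
    {
        return Bin(Op::AndNot, node->a, x);
    }
    return node;
}
```

For `(x & y) & ~z`, the first operand is an `And` and is skipped by `MatchNot`, so the second operand is examined and the rewrite still applies.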

Member

There is also potentially a concern around side effects: `~x & y` must be represented as `gtNewSimdBinOpNode(AND_NOT, y, x, ...)`, and we need to ensure that this preserves side effects with regard to x being evaluated before y.
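
The ordering concern can be made concrete with a small standalone model. This is only a sketch: `Operand`, `EvalAndOfNot`, and `EvalAndNot` are hypothetical, operands are modeled as callables that append to a log when evaluated, and both helpers are assumed to evaluate their operands left to right. How the JIT itself preserves the order is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical model: an operand is a callable that records its
// evaluation in a log (the "side effect") and yields a value.
using Operand = std::function<std::uint32_t(std::string&)>;

// Original shape: ~x & y, with x evaluated before y.
std::uint32_t EvalAndOfNot(const Operand& x, const Operand& y, std::string& log)
{
    std::uint32_t xv = x(log);
    std::uint32_t yv = y(log);
    return ~xv & yv;
}

// Toy AndNot(l, r) computing l & ~r, with l evaluated before r.
std::uint32_t EvalAndNot(const Operand& l, const Operand& r, std::string& log)
{
    std::uint32_t lv = l(log);
    std::uint32_t rv = r(log);
    return lv & ~rv;
}
```

Calling `EvalAndOfNot(x, y, log)` and `EvalAndNot(y, x, log)` produces the same value, but in this model the second call evaluates y before x. If either operand has an observable side effect, a naive operand swap changes program behavior.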

Contributor Author

I have resolved some of the comments and pushed the changes to make sure I understood you correctly.
Could you please give me a hint on how to treat Not for arm, how to test it, and how to handle the side-effect case?


            if ((inner_hw->GetOperandCount() != 2) || (!inner_hw->Op(2)->IsVectorAllBitsSet()))
            {
                return node;
            }
Member

It would be better to check this as part of handling _Xor below; that way you don't need to check the operand count, and it's easier for the general logic to support AdvSimd_Not on Arm64.
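
The reason an Xor whose second operand is all-bits-set can be treated as a Not is the bitwise identity `x ^ ~0 == ~x`. A scalar sketch of that identity follows; `kAllBitsSet`, `NotViaXor`, and `AndNotScalar` are hypothetical names, a 64-bit integer stands in for a vector, and `AndNotScalar(l, r)` is assumed to mean `l & ~r`.

```cpp
#include <cassert>
#include <cstdint>

// Scalar stand-in for a vector with every bit set.
constexpr std::uint64_t kAllBitsSet = ~std::uint64_t{0};

// Xor with all-bits-set flips every bit, which is bitwise NOT.
constexpr std::uint64_t NotViaXor(std::uint64_t x)
{
    return x ^ kAllBitsSet;
}

// Toy AndNot(l, r) computing l & ~r.
constexpr std::uint64_t AndNotScalar(std::uint64_t l, std::uint64_t r)
{
    return l & ~r;
}

static_assert(NotViaXor(0x00FF00FF00FF00FFull) == ~0x00FF00FF00FF00FFull,
              "x ^ all-ones must equal ~x");
```

Under these assumptions, `(x ^ all-ones) & y` and `AndNotScalar(y, x)` are the same value for any x and y, which is what makes the rewrite valid once the Xor operand has been confirmed to be all-bits-set.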


            switch (inner_hw->GetHWIntrinsicId())
            {
                case NI_SSE_Xor:
                case NI_SSE2_Xor:
                case NI_AVX_Xor:
                case NI_AVX2_Xor:
                    break;
                default:
                    return node;
            }

            var_types    hw_type     = node->TypeGet();
            CorInfoType  hw_coretype = node->GetSimdBaseJitType();
            unsigned int hw_simdsize = node->GetSimdSize();
Member

We refer to these as just simdType, simdBaseJitType, and simdSize almost everywhere else in the JIT.


            lhs = inner_hw->Op(1);

            GenTree* andnNode = gtNewSimdBinOpNode(GT_AND_NOT, hw_type, lhs, rhs, hw_coretype, hw_simdsize, true);

            DEBUG_DESTROY_NODE(node);

            INDEBUG(andnNode->gtDebugFlags |= GTF_DEBUG_NODE_MORPHED);

            return andnNode;
        }
#endif
        default:
        {
            break;