Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64-SVE: Implement ShiftLeftLogical, ShiftRightArithmetic, ShiftRightLogical #104119

Merged
merged 12 commits into from
Jun 28, 2024

Conversation

amanasifkhalid
Copy link
Member

Part of #99957. As discussed offline with @kunalspathak, some of the newly-added tests are failing under JitStress scenarios due to known callee-save register issues. Other than those cases, all tests are passing under the other stress modes -- the output is large, but I'm happy to share it if it's of interest.

These new APIs are unique in that their operands can differ in vector widths. To avoid introducing a new test template, I tweaked the existing vector binop test template to parse the second operand's type correctly, and added some preprocessor logic to conditionally compile certain tests that require the operand types to be the same. It looks a bit hacky, but it doesn't affect existing tests that have matching operand types.

@dotnet/arm64-contrib PTAL, thanks!

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

1 similar comment
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@@ -558,6 +558,28 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
{
assert(instrIsRMW);

insScalableOpts sopt;

switch (intrinEmbMask.id)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the best place to put this switch statement, but these intrinsics need to be wrapped with ConditionalSelect, so implementing special codegen for them off this path doesn't seem feasible without some more invasive refactoring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do similar handling for ConvertTo for numOperands == 1 case.

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Jun 28, 2024
Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Added some questions around testing.

/// LSL Ztied1.S, Pg/M, Ztied1.S, Zop2.S
/// svuint32_t svlsl[_u32]_x(svbool_t pg, svuint32_t op1, svuint32_t op2)
/// LSL Ztied1.S, Pg/M, Ztied1.S, Zop2.S
/// LSLR Ztied2.S, Pg/M, Ztied2.S, Zop1.S
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we are doing LSLR. Can you please remove it from summary docs from this and file?

/// ASR Ztied1.H, Pg/M, Ztied1.H, Zop2.H
/// svint16_t svasr[_s16]_x(svbool_t pg, svint16_t op1, svuint16_t op2)
/// ASR Ztied1.H, Pg/M, Ztied1.H, Zop2.H
/// ASRR Ztied2.H, Pg/M, Ztied2.H, Zop1.H
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are actually bunch of places where we have also added "XXXR". please remove it.

("SveVecBinOpTest.template", new Dictionary<string, string> { ["TestName"] = "Sve_ShiftLeftLogical_ushort_ulong", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "ShiftLeftLogical", ["RetVectorType"] = "Vector", ["RetBaseType"] = "UInt16", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "UInt16", ["Op2VectorType"] = "Vector", ["Op2BaseType"] = "UInt64", ["LargestVectorSize"] = "64", ["NextValueOp1"] = "TestLibrary.Generator.GetUInt16()", ["NextValueOp2"] = "TestLibrary.Generator.GetUInt64()", ["ValidateIterResult"] = "Helpers.ShiftLeft<ushort>(left[i], right[(i * sizeof(ushort)) / sizeof(ulong)]) != result[i]", ["GetIterResult"] = "Helpers.ShiftLeft<ushort>(left[i], right[(i * sizeof(ushort)) / sizeof(ulong)])"}),
("SveVecBinOpTest.template", new Dictionary<string, string> { ["TestName"] = "Sve_ShiftLeftLogical_uint_ulong", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "ShiftLeftLogical", ["RetVectorType"] = "Vector", ["RetBaseType"] = "UInt32", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "UInt32", ["Op2VectorType"] = "Vector", ["Op2BaseType"] = "UInt64", ["LargestVectorSize"] = "64", ["NextValueOp1"] = "TestLibrary.Generator.GetUInt32()", ["NextValueOp2"] = "TestLibrary.Generator.GetUInt64()", ["ValidateIterResult"] = "Helpers.ShiftLeft<uint>(left[i], right[(i * sizeof(uint)) / sizeof(ulong)]) != result[i]", ["GetIterResult"] = "Helpers.ShiftLeft<uint>(left[i], right[(i * sizeof(uint)) / sizeof(ulong)])"}),

("SveVecBinOpTest.template", new Dictionary<string, string> { ["TestName"] = "Sve_ShiftRightArithmetic_sbyte_byte", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "ShiftRightArithmetic", ["RetVectorType"] = "Vector", ["RetBaseType"] = "SByte", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "SByte", ["Op2VectorType"] = "Vector", ["Op2BaseType"] = "Byte", ["LargestVectorSize"] = "64", ["NextValueOp1"] = "TestLibrary.Generator.GetSByte()", ["NextValueOp2"] = "TestLibrary.Generator.GetByte()", ["ValidateIterResult"] = "Helpers.ShiftRight<sbyte>(left[i], (ulong)right[i]) != result[i]", ["GetIterResult"] = "Helpers.ShiftRight<sbyte>(left[i], (ulong)right[i])"}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to see something like Helpers.ShiftRightArithmetic for validation for these APIs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generic ShiftRight helper will do either arithmetic or logical shift, depending on whether the first operand is signed or not since it's just doing division. I can write a ShiftRightArithmetic wrapper for ShiftRight that only takes signed types for the first operand to make the API surface clearer, though I was hoping to avoid having to write multiple overloads for sbyte, short, int, etc -- which would you prefer?

@@ -1943,6 +1943,28 @@ private static int HighNarrowing(long op1, bool round)

public static long MultiplyWideningUpperAndSubtract(long[] op1, int[] op2, int[] op3, int i) => MultiplyWideningAndSubtract(op1[i], op2[i + op2.Length / 2], op3[i + op3.Length / 2]);

public static T ShiftLeft<T>(T op1, ulong op2) where T : INumber<T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use the existing Helpers.ShiftLeftLogical()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think those match the functionality of these APIs: The existing UnsignedShift seems to only consider the first byte of the shift value, regardless of the size of the input, whereas the SVE APIs consider the whole second operand (docs). I agree the code duplication isn't ideal -- I'm happy to rename the new helpers to something more discerning, if you'd like.

// If they aren't the same, the #undef won't do anything,
// and the first operand type symbol will remain defined.
// We can then conditionally compile on the condition #if !{Op1VectorType}{Op1BaseType}, etc.
#define {Op1VectorType}{Op1BaseType}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent is right and I suggested something along these lines at one point, but looking at this, it feels little odd and might be hard to maintain, specially because we are not careful in setting Op1BaseType and Op2BaseType. Lets add a new template (if you did not find the one to reuse) and we will clean up them once we are done with .NET 9 implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it; I copied my changes to _SveBinaryOpTestTemplate over to _SveBinaryOpDifferentTypesTestTemplate (minus the conditional compilation stuff), and reverted my changes to the former.

@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Jun 28, 2024

@kunalspathak stress tests still pass (minus the known callee-save issues) with the template changes.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants