Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) Arm64/SVE: Implemented AddRotateComplex and AddSequentialAcross #104258

Closed
wants to merge 39 commits into from

Conversation

ebepho
Copy link
Contributor

@ebepho ebepho commented Jul 1, 2024

Contributes to #99957

Stress test output:

Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddRotateComplex_float() : 8
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddRotateComplex_double() : 8
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddRotateComplex_float() : 8
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddRotateComplex_double() : 8
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

cc @dotnet/arm64-contrib

ebepho and others added 30 commits June 5, 2024 14:30
updating branch due to assert issue.
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@ebepho ebepho added the arm-sve Work related to arm64 SVE/SVE2 support label Jul 2, 2024
@ebepho ebepho changed the title (WIP) Arm64/SVE: Implemented AddRotateAcross and AddSequentialAcross Arm64/SVE: Implemented AddRotateAcross and AddSequentialAcross Jul 2, 2024
@ebepho ebepho changed the title Arm64/SVE: Implemented AddRotateAcross and AddSequentialAcross Arm64/SVE: Implemented AddRotateAcross Jul 2, 2024
@ebepho ebepho changed the title Arm64/SVE: Implemented AddRotateAcross Arm64/SVE: Implemented AddRotateAcross and AddSequentialAcross Jul 3, 2024
@ebepho ebepho marked this pull request as draft July 3, 2024 00:10
@ebepho ebepho changed the title Arm64/SVE: Implemented AddRotateAcross and AddSequentialAcross (WIP) Arm64/SVE: Implemented AddRotateAcross and AddSequentialAcross Jul 3, 2024
@@ -8391,13 +8391,13 @@ void CodeGen::genArm64EmitterUnitTestsSve()
INS_OPTS_SCALABLE_D); // ST1B {<Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]

// IF_SVE_GP_3A
theEmitter->emitIns_R_R_R_I(INS_sve_fcadd, EA_SCALABLE, REG_V0, REG_P1, REG_V2, 90,
theEmitter->emitIns_R_R_R_I(INS_sve_fcadd, EA_SCALABLE, REG_V0, REG_P1, REG_V2, 0,
Copy link
Member

@amanasifkhalid amanasifkhalid Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kunalspathak at the API level, do we want users to pass the actual angle value (90, 180, etc) for the immediate? If so, we might have to do some awkward transformations throughout the JIT's phases to get this to work:

  • If we need to generate a switch table of all immediate values (in case the user doesn't pass a constant), HWIntrinsicImmOpHelper expects the immediates to be contiguous, like [0, 3]. If the possible values are 90, 180, etc., we'll need some special handling there to pass the correct immediates to emitIns_R_R_R_I.
  • For FCADD, emitIns_R_R_R_I expects us to pass the immediate as an angle value; it then converts the value to its bitwise representation [0, 3] internally. If we streamline this so we can just pass the immediate in its bitwise form to emitIns_R_R_R_I, that might simplify the logic elsewhere in the JIT. For example, the bounds for this intrinsic in lookupImmBounds would be [0, 3], and we'd just have to transform the user's input to this form somewhere in the JIT -- perhaps during importation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is bit tricky. The only acceptable and valid values for FCADD is 90 or 270 and that's what we expect user to pass. All the other values are invalid and we should probably throw ArgumentOutOfRangeException. Also, since the values are not contiguous, we might not be able to use the generic table generation logic. It is meant for the contiguous value. For this API, we want something like this:

if (IsConstant(rot) && (rot == 90) || (rot == 270))
{    
    fcadd ... // here we will embed 0 or 1, depending on if the rot is 90 or 270
}
else
{
    // generate fallback
}

// fallback codegen
rot = ... // either constant or from variable
if (rot == 90)
{
    fcadd ...0 // '0' to specify rotation is 90
}
else if (rot == 270)
{
    fcadd ...1 // '1' to specify rotation is 270
}
else
{
    throw ArgumentOutOfRangeException();
}

@tannergooding - I don't believe we have API that has such restriction about the input value, do we? For eg. I don't see we have implemented AdvSimd's FcAdd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant to put a comment here, but spoke with @tannergooding offline and the right thing to do here is to handle the fallback in C# level, something like:

if (cns == 90) { AddRotateComplex(..., 90) }
else if (cns == 270) { .... }
else { throw }

and then in rationalizer, make sure to the argument is indeed a constant and is "in bounds", before we rewrite it back to the call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebepho @amanasifkhalid - let me know if you need anything else to move this further.

@amanasifkhalid
Copy link
Member

@ebepho try syncing your fork with dotnet/runtime, then running git pull on your main branch, and then git merge main on your add branch. I think other PRs have already added in the templates you're trying to add in this one.

@amanasifkhalid amanasifkhalid changed the title (WIP) Arm64/SVE: Implemented AddRotateAcross and AddSequentialAcross (WIP) Arm64/SVE: Implemented AddRotateComplex and AddSequentialAcross Jul 5, 2024
@amanasifkhalid
Copy link
Member

AddSequentialAcross is implemented here: #104640

@ebepho ebepho closed this Jul 10, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Aug 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants