
Add ARM64 encodings for groups IF_SVE_HO,HP,HS #99058

Merged (7 commits) on Mar 11, 2024


snickolls-arm
Contributor

Covers many floating-point/integer conversion and floating-point precision conversion variants.

N.B. I had to change a few of the generated groups for these: some of the _K to _M groups were added manually because the generator missed the half-precision variants.

Matching capstone output:

bfcvt z3.h, p2/m, z9.s
fcvt z7.d, p7/m, z1.s
fcvtx z2.s, p0/m, z6.d
fcvt z29.s, p3/m, z12.d
fcvt z0.h, p4/m, z13.d
fcvt z1.d, p5/m, z14.h
fcvt z2.h, p6/m, z15.s
fcvt z3.s, p7/m, z16.h
fcvtzs z9.s, p1/m, z3.s
fcvtzu z3.s, p2/m, z10.s
fcvtzs z5.d, p0/m, z24.s
fcvtzu z10.d, p7/m, z1.s
fcvtzs z12.s, p3/m, z6.d
fcvtzu z4.s, p3/m, z13.d
fcvtzs z2.d, p1/m, z17.d
fcvtzu z22.d, p6/m, z4.d
fcvtzs z3.h, p2/m, z18.h
fcvtzu z23.h, p7/m, z5.h
fcvtzs z4.s, p3/m, z19.h
fcvtzu z24.s, p0/m, z6.h
fcvtzs z5.d, p4/m, z20.h
fcvtzu z25.d, p1/m, z7.h
scvtf z19.s, p2/m, z8.s
ucvtf z17.s, p6/m, z11.s
scvtf z1.d, p5/m, z19.s
ucvtf z3.d, p3/m, z20.s
scvtf z4.s, p0/m, z14.d
ucvtf z8.s, p1/m, z7.d
scvtf z0.d, p0/m, z0.d
ucvtf z8.d, p4/m, z9.d
scvtf z12.h, p5/m, z14.h
ucvtf z13.h, p6/m, z15.h
scvtf z14.h, p7/m, z16.s
ucvtf z15.h, p0/m, z17.s
scvtf z16.h, p1/m, z18.d
ucvtf z17.h, p2/m, z19.d

Contributing towards #94549

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 28, 2024
ghost commented Feb 28, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: snickolls-arm
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@snickolls-arm
Contributor Author

@a74nh @kunalspathak @dotnet/arm64-contrib

Contributor

@a74nh a74nh left a comment

All LGTM, but @kunalspathak needs to comment on the table changes.

@@ -239,6 +239,43 @@ INST7(ld1sw, "ld1sw", 0, IF_SV
// LD1SW {<Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D] SVE_IU_4B_B 11000101010mmmmm 100gggnnnnnttttt C540 8000
// LD1SW {<Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}] SVE_IV_3A 11000101001iiiii 100gggnnnnnttttt C520 8000

// enum name info SVE_HP_3B SVE_HP_3B_H SVE_HP_3B_I SVE_HP_3B_J SVE_HP_3B_K SVE_HP_3B_L SVE_HP_3B_M
INST7(fcvtzs, "fcvtzs", 0, IF_SVE_7B, 0x659CA000, 0x65DCA000, 0x65D8A000, 0x65DEA000, 0x655AA000, 0x655CA000, 0x655EA000 )
Contributor

The H variant was missing from all these encodings. The changes here look fine to me, but I'll let @kunalspathak comment.

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Feb 29, 2024
@kunalspathak
Member

kunalspathak commented Feb 29, 2024

Looking at FCVT, here is the information we have:

bit 22 is "0":

  • Half-precision to single-precision
  • Single-precision to half-precision

bit 16 is "0":

  • Single-precision to half-precision
  • Double-precision to half-precision
  • Double-precision to single-precision

Basically, bit 16 is "0" when we are narrowing the size, and bit 22 is "0" when a half <--> single conversion is involved.
Given that information, we can have just one format, 6588 A000, named SVE_HO_3A, and delete SVE_HO_3A_B and all the new ones added in this PR. Then, during encoding, we can set the right bits based on the conversion option. This is one of the complicated encodings that the tool had a hard time parsing and converting to encoding formats/names.

I didn't look deeply, but this should also apply to the other instructions this PR touches, like fcvtzs, scvtf, etc.


For completeness, here are the scenarios when these bits are "1":

bit 22 is "1":

  • Half-precision to double-precision
  • Single-precision to double-precision
  • Double-precision to half-precision
  • Double-precision to single-precision

bit 16 is "1":

  • Half-precision to single-precision
  • Half-precision to double-precision
  • Single-precision to double-precision
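A minimal sketch of the single-format idea for SVE FCVT, assuming the base opcode 0x6588A000 (the single-to-half variant, which already has bit 23 set) and the usual SVE predicated-unary field layout (Zd in bits 4:0, Zn in bits 9:5, Pg in bits 12:10). Bits 22 and 16 follow the rules listed above; bit 17 is set for double <--> single conversions. `Elem`, `addRegs` and `encodeSveFcvt` are hypothetical names, not part of emitarm64.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Element sizes of the SVE vector operands (values are the bit widths).
enum class Elem { H = 16, S = 32, D = 64 };

// Hypothetical helper: place Zd, Pg and Zn into the predicated-unary fields
// (Zd in bits 4:0, Zn in bits 9:5, Pg in bits 12:10; Pg limited to p0-p7).
inline uint32_t addRegs(uint32_t code, unsigned zd, unsigned pg, unsigned zn) {
    assert(zd < 32 && zn < 32 && pg < 8);
    return code | (pg << 10) | (zn << 5) | zd;
}

// Sketch: FCVT <Zd>.<dst>, <Pg>/M, <Zn>.<src> built from one base opcode.
inline uint32_t encodeSveFcvt(Elem dst, Elem src, unsigned zd, unsigned pg, unsigned zn) {
    assert(dst != src);  // FCVT always changes precision
    uint32_t code = 0x6588A000;  // S -> H variant; bit 23 is always 1
    bool involvesDouble = (dst == Elem::D) || (src == Elem::D);
    bool singleDouble = (dst == Elem::S && src == Elem::D) ||
                        (dst == Elem::D && src == Elem::S);
    bool widening = static_cast<int>(dst) > static_cast<int>(src);
    if (involvesDouble) code |= 1u << 22;  // bit 22: any conversion involving double
    if (singleDouble)   code |= 1u << 17;  // bit 17: double <--> single
    if (widening)       code |= 1u << 16;  // bit 16: result wider than the source
    return addRegs(code, zd, pg, zn);
}
```

For example, `encodeSveFcvt(Elem::D, Elem::S, 7, 7, 1)` corresponds to `fcvt z7.d, p7/m, z1.s` from the capstone list in the PR description.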

@snickolls-arm
Contributor Author

You would also need to look at flipping bit 17 for FCVT. I've had a look across the different instructions, and their patterns for when these bits need flipping are fairly inconsistent: for example, SCVTF and UCVTF sometimes also require flipping bit 23, and they flip bits 18:17 rather than 17:16. I'm wondering whether this might end up quite complex or branch-heavy? @a74nh any thoughts on this?
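To make the FCVTZS/FCVTZU pattern concrete, here is a sketch that sets opc (bits 23:22), opc2 (bits 18:17) and U (bit 16) on top of the base 0x6518A000 used for fcvtzs in this PR, assuming the same predicated-unary register layout (Zd 4:0, Zn 9:5, Pg 12:10). The opc/opc2 pairs reproduce the seven INST7 opcodes quoted earlier in the review. `encodeSveFcvtz` and `OpcPair` are hypothetical names; unsupported size combinations return `std::nullopt`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class Elem { H, S, D };

// opc goes in bits 23:22, opc2 in bits 18:17.
struct OpcPair { uint32_t opc; uint32_t opc2; };

// Sketch: FCVTZS/FCVTZU <Zd>.<dst>, <Pg>/M, <Zn>.<src>.
inline std::optional<uint32_t> encodeSveFcvtz(bool isUnsigned, Elem dst, Elem src,
                                              unsigned zd, unsigned pg, unsigned zn) {
    assert(zd < 32 && zn < 32 && pg < 8);
    std::optional<OpcPair> bits;
    if (src == Elem::H) {                   // half source: any result width
        if (dst == Elem::H)      bits = OpcPair{1, 1};
        else if (dst == Elem::S) bits = OpcPair{1, 2};
        else                     bits = OpcPair{1, 3};
    } else if (src == Elem::S) {            // single source: 32- or 64-bit result
        if (dst == Elem::S)      bits = OpcPair{2, 2};
        else if (dst == Elem::D) bits = OpcPair{3, 2};
    } else {                                // double source: 32- or 64-bit result
        if (dst == Elem::S)      bits = OpcPair{3, 0};
        else if (dst == Elem::D) bits = OpcPair{3, 3};
    }
    if (!bits) return std::nullopt;         // e.g. H <- S or H <- D
    uint32_t code = 0x6518A000;             // opc, opc2 and U all clear
    code |= bits->opc << 22;
    code |= bits->opc2 << 17;
    if (isUnsigned) code |= 1u << 16;       // U selects FCVTZU
    return code | (pg << 10) | (zn << 5) | zd;
}
```

The branching stays shallow because it only depends on the (source, destination) pair; the bit positions themselves are fixed per instruction.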

@kunalspathak
Member

You would also need to look at flipping bits 17 as well for FCVT

Right, and they are "1" when a double <--> single conversion is involved. The point is that we want to generalize and reduce the number of formats we introduce, because they are difficult to maintain. As long as the bit positions where we embed registers and special codes (size, element size) are the same, we try to give the instructions the same format name. In this case, regardless of which conversion is being encoded, Pg, Zn and Zd go at the same bit positions. For example, the NEON FCVT has a single entry:

INST1(fcvt, "fcvt", 0, IF_DV_2J, 0x1E224000)
// fcvt Vd,Vn DV_2J 00011110SS10001D D10000nnnnnddddd 1E22 4000 Vd,Vn

and then, during encoding, we embed the right bits depending on the conversion:

https://github.com/dotnet/runtime/blob/049da221a7f0e38eb92ef72fe0611b9b72e513da/src/coreclr/jit/emitarm64.cpp#L18989-L19017
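For comparison, a standalone sketch of how one DV_2J-style entry can cover every scalar FCVT: following the field comment above, the source type goes in the SS field (bits 23:22) and the destination type in the DD field (bits 16:15), using the scalar type codes 00 = single, 01 = double, 11 = half. This is a hypothetical illustration, not the code at the link; `encodeNeonFcvt` and `fpTypeBits` are made-up names.

```cpp
#include <cassert>
#include <cstdint>

enum class Elem { H, S, D };

// Scalar FP type code shared by the SS and DD fields: 00 single, 01 double, 11 half.
constexpr uint32_t fpTypeBits(Elem e) {
    return e == Elem::S ? 0u : (e == Elem::D ? 1u : 3u);
}

// Sketch: FCVT <Vd>, <Vn> (scalar) from the single IF_DV_2J base opcode.
inline uint32_t encodeNeonFcvt(Elem dst, Elem src, unsigned vd, unsigned vn) {
    assert(dst != src && vd < 32 && vn < 32);
    uint32_t code = 0x1E224000;        // base from the INST1 entry above
    code |= fpTypeBits(src) << 22;     // SS: source precision
    code |= fpTypeBits(dst) << 15;     // DD: destination precision
    return code | (vn << 5) | vd;      // Rn in bits 9:5, Rd in bits 4:0
}
```

Here `encodeNeonFcvt(Elem::D, Elem::S, 0, 1)` models `fcvt d0, s1`.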

@a74nh
Contributor

a74nh commented Mar 1, 2024

You would also need to look at flipping bits 17 as well for FCVT. I've had a look across the different instructions and they have fairly inconsistent patterns between them for when you need to flip these bits, for example on SCVTF and UCVTF you also have to flip the 23rd bit sometimes and it is flipping 18:17 rather than 17:16. I'm wondering if this might be quite complex/branch-heavy? @a74nh any thoughts on this?

I think this would be: update the hardcoded group table to have all those bits clear, then potentially set some of those bits during encoding. For each group, you'd have 4 cases (B/H/S/D) to check, and each case changes up to 4 bits. That's a little messy, but I think we have worse cases.

I do think adding the groups (the way you've done it) looks better in the code, but I'm not sure how firmly we are committed to keeping the tables unchanged. If we are, then it will need to be done the bit-twiddling way.
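A sketch of that bit-twiddling approach applied to SCVTF/UCVTF: the table value has the varying bits clear (assumed here to be 0x6510A000), and opc (bits 23:22), opc2 (bits 18:17) and U (bit 16) are set per size combination during encoding. The register layout is the same predicated-unary one (Zd 4:0, Zn 9:5, Pg 12:10). `encodeSveCvtf` is hypothetical, it only models H, S and D element sizes, and returning 0 for an unsupported combination is an arbitrary sentinel chosen for the sketch.

```cpp
#include <cassert>
#include <cstdint>

enum class Elem { H, S, D };

// Sketch: SCVTF/UCVTF <Zd>.<dst>, <Pg>/M, <Zn>.<src> from a base with
// opc (23:22), opc2 (18:17) and U (16) clear. Returns 0 if unsupported.
inline uint32_t encodeSveCvtf(bool isUnsigned, Elem dst, Elem src,
                              unsigned zd, unsigned pg, unsigned zn) {
    assert(zd < 32 && zn < 32 && pg < 8);
    uint32_t opc = 0, opc2 = 0;
    switch (dst) {
    case Elem::H:                      // half result: 16-, 32- or 64-bit source
        opc = 1;
        opc2 = (src == Elem::H) ? 1 : (src == Elem::S) ? 2 : 3;
        break;
    case Elem::S:                      // single result: 32- or 64-bit source
        if (src == Elem::S)      { opc = 2; opc2 = 2; }
        else if (src == Elem::D) { opc = 3; opc2 = 2; }
        else return 0;
        break;
    case Elem::D:                      // double result: 32- or 64-bit source
        if (src == Elem::S)      { opc = 3; opc2 = 0; }
        else if (src == Elem::D) { opc = 3; opc2 = 3; }
        else return 0;
        break;
    }
    uint32_t code = 0x6510A000 | (opc << 22) | (opc2 << 17);
    if (isUnsigned) code |= 1u << 16;  // U selects UCVTF
    return code | (pg << 10) | (zn << 5) | zd;
}
```

Each case touches at most opc, opc2 and U, which is roughly the "4 cases, up to 4 bits" shape described above.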

// BFCVT <Zd>.H, <Pg>/M, <Zn>.S SVE_HO_3A 0110010110001010 101gggnnnnnddddd 658A A000

// enum name info SVE_HO_3B
INST1(fcvt, "fcvt", 0, IF_SVE_HO_3B, 0x6508A000)
Member

@kunalspathak kunalspathak Mar 5, 2024

Bit 23 is always 1, but during encoding you are not setting it for all conversions. With the current changes, does the encoding match capstone?

Suggested change
INST1(fcvt, "fcvt", 0, IF_SVE_HO_3B, 0x6508A000)
INST1(fcvt, "fcvt", 0, IF_SVE_HO_3B, 0x6588A000)

Contributor Author

@snickolls-arm snickolls-arm Mar 7, 2024

It is being set, but not always explicitly: in some of the cases it is covered by (3 << 22).

Member

Looking at the changes here, bit 23 always ends up set to 1:

[image]

which matches the requirement:

[image]

So my suggestion would be to just encode 1 at bit 23 in instrsarm64sve.h.
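A quick compile-time check of the bit-23 point, assuming the six FCVT opcode values below (derived from the bit rules discussed in this thread, register fields zero). It shows that the 0x6508A000 base only reaches the double-involving encodings through (3 << 22), whereas with the suggested 0x6588A000 base bit 23 is already present and only bit 22 needs setting for those cases. The names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kBit23 = 1u << 23;

// The six SVE FCVT size variants, written dst <- src.
constexpr uint32_t kSveFcvtVariants[] = {
    0x6588A000,  // H <- S
    0x6589A000,  // S <- H
    0x65C8A000,  // H <- D
    0x65C9A000,  // D <- H
    0x65CAA000,  // S <- D
    0x65CBA000,  // D <- S
};

constexpr bool allVariantsSetBit23() {
    for (uint32_t enc : kSveFcvtVariants) {
        if ((enc & kBit23) == 0) return false;
    }
    return true;
}

static_assert(allVariantsSetBit23(), "bit 23 is set in every FCVT variant");
// The PR's base leaves bit 23 clear; (3 << 22) supplies it for double cases,
// but the H <--> S variants would still need it set explicitly.
static_assert((0x6508A000u & kBit23) == 0, "PR base: bit 23 clear");
static_assert((0x6508A000u | (3u << 22)) == (0x6588A000u | (1u << 22)),
              "both bases agree once the double-conversion bits are set");
```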

// FCVT <Zd>.D, <Pg>/M, <Zn>.S SVE_HO_3B 0110010100001000 101gggnnnnnddddd 6508 A000

// enum name info SVE_HO_3C
INST1(fcvtx, "fcvtx", 0, IF_SVE_HO_3C, 0x650AA000 )
Member

Also, fcvtx and bfcvt could technically share the same format name, IF_SVE_HO_3A, unless you found a reason to keep them different?

Contributor Author

I kept them separate because each of these variants accepts only one combination of operand sizes, so I treat them as different formats to avoid dealing with special cases after the group has been decided.
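To illustrate that reasoning: when a format admits exactly one operand-size combination, encoding reduces to OR-ing the register fields into a fixed opcode, so nothing size-dependent remains once the format is chosen. A minimal sketch with a hypothetical `encodeFixedSizeUnary` helper, using the BFCVT and FCVTX opcodes from the table entries in this PR and the predicated-unary register layout (Zd 4:0, Zn 9:5, Pg 12:10).

```cpp
#include <cassert>
#include <cstdint>

// Fixed opcodes: each instruction has exactly one operand-size combination.
constexpr uint32_t kBfcvt = 0x658AA000;  // BFCVT <Zd>.H, <Pg>/M, <Zn>.S
constexpr uint32_t kFcvtx = 0x650AA000;  // FCVTX <Zd>.S, <Pg>/M, <Zn>.D

// Sketch: only the register fields vary, no size bits are computed.
inline uint32_t encodeFixedSizeUnary(uint32_t opcode, unsigned zd, unsigned pg, unsigned zn) {
    assert(zd < 32 && zn < 32 && pg < 8);
    return opcode | (pg << 10) | (zn << 5) | zd;
}
```

For instance, `encodeFixedSizeUnary(kBfcvt, 3, 2, 9)` models `bfcvt z3.h, p2/m, z9.s` from the capstone list.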

// enum name info SVE_HP_3A
INST1(flogb, "flogb", 0, IF_SVE_HP_3A, 0x6518A000 )
// FLOGB <Zd>.<T>, <Pg>/M, <Zn>.<T> SVE_HP_3A 0110010100011xx0 101gggnnnnnddddd 6518 A000

// enum name info SVE_HP_3B
INST1(fcvtzs, "fcvtzs", 0, IF_SVE_HP_3B, 0x6518A000)
Member

@kunalspathak kunalspathak Mar 5, 2024

Likewise for fcvt, fcvtzs and fcvtzu.

Contributor Author

This is because fcvtz[s,u] has bits 18 and 17 varying with operand size, whereas fcvt has bits 17 and 16 varying. They also accept different operand sizes: for example, fcvt accepts D=>H but fcvtz[s,u] does not.
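A sketch of the two differences described above, with hypothetical names: the opc2 field sits at bits 17:16 for FCVT but at bits 18:17 for FCVTZS/FCVTZU, and the accepted (destination, source) size pairs differ, so D=>H is legal for FCVT but not for FCVTZ[S,U].

```cpp
#include <cassert>
#include <cstdint>

enum class Elem { H, S, D };

// Size-dependent opc2 field positions for each group.
constexpr uint32_t kFcvtOpc2Mask  = 3u << 16;  // FCVT: bits 17:16
constexpr uint32_t kFcvtzOpc2Mask = 3u << 17;  // FCVTZS/FCVTZU: bits 18:17

// FCVT: any change of precision between H, S and D (dst <- src).
constexpr bool isLegalSveFcvt(Elem dst, Elem src) { return dst != src; }

// FCVTZS/FCVTZU: a 16-bit integer result is only produced from a
// half-precision source, so H <- S and H <- D are rejected.
constexpr bool isLegalSveFcvtz(Elem dst, Elem src) {
    return !(dst == Elem::H && src != Elem::H);
}
```

Because the masks and the legal pairs both differ, sharing one format between the two groups would need a per-instruction branch in any case.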

Member

Sure, I agree.

theEmitter->emitIns_R_R_R(INS_sve_bfcvt, EA_SCALABLE, REG_V3, REG_P2, REG_V9,
INS_OPTS_S_TO_H); // BFCVT <Zd>.H, <Pg>/M, <Zn>.S

// IF_SVE_HO_3A_B
Member

Suggested change
// IF_SVE_HO_3A_B
// IF_SVE_HO_3B

Member

Likewise, most of the comments in this file need to change to reflect the updated format names; e.g. IF_SVE_HP_3B_H no longer exists.

Member

@kunalspathak kunalspathak left a comment

LGTM. Thanks!

@kunalspathak kunalspathak merged commit 4983d9e into dotnet:main Mar 11, 2024
129 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2024
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support
3 participants