Add ARM64 encodings for groups IF_SVE_HO,HP,HS #99058

snickolls-arm · 2024-02-28T17:23:34Z

Covers many floating-point/integer conversion and floating-point precision conversion variants.

N.B. I had to change a few of the generated groups for these; some of the _K-M groups were added manually because the generator missed the half-precision variants.

Matching capstone output:

bfcvt z3.h, p2/m, z9.s
fcvt z7.d, p7/m, z1.s
fcvtx z2.s, p0/m, z6.d
fcvt z29.s, p3/m, z12.d
fcvt z0.h, p4/m, z13.d
fcvt z1.d, p5/m, z14.h
fcvt z2.h, p6/m, z15.s
fcvt z3.s, p7/m, z16.h
fcvtzs z9.s, p1/m, z3.s
fcvtzu z3.s, p2/m, z10.s
fcvtzs z5.d, p0/m, z24.s
fcvtzu z10.d, p7/m, z1.s
fcvtzs z12.s, p3/m, z6.d
fcvtzu z4.s, p3/m, z13.d
fcvtzs z2.d, p1/m, z17.d
fcvtzu z22.d, p6/m, z4.d
fcvtzs z3.h, p2/m, z18.h
fcvtzu z23.h, p7/m, z5.h
fcvtzs z4.s, p3/m, z19.h
fcvtzu z24.s, p0/m, z6.h
fcvtzs z5.d, p4/m, z20.h
fcvtzu z25.d, p1/m, z7.h
scvtf z19.s, p2/m, z8.s
ucvtf z17.s, p6/m, z11.s
scvtf z1.d, p5/m, z19.s
ucvtf z3.d, p3/m, z20.s
scvtf z4.s, p0/m, z14.d
ucvtf z8.s, p1/m, z7.d
scvtf z0.d, p0/m, z0.d
ucvtf z8.d, p4/m, z9.d
scvtf z12.h, p5/m, z14.h
ucvtf z13.h, p6/m, z15.h
scvtf z14.h, p7/m, z16.s
ucvtf z15.h, p0/m, z17.s
scvtf z16.h, p1/m, z18.d
ucvtf z17.h, p2/m, z19.d

Contributing towards #94549

ghost · 2024-02-28T17:23:42Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Author:	snickolls-arm
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

snickolls-arm · 2024-02-28T17:24:02Z

@a74nh @kunalspathak @dotnet/arm64-contrib

a74nh

All LGTM, but @kunalspathak needs to comment on the table changes.

a74nh · 2024-02-29T09:45:13Z

src/coreclr/jit/instrsarm64sve.h

@@ -239,6 +239,43 @@ INST7(ld1sw,             "ld1sw",                 0,                       IF_SV
    // LD1SW   {<Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]                                      SVE_IU_4B_B         11000101010mmmmm 100gggnnnnnttttt     C540 8000   
    // LD1SW   {<Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]                                     SVE_IV_3A           11000101001iiiii 100gggnnnnnttttt     C520 8000   

+//    enum               name                     info                                              SVE_HP_3B        SVE_HP_3B_H      SVE_HP_3B_I      SVE_HP_3B_J  SVE_HP_3B_K   SVE_HP_3B_L    SVE_HP_3B_M 
+INST7(fcvtzs,            "fcvtzs",                0,                       IF_SVE_7B,               0x659CA000,      0x65DCA000,      0x65D8A000,      0x65DEA000,  0x655AA000,   0x655CA000,    0x655EA000       )


The H variant was missing from all these encodings. The changes here look fine to me, but I'll let @kunalspathak comment

kunalspathak · 2024-02-29T16:48:26Z

Looking at FCVT, here is the information we have:

bit 22 is "0":

Half-precision to single-precision
Single-precision to half-precision

bit 16 is "0":

Single-precision to half-precision
Double-precision to half-precision
Double-precision to single-precision

Basically, bit 16 is "0" when the conversion narrows the element size, and bit 22 is "0" when the conversion is between half and single precision.
Given that, we can have a single format with encoding 6588 A000 named SVE_HO_3A, and delete SVE_HO_3A_B and all the new formats added in this PR. Then, during encoding, we can set the right bits based on the conversion option. This is one of the complicated encodings that the tool had a hard time parsing and converting into encoding formats/names.

I didn't look deeply, but this should also apply to the other instructions this PR touches, such as fcvtzs and scvtf.

For completeness, here are the scenarios in which these bits are "1":

bit 22 is "1":

Half-precision to double-precision
Single-precision to double-precision
Double-precision to half-precision
Double-precision to single-precision

bit 16 is "1":

Half-precision to single-precision
Half-precision to double-precision
Single-precision to double-precision

snickolls-arm · 2024-03-01T14:52:16Z

You would also need to flip bit 17 for FCVT. I've looked across the different instructions, and the patterns for when these bits need flipping are fairly inconsistent. For example, SCVTF and UCVTF sometimes need bit 23 flipped as well, and the bits that vary are 18:17 rather than 17:16. I'm wondering whether this might get quite complex or branch-heavy. @a74nh, any thoughts on this?

kunalspathak · 2024-03-01T15:36:19Z

You would also need to look at flipping bits 17 as well for FCVT

Right, and bit 17 is "1" when a double <--> single conversion is involved. The point is that we want to generalize and reduce the number of formats we introduce, simply because they are difficult to maintain. As long as the bit positions where we embed registers and special codes such as size or element size are the same, we try to use the same format name. In this case, regardless of the conversion, Pg, Zn and Zd go at the same bit positions. For the NEON FCVT, we have a single entry:

runtime/src/coreclr/jit/instrsarm64.h

Lines 1731 to 1732 in 049da22

INST1(fcvt,        "fcvt",         0,      IF_DV_2J,  0x1E224000)
    //  fcvt    Vd,Vn                DV_2J  00011110SS10001D D10000nnnnnddddd   1E22 4000   Vd,Vn

and then, during encoding, we embed the right bits depending on the conversion:

https://github.com/dotnet/runtime/blob/049da221a7f0e38eb92ef72fe0611b9b72e513da/src/coreclr/jit/emitarm64.cpp#L18989-L19017
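
A minimal sketch of the single-format approach for the SVE FCVT, assuming the bit rules stated in this thread: bit 22 is clear only for half <--> single, bit 17 is set only for single <--> double, and bit 16 is set only when widening. The names `FpSize`, `kSveFcvtBase`, `EncodeSveFcvt` and `CheckSveFcvtEncodings` are hypothetical and are not the JIT's actual identifiers; register fields are left as zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical precision enum for this sketch (not the JIT's insOpts values).
enum class FpSize : int { Half = 0, Single = 1, Double = 2 };

// Base pattern 6588 A000: bit 23 set, bits 22, 17 and 16 clear.
constexpr uint32_t kSveFcvtBase = 0x6588A000u;

// Builds the opcode bits for an FCVT from 'src' to 'dst' precision.
// Assumes src != dst; same-size pairs are not FCVT forms.
constexpr uint32_t EncodeSveFcvt(FpSize src, FpSize dst)
{
    const bool halfSingle   = (src != FpSize::Double) && (dst != FpSize::Double);
    const bool singleDouble = (src != FpSize::Half) && (dst != FpSize::Half);
    const bool widening     = static_cast<int>(dst) > static_cast<int>(src);

    uint32_t code = kSveFcvtBase;
    if (!halfSingle)  code |= (1u << 22); // double precision is involved
    if (singleDouble) code |= (1u << 17); // single <--> double
    if (widening)     code |= (1u << 16); // destination is wider than source
    return code;
}

// Spot checks for the two half <--> single forms and the two
// single <--> double forms.
inline void CheckSveFcvtEncodings()
{
    assert(EncodeSveFcvt(FpSize::Single, FpSize::Half)   == 0x6588A000u);
    assert(EncodeSveFcvt(FpSize::Half,   FpSize::Single) == 0x6589A000u);
    assert(EncodeSveFcvt(FpSize::Double, FpSize::Single) == 0x65CAA000u);
    assert(EncodeSveFcvt(FpSize::Single, FpSize::Double) == 0x65CBA000u);
}
```

Because Pg, Zn and Zd occupy the same positions for every conversion, only these three bits depend on the conversion option, which is why one table entry can cover all six forms.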

a74nh · 2024-03-01T15:44:59Z

You would also need to look at flipping bits 17 as well for FCVT. I've had a look across the different instructions and they have fairly inconsistent patterns between them for when you need to flip these bits, for example on SCVTF and UCVTF you also have to flip the 23rd bit sometimes and it is flipping 18:17 rather than 17:16. I'm wondering if this might be quite complex/branch-heavy? @a74nh any thoughts on this?

I think this would be: update the hardcoded group table so that all those bits are clear, then, during encoding, set some of those bits as needed. For each group, you'd have 4 cases (B/H/S/D) to check, and each case changes up to 4 bits. That's a little messy, but I think we have worse cases.

I do think adding the groups (as you've done) looks better in the code, but I'm not sure how fixed we are on keeping the tables unchanged. If we are, then it will need to be done the bit-twiddling way.
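
As an illustration of the "clear the bits in the table, set them during encoding" idea, here is a hedged sketch for fcvtzs. It starts from 0x6518A000 (bits 23:22 and 18:17 clear) and reproduces the seven per-variant encodings listed in the INST7 row quoted earlier in this conversation. The mapping of each encoding to a source/destination size pair is my reading of the instruction forms and should be treated as an assumption; `Elem`, `kSveFcvtzsBase`, `EncodeSveFcvtzs` and `CheckSveFcvtzsEncodings` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical element sizes: floating-point source, integer destination.
enum class Elem : int { H = 0, S = 1, D = 2 };

// fcvtzs pattern with bits 23:22 and 18:17 clear, register fields zero.
constexpr uint32_t kSveFcvtzsBase = 0x6518A000u;

// Returns the opcode bits for the given size pair, or 0 if the pair is
// not one of the seven forms (S->H and D->H).
constexpr uint32_t EncodeSveFcvtzs(Elem src, Elem dst)
{
    uint32_t hi = 0; // value placed in bits 23:22
    uint32_t lo = 0; // value placed in bits 18:17

    if (src == Elem::H)
    {
        hi = 1;
        lo = (dst == Elem::H) ? 1u : (dst == Elem::S) ? 2u : 3u;
    }
    else if (src == Elem::S && dst == Elem::S) { hi = 2; lo = 2; }
    else if (src == Elem::S && dst == Elem::D) { hi = 3; lo = 2; }
    else if (src == Elem::D && dst == Elem::S) { hi = 3; lo = 0; }
    else if (src == Elem::D && dst == Elem::D) { hi = 3; lo = 3; }
    else
    {
        return 0; // unsupported combination
    }
    return kSveFcvtzsBase | (hi << 22) | (lo << 17);
}

// The expected values are the seven encodings from the INST7(fcvtzs, ...) row.
inline void CheckSveFcvtzsEncodings()
{
    assert(EncodeSveFcvtzs(Elem::S, Elem::S) == 0x659CA000u);
    assert(EncodeSveFcvtzs(Elem::S, Elem::D) == 0x65DCA000u);
    assert(EncodeSveFcvtzs(Elem::D, Elem::S) == 0x65D8A000u);
    assert(EncodeSveFcvtzs(Elem::D, Elem::D) == 0x65DEA000u);
    assert(EncodeSveFcvtzs(Elem::H, Elem::H) == 0x655AA000u);
    assert(EncodeSveFcvtzs(Elem::H, Elem::S) == 0x655CA000u);
    assert(EncodeSveFcvtzs(Elem::H, Elem::D) == 0x655EA000u);
}
```

The cost is one branch per size pair; in exchange, the instruction table needs a single entry instead of seven.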

kunalspathak · 2024-03-05T14:57:10Z

src/coreclr/jit/instrsarm64sve.h

+    // BFCVT   <Zd>.H, <Pg>/M, <Zn>.S                                                    SVE_HO_3A           0110010110001010 101gggnnnnnddddd     658A A000   
+
+//    enum               name                     info                                              SVE_HO_3B
+INST1(fcvt,              "fcvt",                  0,                       IF_SVE_HO_3B,            0x6508A000)


Bit 23 is always 1, but during encoding you are not setting it for every conversion. With the current changes, does the encoding still match capstone?

Suggested change

- INST1(fcvt, "fcvt", 0, IF_SVE_HO_3B, 0x6508A000)
+ INST1(fcvt, "fcvt", 0, IF_SVE_HO_3B, 0x6588A000)

snickolls-arm · 2024-03-07

It is being set, but not always explicitly: in some cases it is set as part of (3 << 22).

kunalspathak · 2024-03-08T14:51:33Z

Looking at the changes here, the bit always ends up set to 1, which matches the requirement. So my suggestion is to encode the 1 at bit 23 directly in instrsarm64sve.h.

kunalspathak · 2024-03-05T15:11:09Z

src/coreclr/jit/instrsarm64sve.h

+    // FCVT    <Zd>.D, <Pg>/M, <Zn>.S                                                    SVE_HO_3B           0110010100001000 101gggnnnnnddddd     6508 A000   
+
+//    enum               name                     info                                              SVE_HO_3C
+INST1(fcvtx,             "fcvtx",                 0,                       IF_SVE_HO_3C,            0x650AA000                                   )


Also, fcvtx and bfcvt could technically share the format name IF_SVE_HO_3A, unless you found a reason to keep them different?

snickolls-arm · 2024-03-07

I have kept them separate because these variants accept only one combination of operand sizes, so I consider them a different format. This avoids dealing with special cases after the group has been decided.

kunalspathak · 2024-03-05T15:12:32Z

src/coreclr/jit/instrsarm64sve.h

 //    enum               name                     info                                              SVE_HP_3A                                    
 INST1(flogb,             "flogb",                 0,                       IF_SVE_HP_3A,            0x6518A000                                   )
    // FLOGB   <Zd>.<T>, <Pg>/M, <Zn>.<T>                                                SVE_HP_3A           0110010100011xx0 101gggnnnnnddddd     6518 A000   

+//    enum               name                     info                                              SVE_HP_3B
+INST1(fcvtzs,            "fcvtzs",                0,                       IF_SVE_HP_3B,            0x6518A000)


Likewise for fcvt, fcvtzs and fcvtzu.

snickolls-arm · 2024-03-07

These are separate because fcvtz[s,u] has bits 18 and 17 varying with operand size, whereas fcvt has bits 17 and 16 varying. They also accept different operand sizes: for example, fcvt accepts D=>H but fcvtz[s,u] does not.

kunalspathak · 2024-03-08

Sure, I agree.
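
The difference in accepted operand sizes can be sketched as two small predicates. They are derived only from the forms in the capstone listing at the top of this PR (six fcvt forms, seven fcvtzs/fcvtzu forms); `Size`, `IsValidSveFcvt` and `IsValidSveFcvtz` are hypothetical names, not JIT functions.

```cpp
#include <cassert>

// Hypothetical element sizes for source and destination.
enum class Size : int { H = 0, S = 1, D = 2 };

// fcvt converts between two different precisions, in either direction.
constexpr bool IsValidSveFcvt(Size src, Size dst)
{
    return src != dst;
}

// fcvtzs/fcvtzu: a 16-bit integer result is only produced from a
// half-precision source, so S->H and D->H are rejected.
constexpr bool IsValidSveFcvtz(Size src, Size dst)
{
    return (dst != Size::H) || (src == Size::H);
}

inline void CheckSveConvertOperandSizes()
{
    assert(IsValidSveFcvt(Size::D, Size::H));   // fcvt z0.h, p4/m, z13.d
    assert(!IsValidSveFcvtz(Size::D, Size::H)); // no such fcvtzs form
    assert(IsValidSveFcvtz(Size::D, Size::S));  // fcvtzs z12.s, p3/m, z6.d
    assert(IsValidSveFcvtz(Size::H, Size::H));  // fcvtzs z3.h, p2/m, z18.h
}
```

Keeping the two families in separate formats means each predicate only has to be consulted for its own format, rather than branching on the instruction inside a shared one.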

kunalspathak · 2024-03-08T14:47:47Z

src/coreclr/jit/codegenarm64test.cpp

+    theEmitter->emitIns_R_R_R(INS_sve_bfcvt, EA_SCALABLE, REG_V3, REG_P2, REG_V9,
+                              INS_OPTS_S_TO_H); // BFCVT   <Zd>.H, <Pg>/M, <Zn>.S
+
+    // IF_SVE_HO_3A_B


Suggested change

-    // IF_SVE_HO_3A_B
+    // IF_SVE_HO_3B

Likewise, most of the comments in this file need to change to reflect the updated format names; e.g. IF_SVE_HP_3B_H no longer exists.

kunalspathak

LGTM. Thanks!

Add ARM64 encodings for groups IF_SVE_HO,HP,HS

032772b

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 28, 2024

This was referenced Feb 28, 2024

Assertion failed: (GetComponentSize() <= 2) || IsArray() #86273

Closed

CI test failure: Connection handshake was canceled due to the configured timeout #99074

Closed

a74nh approved these changes Feb 29, 2024

kunalspathak added the arm-sve label (Work related to arm64 SVE/SVE2 support) Feb 29, 2024

kunalspathak mentioned this pull request Feb 29, 2024

Arm64: Implement SVE encodings #94549

Closed

snickolls-arm added 3 commits March 5, 2024 09:49

Merge branch 'main' into github-IF_SVE_HO,HP,HS

f692dfe

Fix missing code emission after merge

25c60aa

Refactor to use fewer encoding groups

a1ceb6d

kunalspathak reviewed Mar 5, 2024

Merge branch 'main' into github-IF_SVE_HO,HP,HS

5aeacb7

kunalspathak reviewed Mar 8, 2024

snickolls-arm added 2 commits March 11, 2024 15:15

Merge branch 'main' into github-IF_SVE_HO,HP,HS

83e3be9

Address review comments

8fd2894

kunalspathak approved these changes Mar 11, 2024

kunalspathak merged commit 4983d9e into dotnet:main Mar 11, 2024
129 checks passed

github-actions bot locked and limited conversation to collaborators Apr 12, 2024
