JIT: ARM64 SVE format encodings, SVE_IH_3A to SVE_JO_3A #95994

Merged
merged 43 commits into from
Jan 10, 2024

Contributor

@TIHan TIHan commented Dec 14, 2023

Contributes to #94549

Adds the following SVE formats:

  • SVE_IH_3A
  • SVE_IH_3A_A
  • SVE_IH_3A_F
  • SVE_IJ_3A
  • SVE_IJ_3A_D
  • SVE_IJ_3A_E
  • SVE_IJ_3A_F
  • SVE_IJ_3A_G
  • SVE_IL_3A
  • SVE_IL_3A_A
  • SVE_IL_3A_B
  • SVE_IL_3A_C
  • SVE_IM_3A
  • SVE_IO_3A
  • SVE_IQ_3A
  • SVE_IS_3A
  • SVE_JE_3A
  • SVE_JM_3A
  • SVE_JN_3C
  • SVE_JN_3C_D
  • SVE_JO_3A

Capstone output is in red; JIT output is in green.

--- a/Untitled-1.txt
+++ b/Untitled-2.txt
@@ -1,99 +1,99 @@
-ld1d  { z5.d }, p3/z, [x4]
-ld1d  { z0.q }, p2/z, [x3, #5, mul vl]
-ld1w  { z0.s }, p2/z, [x3, #3, mul vl]
-ld1w  { z0.d }, p2/z, [x3, #3, mul vl]
-ld1w  { z0.q }, p2/z, [x3, #3, mul vl]
-ld1sw { z0.d }, p5/z, [x3, #4, mul vl]
-ld1sb { z3.h }, p0/z, [x2, #6, mul vl]
-ld1sb { z3.s }, p0/z, [x2, #6, mul vl]
-ld1sb { z3.d }, p0/z, [x2, #6, mul vl]
-ld1b  { z5.b }, p1/z, [x3, #7, mul vl]
-ld1b  { z5.h }, p1/z, [x3, #7, mul vl]
-ld1b  { z5.s }, p1/z, [x3, #7, mul vl]
-ld1b  { z5.d }, p1/z, [x3, #7, mul vl]
-ld1sh { z7.s }, p3/z, [x5, #2, mul vl]
-ld1sh { z7.d }, p3/z, [x5, #2, mul vl]
-ld1h  { z2.h }, p1/z, [x6, #1, mul vl]
-ld1h  { z2.s }, p1/z, [x6, #1, mul vl]
-ld1h  { z2.d }, p1/z, [x6, #1, mul vl]
-ldnf1d        { z0.d }, p0/z, [x0]
-ldnf1sw       { z0.d }, p0/z, [x0]
-ldnf1d        { z0.d }, p1/z, [x2, #5, mul vl]
-ldnf1sw       { z0.d }, p1/z, [x2, #5, mul vl]
-ldnf1sh       { z0.s }, p1/z, [x5, #5, mul vl]
-ldnf1w        { z0.s }, p2/z, [x4, #5, mul vl]
-ldnf1sh       { z0.d }, p1/z, [x5, #5, mul vl]
-ldnf1w        { z0.d }, p2/z, [x4, #5, mul vl]
-ldnf1h        { z1.h }, p3/z, [x2, #5, mul vl]
-ldnf1sb       { z0.h }, p4/z, [x1, #5, mul vl]
-ldnf1h        { z1.s }, p3/z, [x2, #5, mul vl]
-ldnf1sb       { z0.s }, p4/z, [x1, #5, mul vl]
-ldnf1h        { z1.d }, p3/z, [x2, #5, mul vl]
-ldnf1sb       { z0.d }, p4/z, [x1, #5, mul vl]
-ldnf1b        { z2.b }, p5/z, [x3, #-4, mul vl]
-ldnf1b        { z2.h }, p5/z, [x3, #-2, mul vl]
-ldnf1b        { z2.s }, p5/z, [x3, #2, mul vl]
-ldnf1b        { z2.d }, p5/z, [x3, #1, mul vl]
-ldnt1b        { z0.b }, p1/z, [x2, #-5, mul vl]
-ldnt1d        { z3.d }, p4/z, [x5, #-1, mul vl]
-ldnt1h        { z6.h }, p7/z, [x8]
-ldnt1w        { z1.s }, p2/z, [x3, #-8, mul vl]
-ld1rob        { z0.b }, p1/z, [x2]
-ld1rod        { z4.d }, p5/z, [x6, #-0x20]
-ld1roh        { z8.h }, p3/z, [x1, #-0x100]
-ld1row        { z3.s }, p4/z, [x0, #0xE0]
-ld1rqb        { z6.b }, p7/z, [x8, #0x40]
-ld1rqd        { z9.d }, p0/z, [x1, #-0x80]
-ld1rqh        { z4.h }, p5/z, [x6, #0x70]
-ld1rqw        { z3.s }, p2/z, [x1, #-0x10]
-ld2q  { z0.q, z1.q }, p1/z, [x2, #-0x10, mul vl]
-ld2q  { z0.q, z1.q }, p1/z, [x2, #0xE, mul vl]
-ld3q  { z0.q - z2.q }, p4/z, [x5, #-0x18, mul vl]
-ld3q  { z0.q - z2.q }, p4/z, [x5, #0x15, mul vl]
-ld4q  { z0.q - z3.q }, p5/z, [x3, #-0x20, mul vl]
-ld4q  { z0.q - z3.q }, p5/z, [x3, #0x1C, mul vl]
-ld2q  { z12.q, z13.q }, p1/z, [x2, #-0x10, mul vl]
-ld2q  { z13.q, z14.q }, p1/z, [x2, #0xE, mul vl]
-ld3q  { z14.q - z16.q }, p4/z, [x5, #-0x18, mul vl]
-ld3q  { z15.q - z17.q }, p4/z, [x5, #0x15, mul vl]
-ld4q  { z16.q - z19.q }, p5/z, [x3, #-0x20, mul vl]
-ld4q  { z27.q - z30.q }, p5/z, [x3, #0x1C, mul vl]
-ld4q  { z28.q - z31.q }, p5/z, [x3, #0x1C, mul vl]
-ld4q  { z29.q, z30.q, z31.q, z0.q }, p5/z, [x3, #0x1C, mul vl]
-ld4q  { z30.q, z31.q, z0.q, z1.q }, p5/z, [x3, #0x1C, mul vl]
-ld4q  { z31.q, z0.q, z1.q, z2.q }, p5/z, [x3, #0x1C, mul vl]
-ld2q  { z31.q, z0.q }, p1/z, [x2, #-0x10, mul vl]
-ld2b  { z0.b, z1.b }, p1/z, [x2, #-0x10, mul vl]
-ld2d  { z4.d, z5.d }, p5/z, [x7, #0xE, mul vl]
-ld2h  { z6.h, z7.h }, p5/z, [x4, #8, mul vl]
-ld2w  { z0.s, z1.s }, p0/z, [x1, #2, mul vl]
-ld3b  { z0.b - z2.b }, p0/z, [x0, #0x15, mul vl]
-ld3d  { z0.d - z2.d }, p0/z, [x0, #-0x18, mul vl]
-ld3h  { z0.h - z2.h }, p0/z, [x0, #0x15, mul vl]
-ld3w  { z0.s - z2.s }, p0/z, [x0, #-0x18, mul vl]
-ld4b  { z31.b, z0.b, z1.b, z2.b }, p2/z, [x1, #-0x20, mul vl]
-ld4d  { z8.d - z11.d }, p0/z, [x0, #0x1C, mul vl]
-ld4h  { z5.h - z8.h }, p4/z, [x3, #-0x20, mul vl]
-ld4w  { z0.s - z3.s }, p1/z, [x2, #0x1C, mul vl]
-st2q  { z0.q, z1.q }, p3, [x0, #-0x10, mul vl]
-st3q  { z2.q - z4.q }, p3, [x4, #0x15, mul vl]
-st4q  { z7.q - z10.q }, p6, [x5, #0x1C, mul vl]
-stnt1b        { z1.b }, p2, [x3, #4, mul vl]
-stnt1d        { z8.d }, p7, [x6, #5, mul vl]
-stnt1h        { z9.h }, p1, [x0, #-5, mul vl]
-stnt1w        { z0.s }, p0, [x2, #-7, mul vl]
-st1d  { z1.d }, p2, [x3, #4, mul vl]
-st1w  { z3.q }, p4, [x5, #6, mul vl]
-st1d  { z2.q }, p1, [x0]
-st2b  { z0.b, z1.b }, p1, [x2, #-0x10, mul vl]
-st2d  { z5.d, z6.d }, p4, [x3, #-0x10, mul vl]
-st2h  { z6.h, z7.h }, p7, [x8, #-0x10, mul vl]
-st2w  { z8.s, z9.s }, p1, [x9, #-0x10, mul vl]
-st3b  { z7.b - z9.b }, p6, [x5, #-0x18, mul vl]
-st3d  { z2.d - z4.d }, p3, [x4, #-0x18, mul vl]
-st3h  { z1.h - z3.h }, p2, [x3, #-0x18, mul vl]
-st3w  { z1.s - z3.s }, p3, [x8, #-0x18, mul vl]
-st4b  { z0.b - z3.b }, p0, [x0, #-0x20, mul vl]
-st4d  { z2.d - z5.d }, p0, [x1, #-0x20, mul vl]
-st4h  { z3.h - z6.h }, p5, [x2, #-0x20, mul vl]
-st4w  { z0.s - z3.s }, p1, [x5, #0x1C, mul vl]
\ No newline at end of file
+ld1d    { z5.d }, p3/z, [x4]
+ld1d    { z0.q }, p2/z, [x3, #5, mul vl]
+ld1w    { z0.s }, p2/z, [x3, #3, mul vl]
+ld1w    { z0.d }, p2/z, [x3, #3, mul vl]
+ld1w    { z0.q }, p2/z, [x3, #3, mul vl]
+ld1sw   { z0.d }, p5/z, [x3, #4, mul vl]
+ld1sb   { z3.h }, p0/z, [x2, #6, mul vl]
+ld1sb   { z3.s }, p0/z, [x2, #6, mul vl]
+ld1sb   { z3.d }, p0/z, [x2, #6, mul vl]
+ld1b    { z5.b }, p1/z, [x3, #7, mul vl]
+ld1b    { z5.h }, p1/z, [x3, #7, mul vl]
+ld1b    { z5.s }, p1/z, [x3, #7, mul vl]
+ld1b    { z5.d }, p1/z, [x3, #7, mul vl]
+ld1sh   { z7.s }, p3/z, [x5, #2, mul vl]
+ld1sh   { z7.d }, p3/z, [x5, #2, mul vl]
+ld1h    { z2.h }, p1/z, [x6, #1, mul vl]
+ld1h    { z2.s }, p1/z, [x6, #1, mul vl]
+ld1h    { z2.d }, p1/z, [x6, #1, mul vl]
+ldnf1d  { z0.d }, p0/z, [x0]
+ldnf1sw { z0.d }, p0/z, [x0]
+ldnf1d  { z0.d }, p1/z, [x2, #5, mul vl]
+ldnf1sw { z0.d }, p1/z, [x2, #5, mul vl]
+ldnf1sh { z0.s }, p1/z, [x5, #5, mul vl]
+ldnf1w  { z0.s }, p2/z, [x4, #5, mul vl]
+ldnf1sh { z0.d }, p1/z, [x5, #5, mul vl]
+ldnf1w  { z0.d }, p2/z, [x4, #5, mul vl]
+ldnf1h  { z1.h }, p3/z, [x2, #5, mul vl]
+ldnf1sb { z0.h }, p4/z, [x1, #5, mul vl]
+ldnf1h  { z1.s }, p3/z, [x2, #5, mul vl]
+ldnf1sb { z0.s }, p4/z, [x1, #5, mul vl]
+ldnf1h  { z1.d }, p3/z, [x2, #5, mul vl]
+ldnf1sb { z0.d }, p4/z, [x1, #5, mul vl]
+ldnf1b  { z2.b }, p5/z, [x3, #-4, mul vl]
+ldnf1b  { z2.h }, p5/z, [x3, #-2, mul vl]
+ldnf1b  { z2.s }, p5/z, [x3, #2, mul vl]
+ldnf1b  { z2.d }, p5/z, [x3, #1, mul vl]
+ldnt1b  { z0.b }, p1/z, [x2, #-5, mul vl]
+ldnt1d  { z3.d }, p4/z, [x5, #-1, mul vl]
+ldnt1h  { z6.h }, p7/z, [x8]
+ldnt1w  { z1.s }, p2/z, [x3, #-8, mul vl]
+ld1rob  { z0.b }, p1/z, [x2]
+ld1rod  { z4.d }, p5/z, [x6, #-0x20]
+ld1roh  { z8.h }, p3/z, [x1, #-0x100]
+ld1row  { z3.s }, p4/z, [x0, #0xE0]
+ld1rqb  { z6.b }, p7/z, [x8, #0x40]
+ld1rqd  { z9.d }, p0/z, [x1, #-0x80]
+ld1rqh  { z4.h }, p5/z, [x6, #0x70]
+ld1rqw  { z3.s }, p2/z, [x1, #-0x10]
+ld2q    { z0.q, z1.q }, p1/z, [x2, #-0x10, mul vl]
+ld2q    { z0.q, z1.q }, p1/z, [x2, #0x0E, mul vl]
+ld3q    { z0.q - z2.q }, p4/z, [x5, #-0x18, mul vl]
+ld3q    { z0.q - z2.q }, p4/z, [x5, #0x15, mul vl]
+ld4q    { z0.q - z3.q }, p5/z, [x3, #-0x20, mul vl]
+ld4q    { z0.q - z3.q }, p5/z, [x3, #0x1C, mul vl]
+ld2q    { z12.q, z13.q }, p1/z, [x2, #-0x10, mul vl]
+ld2q    { z13.q, z14.q }, p1/z, [x2, #0x0E, mul vl]
+ld3q    { z14.q - z16.q }, p4/z, [x5, #-0x18, mul vl]
+ld3q    { z15.q - z17.q }, p4/z, [x5, #0x15, mul vl]
+ld4q    { z16.q - z19.q }, p5/z, [x3, #-0x20, mul vl]
+ld4q    { z27.q - z30.q }, p5/z, [x3, #0x1C, mul vl]
+ld4q    { z28.q - z31.q }, p5/z, [x3, #0x1C, mul vl]
+ld4q    { z29.q, z30.q, z31.q, z0.q }, p5/z, [x3, #0x1C, mul vl]
+ld4q    { z30.q, z31.q, z0.q, z1.q }, p5/z, [x3, #0x1C, mul vl]
+ld4q    { z31.q, z0.q, z1.q, z2.q }, p5/z, [x3, #0x1C, mul vl]
+ld2q    { z31.q, z0.q }, p1/z, [x2, #-0x10, mul vl]
+ld2b    { z0.b, z1.b }, p1/z, [x2, #-0x10, mul vl]
+ld2d    { z4.d, z5.d }, p5/z, [x7, #0x0E, mul vl]
+ld2h    { z6.h, z7.h }, p5/z, [x4, #0x08, mul vl]
+ld2w    { z0.s, z1.s }, p0/z, [x1, #0x02, mul vl]
+ld3b    { z0.b - z2.b }, p0/z, [x0, #0x15, mul vl]
+ld3d    { z0.d - z2.d }, p0/z, [x0, #-0x18, mul vl]
+ld3h    { z0.h - z2.h }, p0/z, [x0, #0x15, mul vl]
+ld3w    { z0.s - z2.s }, p0/z, [x0, #-0x18, mul vl]
+ld4b    { z31.b, z0.b, z1.b, z2.b }, p2/z, [x1, #-0x20, mul vl]
+ld4d    { z8.d - z11.d }, p0/z, [x0, #0x1C, mul vl]
+ld4h    { z5.h - z8.h }, p4/z, [x3, #-0x20, mul vl]
+ld4w    { z0.s - z3.s }, p1/z, [x2, #0x1C, mul vl]
+st2q    { z0.q, z1.q }, p3, [x0, #-0x10, mul vl]
+st3q    { z2.q - z4.q }, p3, [x4, #0x15, mul vl]
+st4q    { z7.q - z10.q }, p6, [x5, #0x1C, mul vl]
+stnt1b  { z1.b }, p2, [x3, #4, mul vl]
+stnt1d  { z8.d }, p7, [x6, #5, mul vl]
+stnt1h  { z9.h }, p1, [x0, #-5, mul vl]
+stnt1w  { z0.s }, p0, [x2, #-7, mul vl]
+st1d    { z1.d }, p2, [x3, #4, mul vl]
+st1w    { z3.q }, p4, [x5, #6, mul vl]
+st1d    { z2.q }, p1, [x0]
+st2b    { z0.b, z1.b }, p1, [x2, #-0x10, mul vl]
+st2d    { z5.d, z6.d }, p4, [x3, #-0x10, mul vl]
+st2h    { z6.h, z7.h }, p7, [x8, #-0x10, mul vl]
+st2w    { z8.s, z9.s }, p1, [x9, #-0x10, mul vl]
+st3b    { z7.b - z9.b }, p6, [x5, #-0x18, mul vl]
+st3d    { z2.d - z4.d }, p3, [x4, #-0x18, mul vl]
+st3h    { z1.h - z3.h }, p2, [x3, #-0x18, mul vl]
+st3w    { z1.s - z3.s }, p3, [x8, #-0x18, mul vl]
+st4b    { z0.b - z3.b }, p0, [x0, #-0x20, mul vl]
+st4d    { z2.d - z5.d }, p0, [x1, #-0x20, mul vl]
+st4h    { z3.h - z6.h }, p5, [x2, #-0x20, mul vl]
+st4w    { z0.s - z3.s }, p1, [x5, #0x1C, mul vl]
\ No newline at end of file

@ghost ghost assigned TIHan Dec 14, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 14, 2023
ghost commented Dec 14, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Note: I still have a few more formats to do as well as some cleanup.

Contributes to #94549

Author: TIHan
Assignees: TIHan
Labels:

area-CodeGen-coreclr

Milestone: -

@TIHan TIHan changed the title from "JIT: wip ARM64 SVE format encodings, starting at SVE_IH_3A" to "JIT: wip ARM64 SVE format encodings, SVE_IH_3A to SVE_JO_3A" Dec 14, 2023
@TIHan TIHan added arm-sve Work related to arm64 SVE/SVE2 support arch-arm64 labels Dec 14, 2023
@TIHan TIHan changed the title from "JIT: wip ARM64 SVE format encodings, SVE_IH_3A to SVE_JO_3A" to "JIT: ARM64 SVE format encodings, SVE_IH_3A to SVE_JO_3A" Dec 15, 2023
Member

@kunalspathak kunalspathak left a comment

Sorry for taking a while to review this.

inline static bool insOptsScalableWordsOrQuadwords(insOpts opt)
{
    // `opt` is any of the standard word, quadword and above scalable types.
    return (insOptsScalableWords(opt) || (opt == INS_OPTS_SCALABLE_Q));
Member

Just a check for INS_OPTS_SCALABLE_Q seems a little odd. #96692 would probably revise this.

Contributor Author

That PR would revise this. Should I wait for the PR to be merged then adjust this?

Member

You are adding just one entry, and since your PR has been out for a while now, I would suggest merging it after you address the feedback.
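
For readers outside the JIT codebase, the helper under discussion can be sketched in isolation. This is a minimal sketch, not the JIT's actual definitions: the `insOpts` enum below is a simplified, hypothetical stand-in, and it assumes that `insOptsScalableWords` accepts the S and D arrangements, which the excerpt above does not confirm.

```cpp
// Hypothetical, simplified stand-in for the JIT's insOpts enum.
enum insOpts
{
    INS_OPTS_SCALABLE_B,
    INS_OPTS_SCALABLE_H,
    INS_OPTS_SCALABLE_S,
    INS_OPTS_SCALABLE_D,
    INS_OPTS_SCALABLE_Q
};

// Assumption: "words" means word-sized elements and above (S or D).
inline bool insOptsScalableWords(insOpts opt)
{
    return (opt == INS_OPTS_SCALABLE_S) || (opt == INS_OPTS_SCALABLE_D);
}

// Mirrors the excerpt: the word-sized check plus the single extra Q entry.
inline bool insOptsScalableWordsOrQuadwords(insOpts opt)
{
    return insOptsScalableWords(opt) || (opt == INS_OPTS_SCALABLE_Q);
}
```

Under these assumptions the predicate accepts S, D and Q and rejects B and H, which is the "just one entry" the review comment refers to.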

src/coreclr/jit/emitarm64.cpp (outdated, resolved)
// Therefore, by default return NONE due to ambiguity.
case IF_SVE_AH_3A:
case IF_SVE_DB_3A:
// TODO: Handle these cases.
Member

Need to handle this?

Contributor Author

We haven't implemented those formats yet, and also I'm not sure how we will represent <Pg>/<ZM>.

src/coreclr/jit/emitarm64.cpp (outdated, resolved)
src/coreclr/jit/emitarm64.h (outdated, resolved)

if (canEncodeSveElemsize_dtype(ins))
{
    code = insEncodeSveElemsize_dtype(ins, optGetSveElemsize(id->idInsOpt()), code);
Member

That could be the reason for the TP regression:

[image]

src/coreclr/jit/emitarm64.cpp (outdated, resolved)
const bool notLastRegister = (i != listSize - 1);
emitDispSveReg(currReg, opt, notLastRegister);
currReg = (currReg == REG_V31) ? REG_V0 : REG_NEXT(currReg);
if ((listSize == 2) || (((unsigned)currReg + listSize - 1) > (unsigned)REG_V31))
Member

(listSize == 2)

Can't this be true for any listSize > 1?

Member

Can you add a unit test that passes REG_V31 and see if it is disassembled correctly?

Contributor Author

Sure

Contributor Author

@TIHan TIHan Jan 9, 2024

I already have unit tests that use REG_V31. It prints out ld4b { z31.b, z0.b, z1.b, z2.b }, p2/z, [x1, #-0x20, mul vl], but I may not have one for when listSize is 1.

Member

(listSize == 2)

Can't this be true for any listSize > 1?

Looks like this is not yet addressed.

Contributor Author

Will look at it, I guess we may not need that check.

Contributor Author

We do need the check, but I slightly changed it.

case IF_SVE_JE_3A:
case IF_SVE_JO_3A:
// This does not have to be printed as hex.
// We only do it because the capstone disassembly displays this immediate as hex.
Member

Does this work properly with LATEDISASM? Regardless, I would remove the comment about capstone.

Contributor Author

I will check

Contributor Author

coredistools has trouble decoding some of these instructions compared to capstone, so its output is going to be different.
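
To make the hex-immediate display concrete, here is a minimal sketch of a formatter that reproduces the JIT-side strings seen in the diff above (for example `#0x0E` and `#-0x10`, where Capstone prints `#0xE` for the former). The function name `formatSignedHexImm` is hypothetical, and the zero-padded two-digit form is inferred from the diff output only, not taken from the JIT source.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a signed immediate as hex with a leading '#',
// the sign placed before the "0x" prefix, and at least two hex digits.
std::string formatSignedHexImm(int imm)
{
    // Compute the magnitude in unsigned arithmetic so that negating the
    // most negative int is still well defined.
    const unsigned magnitude =
        (imm < 0) ? (0u - static_cast<unsigned>(imm)) : static_cast<unsigned>(imm);

    char buf[32];
    std::snprintf(buf, sizeof(buf), "#%s0x%02X", (imm < 0) ? "-" : "", magnitude);
    return std::string(buf);
}
```

With this sketch, `formatSignedHexImm(14)` yields `#0x0E` and `formatSignedHexImm(-256)` yields `#-0x100`, matching the `ld2q` and `ld1roh` lines in the JIT output.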

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 9, 2024
@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 9, 2024
Contributor Author

TIHan commented Jan 9, 2024

@kunalspathak this is ready again.

Capstone on the left, JIT on the right:
[image]

Member

@kunalspathak kunalspathak left a comment

Looks good, just need to address one outstanding piece of feedback.

src/coreclr/jit/instrsarm64sve.h (resolved)
const bool notLastRegister = (i != listSize - 1);
emitDispSveReg(currReg, opt, notLastRegister);
currReg = (currReg == REG_V31) ? REG_V0 : REG_NEXT(currReg);
if ((listSize == 2) || (((unsigned)currReg + listSize - 1) > (unsigned)REG_V31))
Member

(listSize == 2)

Can't this be true for any listSize > 1?

Looks like this is not yet addressed.

Contributor Author

TIHan commented Jan 10, 2024

@kunalspathak Ok, addressed the remaining feedback.

@kunalspathak
Member

// We do not want the short-hand for list size of 1 or 2.

Is this true for all the instructions? Currently only sqrshrn uses it, and I don't think it has a requirement to show the registers using the short-hand format: https://docsmirror.github.io/A64/2023-06/sqrshrn_z_mz2.html

I would rewrite it to something like:

assert(listSize > 0);

// We do not want the short-hand for list size of 1 or 2.
if ((listSize <= 2) || (((unsigned)currReg + listSize - 1) > (unsigned)REG_V31))
{
    for (unsigned i = 0; i < listSize; i++)
    {
        const bool notLastRegister = (i != listSize - 1);
        emitDispSveReg(currReg, opt, notLastRegister);
        currReg = (currReg == REG_V31) ? REG_V0 : REG_NEXT(currReg);
    }
}
else
{
    // short-hand. example: { z0.s - z2.s } which is the same as { z0.s, z1.s, z2.s }
    emitDispSveReg(currReg, opt, false);
    printf(" - ");
    emitDispSveReg((regNumber)(currReg + listSize - 1), opt, false);
}
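
The suggested logic can be exercised outside the JIT with a small standalone model. This is a sketch under stated assumptions, not the emitter code: registers are modelled as plain indices 0 to 31 instead of `regNumber` values, the function returns a string instead of printing, and `formatSveRegList` and `kLastZReg` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical model: SVE vector registers z0..z31 as plain indices.
constexpr unsigned kLastZReg = 31;

// Builds a register list such as "{ z0.q - z2.q }" or "{ z31.b, z0.b }".
std::string formatSveRegList(unsigned firstReg, unsigned listSize, const std::string& arrangement)
{
    assert(listSize > 0);
    assert(firstReg <= kLastZReg);

    auto regName = [&](unsigned reg) { return "z" + std::to_string(reg) + "." + arrangement; };

    std::string result = "{ ";

    // A list that runs past z31 wraps around to z0 and cannot use the range form.
    const bool wraps = (firstReg + listSize - 1) > kLastZReg;

    // No short-hand for list sizes of 1 or 2, or for lists that wrap.
    if ((listSize <= 2) || wraps)
    {
        unsigned reg = firstReg;
        for (unsigned i = 0; i < listSize; i++)
        {
            result += regName(reg);
            if (i != listSize - 1)
            {
                result += ", ";
            }
            reg = (reg == kLastZReg) ? 0 : reg + 1;
        }
    }
    else
    {
        // Short-hand: first and last register separated by " - ".
        result += regName(firstReg) + " - " + regName(firstReg + listSize - 1);
    }

    return result + " }";
}
```

Under these assumptions, `formatSveRegList(31, 4, "b")` gives `{ z31.b, z0.b, z1.b, z2.b }` and `formatSveRegList(0, 3, "q")` gives `{ z0.q - z2.q }`, the same shapes that appear in the disassembly diff at the top of this PR.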

Contributor Author

TIHan commented Jan 10, 2024

Anything with the syntax { Zn1 - Zn2 } can use the short-hand if the list size is greater than 2.

Contributor Author

TIHan commented Jan 10, 2024

@kunalspathak I put in your change.

@kunalspathak
Member

Anything with the syntax { Zn1 - Zn2 } can use the short-hand if the list size is greater than 2.

Right, but if you look at

emitDispSveRegList(id->idReg2(), 2, INS_OPTS_SCALABLE_S, true); // nnnn

it passes listSize == 2, and with the code in the PR, it will use the short-hand.

I think we should separate the two methods: emitDispSveRegList(), which just prints the comma-separated list, and emitDispSveRegListShortHand(), which prints the short-hand. Both will have assert(listSize > 1). (Sorry, in my suggestion above it should be > 1 rather than > 0.)

Contributor Author

TIHan commented Jan 10, 2024

with the code in the PR, it will use the short-hand.

It won't use the short-hand since the list size is 2.

Contributor Author

TIHan commented Jan 10, 2024

@kunalspathak I renamed emitDispSveRegList to emitDispSveConsecutiveRegList for clarity.

Member

@kunalspathak kunalspathak left a comment

LGTM

@TIHan TIHan merged commit 9eb5c8a into dotnet:main Jan 10, 2024
102 of 124 checks passed
Contributor Author

TIHan commented Jan 10, 2024

Thanks!

tmds pushed a commit to tmds/runtime that referenced this pull request Jan 23, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 10, 2024