JIT ARM64-SVE: Add saturating decrement/increment by element count #102315
Change-Id: Ife679701cd65239b5f64be538dab312c2eb896e2
private static int AddSaturateScalar(int left, int right)
These are helper functions for the fallback versions of the new API methods. Feels wrong to put them here. But where else should they go?
@tannergooding - any preference?
We shouldn't be putting any kind of manual fallback implementation in place for the platform-specific intrinsics. We should be relying on the JIT functionality to generate the jump table that still exactly executes `sqdech` instead.
Manually authoring a fallback is problematic for many reasons, including that it requires precisely emulating the behavior of the underlying instruction and doing so over time. This current fallback doesn't account for things like out-of-range immediate values, whether it's valid to execute under streaming mode, or various other considerations, and so will lead to fundamental behavioral differences between direct and indirect execution, or based on what optimizations the JIT can or cannot do.
Instead, we should simply use the existing functionality to have the recursive expansion emit the same jump table that other paths already are doing. Since `pattern` is 5 bits and `imm4` is 4 bits, we should already have these marked as being "non full-range" immediates, and so the fallback can already hook into the functionality to emit the bounds checks guaranteeing that each input is in bounds. The fallback can then construct the jump table index using `(imm4 << 5) | pattern` (or `(pattern << 4) | imm4`, it doesn't really matter). We then have 512 jump table entries, each executing precisely `sqdech`, where index 0 is `pattern: 0, imm4: 0` and index 511 is `pattern: 31, imm4: 15`. Assuming pattern is the lower bits, then we'd have index 31 is `pattern: 31, imm4: 0` and index 32 is `pattern: 0, imm4: 1`.
It's really straightforward handling that should generally hook into all the existing infrastructure and keep things well defined.
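As a rough illustration of the indexing described above, here is a minimal C++ sketch that packs the two immediates into one jump-table index with `pattern` in the low bits. The names `PackIndex`, `PatternOf`, and `Imm4Of` are hypothetical and are not JIT functions. Only the bit widths and the four example indices are taken from the comment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; only the bit widths (5 and 4) and the
// example indices come from the review comment.
constexpr uint32_t kPatternBits = 5; // pattern: 0..31
constexpr uint32_t kImm4Bits    = 4; // imm4:    0..15
constexpr uint32_t kTableSize   = 1u << (kPatternBits + kImm4Bits);

// Layout with pattern in the low bits: index = (imm4 << 5) | pattern.
// Callers are assumed to have range-checked both inputs already.
constexpr uint32_t PackIndex(uint32_t pattern, uint32_t imm4)
{
    return (imm4 << kPatternBits) | pattern;
}

// Inverse mapping: recover each immediate from a table index.
constexpr uint32_t PatternOf(uint32_t index)
{
    return index & ((1u << kPatternBits) - 1);
}

constexpr uint32_t Imm4Of(uint32_t index)
{
    return index >> kPatternBits;
}

// The table size and the four indices called out in the comment.
static_assert(kTableSize == 512, "5 + 4 bits give 512 entries");
static_assert(PackIndex(0, 0) == 0, "first entry");
static_assert(PackIndex(31, 0) == 31, "last pattern, first imm4");
static_assert(PackIndex(0, 1) == 32, "first pattern, second imm4");
static_assert(PackIndex(31, 15) == 511, "last entry");
```

Swapping the roles (`(pattern << 4) | imm4`) only reorders the entries. The table still has 512 slots and the mapping stays invertible, which is presumably why the comment says the choice doesn't really matter.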
There are two ways I can think of doing this:
- Do it late. Write a `HWIntrinsic2ImmOpHelper` based off `HWIntrinsicImmOpHelper`. This handles combining the two immediates into one using a temp register, then constructs the table.
- Do it early. During the import stage, combine the two immediate nodes and connect this to `op2`, leaving `op3` as `nullptr`. That requires some fiddling to make sure future range checks work. Can just use the standard `HWIntrinsicImmOpHelper`.
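To make the "do it late" option a little more concrete, the following is a standalone C++ model of what a two-immediate helper might have to do: range-check both immediates, fold them into one index, and visit one case per index. It is only a sketch. `TwoImmCases`, `Case`, and `EnumerateCases` are hypothetical names and do not reflect the actual interface of `HWIntrinsicImmOpHelper` or the proposed `HWIntrinsic2ImmOpHelper`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not the JIT's helper: it only captures the idea of
// range-checking two immediates and folding them into a single case index.
struct TwoImmCases
{
    int imm1Lower, imm1Upper; // inclusive bounds of the first immediate
    int imm2Lower, imm2Upper; // inclusive bounds of the second immediate

    int Imm1Count() const { return imm1Upper - imm1Lower + 1; }
    int Imm2Count() const { return imm2Upper - imm2Lower + 1; }
    int CaseCount() const { return Imm1Count() * Imm2Count(); }

    bool InRange(int imm1, int imm2) const
    {
        return imm1 >= imm1Lower && imm1 <= imm1Upper &&
               imm2 >= imm2Lower && imm2 <= imm2Upper;
    }

    // Combined index with imm1 varying fastest. Valid only when InRange holds.
    int IndexOf(int imm1, int imm2) const
    {
        return (imm2 - imm2Lower) * Imm1Count() + (imm1 - imm1Lower);
    }
};

struct Case
{
    int index;
    int imm1;
    int imm2;
};

// Enumerates every case in index order. A code generator would emit one
// instruction (plus a branch to the common exit) for each element.
inline std::vector<Case> EnumerateCases(const TwoImmCases& c)
{
    std::vector<Case> cases;
    cases.reserve(static_cast<std::size_t>(c.CaseCount()));
    for (int imm2 = c.imm2Lower; imm2 <= c.imm2Upper; ++imm2)
    {
        for (int imm1 = c.imm1Lower; imm1 <= c.imm1Upper; ++imm1)
        {
            cases.push_back({c.IndexOf(imm1, imm2), imm1, imm2});
        }
    }
    return cases;
}
```

With example bounds of 1..16 for the first immediate and 0..31 for the second, `TwoImmCases{1, 16, 0, 31}` yields 512 cases, and multiplying by `Imm1Count()` is equivalent to a left shift by 4.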
We should go ahead with option 1. This is specific to codegen so we shouldn't do it in early phases.
Table is now working, with all 512 entries.
Note that some of these entries are invalid (there are 6 invalid values in the middle of pattern between VL256 and MUL4). These are all still being generated for the table. I figure that's better than adding extra assembly to restrict to a smaller table.
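For readers following the listing, here is a small C++ sketch of the dispatch arithmetic performed in its `G_M8891_IG02` block. The function names are hypothetical. The constants (table base at offset `0x64`, 8 bytes per entry, a shift of 4) are read off the listing itself, and the two arguments are assumed to be the multiplier (1..16) and the pattern (0..31), in that order.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the arithmetic in the IG02 block of the listing. Names are
// hypothetical; the constants are read off the listing.
constexpr uint32_t kTableBaseOffset = 0x64; // offset of G_M8891_IG03
constexpr uint32_t kEntrySizeBytes  = 8;    // one sqdecw plus one branch

// Returns true and writes the table index when both arguments are in range.
// Returns false where the listing branches to G_M8891_IG517.
inline bool TryGetEntryIndex(uint32_t scale, uint32_t pattern, uint32_t* index)
{
    if (scale - 1 >= 16) // sub w0, w0, #1; cmp w0, #16; bhs (unsigned compare)
    {
        return false;
    }
    if (pattern >= 32) // cmp w1, #32; bhs
    {
        return false;
    }
    *index = (scale - 1) | (pattern << 4); // lsl w1, w1, #4; orr w0, w0, w1
    return true;
}

// Code offset of an entry: adr x2, [G_M8891_IG03]; add x2, x2, x0, LSL #3.
inline uint32_t EntryOffset(uint32_t index)
{
    return kTableBaseOffset + index * kEntrySizeBytes;
}
```

For example, a multiplier of 2 with pattern 2 gives index 33 and offset `0x16C`, which lines up with `G_M8891_IG36` (`sqdecw z16.s, vl2, mul #2`). Note that this listing places the pattern in the high bits, the second of the two layouts mentioned earlier in the thread.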
Assembly listing for System.Runtime.Intrinsics.Arm.Sve:SaturatingDecrementBy32BitElementCount
```
; Assembly listing for method System.Runtime.Intrinsics.Arm.Sve:SaturatingDecrementBy32BitElementCount(System.Numerics.Vector`1[int],ubyte,ubyte):System.Numerics.Vector`1[int] (Tier0)
; Emitting BLENDED_CODE for generic ARM64 - Unix
; Tier0 code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0     [V00 ] ( 1, 1 ) simd16 -> [fp+0x20] HFA(simd16) do-not-enreg[S]
;  V01 arg1     [V01 ] ( 1, 1 ) ubyte  -> [fp+0x1C] do-not-enreg[]
;  V02 arg2     [V02 ] ( 1, 1 ) ubyte  -> [fp+0x18] do-not-enreg[]
;# V03 OutArgs  [V03 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;
; Lcl frame size = 32

G_M8891_IG01: ;; offset=0x0000
stp fp, lr, [sp, #-0x30]!
mov fp, sp
str q0, [fp, #0x20] // [V00 arg0]
str w0, [fp, #0x1C] // [V01 arg1]
str w1, [fp, #0x18] // [V02 arg2]
;; size=20 bbWeight=1 PerfScore 4.50
G_M8891_IG02: ;; offset=0x0014
ldr q16, [fp, #0x20] // [V00 arg0]
ldr w0, [fp, #0x1C] // [V01 arg1]
uxtb w0, w0
sub w0, w0, #1
cmp w0, #16
bhs G_M8891_IG517
ldr w0, [fp, #0x1C] // [V01 arg1]
uxtb w0, w0
ldr w1, [fp, #0x18] // [V02 arg2]
uxtb w1, w1
cmp w1, #32
bhs G_M8891_IG517
ldr w1, [fp, #0x18] // [V02 arg2]
uxtb w1, w1
sub w0, w0, #1
lsl w1, w1, #4
orr w0, w0, w1
adr x2, [G_M8891_IG03]
add x2, x2, x0, LSL #3
br x2
;; size=80 bbWeight=1 PerfScore 20.00
G_M8891_IG03: ;; offset=0x0064
sqdecw z16.s, pow2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG04: ;; offset=0x006C
sqdecw z16.s, pow2, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG05: ;; offset=0x0074
sqdecw z16.s, pow2, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG06: ;; offset=0x007C
sqdecw z16.s, pow2, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG07: ;; offset=0x0084
sqdecw z16.s, pow2, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG08: ;; offset=0x008C
sqdecw z16.s, pow2, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG09: ;; offset=0x0094
sqdecw z16.s, pow2, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG10: ;; offset=0x009C
sqdecw z16.s, pow2, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG11: ;; offset=0x00A4
sqdecw z16.s, pow2, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG12: ;; offset=0x00AC
sqdecw z16.s, pow2, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG13: ;; offset=0x00B4
sqdecw z16.s, pow2, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG14: ;; offset=0x00BC
sqdecw z16.s, pow2, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG15: ;; offset=0x00C4
sqdecw z16.s, pow2, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG16: ;; offset=0x00CC
sqdecw z16.s, pow2, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG17: ;; offset=0x00D4
sqdecw z16.s, pow2, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG18: ;; offset=0x00DC
sqdecw z16.s, pow2, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG19: ;; offset=0x00E4
sqdecw z16.s, vl1
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG20: ;; offset=0x00EC
sqdecw z16.s, vl1, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG21: ;; offset=0x00F4
sqdecw z16.s, vl1, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG22: ;; offset=0x00FC
sqdecw z16.s, vl1, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG23: ;; offset=0x0104
sqdecw z16.s, vl1, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG24: ;; offset=0x010C
sqdecw z16.s, vl1, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG25: ;; offset=0x0114
sqdecw z16.s, vl1, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG26: ;; offset=0x011C
sqdecw z16.s, vl1, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG27: ;; offset=0x0124
sqdecw z16.s, vl1, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG28: ;; offset=0x012C
sqdecw z16.s, vl1, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG29: ;; offset=0x0134
sqdecw z16.s, vl1, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG30: ;; offset=0x013C
sqdecw z16.s, vl1, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG31: ;; offset=0x0144
sqdecw z16.s, vl1, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG32: ;; offset=0x014C
sqdecw z16.s, vl1, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG33: ;; offset=0x0154
sqdecw z16.s, vl1, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG34: ;; offset=0x015C
sqdecw z16.s, vl1, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG35: ;; offset=0x0164
sqdecw z16.s, vl2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG36: ;; offset=0x016C
sqdecw z16.s, vl2, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG37: ;; offset=0x0174
sqdecw z16.s, vl2, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG38: ;; offset=0x017C
sqdecw z16.s, vl2, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG39: ;; offset=0x0184
sqdecw z16.s, vl2, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG40: ;; offset=0x018C
sqdecw z16.s, vl2, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG41: ;; offset=0x0194
sqdecw z16.s, vl2, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG42: ;; offset=0x019C
sqdecw z16.s, vl2, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG43: ;; offset=0x01A4
sqdecw z16.s, vl2, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG44: ;; offset=0x01AC
sqdecw z16.s, vl2, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG45: ;; offset=0x01B4
sqdecw z16.s, vl2, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG46: ;; offset=0x01BC
sqdecw z16.s, vl2, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG47: ;; offset=0x01C4
sqdecw z16.s, vl2, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG48: ;; offset=0x01CC
sqdecw z16.s, vl2, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG49: ;; offset=0x01D4
sqdecw z16.s, vl2, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG50: ;; offset=0x01DC
sqdecw z16.s, vl2, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG51: ;; offset=0x01E4
sqdecw z16.s, vl3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG52: ;; offset=0x01EC
sqdecw z16.s, vl3, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG53: ;; offset=0x01F4
sqdecw z16.s, vl3, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG54: ;; offset=0x01FC
sqdecw z16.s, vl3, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG55: ;; offset=0x0204
sqdecw z16.s, vl3, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG56: ;; offset=0x020C
sqdecw z16.s, vl3, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG57: ;; offset=0x0214
sqdecw z16.s, vl3, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG58: ;; offset=0x021C
sqdecw z16.s, vl3, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG59: ;; offset=0x0224
sqdecw z16.s, vl3, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG60: ;; offset=0x022C
sqdecw z16.s, vl3, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG61: ;; offset=0x0234
sqdecw z16.s, vl3, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG62: ;; offset=0x023C
sqdecw z16.s, vl3, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG63: ;; offset=0x0244
sqdecw z16.s, vl3, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG64: ;; offset=0x024C
sqdecw z16.s, vl3, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG65: ;; offset=0x0254
sqdecw z16.s, vl3, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG66: ;; offset=0x025C
sqdecw z16.s, vl3, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG67: ;; offset=0x0264
sqdecw z16.s, vl4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG68: ;; offset=0x026C
sqdecw z16.s, vl4, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG69: ;; offset=0x0274
sqdecw z16.s, vl4, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG70: ;; offset=0x027C
sqdecw z16.s, vl4, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG71: ;; offset=0x0284
sqdecw z16.s, vl4, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG72: ;; offset=0x028C
sqdecw z16.s, vl4, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG73: ;; offset=0x0294
sqdecw z16.s, vl4, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG74: ;; offset=0x029C
sqdecw z16.s, vl4, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG75: ;; offset=0x02A4
sqdecw z16.s, vl4, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG76: ;; offset=0x02AC
sqdecw z16.s, vl4, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG77: ;; offset=0x02B4
sqdecw z16.s, vl4, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG78: ;; offset=0x02BC
sqdecw z16.s, vl4, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG79: ;; offset=0x02C4
sqdecw z16.s, vl4, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG80: ;; offset=0x02CC
sqdecw z16.s, vl4, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG81: ;; offset=0x02D4
sqdecw z16.s, vl4, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG82: ;; offset=0x02DC
sqdecw z16.s, vl4, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG83: ;; offset=0x02E4
sqdecw z16.s, vl5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG84: ;; offset=0x02EC
sqdecw z16.s, vl5, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG85: ;; offset=0x02F4
sqdecw z16.s, vl5, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG86: ;; offset=0x02FC
sqdecw z16.s, vl5, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG87: ;; offset=0x0304
sqdecw z16.s, vl5, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG88: ;; offset=0x030C
sqdecw z16.s, vl5, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG89: ;; offset=0x0314
sqdecw z16.s, vl5, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG90: ;; offset=0x031C
sqdecw z16.s, vl5, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG91: ;; offset=0x0324
sqdecw z16.s, vl5, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG92: ;; offset=0x032C
sqdecw z16.s, vl5, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG93: ;; offset=0x0334
sqdecw z16.s, vl5, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG94: ;; offset=0x033C
sqdecw z16.s, vl5, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG95: ;; offset=0x0344
sqdecw z16.s, vl5, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG96: ;; offset=0x034C
sqdecw z16.s, vl5, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG97: ;; offset=0x0354
sqdecw z16.s, vl5, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG98: ;; offset=0x035C
sqdecw z16.s, vl5, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG99: ;; offset=0x0364
sqdecw z16.s, vl6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG100: ;; offset=0x036C
sqdecw z16.s, vl6, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG101: ;; offset=0x0374
sqdecw z16.s, vl6, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG102: ;; offset=0x037C
sqdecw z16.s, vl6, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG103: ;; offset=0x0384
sqdecw z16.s, vl6, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG104: ;; offset=0x038C
sqdecw z16.s, vl6, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG105: ;; offset=0x0394
sqdecw z16.s, vl6, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG106: ;; offset=0x039C
sqdecw z16.s, vl6, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG107: ;; offset=0x03A4
sqdecw z16.s, vl6, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG108: ;; offset=0x03AC
sqdecw z16.s, vl6, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG109: ;; offset=0x03B4
sqdecw z16.s, vl6, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG110: ;; offset=0x03BC
sqdecw z16.s, vl6, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG111: ;; offset=0x03C4
sqdecw z16.s, vl6, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG112: ;; offset=0x03CC
sqdecw z16.s, vl6, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG113: ;; offset=0x03D4
sqdecw z16.s, vl6, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG114: ;; offset=0x03DC
sqdecw z16.s, vl6, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG115: ;; offset=0x03E4
sqdecw z16.s, vl7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG116: ;; offset=0x03EC
sqdecw z16.s, vl7, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG117: ;; offset=0x03F4
sqdecw z16.s, vl7, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG118: ;; offset=0x03FC
sqdecw z16.s, vl7, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG119: ;; offset=0x0404
sqdecw z16.s, vl7, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG120: ;; offset=0x040C
sqdecw z16.s, vl7, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG121: ;; offset=0x0414
sqdecw z16.s, vl7, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG122: ;; offset=0x041C
sqdecw z16.s, vl7, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG123: ;; offset=0x0424
sqdecw z16.s, vl7, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG124: ;; offset=0x042C
sqdecw z16.s, vl7, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG125: ;; offset=0x0434
sqdecw z16.s, vl7, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG126: ;; offset=0x043C
sqdecw z16.s, vl7, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG127: ;; offset=0x0444
sqdecw z16.s, vl7, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG128: ;; offset=0x044C
sqdecw z16.s, vl7, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG129: ;; offset=0x0454
sqdecw z16.s, vl7, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG130: ;; offset=0x045C
sqdecw z16.s, vl7, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG131: ;; offset=0x0464
sqdecw z16.s, vl8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG132: ;; offset=0x046C
sqdecw z16.s, vl8, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG133: ;; offset=0x0474
sqdecw z16.s, vl8, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG134: ;; offset=0x047C
sqdecw z16.s, vl8, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG135: ;; offset=0x0484
sqdecw z16.s, vl8, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG136: ;; offset=0x048C
sqdecw z16.s, vl8, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG137: ;; offset=0x0494
sqdecw z16.s, vl8, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG138: ;; offset=0x049C
sqdecw z16.s, vl8, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG139: ;; offset=0x04A4
sqdecw z16.s, vl8, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG140: ;; offset=0x04AC
sqdecw z16.s, vl8, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG141: ;; offset=0x04B4
sqdecw z16.s, vl8, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG142: ;; offset=0x04BC
sqdecw z16.s, vl8, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG143: ;; offset=0x04C4
sqdecw z16.s, vl8, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG144: ;; offset=0x04CC
sqdecw z16.s, vl8, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG145: ;; offset=0x04D4
sqdecw z16.s, vl8, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG146: ;; offset=0x04DC
sqdecw z16.s, vl8, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG147: ;; offset=0x04E4
sqdecw z16.s, vl16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG148: ;; offset=0x04EC
sqdecw z16.s, vl16, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG149: ;; offset=0x04F4
sqdecw z16.s, vl16, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG150: ;; offset=0x04FC
sqdecw z16.s, vl16, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG151: ;; offset=0x0504
sqdecw z16.s, vl16, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG152: ;; offset=0x050C
sqdecw z16.s, vl16, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG153: ;; offset=0x0514
sqdecw z16.s, vl16, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG154: ;; offset=0x051C
sqdecw z16.s, vl16, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG155: ;; offset=0x0524
sqdecw z16.s, vl16, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG156: ;; offset=0x052C
sqdecw z16.s, vl16, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG157: ;; offset=0x0534
sqdecw z16.s, vl16, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG158: ;; offset=0x053C
sqdecw z16.s, vl16, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG159: ;; offset=0x0544
sqdecw z16.s, vl16, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG160: ;; offset=0x054C
sqdecw z16.s, vl16, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG161: ;; offset=0x0554
sqdecw z16.s, vl16, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG162: ;; offset=0x055C
sqdecw z16.s, vl16, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG163: ;; offset=0x0564
sqdecw z16.s, vl32
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG164: ;; offset=0x056C
sqdecw z16.s, vl32, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG165: ;; offset=0x0574
sqdecw z16.s, vl32, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG166: ;; offset=0x057C
sqdecw z16.s, vl32, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG167: ;; offset=0x0584
sqdecw z16.s, vl32, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG168: ;; offset=0x058C
sqdecw z16.s, vl32, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG169: ;; offset=0x0594
sqdecw z16.s, vl32, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG170: ;; offset=0x059C
sqdecw z16.s, vl32, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG171: ;; offset=0x05A4
sqdecw z16.s, vl32, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG172: ;; offset=0x05AC
sqdecw z16.s, vl32, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG173: ;; offset=0x05B4
sqdecw z16.s, vl32, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG174: ;; offset=0x05BC
sqdecw z16.s, vl32, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG175: ;; offset=0x05C4
sqdecw z16.s, vl32, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG176: ;; offset=0x05CC
sqdecw z16.s, vl32, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG177: ;; offset=0x05D4
sqdecw z16.s, vl32, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG178: ;; offset=0x05DC
sqdecw z16.s, vl32, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG179: ;; offset=0x05E4
sqdecw z16.s, vl64
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG180: ;; offset=0x05EC
sqdecw z16.s, vl64, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG181: ;; offset=0x05F4
sqdecw z16.s, vl64, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG182: ;; offset=0x05FC
sqdecw z16.s, vl64, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG183: ;; offset=0x0604
sqdecw z16.s, vl64, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG184: ;; offset=0x060C
sqdecw z16.s, vl64, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG185: ;; offset=0x0614
sqdecw z16.s, vl64, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG186: ;; offset=0x061C
sqdecw z16.s, vl64, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG187: ;; offset=0x0624
sqdecw z16.s, vl64, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG188: ;; offset=0x062C
sqdecw z16.s, vl64, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG189: ;; offset=0x0634
sqdecw z16.s, vl64, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG190: ;; offset=0x063C
sqdecw z16.s, vl64, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG191: ;; offset=0x0644
sqdecw z16.s, vl64, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG192: ;; offset=0x064C
sqdecw z16.s, vl64, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG193: ;; offset=0x0654
sqdecw z16.s, vl64, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG194: ;; offset=0x065C
sqdecw z16.s, vl64, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG195: ;; offset=0x0664
sqdecw z16.s, vl128
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG196: ;; offset=0x066C
sqdecw z16.s, vl128, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG197: ;; offset=0x0674
sqdecw z16.s, vl128, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG198: ;; offset=0x067C
sqdecw z16.s, vl128, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG199: ;; offset=0x0684
sqdecw z16.s, vl128, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG200: ;; offset=0x068C
sqdecw z16.s, vl128, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG201: ;; offset=0x0694
sqdecw z16.s, vl128, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG202: ;; offset=0x069C
sqdecw z16.s, vl128, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG203: ;; offset=0x06A4
sqdecw z16.s, vl128, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG204: ;; offset=0x06AC
sqdecw z16.s, vl128, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG205: ;; offset=0x06B4
sqdecw z16.s, vl128, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG206: ;; offset=0x06BC
sqdecw z16.s, vl128, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG207: ;; offset=0x06C4
sqdecw z16.s, vl128, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG208: ;; offset=0x06CC
sqdecw z16.s, vl128, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG209: ;; offset=0x06D4
sqdecw z16.s, vl128, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG210: ;; offset=0x06DC
sqdecw z16.s, vl128, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG211: ;; offset=0x06E4
sqdecw z16.s, vl256
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG212: ;; offset=0x06EC
sqdecw z16.s, vl256, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG213: ;; offset=0x06F4
sqdecw z16.s, vl256, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG214: ;; offset=0x06FC
sqdecw z16.s, vl256, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG215: ;; offset=0x0704
sqdecw z16.s, vl256, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG216: ;; offset=0x070C
sqdecw z16.s, vl256, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG217: ;; offset=0x0714
sqdecw z16.s, vl256, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG218: ;; offset=0x071C
sqdecw z16.s, vl256, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG219: ;; offset=0x0724
sqdecw z16.s, vl256, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG220: ;; offset=0x072C
sqdecw z16.s, vl256, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG221: ;; offset=0x0734
sqdecw z16.s, vl256, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG222: ;; offset=0x073C
sqdecw z16.s, vl256, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG223: ;; offset=0x0744
sqdecw z16.s, vl256, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG224: ;; offset=0x074C
sqdecw z16.s, vl256, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG225: ;; offset=0x0754
sqdecw z16.s, vl256, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG226: ;; offset=0x075C
sqdecw z16.s, vl256, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG227: ;; offset=0x0764
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG228: ;; offset=0x076C
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG229: ;; offset=0x0774
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG230: ;; offset=0x077C
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG231: ;; offset=0x0784
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG232: ;; offset=0x078C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG233: ;; offset=0x0794
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG234: ;; offset=0x079C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG235: ;; offset=0x07A4
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG236: ;; offset=0x07AC
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG237: ;; offset=0x07B4
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG238: ;; offset=0x07BC
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG239: ;; offset=0x07C4
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG240: ;; offset=0x07CC
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG241: ;; offset=0x07D4
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG242: ;; offset=0x07DC
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG243: ;; offset=0x07E4
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG244: ;; offset=0x07EC
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG245: ;; offset=0x07F4
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG246: ;; offset=0x07FC
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG247: ;; offset=0x0804
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG248: ;; offset=0x080C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG249: ;; offset=0x0814
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG250: ;; offset=0x081C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG251: ;; offset=0x0824
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG252: ;; offset=0x082C
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG253: ;; offset=0x0834
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG254: ;; offset=0x083C
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG255: ;; offset=0x0844
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG256: ;; offset=0x084C
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG257: ;; offset=0x0854
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG258: ;; offset=0x085C
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG259: ;; offset=0x0864
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG260: ;; offset=0x086C
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG261: ;; offset=0x0874
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG262: ;; offset=0x087C
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG263: ;; offset=0x0884
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG264: ;; offset=0x088C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG265: ;; offset=0x0894
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG266: ;; offset=0x089C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG267: ;; offset=0x08A4
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG268: ;; offset=0x08AC
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG269: ;; offset=0x08B4
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG270: ;; offset=0x08BC
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG271: ;; offset=0x08C4
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG272: ;; offset=0x08CC
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG273: ;; offset=0x08D4
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG274: ;; offset=0x08DC
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG275: ;; offset=0x08E4
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG276: ;; offset=0x08EC
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG277: ;; offset=0x08F4
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG278: ;; offset=0x08FC
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG279: ;; offset=0x0904
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG280: ;; offset=0x090C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG281: ;; offset=0x0914
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG282: ;; offset=0x091C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG283: ;; offset=0x0924
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG284: ;; offset=0x092C
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG285: ;; offset=0x0934
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG286: ;; offset=0x093C
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG287: ;; offset=0x0944
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG288: ;; offset=0x094C
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG289: ;; offset=0x0954
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG290: ;; offset=0x095C
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG291: ;; offset=0x0964
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG292: ;; offset=0x096C
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG293: ;; offset=0x0974
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG294: ;; offset=0x097C
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG295: ;; offset=0x0984
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG296: ;; offset=0x098C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG297: ;; offset=0x0994
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG298: ;; offset=0x099C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG299: ;; offset=0x09A4
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG300: ;; offset=0x09AC
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG301: ;; offset=0x09B4
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG302: ;; offset=0x09BC
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG303: ;; offset=0x09C4
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG304: ;; offset=0x09CC
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG305: ;; offset=0x09D4
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG306: ;; offset=0x09DC
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG307: ;; offset=0x09E4
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG308: ;; offset=0x09EC
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG309: ;; offset=0x09F4
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG310: ;; offset=0x09FC
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG311: ;; offset=0x0A04
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG312: ;; offset=0x0A0C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG313: ;; offset=0x0A14
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG314: ;; offset=0x0A1C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG315: ;; offset=0x0A24
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG316: ;; offset=0x0A2C
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG317: ;; offset=0x0A34
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG318: ;; offset=0x0A3C
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG319: ;; offset=0x0A44
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG320: ;; offset=0x0A4C
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG321: ;; offset=0x0A54
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG322: ;; offset=0x0A5C
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG323: ;; offset=0x0A64
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG324: ;; offset=0x0A6C
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG325: ;; offset=0x0A74
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG326: ;; offset=0x0A7C
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG327: ;; offset=0x0A84
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG328: ;; offset=0x0A8C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG329: ;; offset=0x0A94
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG330: ;; offset=0x0A9C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG331: ;; offset=0x0AA4
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG332: ;; offset=0x0AAC
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG333: ;; offset=0x0AB4
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG334: ;; offset=0x0ABC
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG335: ;; offset=0x0AC4
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG336: ;; offset=0x0ACC
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG337: ;; offset=0x0AD4
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG338: ;; offset=0x0ADC
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG339: ;; offset=0x0AE4
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG340: ;; offset=0x0AEC
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG341: ;; offset=0x0AF4
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG342: ;; offset=0x0AFC
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG343: ;; offset=0x0B04
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG344: ;; offset=0x0B0C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG345: ;; offset=0x0B14
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG346: ;; offset=0x0B1C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG347: ;; offset=0x0B24
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG348: ;; offset=0x0B2C
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG349: ;; offset=0x0B34
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG350: ;; offset=0x0B3C
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG351: ;; offset=0x0B44
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG352: ;; offset=0x0B4C
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG353: ;; offset=0x0B54
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG354: ;; offset=0x0B5C
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG355: ;; offset=0x0B64
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG356: ;; offset=0x0B6C
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG357: ;; offset=0x0B74
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG358: ;; offset=0x0B7C
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG359: ;; offset=0x0B84
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG360: ;; offset=0x0B8C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG361: ;; offset=0x0B94
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG362: ;; offset=0x0B9C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG363: ;; offset=0x0BA4
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG364: ;; offset=0x0BAC
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG365: ;; offset=0x0BB4
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG366: ;; offset=0x0BBC
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG367: ;; offset=0x0BC4
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG368: ;; offset=0x0BCC
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG369: ;; offset=0x0BD4
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG370: ;; offset=0x0BDC
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG371: ;; offset=0x0BE4
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG372: ;; offset=0x0BEC
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG373: ;; offset=0x0BF4
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG374: ;; offset=0x0BFC
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG375: ;; offset=0x0C04
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG376: ;; offset=0x0C0C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG377: ;; offset=0x0C14
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG378: ;; offset=0x0C1C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG379: ;; offset=0x0C24
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG380: ;; offset=0x0C2C
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG381: ;; offset=0x0C34
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG382: ;; offset=0x0C3C
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG383: ;; offset=0x0C44
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG384: ;; offset=0x0C4C
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG385: ;; offset=0x0C54
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG386: ;; offset=0x0C5C
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG387: ;; offset=0x0C64
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG388: ;; offset=0x0C6C
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG389: ;; offset=0x0C74
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG390: ;; offset=0x0C7C
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG391: ;; offset=0x0C84
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG392: ;; offset=0x0C8C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG393: ;; offset=0x0C94
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG394: ;; offset=0x0C9C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG395: ;; offset=0x0CA4
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG396: ;; offset=0x0CAC
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG397: ;; offset=0x0CB4
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG398: ;; offset=0x0CBC
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG399: ;; offset=0x0CC4
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG400: ;; offset=0x0CCC
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG401: ;; offset=0x0CD4
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG402: ;; offset=0x0CDC
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG403: ;; offset=0x0CE4
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG404: ;; offset=0x0CEC
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG405: ;; offset=0x0CF4
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG406: ;; offset=0x0CFC
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG407: ;; offset=0x0D04
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG408: ;; offset=0x0D0C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG409: ;; offset=0x0D14
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG410: ;; offset=0x0D1C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG411: ;; offset=0x0D24
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG412: ;; offset=0x0D2C
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG413: ;; offset=0x0D34
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG414: ;; offset=0x0D3C
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG415: ;; offset=0x0D44
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG416: ;; offset=0x0D4C
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG417: ;; offset=0x0D54
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG418: ;; offset=0x0D5C
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG419: ;; offset=0x0D64
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG420: ;; offset=0x0D6C
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG421: ;; offset=0x0D74
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG422: ;; offset=0x0D7C
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG423: ;; offset=0x0D84
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG424: ;; offset=0x0D8C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG425: ;; offset=0x0D94
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG426: ;; offset=0x0D9C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG427: ;; offset=0x0DA4
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG428: ;; offset=0x0DAC
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG429: ;; offset=0x0DB4
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG430: ;; offset=0x0DBC
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG431: ;; offset=0x0DC4
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG432: ;; offset=0x0DCC
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG433: ;; offset=0x0DD4
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG434: ;; offset=0x0DDC
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG435: ;; offset=0x0DE4
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG436: ;; offset=0x0DEC
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG437: ;; offset=0x0DF4
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG438: ;; offset=0x0DFC
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG439: ;; offset=0x0E04
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG440: ;; offset=0x0E0C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG441: ;; offset=0x0E14
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG442: ;; offset=0x0E1C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG443: ;; offset=0x0E24
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG444: ;; offset=0x0E2C
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG445: ;; offset=0x0E34
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG446: ;; offset=0x0E3C
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG447: ;; offset=0x0E44
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG448: ;; offset=0x0E4C
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG449: ;; offset=0x0E54
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG450: ;; offset=0x0E5C
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG451: ;; offset=0x0E64
sqdecw z16.s, invalid
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG452: ;; offset=0x0E6C
sqdecw z16.s, invalid, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG453: ;; offset=0x0E74
sqdecw z16.s, invalid, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG454: ;; offset=0x0E7C
sqdecw z16.s, invalid, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG455: ;; offset=0x0E84
sqdecw z16.s, invalid, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG456: ;; offset=0x0E8C
sqdecw z16.s, invalid, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG457: ;; offset=0x0E94
sqdecw z16.s, invalid, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG458: ;; offset=0x0E9C
sqdecw z16.s, invalid, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG459: ;; offset=0x0EA4
sqdecw z16.s, invalid, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG460: ;; offset=0x0EAC
sqdecw z16.s, invalid, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG461: ;; offset=0x0EB4
sqdecw z16.s, invalid, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG462: ;; offset=0x0EBC
sqdecw z16.s, invalid, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG463: ;; offset=0x0EC4
sqdecw z16.s, invalid, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG464: ;; offset=0x0ECC
sqdecw z16.s, invalid, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG465: ;; offset=0x0ED4
sqdecw z16.s, invalid, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG466: ;; offset=0x0EDC
sqdecw z16.s, invalid, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG467: ;; offset=0x0EE4
sqdecw z16.s, mul4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG468: ;; offset=0x0EEC
sqdecw z16.s, mul4, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG469: ;; offset=0x0EF4
sqdecw z16.s, mul4, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG470: ;; offset=0x0EFC
sqdecw z16.s, mul4, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG471: ;; offset=0x0F04
sqdecw z16.s, mul4, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG472: ;; offset=0x0F0C
sqdecw z16.s, mul4, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG473: ;; offset=0x0F14
sqdecw z16.s, mul4, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG474: ;; offset=0x0F1C
sqdecw z16.s, mul4, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG475: ;; offset=0x0F24
sqdecw z16.s, mul4, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG476: ;; offset=0x0F2C
sqdecw z16.s, mul4, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG477: ;; offset=0x0F34
sqdecw z16.s, mul4, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG478: ;; offset=0x0F3C
sqdecw z16.s, mul4, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG479: ;; offset=0x0F44
sqdecw z16.s, mul4, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG480: ;; offset=0x0F4C
sqdecw z16.s, mul4, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG481: ;; offset=0x0F54
sqdecw z16.s, mul4, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG482: ;; offset=0x0F5C
sqdecw z16.s, mul4, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG483: ;; offset=0x0F64
sqdecw z16.s, mul3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG484: ;; offset=0x0F6C
sqdecw z16.s, mul3, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG485: ;; offset=0x0F74
sqdecw z16.s, mul3, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG486: ;; offset=0x0F7C
sqdecw z16.s, mul3, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG487: ;; offset=0x0F84
sqdecw z16.s, mul3, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG488: ;; offset=0x0F8C
sqdecw z16.s, mul3, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG489: ;; offset=0x0F94
sqdecw z16.s, mul3, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG490: ;; offset=0x0F9C
sqdecw z16.s, mul3, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG491: ;; offset=0x0FA4
sqdecw z16.s, mul3, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG492: ;; offset=0x0FAC
sqdecw z16.s, mul3, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG493: ;; offset=0x0FB4
sqdecw z16.s, mul3, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG494: ;; offset=0x0FBC
sqdecw z16.s, mul3, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG495: ;; offset=0x0FC4
sqdecw z16.s, mul3, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG496: ;; offset=0x0FCC
sqdecw z16.s, mul3, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG497: ;; offset=0x0FD4
sqdecw z16.s, mul3, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG498: ;; offset=0x0FDC
sqdecw z16.s, mul3, mul #16
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG499: ;; offset=0x0FE4
sqdecw z16.s, all
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG500: ;; offset=0x0FEC
sqdecw z16.s, all, mul #2
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG501: ;; offset=0x0FF4
sqdecw z16.s, all, mul #3
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG502: ;; offset=0x0FFC
sqdecw z16.s, all, mul #4
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG503: ;; offset=0x1004
sqdecw z16.s, all, mul #5
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG504: ;; offset=0x100C
sqdecw z16.s, all, mul #6
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG505: ;; offset=0x1014
sqdecw z16.s, all, mul #7
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG506: ;; offset=0x101C
sqdecw z16.s, all, mul #8
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG507: ;; offset=0x1024
sqdecw z16.s, all, mul #9
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG508: ;; offset=0x102C
sqdecw z16.s, all, mul #10
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG509: ;; offset=0x1034
sqdecw z16.s, all, mul #11
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG510: ;; offset=0x103C
sqdecw z16.s, all, mul #12
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG511: ;; offset=0x1044
sqdecw z16.s, all, mul #13
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG512: ;; offset=0x104C
sqdecw z16.s, all, mul #14
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG513: ;; offset=0x1054
sqdecw z16.s, all, mul #15
b G_M8891_IG515
;; size=8 bbWeight=1 PerfScore 3.00
G_M8891_IG514: ;; offset=0x105C
sqdecw z16.s, all, mul #16
;; size=4 bbWeight=1 PerfScore 2.00
G_M8891_IG515: ;; offset=0x1060
and w0, w0, #15
lsr w1, w1, #4
add w0, w0, #1
mov v0.16b, v16.16b
;; size=16 bbWeight=1 PerfScore 2.50
G_M8891_IG516: ;; offset=0x1070
ldp fp, lr, [sp], #0x30
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
G_M8891_IG517: ;; offset=0x1078
bl CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
brk_unix #0
;; size=8 bbWeight=0 PerfScore 0.00
; Total bytes of code 4224, prolog size 8, PerfScore 1564.00, instruction count 1056, allocated bytes for code 4224 (MethodHash=e486dd44) for method System.Runtime.Intrinsics.Arm.Sve:SaturatingDecrementBy32BitElementCount(System.Numerics.Vector1[int],ubyte,ubyte):System.Numerics.Vector1[int] (Tier0)
; ============================================================
</details>
@dotnet/arm64-contrib @kunalspathak : This is ready for review. Tests are failing during |
added some questions/comments.
@@ -3256,6 +3257,58 @@ | |||
("SveVecReduceUnOpTest.template", new Dictionary<string, string> { ["TestName"] = "Sve_OrAcross_uint", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "OrAcross", ["RetVectorType"] = "Vector", ["RetBaseType"] = "UInt32", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "UInt32", ["LargestVectorSize"] = "64", ["NextValueOp1"] = "TestLibrary.Generator.GetUInt32()", ["ValidateReduceOpResult"] = "Helpers.OrAcross(firstOp) != result[0]", ["ValidateRemainingResults"] = "result[i] != 0"}), | |||
("SveVecReduceUnOpTest.template", new Dictionary<string, string> { ["TestName"] = "Sve_OrAcross_ulong", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "OrAcross", ["RetVectorType"] = "Vector", ["RetBaseType"] = "UInt64", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "UInt64", ["LargestVectorSize"] = "64", ["NextValueOp1"] = "TestLibrary.Generator.GetUInt64()", ["ValidateReduceOpResult"] = "Helpers.OrAcross(firstOp) != result[0]", ["ValidateRemainingResults"] = "result[i] != 0"}), | |||
|
|||
("ScalarImm2UnOpTest.template", new Dictionary<string, string> {["TestName"] = "Sve_SaturatingDecrementBy16BitElementCount_int", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "SaturatingDecrementBy16BitElementCount", ["RetBaseType"] = "Int32", ["Op1BaseType"] = "Int32", ["Op2BaseType"] = "Byte", ["Op3BaseType"] = "SveMaskPattern", ["LargestVectorSize"] = "64", ["NextValueOp1"] = "TestLibrary.Generator.GetInt32()", ["Imm"] = "(Byte)2", ["Imm2"] = "SveMaskPattern.All", ["ValidateResult"] = "isUnexpectedResult = (result != (data - (2 * Unsafe.SizeOf<Vector<Int16>>() / sizeof(Int16))));",}), |
I know we haven't done this for the AdvSimd APIs we've added so far, but can we add test cases that pass an invalid immediate?
How would we do that? The program would error and exit.
It will be similar to how we test APIs on unsupported platform in RunUnsupportedScenario(). Basically, the test method needs to catch the exception and make sure it is ArgumentException with the right message.
Testing expanded:
- Tests for invalid Imm and Imm2 values in the template
- Generate file uses random values for Imm, Imm2, InvalidImm and InvalidImm2
- Added helper methods to calculate the number of elements in a vector based on a pattern.
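The helper mentioned in the last bullet can be sketched roughly as follows. This is a minimal illustration of how an SVE predicate pattern maps to an element count, not the code added in this PR; the `Pattern` enum and `ElementCount` function are hypothetical names, and the semantics are taken from the architectural definition of the patterns (`POW2`, `VLn`, `MUL4`, `MUL3`, `ALL`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum mirroring the SVE predicate patterns (illustrative names only).
enum class Pattern { Pow2, VL1, VL2, VL3, VL4, VL5, VL6, VL7, VL8,
                     VL16, VL32, VL64, VL128, VL256, Mul4, Mul3, All };

// A fixed-length pattern selects exactly n elements, or none if the vector is too short.
static uint32_t FixedCount(uint32_t n, uint32_t total) { return total >= n ? n : 0; }

// Number of elements selected by pattern `p` in a vector holding `total` elements.
uint32_t ElementCount(Pattern p, uint32_t total)
{
    assert(total > 0); // an SVE vector always holds at least one element
    switch (p)
    {
        case Pattern::Pow2:
        {
            uint32_t n = 1;
            while (n * 2 <= total) n *= 2; // largest power of two <= total
            return n;
        }
        case Pattern::VL1:   return FixedCount(1, total);
        case Pattern::VL2:   return FixedCount(2, total);
        case Pattern::VL3:   return FixedCount(3, total);
        case Pattern::VL4:   return FixedCount(4, total);
        case Pattern::VL5:   return FixedCount(5, total);
        case Pattern::VL6:   return FixedCount(6, total);
        case Pattern::VL7:   return FixedCount(7, total);
        case Pattern::VL8:   return FixedCount(8, total);
        case Pattern::VL16:  return FixedCount(16, total);
        case Pattern::VL32:  return FixedCount(32, total);
        case Pattern::VL64:  return FixedCount(64, total);
        case Pattern::VL128: return FixedCount(128, total);
        case Pattern::VL256: return FixedCount(256, total);
        case Pattern::Mul4:  return total - (total % 4); // largest multiple of 4
        case Pattern::Mul3:  return total - (total % 3); // largest multiple of 3
        case Pattern::All:   return total;
    }
    return 0; // unreachable for valid enum values
}
```

For a 128-bit vector of 32-bit elements (`total == 4`), `ElementCount(Pattern::Mul3, 4)` gives 3 and `ElementCount(Pattern::VL8, 4)` gives 0, which is the kind of value a test would need in order to predict the expected decrement.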
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Sve.cs
private static int AddSaturateScalar(int left, int right) |
@tannergooding - any preference?
Change-Id: I7fe4dc44c2dca9eb09e5b3540b89874001fb062a
@@ -512,7 +558,7 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic, | |||
return gtNewScalarHWIntrinsicNode(TYP_VOID, intrinsic); | |||
} | |||
|
|||
assert(category != HW_Category_Scalar); | |||
bool isScalar = (category == HW_Category_Scalar); |
can we be explicit about this?
assert((category == HW_Category_Scalar) || (id == SaturatingDecrementBy8BitElementCount) || (id == SaturatingIncrementBy8BitElementCount))
At this point we've already potentially switched to the scalar variants. So the full assert would be:
assert((category != HW_Category_Scalar) || (id == SaturatingDecrementBy8BitElementCount) || (id == SaturatingIncrementBy8BitElementCount)
|| (id == NI_Sve_SaturatingDecrementBy16BitElementCountScalar) || (id == NI_Sve_SaturatingDecrementBy32BitElementCountScalar)
|| (id == NI_Sve_SaturatingDecrementBy64BitElementCountScalar) || (id == NI_Sve_SaturatingIncrementBy16BitElementCountScalar)
|| (id == NI_Sve_SaturatingIncrementBy32BitElementCountScalar) || (id == NI_Sve_SaturatingIncrementBy64BitElementCountScalar));
Which feels excessive. Happy to switch if you still want.
Fixed this by adding a DEBUG only bool.
these are not HW_Category_Scalar and are not marked with isValidScalarIntrinsic? does it work as expected?
thanks for reminding me offline that they are HW_Category_SIMD and so should be fine.
src/tests/JIT/HardwareIntrinsics/Arm/Shared/_SveImm2UnaryOpTestTemplate.template
All comments addressed as suggested (except the large assert) |
@@ -3,7 +3,7 @@ | |||
|
|||
/****************************************************************************** | |||
* This file is auto-generated from a template file by the GenerateTests.csx * | |||
* script in tests\src\JIT\HardwareIntrinsics\X86\Shared. In order to make * | |||
* script in tests\src\JIT\HardwareIntrinsics\Arm\Shared. In order to make * |
thanks for fixing this.
LGTM. Thanks!
stress tests:
|
i will merge this, once you give a go ahead that these are existing issues related to predicate register. |
src/coreclr/jit/lsraarm64.cpp
@@ -1964,8 +1983,6 @@ int LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree, int* pDstCou | |||
} | |||
} | |||
|
|||
buildInternalRegisterUses(); |
This was a latent bug.
Previously, we built the use of the internal register first and the definition of the destination afterwards. The internal register has only one use, so the destination def was free to reuse the same register.
This bug was never seen in practice because every HWIntrinsic that took an immediate had a vector register as its destination. The internal register is always a scalar register, so it could not be reused.
SaturatingDecrement is the first HWIntrinsic with both an immediate and a scalar destination.
The fix is simply to move the internal use after the destination definition.
We always want all the uses to be present before the definition.
SaturatingDecrement is the first HWIntrinsic with both an immediate and a scalar destination.
Yes, and hence, as we discussed yesterday, the proper fix would be to mark the internal register as delay-free under certain circumstances such as this.
Got it - I didn't get that it was the internal register that should be delayed. Makes sense now. And fixed.
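The reasoning in this thread can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the JIT's LSRA code: `Interval`, `FreeAt`, and `DefMayReuse` are hypothetical names, and locations are plain integers. It shows why a register whose last use sits at the same node as a definition can be handed to that definition unless the use is marked delay-free.

```cpp
#include <cassert>

// Toy model of a register's live range, expressed in node locations.
struct Interval
{
    int  lastUse;   // location of the node that last reads the register
    bool delayFree; // if true, the register stays busy until after that node's def
};

// Location from which the register is available for a new definition.
int FreeAt(const Interval& i)
{
    return i.delayFree ? i.lastUse + 1 : i.lastUse;
}

// Can a definition at `defLoc` be assigned the register held by `internal`?
bool DefMayReuse(const Interval& internal, int defLoc)
{
    assert(defLoc >= internal.lastUse); // in this model the def never precedes the use
    return FreeAt(internal) <= defLoc;
}
```

With `Interval{10, false}` a def at location 10 may take the register, which corresponds to the clash described above (the scalar internal register and the scalar destination ending up in the same register). With `Interval{10, true}` the same def is refused and the allocator has to pick a different register.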
And with that change all the stress tests are working.
Stress test results:
|
LGTM. Thanks!
Adds a new flag, HasScalarVariant, for intrinsics that have both scalar and vector variants. During import, if the intrinsic does not have any vector arguments, it switches to the scalar version.
These intrinsics have two immediates and therefore can't use the table lookup when the values aren't constants (an NxM table would be required). Instead, the intrinsic must fall back to a C# implementation.
TODO: I'm not sure how to add a C# fallback.
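For reference, the arithmetic that the scalar form has to reproduce can be sketched as below. This only illustrates the saturating semantics for a signed 32-bit operand; it is not a proposed fallback (the review discussion argues against hand-written fallbacks) and not code from this PR. The function name and parameters are hypothetical, and the caller is assumed to have already derived the element count from the pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Subtract (elementCount * multiplier) from a signed 32-bit value,
// clamping to the int32_t range instead of wrapping.
int32_t SaturatingDecrementByCount(int32_t value, uint32_t elementCount, uint32_t multiplier)
{
    assert(multiplier >= 1 && multiplier <= 16); // immediate range accepted by the instruction

    // Compute in 64 bits so the intermediate result cannot overflow.
    int64_t wide = static_cast<int64_t>(value) -
                   static_cast<int64_t>(elementCount) * static_cast<int64_t>(multiplier);

    const int64_t lo = std::numeric_limits<int32_t>::min();
    const int64_t hi = std::numeric_limits<int32_t>::max();
    if (wide < lo) wide = lo;
    if (wide > hi) wide = hi;
    return static_cast<int32_t>(wide);
}
```

For example, with 4 elements and a multiplier of 2 the value 10 becomes 2, while a value three above `INT32_MIN` decremented by 4 clamps to `INT32_MIN` rather than wrapping around.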