[AArch64] Disable FP loads/stores when fp-armv8 not enabled #77817

Merged: 1 commit into llvm:main, Jan 12, 2024

Conversation

ostannard (Collaborator)

Most of the floating-point instructions are already gated on the fp-armv8 subtarget feature (or some other feature), but most of the load and store instructions, and one move instruction, were not.

I found the list of affected instructions with a script which consumes the output of llvm-tblgen --dump-json, looking for instructions which have an FPR operand but no predicate. That script now finds zero instructions.

This only affects assembly, not codegen, because the floating-point types and registers are already not marked as legal when the FPU is disabled, so it is impossible for any of these to be selected.

@ostannard added the backend:AArch64 and mc (Machine (object) code) labels on Jan 11, 2024
@ostannard requested a review from stuij on January 11, 2024 at 18:43
@llvmbot (Member) commented Jan 11, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-mc

Author: None (ostannard)


Patch is 22.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/77817.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+41-1)
  • (modified) llvm/lib/Target/AArch64/AArch64SystemOperands.td (+2)
  • (added) llvm/test/MC/AArch64/no-fp-errors.s (+193)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 62b2bf490f37a2..3f4875998fc004 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -2925,27 +2925,33 @@ def UDF : UDFType<0, "udf">;
 // Pair (indexed, offset)
 defm LDPW : LoadPairOffset<0b00, 0, GPR32z, simm7s4, "ldp">;
 defm LDPX : LoadPairOffset<0b10, 0, GPR64z, simm7s8, "ldp">;
+let Predicates = [HasFPARMv8] in {
 defm LDPS : LoadPairOffset<0b00, 1, FPR32Op, simm7s4, "ldp">;
 defm LDPD : LoadPairOffset<0b01, 1, FPR64Op, simm7s8, "ldp">;
 defm LDPQ : LoadPairOffset<0b10, 1, FPR128Op, simm7s16, "ldp">;
+}
 
 defm LDPSW : LoadPairOffset<0b01, 0, GPR64z, simm7s4, "ldpsw">;
 
 // Pair (pre-indexed)
 def LDPWpre : LoadPairPreIdx<0b00, 0, GPR32z, simm7s4, "ldp">;
 def LDPXpre : LoadPairPreIdx<0b10, 0, GPR64z, simm7s8, "ldp">;
+let Predicates = [HasFPARMv8] in {
 def LDPSpre : LoadPairPreIdx<0b00, 1, FPR32Op, simm7s4, "ldp">;
 def LDPDpre : LoadPairPreIdx<0b01, 1, FPR64Op, simm7s8, "ldp">;
 def LDPQpre : LoadPairPreIdx<0b10, 1, FPR128Op, simm7s16, "ldp">;
+}
 
 def LDPSWpre : LoadPairPreIdx<0b01, 0, GPR64z, simm7s4, "ldpsw">;
 
 // Pair (post-indexed)
 def LDPWpost : LoadPairPostIdx<0b00, 0, GPR32z, simm7s4, "ldp">;
 def LDPXpost : LoadPairPostIdx<0b10, 0, GPR64z, simm7s8, "ldp">;
+let Predicates = [HasFPARMv8] in {
 def LDPSpost : LoadPairPostIdx<0b00, 1, FPR32Op, simm7s4, "ldp">;
 def LDPDpost : LoadPairPostIdx<0b01, 1, FPR64Op, simm7s8, "ldp">;
 def LDPQpost : LoadPairPostIdx<0b10, 1, FPR128Op, simm7s16, "ldp">;
+}
 
 def LDPSWpost : LoadPairPostIdx<0b01, 0, GPR64z, simm7s4, "ldpsw">;
 
@@ -2953,9 +2959,11 @@ def LDPSWpost : LoadPairPostIdx<0b01, 0, GPR64z, simm7s4, "ldpsw">;
 // Pair (no allocate)
 defm LDNPW : LoadPairNoAlloc<0b00, 0, GPR32z, simm7s4, "ldnp">;
 defm LDNPX : LoadPairNoAlloc<0b10, 0, GPR64z, simm7s8, "ldnp">;
+let Predicates = [HasFPARMv8] in {
 defm LDNPS : LoadPairNoAlloc<0b00, 1, FPR32Op, simm7s4, "ldnp">;
 defm LDNPD : LoadPairNoAlloc<0b01, 1, FPR64Op, simm7s8, "ldnp">;
 defm LDNPQ : LoadPairNoAlloc<0b10, 1, FPR128Op, simm7s16, "ldnp">;
+}
 
 def : Pat<(AArch64ldp (am_indexed7s64 GPR64sp:$Rn, simm7s8:$offset)),
           (LDPXi GPR64sp:$Rn, simm7s8:$offset)>;
@@ -2973,11 +2981,13 @@ defm LDRW  : Load32RO<0b10, 0, 0b01, GPR32, "ldr", i32, load>;
 defm LDRX  : Load64RO<0b11, 0, 0b01, GPR64, "ldr", i64, load>;
 
 // Floating-point
+let Predicates = [HasFPARMv8] in {
 defm LDRB : Load8RO<0b00,   1, 0b01, FPR8Op,   "ldr", i8, load>;
 defm LDRH : Load16RO<0b01,  1, 0b01, FPR16Op,  "ldr", f16, load>;
 defm LDRS : Load32RO<0b10,  1, 0b01, FPR32Op,  "ldr", f32, load>;
 defm LDRD : Load64RO<0b11,  1, 0b01, FPR64Op,  "ldr", f64, load>;
 defm LDRQ : Load128RO<0b00, 1, 0b11, FPR128Op, "ldr", f128, load>;
+}
 
 // Load sign-extended half-word
 defm LDRSHW : Load16RO<0b01, 0, 0b11, GPR32, "ldrsh", i32, sextloadi16>;
@@ -3147,6 +3157,7 @@ defm LDRX : LoadUI<0b11, 0, 0b01, GPR64z, uimm12s8, "ldr",
 defm LDRW : LoadUI<0b10, 0, 0b01, GPR32z, uimm12s4, "ldr",
                    [(set GPR32z:$Rt,
                          (load (am_indexed32 GPR64sp:$Rn, uimm12s4:$offset)))]>;
+let Predicates = [HasFPARMv8] in {
 defm LDRB : LoadUI<0b00, 1, 0b01, FPR8Op, uimm12s1, "ldr",
                    [(set FPR8Op:$Rt,
                          (load (am_indexed8 GPR64sp:$Rn, uimm12s1:$offset)))]>;
@@ -3162,6 +3173,7 @@ defm LDRD : LoadUI<0b11, 1, 0b01, FPR64Op, uimm12s8, "ldr",
 defm LDRQ : LoadUI<0b00, 1, 0b11, FPR128Op, uimm12s16, "ldr",
                  [(set (f128 FPR128Op:$Rt),
                        (load (am_indexed128 GPR64sp:$Rn, uimm12s16:$offset)))]>;
+}
 
 // bf16 load pattern
 def : Pat <(bf16 (load (am_indexed16 GPR64sp:$Rn, uimm12s2:$offset))),
@@ -3339,12 +3351,14 @@ def LDRWl : LoadLiteral<0b00, 0, GPR32z, "ldr",
   [(set GPR32z:$Rt, (load (AArch64adr alignedglobal:$label)))]>;
 def LDRXl : LoadLiteral<0b01, 0, GPR64z, "ldr",
   [(set GPR64z:$Rt, (load (AArch64adr alignedglobal:$label)))]>;
+let Predicates = [HasFPARMv8] in {
 def LDRSl : LoadLiteral<0b00, 1, FPR32Op, "ldr",
   [(set (f32 FPR32Op:$Rt), (load (AArch64adr alignedglobal:$label)))]>;
 def LDRDl : LoadLiteral<0b01, 1, FPR64Op, "ldr",
   [(set (f64 FPR64Op:$Rt), (load (AArch64adr alignedglobal:$label)))]>;
 def LDRQl : LoadLiteral<0b10, 1, FPR128Op, "ldr",
   [(set (f128 FPR128Op:$Rt), (load (AArch64adr alignedglobal:$label)))]>;
+}
 
 // load sign-extended word
 def LDRSWl : LoadLiteral<0b10, 0, GPR64z, "ldrsw",
@@ -3367,6 +3381,7 @@ defm LDURX : LoadUnscaled<0b11, 0, 0b01, GPR64z, "ldur",
 defm LDURW : LoadUnscaled<0b10, 0, 0b01, GPR32z, "ldur",
                     [(set GPR32z:$Rt,
                           (load (am_unscaled32 GPR64sp:$Rn, simm9:$offset)))]>;
+let Predicates = [HasFPARMv8] in {
 defm LDURB : LoadUnscaled<0b00, 1, 0b01, FPR8Op, "ldur",
                     [(set FPR8Op:$Rt,
                           (load (am_unscaled8 GPR64sp:$Rn, simm9:$offset)))]>;
@@ -3382,6 +3397,7 @@ defm LDURD : LoadUnscaled<0b11, 1, 0b01, FPR64Op, "ldur",
 defm LDURQ : LoadUnscaled<0b00, 1, 0b11, FPR128Op, "ldur",
                     [(set (f128 FPR128Op:$Rt),
                           (load (am_unscaled128 GPR64sp:$Rn, simm9:$offset)))]>;
+}
 
 defm LDURHH
     : LoadUnscaled<0b01, 0, 0b01, GPR32, "ldurh",
@@ -3641,11 +3657,13 @@ defm LDTRSW  : LoadUnprivileged<0b10, 0, 0b10, GPR64, "ldtrsw">;
 // (immediate pre-indexed)
 def LDRWpre : LoadPreIdx<0b10, 0, 0b01, GPR32z, "ldr">;
 def LDRXpre : LoadPreIdx<0b11, 0, 0b01, GPR64z, "ldr">;
+let Predicates = [HasFPARMv8] in {
 def LDRBpre : LoadPreIdx<0b00, 1, 0b01, FPR8Op,  "ldr">;
 def LDRHpre : LoadPreIdx<0b01, 1, 0b01, FPR16Op, "ldr">;
 def LDRSpre : LoadPreIdx<0b10, 1, 0b01, FPR32Op, "ldr">;
 def LDRDpre : LoadPreIdx<0b11, 1, 0b01, FPR64Op, "ldr">;
 def LDRQpre : LoadPreIdx<0b00, 1, 0b11, FPR128Op, "ldr">;
+}
 
 // load sign-extended half-word
 def LDRSHWpre : LoadPreIdx<0b01, 0, 0b11, GPR32z, "ldrsh">;
@@ -3666,11 +3684,13 @@ def LDRSWpre : LoadPreIdx<0b10, 0, 0b10, GPR64z, "ldrsw">;
 // (immediate post-indexed)
 def LDRWpost : LoadPostIdx<0b10, 0, 0b01, GPR32z, "ldr">;
 def LDRXpost : LoadPostIdx<0b11, 0, 0b01, GPR64z, "ldr">;
+let Predicates = [HasFPARMv8] in {
 def LDRBpost : LoadPostIdx<0b00, 1, 0b01, FPR8Op,  "ldr">;
 def LDRHpost : LoadPostIdx<0b01, 1, 0b01, FPR16Op, "ldr">;
 def LDRSpost : LoadPostIdx<0b10, 1, 0b01, FPR32Op, "ldr">;
 def LDRDpost : LoadPostIdx<0b11, 1, 0b01, FPR64Op, "ldr">;
 def LDRQpost : LoadPostIdx<0b00, 1, 0b11, FPR128Op, "ldr">;
+}
 
 // load sign-extended half-word
 def LDRSHWpost : LoadPostIdx<0b01, 0, 0b11, GPR32z, "ldrsh">;
@@ -3695,30 +3715,38 @@ def LDRSWpost : LoadPostIdx<0b10, 0, 0b10, GPR64z, "ldrsw">;
 // FIXME: Use dedicated range-checked addressing mode operand here.
 defm STPW : StorePairOffset<0b00, 0, GPR32z, simm7s4, "stp">;
 defm STPX : StorePairOffset<0b10, 0, GPR64z, simm7s8, "stp">;
+let Predicates = [HasFPARMv8] in {
 defm STPS : StorePairOffset<0b00, 1, FPR32Op, simm7s4, "stp">;
 defm STPD : StorePairOffset<0b01, 1, FPR64Op, simm7s8, "stp">;
 defm STPQ : StorePairOffset<0b10, 1, FPR128Op, simm7s16, "stp">;
+}
 
 // Pair (pre-indexed)
 def STPWpre : StorePairPreIdx<0b00, 0, GPR32z, simm7s4, "stp">;
 def STPXpre : StorePairPreIdx<0b10, 0, GPR64z, simm7s8, "stp">;
+let Predicates = [HasFPARMv8] in {
 def STPSpre : StorePairPreIdx<0b00, 1, FPR32Op, simm7s4, "stp">;
 def STPDpre : StorePairPreIdx<0b01, 1, FPR64Op, simm7s8, "stp">;
 def STPQpre : StorePairPreIdx<0b10, 1, FPR128Op, simm7s16, "stp">;
+}
 
 // Pair (post-indexed)
 def STPWpost : StorePairPostIdx<0b00, 0, GPR32z, simm7s4, "stp">;
 def STPXpost : StorePairPostIdx<0b10, 0, GPR64z, simm7s8, "stp">;
+let Predicates = [HasFPARMv8] in {
 def STPSpost : StorePairPostIdx<0b00, 1, FPR32Op, simm7s4, "stp">;
 def STPDpost : StorePairPostIdx<0b01, 1, FPR64Op, simm7s8, "stp">;
 def STPQpost : StorePairPostIdx<0b10, 1, FPR128Op, simm7s16, "stp">;
+}
 
 // Pair (no allocate)
 defm STNPW : StorePairNoAlloc<0b00, 0, GPR32z, simm7s4, "stnp">;
 defm STNPX : StorePairNoAlloc<0b10, 0, GPR64z, simm7s8, "stnp">;
+let Predicates = [HasFPARMv8] in {
 defm STNPS : StorePairNoAlloc<0b00, 1, FPR32Op, simm7s4, "stnp">;
 defm STNPD : StorePairNoAlloc<0b01, 1, FPR64Op, simm7s8, "stnp">;
 defm STNPQ : StorePairNoAlloc<0b10, 1, FPR128Op, simm7s16, "stnp">;
+}
 
 def : Pat<(AArch64stp GPR64z:$Rt, GPR64z:$Rt2, (am_indexed7s64 GPR64sp:$Rn, simm7s8:$offset)),
           (STPXi GPR64z:$Rt, GPR64z:$Rt2, GPR64sp:$Rn, simm7s8:$offset)>;
@@ -3738,11 +3766,13 @@ defm STRX  : Store64RO<0b11, 0, 0b00, GPR64, "str",  i64, store>;
 
 
 // Floating-point
+let Predicates = [HasFPARMv8] in {
 defm STRB : Store8RO< 0b00,  1, 0b00, FPR8Op,   "str", i8, store>;
 defm STRH : Store16RO<0b01,  1, 0b00, FPR16Op,  "str", f16,     store>;
 defm STRS : Store32RO<0b10,  1, 0b00, FPR32Op,  "str", f32,     store>;
 defm STRD : Store64RO<0b11,  1, 0b00, FPR64Op,  "str", f64,     store>;
 defm STRQ : Store128RO<0b00, 1, 0b10, FPR128Op, "str">;
+}
 
 let Predicates = [UseSTRQro], AddedComplexity = 10 in {
   def : Pat<(store (f128 FPR128:$Rt),
@@ -3851,6 +3881,7 @@ defm STRX : StoreUIz<0b11, 0, 0b00, GPR64z, uimm12s8, "str",
 defm STRW : StoreUIz<0b10, 0, 0b00, GPR32z, uimm12s4, "str",
                     [(store GPR32z:$Rt,
                             (am_indexed32 GPR64sp:$Rn, uimm12s4:$offset))]>;
+let Predicates = [HasFPARMv8] in {
 defm STRB : StoreUI<0b00, 1, 0b00, FPR8Op, uimm12s1, "str",
                     [(store FPR8Op:$Rt,
                             (am_indexed8 GPR64sp:$Rn, uimm12s1:$offset))]>;
@@ -3864,6 +3895,7 @@ defm STRD : StoreUI<0b11, 1, 0b00, FPR64Op, uimm12s8, "str",
                     [(store (f64 FPR64Op:$Rt),
                             (am_indexed64 GPR64sp:$Rn, uimm12s8:$offset))]>;
 defm STRQ : StoreUI<0b00, 1, 0b10, FPR128Op, uimm12s16, "str", []>;
+}
 
 defm STRHH : StoreUIz<0b01, 0, 0b00, GPR32z, uimm12s2, "strh",
                      [(truncstorei16 GPR32z:$Rt,
@@ -3985,6 +4017,7 @@ defm STURX : StoreUnscaled<0b11, 0, 0b00, GPR64z, "stur",
 defm STURW : StoreUnscaled<0b10, 0, 0b00, GPR32z, "stur",
                          [(store GPR32z:$Rt,
                                  (am_unscaled32 GPR64sp:$Rn, simm9:$offset))]>;
+let Predicates = [HasFPARMv8] in {
 defm STURB : StoreUnscaled<0b00, 1, 0b00, FPR8Op, "stur",
                          [(store FPR8Op:$Rt,
                                  (am_unscaled8 GPR64sp:$Rn, simm9:$offset))]>;
@@ -4000,6 +4033,7 @@ defm STURD : StoreUnscaled<0b11, 1, 0b00, FPR64Op, "stur",
 defm STURQ : StoreUnscaled<0b00, 1, 0b10, FPR128Op, "stur",
                          [(store (f128 FPR128Op:$Rt),
                                  (am_unscaled128 GPR64sp:$Rn, simm9:$offset))]>;
+}
 defm STURHH : StoreUnscaled<0b01, 0, 0b00, GPR32z, "sturh",
                          [(truncstorei16 GPR32z:$Rt,
                                  (am_unscaled16 GPR64sp:$Rn, simm9:$offset))]>;
@@ -4156,11 +4190,13 @@ defm STTRB : StoreUnprivileged<0b00, 0, 0b00, GPR32, "sttrb">;
 // (immediate pre-indexed)
 def STRWpre : StorePreIdx<0b10, 0, 0b00, GPR32z, "str",  pre_store, i32>;
 def STRXpre : StorePreIdx<0b11, 0, 0b00, GPR64z, "str",  pre_store, i64>;
+let Predicates = [HasFPARMv8] in {
 def STRBpre : StorePreIdx<0b00, 1, 0b00, FPR8Op,  "str",  pre_store, i8>;
 def STRHpre : StorePreIdx<0b01, 1, 0b00, FPR16Op, "str",  pre_store, f16>;
 def STRSpre : StorePreIdx<0b10, 1, 0b00, FPR32Op, "str",  pre_store, f32>;
 def STRDpre : StorePreIdx<0b11, 1, 0b00, FPR64Op, "str",  pre_store, f64>;
 def STRQpre : StorePreIdx<0b00, 1, 0b10, FPR128Op, "str", pre_store, f128>;
+}
 
 def STRBBpre : StorePreIdx<0b00, 0, 0b00, GPR32z, "strb", pre_truncsti8,  i32>;
 def STRHHpre : StorePreIdx<0b01, 0, 0b00, GPR32z, "strh", pre_truncsti16, i32>;
@@ -4210,11 +4246,13 @@ def : Pat<(pre_store (v8f16 FPR128:$Rt), GPR64sp:$addr, simm9:$off),
 // (immediate post-indexed)
 def STRWpost : StorePostIdx<0b10, 0, 0b00, GPR32z,  "str", post_store, i32>;
 def STRXpost : StorePostIdx<0b11, 0, 0b00, GPR64z,  "str", post_store, i64>;
+let Predicates = [HasFPARMv8] in {
 def STRBpost : StorePostIdx<0b00, 1, 0b00, FPR8Op,   "str", post_store, i8>;
 def STRHpost : StorePostIdx<0b01, 1, 0b00, FPR16Op,  "str", post_store, f16>;
 def STRSpost : StorePostIdx<0b10, 1, 0b00, FPR32Op,  "str", post_store, f32>;
 def STRDpost : StorePostIdx<0b11, 1, 0b00, FPR64Op,  "str", post_store, f64>;
 def STRQpost : StorePostIdx<0b00, 1, 0b10, FPR128Op, "str", post_store, f128>;
+}
 
 def STRBBpost : StorePostIdx<0b00, 0, 0b00, GPR32z, "strb", post_truncsti8, i32>;
 def STRHHpost : StorePostIdx<0b01, 0, 0b00, GPR32z, "strh", post_truncsti16, i32>;
@@ -4531,7 +4569,8 @@ def : Pat<(f64 (fdiv (f64 (any_uint_to_fp (i32 GPR32:$Rn))), fixedpoint_f64_i32:
 defm FMOV : UnscaledConversion<"fmov">;
 
 // Add pseudo ops for FMOV 0 so we can mark them as isReMaterializable
-let isReMaterializable = 1, isCodeGenOnly = 1, isAsCheapAsAMove = 1 in {
+let isReMaterializable = 1, isCodeGenOnly = 1, isAsCheapAsAMove = 1,
+    Predicates = [HasFPARMv8] in {
 def FMOVH0 : Pseudo<(outs FPR16:$Rd), (ins), [(set f16:$Rd, (fpimm0))]>,
     Sched<[WriteF]>;
 def FMOVS0 : Pseudo<(outs FPR32:$Rd), (ins), [(set f32:$Rd, (fpimm0))]>,
@@ -4758,6 +4797,7 @@ def : Pat<(bf16 (AArch64csel (bf16 FPR16:$Rn), (bf16 FPR16:$Rm), (i32 imm:$cond)
 // CSEL instructions providing f128 types need to be handled by a
 // pseudo-instruction since the eventual code will need to introduce basic
 // blocks and control flow.
+let Predicates = [HasFPARMv8] in
 def F128CSEL : Pseudo<(outs FPR128:$Rd),
                       (ins FPR128:$Rn, FPR128:$Rm, ccode:$cond),
                       [(set (f128 FPR128:$Rd),
diff --git a/llvm/lib/Target/AArch64/AArch64SystemOperands.td b/llvm/lib/Target/AArch64/AArch64SystemOperands.td
index 0b80f263e12ee1..0564741c497000 100644
--- a/llvm/lib/Target/AArch64/AArch64SystemOperands.td
+++ b/llvm/lib/Target/AArch64/AArch64SystemOperands.td
@@ -986,8 +986,10 @@ def : RWSysReg<"SPSR_irq",           0b11, 0b100, 0b0100, 0b0011, 0b000>;
 def : RWSysReg<"SPSR_abt",           0b11, 0b100, 0b0100, 0b0011, 0b001>;
 def : RWSysReg<"SPSR_und",           0b11, 0b100, 0b0100, 0b0011, 0b010>;
 def : RWSysReg<"SPSR_fiq",           0b11, 0b100, 0b0100, 0b0011, 0b011>;
+let Requires = [{ {AArch64::FeatureFPARMv8} }] in {
 def : RWSysReg<"FPCR",               0b11, 0b011, 0b0100, 0b0100, 0b000>;
 def : RWSysReg<"FPSR",               0b11, 0b011, 0b0100, 0b0100, 0b001>;
+}
 def : RWSysReg<"DSPSR_EL0",          0b11, 0b011, 0b0100, 0b0101, 0b000>;
 def : RWSysReg<"DLR_EL0",            0b11, 0b011, 0b0100, 0b0101, 0b001>;
 def : RWSysReg<"IFSR32_EL2",         0b11, 0b100, 0b0101, 0b0000, 0b001>;
diff --git a/llvm/test/MC/AArch64/no-fp-errors.s b/llvm/test/MC/AArch64/no-fp-errors.s
new file mode 100644
index 00000000000000..1595ba4798b082
--- /dev/null
+++ b/llvm/test/MC/AArch64/no-fp-errors.s
@@ -0,0 +1,193 @@
+// RUN: not llvm-mc -triple aarch64-none-eabi -mattr=-fp-armv8 < %s 2>&1 | FileCheck %s --implicit-check-not error
+
+  ldr      s0, [x0]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  str      q0, [x0]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  fmov     d0, xzr
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  ldnp     s0, s1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldnp     d0, d1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldnp     q0, q1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  ldp       s0, s1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldp       d0, d1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldp       q0, q1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  ldp    s0, s1, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldp    d0, d1, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldp    q0, q1, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  ldp       s0, s1, [x0, #16]!
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldp       d0, d1, [x0, #16]!
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldp       q0, q1, [x0, #16]!
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+
+  ldr    b0, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr    h0, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr    s0, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr    d0, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr    q0, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  ldr     b0, [x0, #16]!
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     h0, [x0, #16]!
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     s0, [x0, #16]!
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     d0, [x0, #16]!
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     q0, [x0, #16]!
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  ldr     b0, [x0, x1]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     h0, [x0, x1]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     s0, [x0, x1]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     d0, [x0, x1]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     q0, [x0, x1]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  ldr     b0, [x0, w1, sxtw]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     h0, [x0, w1, sxtw]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     s0, [x0, w1, sxtw]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     d0, [x0, w1, sxtw]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr     q0, [x0, w1, sxtw]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  ldr      b0, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr      h0, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr      s0, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr      d0, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr      q0, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+label:
+  ldr       s0, label
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr       d0, label
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  ldr       q0, label
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  stnp     s0, s1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  stnp     d0, d1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  stnp     q0, q1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  stp       s0, s1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  stp       d0, d1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  stp       q0, q1, [x0, #16]
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  stp    s0, s1, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  stp    d0, d1, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  stp    q0, q1, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  stp     s0, s1, [x0, #16]!
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  stp     d0, d1, [x0, #16]!
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  stp     q0, q1, [x0, #16]!
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+
+  str    b0, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  str    h0, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  str    s0, [x0], #16
+// CHECK: [[@LINE-1]]:3: error: instruction requires: fp-armv8
+  str    d0, [x0], #16
+// CHECK: [[@LINE-1]]:3: ...
[truncated]

@ostannard (Collaborator, Author)

This is the script I used to check for non-predicated instructions: https://gist.github.com/ostannard/f15ad6b4cffc9b00b1ee50f990dd801d
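
A minimal Python sketch of that kind of check is shown below, for illustration only; the gist above is the authoritative script, and the JSON field names and the "FPR" name-prefix heuristic are assumptions about the shape of llvm-tblgen's --dump-json output, not details taken from the gist:

#!/usr/bin/env python3
# Hypothetical sketch of the check described above; the linked gist is the
# real script.  The JSON field names ("!instanceof", "InOperandList",
# "Predicates", ...) follow llvm-tblgen's JSON dump, but treat the exact
# schema and the FPR-prefix heuristic as assumptions.
#
# Example use (paths and script name are illustrative):
#   llvm-tblgen --dump-json -I llvm/include -I llvm/lib/Target/AArch64 \
#       llvm/lib/Target/AArch64/AArch64.td > aarch64.json
#   python3 find-unpredicated-fpr.py aarch64.json
import json
import sys

def reg_class_names(dag):
    # A dag value is dumped as {"args": [[value, name], ...], ...}; record
    # references appear as {"kind": "def", "def": "<record name>"}.
    for value, _name in dag.get("args", []):
        if isinstance(value, dict) and value.get("kind") == "def":
            yield value["def"]

def main(json_path):
    with open(json_path) as f:
        records = json.load(f)
    for name in records.get("!instanceof", {}).get("Instruction", []):
        rec = records[name]
        ops = list(reg_class_names(rec.get("InOperandList", {})))
        ops += list(reg_class_names(rec.get("OutOperandList", {})))
        # Treat any operand whose register class name starts with "FPR"
        # (FPR8Op, FPR32, FPR128Op, ...) as a floating-point operand.
        if any(op.startswith("FPR") for op in ops) and not rec.get("Predicates"):
            print(name)

if __name__ == "__main__":
    main(sys.argv[1])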

@davemgreen (Collaborator) left a comment


LGTM

@ostannard merged commit 9c9bffe into llvm:main on Jan 12, 2024
@ostannard deleted the disable-fp-insts branch on January 12, 2024 at 09:36
@mysterymath (Contributor)

We've observed a breakage in Fuchsia due to this commit: https://luci-milo.appspot.com/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.arm64-debug/b8758715042351668065/overview

That module broke due to (among other things) the inline asm mrs x9, fpcr. We compile that module with the flag -mgeneral-regs-only. That flag disables the target feature fp-armv8 in LLVM, but that isn't correct with respect to its GCC semantics, since it is explicitly documented not to affect the assembler (https://gcc.gnu.org/onlinedocs/gcc-7.5.0/gcc/AArch64-Options.html#AArch64-Options).

I'd agree that these instructions should be disabled on systems that actually are configured to lack an FPU, but it would be prudent to roll this back and alter the semantics of -mgeneral-regs-only so that the assembler is no longer affected before relanding.

@mysterymath (Contributor) commented Jan 16, 2024

After a bit more looking around, I found that this has been broken for quite a while (#30140), and it sounded like it was not quite trivial to fix. This PR just makes the issue somewhat worse.

@ostannard (Collaborator, Author)

Since clang's -mgeneral-regs-only is already broken, and most FPU instructions are already rejected, I'd rather not revert this patch.

Would something like this work for you as a workaround? I've checked and this is accepted by both gcc and clang with -mgeneral-regs-only:

void foo() {
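  // .arch_extension fp re-enables the FP/SIMD instructions for the assembler
  // from this point in the asm text, so the q-register stp below is accepted.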
  __asm__(".arch_extension fp\n"
          "stp     q0, q1, [x8, #(0 * 32)]\n");
}

justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024