Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[llvm][AArch64] apple-m4 is armv9.2-a #98267

Merged
merged 7 commits into from
Jul 11, 2024

Conversation

jroelofs
Copy link
Contributor

@jroelofs jroelofs commented Jul 10, 2024

But since SVE and friends have been added to the default extensions list, and every CPU was opted into those extensions by default, we couldn't correctly announce its architecutral version to the backend. Additionally, we FEAT_MEC from llvm's "required" list for v9.0 to the optional list for v9.2, as the spec considers it optional, and M4 does not implement it. Similarly, fixes up several bugs w.r.t. FEAT_RME.

As a drive-by, I noticed that saphira did not have an AArch64CPUTestParams entry, and thus added one.

But since SVE and friends have been added to the default extensions list, and
every CPU was opted into those extensions by default, we couldn't correctly
announce its architecutral version to the backend.  Additionally, we remove
FEAT_MEC from llvm's "required" list for v9.2, as the spec considers it
optional, and M4 does not implement it.

As a drive-by, I noticed that saphira did not have an AArch64CPUTestParams
entry, and thus added one.
@llvmbot
Copy link
Member

llvmbot commented Jul 10, 2024

@llvm/pr-subscribers-mc
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-backend-aarch64

Author: Jon Roelofs (jroelofs)

Changes

But since SVE and friends have been added to the default extensions list, and every CPU was opted into those extensions by default, we couldn't correctly announce its architecutral version to the backend. Additionally, we remove FEAT_MEC from llvm's "required" list for v9.2, as the spec considers it optional, and M4 does not implement it.

As a drive-by, I noticed that saphira did not have an AArch64CPUTestParams entry, and thus added one.


Patch is 35.87 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/98267.diff

4 Files Affected:

  • (modified) llvm/include/llvm/TargetParser/AArch64TargetParser.h (+2-6)
  • (modified) llvm/lib/Target/AArch64/AArch64Features.td (+1-1)
  • (modified) llvm/lib/Target/AArch64/AArch64Processors.td (+140-56)
  • (modified) llvm/unittests/TargetParser/TargetParserTest.cpp (+17-11)
diff --git a/llvm/include/llvm/TargetParser/AArch64TargetParser.h b/llvm/include/llvm/TargetParser/AArch64TargetParser.h
index 13091748e091c..2169e9c94b61f 100644
--- a/llvm/include/llvm/TargetParser/AArch64TargetParser.h
+++ b/llvm/include/llvm/TargetParser/AArch64TargetParser.h
@@ -161,14 +161,10 @@ struct CpuInfo {
   StringRef Name; // Name, as written for -mcpu.
   const ArchInfo &Arch;
   AArch64::ExtensionBitset
-      DefaultExtensions; // Default extensions for this CPU. These will be
-                         // ORd with the architecture defaults.
+      DefaultExtensions; // Default extensions for this CPU.
 
   AArch64::ExtensionBitset getImpliedExtensions() const {
-    AArch64::ExtensionBitset ImpliedExts;
-    ImpliedExts |= DefaultExtensions;
-    ImpliedExts |= Arch.DefaultExts;
-    return ImpliedExts;
+    return DefaultExtensions;
   }
 };
 
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index 8754ea4974999..fac75d56bde4f 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -797,7 +797,7 @@ def HasV8_9aOps : Architecture64<8, 9, "a", "v8.9a",
   !listconcat(HasV8_8aOps.DefaultExts, [FeatureSPECRES2, FeatureCSSC,
     FeatureRASv2])>;
 def HasV9_0aOps : Architecture64<9, 0, "a", "v9a",
-  [HasV8_5aOps, FeatureMEC],
+  [HasV8_5aOps],
   !listconcat(HasV8_5aOps.DefaultExts, [FeatureFullFP16, FeatureSVE,
     FeatureSVE2])>;
 def HasV9_1aOps : Architecture64<9, 1, "a", "v9.1a",
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index 5859d5cd91236..531d6636cce4a 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -681,247 +681,331 @@ def ProcessorFeatures {
                                  FeatureFPARMv8, FeatureNEON, FeaturePerfMon];
   list<SubtargetFeature> A55  = [HasV8_2aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                  FeatureNEON, FeatureFullFP16, FeatureDotProd,
-                                 FeatureRCPC, FeaturePerfMon];
+                                 FeatureRCPC, FeaturePerfMon, FeatureCRC,
+                                 FeatureLSE, FeatureRAS, FeatureRDM];
   list<SubtargetFeature> A510 = [HasV9_0aOps, FeatureNEON, FeaturePerfMon,
                                  FeatureMatMulInt8, FeatureBF16, FeatureAM,
                                  FeatureMTE, FeatureETE, FeatureSVE2BitPerm,
                                  FeatureFP16FML,
-                                 FeatureSB, FeaturePAuth, FeatureSSBS, FeatureSVE, FeatureSVE2];
+                                 FeatureSB, FeaturePAuth, FeatureSSBS, FeatureSVE, FeatureSVE2,
+                                 FeatureComplxNum, FeatureCRC, FeatureDotProd,
+                                 FeatureFPARMv8,FeatureFullFP16, FeatureJS, FeatureLSE,
+                                 FeatureRAS, FeatureRCPC, FeatureRDM];
   list<SubtargetFeature> A520 = [HasV9_2aOps, FeaturePerfMon, FeatureAM,
                                  FeatureMTE, FeatureETE, FeatureSVE2BitPerm,
                                  FeatureFP16FML,
                                  FeatureSB, FeatureSSBS, FeaturePAuth, FeatureFlagM, FeaturePredRes,
-                                 FeatureSVE, FeatureSVE2];
+                                 FeatureSVE, FeatureSVE2, FeatureBF16, FeatureComplxNum, FeatureCRC,
+                                 FeatureFPARMv8, FeatureFullFP16, FeatureMatMulInt8, FeatureJS,
+                                 FeatureNEON, FeatureLSE, FeatureRAS, FeatureRCPC, FeatureRDM,
+                                 FeatureDotProd];
   list<SubtargetFeature> A520AE = [HasV9_2aOps, FeaturePerfMon, FeatureAM,
                                  FeatureMTE, FeatureETE, FeatureSVE2BitPerm,
                                  FeatureFP16FML,
                                  FeatureSB, FeatureSSBS, FeaturePAuth, FeatureFlagM, FeaturePredRes,
-                                 FeatureSVE, FeatureSVE2];
+                                 FeatureSVE, FeatureSVE2, FeatureBF16, FeatureComplxNum, FeatureCRC,
+                                 FeatureFPARMv8, FeatureFullFP16, FeatureMatMulInt8, FeatureJS,
+                                 FeatureNEON, FeatureLSE, FeatureRAS, FeatureRCPC, FeatureRDM,
+                                 FeatureDotProd];
   list<SubtargetFeature> A65  = [HasV8_2aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                  FeatureNEON, FeatureFullFP16, FeatureDotProd,
                                  FeatureRCPC, FeatureSSBS, FeatureRAS,
-                                 FeaturePerfMon];
+                                 FeaturePerfMon, FeatureCRC, FeatureLSE, FeatureRDM];
   list<SubtargetFeature> A76  = [HasV8_2aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                  FeatureNEON, FeatureFullFP16, FeatureDotProd,
-                                 FeatureRCPC, FeatureSSBS, FeaturePerfMon];
+                                 FeatureRCPC, FeatureSSBS, FeaturePerfMon,
+                                 FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM];
   list<SubtargetFeature> A77  = [HasV8_2aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                  FeatureNEON, FeatureFullFP16, FeatureDotProd,
-                                 FeatureRCPC, FeaturePerfMon, FeatureSSBS];
+                                 FeatureRCPC, FeaturePerfMon, FeatureSSBS,
+                                 FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM];
   list<SubtargetFeature> A78  = [HasV8_2aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                  FeatureNEON, FeatureFullFP16, FeatureDotProd,
                                  FeatureRCPC, FeaturePerfMon, FeatureSPE,
-                                 FeatureSSBS];
+                                 FeatureSSBS, FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM];
   list<SubtargetFeature> A78AE = [HasV8_2aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                   FeatureNEON, FeatureFullFP16, FeatureDotProd,
                                   FeatureRCPC, FeaturePerfMon, FeatureSPE,
-                                  FeatureSSBS];
+                                  FeatureSSBS, FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM];
   list<SubtargetFeature> A78C = [HasV8_2aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                  FeatureNEON, FeatureFullFP16, FeatureDotProd,
                                  FeatureFlagM, FeaturePAuth,
                                  FeaturePerfMon, FeatureRCPC, FeatureSPE,
-                                 FeatureSSBS];
+                                 FeatureSSBS, FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM];
   list<SubtargetFeature> A710 = [HasV9_0aOps, FeatureNEON, FeaturePerfMon,
                                  FeatureETE, FeatureMTE, FeatureFP16FML,
                                  FeatureSVE2BitPerm, FeatureBF16, FeatureMatMulInt8,
-                                 FeaturePAuth, FeatureFlagM, FeatureSB, FeatureSVE, FeatureSVE2];
+                                 FeaturePAuth, FeatureFlagM, FeatureSB, FeatureSVE, FeatureSVE2,
+                                 FeatureComplxNum, FeatureCRC, FeatureDotProd, FeatureFPARMv8,
+                                 FeatureFullFP16, FeatureJS, FeatureLSE, FeatureRAS, FeatureRCPC, FeatureRDM];
   list<SubtargetFeature> A715 = [HasV9_0aOps, FeatureNEON, FeatureMTE,
                                  FeatureFP16FML, FeatureSVE, FeatureTRBE,
                                  FeatureSVE2BitPerm, FeatureBF16, FeatureETE,
                                  FeaturePerfMon, FeatureMatMulInt8, FeatureSPE,
                                  FeatureSB, FeatureSSBS, FeatureFullFP16, FeaturePAuth, FeaturePredRes, FeatureFlagM,
-                                 FeatureSVE2];
+                                 FeatureSVE2, FeatureComplxNum, FeatureCRC,
+                                 FeatureDotProd, FeatureFPARMv8,
+                                 FeatureJS, FeatureLSE, FeatureRAS,
+                                 FeatureRCPC, FeatureRDM];
   list<SubtargetFeature> A720 = [HasV9_2aOps, FeatureMTE, FeatureFP16FML,
                                  FeatureTRBE, FeatureSVE2BitPerm, FeatureETE,
                                  FeaturePerfMon, FeatureSPE, FeatureSPE_EEF,
                                  FeatureSB, FeatureSSBS, FeaturePAuth, FeatureFlagM, FeaturePredRes,
-                                 FeatureSVE, FeatureSVE2];
+                                 FeatureSVE, FeatureSVE2, FeatureBF16, FeatureComplxNum, FeatureCRC,
+                                 FeatureDotProd, FeatureFPARMv8, FeatureFullFP16, FeatureMatMulInt8,
+                                 FeatureJS, FeatureLSE, FeatureNEON, FeatureRAS,
+                                 FeatureRCPC, FeatureRDM];
   list<SubtargetFeature> A720AE = [HasV9_2aOps, FeatureMTE, FeatureFP16FML,
                                  FeatureTRBE, FeatureSVE2BitPerm, FeatureETE,
                                  FeaturePerfMon, FeatureSPE, FeatureSPE_EEF,
                                  FeatureSB, FeatureSSBS, FeaturePAuth, FeatureFlagM, FeaturePredRes,
-                                 FeatureSVE, FeatureSVE2];
+                                 FeatureSVE, FeatureSVE2, FeatureBF16, FeatureComplxNum, FeatureCRC,
+                                 FeatureDotProd, FeatureFPARMv8, FeatureFullFP16, FeatureMatMulInt8,
+                                 FeatureJS, FeatureLSE, FeatureNEON, FeatureRAS,
+                                 FeatureRCPC, FeatureRDM];
   list<SubtargetFeature> A725 = [HasV9_2aOps, FeatureMTE, FeatureFP16FML,
                                  FeatureETE, FeaturePerfMon, FeatureSPE,
                                  FeatureSVE2BitPerm, FeatureSPE_EEF, FeatureTRBE,
                                  FeatureFlagM, FeaturePredRes, FeatureSB, FeatureSSBS,
-                                 FeatureSVE, FeatureSVE2];
+                                 FeatureSVE, FeatureSVE2, FeatureBF16, FeatureComplxNum, FeatureCRC,
+                                 FeatureDotProd, FeatureFPARMv8, FeatureFullFP16, FeatureMatMulInt8,
+                                 FeatureJS, FeatureLSE, FeatureNEON, FeaturePAuth, FeatureRAS,
+                                 FeatureRCPC, FeatureRDM];
   list<SubtargetFeature> R82  = [HasV8_0rOps, FeaturePerfMon, FeatureFullFP16,
                                  FeatureFP16FML, FeatureSSBS, FeaturePredRes,
                                  FeatureSB, FeatureRDM, FeatureDotProd,
                                  FeatureComplxNum, FeatureJS,
                                  FeatureCacheDeepPersist,
-                                 FeatureLSE, FeatureFlagM];
+                                 FeatureFlagM, FeatureCRC, FeatureLSE, FeatureRAS, FeatureFPARMv8,
+                                 FeatureNEON, FeaturePAuth, FeatureRCPC];
   list<SubtargetFeature> R82AE = [HasV8_0rOps, FeaturePerfMon, FeatureFullFP16,
                                   FeatureFP16FML, FeatureSSBS, FeaturePredRes,
                                   FeatureSB, FeatureRDM, FeatureDotProd,
                                   FeatureComplxNum, FeatureJS,
                                   FeatureCacheDeepPersist,
-                                  FeatureLSE, FeatureFlagM];
+                                  FeatureLSE, FeatureFlagM, FeatureCRC, FeatureFPARMv8, FeatureNEON,
+                                  FeaturePAuth, FeatureRAS, FeatureRCPC];
   list<SubtargetFeature> X1   = [HasV8_2aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                  FeatureNEON, FeatureRCPC, FeaturePerfMon,
                                  FeatureSPE, FeatureFullFP16, FeatureDotProd,
-                                 FeatureSSBS];
+                                 FeatureSSBS, FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM];
   list<SubtargetFeature> X1C  = [HasV8_2aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                  FeatureNEON, FeatureRCPC_IMMO, FeaturePerfMon,
                                  FeatureSPE, FeatureFullFP16, FeatureDotProd,
                                  FeaturePAuth, FeatureSSBS, FeatureFlagM,
                                  FeatureLSE2,
-                                 FeatureRCPC];
+                                 FeatureRCPC, FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM];
   list<SubtargetFeature> X2   = [HasV9_0aOps, FeatureNEON, FeaturePerfMon,
                                  FeatureMatMulInt8, FeatureBF16, FeatureAM,
                                  FeatureMTE, FeatureETE, FeatureSVE2BitPerm,
                                  FeatureFP16FML,
-                                 FeaturePAuth, FeatureSSBS, FeatureSB, FeatureSVE, FeatureSVE2, FeatureFlagM];
+                                 FeaturePAuth, FeatureSSBS, FeatureSB, FeatureSVE, FeatureSVE2, FeatureFlagM,
+                                 FeatureComplxNum, FeatureCRC, FeatureDotProd, FeatureFPARMv8, FeatureFullFP16,
+                                 FeatureJS, FeatureLSE, FeatureRAS, FeatureRCPC, FeatureRDM];
   list<SubtargetFeature> X3 =   [HasV9_0aOps, FeatureSVE, FeatureNEON,
                                  FeaturePerfMon, FeatureETE, FeatureTRBE,
                                  FeatureSPE, FeatureBF16, FeatureMatMulInt8,
                                  FeatureMTE, FeatureSVE2BitPerm, FeatureFullFP16,
                                  FeatureFP16FML,
                                  FeatureSB, FeaturePAuth, FeaturePredRes, FeatureFlagM, FeatureSSBS,
-                                 FeatureSVE2];
+                                 FeatureSVE2, FeatureComplxNum, FeatureCRC, FeatureFPARMv8, FeatureJS,
+                                 FeatureLSE, FeatureRAS, FeatureRCPC, FeatureRDM, FeatureDotProd];
   list<SubtargetFeature> X4 =   [HasV9_2aOps,
                                  FeaturePerfMon, FeatureETE, FeatureTRBE,
                                  FeatureSPE, FeatureMTE, FeatureSVE2BitPerm,
                                  FeatureFP16FML, FeatureSPE_EEF,
                                  FeatureSB, FeatureSSBS, FeaturePAuth, FeatureFlagM, FeaturePredRes,
-                                 FeatureSVE, FeatureSVE2];
+                                 FeatureSVE, FeatureSVE2, FeatureComplxNum, FeatureCRC, FeatureDotProd,
+                                 FeatureFPARMv8, FeatureFullFP16, FeatureMatMulInt8, FeatureJS, FeatureLSE,
+                                 FeatureNEON, FeatureRAS, FeatureRCPC, FeatureRDM, FeatureBF16];
   list<SubtargetFeature> X925 = [HasV9_2aOps, FeatureMTE, FeatureFP16FML,
                                  FeatureETE, FeaturePerfMon, FeatureSPE,
                                  FeatureSVE2BitPerm, FeatureSPE_EEF, FeatureTRBE,
                                  FeatureFlagM, FeaturePredRes, FeatureSB, FeatureSSBS,
-                                 FeatureSVE, FeatureSVE2];
+                                 FeatureSVE, FeatureSVE2, FeatureBF16, FeatureComplxNum, FeatureCRC,
+                                 FeatureDotProd, FeatureFPARMv8, FeatureFullFP16, FeatureMatMulInt8,
+                                 FeatureJS, FeatureLSE, FeatureNEON, FeaturePAuth, FeatureRAS,
+                                 FeatureRCPC, FeatureRDM];
   list<SubtargetFeature> A64FX    = [HasV8_2aOps, FeatureFPARMv8, FeatureNEON,
                                      FeatureSHA2, FeaturePerfMon, FeatureFullFP16,
                                      FeatureSVE, FeatureComplxNum,
-                                     FeatureAES];
+                                     FeatureAES, FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM];
   list<SubtargetFeature> Carmel   = [HasV8_2aOps, FeatureNEON, FeatureSHA2, FeatureAES,
-                                     FeatureFullFP16];
+                                     FeatureFullFP16, FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM,
+                                     FeatureFPARMv8];
   list<SubtargetFeature> AppleA7  = [HasV8_0aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                      FeatureNEON,FeaturePerfMon, FeatureAppleA7SysReg];
   list<SubtargetFeature> AppleA10 = [HasV8_0aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                      FeatureNEON, FeaturePerfMon, FeatureCRC,
                                      FeatureRDM, FeaturePAN, FeatureLOR, FeatureVH];
   list<SubtargetFeature> AppleA11 = [HasV8_2aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
-                                     FeatureNEON, FeaturePerfMon, FeatureFullFP16];
+                                     FeatureNEON, FeaturePerfMon, FeatureFullFP16, FeatureCRC,
+                                     FeatureLSE, FeatureRAS, FeatureRDM];
   list<SubtargetFeature> AppleA12 = [HasV8_3aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
-                                     FeatureNEON, FeaturePerfMon, FeatureFullFP16];
+                                     FeatureNEON, FeaturePerfMon, FeatureFullFP16,
+                                     FeatureComplxNum, FeatureCRC, FeatureJS, FeatureLSE,
+                                     FeaturePAuth, FeatureRAS, FeatureRCPC, FeatureRDM];
   list<SubtargetFeature> AppleA13 = [HasV8_4aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                      FeatureNEON, FeaturePerfMon, FeatureFullFP16,
-                                     FeatureFP16FML, FeatureSHA3];
+                                     FeatureFP16FML, FeatureSHA3, FeatureComplxNum, FeatureCRC, FeatureJS, FeatureLSE,
+                                     FeaturePAuth, FeatureRAS, FeatureRCPC, FeatureRDM, FeatureDotProd];
   list<SubtargetFeature> AppleA14 = [HasV8_4aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                      FeatureNEON, FeaturePerfMon,
                                      FeatureFullFP16, FeatureFP16FML, FeatureSHA3,
                                      // ArmV8.5-a extensions, excluding BTI:
                                      FeatureAltFPCmp, FeatureFRInt3264,
                                      FeatureSpecRestrict, FeatureSSBS, FeatureSB,
-                                     FeaturePredRes, FeatureCacheDeepPersist];
+                                     FeaturePredRes, FeatureCacheDeepPersist,
+                                     FeatureComplxNum, FeatureCRC, FeatureJS, FeatureLSE,
+                                     FeaturePAuth, FeatureRAS, FeatureRCPC, FeatureRDM,
+                                     FeatureDotProd];
   list<SubtargetFeature> AppleA15 = [HasV8_6aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                      FeatureNEON, FeaturePerfMon, FeatureSHA3,
-                                     FeatureFullFP16, FeatureFP16FML];
+                                     FeatureFullFP16, FeatureFP16FML,
+                                     FeatureComplxNum, FeatureCRC, FeatureJS, FeatureLSE,
+                                     FeaturePAuth, FeatureRAS, FeatureRCPC, FeatureRDM,
+                                     FeatureBF16, FeatureDotProd, FeatureMatMulInt8];
   list<SubtargetFeature> AppleA16 = [HasV8_6aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                      FeatureNEON, FeaturePerfMon, FeatureSHA3,
                                      FeatureFullFP16, FeatureFP16FML,
-                                     FeatureHCX];
+                                     FeatureHCX,
+                                     FeatureComplxNum, FeatureCRC, FeatureJS, FeatureLSE,
+                                     FeaturePAuth, FeatureRAS, FeatureRCPC, FeatureRDM,
+                                     FeatureBF16, FeatureDotProd, FeatureMatMulInt8];
   list<SubtargetFeature> AppleA17 = [HasV8_6aOps, FeatureSHA2, FeatureAES, FeatureFPARMv8,
                                      FeatureNEON, FeaturePerfMon, FeatureSHA3,
                                      FeatureFullFP16, FeatureFP16FML,
-                                     FeatureHCX];
-  // Technically apple-m4 is ARMv9.2a, but a quirk of LLVM defines v9.0 as
-  // requiring SVE, which is optional according to the Arm ARM and not
-  // supported by the core. ARMv8.7a is the next closest choice.
-  list<SubtargetFeature> AppleM4 = [HasV8_7aOps, FeatureSHA2, FeatureFPARMv8,
+         ...
[truncated]

Copy link

github-actions bot commented Jul 10, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Jul 10, 2024
@llvmbot llvmbot added the mc Machine (object) code label Jul 10, 2024
Copy link
Contributor

@tmatheson-arm tmatheson-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I haven't checked all the CPU feature changes, but the tests give a lot of confidence. I also haven't checked all the TRMs but the changes there make sense given the MEC/RME architecture changes.

Comment on lines -168 to +167
AArch64::ExtensionBitset ImpliedExts;
ImpliedExts |= DefaultExtensions;
ImpliedExts |= Arch.DefaultExts;
return ImpliedExts;
return DefaultExtensions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the idea here is that we don't want to add the architecture's default extensions to the CPU, because they are not mandatory, and there's no way to disable them for a CPU that chooses not to implement. This is why all the CPUs have had their list of features expanded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and that idea is the problem: automatically adding default extensions kinda works for extensions that the compiler doesn't generate automatically where their use is driven by intrinsics, but as soon as you have something like SVE in the list you get incompatible codegen left and right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This choice also makes -march= meaningless, so @ahmedbougacha and I are considering making that an error at least on Darwin platforms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow... Do you mean -march= has no effect if you specify -mcpu? That should probably be at least a warning.

Copy link
Contributor Author

@jroelofs jroelofs Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that code built with -march=armv9.2a may use SVE instructions because FeatureSVE is in the default list for v9.0, but this will crash on an M4 iPad.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1626,7 +1628,7 @@ INSTANTIATE_TEST_SUITE_P(
AArch64::AEK_FP16FML, AArch64::AEK_SHA3, AArch64::AEK_BF16,
AArch64::AEK_I8MM, AArch64::AEK_JSCVT, AArch64::AEK_FCMA,
AArch64::AEK_PAUTH, AArch64::AEK_PERFMON, AArch64::AEK_HCX}),
AArch64CPUTestParams("apple-m4", "armv8.7-a",
AArch64CPUTestParams("apple-m4", "armv9.2-a",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like the Optimisation Guide has been updated for the M4 yet.

@jroelofs jroelofs merged commit c66e1d6 into llvm:main Jul 11, 2024
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 12, 2024

LLVM Buildbot has detected a new failure on builder lld-x86_64-ubuntu-fast running on as-builder-4 while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/597

Here is the relevant piece of the build log for the reference:

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'lld :: ELF/basic-sparcv9.s' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llvm-mc -filetype=obj -triple=sparc64-unknown-openbsd /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/lld/test/ELF/basic-sparcv9.s -o /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/tools/lld/test/ELF/Output/basic-sparcv9.s.tmp
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llvm-mc -filetype=obj -triple=sparc64-unknown-openbsd /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/lld/test/ELF/basic-sparcv9.s -o /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/tools/lld/test/ELF/Output/basic-sparcv9.s.tmp
RUN: at line 3: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/ld.lld /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/tools/lld/test/ELF/Output/basic-sparcv9.s.tmp -o /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/tools/lld/test/ELF/Output/basic-sparcv9.s.tmp2
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/ld.lld /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/tools/lld/test/ELF/Output/basic-sparcv9.s.tmp -o /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/tools/lld/test/ELF/Output/basic-sparcv9.s.tmp2
RUN: at line 4: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llvm-readobj --file-headers --sections -l --symbols /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/tools/lld/test/ELF/Output/basic-sparcv9.s.tmp2    | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/lld/test/ELF/basic-sparcv9.s
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/llvm-readobj --file-headers --sections -l --symbols /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/tools/lld/test/ELF/Output/basic-sparcv9.s.tmp2
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/lld/test/ELF/basic-sparcv9.s
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/lld/test/ELF/basic-sparcv9.s:20:15: error: CHECK-NEXT: expected string not found in input
# CHECK-NEXT: OS/ABI: SystemV (0x0)
              ^
<stdin>:12:16: note: scanning from here
 FileVersion: 1
               ^
<stdin>:13:2: note: possible intended match here
 OS/ABI: OpenBSD (0xC)
 ^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/lld/test/ELF/basic-sparcv9.s

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
           7: ElfHeader { 
           8:  Ident { 
           9:  Magic: (7F 45 4C 46) 
          10:  Class: 64-bit (0x2) 
          11:  DataEncoding: BigEndian (0x2) 
          12:  FileVersion: 1 
next:20'0                    X error: no match found
          13:  OS/ABI: OpenBSD (0xC) 
next:20'0     ~~~~~~~~~~~~~~~~~~~~~~~
next:20'1      ?                      possible intended match
          14:  ABIVersion: 0 
next:20'0     ~~~~~~~~~~~~~~~
          15:  Unused: (00 00 00 00 00 00 00) 
next:20'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          16:  } 
next:20'0     ~~~
          17:  Type: Executable (0x2) 
next:20'0     ~~~~~~~~~~~~~~~~~~~~~~~~
...

aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
But since SVE and friends have been added to the default extensions
list, and every CPU was opted into those extensions by default, we
couldn't correctly announce its architecutral version to the backend.
Additionally, we FEAT_MEC from llvm's "required" list for v9.0 to the
optional list for v9.2, as the spec considers it optional, and M4 does
not implement it. Similarly, fixes up several bugs w.r.t. FEAT_RME.

As a drive-by, I noticed that saphira did not have an
AArch64CPUTestParams entry, and thus added one.
MattPD referenced this pull request in ashvardanian/SimSIMD Aug 16, 2024
This commit add new capability levels for Arm allowing
us to differentiate f16, bf16. and i8-supporting generations
of CPUs, becoming increasingly popular in the datacenter.

This breaks compilation of Rust and Python bindings
due to the "target specific options mismatch".
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants