[MetaSchedule][ARM] Enable ARM CPU intrinsic for MetaSchedule #14209
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
I am leaving comments here to think about how to make the code cleaner, but they are not required fixes.
@@ -180,7 +180,7 @@ class ScheduleRule : public runtime::ObjectRef {
  * \return The schedule rule created
  */
 TVM_DLL static ScheduleRule MultiLevelTilingWithIntrin(
-    String intrin_name, String structure, Optional<Array<String>> tile_binds,
+    Array<String> intrin_name, String structure, Optional<Array<String>> tile_binds,
I understand why it was done (String -> Array), but it should be rethought once more: the API change affects other places, not only your own task.
Yes, the new API changes were reverted to the original API, while keeping the new functionality.
@@ -85,21 +101,23 @@ class MultiLevelTilingWithIntrinNode : public MultiLevelTilingNode {

 public:
  /*! \brief The name of a tensor intrinsic. */
- String intrin_name;
+ Array<String> intrin_name;
If the field type is still being changed, I recommend renaming it to intrin_names for the sake of clarity.
Outdated.
@@ -110,6 +155,16 @@ void SpaceGeneratorNode::InitializeWithTuneContext(const TuneContext& context) {
   default_sch_rules = ScheduleRule::DefaultMicro();
   default_postprocs = Postproc::DefaultMicro();
   default_mutator_probs = Mutator::DefaultMicro();
+} else if (kind == "neon") {
It looks like different levels of target types are checked here. Possibly this should be an "arm" type, with the split into "neon"/"dotprod" handled in a separate method.
Done.
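The structure the reviewer asked for can be sketched as follows. This is a hypothetical illustration, not the real TVM code: only the coarse target kind is checked at the top level, and the finer neon/dotprod choice lives in a dedicated helper.

```python
# Illustrative sketch (all names are placeholders, not TVM's actual API):
# dispatch on the coarse "arm" kind first, then pick "neon"/"dotprod"
# defaults in a separate method instead of one mixed if/else chain.

def default_rules_for_arm(features):
    """Pick default schedule rules based on detected ARM CPU features."""
    if "dotprod" in features:
        return "DefaultARMDotprod"   # stands in for ScheduleRule::DefaultARMDotprod()
    if "neon" in features:
        return "DefaultARMNeon"      # stands in for ScheduleRule::DefaultARMNeon()
    return "DefaultCPU"              # generic fallback

def initialize_defaults(kind, features):
    """Top-level dispatch: only the coarse target kind is checked here."""
    if kind == "arm":
        return default_rules_for_arm(features)
    return "DefaultCPU"
```

The benefit is that adding a new ARM feature level later touches only the helper, not the top-level dispatch.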
// return HasFlag_(attr.value(), flag);
// }

static inline bool HasFlag_(Optional<Array<String>> attr, std::string flag) {
Looks like we have the same code in src/target/parsers/aprofile.cc. Instead of duplicating it, can we move it into a common place?
The code duplication has been fixed; we now use a different method to pull specific keys from the target.
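The idea of the fix can be sketched like this (hypothetical helper and data layout, not TVM's actual target representation): rather than re-parsing attribute flags with a duplicated HasFlag_-style helper, the code queries keys/features already attached to the parsed target.

```python
# Illustrative sketch only: instead of duplicating flag-parsing logic,
# look up a pre-parsed feature set carried by the target itself.

def has_feature(target, feature):
    """Check a pre-parsed feature set on the target; no flag re-parsing."""
    return feature in target.get("features", set())

# A toy stand-in for a parsed target (real TVM targets are richer objects).
target = {
    "kind": "llvm",
    "keys": ["arm_cpu", "cpu"],
    "features": {"neon", "dotprod"},
}
```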
ScheduleRule::AddRFactor(
    /*max_jobs_per_core=*/8,
    /*max_innermost_factor=*/Integer(32)),
ScheduleRule::MultiLevelTilingWithIntrin(
As I understand it, the new API in MultiLevelTilingWithIntrin is not required anymore?
Yes, it is not in use right now.
@ibsidorenko could you review my changes, please? :)
vec_c = C.vload([0], dtype="int32x4")

C[T.ramp(T.int32(0), 1, 4)] = T.call_llvm_pure_intrin(
    T.llvm_lookup_intrinsic_id("llvm.aarch64.neon.udot.v4u32.v16u8"),
You use the same intrinsic id "llvm.aarch64.neon.udot.v4u32.v16u8" in both cases. Is that OK?
When experimenting with the tflite_mobilenet_v3_quant model, we encountered a convolution multiplying uint8 × uint8 tensors into an int32 accumulator, which the existing sdot/udot intrinsics could not handle, so we created a new hdot intrinsic that works with this dtype layout. To my knowledge, there is no Neon instruction for the u8u8i32 layout of dtypes, so we call the closest available instruction instead; the intrinsic was successfully applied to that type of convolution and brought a performance benefit.
LGTM. Thanks!
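The hdot semantics discussed above — multiplying two uint8 vectors and accumulating into int32 lanes, analogous to what udot does for u8 × u8 -> u32 — can be written as a scalar reference in plain Python. This is an illustrative model of the instruction's behavior, not TVM code:

```python
# Scalar reference for a udot/hdot-style 4-way dot product:
# each of the 4 output lanes accumulates the products of four 8-bit lanes,
# mirroring llvm.aarch64.neon.udot.v4u32.v16u8, except the accumulator is
# treated as int32 (the "hybrid" u8 x u8 -> i32 case described above).

def hdot_reference(acc, a, b):
    """acc: 4 x int32, a/b: 16 x uint8. Returns the updated accumulator."""
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    out = list(acc)
    for lane in range(4):
        for k in range(4):
            out[lane] += a[4 * lane + k] * b[4 * lane + k]
    return out
```

Note that the worst case, 4 * 255 * 255 = 260100 per step, fits comfortably in an int32 lane, which is why reusing the unsigned dot-product instruction is safe here.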
    vec_a,
    vec_b,
    dtype="int32x4",
)
It should be possible to clean up a lot of code duplication between the different dtypes here. See tensor_intrin/cuda.py for examples.
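The deduplication pattern the reviewer points at can be sketched as a parameterized factory: one function generates the per-dtype variants instead of several hand-copied blocks. All names below are illustrative placeholders, not the real TVM tensor-intrin helpers:

```python
# Sketch of deduplicating per-dtype intrinsic definitions: generate each
# variant from one parameterized factory (the approach used in TVM's
# tensor_intrin/cuda.py), rather than copying near-identical blocks.

def make_dot_product_desc(in_dtype, out_dtype, lanes=4):
    """Build a small description dict for a 4-way dot-product intrinsic."""
    return {
        "name": f"dot_{in_dtype}_{in_dtype}_{out_dtype}",
        "in_dtype": in_dtype,
        "out_dtype": out_dtype,
        "vec_out": f"{out_dtype}x{lanes}",
    }

# One loop replaces several hand-copied definitions.
DESCS = {d["name"]: d for d in (
    make_dot_product_desc("uint8", "uint32"),   # udot-style
    make_dot_product_desc("int8", "int32"),     # sdot-style
    make_dot_product_desc("uint8", "int32"),    # hdot-style (hybrid)
)}
```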
};
}

Array<ScheduleRule> ScheduleRule::DefaultARMDotprod() {
Please remove the duplication with DefaultARMDotprod, to make the difference obvious.
}

template <typename... Args>
static void AgregateImpl(Array<T>& dest) {}  // NOLINT(*)
This edit (NOLINT) was made on the consideration that, quote, the "Google C++ Style Guide seems to have allowed using non-const references as parameters". Reference: https://github.innominds.com/cpplint/cpplint/issues/148
@tvm-bot rerun
@tvm-bot rerun
@masahi, could you review my changes, please? :)
};
}

Array<ScheduleRule> GetDotprodSpecificRules() {
This is specific to the ARM dot product only, so the naming is not the best. I'll merge it for now, but please fix this when you get a chance.
Sorry, I forgot to take another look.
Motivation:
The purpose of this PR is to add support for intrinsics that optimize matrix-multiplication operations (e.g. matmul, convolution) during tuning with MetaSchedule.
Information about PR:
The present PR integrates the existing neon and dotprod (namely, sdot and udot) ARM CPU intrinsics into MetaScheduler, introduces a new "hybrid" dotprod intrinsic ("hdot") working with uint8, uint8 -> int32 data types, and changes the intrinsic selection and application processes for the ARM CPU case, since we operate with multiple intrinsics, rather than with a specific one.