[Target] Make `key=arm_cpu` --> `key=arm_cpu,cpu` on AArch64 #13775

AndrewZhaoLuo · 2023-01-12T18:53:24Z

The key in a target is used in dispatching to different strategies.

In most of the codebase, `arm_cpu` also implies `cpu`. For example, in python/tvm/target/target.py:

def arm_cpu(...):
    opts = ["-keys=arm_cpu,cpu", "-device=arm_cpu"] + pre_defined_opt
    opts = _merge_opts(opts, options)
    return Target(" ".join(["llvm"] + opts))

In src/target/parsers/mprofile.cc:

static Array<String> MergeKeys(Optional<Array<String>> existing_keys) {
  const Array<String> kExtraKeys = {"arm_cpu", "cpu"};

  if (!existing_keys) {
    return kExtraKeys;
  }

  Array<String> keys = existing_keys.value();
  for (String key : kExtraKeys) {
    if (std::find(keys.begin(), keys.end(), key) == keys.end()) {
      keys.push_back(key);
    }
  }
  return keys;
}
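
For readers less familiar with the C++ side, the merge above can be restated as a minimal Python sketch. The names `merge_keys` and `EXTRA_KEYS` are made up for illustration and are not part of TVM's Python API; the logic only mirrors the `MergeKeys` snippet quoted above.

```python
from typing import List, Optional

# Illustrative constant mirroring kExtraKeys in the C++ snippet above.
EXTRA_KEYS = ["arm_cpu", "cpu"]


def merge_keys(existing_keys: Optional[List[str]]) -> List[str]:
    """Append any missing extra key, keeping the existing order intact."""
    if existing_keys is None:
        # No keys supplied: the target gets exactly the extra keys.
        return list(EXTRA_KEYS)

    keys = list(existing_keys)  # copy so the caller's list is not mutated
    for key in EXTRA_KEYS:
        if key not in keys:
            keys.append(key)
    return keys


if __name__ == "__main__":
    print(merge_keys(None))
    print(merge_keys(["arm_cpu"]))
```

Because missing keys are appended rather than inserted, any keys the user already supplied keep their position, and therefore their priority, in the list.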

However, A-series targets did not carry both keys; they had only `arm_cpu`. This PR makes them include both `arm_cpu` and `cpu`, matching the rest of the codebase.

This may also improve performance: previously, for operators with no arm_cpu strategy, A-series targets would dispatch to the generic strategy instead of the cpu-optimized one. With this change, they should dispatch to the arm_cpu strategy where one exists and fall back to the cpu strategy otherwise.
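
To illustrate why the extra key matters, here is a toy model of key-ordered dispatch. It is a sketch under assumptions, not TVM's actual dispatch code: the `REGISTRY` table, the `pick_strategy` helper, and the operator names are all hypothetical. The only behavior taken from the description above is that keys are tried in order and the generic strategy is used when none matches.

```python
from typing import Dict, List

# Hypothetical per-operator registry: target key -> strategy name.
REGISTRY: Dict[str, Dict[str, str]] = {
    "op_with_arm_strategy": {
        "arm_cpu": "op_with_arm_strategy.arm_cpu",
        "cpu": "op_with_arm_strategy.cpu",
    },
    # This operator has no arm_cpu-specific strategy registered.
    "op_without_arm_strategy": {"cpu": "op_without_arm_strategy.cpu"},
}


def pick_strategy(op: str, keys: List[str]) -> str:
    """Return the strategy for the first key that has one, else the generic one."""
    registered = REGISTRY.get(op, {})
    for key in keys:
        if key in registered:
            return registered[key]
    return f"{op}.generic"


if __name__ == "__main__":
    old_keys = ["arm_cpu"]         # A-series target before this PR
    new_keys = ["arm_cpu", "cpu"]  # A-series target after this PR
    for op in REGISTRY:
        print(op, pick_strategy(op, old_keys), pick_strategy(op, new_keys))
```

In this model, the operator without an arm_cpu entry resolves to the generic strategy under the old keys and to the cpu strategy under the new ones, while the operator that does have an arm_cpu entry is unaffected.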

tvm-bot · 2023-01-12T18:53:27Z

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

No users to tag found in teams: `target` (see #10317 for details)

Generated by tvm-bot

Mousius

Argh, this was an unintended side effect of #12474, thanks for fixing this @AndrewZhaoLuo!

…oncat (#14270) Previously used generic implementation for concatenate before cpu schedules were made the default fallback schedules. This leads to performance degradation as this blocks fusion with nearby ops. This commit adds Relay op strategy for arm_cpu implementation which makes it use arm_cpu schedule before cpu one. Reference: #13775 Co-authored-by: Luke Hutton <luke.hutton@arm.com>

Currently the fallback used when compiling a dense operation with targets such as `llvm -device=arm_cpu` is `dense.generic`. This results in very poor performance. Although #13775 meant that x86 schedules are used in cases where no strategy is provided by arm_cpu, the dense strategy is registered due to the existence of specialized schedules for arm_cpu, e.g. a schedule for embedded devices. This commit ensures x86 schedules are used in place of a generic schedule, which yields much better performance. The commit also follows the same approach for the `dense.generic` schedule as the x86 strategy. This will only be used when autoscheduler is enabled. A test has been added to check the intended schedules are picked when compiling with `arm_cpu`. Change-Id: I8697f630d4acfab71a9626cf9e0dc3086987f163

AndrewZhaoLuo added 2 commits January 12, 2023 10:32

arm cpu is cpu

0e412df

init commit

87b1486

AndrewZhaoLuo requested a review from Mousius January 12, 2023 18:53

AndrewZhaoLuo mentioned this pull request Jan 12, 2023

[Generic] Forward MS and AS rewrites for generic schedules #13754

Closed

fix test

b69bd22

Mousius approved these changes Jan 12, 2023

Mousius changed the title ~~[Target] Make key=arm_cpu --> key=arm_cpu,cpu~~ [Target] Make key=arm_cpu --> key=arm_cpu,cpu on AArch64 Jan 12, 2023

AndrewZhaoLuo merged commit 5878f60 into apache:main Jan 13, 2023

ashutosh-arm mentioned this pull request Mar 10, 2023

[Relay][Op] Connect existing arm_cpu schedule to relay strategy for concat #14270

Merged

fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this pull request Mar 27, 2023

[Target] Make key=arm_cpu --> key=arm_cpu,cpu on AArch64 (apache#…

254e8f5

…13775) * arm cpu is cpu * init commit * fix test

ysh329 mentioned this pull request Apr 17, 2023

[Release] v0.12.0 Release Candidate Notes #14645

Closed

lhutton1 mentioned this pull request Aug 3, 2023

[Relay][Strategy] Use x86 dense schedules for arm_cpu #15470

Merged
