[Relay] Fix index order in conv2d computation for Arm CPU. #8361

AnastasiaStulova · 2021-06-28T16:52:45Z

When dilation is larger than value 1 in conv2d with NHWC
layout, the ordering of indexes when accessing data array
in computation of convolution appears to be incorrect.

'data_vec' is defined as

lambda n, oho, owo, kh, kw, ic, ohi, owi:

but accessed as

data_vec[n, oho, owo, kh, kw, ohi, owi, ic]

This patch fixes the order of indexes and modifies the test
so that it is suitable for running on an AArch64 CPU.

AnastasiaStulova · 2021-06-28T16:53:02Z

@giuseros

giuseros

Nice catch @AnastasiaStulova ! So yes, I agree on the testing part. If you could add a separate PR that enables arm_cpu tests, that would be very helpful

giuseros · 2021-06-29T14:08:23Z

@mbaret @leandron could you double check this and possibly merge?

mbaret · 2021-06-29T14:53:08Z

This patch looks good to me, ideally we should have test cases for this though. I'm not up to speed on the status of Arm CI so am unsure how immediately feasible this is, perhaps @u99127 or @leandron know more.

u99127 · 2021-06-30T09:08:24Z

This patch looks good to me, ideally we should have test cases for this though. I'm not up to speed on the status of Arm CI so am unsure how immediately feasible this is, perhaps @u99127 or @leandron know more.

do the existing topi tests catch the failure on AArch64 ?

AnastasiaStulova · 2021-06-30T12:36:08Z

This patch looks good to me, ideally we should have test cases for this though. I'm not up to speed on the status of Arm CI so am unsure how immediately feasible this is, perhaps @u99127 or @leandron know more.

do the existing topi tests catch the failure on AArch64 ?

@u99127 The test indeed caught the bug and it already what we need for arm_cpu testing
https://github.com/apache/tvm/blob/main/tests/python/topi/python/test_topi_conv2d_nhwc.py#L33
but it is not enabled by default in the list of tested devices:
https://github.com/apache/tvm/blob/main/tests/python/topi/python/test_topi_conv2d_nhwc.py#L78
It is very easy to add it to the list "llvm -device=arm_cpu" then it runs straight away.

AnastasiaStulova · 2021-06-30T12:40:28Z

FYI there is also https://github.com/apache/tvm/blob/main/tests/python/topi/python/test_topi_conv2d_nchw.py for testing NCHW layout that already runs on all available targets. So if we run it where Arm CPU is enabled it will test it automatically.

The only problem could be that it is a bit long-running as it has more than 350 test cases but we could think of creating a fast running mode too.

u99127 · 2021-07-01T22:20:37Z

FYI there is also https://github.com/apache/tvm/blob/main/tests/python/topi/python/test_topi_conv2d_nchw.py for testing NCHW layout that already runs on all available targets. So if we run it where Arm CPU is enabled it will test it automatically.

The only problem could be that it is a bit long-running as it has more than 350 test cases but we could think of creating a fast running mode too.

Got it - yes I could just reproduce the failure with that test but as you say I did have to add 'llvm -device=arm_cpu' to the devices list as you said to see the AssertionError failure, so may be the solution here is that we add 'llvm -device=arm_cpu' to the test here or actually fix the test to only use 'llvm -device=arm_cpu' when running on ci_arm rather than having it run twice on the GPU CI which is where TOPI testing is turned on.

I suspect the following things need to happen here to progress this with the tests.

We run the topi tests in Jenkins using the Task script and just turn that on for the CI on AArch64. That is a one liner change and hopefully it works. However while pipe cleaning that on my local machine , I've hit a few testisms in the topi tests that need to be cleaned up. Especially with some tests being suitable only for CUDA but not being marked as so. I can help with this. That's pretty easy and straightforward but will need buy in from the project that running these tests in AArch64 CI is a good thing ! I think this bug just shows the value of running these tests in CI. I would prefer to start with the entirety of the TOPI tests, it appears to take me about 30-40 minutes on a machine that has similar characteristics to the one in CI.
I agree with you that we should look to add to this PR an additional change to the test script you mention by adding 'llvm -device=arm_cpu' to the list of targets only if one is running the test on AArch64 hardware, otherwise when we come around to doing 1 above we'd be running this test once for the llvm target and once for an llvm -device=arm_cpu and just duplicating testing for a path that is unneeded . For bonus points we should audit the topi test directory and clean it up as we go along because I note that many tests again run for both llvm and llvm -device=arm_cpu which probably just uses more runtime during CI when it is not needed.

regards
Ramana

When dilation is larger than value 1 in conv2d with NHWC layout, the ordering of indexes when accessing data array in computation of convolution appears to be incorrect. 'data_vec' is defined as lambda n, oho, owo, kh, kw, ic, ohi, owi: But accessed as data_vec[n, oho, owo, kh, kw, ohi, owi, ic] This patch fixes the order of indexes and modifies the test so that it is suitable for running on an AArch64 CPU.

This pull request cleans up the TOPI testuite for use on the AArch64 CI target by doing the following: - Introducing a script to run the tests on AArch64 with a suitable invocation of the llvm target string by setting the TVM_TEST_TARGETS environment variable. - Cleaning up the use of hard coded targets and moving the testsuite to testing more sensibly with the use of tvm.testing.enabled_targets. - Cleanup the use of tvm.target.create. - The above allows for the use of tests reasonably with the topi tests and cleans up what is needed from the testsuite. Putting this up for a test run on ci_gpu and ci_cpu to see the effects of moving TOPI test runs to AArch64 CPU before firing up the Jenkins changes. The motivation was from apache#8361 to pipeclean and add this support.

This pull request cleans up the TOPI testuite for use on the AArch64 CI target by doing the following: - Introducing a script to run the tests on AArch64 with a suitable invocation of the llvm target string by setting the TVM_TEST_TARGETS environment variable. - Cleaning up the use of hard coded targets and moving the testsuite to testing more sensibly with the use of tvm.testing.enabled_targets. - Cleanup the use of tvm.target.create vs tvm.target.Target - The above allows for the use of tests reasonably with the topi tests and cleans up what is needed from the testsuite. Putting this up for a test run on ci_gpu and ci_cpu to see the effects of moving TOPI test runs to AArch64 CPU before firing up the Jenkins changes. The motivation was from apache#8361 to pipeclean and add this support.

AnastasiaStulova · 2021-07-06T12:42:15Z

This PR now allows enabling testing of this functionality in CI for Aarch64 after the following #8409 gets merged.

jcf94

Thanks! LGTM.

Does this PR has any dependency with #8409 ? I think it's ready for merge? @AnastasiaStulova

u99127 · 2021-07-08T14:04:52Z

I think this can be merged before #8409 as it would make my life easier to know that upstream tests are as clan as possible.

jcf94 · 2021-07-08T14:06:18Z

Thanks! @u99127 @AnastasiaStulova @giuseros

This pull request cleans up the TOPI testuite for use on the AArch64 CI target by doing the following: - Introducing a script to run the tests on AArch64 with a suitable invocation of the llvm target string by setting the TVM_TEST_TARGETS environment variable. - Cleaning up the use of hard coded targets and moving the testsuite to testing more sensibly with the use of tvm.testing.enabled_targets. - Cleanup the use of tvm.target.create. - The above allows for the use of tests reasonably with the topi tests and cleans up what is needed from the testsuite. Putting this up for a test run on ci_gpu and ci_cpu to see the effects of moving TOPI test runs to AArch64 CPU before firing up the Jenkins changes. The motivation was from apache#8361 to pipeclean and add this support.

When dilation is larger than value 1 in conv2d with NHWC layout, the ordering of indexes when accessing data array in computation of convolution appears to be incorrect. 'data_vec' is defined as lambda n, oho, owo, kh, kw, ic, ohi, owi: But accessed as data_vec[n, oho, owo, kh, kw, ohi, owi, ic] This patch fixes the order of indexes and modifies the test so that it is suitable for running on an AArch64 CPU.

giuseros approved these changes Jun 29, 2021

View reviewed changes

AnastasiaStulova force-pushed the armcpu_conv2d_nhwc branch from b5e3063 to dc557ae Compare July 6, 2021 11:12

u99127 mentioned this pull request Jul 6, 2021

Prepare the topi tests for AArch64 CI. #8409

Closed

AnastasiaStulova force-pushed the armcpu_conv2d_nhwc branch from dc557ae to 125cb5e Compare July 6, 2021 14:02

AnastasiaStulova mentioned this pull request Jul 8, 2021

[Relay] Add support of conv2d with NHWC for Mali #8422

Merged

jcf94 approved these changes Jul 8, 2021

View reviewed changes

jcf94 merged commit 0b39af7 into apache:main Jul 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Relay] Fix index order in conv2d computation for Arm CPU. #8361

[Relay] Fix index order in conv2d computation for Arm CPU. #8361

AnastasiaStulova commented Jun 28, 2021 •

edited

Loading

AnastasiaStulova commented Jun 28, 2021

giuseros left a comment

giuseros commented Jun 29, 2021

mbaret commented Jun 29, 2021

u99127 commented Jun 30, 2021

AnastasiaStulova commented Jun 30, 2021

AnastasiaStulova commented Jun 30, 2021

u99127 commented Jul 1, 2021

AnastasiaStulova commented Jul 6, 2021

jcf94 left a comment •

edited

Loading

u99127 commented Jul 8, 2021

jcf94 commented Jul 8, 2021 •

edited

Loading

[Relay] Fix index order in conv2d computation for Arm CPU. #8361

[Relay] Fix index order in conv2d computation for Arm CPU. #8361

Conversation

AnastasiaStulova commented Jun 28, 2021 • edited Loading

AnastasiaStulova commented Jun 28, 2021

giuseros left a comment

Choose a reason for hiding this comment

giuseros commented Jun 29, 2021

mbaret commented Jun 29, 2021

u99127 commented Jun 30, 2021

AnastasiaStulova commented Jun 30, 2021

AnastasiaStulova commented Jun 30, 2021

u99127 commented Jul 1, 2021

AnastasiaStulova commented Jul 6, 2021

jcf94 left a comment • edited Loading

Choose a reason for hiding this comment

u99127 commented Jul 8, 2021

jcf94 commented Jul 8, 2021 • edited Loading

AnastasiaStulova commented Jun 28, 2021 •

edited

Loading

jcf94 left a comment •

edited

Loading

jcf94 commented Jul 8, 2021 •

edited

Loading