This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Improve stack operator performance by oneDNN #20621

Merged (4 commits) on Nov 22, 2021

bgawrych
Contributor

Description

Improves the performance of the stack operator by using oneDNN. The performance results show a significant speedup for axis=0 (up to about 7x faster).
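
For readers less familiar with the operator, the sketch below uses plain NumPy (not MXNet, and not the oneDNN code path touched by this PR) to show what stack computes. Treating stack as a concatenation of inputs that each gain a unit axis is a standard equivalence; whether the oneDNN implementation here takes that route is an assumption, not something stated in this description. The helper name `stack_via_concat` is hypothetical.

```python
import numpy as np

def stack_via_concat(arrays, axis=0):
    """Stack arrays by inserting a unit axis and concatenating along it."""
    expanded = [np.expand_dims(a, axis) for a in arrays]
    return np.concatenate(expanded, axis=axis)

a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = a + 10

# Check the equivalence against np.stack for every valid axis.
for axis in range(a.ndim + 1):
    out = stack_via_concat([a, b], axis=axis)
    assert np.array_equal(out, np.stack([a, b], axis=axis))
    print(axis, out.shape)
```

With axis=0 each input occupies one contiguous block of a C-ordered output, while higher axes interleave the inputs.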

Performance results collected on CLX8280 with KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0 OMP_NUM_THREADS=28 numactl --physcpubind=0-27 --membind=0:

Times are wall-clock seconds for 100 iterations, as printed by the script below the table.

shape axis master_time_s onednn_time_s
(128, 128) 0 0.007561 0.008217
(128, 128) 1 0.004158 0.00457
(128, 512) 0 0.014108 0.007263
(128, 512) 1 0.004416 0.005567
(128, 1024) 0 0.024753 0.009431
(128, 1024) 1 0.0046 0.004892
(128, 4096) 0 0.088938 0.025933
(128, 4096) 1 0.006305 0.006167
(512, 128) 0 0.012593 0.006721
(512, 128) 1 0.004545 0.00462
(512, 512) 0 0.043897 0.01301
(512, 512) 1 0.005042 0.005218
(512, 1024) 0 0.079853 0.016997
(512, 1024) 1 0.006117 0.006382
(512, 4096) 0 0.517834 0.097284
(512, 4096) 1 0.070154 0.038691
(1024, 128) 0 0.022151 0.008327
(1024, 128) 1 0.004755 0.004991
(1024, 512) 0 0.080592 0.017348
(1024, 512) 1 0.006391 0.006452
(1024, 1024) 0 0.205667 0.040287
(1024, 1024) 1 0.013286 0.013144
(1024, 4096) 0 1.159914 0.267409
(1024, 4096) 1 0.174798 0.153152
(4096, 128) 0 0.081543 0.017331
(4096, 128) 1 0.006936 0.006952
(4096, 512) 0 0.575121 0.079814
(4096, 512) 1 0.084379 0.040853
(4096, 1024) 0 1.244555 0.251577
(4096, 1024) 1 0.1782 0.154799
(4096, 4096) 0 5.169306 1.180926
(4096, 4096) 1 0.766602 0.740192
(32, 128, 128) 0 0.080957 0.017508
(32, 128, 128) 1 0.00692 0.006721
(32, 128, 128) 2 0.006921 0.006859
(32, 128, 512) 0 0.555404 0.081633
(32, 128, 512) 1 0.077143 0.037545
(32, 128, 512) 2 0.083525 0.041425
(32, 128, 1024) 0 1.225558 0.255515
(32, 128, 1024) 1 0.190202 0.154146
(32, 128, 1024) 2 0.177495 0.1549
(32, 128, 4096) 0 5.006225 1.090737
(32, 128, 4096) 1 0.831286 0.759118
(32, 128, 4096) 2 0.765793 0.742179
(32, 512, 128) 0 0.560635 0.090112
(32, 512, 128) 1 0.076585 0.042584
(32, 512, 128) 2 0.095465 0.04338
(32, 512, 512) 0 2.536246 0.541157
(32, 512, 512) 1 0.397728 0.341854
(32, 512, 512) 2 0.407399 0.35051
(32, 512, 1024) 0 5.034069 1.092211
(32, 512, 1024) 1 0.830025 0.760654
(32, 512, 1024) 2 0.772979 0.740602
(32, 512, 4096) 0 20.72267 4.655413
(32, 512, 4096) 1 3.503717 3.075174
(32, 512, 4096) 2 3.02452 3.002688
(32, 1024, 128) 0 1.196986 0.24314
(32, 1024, 128) 1 0.190801 0.15396
(32, 1024, 128) 2 0.220551 0.154176
(32, 1024, 512) 0 5.024947 1.09671
(32, 1024, 512) 1 0.828758 0.768377
(32, 1024, 512) 2 0.842505 0.748963
(32, 1024, 1024) 0 10.38974 2.242758
(32, 1024, 1024) 1 1.869875 1.547855
(32, 1024, 1024) 2 1.538614 1.496964
(32, 1024, 4096) 0 41.49604 9.043207
(32, 1024, 4096) 1 7.476183 6.120244
(32, 1024, 4096) 2 6.035282 6.005883
(32, 4096, 128) 0 5.003981 1.093519
(32, 4096, 128) 1 0.826404 0.769504
(32, 4096, 128) 2 0.926172 0.751195
(32, 4096, 512) 0 20.09485 4.335339
(32, 4096, 512) 1 3.502722 3.074266
(32, 4096, 512) 2 3.359437 3.006657
(32, 4096, 1024) 0 40.23293 8.423752
(32, 4096, 1024) 1 7.462769 6.156365
(32, 4096, 1024) 2 6.141823 6.012172
(32, 4096, 4096) 0 154.2752 35.87757
(32, 4096, 4096) 1 28.58106 24.12571
(32, 4096, 4096) 2 24.46488 23.94641
import time

import mxnet
import mxnet.gluon.nn as nn
import mxnet.numpy as np


class TestStack(nn.HybridBlock):
    """Thin wrapper that stacks its inputs along a fixed axis."""

    def __init__(self, axis=None):
        super(TestStack, self).__init__()
        self._axis = axis

    def forward(self, a, *args):
        return np.stack([a] + list(args), axis=self._axis)


dims = [128, 512, 1024, 4096]
print("shape;axis;time")
for ndim in range(2):
    for dim1 in dims:
        for dim2 in dims:
            # ndim == 0: 2-D inputs; ndim == 1: 3-D inputs with a leading dim of 32
            shape = (dim1, dim2) if ndim == 0 else (32, dim1, dim2)
            a = np.random.uniform(-1.0, 1.0, shape).astype(np.float32)
            b = np.random.uniform(-1.0, 1.0, shape).astype(np.float32)
            c = np.random.uniform(-1.0, 1.0, shape).astype(np.float32)
            d = np.random.uniform(-1.0, 1.0, shape).astype(np.float32)
            for axis in range(2 + ndim):
                # NOTE: the block is created and hybridized, but the timed
                # loop below calls np.stack directly rather than the block.
                stack = TestStack(axis)
                stack.hybridize()
                tic = time.time()
                for i in range(100):
                    out = np.stack([a, b, c, d], axis=axis)
                    out.wait_to_read()  # block until the async op has finished
                toc = time.time()
                # Total wall-clock seconds for the 100 iterations
                print(f"{shape};{axis};{toc-tic}")
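
To turn two runs of the script above into speedup figures like the one quoted in the description, the `shape;axis;time` lines can be parsed and divided. This is a minimal sketch, assuming each run's output was saved as text; the helper names `parse_run` and `speedups` are hypothetical, and the sample rows are copied from the results table above.

```python
def parse_run(text):
    """Parse 'shape;axis;time' lines into {(shape, axis): seconds}."""
    results = {}
    for line in text.strip().splitlines():
        shape, axis, seconds = line.split(";")
        if shape == "shape":  # skip the header row
            continue
        results[(shape, int(axis))] = float(seconds)
    return results

def speedups(baseline, candidate):
    """Return baseline_time / candidate_time for keys present in both runs."""
    return {k: baseline[k] / candidate[k] for k in baseline if k in candidate}

# Sample rows taken from the table: master first, then the oneDNN build.
master_out = """shape;axis;time
(4096, 512);0;0.575121
(4096, 512);1;0.084379
(128, 128);0;0.007561
"""
onednn_out = """shape;axis;time
(4096, 512);0;0.079814
(4096, 512);1;0.040853
(128, 128);0;0.008217
"""

ratio = speedups(parse_run(master_out), parse_run(onednn_out))
for (shape, axis), r in sorted(ratio.items()):
    print(f"{shape} axis={axis}: {r:.2f}x")
```

Because both columns are totals over the same 100 iterations, the ratio is the same as a per-call speedup. A value below 1 means the oneDNN build was slower for that case.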

@mxnet-bot

Hey @bgawrych, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-cpu, windows-gpu, sanity, windows-cpu, centos-gpu, unix-cpu, website, edge, clang, miscellaneous, unix-gpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.
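
As a small illustration of the command format described in the bot message, the sketch below builds a retrigger comment and checks job names against the supported list quoted above. The function name `ci_command` is hypothetical; it only formats a string and does not talk to GitHub or Jenkins.

```python
# Job names copied from the "CI supported jobs" list in the bot message.
SUPPORTED_JOBS = [
    "centos-cpu", "windows-gpu", "sanity", "windows-cpu", "centos-gpu",
    "unix-cpu", "website", "edge", "clang", "miscellaneous", "unix-gpu",
]

def ci_command(jobs=None):
    """Build an '@mxnet-bot run ci [...]' comment; None means all jobs."""
    if jobs is None:
        return "@mxnet-bot run ci [all]"
    unknown = [j for j in jobs if j not in SUPPORTED_JOBS]
    if unknown:
        raise ValueError(f"unsupported CI jobs: {unknown}")
    return f"@mxnet-bot run ci [{', '.join(jobs)}]"

print(ci_command(["centos-cpu", "unix-cpu"]))
```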

mseth10 added labels pr-awaiting-testing, pr-work-in-progress and removed labels pr-awaiting-testing, pr-work-in-progress on Sep 29, 2021
Contributor

@bartekkuncer bartekkuncer left a comment

Please change all the MKLDNN nomenclature to DNNL.

mseth10 added labels pr-awaiting-testing, pr-work-in-progress and removed labels pr-work-in-progress, pr-awaiting-testing on Sep 30, 2021
bgawrych requested a review from szha as a code owner on October 4, 2021 06:03
mseth10 added labels pr-awaiting-testing, pr-work-in-progress and removed labels pr-work-in-progress, pr-awaiting-testing on Oct 4, 2021
@bgawrych
Contributor Author

bgawrych commented Oct 5, 2021

@mxnet-bot run ci [centos-cpu, unix-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, centos-cpu]

mseth10 added labels pr-awaiting-testing, pr-work-in-progress and removed labels pr-work-in-progress, pr-awaiting-testing on Oct 5, 2021
@bgawrych
Contributor Author

bgawrych commented Oct 6, 2021

@mxnet-bot run ci [centos-cpu, unix-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, centos-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu]

mseth10 added labels pr-awaiting-testing, pr-work-in-progress and removed labels pr-work-in-progress, pr-awaiting-testing on Nov 10, 2021
mseth10 added labels pr-awaiting-testing, pr-work-in-progress and removed labels pr-work-in-progress, pr-awaiting-testing on Nov 10, 2021
@bgawrych
Contributor Author

@mxnet-bot run ci [centos-gpu, miscellaneous, unix-cpu, unix-gpu, website, windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [website, windows-gpu, centos-gpu, unix-gpu, unix-cpu, miscellaneous]

mseth10 added labels pr-awaiting-testing, pr-work-in-progress and removed labels pr-work-in-progress, pr-awaiting-testing on Nov 16, 2021
@bgawrych
Contributor Author

@mxnet-bot run ci [unix-cpu, windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu, unix-cpu]

mseth10 added labels pr-awaiting-testing, pr-awaiting-review and removed labels pr-work-in-progress, pr-awaiting-testing on Nov 16, 2021
@mozga-intel
Contributor

@szha Could you help with the merge? Thanks!

szha merged commit 1a8f6e6 into apache:master on Nov 22, 2021
Labels
pr-awaiting-review PR is waiting for code review

6 participants