[MXNET-1185] Support large array in several operators (part 1) #13418

apeforest · 2018-11-27T07:46:34Z

Description

This PR fixed the large array issue (#13036, #13070) in the following operators:
ndarray.ones
ndarray.zeros
ndarray.sum
ndarray.slice
ndarray.random.uniform
ndarray.empty

This PR is only the first effort that addresses some limitations in some basic operators. More tests and fix are coming as we identify more operators. This is part of a large project tracked in JIRA by https://issues.apache.org/jira/browse/MXNET-1184

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Nightly tests are added for complicated/long-running ones
Code is well-documented:
To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Support large array (> 3billion elements) in certain operators
Added nightly tests on arrays of 5 billion elements

Comments

This PR only supports the total number of elements in the array to go beyond 2 billion. The max shape size in any of the dimension is still limited to 2^31 because this PR does not update the C API. Due to the large change required in all language bindings, we decided to break the original PR (#13191) into several smaller parts. And this PR is part 1.

This PR only changes the indexing variable type from int32_t to int64_t, not the data type of each element. No performance impact observed. For future guidance of data types to choose, please refer to https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support

zheng-da · 2018-11-27T08:20:38Z

The PR looks good to me.

zheng-da · 2018-11-27T14:03:08Z

thanks for the great work. your code modifies a lot of code and the test doesn't cover all of them.
could you please add more unit tests to make sure the modified code is covered?
Here I tried a bunch of tests. Most of them work. Unfortunately, broadcast div fails. Could you please double check the problem in broadcast div?

import mxnet as mx
import numpy as np
long_dim = 50000000
arr = mx.nd.ones(shape=(long_dim, 200))

deg = mx.nd.ones(shape=(long_dim,1)) * 2
# Test broadcast div
res = arr/deg
assert np.sum(arr[-1].asnumpy() == 0.5) == arr.shape[1]

# Test element-wise
arr2 = mx.nd.ones(shape=(long_dim, 200))
res = arr + arr2
assert np.sum(res[-1].asnumpy() == 2) == arr.shape[1]
res = arr + 1
assert np.sum(res[-1].asnumpy() == 2) == arr.shape[1]
res = mx.nd.sqrt(arr + 3)
assert np.sum(res[-1].asnumpy() == 2) == arr.shape[1]

# Test reduce
assert mx.nd.sum(arr).asnumpy() == arr.shape[0] * arr.shape[1]

# Test dot
weight = mx.nd.ones(shape=(200, 100))
res = mx.nd.dot(arr, weight)
assert np.sum(res[-1].asnumpy() == 200) == weight.shape[1]

# Test FullyConnected
res = mx.nd.FullyConnected(arr, weight, num_hidden=weight.shape[1], no_bias=True)
assert np.sum(res[-1].asnumpy() == 200) == weight.shape[1]

# Test broadcast
range = mx.nd.arange(0, long_dim).reshape(long_dim, 1)
res = mx.nd.broadcast_to(range, shape=(range.shape[0], 200))
assert np.sum(res[-1].asnumpy() == long_dim) == res.shape[1]
res = mx.nd.broadcast_like(range, arr)
assert np.sum(res[-1].asnumpy() == long_dim) == arr.shape[1]

# Test clip
data = res
res = mx.nd.clip(data, a_min=100, a_max=1000)
assert np.sum(res[-1].asnumpy() == 1000) == arr.shape[1]


# Test take
idx = mx.nd.arange(long_dim-1000, long_dim)
res = mx.nd.take(arr, idx)
assert np.sum(res[-1].asnumpy() == 1) == res.shape[1]

# Test slice
res = mx.nd.slice(arr, begin=(long_dim-1000, 1), end=(long_dim, 100))
assert np.sum(res[-1].asnumpy() == 1) == res.shape[1]

# Test slice assign
res = arr.copy()
res[long_dim-1:long_dim] = 1000
assert np.sum(res[-1].asnumpy() == 1000) == arr.shape[1]

# Test expand_dims
res = mx.nd.expand_dims(arr, axis=1)
assert res.shape == (arr.shape[0], 1, arr.shape[1])

# Test squeeze
data = res
res = mx.nd.squeeze(data)
assert sum(res.shape == arr.shape) == 2

vandanavk · 2018-11-27T17:13:38Z

@mxnet-label-bot add [pr-awaiting-review, NDArray]

ChaiBapchya

This is pretty useful. However, going forward to ensure consistency, could you guide us as to
when to use
size_t, index_t and auto (instead of the usual int)
Thanks!

samskalicky

LGTM. How are the test coming? If you need help with the test creation, can you point out a group of things needing testing and i'll help out.

apeforest · 2018-11-28T01:09:03Z

@zheng-da I have fixed the bug in broadcast_div and added the tests you provided. Thanks for your review and testing!

tests/nightly/test_large_array.py

zheng-da · 2018-11-28T07:08:22Z

Could you fix this bug as well? thanks.

def run_linear():
    data = mx.nd.ones(shape=(50*1000*1000, 100))
    linear = gluon.nn.Dense(100)
    linear.initialize(ctx=mx.cpu(0))
    res = linear(data)
    res.wait_to_read()

apeforest · 2018-11-28T21:16:52Z

@zheng-da I don't see problem with this test. I have added it to the nightly test script. Please review.

apeforest · 2018-11-28T21:26:27Z

@ChaiBapchya Yes, I will document this in cwiki after this change is merged. In general, index_t should be used for indexing elements in the tensor. size_t is used for returning the size of object or total number of elements. auto is a keyword in C++11 which derives the data type based on the rhs.

I found the Google C++ style guide to be a very good one to follow in programming: https://google.github.io/styleguide/cppguide.html#Integer_Types

…y-part1

zheng-da · 2018-11-29T03:36:04Z

@apeforest i can't run the test for the dense layer in my C5.18x.

zheng-da · 2018-11-29T03:37:30Z

do you build your code with mkldnn or without mkldnn?

zheng-da · 2018-11-29T03:43:07Z

i can verify that the test code for dense works when mxnet isn't built with mkldnn.

…y-part1

anirudh2290 · 2018-11-29T07:40:41Z

@pengzhao-intel raised some good concerns on #13036 about performance. Also from @wkcn tests I see div op performance affected between int32 and int64. Have we done any micro-benchmarks for the operators affected ?

lebeg

👍

tests/nightly/test_large_array.py

…y-part1

apeforest · 2018-11-29T18:16:14Z

@anirudh2290 This PR only changes the indexing variable type from int32_t to int64_t, not the data type of each element. So the performance impact is different from what @wkcn test.

I also did a test in broadcast_div operator in MXNet and found no obvious runtime change (int64_t version actually is slightly faster)

import mxnet as mx
import numpy as np
from mxnet.test_utils import check_speed

a = mx.sym.ones(shape=(100000, 10000))
b = mx.sym.ones(shape=(100000, 1)) * 2
c = mx.sym.broadcast_div(a, b)

print(check_speed(c))

Master branch: 15.357867193222045 seconds
PR branch: 12.930877685546875 seconds

yuxihu

LGTM. You have some good information in the conversation which I suggest to add to the comments section of the PR. It will make things clearer and easier for reviewers to catch up in the first place. I think you should mention: (1) This PR only changes the indexing variable type from int32_t to int64_t, not the data type of each element. No performance impact observed. (2) Provide a cwiki link where you will add guidance like: "In general, index_t should be used for indexing elements in the tensor. size_t is used for returning the size of object or total number of elements. auto is a keyword in C++11 which derives the data type based on the rhs."

apeforest · 2018-11-29T19:23:40Z

@yuxihu Thanks for your nice suggestion. I have updated the PR comment section.

anirudh2290

Thanks for the clarification. Looks good overall. Can we also add a unit test for scatter_nd.

…y-part1

apeforest · 2018-11-30T00:03:38Z

@anirudh2290 Given the overhead of repeating CI, can we get this merged and add unit test for scatter_nd in an second PR? This PR is only part-1 of a larger PR anyways. Thanks.

pengzhao-intel · 2018-11-30T01:52:34Z

@pengzhao-intel raised some good concerns on #13036 about performance. Also from @wkcn tests I see div op performance affected between int32 and int64. Have we done any micro-benchmarks for the operators affected ?

@anirudh2290 Agree with you.
I see the changes make the INT64 by default in several OPs.
Is it a little aggresive? How about use template to switch the data type?

zheng-da · 2018-11-30T02:35:49Z

@pengzhao-intel I think the performance impact is minimal. Only the index type is changed. The operations on the index is basically add and comparison. All other computation is much more expensive. I double there is measurable performance difference.

pengzhao-intel · 2018-11-30T03:46:53Z

@zheng-da @apeforest I am fine with the current PR since it's only the first part of the changes.

I have re-visited the proposal in here and I suggest to add more details and data, such as how CUDA and MKL-DNN supports INT64 type and any potential impacts.

…y-part1

anirudh2290

Please add a TODO (JIRA) for the remaining test. Also, makes sense to bring this proposal up for discussion on the dev list.

apeforest · 2018-12-01T01:10:29Z

Thank god the CI tests finally passed!
@anirudh2290 A JIRA epic has already been created to track this project: https://issues.apache.org/jira/browse/MXNET-1184. I will bring this proposal to dev list for discusssion. Since this change should update APIs in all language bindings, help is needed from experts in different language support.

…ile (#13478) * updated to v1.5.0 * Bumped minor version from 1.4.0 to 1.5.0 on master * added Anirudh as maintainer for R package ... adding something useful and re-trigger PR check * Updated license file for clojure, onnx-tensorrt, gtest, R-package * Get the correct include path in pip package (#13452) * add find_include_path API * address reviewer comment * change return type from list to string * add unit test * address reviewer comment * address reviewer comment * address reviewer comment * address reviewer comment * fix include path problem in pip package * add comment * fix lint error * address reviewer comment * address reviewer comment * Use ~/.ccache as default ccache directory so is not cache is not erased on reboot (#13431) * Skip flaky test #13446 (#13480) * Rewrite dataloader with process pool, improves responsiveness and reliability (#13447) * fix recordio.py * rewrite dataloader with pool * fix batch as tuple * fix prefetching * fix pylint * picklable function * use pickle * add missing commit * Fix errors in docstrings for subgraph op; use code directive (#13463) * [MXNET-1158] JVM Memory Management Documentation (#13105) * update train_mnist * Add documentation for JVM Memory Management * update doc * address nit picks * address nit picks * Grammar and clarity edits for memory management doc * Edits for scala memory management * Update memory-management.md * Update memory-management.md * Update memory-management.md * capitalization fix * Update row_sparse tutorial (#13414) Update row_sparse tutorial * Add resiliency to onnx export code (#13426) * Added resiliency to onnx export code - With previous infer-shape implementation, if input shape was list instead of tuple or if extra non-existent parameters were provided, the code would still work. The fixes in this commit make sure that behavior is restored to prevent any compatibility issues with existing export code. * Fixed name of net in unittest * Fix pylint * [MXNET-1185] Support large array in several operators (part 1) (#13418) * fix a few operators with large arrays (# of elements) * fix bug in broadcast_div and add tests * address reviewer comment * add unit test * add empty line * retrigger CI * [MXNET-1210 ] Gluon Audio - Example (#13325) * Initialized the example * Addressed PR comments, about existing synset.txt file - no overwrite * RST - docstring issues fixed * added README * Addressed PR comments * Addressed PR comments, checking Divide by 0 * Raising error if format is not supported. * changed a line for ndarray of labels * Trigger CI * Trigger CI * PR comments addressed around skip_header argument * Addressed PR comments around librosa import * PR Comments * Passing lazy=lazy from argument * Added PR comments, labels to README.MD * Trigger CI * Addressing PR Comments in README * Modified README.md * Added example under audio folder * Retrigger CI * Retrigger CI * ONNX export: Instance normalization, Shape (#12920) * ONNX import/export: Make backend_rep common * ONNX export: Instance Normalization * ONNX export: Shape operator * Clarify dependency on OpenCV in CNN Visualization tutorial. (#13495) * clarify ops faq regarding docs strings (#13492) * Add graph_compact operator. (#13436) * add graph_compact. * fix. * add doc. * add tests for graph_compact. * address comments. * update docs. * trigger CI * Deprecate Jenkinsfile (#13474) * update github location for sampled_block.py (#13508) Updated to https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/sampled_block.py * #13453 [Clojure] - Add Spec Validations to the Optimizer namespace (#13499) * ONNX export: Logical operators (#12852) * Fix cmake options parsing in dev_menu (#13458) Add GPU+MKLDNN unittests to dev_menu * Revert "Manually track num_max_thread (#12380)" (#13501) This reverts commit 7541021. * Feature/mkldnn static 2 (#13503) * build mkldnn as static lib * update makefile to statically build mkldnn * build static mkldnn * fix static name * fix static name * update static for mac * rename mkldnn dep in ci * remove moving mkldnn dynamic lib * remove commented code * remove mkldnn dnaymic for unitest * force static for mkldnn lib * remove dynamic mkldnn bind * only link windows * add mkldnn.mk * try force linking * remove mkldnn dynanmic check * remove test mkldnn install * fix spacing * fix index * add artifacts * add comment about windows * remove static * update makefile * fix toctree Sphinx errors (#13489) * fix toctree errors * nudging file for CI * Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (#13527) * [MXNET-1234] Fix shape inference problems in Activation backward (#13409) * Provide a failing test for ReLU activation shape inference bug * Fix Activation backward shape inference fixes: #13333 * Add softsign Activation to test_gluon.py * Use activation in GPU if we are using CUDNN and not MKLDNN as it's happening right now * Don't disable MKLDNN

…e#13418) * fix a few operators with large arrays (# of elements) * fix bug in broadcast_div and add tests * address reviewer comment * add unit test * add empty line * retrigger CI

…ile (apache#13478) * updated to v1.5.0 * Bumped minor version from 1.4.0 to 1.5.0 on master * added Anirudh as maintainer for R package ... adding something useful and re-trigger PR check * Updated license file for clojure, onnx-tensorrt, gtest, R-package * Get the correct include path in pip package (apache#13452) * add find_include_path API * address reviewer comment * change return type from list to string * add unit test * address reviewer comment * address reviewer comment * address reviewer comment * address reviewer comment * fix include path problem in pip package * add comment * fix lint error * address reviewer comment * address reviewer comment * Use ~/.ccache as default ccache directory so is not cache is not erased on reboot (apache#13431) * Skip flaky test apache#13446 (apache#13480) * Rewrite dataloader with process pool, improves responsiveness and reliability (apache#13447) * fix recordio.py * rewrite dataloader with pool * fix batch as tuple * fix prefetching * fix pylint * picklable function * use pickle * add missing commit * Fix errors in docstrings for subgraph op; use code directive (apache#13463) * [MXNET-1158] JVM Memory Management Documentation (apache#13105) * update train_mnist * Add documentation for JVM Memory Management * update doc * address nit picks * address nit picks * Grammar and clarity edits for memory management doc * Edits for scala memory management * Update memory-management.md * Update memory-management.md * Update memory-management.md * capitalization fix * Update row_sparse tutorial (apache#13414) Update row_sparse tutorial * Add resiliency to onnx export code (apache#13426) * Added resiliency to onnx export code - With previous infer-shape implementation, if input shape was list instead of tuple or if extra non-existent parameters were provided, the code would still work. The fixes in this commit make sure that behavior is restored to prevent any compatibility issues with existing export code. * Fixed name of net in unittest * Fix pylint * [MXNET-1185] Support large array in several operators (part 1) (apache#13418) * fix a few operators with large arrays (# of elements) * fix bug in broadcast_div and add tests * address reviewer comment * add unit test * add empty line * retrigger CI * [MXNET-1210 ] Gluon Audio - Example (apache#13325) * Initialized the example * Addressed PR comments, about existing synset.txt file - no overwrite * RST - docstring issues fixed * added README * Addressed PR comments * Addressed PR comments, checking Divide by 0 * Raising error if format is not supported. * changed a line for ndarray of labels * Trigger CI * Trigger CI * PR comments addressed around skip_header argument * Addressed PR comments around librosa import * PR Comments * Passing lazy=lazy from argument * Added PR comments, labels to README.MD * Trigger CI * Addressing PR Comments in README * Modified README.md * Added example under audio folder * Retrigger CI * Retrigger CI * ONNX export: Instance normalization, Shape (apache#12920) * ONNX import/export: Make backend_rep common * ONNX export: Instance Normalization * ONNX export: Shape operator * Clarify dependency on OpenCV in CNN Visualization tutorial. (apache#13495) * clarify ops faq regarding docs strings (apache#13492) * Add graph_compact operator. (apache#13436) * add graph_compact. * fix. * add doc. * add tests for graph_compact. * address comments. * update docs. * trigger CI * Deprecate Jenkinsfile (apache#13474) * update github location for sampled_block.py (apache#13508) Updated to https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/sampled_block.py * apache#13453 [Clojure] - Add Spec Validations to the Optimizer namespace (apache#13499) * ONNX export: Logical operators (apache#12852) * Fix cmake options parsing in dev_menu (apache#13458) Add GPU+MKLDNN unittests to dev_menu * Revert "Manually track num_max_thread (apache#12380)" (apache#13501) This reverts commit 7541021. * Feature/mkldnn static 2 (apache#13503) * build mkldnn as static lib * update makefile to statically build mkldnn * build static mkldnn * fix static name * fix static name * update static for mac * rename mkldnn dep in ci * remove moving mkldnn dynamic lib * remove commented code * remove mkldnn dnaymic for unitest * force static for mkldnn lib * remove dynamic mkldnn bind * only link windows * add mkldnn.mk * try force linking * remove mkldnn dynanmic check * remove test mkldnn install * fix spacing * fix index * add artifacts * add comment about windows * remove static * update makefile * fix toctree Sphinx errors (apache#13489) * fix toctree errors * nudging file for CI * Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (apache#13527) * [MXNET-1234] Fix shape inference problems in Activation backward (apache#13409) * Provide a failing test for ReLU activation shape inference bug * Fix Activation backward shape inference fixes: apache#13333 * Add softsign Activation to test_gluon.py * Use activation in GPU if we are using CUDNN and not MKLDNN as it's happening right now * Don't disable MKLDNN

fix a few operators with large arrays (# of elements)

d2132ea

apeforest requested a review from anirudh2290 as a code owner November 27, 2018 07:46

apeforest mentioned this pull request Nov 27, 2018

[MXNET-1185] [WIP] Support large array in several operators #13191

Closed

7 tasks

marcoabreu added NDArray pr-awaiting-review PR is waiting for code review labels Nov 27, 2018

ChaiBapchya approved these changes Nov 27, 2018

View reviewed changes

samskalicky approved these changes Nov 28, 2018

View reviewed changes

fix bug in broadcast_div and add tests

1077428

zheng-da reviewed Nov 28, 2018

View reviewed changes

tests/nightly/test_large_array.py Outdated Show resolved Hide resolved

tests/nightly/test_large_array.py Outdated Show resolved Hide resolved

address reviewer comment

cbcee23

add unit test

48a8b60

Merge remote-tracking branch 'upstream/master' into bugfix/large-arra…

3099d35

…y-part1

zheng-da approved these changes Nov 29, 2018

View reviewed changes

apeforest added 2 commits November 28, 2018 21:42

Merge remote-tracking branch 'upstream/master' into bugfix/large-arra…

1910d89

…y-part1

Merge remote-tracking branch 'upstream/master' into bugfix/large-arra…

c8503cd

…y-part1

lebeg approved these changes Nov 29, 2018

View reviewed changes

tests/nightly/test_large_array.py Outdated Show resolved Hide resolved

apeforest added 2 commits November 29, 2018 09:42

add empty line

a2b70d0

Merge remote-tracking branch 'upstream/master' into bugfix/large-arra…

ad2ee47

…y-part1

yuxihu approved these changes Nov 29, 2018

View reviewed changes

anirudh2290 reviewed Nov 29, 2018

View reviewed changes

apeforest added 2 commits November 29, 2018 13:22

Merge remote-tracking branch 'upstream/master' into bugfix/large-arra…

8a53c86

…y-part1

retrigger CI

e5f78c0

apeforest added 2 commits November 29, 2018 21:19

Merge remote-tracking branch 'upstream/master' into bugfix/large-arra…

45a754f

…y-part1

Merge remote-tracking branch 'upstream/master' into bugfix/large-arra…

b354f04

…y-part1

anirudh2290 approved these changes Nov 30, 2018

View reviewed changes

anirudh2290 merged commit 0d480fb into apache:master Dec 1, 2018

This was referenced Feb 5, 2019

[MXNET-891] Support tuple of scales in upsample operator #14042

Closed

Fix nd.pick large array issue #14082

Merged

leezu mentioned this pull request Oct 21, 2019

[Discussion] 1.6.0 Roadmap #15589

Closed

apeforest deleted the bugfix/large-array-part1 branch February 25, 2020 18:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MXNET-1185] Support large array in several operators (part 1) #13418

[MXNET-1185] Support large array in several operators (part 1) #13418

apeforest commented Nov 27, 2018 •

edited

Loading

zheng-da commented Nov 27, 2018

zheng-da commented Nov 27, 2018

vandanavk commented Nov 27, 2018

ChaiBapchya left a comment

samskalicky left a comment

apeforest commented Nov 28, 2018

zheng-da commented Nov 28, 2018

apeforest commented Nov 28, 2018

apeforest commented Nov 28, 2018

zheng-da commented Nov 29, 2018

zheng-da commented Nov 29, 2018

zheng-da commented Nov 29, 2018

anirudh2290 commented Nov 29, 2018

lebeg left a comment

apeforest commented Nov 29, 2018 •

edited

Loading

yuxihu left a comment

apeforest commented Nov 29, 2018

anirudh2290 left a comment

apeforest commented Nov 30, 2018

pengzhao-intel commented Nov 30, 2018

zheng-da commented Nov 30, 2018

pengzhao-intel commented Nov 30, 2018

anirudh2290 left a comment •

edited

Loading

apeforest commented Dec 1, 2018

[MXNET-1185] Support large array in several operators (part 1) #13418

[MXNET-1185] Support large array in several operators (part 1) #13418

Conversation

apeforest commented Nov 27, 2018 • edited Loading

Description

Checklist

Essentials

Changes

Comments

zheng-da commented Nov 27, 2018

zheng-da commented Nov 27, 2018

vandanavk commented Nov 27, 2018

ChaiBapchya left a comment

Choose a reason for hiding this comment

samskalicky left a comment

Choose a reason for hiding this comment

apeforest commented Nov 28, 2018

zheng-da commented Nov 28, 2018

apeforest commented Nov 28, 2018

apeforest commented Nov 28, 2018

zheng-da commented Nov 29, 2018

zheng-da commented Nov 29, 2018

zheng-da commented Nov 29, 2018

anirudh2290 commented Nov 29, 2018

lebeg left a comment

Choose a reason for hiding this comment

apeforest commented Nov 29, 2018 • edited Loading

yuxihu left a comment

Choose a reason for hiding this comment

apeforest commented Nov 29, 2018

anirudh2290 left a comment

Choose a reason for hiding this comment

apeforest commented Nov 30, 2018

pengzhao-intel commented Nov 30, 2018

zheng-da commented Nov 30, 2018

pengzhao-intel commented Nov 30, 2018

anirudh2290 left a comment • edited Loading

Choose a reason for hiding this comment

apeforest commented Dec 1, 2018

apeforest commented Nov 27, 2018 •

edited

Loading

apeforest commented Nov 29, 2018 •

edited

Loading

anirudh2290 left a comment •

edited

Loading