[numpy] Support zero-dim and zero-size tensors in MXNet #14661
Conversation
Great work! Thank you!
I reviewed all functions in the *.py files; they look good to me.
@hetong007 @yzhliu @sergeykolychev Could you please help review the changes in R, Scala, and Perl, respectively? We would like to merge this PR sooner rather than later, since our subsequent work depends on it and it's becoming harder to rebase.
@jeremiedb @lanking520 Could you help out with reviews on this too? Thanks!
LGTM
Please add some comments on the changes for the Scala part.
@hetong007 Were you seeing a need for adding/integrating a …
Reviewed the changes to the MKL-DNN files. @juliusshufan you might want to run a performance validation on this PR.
@jeremiedb Would you elaborate more on this need? I think the current R API maps nd.array to a native R array, so this PR has minimal effect on R-related uses.
[numpy] Shape support scalar tensor (apache#14315): * Support scalar and zero-size tensors with np.sum * Add sanity check when ndim is set
[Numpy] Change semantics of ndim for operators in `src/operator/contrib` (apache#14409): * Initial commit * Address comments
[WIP] Use new shape definition (apache#14453): * Init checkin * Fix ndarray alloc bug * Use TShape(0) as default empty tuple params * Fix bugs * Fix TShape init value * Fix infer shape pass shape type and reshape infer shape func
[numpy] Fix unit tests after introducing numpy compatible shapes (apache#14487): * Fix infer shape rnn * Fix boolean mask and custom op unit tests * Fix multi proposal * Fix diag * Add global switch for backward compatibility and fix infer shape bugs * Fix slice op infer shape * Fix rnn infer shape * Add util funcs for ndim_is_known and dim_size_is_known * Revert rnn_cell.py
Fix a bug to pass the test in test_contrib_rnn (apache#14520): * fix. * remove type conversion. * remove type cast.
[numpy] Fix test_dynamic_shape.test_dynamic_shape (apache#14538): * Initial commit * Address comments from Jun
[numpy] Fix numpy import in python2 (apache#14537): * Fix several test failures * Fix subgraph op infer shape * Fix sparse slice * Fix deconv infer shape * Fix numpy import compatibility problem in python2
Thanks for the decorator! I think customers may not understand the internal workings of custom op and its dispatch-to-a-different-thread workflow, and may run into this pitfall. Can we synchronize the state of `is_numpy_compat` when pushing the op to the custom op thread? Custom op is one area where customers use numpy operations, and I feel there may be users running into this.
@anirudh2290 That's a good point from the users' perspective. But this may not be trivial to implement by propagating the …
@anirudh2290 Good points on custom op. Given the size of this PR and the fact that users need to opt in to be affected by this change, I'd recommend we address these points in a follow-up PR, so that @reminisce won't have to keep resolving conflicts full-time :)
Python, C API, and operator changes LGTM.
I am okay with making the custom op changes later. Just to be sure: the `is_numpy_compat` thread-local variable is set in the main thread and is expected to be accessed only from the main thread? For example, this won't cause any issues with the multi-threaded inference interface provided by frontends like Scala?
LGTM
@anirudh2290 Thanks for the review. I'm not familiar with how Scala multi-threaded inference is implemented, but as long as `np_compat` is set in the same thread as the one invoking the shape inference functions, the result should be as expected.
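To make the pitfall concrete, here is a minimal, self-contained Python sketch (not code from this PR; `set_np_compat`/`is_np_compat` below are illustrative stand-ins) showing why a thread-local flag set in the main thread is invisible to a worker thread unless it is explicitly propagated:

```python
import threading

_state = threading.local()  # each thread gets its own copy of this storage

def set_np_compat(active):  # stand-in for the flag discussed above
    _state.active = active

def is_np_compat():
    return getattr(_state, "active", False)

set_np_compat(True)  # set in the main thread

def worker():
    # The worker thread sees its own (empty) thread-local storage,
    # so the flag set in the main thread is not visible here.
    print(is_np_compat())  # prints False

t = threading.Thread(target=worker)
t.start()
t.join()
```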
Verified performance and accuracy for several CNN models with the MKL-DNN backend. They look good to me.
* [numpy] Shape support scalar tensor (apache#14315)
  * Support scalar and zero-size tensors with np.sum
  * Add sanity check when ndim is set
* [Numpy] Change semantics of ndim for operators in `src/operator/contrib` (apache#14409)
  * Initial commit
  * Address comments
* [WIP] Use new shape definition (apache#14453)
  * Init checkin
  * Fix ndarray alloc bug
  * Use TShape(0) as default empty tuple params
  * Fix bugs
  * Fix TShape init value
  * Fix infer shape pass shape type and reshape infer shape func
* [numpy] Fix unit tests after introducing numpy compatible shapes (apache#14487)
  * Fix infer shape rnn
  * Fix boolean mask and custom op unit tests
  * Fix multi proposal
  * Fix diag
  * Add global switch for backward compatibility and fix infer shape bugs
  * Fix slice op infer shape
  * Fix rnn infer shape
  * Add util funcs for ndim_is_known and dim_size_is_known
  * Revert rnn_cell.py
* Fix a bug to pass the test in test_contrib_rnn (apache#14520)
  * fix.
  * remove type conversion.
  * remove type cast.
* [numpy] Fix test_dynamic_shape.test_dynamic_shape (apache#14538)
  * Initial commit
  * Address comments from Jun
* [numpy] Fix numpy import in python2 (apache#14537)
  * Fix several test failures
  * Fix subgraph op infer shape
  * Fix sparse slice
  * Fix deconv infer shape
  * Fix numpy import compatibility problem in python2
* fix concat and slice (apache#14549)
* fix R-package (apache#14536)
* Fix cpp package build after using new shape definition (apache#14554)
* Fix pooling_v1 and deformable_convolution param initialization (apache#14577)
  * Fix pooling_v1 param initialization
  * Fix deformable_convolution param initialization
* [Numpy] Misc fix (apache#14612)
  * [Numpy] Misc Fix
  * fix build
  * !shape_is_none => shape_is_known
  * Address comments
  * Fix
* [Numpy] fix test_operator_gpu.test_upsampling_bilinear_with_type (apache#14557)
  * Fix test_operator_gpu.test_upsampling_bilinear_with_type
  * Address comments
* [Numpy] Java/Scala modification (apache#14625)
  * modify jni to support 0 dim/shape
  * fix transpose axes default value
* fix shape index bug (apache#14630)
* fix jni lint (apache#14634)
* [numpy] Fix numpy branch failing tests in CI (apache#14639)
  * Remove numpy namespaces for operator registration
  * Fix bug when shape is completely unknown
  * Fix signed/unsigned compare warning
  * Fix CI
  * Fix pylint
  * Avoid launching gpu kernels for zero-size output tensors
  * Fix test_ndarray
  * Fix binary broadcast with zero-size tensors
  * Better error message for infer shape failure in imperative
  * Fix TShape constructor ambiguity on certain platforms
  * Fix mkldnn build failure
  * Fix build failure in gpu and cpp test
  * Fix gpu cpp test build with mkldnn
  * Fix mkldnn cpp test
  * Fix concatenating zero-size tensors
  * Avoid letting mkldnn handle zero-size tensors in concat
  * Fix quantized_concat infer shape
  * Try to fix perl c api
* fix invalid ndarray dispose (apache#14657)
* swig fixes for the changes in c_api.h (apache#14655)
* Rename np_comp to np_compat for readability
* Fix import error
* Keep old c apis unchanged
* Fix lint
* Rebase and fix build
* Fix R build failure
* Fix Perl build failure
* Rebase with master
* Address cr comments
* Use just one scope to represent numpy compatibility
* Add code comment to NumpyScope object in Scala
* Add use_np_compat decorator
* Fix pylint
Changes the implementation of index_array to be compatible with the recently merged support for zero-dim and zero-size arrays. Resolves the incompatibilities with apache#14661.
* Implement the index_array operator
* Add index_array operator tests
* Add index_array operator GPU tests
* Add the index_array operator to the Python docs autosummary
* Add the author of the index_array operator to CONTRIBUTORS.md
* Make index_array compatible with zero-dim and zero-size arrays
  Changes the implementation of index_array to be compatible with the recently merged support for zero-dim and zero-size arrays. Resolves the incompatibilities with #14661.
* Fix the index_array gradient checks in the unit tests
  In the previous implementation, the output gradient had an incorrect shape. This commit fixes the shapes and makes the tests more readable.
* Add zero-dim and zero-size array tests for index_array
* Use mxnet::Tuple<int> instead of TShape for the axes parameter
* Fix incorrect array indexing in index_array
  Solves access violations when compiling with MSVC++ 14.0.
* Avoid copying the input shape array in the index_array shape function
* Add unknown shape handling to index_array
* Use SHAPE_ASSIGN_CHECK to assign the shape in index_array
* Remove the redundant index_array GPU tests from test_operator_gpu.py
* Move the index_array tests into a single function (test_index_array)
* Use @mx.use_np_compat instead of mx.np_compat in index_array op tests
* Remove the use of template specialization for IndexArrayForward
* Add the index_array operator to the AMP symbol list
* Retrigger CI
}
ReverseReshapeInferShape(&dshape, oshape);
#if 0
@reminisce is this `#if 0` intentional? Will it be removed eventually?
Description
This PR provides the infrastructure for supporting zero-dim and zero-size tensors, the first outcome of the initiative to introduce a NumPy-compatible coding experience into MXNet (see this RFC). Great thanks to the many folks who have contributed design, implementation, and code review to this feature; the credit goes to all of them. Sorry if I missed anyone below. It would have been impossible to get so many things right, scattered all over the codebase, with just a couple of hands.
FAQ
What are zero-dim and zero-size tensors?
A zero-dim tensor is a tensor with shape equal to `()`, and a zero-size tensor is a tensor whose size is equal to `0`. For example, `np.ones((1, 0, 3))` would generate a zero-size tensor with shape equal to `(1, 0, 3)`.
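A quick illustration in plain NumPy, whose semantics these tensors follow:

```python
import numpy as np

a = np.array(1.0)        # zero-dim (scalar) tensor
print(a.shape, a.ndim)   # () 0

b = np.ones((1, 0, 3))   # zero-size tensor: one dimension has size 0
print(b.shape, b.size)   # (1, 0, 3) 0
```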
Why are they important?
Mathematically speaking, their presence keeps the completeness and consistency of all the tensor operations. For example, given `x = mx.nd.array([1, 2, 3])`, `x[0]` should return `1` instead of `[1]`, and `x[0:0]` should return `[]` with shape equal to `(0,)`. Zero-size tensors can also be convenient in easing code logic, serving as placeholders for accumulations or aggregations. I find this thread provides very good insights on the importance of zero-dim tensors.
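The corresponding NumPy behavior, for reference:

```python
import numpy as np

x = np.array([1, 2, 3])
print(x[0], x[0].shape)      # 1 ()   -- a zero-dim scalar, not the array [1]
print(x[0:0], x[0:0].shape)  # [] (0,) -- an empty (zero-size) slice
```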
How are they implemented?
In the backend (C++), we use `ndim = -1` to represent unknown shapes and `dim_size = -1` for unknown dim sizes. Before, they were represented by `0`.
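A minimal Python sketch of this convention (illustration only, not the actual C++ backend code; the helper names mirror the `ndim_is_known`/`dim_size_is_known` utilities mentioned in this PR's commit log):

```python
UNKNOWN = -1  # new sentinel for "unknown"; previously 0 played this role

def ndim_is_known(ndim):
    return ndim != UNKNOWN

def dim_size_is_known(dim_size):
    return dim_size != UNKNOWN

# With -1 as the "unknown" sentinel, the old ambiguity is gone:
# ndim == 0 now means a genuine scalar tensor with shape (),
# and dim_size == 0 now means a genuine zero-size dimension.
```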
Is backward compatibility guaranteed in this PR?
Yes, backward compatibility is guaranteed by default. That means `0` still represents an unknown `ndim` or dim size in all the frontend language bindings; it's just converted to `-1` in the infer-shape logic of the backend.

How to enable zero-dim or zero-size tensors?
Since we are committed to keeping backward compatibility, we provide APIs for users to decide whether to opt in to this NumPy compatibility. Users can call `mx.set_np_compat(True)` to opt in and `mx.set_np_compat(False)` to opt out, or, in a safer way, use the `with` statement to scope the change, as in the sketch below.
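A minimal sketch of both usages, assuming the `mx.np_compat` context manager named in this PR's commit log (its exact signature, including the `active` flag, is an assumption here):

```python
import mxnet as mx

# Global switch:
mx.set_np_compat(True)   # opt in to NumPy-compatible shape semantics
mx.set_np_compat(False)  # opt out

# Scoped switch. To turn this on for a block of code:
with mx.np_compat(active=True):   # `active` flag assumed for illustration
    pass  # code here runs with NumPy-compatible shape semantics

# Or to disable it for a block of code:
with mx.np_compat(active=False):
    pass  # code here keeps the legacy shape semantics
```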
Does this mean that every existing operator should support zero-dim or zero-size tensors in this PR?
Please note that the existing operators were implemented when these two types of tensors were not supported in MXNet. Some strong assumptions may have been made in their implementations, which can lead to errors when NumPy compatibility is turned on. This PR only provides the infrastructure for supporting zero-dim and zero-size tensors in the backend; it does not guarantee that every existing operator will deal with zero-dim/size tensors correctly as in NumPy. As discussed in the RFC, we are going to implement NumPy operators under `mxnet.numpy` that will fully support these two types of tensors.