[numpy] Support zero-dim and zero-size tensors in MXNet #14661
Conversation
Great work! Thank you!
I reviewed all functions in the *.py files; they look good to me.
@hetong007 @yzhliu @sergeykolychev Could you please help review the changes in R, Scala, and Perl, respectively? We would like to merge this PR sooner rather than later, since our subsequent work depends on it and it's becoming harder to rebase.
@jeremiedb @lanking520 Could you help out with reviews on this too? Thanks!
LGTM
Please add some comments on the changes for the Scala part.
@hetong007 Were you seeing a need for adding/integrating a …
Reviewed the changes to the MKL-DNN files. @juliusshufan you might want to run a performance validation on this PR.
@jeremiedb Would you elaborate more on this need? I think the current R API maps nd.array to a native R array, so this PR has minimal effect on R-related uses.
[numpy] Shape support scalar tensor (apache#14315): * Support scalar and zero-size tensors with np.sum * Add sanity check when ndim is set
[Numpy] Change semantics of ndim for operators in `src/operator/contrib` (apache#14409): * Initial commit * Address comments
[WIP] Use new shape definition (apache#14453): * Init checkin * Fix ndarray alloc bug * Use TShape(0) as default empty tuple params * Fix bugs * Fix TShape init value * Fix infer shape pass shape type and reshape infer shape func
[numpy] Fix unit tests after introducing numpy compatible shapes (apache#14487): * Fix infer shape rnn * Fix boolean mask and custom op unit tests * Fix multi proposal * Fix diag * Add global switch for backward compatibility and fix infer shape bugs * Fix slice op infer shape * Fix rnn infer shape * Add util funcs for ndim_is_known and dim_size_is_known * Revert rnn_cell.py
Fix a bug to pass the test in test_contrib_rnn (apache#14520): * fix. * remove type conversion. * remove type cast.
[numpy] Fix test_dynamic_shape.test_dynamic_shape (apache#14538): * Initial commit * Address comments from Jun
[numpy] Fix numpy import in python2 (apache#14537): * Fix several test failures * Fix subgraph op infer shape * Fix sparse slice * Fix deconv infer shape * Fix numpy import compatibility problem in python2
Thanks for the decorator! I think customers may not understand the internal workings of custom op and its dispatch-to-a-different-thread workflow, and may run into this pitfall. Can we synchronize the state of `is_numpy_compat` when pushing the op to the custom op thread? Custom op is one area where customers use numpy operations, and I feel there may be users running into this.
@anirudh2290 That's a good point from the users' perspective. But this may not be trivial to implement by propagating the …
@anirudh2290 Good points on custom op. Given the size of this PR and the fact that users need to opt in to be affected by this change, I'd recommend we address these points in a follow-up PR, so that @reminisce won't have to keep resolving conflicts full-time :)
Python, C API, and operator changes LGTM.
I am okay with making the custom op changes later. Just to be sure: the `is_numpy_compat` thread-local variable is set in the main thread and is expected to be accessed only from the main thread? For example, this won't cause any issues with the multi-threaded inference interface provided by frontends like Scala?
LGTM
@anirudh2290 Thanks for the review. I'm not familiar with how Scala multi-threaded inference is implemented, but as long as `np_compat` is set in the same thread as the one invoking the shape inference functions, the result should be as expected.
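To make the pitfall concrete, here is a minimal, self-contained Python sketch (not code from this PR; `set_np_compat`/`is_np_compat` below are illustrative stand-ins) showing why a thread-local flag set in the main thread is invisible to a worker thread unless it is explicitly propagated:

```python
import threading

_state = threading.local()  # each thread gets its own copy of this storage

def set_np_compat(active):  # stand-in for the flag discussed above
    _state.active = active

def is_np_compat():
    return getattr(_state, "active", False)

set_np_compat(True)  # set in the main thread

def worker():
    # The worker thread sees its own (empty) thread-local storage,
    # so the flag set in the main thread is not visible here.
    print(is_np_compat())  # prints False

t = threading.Thread(target=worker)
t.start()
t.join()
```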
Verified performance and accuracy for several CNN models with the MKL-DNN backend. They look good to me.
* [numpy] Shape support scalar tensor (apache#14315)
  * Support scalar and zero-size tensors with np.sum
  * Add sanity check when ndim is set
* [Numpy] Change semantics of ndim for operators in `src/operator/contrib` (apache#14409)
  * Initial commit
  * Address comments
* [WIP] Use new shape definition (apache#14453)
  * Init checkin
  * Fix ndarray alloc bug
  * Use TShape(0) as default empty tuple params
  * Fix bugs
  * Fix TShape init value
  * Fix infer shape pass shape type and reshape infer shape func
* [numpy] Fix unit tests after introducing numpy compatible shapes (apache#14487)
  * Fix infer shape rnn
  * Fix boolean mask and custom op unit tests
  * Fix multi proposal
  * Fix diag
  * Add global switch for backward compatibility and fix infer shape bugs
  * Fix slice op infer shape
  * Fix rnn infer shape
  * Add util funcs for ndim_is_known and dim_size_is_known
  * Revert rnn_cell.py
* Fix a bug to pass the test in test_contrib_rnn (apache#14520)
  * fix.
  * remove type conversion.
  * remove type cast.
* [numpy] Fix test_dynamic_shape.test_dynamic_shape (apache#14538)
  * Initial commit
  * Address comments from Jun
* [numpy] Fix numpy import in python2 (apache#14537)
  * Fix several test failures
  * Fix subgraph op infer shape
  * Fix sparse slice
  * Fix deconv infer shape
  * Fix numpy import compatibility problem in python2
* fix concat and slice (apache#14549)
* fix R-package (apache#14536)
* Fix cpp package build after using new shape definition (apache#14554)
* Fix pooling_v1 and deformable_convolution param initialization (apache#14577)
  * Fix pooling_v1 param initialization
  * Fix deformable_convolution param initialization
* [Numpy] Misc fix (apache#14612)
  * [Numpy] Misc Fix
  * fix build
  * !shape_is_none => shape_is_known
  * Address comments
  * Fix
* [Numpy] fix test_operator_gpu.test_upsampling_bilinear_with_type (apache#14557)
  * Fix test_operator_gpu.test_upsampling_bilinear_with_type
  * Address comments
* [Numpy] Java/Scala modification (apache#14625)
  * modify jni to support 0 dim/shape
  * fix transpose axes default value
* fix shape index bug (apache#14630)
* fix jni lint (apache#14634)
* [numpy] Fix numpy branch failing tests in CI (apache#14639)
  * Remove numpy namespaces for operator registration
  * Fix bug when shape is completely unknown
  * Fix signed/unsigned compare warning
  * Fix CI
  * Fix pylint
  * Avoid launching gpu kernels for zero-size output tensors
  * Fix test_ndarray
  * Fix binary broadcast with zero-size tensors
  * Better error message for infer shape failure in imperative
  * Fix TShape constructor ambiguity on certain platforms
  * Fix mkldnn build failure
  * Fix build failure in gpu and cpp test
  * Fix gpu cpp test build with mkldnn
  * Fix mkldnn cpp test
  * Fix concatenating zero-size tensors
  * Avoid letting mkldnn handle zero-size tensors in concat
  * Fix quantized_concat infer shape
  * Try to fix perl c api
* fix invalid ndarray dispose (apache#14657)
* swig fixes for the changes in c_api.h (apache#14655)
* Rename np_comp to np_compat for readability
* Fix import error
* Keep old c apis unchanged
* Fix lint
* Rebase and fix build
* Fix R build failure
* Fix Perl build failure
* Rebase with master
* Address cr comments
* Use just one scope to represent numpy compatibility
* Add code comment to NumpyScope object in Scala
* Add use_np_compat decorator
* Fix pylint
Changes the implementation of index_array to be compatible with the recently merged support for zero-dim and zero-size arrays. Resolves the incompatibilities with apache#14661.
* Implement the index_array operator
* Add index_array operator tests
* Add index_array operator GPU tests
* Add the index_array operator to the Python docs autosummary
* Add the author of the index_array operator to CONTRIBUTORS.md
* Make index_array compatible with zero-dim and zero-size arrays
  Changes the implementation of index_array to be compatible with the recently merged support for zero-dim and zero-size arrays. Resolves the incompatibilities with #14661.
* Fix the index_array gradient checks in the unit tests
  In the previous implementation, the output gradient had an incorrect shape. This commit fixes the shapes and makes the tests more readable.
* Add zero-dim and zero-size array tests for index_array
* Use mxnet::Tuple<int> instead of TShape for the axes parameter
* Fix incorrect array indexing in index_array
  Solves access violations when compiling with MSVC++ 14.0.
* Avoid copying the input shape array in the index_array shape function
* Add unknown shape handling to index_array
* Use SHAPE_ASSIGN_CHECK to assign the shape in index_array
* Remove the redundant index_array GPU tests from test_operator_gpu.py
* Move the index_array tests into a single function (test_index_array)
* Use @mx.use_np_compat instead of mx.np_compat in index_array op tests
* Remove the use of template specialization for IndexArrayForward
* Add the index_array operator to the AMP symbol list
* Retrigger CI
}
ReverseReshapeInferShape(&dshape, oshape);
#if 0
@reminisce is this `#if 0` intentional? Will it be removed eventually?
Description
This PR provides the infrastructure for supporting zero-dim and zero-size tensors, the first outcome of the initiative to introduce a NumPy-compatible coding experience into MXNet (see this RFC). Great thanks to the many folks who have contributed design, implementation, and code review to this feature; the credit goes to all of them. Sorry if I missed anyone below. It would have been impossible to get so many things right, scattered all over the codebase, with just a couple of hands.
FAQ
What are zero-dim and zero-size tensors?
A zero-dim tensor is a tensor with shape equal to `()`, and a zero-size tensor is a tensor whose size is equal to `0`. For example, `np.ones((1, 0, 3))` would generate a zero-size tensor with shape equal to `(1, 0, 3)`.
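A quick illustration in plain NumPy, whose semantics these tensors follow:

```python
import numpy as np

a = np.array(1.0)        # zero-dim (scalar) tensor
print(a.shape, a.ndim)   # () 0

b = np.ones((1, 0, 3))   # zero-size tensor: one dimension has size 0
print(b.shape, b.size)   # (1, 0, 3) 0
```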
Why are they important?
Mathematically speaking, their presence keeps the completeness and consistency of all the tensor operations. For example, given `x = mx.nd.array([1, 2, 3])`, `x[0]` should return `1` instead of `[1]`, and `x[0:0]` should return `[]` with shape equal to `(0,)`. Zero-size tensors can also be convenient in easing code logic, serving as placeholders for accumulations or aggregations. I find this thread provides very good insights on the importance of zero-dim tensors.
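The corresponding NumPy behavior, for reference:

```python
import numpy as np

x = np.array([1, 2, 3])
print(x[0], x[0].shape)      # 1 ()   -- a zero-dim scalar, not the array [1]
print(x[0:0], x[0:0].shape)  # [] (0,) -- an empty (zero-size) slice
```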
How are they implemented?
In the backend (C++), we use `ndim = -1` to represent unknown shapes and `dim_size = -1` for unknown dim sizes. Before, they were represented by `0`.
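A minimal Python sketch of this convention (illustration only, not the actual C++ backend code; the helper names mirror the `ndim_is_known`/`dim_size_is_known` utilities mentioned in this PR's commit log):

```python
UNKNOWN = -1  # new sentinel for "unknown"; previously 0 played this role

def ndim_is_known(ndim):
    return ndim != UNKNOWN

def dim_size_is_known(dim_size):
    return dim_size != UNKNOWN

# With -1 as the "unknown" sentinel, the old ambiguity is gone:
# ndim == 0 now means a genuine scalar tensor with shape (),
# and dim_size == 0 now means a genuine zero-size dimension.
```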
Is backward compatibility guaranteed in this PR?
Yes, backward compatibility is guaranteed by default. That means `0` still represents an unknown `ndim` or dim size in all the frontend language bindings; it's just converted to `-1` in the infer-shape logic of the backend.

How to enable zero-dim or zero-size tensors?
Since we are committed to keeping backward compatibility, we provide APIs for users to decide whether to opt in to this NumPy compatibility. Users can call `mx.set_np_compat(True)` to opt in and `mx.set_np_compat(False)` to opt out, or, in a safer way, use the `with` statement to scope the change, as in the sketch below.
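A minimal sketch of both usages, assuming the `mx.np_compat` context manager named in this PR's commit log (its exact signature, including the `active` flag, is an assumption here):

```python
import mxnet as mx

# Global switch:
mx.set_np_compat(True)   # opt in to NumPy-compatible shape semantics
mx.set_np_compat(False)  # opt out

# Scoped switch. To turn this on for a block of code:
with mx.np_compat(active=True):   # `active` flag assumed for illustration
    pass  # code here runs with NumPy-compatible shape semantics

# Or to disable it for a block of code:
with mx.np_compat(active=False):
    pass  # code here keeps the legacy shape semantics
```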
Does this mean that every existing operator should support zero-dim or zero-size tensors in this PR?
Please note that the existing operators were implemented when these two types of tensors were not supported in MXNet. Some strong assumptions may have been made in their implementations, which can lead to errors when NumPy compatibility is turned on. This PR only provides the infrastructure for supporting zero-dim and zero-size tensors in the backend; it does not guarantee that every existing operator will deal with zero-dim/size tensors correctly as in NumPy. As discussed in the RFC, we are going to implement NumPy operators under `mxnet.numpy` that will fully support these two types of tensors.