This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-93] Sparse support for Custom Op #10374

Merged
merged 38 commits into apache:master
Apr 10, 2018

Conversation

anirudh2290
Member

Description

Adds sparse support for custom op. Registers the InferStorageType and InferStorageTypeBackward interfaces for custom op, registers Forward and Backward with the FStatefulComputeEx interface, and adds an NDArray API to update the chunk of a sparse NDArray from an existing NDArray.
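
For context, a minimal sketch of what a sparse-aware custom op can look like with these interfaces; the op name 'sparse_sqr', the shapes, and the forward-only usage at the end are illustrative assumptions rather than code from this PR:

import numpy as np
import mxnet as mx

class SparseSqr(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        # mx.nd.sparse.square keeps the CSR storage of the input
        self.assign(out_data[0], req[0], mx.nd.sparse.square(in_data[0]))

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        # d(x^2)/dx = 2x; this form falls back to dense if out_grad is dense
        self.assign(in_grad[0], req[0], 2 * in_data[0] * out_grad[0])

@mx.operator.register('sparse_sqr')  # hypothetical op name
class SparseSqrProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(SparseSqrProp, self).__init__(need_top_grad=True)

    def list_arguments(self):
        return ['data']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        return in_shape, [in_shape[0]], []

    def infer_storage_type(self, in_stype):
        # returns (input stypes, output stypes, aux stypes)
        return ['csr'], ['csr'], []

    def infer_storage_type_backward(self, ograd_stype, in_stype, out_stype,
                                    igrad_stype, aux_stype):
        # each returned list has the same length as the corresponding argument
        return ['default'], in_stype, out_stype, ['default'], aux_stype

    def create_operator(self, ctx, shapes, dtypes):
        return SparseSqr()

x = mx.nd.array(np.random.uniform(-1, 1, size=(4, 10))).tostype('csr')
y = mx.nd.Custom(x, op_type='sparse_sqr')  # forward only in this sketch
print(y.stype)  # 'csr'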

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@piiswrong @eric-haibin-lin

@eric-haibin-lin eric-haibin-lin self-assigned this Apr 2, 2018
@eric-haibin-lin eric-haibin-lin left a comment (Member)

@piiswrong can you help check if the python API is reasonable?

'''Example of how to use custom op with sparse ndarrays
'''
def forward(self, is_train, req, in_data, out_data, aux):
#self.assign(out_data[0], req[0], mx.nd.sparse.square(in_data[0]))
Member

Would it make more sense to have an example without using mx.nd.square? For example:

input = in_data[0]
data = input.data
output = sparse.csr_matrix((data*data, input.indptr, input.indices), shape=...)
self.assign(out_data[0], req[0], output)
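
A standalone, runnable version of that suggestion might look like the following; note that mx.nd.sparse.csr_matrix takes (data, indices, indptr), and the shape argument is assumed to be the input's shape:

import mxnet as mx

def square_csr(inp):
    # rebuild a CSR array with the same sparsity pattern but squared values,
    # without ever densifying the input
    data = inp.data
    return mx.nd.sparse.csr_matrix((data * data, inp.indices, inp.indptr),
                                   shape=inp.shape)

x = mx.nd.array([[0, 2], [3, 0]]).tostype('csr')
print(square_csr(x).asnumpy())  # [[0. 4.] [9. 0.]]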

@@ -314,6 +314,32 @@ inline bool dispatch_mode_assign(DispatchMode *y, const DispatchMode& x) {
}
#endif

/*! \brief allocate ndarrays from existing ndarrays
Member

Not sure if this is the best place to put this function. Would moving this inside custom.h work? If we put it here, it's very likely that somebody misuses the function after looking at the current doc.

void Forward(const OpStatePtr& state,
const OpContext& ctx,
const std::vector<TBlob>& inputs,
void Forward(const OpStatePtr& state, const OpContext& ctx,
Member

Maybe rename to ForwardEx to follow the convention for ComputeEx?

const CustomParam& params = state.get_state<CustomParam>();
std::vector<void*> ptrs;
std::vector<int> tags;
std::vector<NDArray> cpys;
std::unordered_set<int> input_tags({0, 4});
Member

Need better documentation to explain what the magic numbers are for...


if (params.info->num_callbacks <= kCustomOpPropBackwardInferStorageType) {
for (size_t i = 0; i < iattr->size(); i++) {
STORAGE_TYPE_ASSIGN_CHECK(*iattr, i, kDefaultStorage);
Member

what if one of the input/output is sparse??? Would the check fail? Shouldn't it only assign stype to the undefined ones?

Member

(same comment for forward stype inference)

Member Author

This is for backward compatibility with other frontends which don't support sparse for custom ops. It will never go into the if clause for the Python frontend.

Member

So if a perl user creates a custom op with sparse ndarray (without custom infer storage function), would this break?

Member Author

Yes, because sparse is not supported for Perl. It will have to be added separately: infer_storage_type and infer_storage_type_backward need to be registered.

bool training,
const std::vector<NDArray>& arrs) {
template <typename Func>
void Push(const Func& func, const OpContext& ctx, bool recording,
Member

Add some doc explaining why we need to pass inputs/outputs array?

# test for backward compatibility, i.e. the correctness of default implementation of
# infer storage in custom operator
class Mult(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
Member

nit: 8 space indentation?

@@ -4059,6 +4059,79 @@ def create_operator(self, ctx, shapes, dtypes):
with mx.contrib.autograd.train_section():
y = mx.nd.Custom(x, aux, op_type='sqr')
y.backward()
y.wait_to_read()
x.grad.wait_to_read()

Member

Is the test case for sparse input not added here?

auto stype = arr.storage_type();
CHECK(stype == kCSRStorage || stype == kRowSparseStorage)
<< "Only to be used with CSR and RSP storage types";
ptr_->shandle.dptr = arr.ptr_->shandle.dptr;
Member

Would ptr_->shandle = arr.ptr_->shandle be sufficient?

Member Author

This doesn't work because the Handle struct also stores a pointer to the data. Doing ptr_->shandle = arr.ptr_->shandle would copy dptr so that both handles point to the same data, but sparse then updates dptr at runtime and that won't be reflected in the copied shandle.

@@ -507,6 +507,35 @@ class NDArray {
ret.reuse_ = true;
return ret;
}

inline void SparseUpdateChunk(const NDArray &arr) const {
Member

We definitely need to add doc for this function to prevent others from misusing it.

@anirudh2290 anirudh2290 changed the title Sparse support for Custom Op [WIP] Sparse support for Custom Op Apr 3, 2018
@anirudh2290 anirudh2290 changed the title [WIP] Sparse support for Custom Op Sparse support for Custom Op Apr 4, 2018
@anirudh2290
Member Author

Thank you for reviewing, @eric-haibin-lin. I have addressed your comments.

def forward(self, is_train, req, in_data, out_data, aux):
    inp = in_data[0]
    if inp.stype == 'csr':
        csr_m = inp * inp
Member

Does this work? Did you mean in_data[0].data ?

Member Author

It used to fall back to dense. I have fixed it now.

@@ -45,6 +46,31 @@ struct CustomParam {
std::shared_ptr<MXCallbackList> info;
};

/*! \brief allocate ndarrays from existing ndarrays
*/
inline void allocate_ndarray_copy(NDArray** nd,
Member

use CamelCase ?

Member Author

fixed.

inp = in_data[0]
if inp.stype == 'csr':
    csr_m = inp.data
    csr_m = csr_m.reshape(inp.shape[0] * inp.shape[1])
Member

why do you need to reshape?

    self.assign(out_data[0], req[0], out)

def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
    self.assign(in_grad[0], req[0], 2 * in_data[0] * out_grad[0])
Member

maybe use sparse.elemwise_mul(csr, csr) so that it doesn't fall back to dense?
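
For reference, mx.nd.sparse.elemwise_mul keeps the result in CSR when both operands are CSR; a small standalone check with made-up values:

import mxnet as mx

a = mx.nd.array([[0, 2], [3, 0]]).tostype('csr')
b = mx.nd.array([[1, 5], [2, 0]]).tostype('csr')
prod = mx.nd.sparse.elemwise_mul(a, b)
print(prod.stype)      # 'csr'
print(prod.asnumpy())  # [[ 0. 10.] [ 6.  0.]]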

else:
    inp = in_data[0]
    csr_m = inp.data * inp.data
    csr_m = csr_m.reshape(inp.shape[0] * inp.shape[1])
Member

inp.data should already be 1-D

@anirudh2290
Member Author

@piiswrong WDYT ?

ptr_->storage_shape = arr.ptr_->storage_shape;
ptr_->storage_type = arr.ptr_->storage_type;
ptr_->ctx = arr.ptr_->ctx;
ptr_->aux_handles = arr.ptr_->aux_handles;
Contributor

This is causing a memory leak. You should do swaps instead.

*/
inline void SparseUpdateChunk(const NDArray &arr) const {
auto stype = arr.storage_type();
CHECK(stype == kCSRStorage || stype == kRowSparseStorage)
Contributor

Check that shape and dtype are the same.

rhs = mx.nd.array(np.random.uniform(-1, 1, size=(4, 10)))
lhs.attach_grad()
rhs.attach_grad()
with mx.contrib.autograd.train_section():
Contributor

This should be record()

Member Author

fixed.

    return MultNoGrad()

def infer_storage_type_backward(self, ograd_stype, in_stype, out_stype, igrad_stype, aux_stype):
    return [], [], [], ['default'], []
Contributor

why are the returned values all empty?

Member Author

Earlier my interface was such that it was okay to have empty lists and they would be inferred as default. After talking to @eric-haibin-lin we decided to require users implementing the infer_storage_type_backward interface to return lists of the same size as the input lists. Also, any undefined stypes will now throw an exception.
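
Under that contract, a hypothetical implementation for an op with one input, one output and no aux state could look like this (illustrative only; the stypes chosen here are arbitrary):

import mxnet as mx

class ExampleProp(mx.operator.CustomOpProp):
    def infer_storage_type_backward(self, ograd_stype, in_stype, out_stype,
                                    igrad_stype, aux_stype):
        # every returned list matches the length of the corresponding argument,
        # and no entry is left undefined
        return ['default'], in_stype, out_stype, ['csr'], aux_stype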

aux_stype : list
list of inferred storage types for auxiliary states.
"""
return list(ograd_stype), list(in_stype), list(out_stype), \
Contributor

This default implementation didn't do anything

Member Author

changed.

std::vector<int> tags;
std::vector<NDArray> cpys;

ptrs.reserve(total);
tags.reserve(total);
cpys.reserve(total);

std::unordered_set<int> input_tags({3, 0, 1, 4});
Member

add some comment?

Member Author

added.

@anirudh2290 anirudh2290 changed the title Sparse support for Custom Op [MXNET-93] Sparse support for Custom Op Apr 8, 2018
@anirudh2290
Member Author

@piiswrong @eric-haibin-lin I have addressed your comments.


Parameters
----------
in_stype : list of stypes, Valid stypes are default, row_sparse and
Member

Valid -> valid

Member Author

changed.

Will raise an error if undefined storage type is returned.
Returned lists have to be the same size as the input lists to infer_storage_type_backward,
otherwise an exception will be thrown. When this interface is not implemented,
all stypes will fallback to default.
Member

I will be careful about using the word "fallback" here since it has specific meaning for sparse ops.

I'm a little bit confused about the default behavior of forward stype inference vs. backward stype inference.
In forward you replicated the stype of in_stype to outputs. In backward you replicated the "default" stype to all outputs.

Member Author

For the default implementation, only default stypes are supported; that is why I replicated the stype of in_stypes. I have now added asserts in infer_storage_type and infer_storage_type_backward to prevent misuse.
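
Roughly, the behaviour described above amounts to something like the following for a one-input/one-output op with no aux state (a paraphrase of this discussion, not the actual library code):

import mxnet as mx

class DefaultOnlyProp(mx.operator.CustomOpProp):
    def infer_storage_type(self, in_stype):
        # without a user override, only dense ('default') storage is accepted
        assert all(s == 'default' for s in in_stype), \
            'implement infer_storage_type to use sparse inputs'
        return in_stype, ['default'], []

    def infer_storage_type_backward(self, ograd_stype, in_stype, out_stype,
                                    igrad_stype, aux_stype):
        for stypes in (ograd_stype, in_stype, out_stype, igrad_stype, aux_stype):
            assert all(s == 'default' for s in stypes), \
                'implement infer_storage_type_backward to use sparse arrays'
        return ograd_stype, in_stype, out_stype, \
               ['default'] * len(igrad_stype), aux_stype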

@piiswrong piiswrong merged commit 656e352 into apache:master Apr 10, 2018
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
* Initial changes

* Add custom op suppoort in backend

* Add operator changes

* Add refactor changes

* Add 3p

* Add custom ops support

* Move out common code to a function

* Fix changes

* Fix custom op changes

* Remove whitespace

* Fix

* Add fix

* Remove test dependency

* Add example for custom sparse sqr

* Remove extra line

* Add comments for InferStorageTypeBackward

* Fix lint

* Address review comments

* Fix for shandle

* Fix for shandle second

* Fix naive engine bug

* Fix

* Remove reshape

* Add swap logic for shandles

* Add rtol atol

* Fix op

* Fix custom op

* Fix pylint

* Add assert

* Fix lint

* Add check for undefined for igrad stypes
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018