This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-93] Sparse support for Custom Op #10374

Merged
merged 38 commits into apache:master
Apr 10, 2018

Conversation

anirudh2290
Member

Description

Adds sparse support for custom op. Registers the InferStorageType and InferStorageTypeBackward interfaces for custom op, registers Forward and Backward with the FStatefulComputeEx interface, and adds an NDArray API to update the chunk of a sparse NDArray from an existing NDArray.
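
For context, a minimal sketch of what a sparse-aware custom op can look like with these interfaces; the op name 'sparse_sqr', the shapes, and the forward-only usage at the end are illustrative assumptions rather than code from this PR:

import numpy as np
import mxnet as mx

class SparseSqr(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        # mx.nd.sparse.square keeps the CSR storage of the input
        self.assign(out_data[0], req[0], mx.nd.sparse.square(in_data[0]))

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        # d(x^2)/dx = 2x; this form falls back to dense if out_grad is dense
        self.assign(in_grad[0], req[0], 2 * in_data[0] * out_grad[0])

@mx.operator.register('sparse_sqr')  # hypothetical op name
class SparseSqrProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(SparseSqrProp, self).__init__(need_top_grad=True)

    def list_arguments(self):
        return ['data']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        return in_shape, [in_shape[0]], []

    def infer_storage_type(self, in_stype):
        # returns (input stypes, output stypes, aux stypes)
        return ['csr'], ['csr'], []

    def infer_storage_type_backward(self, ograd_stype, in_stype, out_stype,
                                    igrad_stype, aux_stype):
        # each returned list has the same length as the corresponding argument
        return ['default'], in_stype, out_stype, ['default'], aux_stype

    def create_operator(self, ctx, shapes, dtypes):
        return SparseSqr()

x = mx.nd.array(np.random.uniform(-1, 1, size=(4, 10))).tostype('csr')
y = mx.nd.Custom(x, op_type='sparse_sqr')  # forward only in this sketch
print(y.stype)  # 'csr'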

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@piiswrong @eric-haibin-lin

@eric-haibin-lin eric-haibin-lin self-assigned this Apr 2, 2018
@eric-haibin-lin eric-haibin-lin left a comment (Member)

@piiswrong can you help check if the python API is reasonable?

'''Example of how to use custom op with sparse ndarrays
'''
def forward(self, is_train, req, in_data, out_data, aux):
#self.assign(out_data[0], req[0], mx.nd.sparse.square(in_data[0]))
Member

Would it make more sense to have an example without using mx.nd.square? For example:

input = in_data[0]
data = input.data
output = sparse.csr_matrix((data*data, input.indptr, input.indices), shape=...)
self.assign(out_data[0], req[0], output)
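
A standalone, runnable version of that suggestion might look like the following; note that mx.nd.sparse.csr_matrix takes (data, indices, indptr), and the shape argument is assumed to be the input's shape:

import mxnet as mx

def square_csr(inp):
    # rebuild a CSR array with the same sparsity pattern but squared values,
    # without ever densifying the input
    data = inp.data
    return mx.nd.sparse.csr_matrix((data * data, inp.indices, inp.indptr),
                                   shape=inp.shape)

x = mx.nd.array([[0, 2], [3, 0]]).tostype('csr')
print(square_csr(x).asnumpy())  # [[0. 4.] [9. 0.]]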

@@ -314,6 +314,32 @@ inline bool dispatch_mode_assign(DispatchMode *y, const DispatchMode& x) {
}
#endif

/*! \brief allocate ndarrays from existing ndarrays
Member

Not sure if this is the best place to put this function. Would moving this inside custom.h work? If we put it here, it's very likely that somebody misuses the function after looking at the current doc.

void Forward(const OpStatePtr& state,
const OpContext& ctx,
const std::vector<TBlob>& inputs,
void Forward(const OpStatePtr& state, const OpContext& ctx,
Member

Maybe rename to ForwardEx to follow the convention for ComputeEx?

const CustomParam& params = state.get_state<CustomParam>();
std::vector<void*> ptrs;
std::vector<int> tags;
std::vector<NDArray> cpys;
std::unordered_set<int> input_tags({0, 4});
Member

Need better documentation to explain what the magic numbers are for...


if (params.info->num_callbacks <= kCustomOpPropBackwardInferStorageType) {
for (size_t i = 0; i < iattr->size(); i++) {
STORAGE_TYPE_ASSIGN_CHECK(*iattr, i, kDefaultStorage);
Member

what if one of the input/output is sparse??? Would the check fail? Shouldn't it only assign stype to the undefined ones?

Member

(same comment for forward stype inference)

Member Author

This is for backward compatibility with other frontends which don't support sparse for custom ops. It will never go into the if clause for the Python frontend.

Member

So if a perl user creates a custom op with sparse ndarray (without custom infer storage function), would this break?

Member Author

Yes, because sparse is not supported for Perl. It will have to be added separately: infer_storage_type and infer_storage_type_backward need to be registered.

bool training,
const std::vector<NDArray>& arrs) {
template <typename Func>
void Push(const Func& func, const OpContext& ctx, bool recording,
Member

Add some doc explaining why we need to pass inputs/outputs array?

# test for backward compatibility, i.e. the correctness of default implementation of
# infer storage in custom operator
class Mult(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
Member

nit: 8 space indentation?

@@ -4059,6 +4059,79 @@ def create_operator(self, ctx, shapes, dtypes):
with mx.contrib.autograd.train_section():
y = mx.nd.Custom(x, aux, op_type='sqr')
y.backward()
y.wait_to_read()
x.grad.wait_to_read()

Member

Is the test case for sparse input not added here?

auto stype = arr.storage_type();
CHECK(stype == kCSRStorage || stype == kRowSparseStorage)
<< "Only to be used with CSR and RSP storage types";
ptr_->shandle.dptr = arr.ptr_->shandle.dptr;
Member

Would ptr_->shandle = arr.ptr_->shandle be sufficient?

Member Author

This doesn't work because the Handle struct also stores a pointer to the data. Doing ptr_->shandle = arr.ptr_->shandle would copy dptr so that both handles point to the same data, but sparse then updates dptr at runtime and that won't be reflected in the copied shandle.

@@ -507,6 +507,35 @@ class NDArray {
ret.reuse_ = true;
return ret;
}

inline void SparseUpdateChunk(const NDArray &arr) const {
Member

We definitely need to add doc for this function to prevent others from misusing it.

@anirudh2290 anirudh2290 changed the title Sparse support for Custom Op [WIP] Sparse support for Custom Op Apr 3, 2018
@anirudh2290 anirudh2290 changed the title [WIP] Sparse support for Custom Op Sparse support for Custom Op Apr 4, 2018
@anirudh2290
Member Author

Thank you for reviewing, @eric-haibin-lin. I have addressed your comments.

def forward(self, is_train, req, in_data, out_data, aux):
    inp = in_data[0]
    if inp.stype == 'csr':
        csr_m = inp * inp
Member

Does this work? Did you mean in_data[0].data ?

Member Author

It used to fall back to dense. I have fixed it now.

@@ -45,6 +46,31 @@ struct CustomParam {
std::shared_ptr<MXCallbackList> info;
};

/*! \brief allocate ndarrays from existing ndarrays
*/
inline void allocate_ndarray_copy(NDArray** nd,
Member

use CamelCase ?

Member Author

fixed.

inp = in_data[0]
if inp.stype == 'csr':
    csr_m = inp.data
    csr_m = csr_m.reshape(inp.shape[0] * inp.shape[1])
Member

why do you need to reshape?

    self.assign(out_data[0], req[0], out)

def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
    self.assign(in_grad[0], req[0], 2 * in_data[0] * out_grad[0])
Member

maybe use sparse.elemwise_mul(csr, csr) so that it doesn't fall back to dense?
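
For reference, mx.nd.sparse.elemwise_mul keeps the result in CSR when both operands are CSR; a small standalone check with made-up values:

import mxnet as mx

a = mx.nd.array([[0, 2], [3, 0]]).tostype('csr')
b = mx.nd.array([[1, 5], [2, 0]]).tostype('csr')
prod = mx.nd.sparse.elemwise_mul(a, b)
print(prod.stype)      # 'csr'
print(prod.asnumpy())  # [[ 0. 10.] [ 6.  0.]]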

else:
    inp = in_data[0]
    csr_m = inp.data * inp.data
    csr_m = csr_m.reshape(inp.shape[0] * inp.shape[1])
Member

inp.data should already be 1-D

@anirudh2290
Member Author

@piiswrong WDYT ?

ptr_->storage_shape = arr.ptr_->storage_shape;
ptr_->storage_type = arr.ptr_->storage_type;
ptr_->ctx = arr.ptr_->ctx;
ptr_->aux_handles = arr.ptr_->aux_handles;
Contributor

This is causing a memory leak. You should do swaps instead.

*/
inline void SparseUpdateChunk(const NDArray &arr) const {
auto stype = arr.storage_type();
CHECK(stype == kCSRStorage || stype == kRowSparseStorage)
Contributor

Check that shape and dtype are the same.

rhs = mx.nd.array(np.random.uniform(-1, 1, size=(4, 10)))
lhs.attach_grad()
rhs.attach_grad()
with mx.contrib.autograd.train_section():
Contributor

This should be record()

Member Author

fixed.

    return MultNoGrad()

def infer_storage_type_backward(self, ograd_stype, in_stype, out_stype, igrad_stype, aux_stype):
    return [], [], [], ['default'], []
Contributor

why are the returned values all empty?

Member Author

Earlier my interface was such that it was okay to have empty lists and they would be inferred as default. After talking to @eric-haibin-lin we decided to require users implementing the infer_storage_type_backward interface to return lists of the same size as the input lists. Also, any undefined stypes will now throw an exception.
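
Under that contract, a hypothetical implementation for an op with one input, one output and no aux state could look like this (illustrative only; the stypes chosen here are arbitrary):

import mxnet as mx

class ExampleProp(mx.operator.CustomOpProp):
    def infer_storage_type_backward(self, ograd_stype, in_stype, out_stype,
                                    igrad_stype, aux_stype):
        # every returned list matches the length of the corresponding argument,
        # and no entry is left undefined
        return ['default'], in_stype, out_stype, ['csr'], aux_stype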

aux_stype : list
list of inferred storage types for auxiliary states.
"""
return list(ograd_stype), list(in_stype), list(out_stype), \
Contributor

This default implementation didn't do anything

Member Author

changed.

std::vector<int> tags;
std::vector<NDArray> cpys;

ptrs.reserve(total);
tags.reserve(total);
cpys.reserve(total);

std::unordered_set<int> input_tags({3, 0, 1, 4});
Member

add some comment?

Member Author

added.

@anirudh2290 anirudh2290 changed the title Sparse support for Custom Op [MXNET-93] Sparse support for Custom Op Apr 8, 2018
@anirudh2290
Member Author

@piiswrong @eric-haibin-lin I have addressed your comments.


Parameters
----------
in_stype : list of stypes, Valid stypes are default, row_sparse and
Member

Valid -> valid

Member Author

changed.

Will raise an error if undefined storage type is returned.
Returned lists have to be the same size as the input lists to infer_storage_type_backward,
otherwise an exception will be thrown. When this interface is not implemented,
all stypes will fallback to default.
Member

I will be careful about using the word "fallback" here since it has specific meaning for sparse ops.

I'm a little bit confused about the default behavior of forward stype inference vs. backward stype inference.
In forward you replicated the stype of in_stype to outputs. In backward you replicated the "default" stype to all outputs.

Member Author

For the default implementation, only default stypes are supported; that is why I replicated the stype of in_stypes. I have now added asserts in infer_storage_type and infer_storage_type_backward to prevent misuse.
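
Roughly, the behaviour described above amounts to something like the following for a one-input/one-output op with no aux state (a paraphrase of this discussion, not the actual library code):

import mxnet as mx

class DefaultOnlyProp(mx.operator.CustomOpProp):
    def infer_storage_type(self, in_stype):
        # without a user override, only dense ('default') storage is accepted
        assert all(s == 'default' for s in in_stype), \
            'implement infer_storage_type to use sparse inputs'
        return in_stype, ['default'], []

    def infer_storage_type_backward(self, ograd_stype, in_stype, out_stype,
                                    igrad_stype, aux_stype):
        for stypes in (ograd_stype, in_stype, out_stype, igrad_stype, aux_stype):
            assert all(s == 'default' for s in stypes), \
                'implement infer_storage_type_backward to use sparse arrays'
        return ograd_stype, in_stype, out_stype, \
               ['default'] * len(igrad_stype), aux_stype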

@piiswrong piiswrong merged commit 656e352 into apache:master Apr 10, 2018
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
* Initial changes

* Add custom op suppoort in backend

* Add operator changes

* Add refactor changes

* Add 3p

* Add custom ops support

* Move out common code to a function

* Fix changes

* Fix custom op changes

* Remove whitespace

* Fix

* Add fix

* Remove test dependency

* Add example for custom sparse sqr

* Remove extra line

* Add comments for InferStorageTypeBackward

* Fix lint

* Address review comments

* Fix for shandle

* Fix for shandle second

* Fix naive engine bug

* Fix

* Remove reshape

* Add swap logic for shandles

* Add rtol atol

* Fix op

* Fix custom op

* Fix pylint

* Add assert

* Fix lint

* Add check for undefined for igrad stypes
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018