This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix the bug of MXEnginePushAsyncND and MXEnginePushSyncND #15751

Merged: wkcn merged 11 commits into apache:master on Aug 8, 2019


@wkcn (Member) commented Aug 5, 2019

Description

Sorry, I introduced a bug in the two APIs MXEnginePushAsyncND and MXEnginePushSyncND: their NDArray arguments should be pointers to arrays of NDArrayHandle.

Issue: #15774
master branch: #15751
v1.5.x branch: #15775

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
      • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
      • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
      • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
      • For user-facing API changes, API doc string has been updated.
      • For new C++ functions in header files, their functionalities and arguments are documented.
      • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
      • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Modify the type of the arguments to 'NDArray**'
  • Update the unit tests

Comments

  • If this change is backward incompatible, why must it be made?
  • Interesting edge cases to note here

wkcn mentioned this pull request on Aug 5, 2019
    EngineFnPropertyHandle prop_handle, int priority,
    const char* opr_name, bool wait) {
  API_BEGIN();
  NDArray** const_nds = reinterpret_cast<NDArray**>(const_nds_handle);
@wkcn (Member Author):

reinterpret_cast is necessary for the cast from void** to NDArray**.
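A minimal sketch of why the cast has to be a `reinterpret_cast`: `void**` and `NDArray**` are unrelated pointer types, and `static_cast` can only convert `void*` itself (not `void**`) to an object pointer. `FakeNDArray`, `SumIds` and `SumIdsDemo` are hypothetical stand-ins, not MXNet code; the caller side mirrors the unit test, where the buffer really holds `NDArray*` values.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for mxnet::NDArray; not the real class.
struct FakeNDArray {
  int id;
};

// Hypothetical callee with the same parameter shape as the fixed C API:
// void** is a pointer to an array of opaque handles.
int SumIds(void** nds_handle, int num_nds) {
  assert(num_nds >= 0);
  // static_cast<FakeNDArray**>(nds_handle) would not compile, because
  // void** is not void*; reinterpret_cast is required here.
  FakeNDArray** nds = reinterpret_cast<FakeNDArray**>(nds_handle);
  int sum = 0;
  for (int i = 0; i < num_nds; ++i) sum += nds[i]->id;
  return sum;
}

int SumIdsDemo() {
  FakeNDArray a{1};
  FakeNDArray b{2};
  std::vector<FakeNDArray*> nds = {&a, &b};
  // Caller side, as in the unit test: the buffer really holds FakeNDArray*
  // values, so converting to void** and back yields the original pointer.
  void** pnds = reinterpret_cast<void**>(nds.data());
  return SumIds(pnds, static_cast<int>(nds.size()));
}
```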

@sxjscience (Member):

Why did the code pass the previous tests?

@wkcn (Member Author) commented Aug 5, 2019

@sxjscience The previous test was also wrong. In it, I passed the memory address of a single NDArray into these two APIs. However, the correct method is to pass the memory address of an array of NDArray pointers.
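A sketch of the difference, assuming a hypothetical callee `FirstId` with the fixed parameter shape (a `void**` pointing to an array of handles) and a stand-in `FakeNDArray` type; neither is MXNet code. Even for a single array, the caller needs a one-element array of pointers rather than the address of the array object itself.

```cpp
#include <cassert>

// Hypothetical stand-in for mxnet::NDArray; not the real class.
struct FakeNDArray {
  int id;
};

// Hypothetical callee: nds_handle points to an array of num_nds handles.
int FirstId(void** nds_handle, int num_nds) {
  assert(nds_handle != nullptr && num_nds > 0);
  FakeNDArray** nds = reinterpret_cast<FakeNDArray**>(nds_handle);
  return nds[0]->id;
}

int SingleArrayDemo() {
  FakeNDArray nd{42};
  // Wrong under this convention (the shape of the old test's call): &nd
  // would make FirstId reinterpret the bytes of nd itself as a pointer.
  //   FirstId(reinterpret_cast<void**>(&nd), 1);  // undefined behaviour
  // Right: a one-element array of pointers, passed by its address.
  FakeNDArray* one[] = {&nd};
  return FirstId(reinterpret_cast<void**>(one), 1);
}
```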

@wkcn (Member Author) commented Aug 6, 2019

@sxjscience @anirudh2290 @eric-haibin-lin
Hi, could you please help review this PR? Thank you!

@KellenSunderland (Contributor):

Should we backport this to the v1.5.x branch as well?

@wkcn (Member Author) commented Aug 6, 2019

@KellenSunderland Yes.

@@ -258,47 +258,50 @@ TEST(Engine, PushFunc) {
TEST(Engine, PushFuncND) {
  auto ctx = mxnet::Context{};
  mxnet::NDArray nd(ctx);
  std::vector<mxnet::NDArray*> nds;
  nds.push_back(&nd);
  void** pnds = reinterpret_cast<void**>(nds.data());
@KellenSunderland (Contributor):

We might want some tests for cases where more than one NDArray is passed in the call.

@wkcn (Member Author):

Hi @KellenSunderland, I have added a test case with more than one NDArray :)
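A test with several arrays exercises `nds[i]` for `i > 0`, which a one-element test never reaches. The sketch below assumes a hypothetical `PushNDSketch` callee that takes separate const and mutable handle arrays, loosely following the `const_nds_handle`/`mutable_nds_handle` naming in this PR; it is not the real MXNet function or test.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for mxnet::NDArray; not the real class.
struct FakeNDArray {
  int id;
  int writes;
};

// Hypothetical callee: reads every const array and "writes" every mutable one.
int PushNDSketch(void** const_nds_handle, int num_const_nds,
                 void** mutable_nds_handle, int num_mutable_nds) {
  assert(num_const_nds >= 0 && num_mutable_nds >= 0);
  FakeNDArray** const_nds = reinterpret_cast<FakeNDArray**>(const_nds_handle);
  FakeNDArray** mutable_nds =
      reinterpret_cast<FakeNDArray**>(mutable_nds_handle);
  int sum = 0;
  for (int i = 0; i < num_const_nds; ++i) sum += const_nds[i]->id;
  for (int i = 0; i < num_mutable_nds; ++i) mutable_nds[i]->writes += 1;
  return sum;
}

// Passes three const and two mutable arrays, so indices i > 0 are exercised.
int MultiArrayDemo(std::vector<FakeNDArray>* outputs) {
  FakeNDArray a{1, 0};
  FakeNDArray b{2, 0};
  FakeNDArray c{3, 0};
  std::vector<FakeNDArray*> const_nds = {&a, &b, &c};
  outputs->assign(2, FakeNDArray{0, 0});
  std::vector<FakeNDArray*> mutable_nds = {&(*outputs)[0], &(*outputs)[1]};
  return PushNDSketch(reinterpret_cast<void**>(const_nds.data()),
                      static_cast<int>(const_nds.size()),
                      reinterpret_cast<void**>(mutable_nds.data()),
                      static_cast<int>(mutable_nds.size()));
}
```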

@KellenSunderland (Contributor):

LGTM. After considering the API a little, I wouldn't consider this a breaking change. Consumers of this package would have gotten errors when trying to pass lists of NDArrays to the function, and it's clear that this was the intent of the API. Instead, I'd consider this a patch/bugfix.

@sxjscience (Member) commented Aug 7, 2019

I think it looks good. But where is this API used? Is it used in some other packages?

Ignore this comment.
After reviewing the history, I learned that it will be used in MobulaOP.

  LOG(INFO) << "===== Test #8: PushSyncND invalid number of mutable nds =====";
  res = MXEnginePushSyncND(FooSyncFunc, nullptr, nullptr, &ctx, nullptr, 0, &nd, -1);
  EXPECT_EQ(res, -1);
  std::vector<mxnet::NDArray*> nds;
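MXNet's C API functions return 0 on success and -1 on failure, which is what `EXPECT_EQ(res, -1)` checks in Test #8 when a count of -1 is passed. A hypothetical sketch of such argument checks follows; `PushNDChecked` is illustrative only, and the real functions may perform their checks differently.

```cpp
#include <cassert>

// Hypothetical callee sketching argument checks, following the C API
// convention of returning 0 on success and -1 on failure.
int PushNDChecked(void** const_nds_handle, int num_const_nds,
                  void** mutable_nds_handle, int num_mutable_nds) {
  // A negative count can never describe a valid array.
  if (num_const_nds < 0 || num_mutable_nds < 0) return -1;
  // A positive count requires a non-null array pointer.
  if ((num_const_nds > 0 && const_nds_handle == nullptr) ||
      (num_mutable_nds > 0 && mutable_nds_handle == nullptr)) {
    return -1;
  }
  // The real function would push the operation to the engine here.
  return 0;
}
```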
Member:

I'm wondering whether it would be more appropriate to use

std::vector<mxnet::NDArray> nds;

Do we have to use std::vector<mxnet::NDArray*>?

@wkcn (Member Author):

We have to use std::vector<mxnet::NDArray*>, because the argument of the two APIs is a pointer to an array of NDArray*. Besides, it avoids potential copies of NDArray.

@wkcn (Member Author):

The NDArray constructor takes a Context argument, so I do not know how to use std::vector<mxnet::NDArray>.

Member:

Yes, we have to use std::vector<NDArray*> if we keep the interface as NDArrayHandle *. However, I'm wondering whether we could directly use std::vector<mxnet::NDArray> nds; and fill the vector with nds.emplace_back(mxnet::NDArray(ctx)) or nds.push_back(std::move(temp_arr)). In that case, the existing API will not be changed.
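A sketch of the suggested alternative, assuming the arrays are owned by the test: `emplace_back(ctx)` constructs each element in place, which addresses the constructor-argument concern, but a pointer-to-array-of-handles API still needs a separate `std::vector<NDArray*>` view. `FakeContext`, `FakeNDArray` and `NDArrayBatch` are hypothetical stand-ins, not MXNet types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins mimicking a type whose constructor takes a context.
struct FakeContext {
  int dev_id;
};

struct FakeNDArray {
  explicit FakeNDArray(FakeContext ctx) : dev_id(ctx.dev_id) {}
  int dev_id;
};

// Owns the arrays by value and exposes the pointer array an API of the
// form "pointer to an array of handles" expects.
class NDArrayBatch {
 public:
  NDArrayBatch(FakeContext ctx, std::size_t n) {
    storage_.reserve(n);
    for (std::size_t i = 0; i < n; ++i) storage_.emplace_back(ctx);
    // Collect addresses only after the last insertion, so no later
    // reallocation of storage_ can invalidate them.
    for (FakeNDArray& nd : storage_) ptrs_.push_back(&nd);
  }
  // A copy's ptrs_ would still point into the original storage.
  NDArrayBatch(const NDArrayBatch&) = delete;
  NDArrayBatch& operator=(const NDArrayBatch&) = delete;

  void** handles() { return reinterpret_cast<void**>(ptrs_.data()); }
  int size() const { return static_cast<int>(ptrs_.size()); }
  FakeNDArray& at(std::size_t i) { return storage_[i]; }

 private:
  std::vector<FakeNDArray> storage_;
  std::vector<FakeNDArray*> ptrs_;
};
```

Taking the addresses only after the last insertion matters: if `storage_` reallocated, any pointer collected earlier would dangle. Copying is disabled for the same reason.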

@wkcn (Member Author):

It sounds good, but I am worried that the type of the arguments (const_vars_handle and mutable_vars_handle) of the two APIs MXEnginePushAsync and MXEnginePushSync is EngineVarHandle, namely void*. The implementation casts void* to VarHandle*, namely Var**, in https://github.com/apache/incubator-mxnet/blob/master/src/c_api/c_api.cc#L1475. Therefore, I don't know how to decide the type of const_nds_handle and mutable_nds_handle in MXEnginePushAsyncND and MXEnginePushSyncND.

@wkcn (Member Author):

For consistency, const_vars_handle and mutable_vars_handle are pointers to arrays of VarHandle, and const_nds_handle and mutable_nds_handle are pointers to arrays of NDArrayHandle.

Member:

I now understand the logic here. To make the API consistent, I think we should also change the interface of MXEnginePushAsync and MXEnginePushSync. It should be safe to replace EngineVarHandle with VarHandle*. Am I right? @apeforest @yuxihu
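A minimal sketch contrasting the two parameter styles discussed above. All names are hypothetical stand-ins, not MXNet's engine types: `FakeEngineVarHandle` plays the role of an untyped `void*` parameter and `FakeVarHandle*` the role of the proposed typed array-of-handles parameter.

```cpp
#include <cassert>

// Hypothetical stand-ins; not MXNet's engine types.
struct FakeVar {
  int version;
};
using FakeVarHandle = void*;        // opaque handle to one variable
using FakeEngineVarHandle = void*;  // untyped parameter type

// Current style: an untyped void* that actually points to an array of
// handles. static_cast from void* compiles, but the signature hides the shape.
int SumVersionsUntyped(FakeEngineVarHandle vars_handle, int num_vars) {
  FakeVar** vars = static_cast<FakeVar**>(vars_handle);
  int total = 0;
  for (int i = 0; i < num_vars; ++i) total += vars[i]->version;
  return total;
}

// Proposed style: the parameter type states "pointer to an array of
// handles", matching NDArrayHandle* in the ND variants.
int SumVersionsTyped(FakeVarHandle* vars_handle, int num_vars) {
  FakeVar** vars = reinterpret_cast<FakeVar**>(vars_handle);
  int total = 0;
  for (int i = 0; i < num_vars; ++i) total += vars[i]->version;
  return total;
}
```

With the untyped signature, `SumVersionsUntyped(&a, 1)` also compiles even though `&a` is not an array of handles, which is the kind of mistake this PR fixes. With the typed signature, the same call is rejected at compile time unless the caller adds an explicit cast.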

@sxjscience (Member) left a comment:

LGTM. The question left is whether to change the interface of MXEnginePushAsync and MXEnginePushSync to make the API consistent.

wkcn added the pr-awaiting-merge label (Review and CI is complete. Ready to Merge) on Aug 7, 2019
wkcn merged commit 79d8d86 into apache:master on Aug 8, 2019
wkcn added a commit to wkcn/incubator-mxnet that referenced this pull request Aug 8, 2019
…#15751)

* fix push sync nd api

* align code

* update test for syncnd

* fix bug in tests/cpp/engine/threaded_engine_test

* add more testcases for MXEnginePushSyncND and MXEnginePushAsyncND

* fix test

* fix

* fix

* lint

* ci

* retrigger CI
wkcn added a commit that referenced this pull request Aug 8, 2019
#15792)

anirudhacharya pushed a commit to anirudhacharya/mxnet that referenced this pull request Aug 20, 2019
…#15751)

Labels: API change, Bug, pr-awaiting-merge (Review and CI is complete. Ready to Merge)
3 participants