-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Improve sparse pull performance for gluon trainer #11429
Conversation
@junrushao1994 pls help review engine code. Thanks! |
The engine code looks good to me |
update_on_kvstore = False | ||
if 'dist' in kvstore.type: | ||
# kv.pull(row_sparse_grad) is not supported for dist kvstore | ||
update_on_kvstore = self._contains_sparse_weight or self._contains_sparse_grad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the comment I'm guessing you meant not self._contains_sparse_weight and not self._contains_sparse_grad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intended. kv.pull(row_sparse_grad) is not supported for dist kvstore, so we want to set update_on_kvstore = True if there's sparse grad.
how does the performance look? @eric-haibin-lin @leezu |
@szha For dense weight with sparse grad, this PR pulls sparse grad instead of dense weight. Reduces gpu2gpu copy time from 60ms to less than 1ms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
include/mxnet/c_api.h
Outdated
* \return 0 when success, -1 when failure happens | ||
*/ | ||
MXNET_DLL int MXKVStorePull(KVStoreHandle handle, | ||
mx_uint num, | ||
const int* keys, | ||
NDArrayHandle* vals, | ||
int priority); | ||
int priority, | ||
bool ignore_sparse = true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C API doesn't support default value
include/mxnet/c_api.h
Outdated
* \return 0 when success, -1 when failure happens | ||
*/ | ||
MXNET_DLL int MXKVStorePullEx(KVStoreHandle handle, | ||
mx_uint num, | ||
const char** keys, | ||
NDArrayHandle* vals, | ||
int priority); | ||
int priority, | ||
bool ignore_sparse = true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added extra CAPIs instead of adding default value to this one
include/mxnet/engine.h
Outdated
@@ -84,6 +84,8 @@ enum class FnProperty { | |||
kCopyToGPU, | |||
/*! \brief Prioritized sync operation on CPU */ | |||
kCPUPrioritized, | |||
/*! \brief Prioritized sync operation on GPU */ | |||
kGPUPrioritized, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add it at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to the end.
raise RuntimeError("Cannot set update_on_kvstore to False when sparse " | ||
"gradients and/or sparse weights are present for " | ||
"Parameter '%s'."%param.name) | ||
raise RuntimeError("Cannot set update_on_kvstore to False when sparse weights " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have to be a error, or can it be a warning and automatically use update_on_kvstore ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, shouldn't this be outside the if contains_sparse_weight condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default update_on_kvstore is None. It's only set if user provides a value on purpose. I think an explicit err is better, since we cannot satisfy user's original intent.
If user set update_on_kvstore to False and the model contains no sparse weight, it's totally fine. Why should this be outside the if condition?
For `RowSparseNDArray` values, this call is ignored, | ||
please use ``row_sparse_pull`` instead. | ||
pull with `RowSparseNDArray` is not supported for dist kvstore. | ||
Please use ``row_sparse_pull`` instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should ignore_sparse be defaulted to false to be consistent with previous behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous behavior is to always ignore sparse. So it's consistent
@piiswrong @rahul003 pls review again, thanks. |
* clip sparse grad. fix _reduce for rowsparse param * fix kvstore init for local kv * trigger * pull with ignore sparse * rsp pull with priority * add doc; * fix bug in sparse kvstore * +kvstore test * add dist kvstore test * enhance dist kv test * fix lint * fix lint * CR comments
Description
@leezu @rahul003
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments