Operators such as `dot` and `sparse_embedding` generate row_sparse gradients, and one can use SGD with momentum or Adam as the optimizer. The problem with these optimizers is that only lazy update is supported: the states (momentum in SGD, m and v in Adam) are updated only if their row indices appear in the gradient of the current batch, whereas a standard optimizer updates all rows of the states.
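To make the distinction concrete, here is a minimal sketch in plain NumPy (not MXNet's actual kernels) contrasting the two rules for SGD with momentum; weight decay and gradient rescaling are omitted:

```python
import numpy as np

def standard_momentum_update(weight, momentum, grad_rows, grad_vals,
                             lr=0.1, mom=0.9):
    """Standard rule: every row's momentum is decayed on every step."""
    momentum *= mom                        # touches ALL rows
    momentum[grad_rows] -= lr * grad_vals  # rows present in this batch
    weight += momentum                     # every row with nonzero momentum moves

def lazy_momentum_update(weight, momentum, grad_rows, grad_vals,
                         lr=0.1, mom=0.9):
    """Lazy rule: rows absent from this batch's gradient are untouched."""
    momentum[grad_rows] = mom * momentum[grad_rows] - lr * grad_vals
    weight[grad_rows] += momentum[grad_rows]
```

Under the lazy rule, a row that never appears in a batch keeps its stale momentum indefinitely, which is why the two rules converge to different results.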
Therefore, a user cannot currently use sparse gradients to perform standard updates in MXNet, which makes it harder to adopt sparse operators with existing models, because the update rule is different.
To support the standard use case, we can add a `lazy_update` parameter to the optimizer and updater operators, which performs a lazy update only if `lazy_update=True`, `weight.stype=row_sparse`, and `grad.stype=row_sparse`. If `lazy_update=False`, or if weight/grad is dense, the standard update is applied.
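If adopted, usage might look like the following sketch; the `lazy_update` argument is the proposed addition, while the rest is the existing MXNet optimizer API:

```python
import mxnet as mx

# Proposed: opt into lazy behaviour explicitly. With lazy_update=False
# (or with dense weight/grad), the standard rule would apply even when
# the gradient is row_sparse.
lazy_opt = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9,
                            lazy_update=True)
std_opt = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9,
                           lazy_update=False)
```

Defaulting the flag to the current lazy behaviour would keep existing sparse training scripts unchanged while letting users opt into the standard rule.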