
Support standard optimizer with sparse gradient #9177

Closed
eric-haibin-lin opened this issue Dec 22, 2017 · 0 comments

Per @mg0880gm's request:

Operators such as dot and sparse_embedding generate row_sparse gradients, and one can use SGD with momentum or Adam as the optimizer. The problem with these optimizers is that only lazy update is supported: the states (momentum in SGD, m & v in Adam) are updated only for the rows whose indices appear in the gradient of the current batch, whereas the standard optimizer updates all rows of the states.
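For illustration, here is a minimal NumPy sketch (not MXNet code) contrasting the two rules for SGD with momentum. The helper names `standard_momentum_update` and `lazy_momentum_update` and the momentum convention are my own, chosen only to make the difference explicit.

```python
import numpy as np

def standard_momentum_update(weight, mom, grad_rows, grad_vals, lr=0.1, momentum=0.9):
    """Standard rule: every row of the momentum state decays each step, even
    rows whose gradient is zero in this batch, and every row of the weight
    moves by its momentum."""
    mom *= momentum
    mom[grad_rows] -= lr * grad_vals
    weight += mom

def lazy_momentum_update(weight, mom, grad_rows, grad_vals, lr=0.1, momentum=0.9):
    """Lazy rule: only rows whose indices appear in the row_sparse gradient
    are touched; all other rows of the state and the weight stay unchanged."""
    mom[grad_rows] = momentum * mom[grad_rows] - lr * grad_vals
    weight[grad_rows] += mom[grad_rows]
```

The two rules only diverge when some rows are absent from the current batch's gradient: the standard rule keeps decaying (and applying) their momentum, while the lazy rule leaves them frozen.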

Therefore, a user cannot use a sparse gradient to perform the standard update in MXNet right now, which makes it harder to adopt sparse operators with existing models because the update rule differs.

To support the standard use case, we can add a lazy_update parameter to the optimizers and updater operators, which performs the lazy update only if lazy_update=True, weight.stype='row_sparse', and grad.stype='row_sparse'. If lazy_update=False, or if weight/grad is dense, the standard update is applied.
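As a rough sketch of what this could look like from the Python optimizer API (the flag name and its placement follow the proposal in this issue and are not an existing MXNet API at the time of writing):

```python
import mxnet as mx

# Proposed: force the standard update even when the gradient is row_sparse.
opt_standard = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9, lazy_update=False)

# Proposed: lazy update, applied only when lazy_update=True and both
# weight.stype and grad.stype are 'row_sparse'; otherwise the standard
# update would be used.
opt_lazy = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9, lazy_update=True)
```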
