
Support standard optimizer with sparse gradient #9177

Closed
eric-haibin-lin opened this issue Dec 22, 2017 · 0 comments

Per @mg0880gm's request:

Operators such as dot and sparse_embedding generate row_sparse gradients, and one can use SGD with momentum or Adam as the optimizer. The problem with these optimizers is that only lazy update is supported: the states (momentum in SGD, m & v in Adam) are updated only for the rows whose indices appear in the gradient of the current batch, whereas the standard optimizer updates all rows of the states.
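For illustration, here is a minimal NumPy sketch (not MXNet code) contrasting the two rules for SGD with momentum. The helper names `standard_momentum_update` and `lazy_momentum_update` and the momentum convention are my own, chosen only to make the difference explicit.

```python
import numpy as np

def standard_momentum_update(weight, mom, grad_rows, grad_vals, lr=0.1, momentum=0.9):
    """Standard rule: every row of the momentum state decays each step, even
    rows whose gradient is zero in this batch, and every row of the weight
    moves by its momentum."""
    mom *= momentum
    mom[grad_rows] -= lr * grad_vals
    weight += mom

def lazy_momentum_update(weight, mom, grad_rows, grad_vals, lr=0.1, momentum=0.9):
    """Lazy rule: only rows whose indices appear in the row_sparse gradient
    are touched; all other rows of the state and the weight stay unchanged."""
    mom[grad_rows] = momentum * mom[grad_rows] - lr * grad_vals
    weight[grad_rows] += mom[grad_rows]
```

The two rules only diverge when some rows are absent from the current batch's gradient: the standard rule keeps decaying (and applying) their momentum, while the lazy rule leaves them frozen.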

Therefore, a user cannot use a sparse gradient to perform the standard update in MXNet right now, which makes it harder to adopt sparse operators with existing models because the update rule differs.

To support the standard use case, we can add a lazy_update parameter to the optimizers and updater operators, which performs the lazy update only if lazy_update=True, weight.stype='row_sparse', and grad.stype='row_sparse'. If lazy_update=False, or if weight/grad is dense, the standard update is applied.
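As a rough sketch of what this could look like from the Python optimizer API (the flag name and its placement follow the proposal in this issue and are not an existing MXNet API at the time of writing):

```python
import mxnet as mx

# Proposed: force the standard update even when the gradient is row_sparse.
opt_standard = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9, lazy_update=False)

# Proposed: lazy update, applied only when lazy_update=True and both
# weight.stype and grad.stype are 'row_sparse'; otherwise the standard
# update would be used.
opt_lazy = mx.optimizer.SGD(learning_rate=0.1, momentum=0.9, lazy_update=True)
```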
