Generalized constraints: post update hooks #1214

Open
albertz opened this issue Nov 14, 2022 · 0 comments

Currently our implemented constraints are:

  • L2 on weights (L2 option on a layer)
  • Some exotic things on activations (darc1, spatial_smoothing)

We already have the possibility to decouple the constraints from the normal loss computation via decouple_constraints. In #1206, this behavior will change a bit: it will then decouple only the data-independent constraints, which currently means only L2.

L2 is equivalent to weight decay when SGD is used. With the new decoupled constraints code (#1206), it explicitly does:

                return var.assign_sub(var * (l2 * 2.), use_locking=self.use_locking, read_value=False)
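To make the relation between the L2 loss term and the explicit decay update concrete, here is a minimal sketch on a single scalar weight in plain Python. It assumes plain SGD, assumes the decay is applied after the gradient step, and assumes the decay factor is scaled by the learning rate (how the learning rate enters in the actual code is not shown in this issue). The function names are made up for illustration.

```python
def sgd_step_with_l2_loss(w, grad, lr, l2):
    """Plain SGD where the term l2 * w**2 is part of the loss.

    The gradient of l2 * w**2 with respect to w is 2 * l2 * w.
    """
    return w - lr * (grad + 2.0 * l2 * w)


def sgd_step_then_decay(w, grad, lr, l2):
    """Plain SGD on the data loss only, followed by a separate decay update."""
    w = w - lr * grad
    # Mirrors var.assign_sub(var * (l2 * 2.)); the extra lr factor is an assumption.
    return w - w * (lr * l2 * 2.0)


w, grad, lr, l2 = 1.5, 0.4, 0.01, 0.1
coupled = sgd_step_with_l2_loss(w, grad, lr, l2)
decoupled = sgd_step_then_decay(w, grad, lr, l2)
# The two differ only by the second-order term 2 * lr**2 * l2 * grad.
diff = abs(coupled - decoupled)
```

Under these assumptions the two variants agree up to a term of order lr squared, which comes from applying the decay to the already updated weight.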

We can generalize such updates, and allow the user to perform some generic post updates on parameters.

For example, in rwth-i6/returnn_common#241 it was suggested to extend L2 with a decay_center option. But instead of adding such an L2-specific option, we can allow the user to perform any custom post update, similar to the code above. The user could then easily implement such decay_center logic, and many other things as well.
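As a rough illustration of what such generic post updates could cover, the following sketch models hooks as plain Python callables on scalar parameters. Everything here is hypothetical and not an existing RETURNN API: the `PostUpdate` type, `centered_decay`, `clip_to_range`, and `apply_post_updates` are names invented for this example, and real parameters would be tensors rather than floats.

```python
from typing import Callable, Dict, List

# Hypothetical type: a hook maps the freshly updated value to the final value.
PostUpdate = Callable[[float], float]


def centered_decay(l2: float, decay_center: float = 0.0) -> PostUpdate:
    """Decay toward decay_center; decay_center=0 gives the plain L2 decay update."""
    def hook(value: float) -> float:
        return value - (value - decay_center) * (l2 * 2.0)
    return hook


def clip_to_range(low: float, high: float) -> PostUpdate:
    """A post update that is unrelated to L2, to show the hook is generic."""
    def hook(value: float) -> float:
        return min(max(value, low), high)
    return hook


def apply_post_updates(
    params: Dict[str, float], hooks: Dict[str, List[PostUpdate]]
) -> Dict[str, float]:
    """Run each parameter's hooks in order, after the optimizer step."""
    result = dict(params)
    for name, param_hooks in hooks.items():
        for hook in param_hooks:
            result[name] = hook(result[name])
    return result


params = {"scale": 1.2, "bias": 3.0}
hooks = {
    "scale": [centered_decay(l2=0.05, decay_center=1.0)],
    "bias": [centered_decay(l2=0.05), clip_to_range(-2.0, 2.0)],
}
updated = apply_post_updates(params, hooks)
```

Here `scale` is pulled toward 1.0 instead of 0, while `bias` gets the usual decay followed by clipping, neither of which needs a dedicated L2 option.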

Also related: rwth-i6/returnn_common#90

What would the API look like on the RETURNN side? It may also be OK to only do this for the VariableLayer.
