
Bug fix #15

Closed
jlindsey15 opened this issue Jul 8, 2020 · 2 comments

Comments

@jlindsey15

Hi! I noticed the code was recently updated to (among other things) fix a bug in the meta-gradient computation. Could you explain what the bug was, and what would be the minimal change needed to fix it, starting from the earlier version of the code? Thanks a bunch!

@khurramjaved96
Owner

Hi!

The first step is to set create_graph=True in the torch.autograd.grad call. Without this flag, the second-order gradients are never computed and the code falls back to the first-order approximation -- far from ideal.
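As a quick toy illustration of what the flag changes (illustrative only, not code from this repository): without create_graph, the gradient returned by torch.autograd.grad is detached from the graph, so no second-order information can flow through it.

import torch

w = torch.tensor([3.0], requires_grad=True)
loss = (w ** 2).sum()

(g1,) = torch.autograd.grad(loss, [w])                      # detached: g1 = 2*w as a plain tensor
(g2,) = torch.autograd.grad(loss, [w], create_graph=True)   # g2 = 2*w, still differentiable w.r.t. w

print(g1.requires_grad)  # False -> only first-order information survives
print(g2.requires_grad)  # True  -> second-order terms can be backpropagated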

However, setting create_graph=True in the old code would make the experiments prohibitively slow. This is because the old code computed gradients for the RLN parameters at every inner step even though those gradients are not needed. To fix that, compute inner-loop gradients only for the PLN network.

The two changes boil down to changing:

grad = torch.autograd.grad(loss, fast_weights)

to

grad = torch.autograd.grad(loss, self.net.get_adaptation_parameters(fast_weights), create_graph=True)

where get_adaptation_parameters removes meta-parameters that are fixed in the inner loop.

Computing the correct gradients significantly decreases the time to convergence on the two benchmarks.
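For completeness, here is a minimal, self-contained sketch of the corrected inner/outer step. It is illustrative only: the names rln_w, pln_w, and forward are assumptions, not this repository's code, where the RLN/PLN split is done by self.net.get_adaptation_parameters.

import torch
import torch.nn.functional as F

# Toy split: RLN weights (frozen in the inner loop) and PLN weights (adapted).
rln_w = torch.randn(16, 8, requires_grad=True)   # representation-learning network
pln_w = torch.randn(2, 16, requires_grad=True)   # prediction-learning network

def forward(x, rln, pln):
    h = torch.relu(x @ rln.t())   # RLN feature extractor
    return h @ pln.t()            # PLN head

x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
inner_lr = 0.01

# Inner step: differentiate only the PLN weights, keep the graph for the meta-gradient.
loss = F.cross_entropy(forward(x, rln_w, pln_w), y)
(grad_pln,) = torch.autograd.grad(loss, [pln_w], create_graph=True)
pln_fast = pln_w - inner_lr * grad_pln            # differentiable update; rln_w is untouched

# Outer step: the meta-loss through the adapted PLN weights yields second-order
# gradients for both rln_w and pln_w.
meta_loss = F.cross_entropy(forward(x, rln_w, pln_fast), y)
meta_loss.backward()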

@jlindsey15
Author

Thanks very much!
