Use one Optimizer per feedback layer #6
Conversation
@lebrice I have taken a pass and added a few comments. Once these comments are resolved, we'll have a bug-free parallel DTP implementation that Sean can use for hyperparam tuning. I'll update the legacy tests and push changes in this branch soon.
I agree, but I think we should keep it this way for now because it's easier to test/validate and save models when you can reuse the existing codebase's structure off the shelf.
Uses one optimizer per feedback layer in the PL implementation of DTP.
ParallelDTP uses a single optimizer for all feedback layers.
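For context, here is a minimal sketch of what "one optimizer per feedback layer" could look like in a LightningModule's `configure_optimizers`. The attribute names (`forward_net`, `feedback_net`, `feedback_lrs`, `forward_lr`) are assumptions for illustration, not the actual fields of the DTP class in this repo.

```python
import torch
from torch import nn
from pytorch_lightning import LightningModule


class DTPSketch(LightningModule):
    """Illustrative only: attribute names are assumptions, not the real DTP class."""

    def __init__(self, forward_net: nn.Sequential, feedback_net: nn.ModuleList,
                 feedback_lrs: list, forward_lr: float = 0.01):
        super().__init__()
        self.forward_net = forward_net
        self.feedback_net = feedback_net    # one feedback module per forward layer
        self.feedback_lrs = feedback_lrs    # one learning rate per feedback layer
        self.forward_lr = forward_lr

    def configure_optimizers(self):
        # One optimizer per feedback layer, plus one for the forward weights.
        feedback_optimizers = [
            torch.optim.Adam(g.parameters(), lr=lr)
            for g, lr in zip(self.feedback_net, self.feedback_lrs)
        ]
        forward_optimizer = torch.optim.SGD(self.forward_net.parameters(), lr=self.forward_lr)
        return feedback_optimizers + [forward_optimizer]
```

A ParallelDTP-style variant would instead return a single optimizer built over all feedback parameters at once.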
@amoudgl Let me know what you think.
Also: the ordering of the hyper-parameters for the feedback network (lr, iterations, noise scales, etc.) in my implementation is a bit annoying to work with, now that I look at it.
x_r = self.G[i](self.F[i](x))
self.G wouldn't be usable end-to-end like a regular nn.Sequential (i.e. self.G(self.F(x)) through the entire network wouldn't work).
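To make that constraint concrete, here is a small sketch with hypothetical layer definitions (not the actual F/G modules from this repo): each G[i] maps the output space of F[i] back to its input space, so the feedback modules only make sense when paired layer-by-layer with the forward ones.

```python
import torch
from torch import nn

# Hypothetical forward net F and its per-layer feedback net G (G[i] inverts F[i]).
F = nn.ModuleList([nn.Linear(32, 64), nn.Linear(64, 128)])
G = nn.ModuleList([nn.Linear(64, 32), nn.Linear(128, 64)])

x = torch.randn(8, 32)

# Per-layer reconstruction, as in the snippet above: this works.
for i in range(len(F)):
    x_r = G[i](F[i](x))      # G[i] maps F[i]'s output back to its input space
    assert x_r.shape == x.shape
    x = F[i](x)              # move on to the next layer's input

# A single end-to-end call like G(F(x)) cannot work here: G[0] expects a
# 64-dim input, but the full forward pass produces 128-dim activations, and
# the feedback modules would have to be applied in reverse order anyway.
```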