
Use one Optimizer per feedback layer #6

Merged · 14 commits merged into master from optimizer_per_layer on Jan 8, 2022
Conversation

@lebrice (Collaborator) commented on Jan 6, 2022

This PR uses one optimizer per feedback layer in the PyTorch Lightning (PL) implementation of DTP.

ParallelDTP uses a single optimizer for all feedback layers.
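
For reference, here is a minimal sketch of what this looks like from a LightningModule's configure_optimizers() (names like feedback_net and feedback_lrs are made up here for illustration; the actual code lives in target_prop/models/dtp.py):

```python
# Minimal sketch only, not the actual target_prop code: one feedback optimizer
# per feedback layer, returned as a list from configure_optimizers().
from typing import List

from pytorch_lightning import LightningModule
from torch import nn
from torch.optim import Adam


class DTPSketch(LightningModule):
    def __init__(self, feedback_net: nn.ModuleList, feedback_lrs: List[float]):
        super().__init__()
        self.feedback_net = feedback_net  # one feedback (G) module per forward layer
        self.feedback_lrs = feedback_lrs  # one learning rate per feedback layer

    def configure_optimizers(self):
        # One optimizer per feedback layer. (The real model also sets up the
        # forward-network optimizer; that part is omitted here for brevity.)
        return [
            Adam(layer.parameters(), lr=lr)
            for layer, lr in zip(self.feedback_net, self.feedback_lrs)
        ]
```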

@amoudgl Let me know what you think.

Also: the ordering of the hyper-parameters for the feedback network (lr, iterations, noise scales, etc.) in my implementation is a bit annoying to work with, now that I look at it.

  • It would probably be a lot simpler to just have self.F and self.G (where x_r = self.G[i](self.F[i](x))).
  • The only real downside of doing it this way is that self.G wouldn't be usable end-to-end like a regular nn.Sequential (i.e. self.G(self.F(x)) through the entire network wouldn't work); see the sketch after this list.
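
A toy sketch of that layout (illustrative layer sizes and names, not the actual target_prop modules), showing why per-layer use works but end-to-end use does not:

```python
import torch
from torch import nn

# Hypothetical layout: F[i] is the i-th forward layer, G[i] its feedback layer,
# so x_r = G[i](F[i](x)) holds layer by layer.
F = nn.ModuleList([nn.Linear(32, 64), nn.Linear(64, 128)])
G = nn.ModuleList([nn.Linear(64, 32), nn.Linear(128, 64)])

x = torch.randn(8, 32)
h = F[0](x)
x_r = G[0](h)  # per-layer reconstruction: works fine

# But G is not callable end-to-end: nn.ModuleList has no forward(), and the
# feedback layers would have to run in reverse order anyway, so G(F(x)) through
# the whole network would not work the way a regular nn.Sequential does.
```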

@lebrice requested a review from amoudgl on January 6, 2022 at 00:38
@amoudgl (Collaborator) commented on Jan 6, 2022

@lebrice I have taken a pass and added a few comments. Once these comments are resolved, we'll have a bug-free parallel DTP implementation that Sean can use for hyperparam tuning.

I'll update the legacy tests and push changes in this branch soon.

Review comments on target_prop/models/dtp.py and target_prop/models/parallel_dtp.py (resolved).
@amoudgl (Collaborator) commented on Jan 6, 2022

Also: the ordering of the hyper-parameters for the feedback network (lr, iterations, noise scales, etc.) in my implementation is a bit annoying to work with, now that I look at it.

I agree, but I think we should keep it this way for now because it's easier to test/validate and save models when we can use the existing codebase's structure off the shelf.

Review comment on target_prop/models/dtp.py (resolved).

@amoudgl (Collaborator) commented on Jan 6, 2022

@lebrice I updated Maxence's legacy implementation to use a separate feedback optimizer per layer in c566066 and updated the tests accordingly -- all legacy unit tests now pass.
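
For context, a rough sketch of what "one feedback optimizer per layer" means in a plain training loop (illustrative names and shapes only, not the code from c566066):

```python
import torch
import torch.nn.functional as F
from torch import nn

# Illustrative feedback layers; the real change is in the legacy DTP training code.
feedback_layers = nn.ModuleList([nn.Linear(64, 32), nn.Linear(128, 64)])
feedback_optimizers = [
    torch.optim.Adam(layer.parameters(), lr=1e-3) for layer in feedback_layers
]

# Placeholder activations and layer inputs standing in for a real forward pass.
activations = [torch.randn(8, 64), torch.randn(8, 128)]
inputs = [torch.randn(8, 32), torch.randn(8, 64)]

for layer, opt, h, x in zip(feedback_layers, feedback_optimizers, activations, inputs):
    opt.zero_grad()
    # Placeholder reconstruction loss for this layer (the real feedback loss also
    # involves noise injection, per the noise-scale hyper-parameters above).
    loss = F.mse_loss(layer(h), x)
    loss.backward()
    opt.step()  # updates only this layer's feedback weights
```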

lebrice and others added 13 commits on January 7, 2022 at 14:01 (commit messages collapsed; several are signed off by Fabrice Normandin <fabrice.normandin@gmail.com>, and one is cherry-picked from 82d57258a1036bcf304feccdfde7fcb47f70d88e).
@amoudgl force-pushed the optimizer_per_layer branch from a9ce179 to daf5072 on January 7, 2022 at 21:10
@lebrice merged commit 774d559 into master on Jan 8, 2022
@lebrice deleted the optimizer_per_layer branch on June 8, 2022