Use one Optimizer per feedback layer #6
Conversation
@lebrice I have taken a pass and added a few comments. Once these comments are resolved, we'll have a bug-free parallel DTP implementation that Sean can use for hyperparam tuning. I'll update the legacy tests and push changes in this branch soon.
I agree, but I think we should keep it this way for now because it's easier to test/validate and save models when you can reuse the existing codebase's structure off the shelf.
Uses one optimizer per feedback layer in the PL implementation of DTP.
ParallelDTP uses a single optimizer for all feedback layers.
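For context, here is a minimal sketch of what "one optimizer per feedback layer" could look like in a LightningModule's `configure_optimizers`. The attribute names (`forward_net`, `feedback_net`, `feedback_lrs`, `forward_lr`) are assumptions for illustration, not the actual fields of the DTP class in this repo.

```python
import torch
from torch import nn
from pytorch_lightning import LightningModule


class DTPSketch(LightningModule):
    """Illustrative only: attribute names are assumptions, not the real DTP class."""

    def __init__(self, forward_net: nn.Sequential, feedback_net: nn.ModuleList,
                 feedback_lrs: list, forward_lr: float = 0.01):
        super().__init__()
        self.forward_net = forward_net
        self.feedback_net = feedback_net    # one feedback module per forward layer
        self.feedback_lrs = feedback_lrs    # one learning rate per feedback layer
        self.forward_lr = forward_lr

    def configure_optimizers(self):
        # One optimizer per feedback layer, plus one for the forward weights.
        feedback_optimizers = [
            torch.optim.Adam(g.parameters(), lr=lr)
            for g, lr in zip(self.feedback_net, self.feedback_lrs)
        ]
        forward_optimizer = torch.optim.SGD(self.forward_net.parameters(), lr=self.forward_lr)
        return feedback_optimizers + [forward_optimizer]
```

A ParallelDTP-style variant would instead return a single optimizer built over all feedback parameters at once.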
@amoudgl Let me know what you think.
Also: the ordering of the hyper-parameters for the feedback network (lr, iterations, noise scales, etc.) in my implementation is a bit annoying to work with, now that I look at it.
x_r = self.G[i](self.F[i](x))
self.G wouldn't be usable end-to-end like a regular nn.Sequential (i.e. self.G(self.F(x)) through the entire network wouldn't work).
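To make that constraint concrete, here is a small sketch with hypothetical layer definitions (not the actual F/G modules from this repo): each G[i] maps the output space of F[i] back to its input space, so the feedback modules only make sense when paired layer-by-layer with the forward ones.

```python
import torch
from torch import nn

# Hypothetical forward net F and its per-layer feedback net G (G[i] inverts F[i]).
F = nn.ModuleList([nn.Linear(32, 64), nn.Linear(64, 128)])
G = nn.ModuleList([nn.Linear(64, 32), nn.Linear(128, 64)])

x = torch.randn(8, 32)

# Per-layer reconstruction, as in the snippet above: this works.
for i in range(len(F)):
    x_r = G[i](F[i](x))      # G[i] maps F[i]'s output back to its input space
    assert x_r.shape == x.shape
    x = F[i](x)              # move on to the next layer's input

# A single end-to-end call like G(F(x)) cannot work here: G[0] expects a
# 64-dim input, but the full forward pass produces 128-dim activations, and
# the feedback modules would have to be applied in reverse order anyway.
```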