SLS and parameter groups for larger datasets? #3

Open
lessw2020 opened this issue Dec 15, 2019 · 13 comments

@lessw2020

I'm hitting an issue in using/testing it, though: the code seems to assume there are no parameter groups.
(from utils.py)

```python
def get_grad_list(params):
    return [p.grad for p in params]
```

This fails because p.grad lives inside each param group, i.e.:

```python
for group in self.param_groups:
    for p in group["params"]:
        if p.grad is None:   # <-- at this level p.grad is accessible
            continue
```

Is there a way to adjust this to handle parameter groups? I'm trying to integrate it into FastAI, which by default creates two param groups. I'll see if I can avoid that, but I think param groups are quite common in most frameworks, so any tips here would be appreciated.
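
For illustration, a minimal sketch of a param-group-aware version of that helper (a hypothetical adaptation, not the repo's code):

```python
def get_grad_list(param_groups):
    # Walk every param group, mirroring how torch.optim optimizers
    # iterate over self.param_groups, and collect each tensor's gradient.
    grads = []
    for group in param_groups:
        for p in group["params"]:
            grads.append(p.grad)
    return grads
```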

@lessw2020
Author

So FastAI creates two param groups to split out the l1 and l2 params... I've made a temporary function to avoid that:

```python
def filter_all_params_no_split(layer_groups: Collection[nn.Module]) -> List[List[nn.Parameter]]:
    # Collect every trainable parameter into a single group instead of
    # letting FastAI split them into multiple param groups.
    pure = []
    buffer = []
    for l in layer_groups:
        for c in l.children():
            buffer += list(trainable_params(c))
    pure += [uniqueify(buffer)]
    return pure
```
Though I'm now hitting other issues inside SLS, I still think it's vital that SLS be able to handle param groups, as that's the default for most optimizer code...

@IssamLaradji
Owner

You are right that we should include param groups to be consistent with other optimizers. We will add that by the end of this week. Thanks for pointing this out!

@lessw2020
Author

Hi @IssamLaradji
That's great to hear! I'm hoping to get it set up so your SLS can be fully integrated with FastAI2 and thus be readily available as an optimizer choice, which should help promote SLS. There is some tuning to be done, as FastAI2 by default does not expose a closure, wants to call loss.backward() itself, etc., but I'm hoping I can get that set up and integrated.
I'd also love to use SLS on two projects I'm consulting on, so having the multi-param-group handling will definitely move that forward.
Thanks again, and if you need any testing on the param-group implementation let me know, as I have a modified FastAI version largely set up to work with SLS already, so I can run on ImageWoof/Nette etc. for fast testing.

@IssamLaradji
Owner

Thanks a lot.

I added param_groups, let me know how that works for you! thanks :)

@lessw2020
Author

Excellent - testing it now!

@lessw2020
Author

It's handling the param groups in the sense that it doesn't blow up like before.
However, it's not actually learning anything (the loss ends up the same as random, i.e. 10 classes = 10% accuracy).
I'm doing some debugging now... try_sgd_step is being called, so it's passing back a step size, etc., but the step size doesn't ultimately seem to change, so it's not clear the weights are actually being updated.
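
A generic way to check whether a step actually moves the weights (a debugging sketch, not SLS-specific; it assumes the optimizer's step takes a closure, as SLS's does):

```python
import torch

def max_weight_change(model, optimizer, closure):
    # Snapshot the parameters, take one optimizer step, then measure how
    # far each parameter moved; 0.0 means nothing was updated.
    before = [p.detach().clone() for p in model.parameters()]
    optimizer.step(closure)
    deltas = [torch.norm(p.detach() - b).item()
              for p, b in zip(model.parameters(), before)]
    return max(deltas)
```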

@lessw2020
Author

[screenshot: sls_not_learning]

```
Layer Groups Len 1
Len Split_params = 2
Opt results 1 Sls (
Parameter Group 0
    beta_b: 0.9
    beta_f: 2.0
    bound_step_size: True
    c: 0.1
    eta_max: 10
    gamma: 2.0
    init_step_size: 1
    line_search_fn: armijo
    lr: 0
    n_batches_per_epoch: 388
    reset_option: 1

Parameter Group 1
    beta_b: 0.9
    beta_f: 2.0
    bound_step_size: True
    c: 0.1
    eta_max: 10
    gamma: 2.0
    init_step_size: 1
    line_search_fn: armijo
    lr: 0
    n_batches_per_epoch: 388
    reset_option: 1
)
Opt results 2 OptimWrapper over Sls (
Parameter Group 0
    beta_b: 0.9
    beta_f: 2.0
    bound_step_size: True
    c: 0.1
    eta_max: 10
    gamma: 2.0
    init_step_size: 1
    line_search_fn: armijo
    lr: 0
    n_batches_per_epoch: 388
    reset_option: 1

Parameter Group 1
    beta_b: 0.9
    beta_f: 2.0
    bound_step_size: True
    c: 0.1
    eta_max: 10
    gamma: 2.0
    init_step_size: 1
    line_search_fn: armijo
    lr: 0
    n_batches_per_epoch: 388
    reset_option: 1
).
True weight decay: False
```

@lessw2020
Author

I'll pick it up again tomorrow and try to isolate it more. I can't tell exactly where it's failing at this point, but at least it's now running in FastAI with param groups, whereas I couldn't get it running at all earlier :)

@IssamLaradji
Owner

IssamLaradji commented Dec 24, 2019

Oh, thanks for testing. Could you pass me the script you used to reproduce the figure you generated with learn.fit? The issue is probably that I am not registering step_size for every param_group.
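
To illustrate that hypothesis, here is a toy sketch of per-group step-size bookkeeping in a PyTorch optimizer (a deliberately simplified SGD-style example, not the actual SLS code; it does no line search):

```python
import torch

class ToyPerGroupOptimizer(torch.optim.Optimizer):
    # Toy example: each param group carries its own step size, so the
    # update for that group actually uses (and could later adjust) it.
    def __init__(self, params, init_step_size=1.0):
        super().__init__(params, dict(init_step_size=init_step_size))
        for group in self.param_groups:
            group["step_size"] = group["init_step_size"]

    def step(self, closure):
        loss = closure()  # this toy's closure computes the loss and calls backward()
        with torch.no_grad():
            for group in self.param_groups:
                for p in group["params"]:
                    if p.grad is not None:
                        p.add_(p.grad, alpha=-group["step_size"])
        return loss
```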

@lessw2020
Author

Hi @IssamLaradji - here's a relevant snippet, though I'm not sure how much it will help. I had to make changes to three different FastAI files to get SLS to run, since FastAI doesn't expect to provide a closure, to skip calling loss.backward() itself, etc.
```python
optar = partial(Sls, c=0.1, n_batches_per_epoch=n_epochs)  # ,acceleration_method="polyak")
model = mxresnet50(c_out=10, sa=1)  # MixNet(input_size=256)
learn = Learner(data, model,
                metrics=[accuracy],
                wd=None,
                opt_func=optar,
                bn_wd=False,
                true_wd=False,
                loss_func=LabelSmoothingCrossEntropy())
learn.fit(2, 4e-3)
```

If you have TeamViewer, maybe we can do a quick call on Thursday and I can walk you through the whole thing? (I'm in Seattle, WA, PST.)
Otherwise, I can debug more on Thursday and try to pin it down further. I may also simplify things and set it up in FastAI 1.9, which is nearly the FastAI 2.0 structure, and run a basic resnet to reduce the moving parts involved.
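
For context, the closure-based training step that line-search optimizers like SLS expect looks roughly like this (a sketch with placeholder names model, loss_fn, train_loader; it assumes opt.step(closure) re-evaluates the closure internally during the line search, so check the SLS README for the exact closure contract):

```python
for images, labels in train_loader:
    opt.zero_grad()

    def closure():
        # Re-evaluates the loss on the current batch; the line search may
        # call this several times at different trial step sizes.
        return loss_fn(model(images), labels)

    loss = opt.step(closure)
```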

@IssamLaradji
Owner

Hi @lessw2020, sorry, I am out of town and will be back later this week. We can use TeamViewer this coming Monday if you like! On another note, does FastAI implement LBFGS? LBFGS requires a closure to perform the line search, just like SLS.
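
For comparison, PyTorch's built-in LBFGS already uses the same closure pattern (standard torch.optim.LBFGS usage; model, loss_fn, x, y are placeholders):

```python
import torch

opt = torch.optim.LBFGS(model.parameters(), lr=0.1)

def closure():
    # LBFGS may call this several times per step to re-evaluate the loss.
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    return loss

opt.step(closure)
```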

@lessw2020
Author

Hi @IssamLaradji - Monday works great. FastAI does not have LBFGS... I've had some discussions with Jeremy about how FastAI v2 can support optimizers like SLS, AliG, etc. that require passing in a loss or a closure, and I'm hoping to use SLS to drive those changes in the framework.
I'll try to send you a PM on Facebook with my direct contact info.

@IssamLaradji
Owner

IssamLaradji commented Dec 29, 2019

Thanks @lessw2020 , let's correspond there on Facebook :)
