
New optimisers interface #379

Merged: 25 commits into FluxML:master on Oct 31, 2018

Conversation

@DhairyaLGandhi (Member) commented Aug 28, 2018

Rebase of #283

# Base.convert(::Type{Param}, x::TrackedArray) = Param(x.data, x.grad)

end
end
Member:

Watch out for spurious whitespace changes here. Might be an editor settings issue.

@MikeInnes changed the title from "Rebase branch newoptimizer to master https://github.com/FluxML/Flux.jl/pull/283" to "New optimisers interface" on Aug 28, 2018
src/Flux.jl Outdated
export SGD, ADAM, ADAMW, AdaMax, Momentum, Nesterov,
RMSProp, ADAGrad, ADADelta, AMSGrad, NADAM
export Descent, ADAM, Momentum, Nesterov,
RMSProp, update!
Member:

I'd prefer not to export this; it's much too generic a name.

Member Author:

Got it.

@@ -385,6 +385,10 @@ end

using Requires

Base.Broadcast.cat_nested(t::Base.Broadcast.Broadcasted, rest...) = (cat_nested(t.args...)..., cat_nested(rest...)...)
Member:

Can you explain what this is for? It seems oddly general, rather than doing anything Flux or TrackedArray-specific, to be here.

Member Author:

These were to help flatten a nested broadcast, taken from the broadcast API. Will remove them, as they aren't necessary.

test/optimise.jl (resolved)
DescentWeightDecay(η = 1, γ = 0) = DescentWeightDecay(η, γ)
function update!(o::DescentWeightDecay, x, Δ)
η, γ = o.eta, o.gamma
@. x = x - η * (Δ + γ * x)
Member:

Seems like something went wrong here; we shouldn't be modifying x in the optimiser since the delta will get applied later on. Also, can't this be Compose(WeightDecay(), Descent())?
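
For illustration, a rough sketch of that composition idea, using hypothetical type names (MyChain stands in for Compose/Optimiser) rather than the PR's code: each rule only transforms and returns the gradient Δ, a chain folds the rules over Δ in order, and the caller applies the final delta.

struct MyDescent; eta::Float64; end
struct MyWeightDecay; wd::Float64; end
struct MyChain; opts::Vector{Any}; end                # stands in for Compose/Optimiser

update!(o::MyDescent, x, Δ)     = o.eta .* Δ          # scale the step
update!(o::MyWeightDecay, x, Δ) = Δ .+ o.wd .* x      # fold the decay term into Δ
update!(o::MyChain, x, Δ)       = foldl((d, rule) -> update!(rule, x, d), o.opts; init = Δ)

x, Δ = rand(3), rand(3)
opt = MyChain([MyWeightDecay(1e-4), MyDescent(0.1)])  # ~ Compose(WeightDecay(), Descent())
x .-= update!(opt, x, Δ)                              # the delta is applied by the caller, not the rule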

@. p.x = p.x - η * (p.Δ + γ * p.x)
@. p.Δ = 0
end
Descent(η = 0.1) = Descent(η)
Member:

Note that this method causes stack overflow.
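
For context, a minimal illustration of why this overflows (assuming a single Float64 eta field; the real struct may differ): the optional-argument definition lowers to a one-argument method Descent(η) that calls itself, replacing the converting default constructor.

mutable struct Descent
  eta::Float64
end

# Descent(η = 0.1) = Descent(η)   # lowers to Descent(η) = Descent(η), so e.g.
#                                 # Descent(1) recurses until the stack overflows

Descent() = Descent(0.1)          # one way to keep the default learning rate without the self-call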

@MikeInnes (Member):

This really is nearly there now, thanks for the hard work @dhairyagandhi96. The one last thing we need is updates for the train! function so that it works with the new optimiser setup.

Also, would be great to get docstrings for the new decay optimisers, and tweak anything in the main docs that is out of date. We should document the optimiser setup in more detail, but that can wait for now.

opt = opt
else
if opt isa ADAMW
opt = Optimiser(opt, DescentWeightDecay(1, decay))
Member:

Equivalent to WeightDecay(decay)?

# Train function
function train!(loss::Function, data, opt; cb = () -> ())
depwarn("train!(loss, data, opt; cb) is deprecated; use train!(model, data, loss, opt; cb) instead", :train)
train!(opt.ps, loss, data, opt.opt; cb = cb)
Member:

Ideally this would be compatible with any opt, not just the update rule defined above (and then the tests would not need to avoid () -> ()).

return Δ
end

# TODO: decay
Member:

I guess this is done now? :)

function update!(o::InvDecay, x, Δ)
γ, n = o.gamma, o.n
Δ .*= 1 / (1 + γ * n)
o.n += 1
Member:

Remember that the same optimiser object will be used in an update! with many different parameters. Right now n is counting the overall number of parameters in the model, not the number of times an individual parameter was seen. Any parameter-specific state should go in a dictionary.
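
A rough sketch of that suggestion (field names and the default value are assumptions, not necessarily what was merged): key the counter on the parameter object itself in an IdDict, so each parameter carries its own step count.

mutable struct InvDecay
  gamma::Float64
  state::IdDict{Any,Int}
end
InvDecay(γ = 0.001) = InvDecay(γ, IdDict{Any,Int}())

function update!(o::InvDecay, x, Δ)
  n = get(o.state, x, 1)           # steps seen for this particular parameter
  Δ .*= 1 / (1 + o.gamma * n)
  o.state[x] = n + 1
  return Δ
end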


WeightDecay(η = 1) = WeightDecay(η, 0)
function update!(o::WeightDecay, x, Δ)
η, wd = o.eta, o.wd
Member:

eta is not used here, I assume it can be removed


function update!(o::ExpDecay, x, Δ)
γ = o.gamma
@. Δ += γ * x
Member:

Right now this looks equivalent to weight decay
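
For contrast, a sketch of what decaying the step size itself could look like (hypothetical fields and default, reusing the per-parameter counter idea from above), as opposed to the γ * x term, which acts on the weights:

mutable struct ExpDecay
  decay::Float64
  state::IdDict{Any,Int}
end
ExpDecay(decay = 0.99) = ExpDecay(decay, IdDict{Any,Int}())

function update!(o::ExpDecay, x, Δ)
  n = get(o.state, x, 0)
  Δ .*= o.decay^n                  # the step shrinks geometrically with the step count
  o.state[x] = n + 1
  return Δ
end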

function update!(opt, xs)
for x in xs
x, Δ = data(x), grad(x)
Δ = update!(opt, x, Δ)
Member:

You should pass x without unwrapping it here, since that will be used as the key in any dictionaries.
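
A tiny self-contained illustration of why the key matters (the named tuple stands in for a tracked parameter; none of this is Flux code): state stored under the wrapper object is not found when looked up with the unwrapped data, so update! should always see the same object.

state = IdDict{Any,Int}()
p = rand(3)                       # stands in for a parameter's raw data
w = (data = p, grad = zero(p))    # stands in for the tracked wrapper around it

state[w] = 1                      # per-parameter state keyed on the wrapper...
haskey(state, w)                  # true
haskey(state, w.data)             # false: looking it up via the unwrapped data misses it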

@@ -35,7 +45,7 @@ function stop()
end

"""
train!(loss, data, opt)
train!(model, loss, data, opt)
Member:

I think we should have this be train!(loss, params, opt, data) and not try to call params for the user.
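
To make that call shape concrete, a toy, fully self-contained sketch of train!(loss, params, opt, data); the rule, the faked gradient, and all names are illustrative, not Flux's implementation. The point is only that the caller passes the parameter collection explicitly instead of train! calling params on a model.

descent(eta) = (p, g) -> eta .* g     # toy rule standing in for the PR's update!

function train!(loss, ps, opt, data; cb = () -> ())
  for d in data
    loss(d...)                        # forward pass; a real version would also
                                      # run the backward pass (back!) here
    for p in ps
      g = fill!(similar(p), 1)        # stand-in for grad(p), just so this runs
      p .-= opt(p, g)                 # the rule returns a delta; train! applies it
    end
    cb()
  end
end

W = rand(2, 3)
train!(x -> sum(W * x), [W], descent(0.1), [(rand(3),) for _ in 1:3])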

cb = runall(cb)
opt = runall(opt)
@progress for d in data
try
l = loss(d...)
@interrupts back!(l)
opt()
foreach(x -> x.data .-= update!(opt, x.data, x.grad), ps)
Member:

Wouldn't this be the time to use the update! you defined above?

for t = 1: 10^5
l = loss(rand(10))
back!(l)
delta = Optimise.update!(opt, w′.data, w′.grad)
Member:

also use the shorter update! method?

@CarloLucibello (Member):

Can we also choose names more consistently?

Now we have,

export Descent, ADAM, Momentum, Nesterov, RMSProp,
ADAGrad, AdaMax, ADADelta, AMSGrad, NADAM,
ADAMW, InvDecay, ExpDecay, WeightDecay, DescentWeightDecay

while I would go for:

export Descent, Adam, Momentum, Nesterov, RmsProp,
AdaGrad, AdaMax, AdaDelta, AmsGrad,  Nadam,
Adamw, InvDecay, ExpDecay, WeightDecay, DescentWeightDecay

@MikeInnes (Member):

It's a good idea, perhaps something we should do in a separate PR to avoid making this one more complex (we could also save the Descent change for that PR, but it doesn't really matter).

@MikeInnes MikeInnes merged commit 43c5f90 into FluxML:master Oct 31, 2018
@MikeInnes (Member):

This turned into a bit of a marathon but it's a great PR at this point. Thanks for all the great work @dhairyagandhi96!
