
add ADAM #3

Merged
merged 7 commits into master from dg/adam on Nov 27, 2020

Conversation

DhairyaLGandhi
Member

Also adds AbstractOptimiser, which lets us keep an API familiar from Flux while adding new optimisers.

I also think we should have a better name for init; I would imagine state(opt) is a better-suited function for it.

@DhairyaLGandhi
Member Author

init to my ear sounds like it should return an initialised instance of the optimiser, but since what we are initialising is the state, a rename would be awesome.

src/rules.jl Outdated
abstract type AbstractOptimiser end

(opt::AbstractOptimiser)(x, x̂, state) = update(opt, x, x̂, state)
(opt::AbstractOptimiser)(m, m̂) = update(opt, m, m̂, state(opt, m))[1]
Member

This method seems a bit sketchy to me. It makes sense for Descent but for everything else seems like it risks misleading people (eg they think everything's working but they are actually using ADAM without state). So maybe it's better as a special case on Descent.
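To illustrate, here is a minimal sketch of the Descent-only special case suggested here. The struct layout and the `update` signature follow the diff above, but the field name `eta` and the exact method bodies are assumptions, not the PR's code:

```julia
# Sketch: only Descent gets a stateless call method, since plain gradient
# descent carries no per-parameter state that could silently be dropped.
struct Descent
    eta::Float64
end

# The general rule: take the current state, return (new parameter, new state).
update(o::Descent, x, x̄, st) = (x .- o.eta .* x̄, st)

# Stateless convenience is safe here, unlike for ADAM, because Descent's
# state is trivial (nothing to lose between calls).
(o::Descent)(x, x̄) = first(update(o, x, x̄, nothing))

opt = Descent(0.1)
x′ = opt([1.0, 2.0], [1.0, 1.0])
```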

Member Author

Sure

@MikeInnes
Member

Can we split the ADAM part of this out and follow up with the AbstractOptimiser part?

Also, I just noticed that this uses IdDict. The goal of this API should be to remove things like that and instead be functional (i.e. accepting and returning a new state, rather than holding a state dictionary).
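A sketch of the functional style being asked for, using a hypothetical Momentum rule (not part of this PR) since it is the simplest stateful example: the update takes the current state and returns a (new parameter, new state) pair, and the caller threads the state through instead of the optimiser holding an IdDict.

```julia
# Hypothetical Momentum rule to show state threading; names are assumptions.
struct Momentum
    eta::Float64
    rho::Float64
end

init(o::Momentum, x) = zero(x)   # initial velocity: the "initial state"

function update(o::Momentum, x, x̄, v)
    v′ = o.rho .* v .+ x̄
    return x .- o.eta .* v′, v′   # return new params AND new state
end

opt = Momentum(0.1, 0.9)
x = [1.0]
st = init(opt, x)
x, st = update(opt, x, [1.0], st)   # caller keeps the state explicitly
x, st = update(opt, x, [1.0], st)   # momentum accumulates across calls
```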

@DhairyaLGandhi
Member Author

Yes, removing the IdDict is important. Would init here returning the initial state of the param make more sense?

src/rules.jl Outdated
const ϵ = 1e-8

function (o::ADAM)(m, m̄)
op = update(o, m, m̄, state(o, m))[1]
Member Author

This method is still sketchy: it returns a (Chain, NamedTuple), but I am guessing we only need the Chain.

Member

Still not sure I understand what this method is for. It makes sense for Descent because the state is redundant, but for ADAM it seems like it'll always be incorrect; is there a use case for it?

Member Author

This was just me needing something to call opt(m, m̄) with. If we just have a default value, opt(...; state = state(o, m)), we would need a way to initialise this for any arbitrary model.
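A sketch of the default-state idea mentioned here, and of why the two-argument form is considered sketchy for ADAM: the state is initialised fresh on every call, so momentum never accumulates. The ADAM rule below follows the standard bias-corrected formulation; the exact field and function names are assumptions.

```julia
# Minimal ADAM sketch (hypothetical names) to show opt(x, x̄; st = state(o, x)).
struct ADAM
    eta::Float64
    beta::Tuple{Float64,Float64}
end

# Initial state: first/second moments plus running beta powers.
state(o::ADAM, x) = (zero(x), zero(x), o.beta)

function update(o::ADAM, x, x̄, st)
    mt, vt, (β1p, β2p) = st
    β1, β2 = o.beta
    ϵ = 1e-8
    mt = β1 .* mt .+ (1 - β1) .* x̄
    vt = β2 .* vt .+ (1 - β2) .* x̄ .^ 2
    x′ = x .- o.eta .* (mt ./ (1 - β1p)) ./ (sqrt.(vt ./ (1 - β2p)) .+ ϵ)
    return x′, (mt, vt, (β1p * β1, β2p * β2))
end

# Two-argument form: the returned state is discarded, so each call starts
# from fresh state. Fine for a one-off test, wrong for actual training.
(o::ADAM)(x, x̄; st = state(o, x)) = first(update(o, x, x̄, st))

opt = ADAM(0.001, (0.9, 0.999))
x1 = opt([1.0], [0.5])
x2 = opt([1.0], [0.5])   # identical to x1: no state carried over
```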

@MikeInnes
Member

Yes, init needs to return the initial state (whatever would otherwise be used to initialise the dictionary).

On the rename, I think the idea is that users ultimately would call state rather than init, but we just need to separate what gets overloaded from what gets called (so that the user-facing methods can walk over trees automatically). We could do that with a name like _state, of course; I'm open to suggestions.
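A sketch of the separation being described, under the assumption that `_state` is the overloadable per-array hook and `state` is the user-facing function that walks model trees (here represented as NamedTuples) automatically. All names besides `_state` and `state` are illustrative:

```julia
# Optimiser authors overload `_state` for a single array; users call `state`,
# which recurses over nested NamedTuple model trees on its own.
struct Descent end

_state(o::Descent, x::AbstractArray) = nothing   # Descent needs no state

state(o, tree::NamedTuple) = map(leaf -> state(o, leaf), tree)
state(o, x::AbstractArray) = _state(o, x)
state(o, x) = nothing   # non-trainable leaves: functions, numbers, etc.

model = (weight = rand(2, 2), bias = rand(2), σ = tanh)
st = state(Descent(), model)
```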

src/rules.jl Outdated
@@ -22,3 +22,30 @@ end
function (o::Descent)(m, m̄, st)
update(o, m, m̄, st)
end

mutable struct ADAM{T,K}
Member

I think it would be ok for this to be immutable, since if you want to change the parameters mid-training you can just make a new ADAM (since you can preserve the optimiser state explicitly). We could even add convenience methods like opt = ADAM(opt, beta = ...) etc. Any thoughts?

Member Author

I would be alright with this, but in the case where we do end up modifying the fields, we would also have to return the new optimiser. I'm not sure I understand what the first argument to ADAM here is. Is it just an initialised optimiser, so we can return a new optimiser with the modified fields?

Member

Right, exactly; the new opt would inherit all the fields of the old opt, except where explicitly overridden by keyword arguments. That makes it quite convenient to update one field of the optimiser without mutability.
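A sketch of the convenience constructor being discussed, assuming an ADAM struct with `eta` and `beta` fields (the actual field names in the PR may differ): the struct stays immutable, and "changing" a hyperparameter builds a new optimiser that inherits the old one's fields unless overridden by a keyword.

```julia
# Immutable optimiser; hyperparameter changes produce a new instance.
struct ADAM
    eta::Float64
    beta::Tuple{Float64,Float64}
end

# Fresh construction with defaults.
ADAM(; eta = 0.001, beta = (0.9, 0.999)) = ADAM(eta, beta)

# Copy constructor: inherit every field of the old opt except those
# explicitly overridden by keyword arguments.
ADAM(o::ADAM; eta = o.eta, beta = o.beta) = ADAM(eta, beta)

opt  = ADAM()
opt2 = ADAM(opt, eta = 0.01)   # same betas, new learning rate
```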

Member Author

Oh cool. I would also imagine we would want to explicitly create the new optimiser at the end, with whatever values are in its fields, to return to the user for when they want to resume training or store the state of the optimiser.

@DhairyaLGandhi
Member Author

init would be overloaded when creating optimisers, which makes it user-facing API, so a clear name would be good, even if just as a stub somewhere for people to know/document what it has to be overloaded with.

@DhairyaLGandhi
Member Author

Let's add this in for now. I think a clean and consistent (::opt)(m, dm, state) would still be a bit of work, but we can iterate and make it a priority.

@DhairyaLGandhi DhairyaLGandhi merged commit 7c78a5b into master Nov 27, 2020
@ToucheSir ToucheSir deleted the dg/adam branch January 30, 2022 03:19