
add ADAM #3

Merged
merged 7 commits into master from dg/adam on Nov 27, 2020

Conversation

DhairyaLGandhi
Member

Also adds AbstractOptimiser, which lets us keep an API familiar from Flux while adding new optimisers.

I also think we should have a better name for init; I would imagine state(opt) is a better-suited function for it.

@DhairyaLGandhi
Member Author

init to my ear sounds like it should return an initialised instance of the optimiser, but since what we are initialising is the state, a rename would be awesome.

src/rules.jl Outdated
abstract type AbstractOptimiser end

(opt::AbstractOptimiser)(x, x̂, state) = update(opt, x, x̂, state)
(opt::AbstractOptimiser)(m, m̂) = update(opt, m, m̂, state(opt, m))[1]
Member

This method seems a bit sketchy to me. It makes sense for Descent but for everything else seems like it risks misleading people (eg they think everything's working but they are actually using ADAM without state). So maybe it's better as a special case on Descent.
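To illustrate, here is a minimal sketch of the Descent-only special case suggested here. The struct layout and the `update` signature follow the diff above, but the field name `eta` and the exact method bodies are assumptions, not the PR's code:

```julia
# Sketch: only Descent gets a stateless call method, since plain gradient
# descent carries no per-parameter state that could silently be dropped.
struct Descent
    eta::Float64
end

# The general rule: take the current state, return (new parameter, new state).
update(o::Descent, x, x̄, st) = (x .- o.eta .* x̄, st)

# Stateless convenience is safe here, unlike for ADAM, because Descent's
# state is trivial (nothing to lose between calls).
(o::Descent)(x, x̄) = first(update(o, x, x̄, nothing))

opt = Descent(0.1)
x′ = opt([1.0, 2.0], [1.0, 1.0])
```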

Member Author

Sure

@MikeInnes
Member

Can we split the ADAM part of this out and follow up with the AbstractOptimiser part?

Also, I just noticed that this uses IdDict. The goal of this API should be to remove things like that and instead be functional (i.e. accepting and returning a new state, rather than holding a state dictionary).
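A sketch of the functional style being asked for, using a hypothetical Momentum rule (not part of this PR) since it is the simplest stateful example: the update takes the current state and returns a (new parameter, new state) pair, and the caller threads the state through instead of the optimiser holding an IdDict.

```julia
# Hypothetical Momentum rule to show state threading; names are assumptions.
struct Momentum
    eta::Float64
    rho::Float64
end

init(o::Momentum, x) = zero(x)   # initial velocity: the "initial state"

function update(o::Momentum, x, x̄, v)
    v′ = o.rho .* v .+ x̄
    return x .- o.eta .* v′, v′   # return new params AND new state
end

opt = Momentum(0.1, 0.9)
x = [1.0]
st = init(opt, x)
x, st = update(opt, x, [1.0], st)   # caller keeps the state explicitly
x, st = update(opt, x, [1.0], st)   # momentum accumulates across calls
```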

@DhairyaLGandhi
Member Author

Yes, removing the IdDict is important. Would init here returning the initial state of the param make more sense?

src/rules.jl Outdated
const ϵ = 1e-8

function (o::ADAM)(m, m̄)
op = update(o, m, m̄, state(o, m))[1]
Member Author

This method is still sketchy: it returns a (Chain, NamedTuple), but I am guessing we only need the Chain.

Member

Still not sure I understand what this method is for. It makes sense for Descent because the state is redundant, but for ADAM it seems like it'll always be incorrect; is there a use case for it?

Member Author

This was just me needing something to call opt(m, m̄) with. If we just have a default value, opt(...; state = state(o, m)), we would need a way to initialise this for any arbitrary model.
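A sketch of the default-state idea mentioned here, and of why the two-argument form is considered sketchy for ADAM: the state is initialised fresh on every call, so momentum never accumulates. The ADAM rule below follows the standard bias-corrected formulation; the exact field and function names are assumptions.

```julia
# Minimal ADAM sketch (hypothetical names) to show opt(x, x̄; st = state(o, x)).
struct ADAM
    eta::Float64
    beta::Tuple{Float64,Float64}
end

# Initial state: first/second moments plus running beta powers.
state(o::ADAM, x) = (zero(x), zero(x), o.beta)

function update(o::ADAM, x, x̄, st)
    mt, vt, (β1p, β2p) = st
    β1, β2 = o.beta
    ϵ = 1e-8
    mt = β1 .* mt .+ (1 - β1) .* x̄
    vt = β2 .* vt .+ (1 - β2) .* x̄ .^ 2
    x′ = x .- o.eta .* (mt ./ (1 - β1p)) ./ (sqrt.(vt ./ (1 - β2p)) .+ ϵ)
    return x′, (mt, vt, (β1p * β1, β2p * β2))
end

# Two-argument form: the returned state is discarded, so each call starts
# from fresh state. Fine for a one-off test, wrong for actual training.
(o::ADAM)(x, x̄; st = state(o, x)) = first(update(o, x, x̄, st))

opt = ADAM(0.001, (0.9, 0.999))
x1 = opt([1.0], [0.5])
x2 = opt([1.0], [0.5])   # identical to x1: no state carried over
```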

@MikeInnes
Member

Yes, init needs to return the initial state (whatever would otherwise be used to initialise the dictionary).

On the rename, I think the idea is that users ultimately would call state rather than init, but we just need to separate what gets overloaded from what gets called (so that the user-facing methods can walk over trees automatically). We could do that with a name like _state, of course; I'm open to suggestions.
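A sketch of the separation being described, under the assumption that `_state` is the overloadable per-array hook and `state` is the user-facing function that walks model trees (here represented as NamedTuples) automatically. All names besides `_state` and `state` are illustrative:

```julia
# Optimiser authors overload `_state` for a single array; users call `state`,
# which recurses over nested NamedTuple model trees on its own.
struct Descent end

_state(o::Descent, x::AbstractArray) = nothing   # Descent needs no state

state(o, tree::NamedTuple) = map(leaf -> state(o, leaf), tree)
state(o, x::AbstractArray) = _state(o, x)
state(o, x) = nothing   # non-trainable leaves: functions, numbers, etc.

model = (weight = rand(2, 2), bias = rand(2), σ = tanh)
st = state(Descent(), model)
```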

src/rules.jl Outdated
@@ -22,3 +22,30 @@ end
function (o::Descent)(m, m̄, st)
update(o, m, m̄, st)
end

mutable struct ADAM{T,K}
Member

I think it would be ok for this to be immutable, since if you want to change the parameters mid-training you can just make a new ADAM (since you can preserve the optimiser state explicitly). We could even add convenience methods like opt = ADAM(opt, beta = ...) etc. Any thoughts?

Member Author

I would be alright with this, but in the case where we do end up modifying the fields, we would also have to return the new optimiser. I'm not sure I understand what the first argument to ADAM here is. Is it just an initialised optimiser, so we can return a new optimiser with the modified fields?

Member

Right, exactly; the new opt would inherit all the fields of the old opt, except where explicitly overridden by keyword arguments. That makes it quite convenient to update one field of the optimiser without mutability.
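A sketch of the convenience constructor being discussed, assuming an ADAM struct with `eta` and `beta` fields (the actual field names in the PR may differ): the struct stays immutable, and "changing" a hyperparameter builds a new optimiser that inherits the old one's fields unless overridden by a keyword.

```julia
# Immutable optimiser; hyperparameter changes produce a new instance.
struct ADAM
    eta::Float64
    beta::Tuple{Float64,Float64}
end

# Fresh construction with defaults.
ADAM(; eta = 0.001, beta = (0.9, 0.999)) = ADAM(eta, beta)

# Copy constructor: inherit every field of the old opt except those
# explicitly overridden by keyword arguments.
ADAM(o::ADAM; eta = o.eta, beta = o.beta) = ADAM(eta, beta)

opt  = ADAM()
opt2 = ADAM(opt, eta = 0.01)   # same betas, new learning rate
```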

Member Author

Oh cool. I would also imagine we would want to explicitly create the new optimiser at the end, with whatever values are in its fields, to return to the user for when they want to resume training or store the state of the optimiser.

@DhairyaLGandhi
Member Author

init would be overloaded when creating optimisers, which makes it user-facing API, so a clear name would be good, even if just as a stub somewhere for people to know/document what it has to be overloaded with.

@DhairyaLGandhi
Member Author

Let's add this in for now. I think a clean and consistent (::opt)(m, dm, state) would still be a bit of work, but we can iterate and make it a priority.

@DhairyaLGandhi DhairyaLGandhi merged commit 7c78a5b into master Nov 27, 2020
@ToucheSir ToucheSir deleted the dg/adam branch January 30, 2022 03:19