-
-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add ADAM #3
add ADAM #3
Conversation
|
src/rules.jl
Outdated
abstract type AbstractOptimiser end | ||
|
||
(opt::AbstractOptimiser)(x, x̂, state) = update(opt, x, x̂, state) | ||
(opt::AbstractOptimiser)(m, m̂) = update(opt, m, m̂, state(opt, m))[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method seems a bit sketchy to me. It makes sense for Descent
but for everything else seems like it risks misleading people (eg they think everything's working but they are actually using ADAM without state). So maybe it's better as a special case on Descent
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
Can we split out the Also just noticed that this uses |
Yes, removing the IdDict is important, would |
src/rules.jl
Outdated
const ϵ = 1e-8 | ||
|
||
function (o::ADAM)(m, m̄) | ||
op = update(o, m, m̄, state(o, m))[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is still sketchy, it returns a (Chain, NamedTuple)
, but I am guessing we only need the Chain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure I understand what this method is for. It makes sense for Dense
because the state is redundant, but for ADAM it seems like it'll always be incorrect; is there a use case for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just me needing something to call opt(m, m\bar)
with. If we just have a default value to opt(..., state = state(o, m))
we would need a way to initialise this for any arbitrary model.
Yes, On the rename, I think the idea is that users ultimately would call |
src/rules.jl
Outdated
@@ -22,3 +22,30 @@ end | |||
function (o::Descent)(m, m̄, st) | |||
update(o, m, m̄, st) | |||
end | |||
|
|||
mutable struct ADAM{T,K} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be ok for this to be immutable, since if you want to change the parameters mid-training you can just make a new ADAM
(since you can preserve the optimiser state explicitly). We could even add convenience methods like opt = ADAM(opt, beta = ...)
etc. Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be alright with this, but in the case where we do end up modifying the fields, we would also have to return the new optimiser. Not sure I understand what the first argument to ADAM here is. Is it just an initialised optimiser so we can return a new optimiser with the modified fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, exactly; the new opt
would inherit all the fields of the old opt
, except where explicitly overridden by keyword arguments. That makes it quite convenient to update one field of the optimiser without mutability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh cool, I would also imagine that we would want to explicitly create the new optimiser at the end with whatever the values in the fields are to return to the user for when it is desired to resume training or store the state of the optimiser
|
Let's add this in for now, I think a clean and consistent |
Also adds
AbstractOptimiser
which allows to keep a familiar API to Flux's while adding optimisers.I would also think that we should have a better name for
init
. I would imaginestate(opt)
to be better suited function for it.