Support all GalacticOptim-compatible optimizers #25
Changes from 28 commits
@@ -10,6 +10,7 @@ jobs:
      fail-fast: false
      matrix:
        version:
          - '1.6'
          - '1'
        os:
          - ubuntu-latest
This file was deleted.
@@ -0,0 +1,61 @@
function build_optim_function(f; ad_backend=AD.ForwardDiffBackend())
    ∇f(x) = only(AD.gradient(ad_backend, f, x))
    return build_optim_function(f, ∇f; ad_backend)
end
function build_optim_function(f, ∇f; ad_backend=AD.ForwardDiffBackend())
    # because we need explicit access to grad, we generate these ourselves instead of using
    # GalacticOptim's auto-AD feature.
    # TODO: switch to caching API if available, see
    # https://github.com/JuliaDiff/AbstractDifferentiation.jl/issues/41
    function grad(res, x, p...)
        ∇fx = ∇f(x)
        @. res = -∇fx
        return res
    end
    function hess(res, x, p...)
        H = only(AD.hessian(ad_backend, f, x))
        @. res = -H
        return res
    end
    function hv(res, x, v, p...)
        Hv = only(AD.lazy_hessian(ad_backend, f, x) * v)
        @. res = -Hv
        return res
    end
    return GalacticOptim.OptimizationFunction((x, p...) -> -f(x); grad, hess, hv)
end

function build_optim_problem(optim_fun, x₀; kwargs...)
    return GalacticOptim.OptimizationProblem(optim_fun, x₀, nothing; kwargs...)
end

function optimize_with_trace(prob, optimizer)
    u0 = prob.u0
    fun = prob.f
    grad! = fun.grad
    function ∇f(x)
        ∇fx = similar(x)
        grad!(∇fx, x, nothing)
        rmul!(∇fx, -1)
        return ∇fx
    end
    # caches for the trace of x, f(x), and ∇f(x)
    xs = typeof(u0)[]
    fxs = typeof(fun.f(u0, nothing))[]
    ∇fxs = typeof(similar(u0))[]
    function callback(x, nfx, args...)
        # NOTE: GalacticOptim doesn't have an interface for accessing the gradient trace,
        # so we need to recompute it ourselves
        # see https://github.com/SciML/GalacticOptim.jl/issues/149
        ∇fx = ∇f(x)
Review thread (on the gradient recomputation above):
- another reason to specialise the behaviour for Optim
- Why would you recommend specializing for Optim vs any other optimization package? Is it more widely used than other optimizations, or does it have more features? Or is it because we already depend on it?
- it's more widely used and it's probably more honest to the original algorithm to use l-BFGS directly
- After spending some time on this, special-casing Optim makes the interface quite a bit less clear, so a redesign would be necessary. Since I'm planning a design overhaul soon, I'll handle this in a separate PR.

        # terminate if optimization encounters NaNs
        (isnan(nfx) || any(isnan, x) || any(isnan, ∇fx)) && return true
Review thread (on the NaN check above):
- I don't think this is a good idea in general, some solvers can recover from NaN values
- Good point, the important thing is that we don't use such steps for approximating the inverse Hessian, but that should be handled outside this function.
- That being said, since the current behavior is to terminate on …

        # some backends mutate x, so we must copy it
        push!(xs, copy(x))
        push!(fxs, -nfx)
        push!(∇fxs, ∇fx)
        return false
    end
    GalacticOptim.solve(prob, optimizer; cb=callback)
    return xs, fxs, ∇fxs
end
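For orientation, here is a minimal end-to-end sketch of how these three helpers compose, assuming they are in scope (they are internal helpers and may not be exported); the toy objective, starting point, and the choice of Optim.LBFGS() are illustrative assumptions, not taken from the PR:

using GalacticOptim, Optim

# toy log-density: a standard normal up to an additive constant (illustrative only)
f(x) = -sum(abs2, x) / 2
x₀ = randn(5)

fun = build_optim_function(f)         # wraps -f with the grad/hess/hv callbacks above
prob = build_optim_problem(fun, x₀)   # a GalacticOptim.OptimizationProblem
xs, fxs, ∇fxs = optimize_with_trace(prob, Optim.LBFGS())
# xs[i], fxs[i], ∇fxs[i] are the iterate, f(x), and ∇f(x) recorded at each callback step

Note that the helpers maximize f by minimizing -f internally, which is why the callback stores -nfx and negates the gradient before recording it.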
Review thread (on optimize_with_trace):
- When using Optim, this can be suboptimal if the optimiser is asking for both the value and the gradient simultaneously, which can happen.
- I would suggest specialising on Optim optimisers and using value_and_gradient.
- I think GO might not be able to do this.
- Yeah, it's not possible yet with GalacticOptim (see https://github.com/SciML/GalacticOptim.jl/issues/189). We can revisit when it's available there.
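To make the suggestion concrete, below is a rough sketch of what an Optim-specific path might look like once a redesign allows it: the value and gradient of -f are computed in one pass via AbstractDifferentiation's value_and_gradient, and Optim's extended trace records iterates and gradients. The function name optimize_with_optim and its exact wiring are hypothetical, not something this PR implements:

using Optim, ForwardDiff
import AbstractDifferentiation as AD

# Hypothetical sketch: bypass GalacticOptim for Optim optimizers so the objective
# value and gradient are evaluated together instead of in separate calls.
function optimize_with_optim(f, x₀; ad_backend=AD.ForwardDiffBackend(), optimizer=Optim.LBFGS())
    function fg!(F, G, x)
        fx, (∇fx,) = AD.value_and_gradient(ad_backend, f, x)
        G !== nothing && (G .= -∇fx)  # gradient of -f
        F !== nothing && return -fx   # value of -f
        return nothing
    end
    options = Optim.Options(; store_trace=true, extended_trace=true)
    return Optim.optimize(Optim.only_fg!(fg!), x₀, optimizer, options)
end

With extended_trace=true, Optim stores each iterate and its gradient in the trace metadata, which could also remove the need to recompute ∇f(x) in the callback above.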