Support all GalacticOptim-compatible optimizers #25
Conversation
Codecov Report

@@            Coverage Diff             @@
##             main      #25      +/-   ##
==========================================
+ Coverage   98.66%   98.89%   +0.23%
==========================================
  Files           8        8
  Lines         224      272      +48
==========================================
+ Hits          221      269      +48
  Misses          3        3
==========================================

Continue to review full report at Codecov.
@mohamed82008 this is the non-breaking part of the changes we discussed. Would you like to review?
@mohamed82008 if you're not able to review this in the next day, no worries, I'll just merge, and we can always refine in a future PR if necessary.
@@ -0,0 +1,61 @@
function build_optim_function(f; ad_backend=AD.ForwardDiffBackend())
    ∇f(x) = only(AD.gradient(ad_backend, f, x))
    return build_optim_function(f, ∇f; ad_backend)
When using Optim, this can be suboptimal if the optimiser asks for both the value and the gradient simultaneously, which can happen.
I would suggest specialising on Optim optimisers and using value_and_gradient
I think GO might not be able to do this
Yeah, it's not possible yet with GalacticOptim (see https://github.com/SciML/GalacticOptim.jl/issues/189). We can revisit when it's available there.
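For concreteness, here is a minimal sketch of what the suggested Optim specialisation could look like: instead of building separate f and ∇f closures, hand Optim an fg! that computes the value and gradient in one pass via AbstractDifferentiation's value_and_gradient. The helper name build_optim_fg! and the overall wiring are assumptions for illustration, not the package's actual API.

```julia
# Hedged sketch: specialise on Optim optimisers by computing the objective value
# and gradient together, using Optim's fg!(F, G, x) convention.
# `build_optim_fg!` is a hypothetical helper name, not Pathfinder's actual API.
import AbstractDifferentiation as AD
using ForwardDiff, Optim

function build_optim_fg!(f; ad_backend=AD.ForwardDiffBackend())
    function fg!(F, G, x)
        if G !== nothing
            # one AD pass gives both the value and the gradient
            fx, gx = AD.value_and_gradient(ad_backend, f, x)
            copyto!(G, only(gx))
            return fx
        end
        return f(x)
    end
    return fg!
end

# usage sketch: Optim.optimize(Optim.only_fg!(build_optim_fg!(f)), x0, Optim.LBFGS())
```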
# see https://github.com/SciML/GalacticOptim.jl/issues/149
∇fx = ∇f(x)
# terminate if optimization encounters NaNs
(isnan(nfx) || any(isnan, x) || any(isnan, ∇fx)) && return true
I don't think this is a good idea in general; some solvers can recover from NaN values.
Good point, the important thing is that we don't use such steps for approximating the inverse Hessian, but that should be handled outside this function.
That being said, since the current behavior is to terminate on NaN, this should be handled in a separate PR.
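As a hedged sketch of the direction discussed above (handling NaNs outside the callback), one could let the optimiser recover from NaNs and instead drop NaN iterates from the trace before they feed the inverse-Hessian approximation. All names below (filter_nan_iterates, xs, ∇fxs) are hypothetical, not part of the package.

```julia
# Hypothetical helper: remove iterates containing NaNs from the optimization trace
# so they are never used when approximating the inverse Hessian.
function filter_nan_iterates(xs::AbstractVector, ∇fxs::AbstractVector)
    keep = map(xs, ∇fxs) do x, ∇fx
        !any(isnan, x) && !any(isnan, ∇fx)
    end
    return xs[keep], ∇fxs[keep]
end
```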
# NOTE: GalacticOptim doesn't have an interface for accessing the gradient trace,
# so we need to recompute it ourselves
# see https://github.com/SciML/GalacticOptim.jl/issues/149
∇fx = ∇f(x)
Another reason to specialise the behaviour for Optim.
Why would you recommend specializing for Optim vs any other optimization package? Is it more widely used than other optimization packages, or does it have more features? Or is it because we already depend on it?
It's more widely used, and it's probably more faithful to the original algorithm to use L-BFGS directly.
After spending some time on this, I found that special-casing Optim makes the interface quite a bit less clear, so a redesign would be necessary. Since I'm planning a design overhaul soon, I'll handle this in a separate PR.
This PR is a non-breaking change that updates the internals to use GalacticOptim instead of Optim, which in principle allows any GalacticOptim-compatible optimizer to be used instead of Optim.LBFGS (it seems not all optimizers support the use of callbacks, so those can't be used here). For single-path Pathfinder, pathfinder methods that take GalacticOptim.OptimizationFunction or GalacticOptim.OptimizationProblem are now exposed.

Edit: Because the SciML ecosystem uses Julia 1.6 as the lower version bound, we also bump our lower bound to 1.6, the latest LTS.

Fixes SciML/Optimization.jl#8
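A hedged usage sketch of the entry points described above. It assumes the GalacticOptim problem wraps the negative log-density (GalacticOptim minimizes) and that pathfinder can be called with the problem and a number of draws; the exact call signature is an assumption, not the documented API.

```julia
# Hedged sketch: single-path Pathfinder via a GalacticOptim.OptimizationProblem.
using Pathfinder, GalacticOptim

# toy target: standard normal log-density; GalacticOptim minimizes, so we pass
# the negative log-density
logp(x) = -sum(abs2, x) / 2
f(x, p) = -logp(x)

fun = GalacticOptim.OptimizationFunction(f, GalacticOptim.AutoForwardDiff())
prob = GalacticOptim.OptimizationProblem(fun, randn(5))

# assumed call form: problem plus number of draws
result = pathfinder(prob, 100)
```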