
Support ADTypes and autodiff by default #178

Merged: 48 commits from adtypes into main on Jul 2, 2024
Conversation

@sethaxen (Member) commented Mar 16, 2024

Turing, Optimization, and LogDensityProblems now support specifying an AD backend using ADTypes. Turing and Optimization do this via an adtype keyword. This PR adds an adtype parameter to pathfinder and multipathfinder, which then hooks into Optimization.jl's AD machinery to automatically differentiate the log-density function.

Since we already depend on ForwardDiff indirectly via Optim, we use AutoForwardDiff as the default adtype, so that log-density functions are automatically differentiated by default.

As a result, the PR also adds support for generic log-density functions (not just those that implement the LogDensityProblems interface).

Fixes #93
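
A rough sketch of the intended usage with a generic log-density function (the `init` keyword and exact call signature below are illustrative, not a spec):

```julia
using ADTypes, Pathfinder

# A generic log-density function; it does not need to implement the
# LogDensityProblems interface.
logp(x) = -sum(abs2, x) / 2

# `adtype` selects the AD backend via ADTypes. AutoForwardDiff() is the default
# described above, so passing it explicitly is only for illustration;
# `init` (assumed keyword) supplies the initial point and hence the dimension.
result = pathfinder(logp; init=randn(5), adtype=AutoForwardDiff())

# Swapping backends would then be a one-keyword change, e.g.
# pathfinder(logp; init=randn(5), adtype=AutoReverseDiff()).
```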


codecov bot commented Mar 16, 2024

Codecov Report

Attention: Patch coverage is 96.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.61%. Comparing base (fae6c4b) to head (dccbc6d).

| Files | Patch % | Lines |
| --- | --- | --- |
| ext/PathfinderTuringExt.jl | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #178      +/-   ##
==========================================
- Coverage   84.72%   80.61%   -4.12%     
==========================================
  Files          13       13              
  Lines         609      614       +5     
==========================================
- Hits          516      495      -21     
- Misses         93      119      +26     


@SamuelBrand1

Hi @sethaxen,

Following up from #93: for our example, pathfinder fails here with a MethodError from an ambiguous promote_rule:

MethodError: promote_rule(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{OptimizationReverseDiffExt.OptimizationReverseDiffTag, Float64}, Float64, 12}}) is ambiguous.

It's a bit of a complicated example, so if that doesn't make a lot of sense, I can try to find a more minimal failing example.

Is that an error from a forward-over-reverse Hessian calculation?

@sethaxen (Member, PR author)

> It's a bit of a complicated example, so if that doesn't make a lot of sense, I can try to find a more minimal failing example.

At first glance, I don't think this is a Pathfinder issue. We only use log2π in one place, and it's outside of anything AD should be seeing. Does this work fine?

using Turing, Optimization, OptimizationOptimJL
prob = Turing.optim_problem(inference_mdl, MAP(); adtype = AutoReverseDiff(true), constrained=false)
sol = solve(prob, LBFGS())

If so, this is more or less what Pathfinder does internally, so the issue is probably somewhere in Pathfinder. If this fails too, then the issue is somewhere else. Can Turing sample fine with HMC (as in, compute the gradient of the log-density without erroring)?

Can you post the full stack trace?

> Is that an error from a forward-over-reverse Hessian calculation?

Not certain. Neither L-BFGS nor Pathfinder computes the Hessian, so if your model doesn't either, I don't see why this would be forward-over-reverse. But ReverseDiff does use ForwardDiff internally in a few places.

@SamuelBrand1 commented Mar 19, 2024

> It's a bit of a complicated example, so if that doesn't make a lot of sense, I can try to find a more minimal failing example.
>
> Does this work fine?
>
> using Turing, Optimization, OptimizationOptimJL
> prob = Turing.optim_problem(inference_mdl, MAP(); adtype = AutoReverseDiff(true), constrained=false)
> sol = solve(prob, LBFGS())

For some reason that doesn't generate an init; I thought there was a fallback?

ERROR: MethodError: no method matching init(::@NamedTuple{…}, ::LBFGS{…})
Closest candidates are:
init(::OptimizationProblem, ::Any, Any...; kwargs...)
@ SciMLBase ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:164
init(::Roots.ZeroProblem, ::Any; kwargs...)
@ Roots ~/.julia/packages/Roots/neTBD/src/find_zero.jl:306
init(::Roots.AbstractUnivariateZeroMethod, ::Any, ::Roots.AbstractUnivariateZeroState)
@ Roots ~/.julia/packages/Roots/neTBD/src/find_zero.jl:311
...

Stacktrace:
[1] solve(::@NamedTuple{…}, ::Vararg{…}; kwargs::@kwargs{})
@ CommonSolve ~/.julia/packages/CommonSolve/JfpfI/src/CommonSolve.jl:23
[2] top-level scope
@ REPL[9]:1
Some type information was truncated. Use show(err) to see complete types.

EDIT: Turing.optim_problem does create a Turing.Optimisation.Init object as a field of prob, but it's obviously not dispatching for some reason.
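
For anyone hitting this later: the MethodError above indicates that solve received the NamedTuple returned by Turing.optim_problem rather than the OptimizationProblem stored inside it. A guess at the unwrapping, where the `prob` field name is an assumption and not checked against Turing's API:

```julia
using Turing, Optimization, OptimizationOptimJL

# `inference_mdl` is the user's Turing model from the linked example.
nt = Turing.optim_problem(inference_mdl, MAP(); adtype = AutoReverseDiff(true), constrained=false)

# Unwrap the OptimizationProblem before calling solve (field name assumed):
sol = solve(nt.prob, LBFGS())
```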

@SamuelBrand1

> Can you post the full stack trace?

Here you go:

ERROR: MethodError: promote_rule(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{OptimizationReverseDiffExt.OptimizationReverseDiffTag, Float64}, Float64, 12}}) is ambiguous.

Candidates:
promote_rule(::Type{S}, ::Type{T}) where {S<:AbstractIrrational, T<:Number}
@ Base irrationals.jl:45
promote_rule(::Type{<:AbstractIrrational}, ::Type{T}) where T<:Real
@ Base irrationals.jl:44
promote_rule(::Type{R}, ::Type{ForwardDiff.Dual{T, V, N}}) where {R<:Real, T, V, N}
@ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/dual.jl:427

Possible fix, define
promote_rule(::Type{R}, ::Type{ForwardDiff.Dual{T, V, N}}) where {R<:AbstractIrrational, T, V, N}

Stacktrace:
[1] promote_type(::Type{IrrationalConstants.Log2π}, ::Type{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 12}})
@ Base ./promotion.jl:313
[2] promote_type(::Type, ::Type, ::Type)
@ Base ./promotion.jl:299
[3] ForwardOptimize
@ ~/.julia/packages/ReverseDiff/UJhiD/src/macros.jl:122 [inlined]
[4] +
@ ~/.julia/packages/ReverseDiff/UJhiD/src/derivatives/scalars.jl:17 [inlined]
[5] normlogpdf(z::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing})
@ StatsFuns ~/.julia/packages/StatsFuns/mrf0e/src/distrs/norm.jl:29
[6] normlogpdf(μ::Float64, σ::Float64, x::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing})
@ StatsFuns ~/.julia/packages/StatsFuns/mrf0e/src/distrs/norm.jl:41
[7] logpdf(d::Normal{Float64}, x::ReverseDiff.TrackedReal{ForwardDiff.Dual{…}, ForwardDiff.Dual{…}, Nothing})
@ Distributions ~/.julia/packages/Distributions/UaWBm/src/univariates.jl:645
[8] logpdf(d::Truncated{…}, x::ReverseDiff.TrackedReal{…})
@ Distributions ~/.julia/packages/Distributions/UaWBm/src/truncate.jl:161
[9] tilde_assume(ctx::Turing.Optimisation.OptimizationContext{…}, dist::Truncated{…}, vn::VarName{…}, vi::DynamicPPL.ThreadSafeVarInfo{…})
@ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:60
[10] tilde_assume
@ ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:63 [inlined]
[11] tilde_assume
@ ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:57 [inlined]
[12] tilde_assume!!(context::DynamicPPL.FixedContext{…}, right::Truncated{…}, vn::VarName{…}, vi::DynamicPPL.ThreadSafeVarInfo{…})
@ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/context_implementations.jl:138
[13] generate_latent(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::DynamicPPL.FixedContext{…}, latent_model::RandomWalk{…}, n::Int64)
@ EpiAware.EpiLatentModels ~/GitHub/CFA/Rt-without-renewal/EpiAware/src/EpiLatentModels/randomwalk.jl:89
[14] _evaluate!!
@ ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:963 [inlined]
[15] macro expansion
@ ~/.julia/packages/DynamicPPL/pg70d/src/submodel_macro.jl:244 [inlined]
[16] make_epi_aware(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::DynamicPPL.FixedContext{…}, y_t::Vector{…}, time_steps::Int64; epi_model::DirectInfections{…}, latent_model::RandomWalk{…}, observation_model::DelayObservations{…})
@ EpiAware ~/GitHub/CFA/Rt-without-renewal/EpiAware/src/make_epi_aware.jl:8
[17] _evaluate!!(model::Model{…}, varinfo::DynamicPPL.ThreadSafeVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…})
@ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:963
[18] evaluate_threadsafe!!(model::Model{…}, varinfo::TypedVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…})
@ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:952
[19] evaluate!!(model::Model{…}, varinfo::TypedVarInfo{…}, context::Turing.Optimisation.OptimizationContext{…})
@ DynamicPPL ~/.julia/packages/DynamicPPL/pg70d/src/model.jl:887
[20] (::LogDensityFunction{…})(z::ReverseDiff.TrackedArray{…})
@ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:111
[21] (::Turing.Optimisation.var"#l#4"{…})(x::ReverseDiff.TrackedArray{…}, ::SciMLBase.NullParameters)
@ Turing.Optimisation ~/.julia/packages/Turing/lkUBK/src/optimisation/Optimisation.jl:294
[22] (::OptimizationReverseDiffExt.var"#51#75"{…})(::ReverseDiff.TrackedArray{…})
@ OptimizationReverseDiffExt ~/.julia/packages/OptimizationBase/rRpJs/ext/OptimizationReverseDiffExt.jl:157
[23] ReverseDiff.GradientTape(f::OptimizationReverseDiffExt.var"#51#75"{…}, input::Vector{…}, cfg::ReverseDiff.GradientConfig{…})
@ ReverseDiff ~/.julia/packages/ReverseDiff/UJhiD/src/api/tape.jl:199
[24] ReverseDiff.GradientTape(f::Function, input::Vector{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 12}})
@ ReverseDiff ~/.julia/packages/ReverseDiff/UJhiD/src/api/tape.jl:198
[25] instantiate_function(f::OptimizationFunction{…}, cache::OptimizationBase.ReInitCache{…}, adtype::AutoReverseDiff, num_cons::Int64)
@ OptimizationReverseDiffExt ~/.julia/packages/OptimizationBase/rRpJs/ext/OptimizationReverseDiffExt.jl:187
[26] OptimizationCache(prob::OptimizationProblem{…}, opt::LBFGS{…}, data::Base.Iterators.Cycle{…}; callback::Pathfinder.OptimizationCallback{…}, maxiters::Int64, maxtime::Nothing, abstol::Nothing, reltol::Nothing, progress::Bool, kwargs::@kwargs{})
@ OptimizationBase ~/.julia/packages/OptimizationBase/rRpJs/src/cache.jl:27
[27] OptimizationCache
@ ~/.julia/packages/OptimizationBase/rRpJs/src/cache.jl:17 [inlined]
[28] #__init#2
@ ~/.julia/packages/OptimizationOptimJL/fdBis/src/OptimizationOptimJL.jl:101 [inlined]
[29] __init (repeats 2 times)
@ ~/.julia/packages/OptimizationOptimJL/fdBis/src/OptimizationOptimJL.jl:68 [inlined]
[30] #init#599
@ ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:166 [inlined]
[31] init
@ ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:164 [inlined]
[32] solve(::OptimizationProblem{…}, ::LBFGS{…}; kwargs::@kwargs{…})
@ SciMLBase ~/.julia/packages/SciMLBase/RHbdj/src/solve.jl:96
[33] optimize_with_trace(prob::OptimizationProblem{…}, optimizer::LBFGS{…}; progress_name::String, progress_id::Base.UUID, maxiters::Int64, callback::Nothing, fail_on_nonfinite::Bool, kwargs::@kwargs{})
@ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/optimize.jl:53
[34] optimize_with_trace
@ ~/.julia/packages/Pathfinder/99eNx/src/optimize.jl:34 [inlined]
[35] _pathfinder(rng::Random._GLOBAL_RNG, prob::OptimizationProblem{…}, logp::Pathfinder.var"#logp#26"{…}; history_length::Int64, optimizer::LBFGS{…}, ndraws_elbo::Int64, executor::SequentialEx{…}, kwargs::@kwargs{…})
@ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:290
[36] _pathfinder
@ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:279 [inlined]
[37] _pathfinder_try_until_succeed(rng::Random._GLOBAL_RNG, prob::OptimizationProblem{…}, logp::Pathfinder.var"#logp#26"{…}; ntries::Int64, init_scale::Int64, init_sampler::Pathfinder.UniformSampler{…}, allow_mutating_init::Bool, kwargs::@kwargs{…})
@ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:265
[38] _pathfinder_try_until_succeed
@ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:253 [inlined]
[39] #25
@ ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:182 [inlined]
[40] progress(f::Pathfinder.var"#25#27"{…}; name::String)
@ ProgressLogging ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:262
[41] progress
@ ~/.julia/packages/ProgressLogging/6KXlp/src/ProgressLogging.jl:258 [inlined]
[42] pathfinder(prob::OptimizationProblem{…}; rng::Random._GLOBAL_RNG, history_length::Int64, optimizer::LBFGS{…}, ndraws_elbo::Int64, ndraws::Int64, input::Model{…}, kwargs::@kwargs{})
@ Pathfinder ~/.julia/packages/Pathfinder/99eNx/src/singlepath.jl:181
[43] pathfinder(model::Model{…}; rng::Random._GLOBAL_RNG, init_scale::Int64, init_sampler::Pathfinder.UniformSampler{…}, init::Nothing, adtype::AutoReverseDiff, kwargs::@kwargs{})
@ PathfinderTuringExt ~/.julia/packages/Pathfinder/99eNx/ext/PathfinderTuringExt.jl:118
[44] top-level scope
@ ~/GitHub/CFA/Rt-without-renewal/EpiAware/docs/src/examples/getting_started.jl:295
Some type information was truncated. Use show(err) to see complete types.

@SamuelBrand1

> Can Turing sample fine with HMC (as in, compute the gradient of the log-density without erroring)?

Yes. The workflow is that pathfinder is part of the initialisation for an HMC/NUTS sampler, and NUTS runs fine (even better with good pre-heating!).

@SamuelBrand1

Thanks for looking into this!

@sethaxen (Member, PR author)

@SamuelBrand1 this line uses deprecated syntax: https://github.com/CDCgov/Rt-without-renewal/blob/d6344cc6e451e3e6c4188e4984247f890ae60795/EpiAware/src/EpiLatentModels/utils.jl#L19, which could be the issue here.
Try this instead and let me know if it works: truncated(Normal(0, prior_mean * sqrt(pi) / sqrt(2)); lower=0)
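
Spelled out as a runnable snippet (prior_mean = 1.0 is just a placeholder here, not a value from the linked code):

```julia
using Distributions

prior_mean = 1.0  # placeholder; in the real model this comes from the user's code

# Keyword form of truncation suggested above: a half-normal with scale
# prior_mean * sqrt(pi/2), whose mean works out to prior_mean.
d = truncated(Normal(0, prior_mean * sqrt(pi) / sqrt(2)); lower = 0)
mean(d)     # ≈ prior_mean
rand(d, 3)  # all draws are ≥ 0
```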

@SamuelBrand1

That doesn't solve this problem.

However, switching to AutoReverseDiff(false) does. The model I'm sampling can fail because it can draw some very large numbers in the initialisation phase, which causes problems when sampling from a negative binomial. This seems to cause problems for pathfinder with ReverseDiff, but is either not a problem, or a silent one, for AdvancedHMC NUTS calls.

I think making the model more resilient is our problem though!

Thanks for helping so much.

@sethaxen (Member, PR author)

> However, switching to AutoReverseDiff(false) does.

Ah, okay. Tape compilation is only safe if the same branches of all control flow are taken on every execution of the program. I'm not certain what control flow you might be encountering here. If this is the issue, it's a little strange that Turing and Pathfinder could hit different branches. One way this can happen is if the bulk of the probability mass hits one branch while the mode hits a different one; since the mode can look very different from the bulk, you'd then expect any good optimizer to end up on a different branch.
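
As a self-contained illustration of the compiled-tape pitfall (this is not your model, just a toy function with value-dependent control flow):

```julia
using ReverseDiff

# Control flow that depends on the value of the input:
f(x) = sum(xi -> xi > 0 ? xi^2 : -xi, x)

x0 = [1.0, 2.0]   # both entries take the `xi > 0` branch when the tape is recorded
tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, x0))

x1 = [-1.0, 2.0]  # the first entry should now take the other branch
g_compiled = ReverseDiff.gradient!(similar(x1), tape, x1)  # replays the recorded branch: first entry ≈ -2.0
g_correct  = ReverseDiff.gradient(f, x1)                   # re-traces the function: first entry == -1.0

# AutoReverseDiff(true) requests the compiled-tape path; AutoReverseDiff(false) the re-tracing one.
```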

Either way, if AutoReverseDiff(false) is working with Pathfinder, that seems to indicate well that this PR is doing the right thing, and the issue is probably elsewhere.

Thanks for helping so much.

No problem!

@sethaxen closed this Jul 2, 2024
@sethaxen reopened this Jul 2, 2024
@sethaxen marked this pull request as ready for review Jul 2, 2024
@sethaxen merged commit c334ef7 into main Jul 2, 2024 (16 of 17 checks passed)
@sethaxen deleted the adtypes branch Jul 2, 2024
Linked issue (closed by this PR): Choosing AD backend in Turing integration (#93)