NaNs introduced with 3 posteriordb models #94
That's me! I'm just here to add that these three posteriors are the only ones that I tried with Pathfinder 😅
This is the kind of consistency we'd rather not have. 😬
Well, the upside is that Pathfinder only "sometimes" gets stuck and takes forever because it can't recover. I do think that Pathfinder should abort as soon as it encounters issues, just because you can't really hope for a good approximation anyway, and there's no point in spending a lot of computation (on a single Pathfinder run).
We used to terminate on
I think the potential benefit of Pathfinder has been a bit overstated, and it's not worth spending a lot of computation on it. If the posterior is simple, then it doesn't matter that you can sample from it even more easily, and if the posterior is difficult, then the approximation is not worth much. But all the decisions are of course yours. I'm happy with the way it is, especially after you told me how to make Pathfinder abort early. But a simple keyword argument would of course be quite nice to have as well.
I was unable to reproduce the nan behavior for
Here we can improve things quite a bit by putting a

```julia
using BridgeStan, Pathfinder, ZipFile, Transducers, Test

POSTERIOR_DB_PATH = joinpath(ENV["HOME"], "projects", "posteriordb", "posterior_database")

# Compile a posteriordb Stan model, unzipping its data file into a temp directory.
function load_model(model_name, data_name; kwargs...)
    data_path = joinpath(POSTERIOR_DB_PATH, "data", "data", "$data_name.json.zip")
    stan_file = joinpath(POSTERIOR_DB_PATH, "models", "stan", "$model_name.stan")
    tmpdir = mktempdir(; cleanup=false)
    data = joinpath(tmpdir, "$data_name.json")
    write(data, ZipFile.Reader(data_path).files[1])
    return StanModel(; stan_file, data, kwargs...)
end

# Wrap BridgeStan's log-density and gradient calls, converting exceptions to NaNs.
function make_pathfinder_inputs(model)
    function logp(x)
        return try
            log_density(model, convert(Vector{Float64}, x); propto=true, jacobian=true)
        catch
            NaN
        end
    end
    function ∇logp(x)
        return try
            log_density_gradient(model, convert(Vector{Float64}, x); propto=true, jacobian=true)[2]
        catch
            fill(NaN, length(x))
        end
    end
    dim = param_unc_num(model)
    return logp, ∇logp, Int(dim)
end

ndraws = 10
nruns = 5
executor = Transducers.SequentialEx()
model_data_names = [
    ("dogs", "dogs"),
    ("kilpisjarvi", "kilpisjarvi_mod"),
    ("diamonds", "diamonds"),
]

@testset "posteriordb" begin
    @testset "$model_name" for (model_name, data_name) in model_data_names
        model = load_model(model_name, data_name)
        logp, ∇logp, dim = make_pathfinder_inputs(model)
        # Terminate an optimization path early if the point or log density is non-finite.
        callback = (x, lp) -> !isfinite(lp) || any(!isfinite, x)
        for _ in 1:100
            result = pathfinder(logp, ∇logp; dim, ndraws, callback)
            @test all(isfinite, result.draws)
        end
    end
end
```

```
Test Summary: | Pass  Total   Time
posteriordb   |  300    300  36.4s
```

Just retrying until we succeed feels wrong, especially since with HMC the wisdom is to never ignore numerical issues. But IIUC, this same approach is what was used for the paper, where the method in general outperformed ADVI. And we're not trying to get exact draws; we're primarily trying to get even just one point in the typical set, and secondarily to get a reasonable estimate of the metric. So three action items I see here are:
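The "retry until we succeed" approach mentioned above could be factored into a small helper. This is only an illustrative sketch; the name `retry_until_finite` and its interface are hypothetical, not part of Pathfinder's API:

```julia
# Hypothetical helper: rerun a sampler-like closure until it returns all-finite
# draws, giving up after max_tries attempts. Illustrative sketch only.
function retry_until_finite(run; max_tries=10)
    for attempt in 1:max_tries
        draws = run()
        if all(isfinite, draws)
            return draws, attempt
        end
    end
    error("no finite draws after $max_tries attempts")
end
```

Here `run` would be something like `() -> pathfinder(logp, ∇logp; dim, ndraws).draws`; bounding the number of attempts avoids the "takes forever because it can't recover" failure mode.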
There are many ways to use Pathfinder. In general, I find Pathfinder is not that useful for a well-behaved model. That is, if you're able to write your model in such a way that the step size and metric are quickly learned during warm-up (roughly: all unconstrained parameters are on about the same scale, zero-centered, and uncorrelated, with low curvature and no heavy tails), then Pathfinder may perform warm-up more quickly, but maybe not by much. Where I think Pathfinder is useful is early in the modeling process, where the model may be poorly parametrized and the posterior may have poor geometry. Ideally you can tell this from looking at the model, but that's often not the case, especially for novices. A useful tool for diagnosing model problems is to look at draws, but for such models, warm-up is often unbearably slow. Pathfinder can speed up these early stages by quickly (1-2 orders of magnitude faster than HMC warm-up) providing low-quality draws and metric estimates that may still be diagnostically useful for improving the model to the point where Pathfinder is no longer needed.
Oh right, I did wrap BridgeStan's lp/lpg calls in a try/catch block and returned -Inf/Infs when an exception occurred. I actually hadn't tried Pathfinder without that (or had I 🤔). BTW, would it make sense for pathfinder to allow the gradient function to also return the density along with its gradient? One of the selling points of multipathfinder is that it can (in expectation) handle posteriors with "irrelevant" minor modes, but to be able to do this, the assumption is that while some or many single-pathfinder runs can/will fail, we can still recover by using only the "good" pathfinder runs (by resampling). So I think failing early and returning the last non-inf point, or rerunning (or just having many runs in parallel in the first place), is in the spirit of the method.
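A fused interface like the one suggested here might look as follows. The function below uses a standard-normal target purely for illustration; a real implementation would delegate to something like BridgeStan's `log_density_gradient`, which already returns both values from a single model evaluation:

```julia
# Sketch of a fused log-density-and-gradient function. The standard-normal
# target is an assumption for illustration; the point is the (lp, grad)
# return convention, which avoids evaluating the model twice.
function logp_and_grad(x::AbstractVector{<:Real})
    lp = -0.5 * sum(abs2, x)  # unnormalized standard-normal log density
    grad = -x                 # its gradient
    return lp, grad
end
```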
Perhaps in terms of the API, but we would internally end up creating separate functions.
Yeah, you've convinced me that this is more user-friendly. I wonder if it makes more sense to only terminate early on non-finite points, non-finite gradients, or
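A callback that also checks the gradient, in the spirit of this suggestion, could be sketched like so. The factory name and the idea of re-evaluating `∇logp` inside the callback are assumptions for illustration, not Pathfinder's actual API:

```julia
# Hypothetical callback factory: returns an (x, lp) -> Bool callback that
# signals early termination when the log density, the point, or the gradient
# contains non-finite values.
function make_nonfinite_callback(∇logp)
    return function (x, lp)
        isfinite(lp) || return true
        any(!isfinite, x) && return true
        any(!isfinite, ∇logp(x)) && return true
        return false
    end
end
```

One caveat of this sketch is the extra gradient evaluation per callback invocation; an implementation inside pathfinder itself could reuse the gradient it has already computed.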
Nikolas Siccha on Slack shared: