Averaging over Observations per Trace? #504
-
Hi, is there a way to average over the sampling of a set of observations for a single trace when scoring that trace (for instance, in the update step, or when deciding whether to accept/reject a change)? Our model has noise within the collision dynamics of the physics engine: we trace the x and y of a ball per timepoint, assuming they are sampled from a Gaussian centered at that (x, y) position. However, if the noise "accidentally" makes the trajectory match the observations very closely, that trace will dominate our set of sampled traces, when in reality it is only a coincidental "good fit" with the observed ball trajectory we are trying to match. Given this noise, is there a way during inference to average over multiple runs of our physics engine per trace? For context, our generative model runs the engine forward once and traces the x, y at each step. We have also tried running multiple chains, but within some chains one trace still dominates because of a "coincidental" noise fit. I'm happy to provide any clarification to the question if needed - thank you for any help!
Replies: 7 comments
-
@collinskatie That's a really good question, and it is closely related to "pseudo-marginal" Monte Carlo inference algorithms, which use stochastic approximations to marginal likelihoods (integrals over the values of encapsulated random choices, like your dynamics noise). Every operation (including update) triggers a fresh simulation of these choices, and this can be seen as a single-sample importance sampling estimate of the marginal likelihood, which is an integral over the encapsulated randomness in the dynamics. The GFI was specifically designed to allow you to increase the number of replicates used to estimate these integrals without needing to change your inference code, so it's really cool to see you have a need for this. There are a couple of ways to implement things such that N runs of the simulator are used instead of 1 run within every call to e.g. update. It would actually be possible to use a generic generative function combinator that wraps your existing model and accepts a number of replicates to use for each operation, but unfortunately we don't have a version of that combinator ready to be released yet. In lieu of that combinator, it should be possible to extend your model implementation to have this behavior:
```julia
const N = 10  # number of simulations that will be used within every model operation
const mixture_of_normals = Mixture{Float64}([normal for _ in 1:N])
..

@gen function model(..)
    ..
    simulation_results::Vector = run_replicated_simulation(.., N)
    for object_id in objects
        # Score each observation under an equally-weighted mixture over the
        # N replicated simulations, averaging over the dynamics noise:
        {(:x, t, object_id)} ~ mixture_of_normals(fill(1/N, N), [(get_object_x(simulation_results[i], object_id), noise) for i in 1:N])
        {(:y, t, object_id)} ~ mixture_of_normals(fill(1/N, N), [(get_object_y(simulation_results[i], object_id), noise) for i in 1:N])
    end
    ..
end
```

Note that the weights argument is a plain `Vector{Float64}` (`fill(1/N, N)`), not a vector wrapped in additional square brackets. I suppose it would be better to make `N` an argument to the model. It is also fine to dynamically construct the distribution within the model code if you are using the dynamic modeling language, e.g. `{(:x, t, object_id)} ~ (Mixture{Float64}([normal for _ in 1:N]))(fill(1/N, N), ...)`.

Here is the definition of `Mixture`:

```julia
using Gen

# A distribution over values of type T that mixes a vector of component
# distributions, each invoked with its own argument tuple.
struct Mixture{T} <: Distribution{T}
    components::Vector{Distribution{T}}
end

# Density: logsumexp of the log-weighted component densities.
function Gen.logpdf(
        dist::Mixture{T}, x::T, weights::Vector{Float64},
        arg_tuples::Vector) where {T}
    ls = Vector{Float64}(undef, length(dist.components))
    for i=1:length(dist.components)
        ls[i] = logpdf(dist.components[i], x, arg_tuples[i]...) + log(weights[i])
    end
    return logsumexp(ls)
end

# Sampling: pick a component according to the weights, then sample from it.
function Gen.random(
        dist::Mixture, weights::Vector{Float64},
        arg_tuples::Vector)
    i = categorical(weights)
    return random(dist.components[i], arg_tuples[i]...)
end

function Gen.logpdf_grad(
        dist::Mixture{T}, x::T, weights::Vector{Float64},
        arg_tuples::Vector) where {T}
    error("not implemented")
end

(dist::Mixture)(weights, arg_tuples) = random(dist, weights, arg_tuples)
Gen.is_discrete(dist::Mixture) = is_discrete(dist.components[1])
Gen.has_output_grad(dist::Mixture) = false
Gen.has_argument_grads(dist::Mixture) = (false, false)
```
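[Editor's note: a minimal usage sketch of the `Mixture` definition above, not from the original thread; the means, weights, and noise values are made-up illustrations.]

```julia
using Gen

# A two-component mixture of normals, equally weighted, with a
# (mean, std) argument tuple for each component.
dist = Mixture{Float64}([normal, normal])
weights = [0.5, 0.5]
arg_tuples = [(0.0, 1.0), (10.0, 1.0)]

x = dist(weights, arg_tuples)              # sample from the mixture
lp = logpdf(dist, x, weights, arg_tuples)  # score a value under the mixture

# The mixture density is the logsumexp of the log-weighted component densities:
manual = logsumexp([log(0.5) + logpdf(normal, x, 0.0, 1.0),
                    log(0.5) + logpdf(normal, x, 10.0, 1.0)])
@assert isapprox(lp, manual)
```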
-
Hi @marcoct - thank you so much!! A mixture of normals is exactly what we're looking for. I pasted in the `Mixture` struct you wrote, started adapting our generative model as you suggested, and tried adapting some of your code snippets, but I'm having some trouble constructing the mixture of normals with the struct as-is. Is the way you recommended the best way to create the struct once it's defined? I'm not exactly sure what type the `Mixture` struct is expecting when we construct it (I think I'm still a bit confused by some of the Gen syntax). Here is the error with the command I ran:
I'm a bit confused how `[normal for _ in 1:N]` would not satisfy the array-of-distributions type the struct seems to want? Thanks!
-
@collinskatie This should work: `Mixture{Float64}([normal for _ in 1:10])`
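[Editor's note: a sketch of why the explicit type parameter is needed, assuming the `Mixture` struct defined above. Julia's parametric types are invariant, so a `Vector{Gen.Normal}` is not a `Vector{Distribution{Float64}}`, and `T` cannot be inferred from the argument alone.]

```julia
components = [normal for _ in 1:10]
components isa Vector{Gen.Normal}             # holds: the comprehension's eltype is Gen.Normal
components isa Vector{Distribution{Float64}}  # does not hold: parametric invariance

# Mixture(components) therefore has no matching constructor method. With the
# type parameter given explicitly, Julia converts the vector to the declared
# field type Vector{Distribution{Float64}}:
dist = Mixture{Float64}(components)
```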
-
Perfect, yes that works - thanks!!
-
Hi @marcoct - sorry for all of the questions. I restructured all of our code and have nearly everything working except the actual sampling for the trace. It seems the `logpdf` argument types aren't matching up exactly as defined? I've been trying to parse the error message, and it seems to me that the types are what's expected for the parameters of the `logpdf` method you defined for the `Mixture` struct?
Do you have a sense for what may be going on here? I will keep looking regardless, but wanted to check if there is an obvious Gen or Julia error that I'm missing. `xs` here is an array of floats, and the measurement noise is a float as well. Thanks!
-
@collinskatie I think the problem may be that you have square brackets around the weights vector: `fill(1/N, N)` is already a `Vector{Float64}`, so wrapping it as `[fill(1/N, N)]` produces a one-element `Vector{Vector{Float64}}`, which doesn't match the `weights::Vector{Float64}` parameter of `logpdf`.
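[Editor's note: a small sketch of the fix, not from the thread; `mus` and `noise` are hypothetical stand-ins for the per-replicate simulation outputs.]

```julia
N = 10
mus = randn(N)   # hypothetical per-replicate means from the simulator
noise = 0.1      # hypothetical measurement noise
dist = Mixture{Float64}([normal for _ in 1:N])

# Correct: fill(1/N, N) is already the Vector{Float64} that logpdf expects.
lp = logpdf(dist, 0.0, fill(1/N, N), [(mus[i], noise) for i in 1:N])

# Incorrect: [fill(1/N, N)] is a one-element Vector{Vector{Float64}}, so the
# following call would not match the logpdf method defined above:
# logpdf(dist, 0.0, [fill(1/N, N)], [(mus[i], noise) for i in 1:N])
```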
-
Thank you @alex-lew !! That worked!