Averaging over Observations per Trace? #504

collinskatie · 2020-09-10T17:27:46Z

collinskatie
Sep 10, 2020

Hi,

Is there a way to average over the sampling of a set of observations for a single trace when scoring that trace (for instance, in the update step or when deciding whether to accept/reject a change?)

Our model has noise within the collision dynamics of the physics engine, so the sampled trajectory of a ball (we trace the x and y per timepoint assuming they are sampled from a gaussian centered at that x,y position); however, if the noise "accidentally" makes the trajectory match very closely this trace will dominate our set of sampled traces, when in reality, this is a coincidental "good fit" with the observations of the ball trajectory we are trying to match.

Given this noise, is there a way during inference to average over multiple runs of our physics engine per trace? For context, our generative model runs this forward once and traces the x,y at each step. We have also tried running multiple chains, but still within some chains, one trace dominates given the "coincidental" noise fit.

I'm happy to provide any clarification to the question if needed - thank you for any help!

Answered by marcoct

Sep 10, 2020

@collinskatie That's a really good question, which is closely related to "pseudo-marginal" Monte Carlo inference algorithms, which use stochastic approximations to marginal likelihoods which are integrals over the values of encapsulated random choices like your dynamics noise. Every operation (including update) triggers fresh simulation of these choices and this can be seen a single-sample importance sampling estimate of the marginal likelihood, which is an integral over the encapsulated randomness in the dynamics. The GFI was specifically designed to allow you to increase the number of replicates used to estimate these integrals without needing to change your inference code, so it's real…

View full answer

marcoct · 2020-09-10T19:28:22Z

marcoct
Sep 10, 2020
Collaborator

@collinskatie That's a really good question, which is closely related to "pseudo-marginal" Monte Carlo inference algorithms, which use stochastic approximations to marginal likelihoods which are integrals over the values of encapsulated random choices like your dynamics noise. Every operation (including update) triggers fresh simulation of these choices and this can be seen a single-sample importance sampling estimate of the marginal likelihood, which is an integral over the encapsulated randomness in the dynamics. The GFI was specifically designed to allow you to increase the number of replicates used to estimate these integrals without needing to change your inference code, so it's really cool to see you have a need for this.

There are a couple ways to implement things such that N runs of the simulator are used instead of 1 run within every call to e.g. update.

It is actually be possible to use a generic generative function combinator that wraps your existing model and accepts a number of replicates to use for each operation. But unfortunately we don't have a version of that combinator ready to be released yet.

In lieu of that combinator, it should be possible to extend your model implementation to have this behavior:

First, factor your model so it breaks down into a sequence of three steps if it doesn't already: (i) sample latent traced random choices, (ii) run simulator without any traced random choices, (iii) sample observable traced random choices.
Then, replace part (ii) with a version that runs the simulation N times, and returns instead of one simulation result a vector of simulation results.
Then, replace each observation random choice distribution (e.g. Gaussian with mean at a simulated location) with a uniform mixture distribution (e.g. a mixture of Gaussians where the means are the vector of simulated locations). I'll include some code below that defines a Mixture type that might be useful for this. It should be possible to replace each e.g. normal random choices with a call to mixture_of_normal e.g. maybe something like this:

const N = 10 # number of simulations that will be used within every model operation
const mixture_of_normals = Mixture([normal for _ in 1:N])

..
@gen function model( ..) 
..
simulation_results::Vector = run_replicated_simulation(.., N)

for object_id in objects
   {(:x, t, object_id)} ~ mixture_of_normals([fill(1/N, N)], [(get_object_x(simulation_results[i], object_id), noise) for i in 1:N])
   {(:y, t, object_id)} ~ mixture_of_normals([fill(1/N, N)], [(get_object_y(simulation_results[i], object_id), noise) for i in 1:N])
end
..
end

I suppose it would be better to make N an argument to the model. It is fine to dynamically construct the distribution within the model code if you are using the dynamic modeling language e.g."

{(:x, t, object_id)} ~ (Mixture([normal for _ in 1:N]))([fill(1/N, N)], ... )

Of course you can decide not to use Mixture and write your own custom mixture distributions that are more specialized for your use case.

Here is the definition of Mixture:

using Gen

struct Mixture{T} <: Distribution{T}
    components::Vector{Distribution{T}}
end

function Gen.logpdf(
        dist::Mixture{T}, x::T, weights::Vector{Float64},
        arg_tuples::Vector) where {T}
    ls = Vector{Float64}(undef, length(dist.components))
    for i=1:length(dist.components)
        ls[i] = logpdf(dist.components[i], x, arg_tuples[i]...) + log(weights[i])
    end
    return logsumexp(ls)
end

function Gen.random(
        dist::Mixture, weights::Vector{Float64},
        arg_tuples::Vector)
    i = categorical(weights)
    return random(dist.components[i], arg_tuples[i]...)
end

function Gen.logpdf_grad(
        dist::Mixture{T}, x::T, weights::Vector{Float64},
        arg_tuples::Vector) where {T}
    error("not implemented")
end

(dist::Mixture)(weights, arg_tuples) = random(dist, weights, arg_tuples)
Gen.is_discrete(dist::Mixture) = is_discrete(dist.components[1])
Gen.has_output_grad(dist::Mixture) = false
Gen.has_argument_grads(dist::Mixture) = (false, false)

0 replies

collinskatie · 2020-09-11T19:43:11Z

collinskatie
Sep 11, 2020
Author

Hi @marcoct - thank you so much!! A mixture of normals is exactly what we're looking for.

I pasted in the Mixture class you wrote and started adapting our generative model as you suggested and trying adapting some of your code snippets, but am having some trouble creating the mixture of normals with the class, as is.

Is the way you recommended the best way to create the struct once defined? I'm not exactly sure of the type that the Mixture struct is expecting when we construct it (I think I'm a bit confused on some the Gen syntax still)?

Here is the error with the command I ran:

julia> const mixture_of_normals = Mixture([normal for _ in 1:N])
ERROR: MethodError: no method matching Mixture(::Array{Gen.Normal,1})
Closest candidates are:
  Mixture(::Array{Distribution{T},1}) where T at none:2
Stacktrace:
 [1] top-level scope at none:0

I'm a bit confused how [normal for _ in 1:N] would not satisfy the array of distributions type the struct seems to want?

Thanks!

0 replies

marcoct · 2020-09-11T19:56:06Z

marcoct
Sep 11, 2020
Collaborator

@collinskatie This should work:

 Mixture{Float64}([normal for _ in 1:10])

0 replies

collinskatie · 2020-09-11T19:57:09Z

collinskatie
Sep 11, 2020
Author

Perfect yes that works - thanks!!

0 replies

collinskatie · 2020-09-12T03:47:15Z

collinskatie
Sep 12, 2020
Author

Hi @marcoct - sorry for all of the questions. I restructured all of our code and have nearly everything working except the actual sampling for the trace. It seems the logpdf types aren't matching up exactly as defined? I've been trying to parse the error message and it seems to me that the types are what's expected for the parameters of the function you defined within the Mixture struct?

{:x} ~ Mixture{Float64}([normal for _ in 1:N])([fill(1/N, N)], [(xs[i], measurement_noise) for i in 1:10])

ERROR: LoadError: MethodError: no method matching logpdf(::Mixture{Float64}, ::Float64, ::Array{Array{Float64,1},1}, ::Array{Tuple{Float64,Float64},1})

Do you have a sense for what may be going on here - I will keep looking regardless but just wanted to check if there was an obvious Gen or julia error that I'm missing. xs here is an array of floats and measurement noise is a float as well.

Thanks!

0 replies

alex-lew · 2020-09-12T11:27:55Z

alex-lew
Sep 12, 2020
Maintainer

@collinskatie I think the problem may be that you have square brackets around fill(1/N, N) -- the fill function already produces a list, so you should be OK removing them :-)

{:x} ~ Mixture{Float64}([normal for _ in 1:N])(fill(1/N, N), [(x, measurement_noise) for x in xs])

0 replies

collinskatie · 2020-09-12T22:54:07Z

collinskatie
Sep 12, 2020
Author

Thank you @alex-lew !! That worked!

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Averaging over Observations per Trace? #504

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 7 comments

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Averaging over Observations per Trace? #504

collinskatie Sep 10, 2020

Replies: 7 comments

marcoct Sep 10, 2020 Collaborator

collinskatie Sep 11, 2020 Author

marcoct Sep 11, 2020 Collaborator

collinskatie Sep 11, 2020 Author

collinskatie Sep 12, 2020 Author

alex-lew Sep 12, 2020 Maintainer

collinskatie Sep 12, 2020 Author

collinskatie
Sep 10, 2020

marcoct
Sep 10, 2020
Collaborator

collinskatie
Sep 11, 2020
Author

marcoct
Sep 11, 2020
Collaborator

collinskatie
Sep 11, 2020
Author

collinskatie
Sep 12, 2020
Author

alex-lew
Sep 12, 2020
Maintainer

collinskatie
Sep 12, 2020
Author