
Best way to use a precalculated prior defined as a PMF on a lattice #2146

Closed
ghost opened this issue May 8, 2017 · 14 comments
Comments

@ghost

ghost commented May 8, 2017

Hi!

I have a prior distribution for one of my continuous variables in the form of an array containing a probability mass function for intervals on a discretized lattice. For example, it might be a precalculated kernel density estimate or a precalculated mixture with a large number of components. It doesn't matter where the distribution came from; the main idea is that it is "smooth enough" and that the lattice step size can be as small as necessary.

I need to use this prior in a PyMC3 model. What would be the best way to accomplish this?

My plan is to use DensityDist with a custom Theano operation that

  • calculates probability density at each point using values from the lattice array;
  • calculates gradient using finite differentiation on the lattice.

Is this a feasible approach? How sensitive is the NUTS sampler to possible gradient discontinuities? If it is sensitive, could that be mitigated by adding something like spline interpolation?

Also, would you be interested in adding such a distribution to PyMC3?
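
For concreteness, here is a rough sketch of the plan above (all names and lattice values are illustrative; it attaches the lattice prior through a Potential on a bounded Uniform rather than a DensityDist, just to keep the variable inside the lattice's support):

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt
from theano.compile.ops import as_op

grid = np.linspace(0.0, 1.0, 200)               # lattice points
pmf = np.exp(-(grid - 0.3) ** 2 / 0.01)         # example lattice values
pmf /= np.trapz(pmf, grid)                      # normalize to a density
dx = grid[1] - grid[0]
dpmf = np.gradient(pmf, dx)                     # finite-difference gradient on the lattice

def lookup(arr, x):
    # value of arr at the lattice cell nearest to x
    i = int(np.clip(np.round((x - grid[0]) / dx), 0, len(grid) - 1))
    return np.asarray(arr[i])

@as_op(itypes=[tt.dscalar], otypes=[tt.dscalar])
def density_grad(x):
    return lookup(dpmf, x)

class LatticeDensity(theano.Op):
    itypes, otypes = [tt.dscalar], [tt.dscalar]

    def perform(self, node, inputs, output_storage):
        x, = inputs
        output_storage[0][0] = lookup(pmf, x)

    def grad(self, inputs, output_grads):
        x, = inputs
        g, = output_grads
        return [g * density_grad(x)]             # chain rule with the upstream gradient

lattice_density = LatticeDensity()

with pm.Model():
    x = pm.Uniform('x', 0, 1)
    pm.Potential('lattice_prior', tt.log(lattice_density(x)))
    trace = pm.sample(1000)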

@junpenglao
Member

You can approximate your PDF using a kernel density estimate (KDE) and use that as a function in your model. For an example, see below:
https://github.com/pymc-devs/pymc3/blob/master/docs/source/notebooks/updating_priors.ipynb
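
A minimal sketch of that approach (the sample data and names below are illustrative, not taken from the notebook): the KDE is evaluated inside a gradient-free op, which is why a gradient-free sampler such as Slice has to be used.

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano.compile.ops import as_op
from scipy.stats import gaussian_kde

prior_samples = np.random.normal(0.3, 0.1, size=5000)   # samples the prior was estimated from
kde = gaussian_kde(prior_samples)

@as_op(itypes=[tt.dscalar], otypes=[tt.dscalar])
def kde_logp(value):
    # evaluate the KDE density at a point and return its log
    return np.asarray(np.log(kde(value)[0]))

with pm.Model():
    x = pm.DensityDist('x', kde_logp, testval=0.3)
    # as_op provides no gradient, so NUTS is unavailable
    trace = pm.sample(2000, step=pm.Slice())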

@ghost
Author

ghost commented May 8, 2017

@junpenglao Thank you, I was looking for an example like this.

The only thing I'm concerned about is this note from the notebook:

The NUTS sampling method cannot be used anymore since it requires to provide the gradient of the distribution, so we will use Slice sampling.

That's why I want to implement a custom operation with a gradient method, so that I can continue using the NUTS sampler.

@aseyboldt
Member

You can define a new theano op with a gradient by subclassing theano.Op and implementing grad(). See the theano docs.
Couldn't you get a continuously differentiable logp by using splines or maybe a gaussian KDE?

@ghost
Author

ghost commented May 8, 2017

@aseyboldt I don't want to use KDE or mixtures directly because they are too slow to evaluate on each sampler step. I'm implementing a custom Theano operation.

@aseyboldt
Member

aseyboldt commented May 8, 2017

Splines should be pretty fast though, right? There might even be a spline op for theano out there somewhere...
And scipy has pretty convenient support for computing the gradients already.
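
For reference, a small illustration of the SciPy interface being referred to (the grid and values are made up): InterpolatedUnivariateSpline returns another spline object for the derivative and can integrate over an interval for the normalizing constant.

import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

grid = np.linspace(0.0, 1.0, 100)            # lattice points
pmf = np.exp(-(grid - 0.4) ** 2 / 0.02)      # example lattice values
density = InterpolatedUnivariateSpline(grid, pmf, k=3)
density_grad = density.derivative()          # spline for d(density)/dx
norm = density.integral(0.0, 1.0)            # normalizing constant

print(density(0.5) / norm, density_grad(0.5) / norm)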

@ghost
Author

ghost commented May 8, 2017

@aseyboldt Yes, splines should be almost as fast as finite differences, but to use them it is still necessary to define a custom operation with a custom gradient.

Still, it is a good idea to try creating a Theano operation using splines from the SciPy package, since my calculations would be performed on the CPU anyway.

@junpenglao
Member

It might also work to approximate it with a GP; then you don't need to write your own theano_op.
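
One way that suggestion could look (a rough sketch with a hand-rolled RBF kernel and made-up lattice values, not an existing PyMC3 helper): precompute the GP interpolation weights with NumPy, then write the interpolation itself with plain Theano tensor ops, so Theano can differentiate it and NUTS works without a custom op.

import numpy as np
import pymc3 as pm
import theano.tensor as tt

grid = np.linspace(0.0, 1.0, 50)
log_pmf = -(grid - 0.4) ** 2 / 0.02              # example lattice log-density
ls = grid[1] - grid[0]                           # RBF length-scale ~ lattice step

def rbf(a, b):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

K = rbf(grid, grid) + 1e-8 * np.eye(len(grid))
weights = np.linalg.solve(K, log_pmf)            # GP interpolation weights

def gp_logp(value):
    # k(value, grid) @ weights, written with differentiable Theano ops
    k = tt.exp(-0.5 * ((value - grid) / ls) ** 2)
    return tt.dot(k, weights)

with pm.Model():
    x = pm.DensityDist('x', gp_logp, testval=0.4)
    trace = pm.sample(1000)                      # NUTS works; gradients come from autodiff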

@ghost
Author

ghost commented May 8, 2017

I implemented a custom Theano operation, and it works:

import pymc3 as pm
from pymc3.distributions import transforms
import theano
import theano.tensor as tt
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline
%matplotlib inline

class SplineLikelihood(theano.Op):
    """Theano op returning the normalized spline density at a point."""

    def __init__(self, pmf, lower, upper):
        self.itypes, self.otypes = [tt.dscalar], [tt.dscalar]
        # linear spline through the lattice values, zero outside [lower, upper]
        self.density = InterpolatedUnivariateSpline(
            np.linspace(lower, upper, len(pmf)), pmf, k=1, ext='zeros')
        self.density_grad = self.density.derivative()
        # normalizing constant
        self.Z = self.density.integral(lower, upper)

        # helper op that evaluates the spline derivative numerically
        @theano.as_op(itypes=[tt.dscalar], otypes=[tt.dscalar])
        def grad_op(x):
            return np.asarray(self.density_grad(x))
        self.grad_op = grad_op

    def perform(self, node, inputs, output_storage):
        x, = inputs
        output_storage[0][0] = np.asarray(self.density(x) / self.Z)

    def grad(self, inputs, grads):
        x, = inputs
        x_grad, = grads  # upstream gradient; added later, thanks to @aseyboldt
        # chain rule: upstream gradient times d(density / Z)/dx
        return [x_grad * self.grad_op(x) / self.Z]

class SplineDist(pm.Continuous):
    """Continuous distribution defined by a PMF on a lattice over [lower, upper]."""

    def __init__(self, pmf, lower=0, upper=1, transform='interval', *args, **kwargs):
        if transform == 'interval':
            transform = transforms.interval(lower, upper)
        # start sampling from the midpoint of the support
        super().__init__(transform=transform, testval=(lower + upper) / 2, *args, **kwargs)
        self.likelihood = SplineLikelihood(pmf, lower, upper)

    def logp(self, value):
        return tt.log(self.likelihood(value))

with pm.Model() as m:
    spline_p = SplineDist('spline_p', np.linspace(1.0, 10.0, 100), 0, 1)
    trace = pm.sample(100000, random_seed=1, njobs=1)

    pm.traceplot(trace)

However, I encountered strange behavior here. If I remove the division from the Theano operation and put it in the logp function instead, i.e.

return tt.log(self.likelihood(value) / self.likelihood.Z)

then the NUTS sampler samples at least 10x slower.

@junpenglao
Member

Nicely done! Are you on master? The speed is the same on my side.

@ghost
Author

ghost commented May 8, 2017

I ran it on 3.0. Actually, it looks like I overestimated the slowdown a bit: at the beginning of sampling the progress bar predicted a total sampling time near 10 minutes, but it eventually finished in something like 2.5 minutes, which is just about twice as long as the first version.

I tried to run it on master, and the results are the same: the second version is only about twice as slow as the first one.

Btw, would the PyMC3 project be interested in a PR with a new distribution doing something similar to the code in the snippet I posted above? Maybe something that could be called like

dist = pymc3.distributions.Lattice('lattice_dist', pmf, lower=0, upper=1)

@twiecki
Member

twiecki commented May 8, 2017

I think both of those would be great contributions.

@junpenglao
Member

Those would be great contributions!

@aseyboldt
Member

@a-rodin Nice!
Are you sure the grad is correct? It seems strange that grads isn't used.

@ghost
Author

ghost commented May 9, 2017

@aseyboldt You are right, I had to multiply the result by grads[0]. For some reason I assumed that Theano automatically multiplies the gradient returned by grad by the gradient of the input variable.

That was the cause of the slowdown I described above: in the second case the total gradient was calculated incorrectly. Now the performance is the same for both variants, and even slightly better for the second one. I've updated the snippet. Thank you!

@ghost ghost closed this as completed May 10, 2017