
Best way to use a precalculated prior defined as a PMF on a lattice #2146

Closed
ghost opened this issue May 8, 2017 · 14 comments
Comments

@ghost

ghost commented May 8, 2017

Hi!

I have a prior distribution for one of my continuous variables in the form of an array containing a probability mass function for intervals on a discretized lattice. For example, it might be a precalculated kernel density estimate or a precalculated mixture with a large number of components. It doesn't matter where the distribution came from; the main idea is that it is "smooth enough" and that the lattice step size can be as small as necessary.

I need to use this prior in a PyMC3 model. What would be the best way to accomplish this?

My plan is to use DensityDist with a custom Theano operation that

  • calculates probability density at each point using values from the lattice array;
  • calculates gradient using finite differentiation on the lattice.

Is this a feasible approach? How sensitive is the NUTS sampler to possible gradient discontinuities? If it is sensitive, could that be mitigated by adding something like spline interpolation?

Also, would you be interested in adding such a distribution to PyMC3?
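
For concreteness, here is a rough sketch of the plan above (all names and lattice values are illustrative; it attaches the lattice prior through a Potential on a bounded Uniform rather than a DensityDist, just to keep the variable inside the lattice's support):

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt
from theano.compile.ops import as_op

grid = np.linspace(0.0, 1.0, 200)               # lattice points
pmf = np.exp(-(grid - 0.3) ** 2 / 0.01)         # example lattice values
pmf /= np.trapz(pmf, grid)                      # normalize to a density
dx = grid[1] - grid[0]
dpmf = np.gradient(pmf, dx)                     # finite-difference gradient on the lattice

def lookup(arr, x):
    # value of arr at the lattice cell nearest to x
    i = int(np.clip(np.round((x - grid[0]) / dx), 0, len(grid) - 1))
    return np.asarray(arr[i])

@as_op(itypes=[tt.dscalar], otypes=[tt.dscalar])
def density_grad(x):
    return lookup(dpmf, x)

class LatticeDensity(theano.Op):
    itypes, otypes = [tt.dscalar], [tt.dscalar]

    def perform(self, node, inputs, output_storage):
        x, = inputs
        output_storage[0][0] = lookup(pmf, x)

    def grad(self, inputs, output_grads):
        x, = inputs
        g, = output_grads
        return [g * density_grad(x)]             # chain rule with the upstream gradient

lattice_density = LatticeDensity()

with pm.Model():
    x = pm.Uniform('x', 0, 1)
    pm.Potential('lattice_prior', tt.log(lattice_density(x)))
    trace = pm.sample(1000)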

@junpenglao
Member

You can approximate your PDF using a kernel density estimate (KDE) and use that as a function in your model. For an example, see below:
https://github.com/pymc-devs/pymc3/blob/master/docs/source/notebooks/updating_priors.ipynb
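
A minimal sketch of that approach (the sample data and names below are illustrative, not taken from the notebook): the KDE is evaluated inside a gradient-free op, which is why a gradient-free sampler such as Slice has to be used.

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano.compile.ops import as_op
from scipy.stats import gaussian_kde

prior_samples = np.random.normal(0.3, 0.1, size=5000)   # samples the prior was estimated from
kde = gaussian_kde(prior_samples)

@as_op(itypes=[tt.dscalar], otypes=[tt.dscalar])
def kde_logp(value):
    # evaluate the KDE density at a point and return its log
    return np.asarray(np.log(kde(value)[0]))

with pm.Model():
    x = pm.DensityDist('x', kde_logp, testval=0.3)
    # as_op provides no gradient, so NUTS is unavailable
    trace = pm.sample(2000, step=pm.Slice())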

@ghost
Author

ghost commented May 8, 2017

@junpenglao Thank you, I was looking for an example like this.

The only thing I'm concerned about is this note from the notebook:

The NUTS sampling method cannot be used anymore since it requires to provide the gradient of the distribution, so we will use Slice sampling.

That's why I want to implement a custom operation with a gradient method, so that I can continue using the NUTS sampler.

@aseyboldt
Member

You can define a new theano op with a gradient by subclassing theano.Op and implementing grad(). See the theano docs.
Couldn't you get a continuously differentiable logp by using splines or maybe a gaussian KDE?

@ghost
Author

ghost commented May 8, 2017

@aseyboldt I don't want to use KDE or mixtures directly because they are too slow to evaluate on each sampler step. I'm implementing a custom Theano operation.

@aseyboldt
Member

aseyboldt commented May 8, 2017

Splines should be pretty fast though, right? There might even be a spline op for theano out there somewhere...
And scipy has pretty convenient support for computing the gradients already.
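
For reference, a small illustration of the SciPy interface being referred to (the grid and values are made up): InterpolatedUnivariateSpline returns another spline object for the derivative and can integrate over an interval for the normalizing constant.

import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

grid = np.linspace(0.0, 1.0, 100)            # lattice points
pmf = np.exp(-(grid - 0.4) ** 2 / 0.02)      # example lattice values
density = InterpolatedUnivariateSpline(grid, pmf, k=3)
density_grad = density.derivative()          # spline for d(density)/dx
norm = density.integral(0.0, 1.0)            # normalizing constant

print(density(0.5) / norm, density_grad(0.5) / norm)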

@ghost
Author

ghost commented May 8, 2017

@aseyboldt Yes, splines should be almost as fast as finite differences, but to use them it is still necessary to define a custom operation with a custom gradient.

Still, it is a good idea to try creating a Theano operation using splines from the SciPy package, since my calculations would be performed on the CPU anyway.

@junpenglao
Member

It might also work to approximate it with a GP; then you don't need to write your own theano_op.
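
One way that suggestion could look (a rough sketch with a hand-rolled RBF kernel and made-up lattice values, not an existing PyMC3 helper): precompute the GP interpolation weights with NumPy, then write the interpolation itself with plain Theano tensor ops, so Theano can differentiate it and NUTS works without a custom op.

import numpy as np
import pymc3 as pm
import theano.tensor as tt

grid = np.linspace(0.0, 1.0, 50)
log_pmf = -(grid - 0.4) ** 2 / 0.02              # example lattice log-density
ls = grid[1] - grid[0]                           # RBF length-scale ~ lattice step

def rbf(a, b):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

K = rbf(grid, grid) + 1e-8 * np.eye(len(grid))
weights = np.linalg.solve(K, log_pmf)            # GP interpolation weights

def gp_logp(value):
    # k(value, grid) @ weights, written with differentiable Theano ops
    k = tt.exp(-0.5 * ((value - grid) / ls) ** 2)
    return tt.dot(k, weights)

with pm.Model():
    x = pm.DensityDist('x', gp_logp, testval=0.4)
    trace = pm.sample(1000)                      # NUTS works; gradients come from autodiff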

@ghost
Author

ghost commented May 8, 2017

I implemented a custom Theano operation, and it works:

import pymc3 as pm
from pymc3.distributions import transforms
import theano
import theano.tensor as tt
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline
%matplotlib inline

class SplineLikelihood(theano.Op):
    """Theano op returning the normalized spline density at a point."""

    def __init__(self, pmf, lower, upper):
        self.itypes, self.otypes = [tt.dscalar], [tt.dscalar]
        # linear spline through the lattice values, zero outside [lower, upper]
        self.density = InterpolatedUnivariateSpline(
            np.linspace(lower, upper, len(pmf)), pmf, k=1, ext='zeros')
        self.density_grad = self.density.derivative()
        # normalizing constant
        self.Z = self.density.integral(lower, upper)

        # helper op that evaluates the spline derivative numerically
        @theano.as_op(itypes=[tt.dscalar], otypes=[tt.dscalar])
        def grad_op(x):
            return np.asarray(self.density_grad(x))
        self.grad_op = grad_op

    def perform(self, node, inputs, output_storage):
        x, = inputs
        output_storage[0][0] = np.asarray(self.density(x) / self.Z)

    def grad(self, inputs, grads):
        x, = inputs
        x_grad, = grads  # upstream gradient; added later, thanks to @aseyboldt
        # chain rule: upstream gradient times d(density / Z)/dx
        return [x_grad * self.grad_op(x) / self.Z]

class SplineDist(pm.Continuous):
    """Continuous distribution defined by a PMF on a lattice over [lower, upper]."""

    def __init__(self, pmf, lower=0, upper=1, transform='interval', *args, **kwargs):
        if transform == 'interval':
            transform = transforms.interval(lower, upper)
        # start sampling from the midpoint of the support
        super().__init__(transform=transform, testval=(lower + upper) / 2, *args, **kwargs)
        self.likelihood = SplineLikelihood(pmf, lower, upper)

    def logp(self, value):
        return tt.log(self.likelihood(value))

with pm.Model() as m:
    spline_p = SplineDist('spline_p', np.linspace(1.0, 10.0, 100), 0, 1)
    trace = pm.sample(100000, random_seed=1, njobs=1)

    pm.traceplot(trace)

However, I encountered strange behavior here. If I remove the division from the Theano operation and put it in the logp function instead, i.e.

return tt.log(self.likelihood(value) / self.likelihood.Z)

then the NUTS sampler samples at least 10x slower.

@junpenglao
Member

Nicely done! Are you on master? The speed is the same on my side.

@ghost
Author

ghost commented May 8, 2017

I ran it on 3.0. Actually, it looks like I overestimated the slowdown a bit: at the beginning of sampling the progress bar predicted a total sampling time near 10 minutes, but it eventually finished in something like 2.5 minutes, which is just about twice as long as the first version.

I tried to run it on master, and the results are the same: the second version is only about twice as slow as the first one.

Btw, would the PyMC3 project be interested in a PR with a new distribution doing something similar to the code in the snippet I posted above? Maybe something that could be called like

dist = pymc3.distributions.Lattice('lattice_dist', pmf, lower=0, upper=1)

@twiecki
Member

twiecki commented May 8, 2017

I think both of those would be great contributions.

@junpenglao
Member

Those would be great contributions!

@aseyboldt
Member

@a-rodin Nice!
Are you sure the grad is correct? It seems strange that grads isn't used.

@ghost
Author

ghost commented May 9, 2017

@aseyboldt You are right, I had to multiply the result by grads[0]. For some reason I assumed that Theano automatically multiplies the gradient returned by grad by the gradient of the input variable.

That was the cause of the slowdown I described above: in the second case the total gradient was calculated incorrectly. Now the performance is the same for both variants, and even slightly better for the second one. I've updated the snippet. Thank you!

@ghost ghost closed this as completed May 10, 2017