Best way to use a precalculated prior defined as a PMF on a lattice #2146
Comments
You can approximate your PDF using a kernel density estimate (KDE) and use that as a function in your model. For an example see below:
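A minimal sketch of that idea (illustrative names and values, not the original example): build a KDE from the lattice PMF and wrap its log-density with `as_op`. Since an `as_op`-wrapped function has no gradient, a gradient-free sampler is used here.

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano.compile.ops import as_op
from scipy.stats import gaussian_kde

# Illustrative lattice and PMF (stand-ins for the precalculated prior).
grid = np.linspace(0.0, 1.0, 200)
pmf = np.exp(-0.5 * ((grid - 0.3) / 0.05) ** 2)
pmf /= pmf.sum()

# Fit a KDE to samples drawn from the lattice PMF.
kde = gaussian_kde(np.random.choice(grid, size=5000, p=pmf))

# Wrap the KDE log-density as a Theano op (no gradient available).
@as_op(itypes=[tt.dscalar], otypes=[tt.dscalar])
def kde_logp(value):
    return np.asarray(np.log(kde(value)[0]))

with pm.Model():
    x = pm.DensityDist('x', kde_logp, testval=0.3)
    trace = pm.sample(2000, step=pm.Metropolis())
```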
@junpenglao Thank you, I was looking for an example like this. The only thing I'm concerned about is that an op defined this way has no gradient, so gradient-based samplers can't use it. That's why I want to implement a custom operation with a gradient method, to continue using the NUTS sampler.
You can define a new Theano op with a gradient by subclassing `theano.Op`.
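A bare-bones sketch of that pattern, assuming the wrapped function `f` and its derivative `df` are ordinary Python callables (names are illustrative):

```python
import numpy as np
import theano
import theano.tensor as tt

class CallableOp(theano.Op):
    """Schematic: wrap a Python function `f` (and its derivative `df`)
    so that Theano can both evaluate it and differentiate through it."""
    itypes = [tt.dscalar]
    otypes = [tt.dscalar]

    def __init__(self, f, df=None):
        self.f = f
        # The gradient is itself an op wrapping the derivative.
        self.df_op = CallableOp(df) if df is not None else None

    def perform(self, node, inputs, output_storage):
        x, = inputs
        output_storage[0][0] = np.asarray(self.f(x))

    def grad(self, inputs, output_grads):
        x, = inputs
        g, = output_grads
        return [g * self.df_op(x)]  # chain rule: incoming gradient times f'(x)
```

NUTS can then differentiate through such an op via `grad`, which combines the incoming gradient with the derivative of the wrapped function.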
@aseyboldt I don't want to use a KDE or mixtures directly because they are too slow to evaluate at each sampler step. I'm implementing a custom Theano operation.
Splines should be pretty fast though, right? There might even be a spline op for Theano out there somewhere...
@aseyboldt Yes, splines should be almost as fast as finite differences. But to use them it is still necessary to define a custom operation with a custom gradient. However, it is a good idea to try creating a Theano operation using splines from the SciPy package, as my calculations would be performed on the CPU anyway.
It might also work to approximate it with a GP; then you don't need to write your own Theano op.
I implemented a custom Theano operation and it works:

```python
import pymc3 as pm
from pymc3.distributions import transforms
import theano
import theano.tensor as tt
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline
%matplotlib inline


class SplineLikelihood(theano.Op):
    def __init__(self, pmf, lower, upper):
        self.itypes, self.otypes = [tt.dscalar], [tt.dscalar]
        # Linear spline through the lattice PMF, zero outside [lower, upper].
        self.density = InterpolatedUnivariateSpline(
            np.linspace(lower, upper, len(pmf)), pmf, k=1, ext='zeros')
        self.density_grad = self.density.derivative()
        # Normalization constant over [lower, upper].
        self.Z = self.density.integral(lower, upper)

        # Wrap the spline derivative as its own op so it can be used in grad().
        @theano.as_op(itypes=[tt.dscalar], otypes=[tt.dscalar])
        def grad_op(x):
            return np.asarray(self.density_grad(x))

        self.grad_op = grad_op

    def perform(self, node, inputs, output_storage):
        x, = inputs
        output_storage[0][0] = np.asarray(self.density(x) / self.Z)

    def grad(self, inputs, grads):
        x, = inputs
        x_grad, = grads  # added later, thanks to @aseyboldt
        # Chain rule: incoming gradient times d(density(x) / Z) / dx.
        return [x_grad * self.grad_op(x) / self.Z]


class SplineDist(pm.Continuous):
    def __init__(self, pmf, lower=0, upper=1, transform='interval', *args, **kwargs):
        if transform == 'interval':
            transform = transforms.interval(lower, upper)
        super().__init__(transform=transform, testval=(upper - lower) / 2, *args, **kwargs)
        self.likelihood = SplineLikelihood(pmf, lower, upper)

    def logp(self, value):
        return tt.log(self.likelihood(value))


with pm.Model() as m:
    spline_p = SplineDist('spline_p', np.linspace(1.0, 10.0, 100), 0, 1)
    trace = pm.sample(100000, random_seed=1, njobs=1)

pm.traceplot(trace)
```

However I encountered a strange behavior here. If I remove the division by `self.Z` from the Theano operation and put it instead into `logp`, i.e. `return tt.log(self.likelihood(value) / self.likelihood.Z)`, then the NUTS sampler samples at least 10x slower.
Nicely done! Are you on master? The speed is the same on my side.
I ran it on 3.0. Actually it looks like I overestimated the speed a bit: the sampler progress bar predicted a total sampling time of about 10 minutes at the beginning, but it eventually finished in something like 2.5 minutes, which is just twice as long as the first version. I tried to run it on master, and the results are the same: the second version is only twice as slow as the first one. Btw, would the PyMC3 project be interested in a PR with a new distribution doing something similar to the code in the snippet I posted above? Maybe something that could be called like `dist = pymc3.distributions.Lattice('lattice_dist', pmf, lower=0, higher=1)`.
I think both of those would be great contributions. |
They would be great contributions!
@a-rodin Nice!
@aseyboldt You are right, I had to multiply the result by `x_grad`. It was the cause of the slowdown I described above, because in the second case the total gradient was calculated incorrectly. Now the performance is the same for both variants, and even slightly better for the second one. I've updated the snippet. Thank you!
Hi!
I have a prior distribution for one of my continuous variables in the form of an array containing a probability mass function for intervals on a discretized lattice. For example, it might be a precalculated kernel density estimate or a precalculated mixture of a large number of components. It doesn't matter where the distribution came from; the main idea is that it is "smooth enough" and that the lattice step size can be as small as necessary.
I need to use this prior in a PyMC3 model. What would be the best way to accomplish this?
My plan is to use `DensityDist` with a custom Theano operation that evaluates the density on the lattice and approximates its gradient with finite differences. Is this a feasible approach? How sensitive is the NUTS sampler to possible gradient discontinuities? If it is sensitive, could that be mitigated by adding something like spline interpolation?
Also, would you be interested in adding such a distribution to PyMC3?
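For concreteness, here is one way the lattice evaluation described above could look in plain NumPy before wrapping it in a Theano op (a sketch with made-up grid and values, using linear interpolation for the lookup and a central finite difference for the gradient):

```python
import numpy as np

# Illustrative lattice prior: grid points and unnormalized PMF values.
xs = np.linspace(0.0, 1.0, 100)
pmf = np.exp(-0.5 * ((xs - 0.4) / 0.1) ** 2)
pmf /= np.trapz(pmf, xs)  # normalize so the density integrates to 1 on [0, 1]

def density(x):
    """Piecewise-linear interpolation of the lattice PMF (zero outside the grid)."""
    return np.interp(x, xs, pmf, left=0.0, right=0.0)

def density_grad(x, eps=1e-6):
    """Central finite-difference approximation of the derivative."""
    return (density(x + eps) - density(x - eps)) / (2 * eps)

print(density(0.37), density_grad(0.37))
```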