Conversation

@eph (Contributor) commented Jun 8, 2017

Fixes #2284

I've never contributed to an open source project, so I'm not sure if a pull request is overkill here.

@junpenglao added this to the 3.2 milestone Jun 8, 2017
@junpenglao (Member)

Welcome! No contribution is too small ;-)
Another point of reference is the implementation in statsmodels: doc; code.

boundary = Normal.dist(0., tau=tau_e).logp

innov_like = Normal.dist(k * x_im1, tau=tau_e).logp(x_i)
return boundary(x[0]) + tt.sum(innov_like) + boundary(x[-1])
Member:

Yeah, this looks like a bug to me.
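
A sketch of the corrected conditional logp (an assumption for illustration, not necessarily the PR's exact diff): score x[0] once under the boundary distribution and drop the duplicated boundary term on the last observation.

def logp(self, x):
    # Same context as the snippet above (AR1 with coefficient k and
    # innovation precision tau_e); illustrative only.
    k = self.k
    tau_e = self.tau_e
    x_im1 = x[:-1]
    x_i = x[1:]
    boundary = Normal.dist(0., tau=tau_e).logp
    innov_like = Normal.dist(k * x_im1, tau=tau_e).logp(x_i)
    # No boundary(x[-1]) term: the last observation is already scored
    # by the innovation likelihood.
    return boundary(x[0]) + tt.sum(innov_like)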



def AR1_logpdf(value, k, tau_e):
return (sp.norm(loc=0,scale=1/np.sqrt(tau_e)).logpdf(value[0]) +
Member:

pep8 wants spaces around math operators.

Member:

But yeah, this test is a bit odd as it basically replicates the likelihood. statsmodels is an idea, but we don't want it as a dependency; it could be optional, though. @junpenglao?

Member:

I don't think we should use statsmodels in the test. I am thinking more of rewriting our AR1 logp along the lines of the (exact) unconditional maximum likelihood in statsmodels. We should take this opportunity to extend our AR1 model to AR(n).
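
For concreteness, a minimal NumPy/SciPy sketch (illustrative, not code from this PR) of that exact unconditional AR(1) log-likelihood, valid for a stationary process with |k| < 1:

import numpy as np
from scipy import stats

def ar1_exact_loglike(x, k, tau_e):
    # Innovation sd and the stationary sd of the process.
    sigma_e = 1.0 / np.sqrt(tau_e)
    sigma_0 = sigma_e / np.sqrt(1.0 - k ** 2)
    # Score x[0] under the stationary (unconditional) distribution,
    # then each later point as a one-step-ahead innovation.
    ll = stats.norm(0.0, sigma_0).logpdf(x[0])
    ll += stats.norm(k * x[:-1], sigma_e).logpdf(x[1:]).sum()
    return ll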

Contributor Author:

  • Sorry about the test. I didn't know the best way to write one without introducing additional dependencies.
  • As it is written now, the AR(1) likelihood calculation is conditional on $x_0 = 0$, which seems a bit arbitrary to me. It seems more natural to condition on the initial observation.
  • You could use the unconditional distribution to calculate the likelihood of the initial observation (as in statsmodels), but this only makes sense if the process is stationary ($|k| < 1$). Maybe something like init in GaussianRandomWalk?
  • I am trying to write a more general AR RV, but I'm getting a bit tripped up by creating the matrix of lagged observations in Theano.

@twiecki (Member) commented Jun 23, 2017

@eph Any progress?

@junpenglao (Member)

@twiecki I think this is waiting until after 3.1rc4.

@twiecki (Member) commented Jun 23, 2017

@junpenglao Definitely, but we can start moving some things into the 3.2 branch when they are done.

@ferrine (Member) commented Jun 28, 2017

Is it ready?

@junpenglao (Member)

I think we can merge this and improve the test later.

@ferrine (Member) commented Jun 28, 2017

Don't you think we'll forget about the test? Or do we need a test here?

@junpenglao (Member)

@eph wrote a test here, but it basically replicates the likelihood (see @twiecki's comment above).
It actually won't be a big problem, though, if we are going to rewrite the logp later, closer to the form in statsmodels.

@junpenglao (Member)

pinging @eph

@eph (Contributor Author) commented Jul 3, 2017

I have written a more general (p-lag) autoregressive model. Should I replace AR1 with this class (after tests are written, of course)? What do you want the tests written against? I know statsmodels was mentioned, but 1) the log-likelihood calculation there is not completely clear to me, and 2) I thought we didn't want to take on that dependency.

# Imports needed to run this snippet on its own (PyMC3 3.x / Theano):
import theano.tensor as tt
from theano import scan

from pymc3.distributions import distribution
from pymc3.distributions.continuous import Normal, get_tau_sd


class AR(distribution.Continuous):
    R"""
    Autoregressive process with p lags.

    .. math::

       x_t = \rho_0 + \rho_1 x_{t-1} + \ldots + \rho_p x_{t-p} + \epsilon_t,
       \quad \epsilon_t \sim N(0, \sigma^2)

    The innovation can be parameterized either in terms of precision or
    standard deviation. The link between the two parametrizations is
    given by

    .. math::

       \tau = \dfrac{1}{\sigma^2}

    Parameters
    ----------
    rho : tensor
        Vector of autoregressive coefficients.
    sd : float
        Standard deviation of innovation (sd > 0).
    tau : float
        Precision of innovation (tau > 0).
    constant : bool (optional, default=False)
        Whether to include a constant (the :math:`\rho_0` term above).
    """

    def __init__(self, rho, sd=None, tau=None,
                 constant=False, *args, **kwargs):
        super(AR, self).__init__(*args, **kwargs)
        tau, sd = get_tau_sd(tau=tau, sd=sd)
        self.sd = tt.as_tensor_variable(sd)
        self.tau = tt.as_tensor_variable(tau)

        self.mean = tt.as_tensor_variable(0.)

        rho = tt.as_tensor_variable(rho, ndim=1)
        if constant:
            self.p = rho.shape[0] - 1
        else:
            self.p = rho.shape[0]

        # NOTE: self.p is symbolic, so the number of lags cannot be
        # validated eagerly here:
        # if self.p <= 0:
        #     raise ValueError('too few lags.')

        self.constant = constant
        self.rho = rho

    def logp(self, value):
        y = value[self.p:]
        # Build the p x (T - p) matrix of lagged observations: row l
        # holds value lagged by l, aligned with y.
        results, _ = scan(lambda l, obs, p: obs[p - l:-l],
                          outputs_info=None,
                          sequences=[tt.arange(1, self.p + 1)],
                          non_sequences=[value, self.p])
        x = tt.stack(results)

        if self.constant:
            y = y - self.rho[0]
            eps = y - self.rho[1:].dot(x)
        else:
            eps = y - self.rho.dot(x)

        innov_like = Normal.dist(mu=0.0, tau=self.tau).logp(eps)
        return tt.sum(innov_like)
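
For reference, a hypothetical usage sketch of this class inside a model (priors and names here are illustrative assumptions, not part of the PR):

import numpy as np
import pymc3 as pm

y = np.random.randn(200)  # placeholder data

with pm.Model():
    rho = pm.Normal('rho', mu=0., sd=1., shape=2)  # AR(2), no constant
    sd = pm.HalfNormal('sd', sd=1.)
    obs = AR('obs', rho=rho, sd=sd, observed=y)
    trace = pm.sample(1000)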

@twiecki (Member) commented Jul 3, 2017

Yes, a general AR would be great. Tests are difficult here; I wouldn't block on them. Perhaps just a regression test with fixed values? Or maybe a simple case where the result can be confirmed by hand?
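
One such hand-checkable case (a sketch, not the PR's test): with the autoregressive coefficient set to zero, the conditional AR(1) likelihood reduces to iid normals, so the expected logp can be computed directly.

import numpy as np
from scipy import stats

x = np.array([0.3, -0.1, 0.5, 0.2])
tau = 2.0
# With rho = 0, every point after the first is just N(0, 1/sqrt(tau)),
# so the conditional logp is the sum of iid normal logpdfs.
expected = stats.norm(0.0, 1.0 / np.sqrt(tau)).logpdf(x[1:]).sum()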

@junpenglao (Member)

There are merge conflicts, @eph.

----------
rho : tensor
    Vector of autoregressive coefficients.
Member:

We usually do not use blank lines between parameter entries in the docstring.

def _repr_latex_(self, name=None, dist=None):
    return None
Member:

This should not appear.

dtype = 'int64'
if dtype != 'int16' and dtype != 'int64':
    raise TypeError('Discrete classes expect dtype to be int16 or int64.')

Member:

This should not be deleted.

@junpenglao (Member)

Hi @eph, could you please edit the PR following the review? We should merge it after the tests pass.

@ferrine (Member) commented Jul 31, 2017

The notebook needs to be refined

@junpenglao (Member)

Hi @eph, hope you don't mind that I stepped in to fix the conflict and the notebook.

As for this PR, how should we proceed? The test should be further refined, and the general AR logp could be improved as well (without using theano.scan). But in my opinion, pm.timeseries needs some refactoring and further documentation, so it might be better to address these in a separate PR.

@junpenglao (Member)

OK, after discussing with @twiecki I am merging this now. Further work on AR is referenced in #2491.

@junpenglao merged commit f581520 into pymc-devs:master Aug 10, 2017
@junpenglao (Member)

Thanks for the contribution @eph!

@eph (Contributor Author) commented Aug 11, 2017

Hi -- sorry for dropping the ball at the end of this PR. Thanks for merging it in! My hope is to implement a vector autoregression next. How would you build the lag matrix without looping (theano.scan)? Some kind of algebraic operation, or some other Theano reshape/broadcasting procedure?

@junpenglao (Member)

@eph no problem ;-)
I haven't given much thought to how to build the logp without theano.scan, but judging from the implementation in statsmodels it seems possible? Just build a large (sparse) covariance matrix?
Maybe we can leverage some of our GP modules?
We can discuss it further in #2491.
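
One possibility (an illustrative sketch, not something settled in this thread): when the number of lags p is a plain Python integer, the lag matrix can be built with ordinary slicing and tt.stack, with no scan at all.

import theano.tensor as tt

def lag_matrix(x, p):
    # Row l holds x lagged by l, aligned with x[p:]; this mirrors the
    # scan lambda in AR.logp above, but requires p to be a Python int.
    return tt.stack([x[p - l:-l] for l in range(1, p + 1)])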
