
Refactored LKJCorr and _LKJCholeskyCov #4784

Closed · kc611 wants to merge 1 commit into main from add_lkj

Conversation

@kc611 (Contributor) commented Jun 19, 2021

This PR refactors LKJCorr and _LKJCholeskyCov multivariate distributions for v4.

kc611 marked this pull request as draft — June 19, 2021 08:44
ricardoV94 mentioned this pull request — Jun 19, 2021
@AlexAndorra (Contributor) commented:

Thanks for this PR @kc611! In the wrapper function LKJCholeskyCov, I think compute_corr should be set to True by default now, as v4 is going to be a breaking-change release anyway, and people are mostly interested in correlations and standard deviations when using a covariance matrix.
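For illustration, that default would let users write this directly (a minimal sketch; parameter values made up):

import pymc3 as pm

with pm.Model():
    # with compute_corr=True, the wrapper returns the unpacked pieces directly:
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol_cov", n=3, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )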

@tomicapretto (Contributor) commented:

I have a couple of examples where I'm trying to use LKJCholeskyCov and LKJCorr to compute the prior distribution for the covariance matrix of group-specific effects. In both cases, I have to manipulate the result they provide. Would you like me to share these examples with you, together with the difficulties I'm facing, to see if that can help with this refactoring?

The context is that I'm trying to enable correlated priors for group-specific effects in Bambi and I'm trying to figure out what is the most efficient way of doing it in PyMC3.

@kc611 (Contributor Author) commented Jun 24, 2021

Hi Tomas,
Sure, you can share them. The LKJCorr refactoring is nearly finished. The current failure is a corner case which will need some discussion. Maybe with your help we can find some flaws in the current implementation that I might've overlooked. 🙂

@@ -2051,7 +2051,6 @@ def test_wishart(self, n):
         pass

     @pytest.mark.parametrize("x,eta,n,lp", LKJ_CASES)
-    @pytest.mark.xfail(reason="Distribution not refactored yet")
     def test_lkj(self, x, eta, n, lp):
@kc611 (Contributor Author) commented on the diff:

@ricardoV94 is the case (np.array([0.7, 0, -0.7]), 0, 3, -np.inf) here an extreme corner case?

It looks like previously this test didn't pass through random (which is now its rng_fn) for logp; now it does, and hence fails.
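For reference, those packed values do seem to form a valid positive-definite correlation matrix, so the -np.inf presumably comes from eta=0 being outside the support (eta > 0). A quick numpy check, assuming the usual triangular packing:

import numpy as np

x = np.array([0.7, 0.0, -0.7])   # the packed off-diagonal entries for n=3
C = np.eye(3)
C[np.triu_indices(3, k=1)] = x   # fill the upper triangle
C = C + np.triu(C, k=1).T        # mirror to the lower triangle
print(np.linalg.eigvalsh(C))     # all eigenvalues positive -> valid correlation matrix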

@tomicapretto (Contributor) commented:

> Hi Tomas,
> Sure, you can share them. The LKJCorr refactoring is nearly finished. The current failure is a corner case which will need some discussion. Maybe with your help we can find some flaws in the current implementation that I might've overlooked.

Here is the example notebook. I uploaded it to Google Drive because GitHub does not allow sharing notebooks here.

@codecov (bot) commented Jul 7, 2021

Codecov Report

Merging #4784 (056251f) into main (7bd8704) will decrease coverage by 4.44%.
The diff coverage is 90.07%.

❗ Current head 056251f differs from pull request most recent head f03976c. Consider uploading reports for the commit f03976c to get more accurate results

@@            Coverage Diff             @@
##             main    pymc-devs/pymc#4784      +/-   ##
==========================================
- Coverage   80.44%   76.00%   -4.45%     
==========================================
  Files          82       88       +6     
  Lines       14132    14142      +10     
==========================================
- Hits        11369    10749     -620     
- Misses       2763     3393     +630     
Impacted Files Coverage Δ
pymc/distributions/multivariate.py 71.63% <90.07%> (-3.95%) ⬇️
pymc/sampling_jax.py 0.00% <0.00%> (-97.46%) ⬇️
pymc/printing.py 22.22% <0.00%> (-63.64%) ⬇️
pymc/distributions/bound.py 39.17% <0.00%> (-60.83%) ⬇️
pymc/gp/gp.py 60.25% <0.00%> (-32.94%) ⬇️
pymc/distributions/discrete.py 79.93% <0.00%> (-18.85%) ⬇️
pymc/gp/util.py 80.00% <0.00%> (-14.69%) ⬇️
pymc/variational/stein.py 38.00% <0.00%> (-10.34%) ⬇️
pymc/distributions/continuous.py 86.71% <0.00%> (-10.23%) ⬇️
... and 56 more

@twiecki (Member) commented Jul 22, 2021

There's also https://github.com/pymc-devs/pymc3/issues/3473, which would be good to address.

@ricardoV94 (Member) commented:

Any progress on this one?

@kc611 (Contributor Author) commented Aug 20, 2021

Porting the LKJCorr is finished (we can merge that separately if we want). However, LKJCholeskyCov is refactored but still fails upon sampling, and I can't narrow down the reason. (See the failing tests for LKJCholeskyCov.)

So I'll probably need help on this one, ideally from someone who is familiar with this distribution and sampling from it.

kc611 force-pushed the add_lkj branch 2 times, most recently from 6599c94 to f4bd741 — August 20, 2021 17:28
@twiecki (Member) commented Aug 26, 2021

@kc611 I think you need to rebase.

@falkmielke commented Sep 22, 2021

Dear all,
I needed this PR (see here) and have been using it locally.
There was a problem with dimensionality when nu in a MvStudentT has a shape, e.g.
nu = pm.Gamma('dof', 5., 0.1, shape = (1, 2))

traceback:

Traceback (most recent call last):
  File "AnotherMinimalExample.py", line 71, in <module>
    vals = pm.MvStudentT('vals' \
  File "pymc3/pymc3/distributions/distribution.py", line 222, in __new__
    rv_out = model.register_rv(
  File "pymc3/pymc3/model.py", line 1225, in register_rv
    rv_var = self.make_obs_var(rv_var, data, dims, transform)
  File "pymc3/pymc3/model.py", line 1251, in make_obs_var
    raise ShapeError(
pymc3.exceptions.ShapeError: Dimensionality of data and RV don't match. (actual 2 != expected 3)

I can work around this by simply using a single nu; however, different nu values for a MvStudentT are a plausible use case, IMO.

@falkmielke commented:

And another problem: pickling a model with this prior does not work:

pickle.PicklingError: Can't pickle <function _LKJCholeskyCov.default_transform.<locals>.transform_params at 0x7fa86a68adc0>: it's not found as pymc3.distributions.multivariate._LKJCholeskyCov.default_transform.<locals>.transform_params

(sorry for the bad news; I'm just trying to get it working myself and hope to help by sharing the pitfalls)

@kc611 (Contributor Author) commented Sep 22, 2021

Hi @falkmielke, glad to see you interested in our work. It seems the current implementation works for you locally in certain cases. Is that right?

> I can work around this by simply using a single nu; however, different nu values for a MvStudentT are a plausible use case, IMO.

I think in the case of a single (scalar) nu it's being broadcast accordingly. However, a differently shaped nu runs into some sort of v4-related broadcasting restriction. The refactoring was done in #4731, so cc @Sayam753, since he'll have more of an idea about MvStudentT's argument dimensionalities within v4.
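Roughly, if I understand the v4 batching correctly (made-up shapes):

import numpy as np

# MvStudentT event shape is (2,); the observed data has shape (n, 2) -> 2 dimensions
nu_scalar = 5.0                # broadcasts cleanly; the RV stays 2-dimensional
nu_batched = np.ones((1, 2))   # adds batch dimensions: implied RV shape (1, 2, 2) -> 3 dimensions
# hence "Dimensionality of data and RV don't match. (actual 2 != expected 3)"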

> pickling a model with this prior does not work:

This seems to be an issue we'll need to look into. At first glance, though, it looks like a transforms-related issue, something that's undergoing changes in #4887, so we'll probably need to wait for that to go in first.

P.S.: The reason this PR is halted is that sampling from LKJCholeskyCov produces unexpected results. Try running pymc3/tests/test_distributions.py::test_lkjcholeskycov() on this particular implementation to see the exact issue. So just make sure that whatever you build on top of this implementation takes that issue into account.

@falkmielke commented Sep 23, 2021

Dear @kc611,
thank you for looking into this!
I learned in the meantime that the devs have switched to cloudpickle, and that indeed resolves the pickling issue. I'm guessing the transforms involve functions that pickle can't handle but cloudpickle can.
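A minimal sketch of the difference, with a stand-in for the local transform function:

import pickle
import cloudpickle

def make_transform():
    # stand-in for _LKJCholeskyCov.default_transform.<locals>.transform_params
    def transform_params(x):
        return x
    return transform_params

fn = make_transform()
try:
    pickle.dumps(fn)               # pickle serializes functions by reference (module-level name)
except Exception as e:
    print(e)                       # local functions aren't importable by name -> fails
blob = cloudpickle.dumps(fn)       # cloudpickle serializes the function by value instead
assert cloudpickle.loads(blob)(3) == 3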

I am now stuck with new shape problems, which are again over my head. I guess most are related to other work in progress (e.g. I saw this PR). I'll try to file reports once I can clarify what's going on.

Thanks for pointing out the unexpected results. No worries, all my data and results are "reality-checked". For me as a layman, it is a bit frustrating to work on the dev version, constantly crashing into new issues; but it's also fun, and I hope the issues I stumble upon help you in further development :)

Cheers!

@falkmielke commented Sep 23, 2021

Hi again @kc611 et al.,

I'd like to share some more observations, since I fail to get this running in most cases. Again, this is just the perspective of an end user not closely following the dev progress. I've been drawn into this because I need the latest build to get posterior sampling to work, and I'm still hoping to achieve that. I appreciate any hints or pointers to other PRs that would help me get this running! Meanwhile, I'll keep pulling the latest versions and keep my fingers crossed that one day v4 will be there :)

versions

pymc: v4.0 (Sep 22 from github)
+ this pull request merged in locally
aesara: v2.2.2 via pip
python 3.9.7; arch linux

use cases

Just a brief overview of what my use cases are:
(i) I use both [N] MvNormal and [S] MvStudentT
(ii) they come into my models either [o] "observed" (i.e. with the 'observed' keyword) or [p] as a predictor slope
(iii) I can toggle the compute_corr keyword to False [0] or True [1].
I'll refer to these use cases by the identifiers in square brackets, e.g. [N.o.1] is a MvNormal with observed and compute_corr = True.

Issues

Note that all issues I describe here happen at model construction time, i.e. prior to sampling.

Issue 1: compute_corr

First off, a detail: there is an unintuitive difference between [*.*.0] and [*.*.1]. When setting compute_corr = False, the outcome is "packed", i.e. I have to call pm.expand_packed_triangular manually to get the Cholesky matrix. This is fine, but I was surprised this is not necessary in the other case. If this is intended and well documented, it should be fine.
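A minimal side-by-side of the two behaviours (made-up parameters; the full example further down shows it in context):

import pymc3 as pm

with pm.Model():
    # compute_corr=False returns the packed lower triangle; unpack it manually:
    packed = pm.LKJCholeskyCov("lkj_packed", n=3, eta=1.0,
                               sd_dist=pm.HalfCauchy.dist(1.0), compute_corr=False)
    chol_a = pm.expand_packed_triangular(3, packed, lower=True)

    # compute_corr=True unpacks for you and also returns correlations and stds:
    chol_b, corr, stds = pm.LKJCholeskyCov("lkj_full", n=3, eta=1.0,
                                           sd_dist=pm.HalfCauchy.dist(1.0), compute_corr=True)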

Issue 2: shaped nu

See my earlier comment: in [S.o.*] there is a pymc3.exceptions.ShapeError: Dimensionality of data and RV don't match. error if nu has a shape.
Otherwise, all [*.o.*] get to sample.

Issue 3: the predictor case [*.p.*]

There seems to be an issue related to the shape/size/dim of the RV, and initialization of the multivariate fails when trying to _infer_shape. As mentioned above, this might have to do with other PRs that landed since this one. I thought I'd nevertheless post it here so that this PR doesn't accumulate too many untracked issues while halted.

minimal example
import numpy as np
import pymc3 as pm
import aesara as th
import aesara.tensor as tt
import scipy.stats as stats

n_params = 3
cov = np.array([[1., 0.5, 0.2], [0.5, 1.4, -0.4], [0.2, -0.4, 2]])
mu = np.array([-0.4, 1.2, 0.1])#np.zeros(2)
n_observations = 9999

predictor_values = np.random.multivariate_normal(mu, cov, n_observations)

slopes = [0.2, 0.5, -0.3]

observable = np.random.normal(0.8, 0.1, size = n_observations)
for pred, slope in enumerate(slopes):
    observable += predictor_values[:, pred] * slope


model = pm.Model()

with model:

    intercept_data = pm.Data('intercept_data', np.ones((n_observations, 1)))
    intercept_value = pm.Normal('intercept', mu = 0.8, sd = 1., shape = (1,1))
    intercept = tt.dot(intercept_data, intercept_value)


    predictors_data = pm.Data('predictors_data', predictor_values)

    if False:
        chol, corr, stds = pm.LKJCholeskyCov( f'lkj_chol' \
                                              , n = n_params \
                                              , eta = 1. \
                                              , sd_dist = pm.HalfCauchy.dist(1.) \
                                              , compute_corr = True \
                                             )

    else:
        pchol = pm.LKJCholeskyCov( f'lkj_chol' \
                                   , n = n_params \
                                   , eta = 1. \
                                   , sd_dist = pm.HalfCauchy.dist(1.) \
                                   , compute_corr = False \
                                  )
        chol = pm.expand_packed_triangular(n_params, pchol, lower = True)

    predictor_slopes = pm.MvNormal(  f'predictors' \
                                        , mu = 0. \
                                        , chol = chol \
                                        # , nu = 1. \
                                        , shape = n_params \
                                        )
    predictors = tt.dot(predictors_data, predictor_slopes)

    obs = pm.Normal('obs', mu = intercept + predictors, sd = pm.HalfCauchy('residual', 1.), \
                    observed = observable \
                    )
traceback
Traceback (most recent call last):
  File "YetAnotherMinimalExample.py", line 55, in <module>
    predictor_slopes = pm.MvNormal(  f'predictors' \
  File "pymc3/pymc3/distributions/distribution.py", line 218, in __new__
    rv_out = cls.dist(*args, rng=rng, **kwargs)
  File "pymc3/pymc3/distributions/multivariate.py", line 231, in dist
    return super().dist([mu, cov], **kwargs)
  File "pymc3/pymc3/distributions/distribution.py", line 309, in dist
    rv_out = cls.rv_op(*dist_params, size=create_size, **kwargs)
  File "/usr/lib/python3.9/site-packages/aesara/tensor/random/basic.py", line 325, in __call__
    return super().__call__(mean, cov, size=size, **kwargs)
  File "/usr/lib/python3.9/site-packages/aesara/tensor/random/op.py", line 304, in __call__
    res = super().__call__(rng, size, dtype, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/aesara/graph/op.py", line 275, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/usr/lib/python3.9/site-packages/aesara/tensor/random/op.py", line 349, in make_node
    bcast = self.compute_bcast(dist_params, size)
  File "/usr/lib/python3.9/site-packages/aesara/configparser.py", line 49, in res
    return f(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/aesara/tensor/random/op.py", line 282, in compute_bcast
    shape = self._infer_shape(size, dist_params)
  File "/usr/lib/python3.9/site-packages/aesara/tensor/random/op.py", line 219, in _infer_shape
    params_ind_slice = tuple(
  File "/usr/lib/python3.9/site-packages/aesara/tensor/random/op.py", line 220, in <genexpr>
    slice_ind_dims(p, ps, n)
  File "/usr/lib/python3.9/site-packages/aesara/tensor/random/op.py", line 212, in slice_ind_dims
    p[ind_slice],
  File "/usr/lib/python3.9/site-packages/aesara/tensor/var.py", line 496, in __getitem__
    raise IndexError("too many indices for array")
IndexError: too many indices for array

I have been trying to poke around in the aesara/tensor/random/op.py functions RandomVariable._infer_shape() and default_shape_from_params, but I only get the shape problems propagated to a higher scope. The whole concept of these RVs is still a bit over my head, but maybe the fog will clear at some point :)

other branches

Note that I tried to merge in the progress which is going on on other branches. This might help you to anticipate upcoming conflicts.
I noticed that, whereas the branch v4-4523 merges in pretty smoothly, there is considerable trouble with the branch revert-shape-args (maybe obviously so; it seems to be a major overhaul).

Thanks again for your work on this!

kc611 force-pushed the add_lkj branch 2 times, most recently from 7de4596 to 29a7931 — October 2, 2021 12:23
sd_vals = at.sqrt(variance)

# TODO: Since the sd_dist logp is added independently,
# we can perhaps compose the logp terms in a more clean way
@kc611 (Contributor Author) commented on the diff:

The issue with calling logpt directly over here is that logpt tries to filter the input values according to the RV's shape. So we'd have to put a limitation on shapes if we tried to do it like that.

A Member replied:

I didn't mean that. But this approach seems fine in any case, as long as it doesn't result in "value variable missing" warnings.


    def random(self, point=None, size=None):
        ...
        samples = np.linalg.cholesky(C)[..., tril_idx[0], tril_idx[1]]
@kc611 (Contributor Author) commented:

LKJCorrRV could be called over here if this particular operation (i.e. np.linalg.cholesky(C)), along with C *= D[..., :, np.newaxis] * D[..., np.newaxis, :], can be done after we index the C array as C[..., tril_idx[0], tril_idx[1]].

If that's the case, then sure, we could avoid rewriting the code above and directly call LKJCorrRV.rng_fn on these arguments.
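For what it's worth, the D scaling itself does commute with the Cholesky factor (the factorization still needs the full matrix, though). A quick numpy check with made-up inputs:

import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.normal(size=(n, n))
C = A @ A.T + n * np.eye(n)        # a positive-definite stand-in for the correlation matrix
D = rng.uniform(0.5, 2.0, size=n)  # standard deviations
tril_idx = np.tril_indices(n)

# scale first, then factor and pack (the current ordering):
packed_a = np.linalg.cholesky(D[:, None] * C * D[None, :])[tril_idx]
# factor first, then apply the row scaling to the packed entries:
packed_b = (D[:, None] * np.linalg.cholesky(C))[tril_idx]

assert np.allclose(packed_a, packed_b)  # chol(D C D) == D * chol(C) for diagonal D > 0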

A Member replied:

I am not sure, but it's worth investigating.

@ricardoV94 (Member) commented:

@kc611 Did you get a chance to resume work on this PR? What were the remaining issues?

* `sd_dist` is now a standard parameter (with a prior specified outside of the distribution)
@ricardoV94 (Member) commented:

I think I found a bug in the old random method of the LKJCholeskyCov

@tomicapretto (Contributor) commented:

> I think I found a bug in the old random method of the LKJCholeskyCov

😨 Is it too bad? Do you want a second pair of eyes to double-check it?

@ricardoV94 (Member) commented Jan 25, 2022

Closing in favor of #5382
