Implement unconstraining transform for LKJCorr #7380
Conversation
pymc/distributions/transforms.py
Outdated
```python
# Are the diagonals always guaranteed to be positive?
# I don't know, so we'll use abs
row_norms = 1/pt.abs(pt.diag(chol))
```
Yep, always positive. You don't need abs here.
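For instance (a sketch, assuming `chol` is the Cholesky factor from the surrounding code):

```python
# Cholesky factors have strictly positive diagonals, so abs is redundant
row_norms = 1 / pt.diag(chol)
```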
pymc/distributions/transforms.py
Outdated
```python
)

def _jacobian(self, value, *inputs):
    return pt.jacobian(
```
`pt.jacobian` can be quite expensive, because it requires us to loop over every input and compute the associated symbolic gradients. There's a closed-form solution for the log-det jacobian in the TFP code, so you can eliminate this method and implement the closed-form log-det jac:
```python
n = ps.shape(y)[-1]
return -tf.reduce_sum(
    tf.range(2, n + 2, dtype=y.dtype) * tf.math.log(tf.linalg.diag_part(y)),
    axis=-1)
```
`diag_part` would just be `pt.diagonal(y, axis1=-2, axis2=-1)`. That will account for potential batching on `y`. So something like:
```python
n = y.shape[-1]
return -(pt.arange(2, n+2, dtype=y.dtype) * pt.log(pt.diagonal(y, axis1=-2, axis2=-1))).sum(axis=-1)
```
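In math form, the inverse log-det Jacobian this snippet computes is

$$\log\left|\det J^{-1}\right| = -\sum_{i=1}^{n} (i+1)\,\log y_{ii},$$

i.e. the $i$-th diagonal entry of $y$ is weighted by $i+1$, which is the `range(2, n+2)` term.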
Can you point me to some info on how that's derived? Going to need to modify it to work with the correlation matrix.
Never mind, found it in the comments in TFP's `_inverse_log_det_jacobian`.
pymc/distributions/transforms.py
Outdated
```python
row_indices, col_indices = np.tril_indices(self.n, -1)
return (
    pytensor.shared(row_indices),
    pytensor.shared(col_indices)
```
There's no need to save these as shared variables, you can use the numpy indices directly. Making the numpy indices is pretty cheap; I'm not sure it's worth it to cache them.
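For instance (a sketch; `chol` and `self.n` are assumed from the surrounding class):

```python
# plain numpy arrays work directly as pytensor indices; no shared variables needed
row_idx, col_idx = np.tril_indices(self.n, k=-1)
values = chol[row_idx, col_idx]
```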
pymc/distributions/transforms.py
Outdated
```python
    return unconstrained[self.tril_r_idxs, self.tril_c_idxs]

def backward(self, value, *inputs, foo=False):
```
You need to check that these functions match the expected outputs from TFP. I used the test case from the tfp docs and got the wrong values:

```python
array([0.89442719, 0.81649658, 0.91287093])
```

vs the reference solution:

```python
array([[1.        , 0.        , 0.        ],
       [0.70710678, 0.70710678, 0.        ],
       [0.66666667, 0.66666667, 0.33333333]])
```

You did some extra research so I might be missing something?
Something like this matches tfp:

```python
from functools import partial

import numpy as np
import pytensor.tensor as pt


def backward(self, value, *inputs):
    """
    Convert unconstrained real numbers to the off-diagonal elements of the
    cholesky decomposition of a correlation matrix.
    """

    def unpack_upper_tril_with_eye_diag(x, core_shape):
        """1D allocation case; fills the strict lower triangle (despite the name)."""
        return pt.set_subtensor(pt.eye(core_shape)[np.tril_indices(core_shape, k=-1)], x[::-1])

    value = pt.as_tensor_variable(value)
    # note: this is the length m of the unconstrained vector, which equals
    # the matrix size n only when n = 3 (m = n(n-1)/2 = 3)
    core_shape = value.type.shape[-1]
    # Vectorize the 1D case to handle potential batch dimensions
    out = pt.vectorize(partial(unpack_upper_tril_with_eye_diag, core_shape=core_shape), '(n)->(n,n)')(value)
    # Vector L2 norm without .real call to speed things up a bit
    norm = pt.sqrt(pt.sum((out ** 2), axis=-1, keepdims=True))
    return out / norm
```
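As a quick sanity check, this reproduces the 3x3 reference solution quoted earlier in the thread (the input `[2., 2., 1.]` is assumed from the TFP docs example):

```python
import numpy as np

x = np.array([2.0, 2.0, 1.0])
print(backward(None, x).eval())  # self is unused in the sketch above, so None suffices
# [[1.         0.         0.        ]
#  [0.70710678 0.70710678 0.        ]
#  [0.66666667 0.66666667 0.33333333]]
```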
Thanks for that code above, there's some great stuff I can reuse.
Need to address this comment first, since the transform actually working in the first place is fundamental. Here's a notebook that demonstrates that this implementation does replicate the original reference implementation from TFP: https://colab.research.google.com/drive/1BBNNfBUNJPGT_7MxVboTqvRegJ-TUamc?usp=sharing
Here's why it didn't work for you:
- The PyMC implementation needs to output the upper triangular elements of the correlation matrix, whereas the TFP implementation outputs a Cholesky factor.
- Differences in indexing off-diagonal elements: TFP actually fills in off-diagonal elements in a clockwise spiral, whereas `np.triu_indices` is row-major (see the order sketch below). I notice you reverse the elements in the code above, which is correct for 3x3 but not for larger matrices.
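For reference, `np.tril_indices` (used in the sketch above) emits indices in row-major order as well, which is why a plain reversal only lines up with TFP's spiral for 3x3:

```python
import numpy as np

rows, cols = np.tril_indices(4, k=-1)
print(list(zip(rows.tolist(), cols.tolist())))
# [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)] -- row by row, left to right
```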
I got access denied to the colab :)
I'm not surprised my code doesn't work, but I'm glad you know why. In the general case, tensorflow concatenates a reflected copy of the array to itself, then reshapes and masks out the lower/upper triangle -- see here if you haven't already. There's no reason why we couldn't just do that.
I'm not sure that we need to copy their output 1:1 -- after all, the important thing is that we can go from unconstrained samples to a valid Cholesky-decomposed correlation matrix. Is the order we put the numbers into the matrix relevant? I'm not sure, but my instinct is no. On the other hand, if we copy 1:1 we can be sure it's right.
> PyMC implementation needs to output the upper triangular elements of the correlation matrix, where the TFP implementation outputs a Cholesky factor.

Are you sure? I thought the upper triangular elements *are* the Cholesky-factorized correlation matrix. If you're right, though, we just need to add a matmul to the end, right?
Apologies - the link is now open with Viewer permission and I've made you an Editor.
I don't think the order we insert the off-diagonal elements into an array is very important, but it is needed in order to compare results between this implementation and the one in TFP. I would suggest sticking with `np.triu_indices` here.
> Are you sure? I thought the upper triangular elements are the cholesky factorized correlation matrix. If you're right though we just need to add a matmul to the end right?

Yes, you can see this by looking at the implementation of LKJCorr. I originally thought the same thing, implemented the transform accordingly, then was surprised that non-posdef matrices were generated. 🤦
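For illustration, the extra matmul step could look like this (a sketch; `chol` and the cached index arrays are assumptions based on this PR):

```python
# C = L L^T turns the Cholesky factor into the correlation matrix (batch-aware)
corr = pt.matmul(chol, pt.swapaxes(chol, -1, -2))
# then take the off-diagonal entries the transform is expected to output
upper = corr[..., self.triu_r_idxs, self.triu_c_idxs]
```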
Great! Sounds like you have a good handle on things. I think what would be a really important next step would be to add a test that your implementation correctly makes a round trip from the unconstrained space to the constrained space and back.
Awesome - I can do that
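A minimal sketch of such a round-trip test (the constructor name and shapes here are assumptions, not the PR's final API):

```python
import numpy as np

def test_lkjcorr_transform_round_trip():
    transform = CholeskyCorrTransform(n=4)  # hypothetical constructor
    x = np.random.normal(size=4 * 3 // 2)   # m = 6 unconstrained values
    x_roundtrip = transform.forward(transform.backward(x)).eval()
    np.testing.assert_allclose(x_roundtrip, x, rtol=1e-6)
```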
```python
self.n = n
self.m = int(n * (n - 1) / 2)  # number of off-diagonal elements
self.tril_r_idxs, self.tril_c_idxs = self._generate_tril_indices()
self.triu_r_idxs, self.triu_c_idxs = self._generate_triu_indices()
```
See below, not sure we need to cache these. `__init__` is probably unnecessary.
pymc/distributions/transforms.py
Outdated
```python
    jac = self._jacobian(value)
    return pt.log(pt.linalg.det(jac))

def forward(self, value, *inputs):
```
See below. I'm pretty sure this needs to go from matrix to vector (to match the tfp case). @junpenglao might know for sure.
+1, it is better for the unbounded value to be a vector.
Sorry, I am a bit confused and don't understand what you mean.
Specifically, do you mean this function needs to work along the last axis for arrays of arbitrary number of dimensions, and that the current iteration assumes that `value` will only have dimension 1?
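For illustration only, a batch-aware `forward` along the last two axes might look like this (a sketch using the cached indices from this PR, not a confirmed answer):

```python
def forward(self, value, *inputs):
    # map (..., n, n) matrices to (..., m) vectors of off-diagonal entries
    return value[..., self.tril_r_idxs, self.tril_c_idxs]
```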
```
@@ -1579,7 +1579,9 @@ def logp(value, n, eta):

@_default_transform.register(_LKJCorr)
def lkjcorr_default_transform(op, rv):
    return MultivariateIntervalTransform(-1.0, 1.0)
```
Can you delete this transform class as well? It was a (wrong) patch to the problem you're solving
Can do. Just to confirm, you don't consider `MultivariateIntervalTransform` to be part of pymc's public API?
Nope, can be removed without worries
Ok - great
```python
    self.triu_r_idxs, self.triu_c_idxs = self._generate_triu_indices()

def _generate_tril_indices(self):
    row_indices, col_indices = np.tril_indices(self.n, -1)
```
Not sure if it matters, but there is a `pt.tril_indices` and `pt.triu_indices`, so no need to eval `n`. If it's already restricted to be constant elsewhere (like the logp), then it's fine either way.
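A sketch of the symbolic version this refers to (assuming `pt.tril_indices` accepts a symbolic `n`, per the comment above):

```python
import pytensor.tensor as pt

n = pt.iscalar("n")                    # n can stay symbolic; no eval needed
rows, cols = pt.tril_indices(n, k=-1)  # strict lower-triangle indices
```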
I think it's good practice to use the `pt` version, even if `n` is fixed.
I originally tried to use the `pt` version, but one of the function calls required constant values. However, I've made so many changes that this might no longer be the case. I'll try the `pt` version again and see if I can get it to work.
Hi, it's unlikely I'm going to have any time to work on this for the next 6 months. The hardest part is coming up with a closed-form solution for log_det_jac, which I don't think I'm very close to doing.
Thanks for the update @johncant and for pushing this as far as you did.
```python
computed_log_jac_det = transform.log_jac_det(y).eval()

# Expected log determinant: 0 (since row norms are 1)
expected_log_jac_det = 0.0
```
Weak test. Tell it to compare against pytensor's jacobian machinery with a non-trivial input, and to reuse test code that already exists to do that.
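A sketch of that comparison (the transform API and a vector-to-vector `backward` are assumed from this thread):

```python
import numpy as np
import pytensor.tensor as pt

x = pt.dvector("x")
y = transform.backward(x)                 # assumed vector -> vector here
jac = pt.jacobian(y, x)                   # brute-force reference jacobian
ref = pt.log(pt.abs(pt.linalg.det(jac)))

x_val = np.random.normal(size=6)          # non-trivial input, n = 4
np.testing.assert_allclose(
    transform.log_jac_det(x).eval({x: x_val}),
    ref.eval({x: x_val}),
    rtol=1e-6,
)
```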
I've ported this bijector from `tensorflow` and added it to `LKJCorr`. This ensures that initial samples drawn from `LKJCorr` are positive definite, which fixes #7101. Sampling now completes successfully with no divergences.

There are several parts I'm not comfortable with:
- How do I get the `n` parameter from `op` or `rv` without `eval`ing any pytensors?

@fonnesbeck @twiecki @jessegrabowski @velochy - please could you take a look? I would like to make sure that this fix makes sense before adding tests and making the linters pass.
Notes:
- `forward` in `tensorflow_probability` is `backward` in `pymc`
Description
Backward method
Forward method
`log_jac_det`
This was quite complicated to implement, so I used the symbolic jacobian.
Related Issue
Checklist
Type of change
📚 Documentation preview 📚: https://pymc--7380.org.readthedocs.build/en/7380/