
CAR random variables #4596

Merged: 31 commits merged into pymc-devs:main on Jul 27, 2021
Conversation

@aerubanov (Contributor) commented Mar 31, 2021

Closes #4518.
I changed the CAR distribution class to be compatible with the v4 RandomVariable and implemented a CAR random variable class. But now I need some advice on further improvements.

@brandonwillard, I have a few questions about the random variable for CAR:

  • If a parameter of the random variable may be either a scalar or a vector (like mu, tau, and alpha), what should I specify in ndims_params? If I pass a scalar instead of a vector, an error occurs.
  • Should I somehow handle the size param in the rng_fn method?
  • Where should I store intermediate variables (like D and lam) that need to be calculated only once? Initially they were calculated in the __init__ method of the distribution class and stored as a distribution class attribute.

@ckrapu, Do you have any ideas for testing the implementation of the algorithm?

@brandonwillard (Contributor):

  • If a parameter of the random variable may be either a scalar or a vector (like mu, tau, and alpha), what should I specify in ndims_params? If I pass a scalar instead of a vector, an error occurs.

It sounds like you need to find a "minimally sufficient" space into which you can embed all the parameters. The real question is "What's the correspondence between the dimensions of these parameters?" For instance, if mu is a scalar, what should tau be, etc.?

  • Should I somehow handle the size param in the rng_fn method?

Yes, that's a mandatory parameter. You'll need to understand those dimensions above in order to implement this, though; otherwise, take a look at the implementation of MvNormalRV for a brute-force approach to implementing size.

  • Where should I store intermediate variables (like D and lam) that need to be calculated only once? Initially they were calculated in the __init__ method of the distribution class and stored as a distribution class attribute.

You can add those as parameters to the Op and use Op.__init__ to compute the derived quantities and hide them from the caller.
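Putting those pieces together, a minimal sketch of what the Op could look like, assuming the parameter order (mu, W, alpha, tau) and the Aesara RandomVariable API of that time (the class name and ndims here are illustrative, not the final implementation):

from aesara.tensor.random.op import RandomVariable


class CARRV(RandomVariable):
    name = "car"
    ndim_supp = 1                # each draw is a vector over the graph nodes
    ndims_params = [1, 2, 0, 0]  # core ndims of mu, W, alpha, tau
    dtype = "floatX"
    _print_name = ("CAR", "\\operatorname{CAR}")

    @classmethod
    def rng_fn(cls, rng, mu, W, alpha, tau, size):
        # `size` is mandatory: it prescribes how many independent draws to
        # stack in front of the support dimension, e.g. size=(3,) should
        # yield an array of shape (3, len(mu)).
        # One-time derived quantities such as D and lam could instead be
        # computed in Op.__init__ and stored on the Op, as suggested above.
        raise NotImplementedError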

@aerubanov (Contributor Author):

It sounds like you need to find a "minimally sufficient" space into which you can embed all the parameters. The real question is "What's the correspondence between the dimensions of these parameters?" For instance, if mu is a scalar, what should tau be, etc.?

@brandonwillard No, my question was about something else. The CAR class docs specify that a float or an array can optionally be passed for these parameters. How should I handle this? Should I transform the float to an array in the dist method, or some other way?

@brandonwillard (Contributor) commented Mar 31, 2021

Should I transform the float to an array in the dist method, or some other way?

If you have the parameter dimensions worked out already, then there's no question about whether or not you should—for instance—convert a scalar to a vector, matrix, etc. If it's a question of where to do the conversion, often that kind of thing is done in Op.make_node.

Assuming that's the case, and you need to convert a scalar to a vector, just add a broadcastable dimension to the scalar value. There are aesara.tensor.shape_pad* functions that will do this, as well as aesara.tensor.reshape.

(Aesara could probably use implementations of the numpy.atleast_*, too.)

N.B.: There are also some broadcasting helper functions used throughout the RandomVariable implementations in Aesara. For instance, check out broadcast_params.
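For instance, a sketch of padding a scalar up to a vector (the values are illustrative; broadcast_params is assumed to live in aesara.tensor.random.utils):

import aesara.tensor as at

mu = at.as_tensor_variable(0.5)  # a scalar
mu_vec = at.shape_padleft(mu)    # shape (1,), broadcasts against vectors

# Inside rng_fn (NumPy land), broadcast_params can align the batch
# dimensions of all parameters given their expected core ndims:
# from aesara.tensor.random.utils import broadcast_params
# mu, W, alpha, tau = broadcast_params([mu, W, alpha, tau], [1, 2, 0, 0])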

@aerubanov (Contributor Author):

@brandonwillard, thanks for the tips and the quick response! I will make the necessary changes.

@ckrapu (Contributor) commented Apr 1, 2021

@ckrapu, Do you have any ideas for testing the implementation of the algorithm?

The testing strategy I used for the original CAR implementation was to make sure that the CAR logp and the equivalent multivariate normal with CAR-structured covariance matrix were equivalent, up to an additive constant. You can see how that is implemented here.
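A sketch of that check, building the reference multivariate normal from the standard CAR joint precision tau * (D - alpha * W), with D = diag(row sums of W) (the matrix and parameter values are illustrative):

import numpy as np
import scipy.stats as st

W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=0))
tau, alpha = 2.0, 0.5
cov = np.linalg.inv(tau * (D - alpha * W))  # CAR-structured covariance

mvn = st.multivariate_normal(mean=np.zeros(3), cov=cov)
xs = mvn.rvs(size=5, random_state=42)
mvn_logp = mvn.logpdf(xs)
# car_logp = ...  # evaluate the CAR logp at the same points
# the difference should be the same constant at every point:
# np.testing.assert_allclose(np.diff(car_logp - mvn_logp), 0, atol=1e-8)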

Review thread on the proposed rng_fn (pymc3/distributions/multivariate.py); Qb is, from context, the CAR precision matrix in banded storage:

u += 1
# Cholesky factor of the banded precision matrix Qb
L = scipy.linalg.cholesky_banded(Qb, lower=False)
z = rng.normal(size=W.shape[0], loc=mu)
# solve against the banded Cholesky factorization to obtain a draw
samples = scipy.linalg.cho_solve_banded((L, False), z)
Contributor:

I'm not familiar with the desired signature of rng_fn, but it's not clear to me that this allows for sampling multiple CAR draws simultaneously. If that's not the case, then the one-time solution of the permutation optimization problem in reverse_cuthill_mckee is not going to be better than simply sampling from a multivariate normal with the CAR covariance matrix.

Contributor Author:

I think the size parameter is just what is needed to support sampling multiple CAR draws simultaneously, so I should change my implementation of this method.

dtype = "floatX"
_print_name = ("CAR", "\\operatorname{CAR}")

def make_node(self, rng, size, dtype, *dist_params):
Contributor:

It's better to explicitly specify the parameters (i.e. mu, W, etc.); otherwise, the signature is unnecessarily ambiguous.

Also, you'll likely need to convert those distribution parameters to Aesara variables using aesara.tensor.as_tensor[_variable].
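For example, a sketch of make_node with the parameters spelled out (the super() call follows the base RandomVariable.make_node signature of that Aesara version):

import aesara.tensor as at
from aesara.tensor.random.op import RandomVariable


class CARRV(RandomVariable):
    ...

    def make_node(self, rng, size, dtype, mu, W, alpha, tau):
        # turn plain floats, lists, and ndarrays into Aesara variables
        mu = at.as_tensor_variable(mu)
        W = at.as_tensor_variable(W)
        alpha = at.as_tensor_variable(alpha)
        tau = at.as_tensor_variable(tau)
        return super().make_node(rng, size, dtype, mu, W, alpha, tau)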

Contributor Author:

@brandonwillard, should I do this conversion in the make_node method of the RandomVariable instead of the dist method of the CAR distribution class?

Contributor:

Yeah, it's always good to do that in Op.make_node.

@aerubanov (Contributor Author):

I added handling of the size param in the rng_fn method and moved the conversion to Aesara tensors into the make_node method. But now, when it tries to calculate logp for the CAR distribution, I get this error:

In[2]: import pymc3 as pm, numpy as np
Backend TkAgg is interactive backend. Turning interactive mode on.
In[3]: W = np.array([[1, 0, 0, 1, 0], [0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 1, 0, 1, 0], [0, 0, 1, 0, 1]])
In[4]: pm.logpt(pm.CAR.dist([0, 0, 0, 0, 0], W, [0.5, 0.5, 0.5, 0,5], [1, 1, 1, 1, 1]),np.random.randn(5)).eval()
Traceback (most recent call last):
  File "/home/anatoly/anaconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-71c3b392bce9>", line 1, in <module>
    pm.logpt(pm.CAR.dist([0, 0, 0, 0, 0], W, [0.5, 0.5, 0.5, 0,5], [1, 1, 1, 1, 1]),np.random.randn(5)).eval()
  File "/home/anatoly/HDD/Projects/pymc3/pymc3/distributions/logp.py", line 165, in logpt
    rv_value = rv_var.type.filter_variable(rv_value.astype(rv_var.dtype))
  File "/home/anatoly/anaconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/aesara/tensor/type.py", line 258, in filter_variable
    raise TypeError(
TypeError: Cannot convert Type TensorType(float64, vector) (of Variable TensorConstant{[ 2.000324...65305157]}) into Type TensorType(float64, matrix). You can try to manually convert TensorConstant{[ 2.000324...65305157]} into a TensorType(float64, matrix).

I guess I just did something wrong in the last call. @brandonwillard, could you help me with this error?

I also plan to start figuring out how to write tests. Based on the discussion above, I think we can compare sampling from CAR with sampling from the equivalent multivariate normal with a CAR-structured covariance matrix. But perhaps it is not the samples themselves that should be compared, but statistics calculated from their distributions, or something similar.

@ckrapu (Contributor) commented Apr 6, 2021

> TypeError: Cannot convert Type TensorType(float64, vector) (of Variable TensorConstant{[ 2.000324...65305157]}) into Type TensorType(float64, matrix). You can try to manually convert TensorConstant{[ 2.000324...65305157]} into a TensorType(float64, matrix).

If you try using np.random.randn(5,1) instead, does it work?

Review thread on the dist method (pymc3/distributions/multivariate.py):

self.median = self.mode = self.mean = self.mu = at.as_tensor_variable(mu)
self.sparse = sparse
@classmethod
def dist(cls, mu, W, alpha, tau, sparse=False, *args, **kwargs):

if not W.ndim == 2 or not np.allclose(W, W.T):
Contributor:

NumPy cannot be used here, because W could be a TensorVariable. You'll need to assert symmetry in a symbolic way—or simply remove the check for now.
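One way to phrase the check symbolically, as a sketch (validated_W is a hypothetical helper, and the Assert import path has moved between Aesara versions):

import aesara.tensor as at
from aesara.assert_op import Assert  # import path may differ across versions


def validated_W(W):
    # W.ndim is static metadata, so it can be checked eagerly in Python
    if W.ndim != 2:
        raise ValueError("W must be a matrix")
    # symmetry depends on W's runtime values, so it must stay in the graph
    return Assert("W must be a symmetric adjacency matrix")(
        W, at.allclose(W, W.T)
    )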

Contributor:

@aerubanov, this still needs to be resolved.

Contributor Author:

@brandonwillard, thanks for the reminder; I made the changes.


Review thread on the logp signature change (old vs. new):

def logp(self, value):
def logp(value, mu, W, alpha, tau, sparse=False):
Contributor:

There's no need for a sparse parameter here; plus, this option would only be useful to someone directly calling this function, which isn't too likely or common.

More importantly, you can determine whether or not W is sparse by inspection.
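For example, a sketch of detecting sparsity from the variable itself (assuming Aesara's SparseVariable class):

import aesara.sparse


def logp(value, mu, W, alpha, tau):
    # no sparse flag needed: W's own type says whether it is sparse
    if isinstance(W, aesara.sparse.SparseVariable):
        ...  # sparse branch, e.g. aesara.sparse.dot
    else:
        ...  # dense branch, aesara.tensor ops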

@brandonwillard (Contributor):

I pushed some of the recommendations I made above and re-enabled the old CAR test in test_distributions, so you'll need to rebase locally.

Otherwise, according to the old test, it looks like the log-likelihood could be working; however, that test needs an addition for at least one size > 2D.

Also, it looks like we need to add a CAR test to test_distributions_random for the random sampling in CARRV (I couldn't find one in there).

@aerubanov (Contributor Author):

Thanks for the review, @brandonwillard!

Otherwise, according to the old test, it looks like the log-likelihood could be working; however, that test needs an addition for at least one size > 2D.

Yes, I will try to add some tests for that case.

Also, it looks like we need to add a CAR test to test_distributions_random for the random sampling in CARRV (I couldn't find one in there).

I am thinking about a solution based on comparing samples from CAR and a MultivariateNormal with CAR-structured covariance matrix, similar to what we do for logpt. I discussed this idea with @ferrine, and he suggested using the Sliced-Wasserstein distance as the comparison metric. What do you think?

@ricardoV94 (Member) commented Apr 7, 2021

I am thinking about a solution based on comparing samples from CAR and a MultivariateNormal with CAR-structured covariance matrix, similar to what we do for logpt. I discussed this idea with @ferrine, and he suggested using the Sliced-Wasserstein distance as the comparison metric. What do you think?

We have some tests for random methods that use the Kolmogorov-Smirnov test to assess the equivalence between pymc3 and a reference random number generator (which could be the MultivariateNormal in your case). Maybe you can adapt the logic from there: https://github.com/pymc-devs/pymc3/blob/e5c42b47e43940837afe171f7a30c8ea89f54ed2/pymc3/tests/test_distributions_random.py#L60
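A sketch of how that logic could be adapted here, comparing CAR draws against the reference MvNormal marginal by marginal (matrix values are illustrative; car_draws stands in for the draws produced by the new rng_fn):

import numpy as np
import scipy.stats as st

W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=0))
tau, alpha = 2.0, 0.5
cov = np.linalg.inv(tau * (D - alpha * W))

ref_draws = st.multivariate_normal(mean=np.zeros(3), cov=cov).rvs(
    size=1000, random_state=42
)
# car_draws = ...  # 1000 draws from the CAR rng_fn with the same parameters
# for i in range(ref_draws.shape[1]):
#     _, p = st.ks_2samp(car_draws[:, i], ref_draws[:, i])
#     assert p > 0.01  # marginals should be statistically indistinguishable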

@aerubanov (Contributor Author):

@ricardoV94, thank you, I will check this.

@aerubanov (Contributor Author):

I added a test for the rng_fn method based on the logic from the pymc3_random() function in pymc3/pymc3/tests/test_distributions_random.py, but it fails =(. Now I'm trying to fix it.

@brandonwillard , @ricardoV94, do you have any ideas what I am doing wrong?

@ricardoV94 (Member):

Does the test pass if you duplicate the CAR and compare it against itself, instead of against the MvNormal? If it does, it means the two distributions you have now are not really equivalent.

@aerubanov (Contributor Author):

@ricardoV94, yes, if I duplicate the CAR distribution, the test passes. But as @ckrapu pointed out, this test is not correct for this distribution. So I will try to use the options he suggested.

@aerubanov (Contributor Author):

I experimented with different W matrix formats, and sometimes the test passed, but sometimes failed. So, it seems that the KS test is actually not appropriate here.

@brandonwillard (Contributor):

I experimented with different W matrix formats, and sometimes the test passed, but sometimes failed. So, it seems that the KS test is actually not appropriate here.

If they're only failing within a smallish window near the numerical limit/precision, then that's fine. Also, we'll need to seed the tests, which will obviously help.

@aerubanov (Contributor Author):

I changed the test for the CAR rng_fn method: now we compare the samples dimension by dimension, and it looks like it works better. @brandonwillard,

Also, we'll need to seed the tests, which will obviously help.

I added NumPy random seed setting before generating the samples. Is there anything else that needs to be added?

Comment on lines 1957 to 1958
W_sparse = scipy.sparse.csr_matrix(W)
W = aesara.sparse.as_sparse_variable(W_sparse)
Contributor:

Suggested change:

-  W_sparse = scipy.sparse.csr_matrix(W)
-  W = aesara.sparse.as_sparse_variable(W_sparse)
+  W = aesara.sparse.as_sparse_variable(W)

Just like before, W could be a TensorVariable, so calling scipy.sparse.csr_matrix isn't sound.

Also, as I mentioned in the other comment, the sparse option is unnecessary; the caller should provide a sparse variable if they want to use one, and the logic here should handle both cases.
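If it is available in the installed Aesara version, the as_sparse_or_tensor_variable helper covers both cases in one call (a small sketch):

import numpy as np
import scipy.sparse
import aesara.sparse

W_dense = np.eye(3)
W_sp = scipy.sparse.csr_matrix(W_dense)

# SciPy sparse input becomes a SparseVariable; dense input becomes a
# TensorVariable, so a single call handles both kinds of W
Wv_sparse = aesara.sparse.as_sparse_or_tensor_variable(W_sp)
Wv_dense = aesara.sparse.as_sparse_or_tensor_variable(W_dense)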

@ricardoV94 mentioned this pull request on May 12, 2021.
@aerubanov (Contributor Author) commented May 12, 2021

@brandonwillard, I removed the sparse argument, but now I have two problems:

  • When I try to pass an Aesara SparseVariable as W in the CAR constructor, I get this error:
     File "<ipython-input-13-b44074052d76>", line 2, in <module>
      car = pm.CAR("car", mu, ws, alpha, tau)
    ..........................
    File "/home/anatoly/HDD/Projects/pymc3/pymc3/distributions/multivariate.py", line 1963, in make_node
      return super().make_node(rng, size, dtype, mu, W, alpha, tau)
    File "/home/anatoly/anaconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/aesara/tensor/random/op.py", line 374, in make_node
      bcast = self.compute_bcast(dist_params, size)
    File "/home/anatoly/anaconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/aesara/configparser.py", line 49, in res
      return f(*args, **kwargs)
    File "/home/anatoly/anaconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/aesara/tensor/random/op.py", line 287, in compute_bcast
      shape = self._infer_shape(size, dist_params)
    File "/home/anatoly/anaconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/aesara/tensor/random/op.py", line 224, in _infer_shape
      params_ind_slice = tuple(
    File "/home/anatoly/anaconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/aesara/tensor/random/op.py", line 225, in <genexpr>
      slice_ind_dims(p, ps, n)
    File "/home/anatoly/anaconda3/envs/pymc3-dev-py38/lib/python3.8/site-packages/aesara/tensor/random/op.py", line 214, in slice_ind_dims
      for s, b in zip(shape[:-n], p.broadcastable[:-n])
    AttributeError: 'SparseVariable' object has no attribute 'broadcastable'
    
    But if I pass an ndarray or a TensorVariable, everything works fine.
  • The test for the CAR logp method fails for a 3D size, because aesara.sparse.dot() accepts only inputs with 1 or 2 dimensions.

What should I do about these errors?

aerubanov and others added 2 commits on July 12, 2021 (Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>)
@brandonwillard (Contributor):

@brandonwillard Is it okay to merge with the current .eval() hack or should we wait for an upstream fix / find another alternative if possible?

No, we shouldn't merge evaluations like that.

@aerubanov (Contributor Author):

@ricardoV94 I rebased on main, and I will open issues about .sum() for sparse tensors in Aesara and here.

@aerubanov (Contributor Author) commented Jul 12, 2021

@ricardoV94, I removed the .eval() hack. @brandonwillard pointed out to me in aesara-devs/aesara#522 that sp_sum() fits much better here.

@aerubanov (Contributor Author):

@ricardoV94, I changed the formatting and added tolerance selection based on the aesara.config.floatX value in test_car_logp. This should fix the failing tests.

@codecov (bot) commented Jul 13, 2021

Codecov Report

Merging #4596 (c89dad2) into main (a3ee747) will increase coverage by 0.84%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main    #4596      +/-   ##
==========================================
+ Coverage   72.32%   73.16%   +0.84%     
==========================================
  Files          85       86       +1     
  Lines       13884    13838      -46     
==========================================
+ Hits        10042    10125      +83     
+ Misses       3842     3713     -129     
Impacted Files                          Coverage Δ
pymc3/distributions/multivariate.py     71.45% <100.00%> (+7.61%) ⬆️
pymc3/tests/conftest.py                 90.47% <100.00%> (+2.24%) ⬆️
pymc3/distributions/dist_math.py        87.36% <0.00%> (-4.22%) ⬇️
pymc3/math.py                           67.85% <0.00%> (-0.49%) ⬇️
pymc3/distributions/discrete.py         98.97% <0.00%> (-0.03%) ⬇️
pymc3/__init__.py                       100.00% <0.00%> (ø)
pymc3/sampling_jax.py                   0.00% <0.00%> (ø)
pymc3/distributions/__init__.py         100.00% <0.00%> (ø)
pymc3/printing.py                       85.85% <0.00%> (ø)
pymc3/sampling.py                       85.67% <0.00%> (+0.01%) ⬆️
... and 10 more

@aerubanov (Contributor Author):

@ricardoV94, I added a check for W symmetry and a test for it.

@aerubanov requested a review from @ricardoV94 on July 22, 2021.
Review thread on the new symmetry test:

W = aesara.sparse.csr_from_dense(W)

car_dist = CAR.dist(mu, W, alpha, tau)
with pytest.raises(AssertionError):
Member:

Can you match the error message?

Suggested change:

-  with pytest.raises(AssertionError):
+  with pytest.raises(AssertionError, match="W must be a symmetric adjacency matrix"):

And add a test for the other error (ndim=2)?

@ricardoV94 (Member) left a review:

Looks almost there. Just suggested a tweak to the error tests.

@aerubanov (Contributor Author):

@ricardoV94, I added the changes you suggested.

@ricardoV94 (Member):

CC @ckrapu in case he has the chance to take a look before merging

@ckrapu (Contributor) commented Jul 26, 2021

Thanks for the ping - I am very excited to see that the fast CAR sampling method is implemented. I often find scenarios where I need to sample draws from this kind of random field even outside of the context of prior predictive sampling.

@ricardoV94 (Member) left a review:

Looks good to me, trusting the rng_fn algorithm is correctly implemented.

Thanks a lot @aerubanov! This was a tough one, hopefully your next PR will be easier to crack :D

@twiecki merged commit 819f045 into pymc-devs:main on Jul 27, 2021.
@twiecki (Member) commented Jul 27, 2021

Thanks @aerubanov, this was indeed a tricky one!

@aerubanov deleted the CAR_rand_var branch on July 27, 2021.
sthagen added a commit to sthagen/pymc-devs-pymc that referenced this pull request Jul 27, 2021
Merging this pull request closed the linked issue: Developing random method for conditional autoregression (#4518).