CAR random variables #4596
Conversation
It sounds like you need to find a "minimally sufficient" space into which you can embed all the parameters. The real question is "What's the correspondence between the dimensions of these parameters?" For instance, if
Yes, that's a mandatory parameter. You'll need to understand those dimensions above in order to implement this, though; otherwise, take a look at the implementation of
You can add those as parameters to the
@brandonwillard No, my question was about something else. The CAR class docs specify that a float or an array can optionally be passed for these parameters. How should I handle this? Should I transform a float to an array in
If you have the parameter dimensions worked out already, then there's no question about whether or not you should, for instance, convert a scalar to a vector, matrix, etc.

If it's a question of where to do the conversion, that kind of thing is often done in `Op.make_node`. Assuming that's the case, and you need to convert a scalar to a vector, just add a broadcastable dimension to the scalar value.

N.B.: There are also some broadcasting helper functions used throughout the codebase.
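The broadcasting advice above can be sketched with plain NumPy, whose broadcasting semantics Aesara mirrors; the parameter names and shapes here are illustrative, not taken from the PR:

```python
import numpy as np

mu = np.array(2.0)     # a scalar parameter, as a user might pass it
W = np.ones((3, 3))    # a 3x3 adjacency matrix fixes the event size

# Add a broadcastable (length-1) dimension so the scalar acts like a vector.
mu_vec = np.atleast_1d(mu)
assert mu_vec.shape == (1,)

# The length-1 axis then broadcasts against the full parameter length.
z = np.zeros(W.shape[0])
out = mu_vec + z
assert out.shape == (3,)
```

In the symbolic setting the same effect comes from giving the variable a broadcastable dimension, so downstream shapes work out without copying data.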
@brandonwillard, thanks for the tips and the quick response! I will make the necessary changes.
The testing strategy I used for the original CAR implementation was to make sure that the CAR logp and the equivalent multivariate normal with CAR-structured covariance matrix were equivalent, up to an additive constant. You can see how that is implemented here.
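The strategy above can be sketched as follows; the adjacency matrix, parameter values, and variable names are illustrative, not taken from the PR. A CAR(mu, W, alpha, tau) density matches a multivariate normal with precision `tau * (D - alpha * W)` up to an additive constant, so differences of log-densities at two points must agree exactly:

```python
import numpy as np
from scipy import stats

W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # symmetric adjacency matrix
D = np.diag(W.sum(axis=1))               # degree matrix
tau, alpha = 2.0, 0.5
mu = np.zeros(3)

prec = tau * (D - alpha * W)             # CAR-structured precision matrix
cov = np.linalg.inv(prec)
mvn = stats.multivariate_normal(mean=mu, cov=cov)

x1, x2 = np.full(3, 0.1), np.full(3, -0.3)
# Any correct CAR logp must reproduce this difference, because the
# additive normalization constant cancels:
delta = mvn.logpdf(x1) - mvn.logpdf(x2)
```

Comparing differences rather than raw values is what makes the test insensitive to the constant term.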
pymc3/distributions/multivariate.py

```python
u += 1
L = scipy.linalg.cholesky_banded(Qb, lower=False)
z = rng.normal(size=W.shape[0], loc=mu)
samples = scipy.linalg.cho_solve_banded((L, False), z)
```
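As background for the banded approach in the snippet above, here is a self-contained sketch of drawing from a Gaussian with a banded precision matrix; the tridiagonal `Q` and all names are illustrative. With an upper Cholesky factor `U` of `Q` (so `Q = U^T U`), solving `U x = z` for `z ~ N(0, I)` gives `x ~ N(0, Q^{-1})`:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(42)

# Tridiagonal precision matrix Q in upper banded storage: row 0 holds the
# superdiagonal (shifted right; first entry unused), row 1 the main diagonal.
n = 4
Qb = np.zeros((2, n))
Qb[0, 1:] = -0.5   # superdiagonal of Q
Qb[1, :] = 2.0     # main diagonal of Q

# Q = U^T U with U upper triangular, so x = U^{-1} z has covariance
# U^{-1} U^{-T} = Q^{-1}, i.e. x ~ N(0, Q^{-1}).
U = linalg.cholesky_banded(Qb, lower=False)
z = rng.normal(size=n)
x = linalg.solve_banded((0, 1), U, z)  # back-substitution, cheap per draw
```

Note this solves with the triangular factor via `solve_banded` rather than `cho_solve_banded`; the latter applies the full factorization and would yield covariance `Q^{-2}` instead of `Q^{-1}`.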
I'm not familiar with the desired signature of `rng_fn`, but it's not clear to me that this allows for sampling multiple CAR draws simultaneously. If that's not the case, then the one-time solution to the permutation optimization problem in `reverse_cuthill_mckee` is not going to be better than simply sampling from a multivariate normal with the CAR covariance matrix.
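As a sketch of why that one-time permutation matters: `scipy.sparse.csgraph.reverse_cuthill_mckee` relabels the nodes to shrink the matrix bandwidth, which is what keeps the banded Cholesky cheap across repeated draws. The adjacency matrix below is illustrative (a path graph written in a bad ordering):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Path graph 0-4-3-2-1 stored in an ordering with a wide band.
A = sparse.csr_matrix(np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0],
], dtype=float))

def bandwidth(M):
    """Maximum |i - j| over the nonzero entries of a dense matrix."""
    i, j = np.nonzero(M)
    return int(np.abs(i - j).max())

# One-time permutation that relabels the nodes to shrink the bandwidth;
# the same `perm` can then be reused for every subsequent draw.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_perm = A.toarray()[np.ix_(perm, perm)]
```

If only one draw is taken per call, the cost of computing `perm` is paid every time, which is the reviewer's concern above.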
I think that the size parameter is just what you need to implement the possibility of sampling multiple CAR draws simultaneously. So I should change my implementation of this method.
pymc3/distributions/multivariate.py

```python
dtype = "floatX"
_print_name = ("CAR", "\\operatorname{CAR}")

def make_node(self, rng, size, dtype, *dist_params):
```
It's better to explicitly specify the parameters (i.e. `mu`, `W`, etc.); otherwise, the signature is unnecessarily ambiguous.

Also, you'll likely need to convert those distribution parameters to Aesara variables using `aesara.tensor.as_tensor[_variable]`.
@brandonwillard, should I do this conversion in the `make_node` method of `RandomVariable` instead of the `dist` method of the CAR distribution class?
Yeah, it's always good to do that in `Op.make_node`.
I added handling of

I guess I just did something wrong on the last call. @brandonwillard, could you help me with this error? I also plan to start figuring out how to write tests. Based on the discussion above, I think we can compare sampling from CAR with sampling from the equivalent multivariate normal with CAR-structured covariance matrix. But perhaps it is not the samples themselves that should be compared, but rather statistics calculated from their distribution, or something similar.
If you try using
pymc3/distributions/multivariate.py

```python
self.median = self.mode = self.mean = self.mu = at.as_tensor_variable(mu)
self.sparse = sparse

@classmethod
def dist(cls, mu, W, alpha, tau, sparse=False, *args, **kwargs):

    if not W.ndim == 2 or not np.allclose(W, W.T):
```
NumPy cannot be used here, because `W` could be a `TensorVariable`. You'll need to assert symmetry in a symbolic way, or simply remove the check for now.
@aerubanov, this still needs to be resolved.
@brandonwillard, thanks for the reminder; I made the changes.
pymc3/distributions/multivariate.py

```diff
-    def logp(self, value):
+    def logp(value, mu, W, alpha, tau, sparse=False):
```
There's no need for a `sparse` parameter here; plus, this option would only be useful to someone directly calling this function, which isn't too likely or common.

More importantly, you can determine whether or not `W` is sparse by inspection.
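As a sketch of the by-inspection approach, using `scipy.sparse.issparse` for the numeric case (for symbolic inputs one would inspect the Aesara variable's type instead); the helper below is illustrative, not the PR's actual logp:

```python
import numpy as np
import scipy.sparse

def weighted_sum(W, x):
    """Compute W @ x, detecting sparsity by inspecting W rather than
    threading a `sparse=` flag through the call chain."""
    if scipy.sparse.issparse(W):
        return W.dot(x)      # sparse matrix-vector product
    return np.dot(W, x)      # dense path

W_dense = np.array([[0., 1.], [1., 0.]])
W_sparse = scipy.sparse.csr_matrix(W_dense)
x = np.array([2.0, 3.0])

# Both representations give the same result; no flag is needed.
assert np.allclose(weighted_sum(W_dense, x), weighted_sum(W_sparse, x))
```

The caller then simply provides a sparse matrix when they want the sparse path, as suggested in the review.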
Force-pushed from 26141f6 to 1106e32.
I pushed some of the recommendations I made above and re-enabled the old CAR test.

Otherwise, according to the old test, it looks like the log-likelihood could be working; however, that test needs an addition for at least one more case.

Also, it looks like we need to add a CAR test to the random variable tests.
Thanks for the review, @brandonwillard!

Yes, I will try to add some tests for such cases.

I am thinking about a solution based on comparing samples from
We have some tests for random methods that use the Kolmogorov-Smirnov test to assess the equivalence between the PyMC3 sampler and a reference random number generator (which could be the MultivariateNormal in your case). Maybe you can adapt the logic from there: https://github.com/pymc-devs/pymc3/blob/e5c42b47e43940837afe171f7a30c8ea89f54ed2/pymc3/tests/test_distributions_random.py#L60
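The linked pattern can be sketched like this; both samplers here are standard normals standing in for the CAR `rng_fn` and the reference `MvNormal`, and the sizes are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two samplers that should produce the same distribution; a real test
# would draw one set from the CAR rng_fn and one from the equivalent
# multivariate normal, then compare a 1-D marginal of each.
draws_a = rng.normal(size=2000)
draws_b = rng.normal(size=2000)

# Two-sample Kolmogorov-Smirnov test: a small statistic (large p-value)
# means the two samples cannot be distinguished.
result = stats.ks_2samp(draws_a, draws_b)
```

Seeding the generator, as discussed later in the thread, makes the test deterministic and avoids sporadic failures near the rejection threshold.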
@ricardoV94, thank you, I will check this.
I added a test for the CAR random variable.

@brandonwillard, @ricardoV94, do you have any ideas what I am doing wrong?
Does the test pass if you duplicate the CAR and compare it against itself, instead of against the MvNormal? If it does, it means the two distributions you have now are not really equivalent.
@ricardoV94, yes, if I duplicate the CAR distribution, the test passes. But as @ckrapu pointed out, this test is not correct for this distribution, so I will try to use the options he suggested.
I experimented with different W matrix formats; sometimes the test passed, but sometimes it failed. So it seems that the KS test is actually not appropriate here.
If they're only failing within a smallish window near the numerical limit/precision, then that's fine. Also, we'll need to seed the tests, which will obviously help.
I changed the test for CAR. I added a numpy random seed setting before starting to generate samples. Is there anything else that needs to be added?
pymc3/distributions/multivariate.py

```diff
-    W_sparse = scipy.sparse.csr_matrix(W)
-    W = aesara.sparse.as_sparse_variable(W_sparse)
+    W = aesara.sparse.as_sparse_variable(W)
```
Just like before, `W` could be a `TensorVariable`, so calling `scipy.sparse.csr_matrix` isn't sound.

Also, as I mentioned in the other comment, the `sparse` option is unnecessary; the caller should provide a sparse variable if they want to use one, and the logic here should handle both cases.
@brandonwillard, I removed the `sparse` option.

What should I do about these errors?
Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>
No, we shouldn't merge evaluations like that.
@ricardoV94 I rebased on main.
@ricardoV94, I removed
@ricardoV94, I changed the formatting and added tolerance selection based on the
Codecov Report

```
@@            Coverage Diff             @@
##             main    #4596      +/-   ##
==========================================
+ Coverage   72.32%   73.16%    +0.84%
==========================================
  Files          85       86        +1
  Lines       13884    13838       -46
==========================================
+ Hits        10042    10125       +83
+ Misses       3842     3713      -129
```
@ricardoV94, I added a check for a non-symmetric W.
pymc3/tests/test_distributions.py

```python
W = aesara.sparse.csr_from_dense(W)

car_dist = CAR.dist(mu, W, alpha, tau)
with pytest.raises(AssertionError):
```
Can you match the error message?

```diff
-    with pytest.raises(AssertionError):
+    with pytest.raises(AssertionError, match="W must be a symmetric adjacency matrix"):
```

And add a test for the other error (`ndim=2`)?
Looks almost there. Just suggested a tweak to the error tests
@ricardoV94, I made the changes that you suggested.
CC @ckrapu in case he has the chance to take a look before merging.
Thanks for the ping. I am very excited to see that the fast CAR sampling method is implemented. I often find scenarios where I need to sample draws from this kind of random field even outside of the context of prior predictive sampling.
Looks good to me, trusting that the `rng_fn` algorithm is correctly implemented.
Thanks a lot @aerubanov! This was a tough one, hopefully your next PR will be easier to crack :D
Thanks @aerubanov, this was indeed a tricky one!
CAR random variables (pymc-devs#4596)
close #4518

I changed the CAR distribution class to be compatible with v4 `RandomVariable` and implemented a CAR random variable class. But now I need some advice for further improvements.

@brandonwillard, I have a few questions about the random variable for CAR:

- For the distribution parameters (`mu`, `tau` and `alpha`), what should I specify in `ndims_params`? If I pass a scalar instead of a vector, an error occurs.
- How should I implement the `rng_fn` method? There are intermediate values (`D` and `lam`) that need to be calculated only once. Initially they were calculated in the `__init__` method of the distribution class and stored as a distribution class attribute.

@ckrapu, do you have any ideas for testing the implementation of the algorithm?