
[WIP] Add new groups to io_pyro #1090

Merged Mar 20, 2020 (14 commits)

Conversation

nitishp25 (Contributor) commented Feb 26, 2020

Description

Add the following groups to io_pyro:

  • predictions
  • constant_data
  • predictions_constant_data

Tests:

  • predictions
  • constant_data
  • predictions_constant_data

Checklist

  • Follows official PR format
  • Includes new or updated tests to cover the new feature
  • Code style correct (follows pylint and black guidelines)
  • Changes are listed in changelog

nitishp25 (Contributor Author) commented Feb 26, 2020

from_pyro works with only predictions, though it cannot retain (chains, draws) in this case: it converts (chains, draws) to (1, total_no_of_samples) since it cannot extract nchains and ndraws. Should I add arguments for pred_chains and pred_draws for this case?

Works fine when trace is passed.
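A minimal sketch (hypothetical helper, using numpy; not ArviZ code) of what recovering (chains, draws) from flat samples could look like, assuming the total sample count is divisible by the chain count and samples are ordered chain-by-chain:

```python
import numpy as np

def reshape_samples(flat, nchains):
    """Reshape flat (total_samples, *shape) draws into (nchains, ndraws, *shape).

    Falls back to a single chain when the total is not divisible by nchains,
    mirroring the (1, total_no_of_samples) behaviour described above.
    """
    total = flat.shape[0]
    if total % nchains != 0:
        return flat.reshape(1, total, *flat.shape[1:])
    return flat.reshape(nchains, total // nchains, *flat.shape[1:])

# 6 flat draws of a length-3 variable, split into 2 chains of 3 draws each
flat = np.arange(18).reshape(6, 3)
print(reshape_samples(flat, 2).shape)  # (2, 3, 3)
print(reshape_samples(flat, 4).shape)  # (1, 6, 3) -- fallback, 6 not divisible by 4
```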

Comment on lines 230 to 231
idata_origin=None,
inplace=False
Member:

I would not add this to from_pyro; they could go in a from_pyro_predictions (to mimic the pymc3 pattern), but I am not sure it is worth including such a function yet. It would basically be only a concat call, so users can call az.concat directly.

import pyro

if self.predictions is not None and self.pred_dims is None:
raise ValueError("Prediction dims are needed for predictions group.")
Member:

dims are not needed; if they are not present, ArviZ's internal functions will generate some default names.

Contributor Author:

But the thing is, the predictions variables have the same names as the posterior_predictive variables, so it uses the posterior_predictive coords, which have different values and cause an error.

Member:

I am not sure I follow; predictions should not have access to dims at any point, only to pred_dims. Then the possible cases are:

  • pred dims is None -> default dim and coord names
  • pred dims is a dict -> use pred dims

Contributor Author:

Yes, both predictions and predictions_constant_data use pred_dims instead of dims. I have raised that error just in case the user passes the predictions but does not pass pred_dims.

All other groups use the default dims; only the predictions and predictions_constant_data groups need pred_dims because their variable names are the same as posterior_predictive and constant_data respectively, but their data actually has different dimensions. So they cannot use the default dims in any case. You can see an example here.

Are we on the same page now? Other groups will always use the default dims but predictions and predictions_constant_data must always use pred_dims right?

Member:

Yes, my point is that dims can be None, which makes ArviZ use some defaults. pred_dims should also have the option of being None and generating default values: not using dims, but generating the defaults that correspond to its own dataset. I am not sure what I am missing; I thought default dims were generated on a dataset basis, not on an inference data basis.

Contributor Author:

Ohh yes, dims are generated on a dataset basis.

  • pred dims is None -> default dim and coord names

Sorry I got lost here, thought that you were talking about self.dims :)

OriolAbril (Member):

Should I add arguments for pred_chains and pred_draws for this case?

Maybe an nchains argument? It should be enough with only one, as the number of samples is already a dimension.

@@ -45,10 +66,18 @@ def __init__(
self.nchains = self.ndraws = 0
Member:

I have just seen that the if posterior is not None check and the chain, draw definition are repeated below. This one can be removed.

self.coords = coords
self.dims = dims
self.pred_dims = pred_dims
self.num_chains = num_chains
Member:

I would handle this in the else: self.nchains = self.ndraws = 0 below, setting self.nchains to num_chains and then getting the number of draws from the predictions, posterior predictive or prior (I think io_pymc3 does something similar).
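The suggestion above could be sketched roughly like this (hypothetical helper with made-up names, not the actual io_pyro code): nchains comes from the num_chains argument, and ndraws is read off the first group that is available.

```python
def fallback_shape(num_chains, predictions=None, posterior_predictive=None, prior=None):
    """Return (nchains, ndraws) when no posterior is available.

    ndraws is taken from the first group that is present, checked in the
    order predictions -> posterior_predictive -> prior.
    """
    for group in (predictions, posterior_predictive, prior):
        if group:
            first_var = next(iter(group.values()))
            return num_chains, len(first_var) // num_chains
    return num_chains, 0

# 10 flat predictions draws split across 2 chains -> 5 draws per chain
print(fallback_shape(2, predictions={"obs": list(range(10))}))  # (2, 5)
```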

Member:

Also, as it is only used in init, it does not need to be saved in self


@requires("predictions")
def predictions_to_xarray(self):
"""Convert predictions (out of sample predictions) to xarray."""
Member:

I would either remove the "out of sample predictions" or change it to "out of sample posterior predictive".

nitishp25 (Contributor Author):

Can I now add the tests?

pred_dims: dict
Dims for predictions data. Map variable names to their coordinates.
num_chains: int
Number of chains used for sampling. Only needed when posterior is not provided.
Member:

Maybe "ignored if posterior is present instead"

self.coords = coords
self.dims = dims
self.pred_dims = pred_dims
self.num_chains = num_chains
Member:

Also, as it is only used in init, it does not need to be saved in self

else:
self.nchains = self.ndraws = 0
raise ValueError("`num_chains` is needed if trace is not given.")
Member:

I would not raise an error. For example with prior, sampling is generally not performed with multiple chains, so from_pyro with only prior or predictions and no num_chains should work (even if merging afterwards does not work properly).

nitishp25 (Contributor Author) commented Feb 29, 2020:

Then should I set chains and draws to 0 here? Or change the elif above to else and within that:

if num_chains is not None:
    self.nchains = num_chains
else:
    self.nchains = 1
...

Member:

In the interest of not sending warnings when they are not needed, maybe it would be best to set num_chains=1 in the function definitions. I think it will do the same as the else you are proposing, given that num_chains is ignored if posterior is present.

OriolAbril (Member):

Can I now add the tests?

Definitely. I think you have already used check_multiple_attrs, so I'll skip the introduction; as always, ask if you have any doubts.

nitishp25 (Contributor Author):

@OriolAbril, do you think these tests are enough? Any tests for pred_dims or num_chains or any changes to be made in test_inference_data_no_posterior?

OriolAbril (Member) left a comment:

do you think these tests are enough? Any tests for pred_dims or num_chains or any changes to be made in test_inference_data_no_posterior?

I would add some more tests. As you say, one checking that dims and pred_dims work properly would be great, and one checking num_chains works too. Also, either in the no posterior test, the constant data test or a new one, it would be great to check what happens when predictions is alone (maybe test num_chains and predictions alone at the same time in a new test?).

Notes: by predictions alone I mean any of predictions, predictions_constant_data and pred_dims. Also, https://github.com/arviz-devs/arviz/blob/master/arviz/tests/external_tests/test_data_pymc.py#L322 can help with testing some combinations in the same test.

Another idea, please say if you think it would be useful or not. I have seen that constant data (either for the model or for predictions) is generally already in a dict; however, it is a dict of pytorch tensors. Do you think it would be useful to allow a dict of tensors as the constant data argument? It could be handled with a try/except or by checking if hasattr(value, "detach")?
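The duck-typing check could be sketched like this (FakeTensor is a stand-in for torch.Tensor so the example stays self-contained; real code would receive actual tensors exposing the same detach()/numpy() methods):

```python
import numpy as np

class FakeTensor:
    """Stand-in for a torch tensor: exposes detach() and numpy() like torch does."""
    def __init__(self, data):
        self._data = np.asarray(data)
    def detach(self):
        return self
    def numpy(self):
        return self._data

def to_array(value):
    # Duck-type on .detach so torch never needs to be imported here
    if hasattr(value, "detach"):
        return value.detach().numpy()
    return np.asarray(value)

constant_data = {"sigma": FakeTensor([5.0, 7.0, 12.0]), "J": 8}
converted = {key: to_array(val) for key, val in constant_data.items()}
print(converted["sigma"].shape)  # (3,)
print(converted["J"])            # 8
```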

nitishp25 (Contributor Author):

Another idea, please say if you think it would be useful or not. I have seen that constant data (either for the model or for predictions) is generally already in a dict; however, it is a dict of pytorch tensors. Do you think it would be useful to allow a dict of tensors as the constant data argument? It could be handled with a try/except or by checking if hasattr(value, "detach")?

I think constant data already works with a dict of tensors. Is there any incompatibility?

OriolAbril (Member):

I think constant data already works with a dict of tensors. Is there any incompatibility?

Not sure; there should probably be a test for this (or just say it is not supported). Like numpy, xarray hardly ever raises an error when creating arrays/datasets; however, the result can be unexpected in many cases:

np.array(((1, 2), (1, 4)))                                                                                                         
# array([[1, 2],
#        [1, 4]])

np.array(((1, 2), (1, 4, 3)))                                                                                                      
# array([(1, 2), (1, 4, 3)], dtype=object)

np.array(((1,2), (1, (4, 3))))                                                                                                   
# array([[1, 2],
#        [1, (4, 3)]], dtype=object)

We should make sure to note in the docstring that input should be converted to arrays, or add a test checking that the generated dataset has the right shape (not length one with dtype object) or something similar.
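Such a check could be sketched as below (hypothetical helper; it handles both older numpy, which silently returns an object-dtype array for ragged input, and numpy >= 1.24, which raises a ValueError instead):

```python
import numpy as np

def as_rectangular_array(value, name):
    """Convert value to an ndarray, rejecting ragged input instead of
    silently producing an object-dtype array of tuples."""
    try:
        arr = np.asarray(value)
    except ValueError as exc:  # numpy >= 1.24 raises for ragged sequences
        raise TypeError(f"{name} is ragged; convert it to a rectangular array") from exc
    if arr.dtype == object:    # older numpy returns an object array instead
        raise TypeError(f"{name} is ragged; convert it to a rectangular array")
    return arr

print(as_rectangular_array(((1, 2), (1, 4)), "ok").shape)  # (2, 2)
try:
    as_rectangular_array(((1, 2), (1, 4, 3)), "bad")
except TypeError:
    print("rejected ragged input")
```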

OriolAbril (Member) left a comment:

We could also make predictions data a fixture as it is used several times (I think)

inference_data = from_pyro(prior=prior)
test_dict = {"prior": ["mu", "tau", "eta"]}
fails = check_multiple_attrs(test_dict, inference_data)
assert not fails
Member:

I think using

assert not fails, "only prior: {}".format(fails)

will yield more informative error messages. I am not sure about the formatting working though; I'll try to check this.

)

inference_data = from_pyro(predictions=predictions, num_chains=2)
nchains = inference_data.predictions["obs"].shape[0]
Member:

inference_data.predictions.dims["chain"] should return the length of chain dim

nitishp25 (Contributor Author):

Not sure; there should probably be a test for this (or just say it is not supported). Like numpy, xarray hardly ever raises an error when creating arrays/datasets; however, the result can be unexpected in many cases:

np.array(((1, 2), (1, 4)))                                                                                                         
# array([[1, 2],
#        [1, 4]])

np.array(((1, 2), (1, 4, 3)))                                                                                                      
# array([(1, 2), (1, 4, 3)], dtype=object)

np.array(((1,2), (1, (4, 3))))                                                                                                   
# array([[1, 2],
#        [1, (4, 3)]], dtype=object)

We should make sure to add in docstring to convert to array or add a test checking that the generated dataset has the right shape, not length one and dtype object or something similar.

Sorry, I don't understand. The incorrect dims case would only arise when pyro samples data from the model without chains (as in the case of prior), right? And if the user converts the above 3 cases to tensors, only the first one would work; the other 2 wouldn't be converted to a tensor in the first place. Do you mean that the user could pass such examples without converting to a tensor, or that these are the values after the passed tensors have been detached to arrays? I didn't get your point.

OriolAbril (Member):

I have played a little with tensors and it looks like xarray converts them automatically to arrays.

My concern came from the fact that when prompted with the object:

tensor([ 4.5000,  6.0000,  7.0000, 12.0000, 18.0000], dtype=torch.float64)

it can either be converted to an array of dtype float and length 5 or to an array of dtype object and length 1 whose first position contains the whole tensor. Everything seems to work properly, even test error messages. Thanks!

OriolAbril (Member) left a comment:

Minor nits

Comment on lines 95 to 96
"""When constructing InferenceData must have at least
one of trace, prior, posterior_predictive or predictions."""
Member:

"... one of posterior, prior, ...", trace is a pymc specific argument

Comment on lines 27 to 35
@pytest.fixture(scope="class")
def predictions_data(self, data):
    posterior_samples = data.obj.get_samples()
    model = data.obj.kernel.model
    pred_data = {"J": 8, "sigma": np.array([5.0, 7.0, 12.0, 4.0, 6.0, 10.0, 3.0, 9.0])}
    predictions = Predictive(model, posterior_samples)(
        pred_data["J"], torch.from_numpy(pred_data["sigma"]).float()
    )
    return predictions
Member:

How about making a prediction_params fixture (like eight_school_params, with class scope)? Then predictions_data(self, data, prediction_params) and this prediction_params fixture could also be used in get_inference_data and in test_inference_data_no_posterior.

Comment on lines 227 to 230
dims = inference_data.posterior_predictive["obs"].shape[2:]
pred_dims = inference_data.predictions["obs"].shape[2:]
assert dims == (8,)
assert pred_dims == (8,)
Member:

inference_data.posterior_predictive.dims["school"] and inference_data.predictions.dims["school_pred"], then the assert will be dims == 8

Member:

This can be moved to the test_inference_data test above too, both start with the same code.


def test_inference_data_num_chains(self, predictions_data):
    predictions = predictions_data
    inference_data = from_pyro(predictions=predictions, num_chains=2)
Member:

num_chains=chains (use chains fixture imported from helpers to make sure chains is actually the number of chains in posterior)

# test dims
dims = inference_data.posterior_predictive.dims["school"]
pred_dims = inference_data.predictions.dims["school_pred"]
assert dims == 8, pred_dims == 8
Member:

these should still be one on each line; otherwise the pred_dims == 8 is only evaluated (as the assertion message) if the dims assertion fails.
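The pitfall can be demonstrated directly: the expression after the comma is the assertion message, evaluated only when the assertion fails, so the second comparison is silently skipped whenever the first passes.

```python
evaluated = []

def check(value, expected):
    evaluated.append(value)
    return value == expected

# Looks like two checks, but the second is just the (lazy) failure message:
assert check(8, 8), check(7, 8)
print(evaluated)  # [8] -- check(7, 8) never ran

# Correct form: one assert per line, so both are always evaluated
assert check(8, 8)
assert check(8, 8)
print(evaluated)  # [8, 8, 8]
```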

OriolAbril (Member) left a comment:

LGTM. I think it is ready to merge

nitishp25 (Contributor Author):

I'll add the groups to numpyro soon

codecov bot commented Mar 20, 2020

Codecov Report

Merging #1090 into master will decrease coverage by 0.00%.
The diff coverage is 97.82%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1090      +/-   ##
==========================================
- Coverage   92.68%   92.67%   -0.01%     
==========================================
  Files          93       93              
  Lines        9032     9069      +37     
==========================================
+ Hits         8371     8405      +34     
- Misses        661      664       +3     
Impacted Files Coverage Δ
arviz/data/io_pyro.py 95.86% <97.82%> (-1.37%) ⬇️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b1ccd0d...0387ff7. Read the comment docs.

@OriolAbril OriolAbril merged commit f7febc1 into arviz-devs:master Mar 20, 2020
OriolAbril (Member):

Thanks!

nitishp25 added a commit to nitishp25/arviz that referenced this pull request Apr 5, 2020
* add predictions

* add remaining groups

* black changes

* modify chains

* remove repeated lines

* minor changes done

* fix pred_dims

* add tests

* add more tests

* modified tests

* update changelog

modify changelog

* minor changes

* correct test