Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

from_pymc3(prior=...) fails when model uses coords #1424

Closed
michaelosthege opened this issue Oct 16, 2020 · 5 comments
Closed

from_pymc3(prior=...) fails when model uses coords #1424

michaelosthege opened this issue Oct 16, 2020 · 5 comments

Comments

@michaelosthege
Copy link
Contributor

Describe the bug
Making an InferenceData from a prior predictive dictionary in the context of a model fails when the model uses coords.

To Reproduce
The following code works fine:

with pymc3.Model() as pmodel:
    pymc3.Lognormal("lothar", mu=-3, sd=0.8, shape=(1,))
    pp = pymc3.sample_prior_predictive(samples=1234,)

print({
    k : v.shape
    for k, v in pp.items()
})
    
with pmodel:
    idata = arviz.from_pymc3(prior=pp)

Output:

{'lothar': (1234,), 'lothar_log__': (1234,)}

And this one has the same dimensions, just with coords, but fails:

with pymc3.Model(coords={
    "lothar_coords": ["first"]
}) as pmodel:
    pymc3.Lognormal("lothar", mu=-3, sd=0.8, dims=("lothar_coords",))
    pp = pymc3.sample_prior_predictive(samples=1234,)

print({
    k : v.shape
    for k, v in pp.items()
})

with pmodel:
    idata = arviz.from_pymc3(prior=pp)

Output:

{'lothar': (1234,), 'lothar_log__': (1234,)}

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-23d7de7280c2> in <module>
     11 
     12 with pmodel:
---> 13     idata = arviz.from_pymc3(prior=pp)
     14 idata

~\AppData\Local\Continuum\miniconda3\envs\airflow-dev\lib\site-packages\arviz\data\io_pymc3.py in from_pymc3(trace, prior, posterior_predictive, log_likelihood, coords, dims, model, save_warmup)
    538         dims=dims,
    539         model=model,
--> 540         save_warmup=save_warmup,
    541     ).to_inference_data()
    542 

~\AppData\Local\Continuum\miniconda3\envs\airflow-dev\lib\site-packages\arviz\data\io_pymc3.py in to_inference_data(self)
    473             "posterior_predictive": self.posterior_predictive_to_xarray(),
    474             "predictions": self.predictions_to_xarray(),
--> 475             **self.priors_to_xarray(),
    476             "observed_data": self.observed_data_to_xarray(),
    477         }

~\AppData\Local\Continuum\miniconda3\envs\airflow-dev\lib\site-packages\arviz\data\io_pymc3.py in priors_to_xarray(self)
    374                     library=self.pymc3,
    375                     coords=self.coords,
--> 376                     dims=self.dims,
    377                 )
    378             )

~\AppData\Local\Continuum\miniconda3\envs\airflow-dev\lib\site-packages\arviz\data\base.py in dict_to_dataset(data, attrs, library, coords, dims)
    206     for key, values in data.items():
    207         data_vars[key] = numpy_to_data_array(
--> 208             values, var_name=key, coords=coords, dims=dims.get(key)
    209         )
    210     return xr.Dataset(data_vars=data_vars, attrs=make_attrs(attrs=attrs, library=library))

~\AppData\Local\Continuum\miniconda3\envs\airflow-dev\lib\site-packages\arviz\data\base.py in numpy_to_data_array(ary, var_name, coords, dims)
    171     # filter coords based on the dims
    172     coords = {key: xr.IndexVariable((key,), data=coords[key]) for key in dims}
--> 173     return xr.DataArray(ary, coords=coords, dims=dims)
    174 
    175 

~\AppData\Local\Continuum\miniconda3\envs\airflow-dev\lib\site-packages\xarray\core\dataarray.py in __init__(self, data, coords, dims, name, attrs, indexes, fastpath)
    342             data = _check_data_shape(data, coords, dims)
    343             data = as_compatible_data(data)
--> 344             coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    345             variable = Variable(dims, data, attrs, fastpath=True)
    346             indexes = dict(

~\AppData\Local\Continuum\miniconda3\envs\airflow-dev\lib\site-packages\xarray\core\dataarray.py in _infer_coords_and_dims(shape, coords, dims)
    121         raise ValueError(
    122             "different number of dimensions on data "
--> 123             "and dims: %s vs %s" % (len(shape), len(dims))
    124         )
    125     else:

ValueError: different number of dimensions on data and dims: 2 vs 3

Expected behavior
In both cases, an InferenceData object containing the prior predictive samples should be created.

Additional context

  • PyMC3 3.9.3
  • ArviZ, 0.10.0
@OriolAbril
Copy link
Member

The issue here is with the (1234,) shape. ArviZ expects the arrays in prior argument of from_pymc3 to have shape draw, *shape (it knows there is no chain dimension and it adds it to comply with the chain, draw, *shape convention).

This array has only the draw dimension, it should be 1234, 1 for the dim/coordinate to be accepted, currently when lothar_dim is passed, ArviZ gets a 2d (chain, draw) array with 3 dimensions (chain, draw, lothar_dim) and errors out. I skimmed through the code and could not see any squeeze on io_pymc3, which is consistent with the prints you have included in the code examples. Not sure if the (1,) shape is being ignored by sample_prior_predictive or if it's being squeezed out.

@michaelosthege
Copy link
Contributor Author

Yes, shape (1,) is treated the same as ():

with pymc3.Model() as pmodel:
    a = pymc3.Normal("a")
    b = pymc3.Normal("b", shape=(1,))
    c = pymc3.Normal("c", shape=(2,))
print({ p.name : p.random(size=123).shape for p in [a,b,c]})
# {'a': (123,), 'b': (123,), 'c': (123, 2)}

Not sure how we should deal with this..
We have model coords that are automatically determined - they might be a vector of length 1 too and treating these differently is a real headache..

Also this issue may need to move over to pymc3..

@michaelosthege
Copy link
Contributor Author

This issue is caused by pymc-devs/pymc#4206

@OriolAbril
Copy link
Member

I think this has been solved now, should we add a test to check it does not come up again before closing though?

@michaelosthege
Copy link
Contributor Author

I don't think we'll need a test, because the fixed shape handling is tested thoroughly in PyMC3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants