Add helper function to easily stack chain and draws dimensions #1469

Closed
AlexAndorra opened this issue Dec 22, 2020 · 5 comments · Fixed by #1725

Comments

@AlexAndorra
Contributor

Stacking chains and draws is often useful when one doesn't care about which chain a draw is coming from. This is currently possible by doing idata.posterior.stack(sample=("chain", "draw")), but very few people seem to know that.

Adding a helper function to easily stack the chain and draw dimensions into a sample dimension would go a long way towards making users' lives easier, and would make better use of xarray's capabilities. Feel free to comment if you want to take on this issue 🖖
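For readers unfamiliar with the stacking call mentioned above, here is a minimal sketch using plain xarray. The dataset, the variable name `theta`, and the `school` dimension are made up for illustration; a real `idata.posterior` would take the place of `posterior`.

```python
import numpy as np
import xarray as xr

# Toy stand-in for idata.posterior: 4 chains, 100 draws, one variable.
# The variable name "theta" and the "school" dimension are illustrative only.
rng = np.random.default_rng(0)
posterior = xr.Dataset(
    {"theta": (("chain", "draw", "school"), rng.normal(size=(4, 100, 8)))},
    coords={
        "chain": np.arange(4),
        "draw": np.arange(100),
        "school": np.arange(8),
    },
)

# Collapse chain and draw into a single "sample" dimension.
stacked = posterior.stack(sample=("chain", "draw"))

print(stacked.sizes["sample"])  # 4 chains * 100 draws = 400

# chain and draw remain available as coordinates along "sample".
chain_of_each_sample = stacked["chain"].values
```

The stacked object still knows which chain and draw each sample came from, so nothing is lost by stacking on demand.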

@ahartikainen
Contributor

Is there a reason we cannot add this when we create the InferenceData? Would anything change or break? I think it would only add one more dimension, and everything else would stay the same?

@AlexAndorra
Contributor Author

I don't think anything would break, as you can still access chain and draw after stacking, but I'm not sure I'd add it by default, at least not if there is another easy way to provide this feature.
The reason is just that it adds yet another dimension, so it makes the InferenceData heavier and, most importantly, can be confusing for people who don't need it. Generating it on demand would be the best-case scenario IMO.

@OriolAbril
Member

OriolAbril commented May 6, 2021

Proposal for this, a get_dataset or extract_dataset function, something like:

def get_dataset(idata, group="posterior", combined=False, var_names=None, filter_vars=None):
    """Extracts an inference data group or subset of it as xarray dataset

    Parameters
    ----------
    idata : InferenceData
        InferenceData from which to extract the data. 
        <not sure if it should be idata or anything that can be converted to idata>
    group : str, default "posterior"
    combined : bool, default False?
    var_names : str or list of str, optional
        Like with plotting, sometimes it's easier to subset saying what to exclude instead of what to include
    filter_vars : like with plotting

    Returns
    -------
    xarray.Dataset (or xarray.DataArray?)
        I am not sure whether we should return a dataarray iff `var_names` is a string 
        and a dataset otherwise or always a dataset.
    """

I believe this will handle most practical cases and be quite flexible while still being very little code as everything is reused from other functions/externalized.
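To make the proposal concrete, here is a rough sketch of how such a function might behave. It is not the implementation that was eventually merged: `get_dataset_sketch` is a hypothetical name, it takes a plain dict of xarray datasets instead of an InferenceData, it omits `filter_vars`, and it always returns a dataset (one of the two options the docstring leaves open).

```python
import numpy as np
import xarray as xr


def get_dataset_sketch(groups, group="posterior", combined=False, var_names=None):
    """Hypothetical stand-in for the proposed get_dataset.

    `groups` is a plain dict mapping group names to xarray.Dataset objects,
    used here instead of an InferenceData to keep the sketch self-contained.
    """
    data = groups[group]
    if var_names is not None:
        if isinstance(var_names, str):
            var_names = [var_names]
        # Indexing a Dataset with a list of names returns a Dataset.
        data = data[var_names]
    if combined:
        # Merge chain and draw into a single sample dimension.
        data = data.stack(sample=("chain", "draw"))
    return data


# Toy data; the variable names "a" and "b" are illustrative only.
rng = np.random.default_rng(1)
toy = xr.Dataset(
    {
        "a": (("chain", "draw"), rng.normal(size=(2, 50))),
        "b": (("chain", "draw"), rng.normal(size=(2, 50))),
    },
    coords={"chain": np.arange(2), "draw": np.arange(50)},
)
groups = {"posterior": toy}

subset = get_dataset_sketch(groups, combined=True, var_names="a")
```

With `combined=False` and `var_names=None` the sketch simply returns the requested group unchanged, which matches the defaults suggested in the docstring.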

@chiral-carbon

Hi! I recently submitted a PR updating the GLM Poisson regression notebook to best practices, and on executing cell 18 in that notebook

az.summary(np.exp(inf_fish_alt.posterior), kind="stats")

I get this RuntimeWarning:

/home/ada/.local/lib/python3.8/site-packages/xarray/core/computation.py:724: RuntimeWarning: overflow encountered in exp 
  result_data = func(*input_data) 
/home/ada/.local/lib/python3.8/site-packages/numpy/core/_methods.py:193: RuntimeWarning: invalid value encountered in subtract 
  x = asanyarray(arr - arrmean) 

Here, inf_fish_alt is the trace created using a model based on pymc.glm.GLM.from_formula.

@OriolAbril suggested that this might be relevant to the discussion here.

@OriolAbril
Copy link
Member

yeah, here we basically want to exponentiate all variables except mu, because mu has already been exponentiated in the model code and doing so again results in overflow. If the proposed function were available, we'd be able to get the right subset quite easily, then exponentiate it and pass it to summary.
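Until such a helper exists, the same subset can be built with plain xarray. The sketch below uses a toy dataset in place of `inf_fish_alt.posterior`; the variable names `Intercept` and `slope` and the `obs` dimension are assumptions made for illustration, and only `mu` comes from the discussion above.

```python
import numpy as np
import xarray as xr

# Toy stand-in for inf_fish_alt.posterior. "Intercept" and "slope" are
# illustrative names; "mu" plays the role of the already-exponentiated variable.
rng = np.random.default_rng(2)
posterior = xr.Dataset(
    {
        "Intercept": (("chain", "draw"), rng.normal(size=(2, 50))),
        "slope": (("chain", "draw"), rng.normal(size=(2, 50))),
        "mu": (("chain", "draw", "obs"), np.exp(rng.normal(size=(2, 50, 3)))),
    }
)

# Drop mu so it is not exponentiated a second time, then apply exp
# to every remaining variable at once.
exp_subset = np.exp(posterior.drop_vars("mu"))
```

The resulting `exp_subset` could then be passed to `az.summary(..., kind="stats")` in place of `np.exp(inf_fish_alt.posterior)`.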
