Add helper function to easily stack `chain` and `draws` dimensions #1469

AlexAndorra · 2020-12-22T19:03:31Z

Stacking chains and draws is often useful when one doesn't care which chain a draw comes from. This is currently possible with `idata.posterior.stack(sample=("chain", "draw"))`, but very few people seem to know that.
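For readers unfamiliar with the xarray call mentioned above, here is a minimal sketch of what stacking does. It assumes a synthetic dataset standing in for `idata.posterior`; the variable name `mu` and the sizes (4 chains, 100 draws) are made up for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for idata.posterior (assumption: 4 chains, 100 draws,
# one scalar variable called "mu").
rng = np.random.default_rng(0)
posterior = xr.Dataset(
    {"mu": (("chain", "draw"), rng.normal(size=(4, 100)))},
    coords={"chain": np.arange(4), "draw": np.arange(100)},
)

# Collapse the chain and draw dimensions into a single "sample" dimension.
stacked = posterior.stack(sample=("chain", "draw"))

print(stacked["mu"].dims)
print(stacked.sizes["sample"])

# The original chain and draw labels remain available along "sample".
chain_of_each_sample = stacked["chain"].values
```

After stacking, each entry along `sample` still carries its chain and draw labels, so no information is lost.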

Adding a helper function to easily stack the chain and draw dimensions into a sample dimension would go a long way towards making users' lives easier, as well as making better use of xarray's capabilities. Feel free to comment if you want to take on this issue 🖖

ahartikainen · 2020-12-23T05:35:56Z

Is there a reason we cannot add this when we create InferenceData? Would anything change or break? I think it would only add one more dimension, and everything else would stay the same?

AlexAndorra · 2020-12-23T09:41:57Z

I don't think anything would break, as you can still access chain and draw after stacking, but I'm not sure I'd add it by default, at least if another easy way to provide this feature exists.
The reason is just that it adds yet another dimension, so it makes the InferenceData heavier and, most importantly, can be confusing for people who don't need it. Generating it on demand would be the best-case scenario IMO.

OriolAbril · 2021-05-06T14:54:37Z

Proposal for this: a `get_dataset` or `extract_dataset` function, something like:

def get_dataset(idata, group="posterior", combined=False, var_names=None, filter_vars=None):
    """Extract an InferenceData group, or a subset of it, as an xarray Dataset.

    Parameters
    ----------
    idata : InferenceData
        InferenceData from which to extract the data. 
        <not sure if it should be idata or anything that can be converted to idata>
    group : str, default "posterior"
    combined : bool, default False?
    var_names : str or list of str, optional
        Like with plotting, sometimes it's easier to subset saying what to exclude instead of what to include
    filter_vars : like with plotting

    Returns
    -------
    xarray.Dataset (or xarray.DataArray?)
        I am not sure whether we should return a dataarray iff `var_names` is a string 
        and a dataset otherwise or always a dataset.
    """

I believe this will handle most practical cases and be quite flexible while still being very little code as everything is reused from other functions/externalized.
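To make the proposal more concrete, here is a rough sketch of how such a function might behave. This is not the ArviZ implementation: `get_dataset_sketch` is a hypothetical name, a plain dict of xarray Datasets stands in for InferenceData, and `filter_vars` is left out entirely.

```python
import numpy as np
import xarray as xr


def get_dataset_sketch(groups, group="posterior", combined=False, var_names=None):
    """Hypothetical sketch: pick a group, optionally subset and stack it.

    ``groups`` is a plain dict mapping group names to xarray Datasets,
    used here only as a stand-in for InferenceData.
    """
    data = groups[group]
    if var_names is not None:
        # Accept a single name or a list of names; always return a Dataset.
        if isinstance(var_names, str):
            var_names = [var_names]
        data = data[var_names]
    if combined:
        # Merge chain and draw into one "sample" dimension on demand.
        data = data.stack(sample=("chain", "draw"))
    return data


# Synthetic data (assumption: 2 chains, 50 draws, variables "mu" and "sigma").
rng = np.random.default_rng(1)
groups = {
    "posterior": xr.Dataset(
        {
            "mu": (("chain", "draw"), rng.normal(size=(2, 50))),
            "sigma": (("chain", "draw"), rng.gamma(2.0, size=(2, 50))),
        },
        coords={"chain": np.arange(2), "draw": np.arange(50)},
    )
}

result = get_dataset_sketch(groups, combined=True, var_names="mu")
```

In this sketch the stacking happens only when `combined=True`, which matches the on-demand approach discussed earlier in the thread.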

chiral-carbon · 2021-05-07T12:21:37Z

Hi! I recently submitted a PR for updating GLM poisson regression to best practices, and on executing cell 18 in that notebook

az.summary(np.exp(inf_fish_alt.posterior), kind="stats")

I get this RuntimeWarning:

/home/ada/.local/lib/python3.8/site-packages/xarray/core/computation.py:724: RuntimeWarning: overflow encountered in exp 
  result_data = func(*input_data) 
/home/ada/.local/lib/python3.8/site-packages/numpy/core/_methods.py:193: RuntimeWarning: invalid value encountered in subtract 
  x = asanyarray(arr - arrmean)

Here, `inf_fish_alt` is the trace created using a model based on `pymc.glm.GLM.from_formula`.

@OriolAbril suggested that this might be relevant to the discussion here.

OriolAbril · 2021-05-07T15:23:50Z

Yeah, here we basically want to exponentiate all variables except `mu`, because `mu` has already been exponentiated in the model code and doing so again results in overflow. If the proposed function were available, we'd be able to get the right subset quite easily, exponentiate it, and pass it to `summary`.
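Until such a function exists, the same subset can be obtained with plain xarray. The sketch below uses a synthetic dataset; the coefficient names `b0` and `b1` are invented for illustration, and only `mu` comes from the discussion above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for inf_fish_alt.posterior (assumption: two regression
# coefficients on the log scale plus an already-exponentiated "mu").
rng = np.random.default_rng(2)
dims = ("chain", "draw")
posterior = xr.Dataset(
    {
        "b0": (dims, rng.normal(size=(2, 50))),
        "b1": (dims, rng.normal(size=(2, 50))),
        "mu": (dims, np.exp(rng.normal(size=(2, 50)))),
    },
    coords={"chain": np.arange(2), "draw": np.arange(50)},
)

# Drop "mu" first so it is not exponentiated a second time.
subset = posterior.drop_vars("mu")
exponentiated = np.exp(subset)

print(list(exponentiated.data_vars))
```

The resulting dataset could then be passed to `az.summary(..., kind="stats")` as in the notebook cell quoted above.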

AlexAndorra added Help Wanted Beginner labels Dec 22, 2020

OriolAbril mentioned this issue Dec 29, 2020

More intuitive chain extraction methods #1301

Closed

OriolAbril mentioned this issue Jan 8, 2021

Write "Working with InferenceData" page #1486

Open

8 tasks

OriolAbril mentioned this issue Apr 21, 2021

Bayesian variable (one-object mode) #1668

Open

review-notebook-app bot mentioned this issue May 6, 2021

Update GLM-poisson-regression to best practices state pymc-devs/pymc-examples#154

Merged

OriolAbril mentioned this issue Jun 12, 2021

add extract_dataset function #1725

Merged

5 tasks

OriolAbril mentioned this issue Oct 22, 2021

Better API for obtaining posterior point estimates & more #1899

Open

OriolAbril closed this as completed in #1725 Jan 19, 2022
