Adding New Rootogram Plot #81

imperorrp · 2024-08-12T21:15:20Z

Pushed a draft of the new rootogram implementation (#52).

This reuses the hist functions from the still-open plot_dist histogram PR ([WIP] Histogram support addition to distplot.py #47), as well as some code from the plot_ppc PR ([WIP] Adding PPC plot to Arviz-Plots #55).
The docstring of the hist visual element in the backend interface was modified so that 'y' now explicitly refers to the 'top' y-coordinate of the bars rather than their height (as suggested by @OriolAbril). The matplotlib implementation of the hist visual element (which uses bar behind the scenes) was updated internally to use y=y-bottom to reflect this; otherwise the bars were not plotted in the expected positions. (The same update is now needed in [WIP] Histogram support addition to distplot.py #47 as well.)
Work still in progress includes appropriate binning for each facetted subset of the data to be plotted.
The observed data is used for binning each subset, and the resulting bin heights are required for setting the 'top's of the predictive bars. They are therefore always computed, regardless of whether observed is passed as True or False (that flag only determines whether the observed data is plotted).
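As a rough illustration of the "y is the top" convention described above (a sketch only, not the actual arviz-plots backend code; `bar_heights` is a hypothetical helper): matplotlib's `Axes.bar` takes a height and a bottom, so a backend that receives top coordinates has to convert them first.

```python
import numpy as np


def bar_heights(top, bottom=0.0):
    """Convert 'top' y-coordinates into the heights that Axes.bar expects.

    Hypothetical helper for illustration; `bottom` may be a scalar or an
    array with one entry per bar.
    """
    top = np.asarray(top, dtype=float)
    bottom = np.broadcast_to(np.asarray(bottom, dtype=float), top.shape)
    return top - bottom


# The tops come from the observed counts; each bar hangs down from its top.
tops = np.array([3.0, 5.0, 2.0])
bottoms = np.array([1.0, -0.5, 2.5])
heights = bar_heights(tops, bottoms)
# These could then be drawn with: ax.bar(x, heights, width=width, bottom=bottoms)
```

Passing the tops directly as heights would place every bar `bottom` units too high, which matches the misplaced bars mentioned above.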

Current plot output when the default ArviZ rugby datatree is passed (this should work with any datatree that has posterior predictive and observed groups, as with plot_ppc):

azp.plot_rootogram(data)

azp.plot_rootogram(data, plot_kwargs={"predictive": False})

azp.plot_rootogram(data, observed=False)

pc = azp.plot_rootogram(data, backend="bokeh")
pc.show()

azp.plot_rootogram(data, observed_rug=True)

^ In case we want a rug plot for rootograms as well, as in plot_ppc.

📚 Documentation preview 📚: https://arviz-plots--81.org.readthedocs.build/en/81/

imperorrp · 2024-08-12T21:23:06Z

src/arviz_plots/plots/rootogramplot.py

+    # use get_bins func from arviz-stats on observed data and then use those bins for
+    # computing histograms for predictive data as well
+    # WIP: currently only the bins for one variable (without any facetting) is retrieved and used
+    bins = array_stats.get_bins(obs_distribution["home_points"].values)


Reminder for later: Use xarray aware get_bins function once available from Arviz-Stats
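The idea of binning on the observed data and reusing those bins for the predictive data can be sketched with plain NumPy. This is only an approximation under stated assumptions: `np.histogram_bin_edges` stands in for the arviz-stats `get_bins` function (whose exact rule may differ), and the data and shapes below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.poisson(3, size=200)
predictive = rng.poisson(3, size=(50, 200))  # assumed layout: (draws, observations)

# Stand-in for get_bins: NumPy's "auto" bin-edge rule applied to the observed data.
edges = np.histogram_bin_edges(observed, bins="auto")

# Both histograms share the same edges, so their bars line up bin by bin.
obs_counts, _ = np.histogram(observed, bins=edges)
pp_counts, _ = np.histogram(predictive.ravel(), bins=edges)

# Average over draws so the predictive counts are on the same scale as the observed ones.
pp_counts = pp_counts / predictive.shape[0]
```

Note that `np.histogram` silently drops predictive values that fall outside the observed range, which is one reason the choice of bins matters here.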

imperorrp · 2024-08-21T14:32:56Z

Todo: Shift rug lower, below the bottom of the bars

imperorrp · 2024-08-29T11:16:21Z

Rugplots now render below the bottom of the rootogram bars.

To leave a small gap for visibility, I've currently set the rug position like this:

min_histogram_bottom = min(histogram_bottom)
min_bottom[var_name] = min_histogram_bottom - (0.2 * (0 - min_histogram_bottom))

(Subtracting a fixed number from min_histogram_bottom might be better, since that would keep the gap a constant number of data units; but the chart size may vary, and so would the perceived gap.)
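For reference, the gap rule above can be written as a small standalone function (a sketch; `rug_position` and `gap_fraction` are hypothetical names, not identifiers from the PR):

```python
def rug_position(histogram_bottom, gap_fraction=0.2):
    """Place the rug below the lowest bar bottom.

    The gap is gap_fraction times the distance between zero and the
    lowest bottom, so it scales with the data rather than being fixed.
    """
    lowest = min(histogram_bottom)
    return lowest - gap_fraction * (0 - lowest)


below_zero = rug_position([-2.0, -0.5, 1.0])  # lowest bottom is -2.0
all_positive = rug_position([1.0, 3.0])       # lowest bottom is 1.0
```

With a lowest bottom of -2.0 the rug lands at about -2.4. If every bottom is positive, the same formula moves the rug upwards instead (about 1.2 for a lowest bottom of 1.0), so the rule implicitly assumes that at least one bar reaches below zero.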

Output looks like this now when the rugplot is plotted:

imperorrp · 2024-09-02T12:37:05Z

Binning works as expected now using the 'get_bins' arviz-stats branch:

src/arviz_plots/backend/__init__.py

OriolAbril · 2024-09-02T15:05:21Z

src/arviz_plots/plots/rootogramplot.py

+        pc_kwargs["aes"] = pc_kwargs.get("aes", {}).copy()
+        pc_kwargs["aes"].setdefault("overlay", sample_dims)  # setting overlay dim


This can be removed; we only want facetting in the rootogram.

OriolAbril · 2024-09-02T15:11:59Z

src/arviz_plots/plots/rootogramplot.py

+    observed=None,
+    observed_rug=False,


I would remove these two. Not including the observed data doesn't really make sense for the rootogram. We could skip the y shift in such cases, but then the resulting plot would simply be a histogram; if users want a "regular" histogram they should use plot_dist.

Moreover, as you might have noticed, the rug plot doesn't give much information because the data is discrete: many lines overlap at 1, many at 2, and so on. The rug therefore doesn't show how many observations there are at 1 or at 2; each value looks as if it had a single observation. Playing with alpha can only help so much: at most you'd get qualitative information about the number of observations at each value, which is less than what the observed line/scatter provides. So I would remove the argument and the rug plot.
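A tiny example of the overplotting issue, using made-up discrete data: a rug can only distinguish unique positions, while the per-value counts (what the observed line/scatter encodes) carry the information that is lost.

```python
from collections import Counter

# Made-up discrete observations, e.g. points scored per match.
observed = [1, 1, 1, 1, 2, 2, 3, 1, 2, 1]

# A rug draws one tick per observation, but ticks at the same value overlap,
# so visually only the unique positions remain.
rug_positions = sorted(set(observed))

# The counts per value are what the observed line/scatter shows.
counts = Counter(observed)
```

Here ten observations collapse into three visible ticks, even though the value 1 occurs six times and the value 3 only once.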

OriolAbril · 2024-09-02T15:13:46Z

src/arviz_plots/plots/rootogramplot.py

+        for group_name in (predictive_data_group, "observed_data"):
+            if group_name not in dt.children:
+                raise TypeError(f'`data` argument must have the group "{group_name}" for ppcplot')


After removing observed, just keep this outside the if.
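A minimal sketch of what the unconditional check could look like once it sits outside the `if`. Assumptions: `check_groups` is a hypothetical helper, and a plain dict stands in for `dt.children`. The message in the diff mentions ppcplot, presumably carried over from plot_ppc, so the sketch names plot_rootogram instead.

```python
def check_groups(children, required):
    """Raise if any required group is missing from the datatree children.

    `children` is anything supporting `in` (a dict stands in for
    dt.children here); `required` is an iterable of group names.
    """
    missing = [name for name in required if name not in children]
    if missing:
        raise TypeError(
            f"`data` argument must have the group(s) {missing} for plot_rootogram"
        )


# Stand-in for dt.children with both groups present: no error is raised.
children = {"posterior_predictive": object(), "observed_data": object()}
check_groups(children, ("posterior_predictive", "observed_data"))
```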

OriolAbril · 2024-09-02T15:35:28Z

src/arviz_plots/plots/rootogramplot.py

+    # new_obs_hist with histogram->y and left_edge/right_edge midpoint->x
+    new_obs_hist = xr.Dataset()
+
+    for var_name in list(obs_hist.keys()):
+        left_edges = obs_hist[var_name].sel(plot_axis="left_edges").values
+        right_edges = obs_hist[var_name].sel(plot_axis="right_edges").values
+
+        left_edges = np.array(left_edges)
+        right_edges = np.array(right_edges)
+
+        x = (left_edges + right_edges) / 2
+        y = obs_hist[var_name].sel(plot_axis="histogram").values
+
+        stacked_data = np.stack((x, y), axis=-1)
+        new_var = xr.DataArray(
+            stacked_data, dims=["hist_dim", "plot_axis"], coords={"plot_axis": ["x", "y"]}
+        )
+
+        new_obs_hist[var_name] = new_var


Suggested change (replacing the loop above):

new_obs_hist = xr.concat(
    (
        ds.sel(plot_axis=["right_edges", "left_edges"]).sum("plot_axis") / 2,
        obs_hist.sel(plot_axis="histogram", drop=True),
    ),
    dim="plot_axis",
).assign_coords(plot_axis=["x", "y"])

Also note that the code in the PR would only have worked when all variables get the same number of bins (which doesn't always happen), because the dimension name is hardcoded to hist_dim for every variable in the dataset. As I mentioned on Slack, the behaviour was changed in the get_bins PR to support this case: each variable now gets its own dimension (hist_dim_mu, hist_dim_tau, ...).
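The loop-free approach can be tried on a toy dataset in which each variable has its own histogram dimension and a different number of bins. This is a sketch under assumptions: `make_hist` is a hypothetical helper that mimics the layout described in this thread (a `plot_axis` dimension with "histogram", "left_edges" and "right_edges"), and `obs_hist` is used throughout in place of `ds`.

```python
import numpy as np
import xarray as xr


def make_hist(edges, counts, dim):
    """Build a toy histogram DataArray with a per-variable bin dimension."""
    data = np.stack([counts, edges[:-1], edges[1:]])  # shape (3, n_bins)
    return xr.DataArray(
        data,
        dims=["plot_axis", dim],
        coords={"plot_axis": ["histogram", "left_edges", "right_edges"]},
    )


# Two variables with different numbers of bins (3 and 2).
obs_hist = xr.Dataset(
    {
        "mu": make_hist(np.array([0.0, 1.0, 2.0, 3.0]), np.array([4.0, 9.0, 1.0]), "hist_dim_mu"),
        "tau": make_hist(np.array([0.0, 2.0, 4.0]), np.array([16.0, 25.0]), "hist_dim_tau"),
    }
)

# Bin midpoints for every variable at once: (left + right) / 2.
midpoints = obs_hist.sel(plot_axis=["left_edges", "right_edges"]).sum("plot_axis") / 2
heights = obs_hist.sel(plot_axis="histogram", drop=True)

# Stack midpoints and heights along a fresh plot_axis labelled x and y.
new_obs_hist = xr.concat((midpoints, heights), dim="plot_axis").assign_coords(
    plot_axis=["x", "y"]
)
```

Because the operations only touch `plot_axis`, the differing bin dimensions never have to be aligned with each other, which is what makes the per-variable loop unnecessary.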

OriolAbril · 2024-09-02T15:37:55Z

src/arviz_plots/plots/rootogramplot.py

+        # new_pp_hist dataset
+        new_pp_hist = xr.Dataset()
+
+        for var_name in list(pp_hist.keys()):


Also update this and get rid of the loop.

imperorrp · 2024-09-04T19:10:36Z

Tests are failing because plot_dist needs a modification: plot_rootogram currently depends on the unmerged arviz-stats 'get_bins' branch. This modification will be required globally once that arviz-stats branch is merged.

imperorrp · 2024-09-08T13:36:40Z

Added tests for plot_rootogram

codecov-commenter · 2024-09-08T13:39:44Z

Codecov Report

Attention: Patch coverage is 93.60000% with 8 lines in your changes missing coverage. Please review.

Project coverage is 85.39%. Comparing base (f4a39af) to head (a67bb2d).

Files with missing lines                  Patch %   Lines
src/arviz_plots/plots/rootogramplot.py    92.72%    8 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #81      +/-   ##
==========================================
+ Coverage   84.80%   85.39%   +0.59%     
==========================================
  Files          21       22       +1     
  Lines        2336     2451     +115     
==========================================
+ Hits         1981     2093     +112     
- Misses        355      358       +3

aloctavodia · 2024-09-13T13:55:09Z

An example of a "hanging" rootogram and a "suspended" rootogram, including the uncertainty for the predictions: arviz-devs/Exploratory-Analysis-of-Bayesian-Models#60. We do not necessarily need to follow those examples; they are just suggestions. Actually, once we have a working rootogram, that example will use whatever ArviZ implements.

imperorrp · 2024-09-16T07:50:13Z

An example of a "hanging" rootogram and a "suspended" rootogram, including the uncertainty for the predictions: arviz-devs/Exploratory-Analysis-of-Bayesian-Models#60. We do not necessarily need to follow those examples; they are just suggestions. Actually, once we have a working rootogram, that example will use whatever ArviZ implements.

If we follow those examples, the line_y visual element function from #85 can be used to draw the thin lines of the hanging and suspended bars. For the hanging case we can keep the existing logic; for the suspended case we could modify this portion of the code:

histogram_bottom = new_obs_hist.sel(plot_axis="y") - pp_hist.sel(plot_axis="histogram")
histogram_bottom = histogram_bottom.expand_dims(plot_axis=["histogram_bottom"])
# print(f" diff = {a}\n")

new_pp_hist = xr.concat(
    (
        new_obs_hist.sel(plot_axis="y"),  # getting tops of histogram (observed values)
        pp_hist.sel(plot_axis="left_edges"),
        pp_hist.sel(plot_axis="right_edges"),
        histogram_bottom,
    ),
    dim="plot_axis",
).assign_coords(plot_axis=["histogram", "left_edges", "right_edges", "histogram_bottom"])

Instead, we would pass histogram_top as the "histogram" entry (root of predictive count minus observed count) and 0 as the "histogram_bottom" entry:

histogram_top = pp_hist.sel(plot_axis="histogram") - new_obs_hist.sel(plot_axis="y")
histogram_top = histogram_top.expand_dims(plot_axis=["histogram"])
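To make the difference between the two variants concrete, here is a small NumPy sketch of the bar geometry. Assumptions: `hanging_bars` and `suspended_bars` are hypothetical helpers, the inputs are already on the plotted (for example square-root) scale, and the orientation follows the code in this PR, where the predictive bars hang from the observed values.

```python
import numpy as np


def hanging_bars(obs, pred):
    """Bars hang from the observed values: top = obs, bottom = obs - pred."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return obs, obs - pred


def suspended_bars(obs, pred):
    """Bars start at zero and show only the difference: top = pred - obs."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    top = pred - obs
    return top, np.zeros_like(top)


# Toy counts, square-rooted before plotting.
obs = np.sqrt([4.0, 9.0, 1.0])    # 2, 3, 1
pred = np.sqrt([1.0, 16.0, 4.0])  # 1, 4, 2

h_top, h_bottom = hanging_bars(obs, pred)
s_top, s_bottom = suspended_bars(obs, pred)
```

In this orientation the suspended tops are the hanging bottoms with the sign flipped, so both variants show the same discrepancy; they differ only in whether the full predictive bar is drawn.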

For the uncertainty representation (94% HDI), what would the process be?

imperorrp added 5 commits September 3, 2024 02:00

First commit for Rootogram plot

ae9955f

Added 'baseline' to plot_kwargs in docstring

61a0234

Minor fix for 'observed_line' linestyle and supporting array check fo…

9dfb020

…r bottom !=0 in matplotlib backend hist func

Set rugplots under rootogram bars

0246937

get_bins working update

7396ca7

imperorrp force-pushed the rootogram branch from 006010d to 7396ca7 Compare September 2, 2024 20:31

Removed observed rug and default overlay of sample_dims, simplified l…

a9d4c75

…ooping logic and removed 'none' backend hist element duplication

Added plot_dist fix, added alpha to 'none' 'hist' backend function, a…

a67bb2d

…dded tests for plot_rootogram
