Implement summarystats for InferenceObjects types #294

Merged (42 commits, Aug 2, 2023)


@sethaxen (Member) commented Jul 30, 2023

This PR adds a StatsBase.summarystats method for Dataset and InferenceData inputs. (EDIT: It also adds new utilities for nicely showing tables.)

Internally, this calls several stats and diagnostics functions on the entire input Dataset and then concatenates them along a newly created :metric dimension. This new Dataset may be returned, but by default, the Dataset is turned into an iterator over marginals, which is interpreted as a row table and reformatted to be a column table. This column table is lightly wrapped in a SummaryStats object, which implements the Tables interface. The benefits of this object over returning a DataFrame are:

  • we can in principle dispatch on this object in the future
  • we can intelligently choose the display precision; e.g., as with ModelComparisonResult and AbstractELPDResult, we use the MCSE of an estimate, if available, to choose how many significant digits of the estimate itself to show.
  • we can avoid the heavy DataFrames.jl dependency
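The pipeline described above (flatten per-variable results into one row per marginal, transpose the rows into columns, then wrap the columns) can be sketched in plain Julia. This is a minimal sketch, not the PR's implementation: it assumes each variable is an array with dims (draw, chain, params...), computes only mean and std instead of the full set of stats and diagnostics, skips the intermediate Dataset with a :metric dimension, and uses a hypothetical `SummaryStatsSketch` struct in place of SummaryStats without implementing the Tables.jl interface.

```julia
using Statistics

# Hypothetical stand-in for SummaryStats: a thin wrapper around a column
# table stored as a NamedTuple of equal-length vectors.
# (Unlike the real type, this sketch does not implement Tables.jl.)
struct SummaryStatsSketch{T<:NamedTuple}
    data::T
end

# Flatten variables into one row (a NamedTuple) per marginal. Each variable
# is assumed to be an array with dims (draw, chain, params...).
function marginal_rows(vars::NamedTuple)
    rows = NamedTuple[]
    for (name, x) in pairs(vars)
        for I in CartesianIndices(axes(x)[3:end])
            idx = Tuple(I)
            samples = vec(x[:, :, idx...])
            label = isempty(idx) ? string(name) : string(name, "[", join(idx, ","), "]")
            push!(rows, (label = label, mean = mean(samples), std = std(samples)))
        end
    end
    return rows
end

# Transpose the row table into a column table (NamedTuple of vectors).
function rows_to_columns(rows)
    names = keys(first(rows))
    cols = map(name -> [getproperty(r, name) for r in rows], names)
    return NamedTuple{names}(cols)
end

# Usage: `mu` is a scalar (4 draws x 2 chains), `theta` has length 2.
vars = (mu = reshape(collect(1.0:8.0), 4, 2),
        theta = reshape(collect(1.0:16.0), 4, 2, 2))
stats = SummaryStatsSketch(rows_to_columns(marginal_rows(vars)))
stats.data.label  # ["mu", "theta[1]", "theta[2]"]
stats.data.mean   # [4.5, 4.5, 12.5]
```

Storing columns rather than rows keeps each statistic in a concretely typed vector, which is convenient for formatting a whole column at once when showing the table.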

EDIT: Because this PR adds a number of utilities for nicely showing tables, these are now also used in the show methods of ModelComparisonResult and AbstractELPDResult. For the latter, I also reformatted the table to more closely match that of ModelComparisonResult. Changes to show methods are never considered breaking, so these changes are not breaking either.
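The MCSE-based display precision mentioned in the list of benefits can be illustrated with a simple rule: show as many significant digits of an estimate as reach down to the leading digit of its MCSE. The rule itself, the helper names `sigdigits_from_mcse` and `format_with_mcse`, and the fallback of 3 significant digits when no MCSE is available are assumptions for illustration; the exact rule used by the show methods may differ.

```julia
# Hypothetical helper: number of significant digits of `x` to show so that
# the last shown digit is at the order of magnitude of the MCSE's leading digit.
function sigdigits_from_mcse(x::Real, mcse::Real; default::Int = 3)
    # Fall back to a default for zero/non-finite estimates or unusable MCSEs.
    (iszero(x) || !isfinite(x) || !(mcse > 0) || !isfinite(mcse)) && return default
    return max(1, floor(Int, log10(abs(x))) - floor(Int, log10(mcse)) + 1)
end

# No MCSE available: use the (assumed) default.
sigdigits_from_mcse(x::Real, ::Nothing; default::Int = 3) = default

format_with_mcse(x, mcse) = round(x; sigdigits = sigdigits_from_mcse(x, mcse))

format_with_mcse(1.23456, 0.023)   # 1.23
format_with_mcse(123.456, 3.2)     # 123.0
```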

Future improvements

We could further support users providing their own stats functions to include or replace the default ones. Doing this well would require a utility function that wraps a user function reducing to a scalar, or to a Tuple or NamedTuple of scalars, so that it maps across all non-sample indices of all variables and produces a new Dataset, optionally with a new dimension. Such a utility would also remove a lot of code redundancy, so it should appear in InferenceObjects first.
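A utility of this kind might look like the following minimal sketch. The names `map_over_marginals` and `map_over_variables` are hypothetical (not InferenceObjects API), a NamedTuple of arrays stands in for a Dataset, the first two dims are assumed to be (draw, chain), and only scalar and NamedTuple return values are handled; plain Tuples are not.

```julia
using Statistics

# Hypothetical helper: apply `f` to the flattened (draw, chain) samples at
# every non-sample index of `x`. Assumes dims 1 and 2 are the sample dims.
function map_over_marginals(f, x::AbstractArray)
    return map(CartesianIndices(axes(x)[3:end])) do I
        f(vec(x[:, :, Tuple(I)...]))
    end
end

# Apply `f` to every variable. If `f` returns a NamedTuple, split the result
# into one array per field (analogous to adding a new dimension).
function map_over_variables(f, vars::NamedTuple)
    return map(vars) do x
        res = map_over_marginals(f, x)
        first(res) isa NamedTuple || return res
        fields = keys(first(res))
        NamedTuple{fields}(map(k -> map(r -> getproperty(r, k), res), fields))
    end
end

theta = reshape(collect(1.0:16.0), 4, 2, 2)  # 4 draws, 2 chains, length-2 param
map_over_variables(mean, (theta = theta,))                           # scalar stat
map_over_variables(s -> (mean = mean(s), std = std(s)), (theta = theta,))  # two stats
```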

Similarly, the flattened iterator will be useful for plotting and may need to be refactored to become an API utility as well. This could certainly be done in a more type-stable way.

Without any additional work, we could use Crayons to highlight bad ESS values and R-hat values. This would just require storing information about number of chains/draws in the SummaryStats object.
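As a sketch of the highlighting idea, a table cell could be printed in red when its diagnostic looks bad; the ESS check is the reason the number of chains would need to be stored. The thresholds (ESS below 100 * nchains, R-hat above 1.01) and the helper names are assumptions for illustration, and Julia's built-in `printstyled` is used here instead of Crayons.

```julia
# Assumed thresholds (hypothetical): ESS is flagged below 100 * nchains,
# R-hat is flagged above 1.01.
is_bad_ess(ess::Real, nchains::Integer) = ess < 100 * nchains
is_bad_rhat(rhat::Real) = rhat > 1.01

# Print `value`, in red if `bad`. `printstyled` emits ANSI color codes only
# when `io` reports color support (e.g. an interactive REPL); otherwise it
# prints plain text.
print_diagnostic(io::IO, value, bad::Bool) =
    printstyled(io, value; color = bad ? :red : :normal)

nchains = 4
print_diagnostic(stdout, 250.0, is_bad_ess(250.0, nchains))  # red in a color REPL
println()
print_diagnostic(stdout, 1.003, is_bad_rhat(1.003))          # default color
println()
```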

sethaxen and others added 2 commits July 31, 2023 00:25
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@sethaxen (Member, Author) commented:

> Without any additional work, we could use Crayons to highlight bad ESS values and R-hat values. This would just require storing information about number of chains/draws in the SummaryStats object.

I'm testing this locally, and I think it looks pretty good. It is consistent with our existing practice (e.g. in psis) of using colors in the REPL to draw attention to diagnostics that look bad.

[Screenshot: summary table in the REPL with bad ESS and R-hat values highlighted]

We could also add informative warnings, something like "10 params have ESS values lower than 100*nchains." and "5 params have R-hat values higher than 1.01."
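Such warnings could be built by counting flagged parameters. This minimal sketch assumes the ESS and R-hat values are already available as plain vectors and uses the thresholds from the example messages; `diagnostic_warnings` is a hypothetical name, and the messages are emitted with Julia's standard `@warn` macro.

```julia
# Hypothetical helper: build warning messages from per-parameter diagnostics.
function diagnostic_warnings(ess::AbstractVector, rhat::AbstractVector, nchains::Integer)
    msgs = String[]
    n_ess = count(<(100 * nchains), ess)   # params with ESS below 100 * nchains
    n_rhat = count(>(1.01), rhat)          # params with R-hat above 1.01
    n_ess > 0 && push!(msgs, "$n_ess params have ESS values lower than 100*nchains.")
    n_rhat > 0 && push!(msgs, "$n_rhat params have R-hat values higher than 1.01.")
    return msgs
end

msgs = diagnostic_warnings([50.0, 800.0, 120.0], [1.0, 1.02, 1.001], 4)
foreach(m -> @warn(m), msgs)
```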

@sethaxen (Member, Author) commented Aug 1, 2023

I've opted for now to not do highlighting or warnings. This can be addressed in a future PR.

@sethaxen sethaxen marked this pull request as ready for review August 1, 2023 23:45
@sethaxen (Member, Author) commented Aug 2, 2023

I've checked that the tables look good in the REPL, Documenter, Jupyter, and Pluto.

```julia
using ArviZ, ArviZExampleData
idata = load_example_data("radon");
summarystats(idata)
```

The REPL and doctests use the plain-text show method. Here's the REPL version:
[Screenshot: plain-text summary table in the REPL]

Jupyter and Documenter use the HTML show method. Here's the Jupyter version:
[Screenshot: HTML summary table in Jupyter]

For better or for worse, when Pluto sees that an object implements the row table Tables interface, it ignores all show methods and instead uses its own semi-interactive table viewer:
[Screenshot: summary table in Pluto's table viewer]

@sethaxen sethaxen merged commit c31176a into main Aug 2, 2023
@sethaxen sethaxen deleted the summarystats branch August 2, 2023 14:10