Implement summarystats for InferenceObjects types #294

sethaxen · 2023-07-30T22:22:31Z

This PR adds a StatsBase.summarystats method for Dataset and InferenceData inputs (EDIT: and also new utilities for nicely showing tables).

Internally, this calls several stats and diagnostics functions on the entire input Dataset and then concatenates them along a newly created :metric dimension. This new Dataset may be returned, but by default, the Dataset is turned into an iterator over marginals, which is interpreted as a row table and reformatted to be a column table. This column table is lightly wrapped in a SummaryStats object, which implements the Tables interface. The benefits of this object over returning a DataFrame are:

- We can in principle dispatch on this object in the future.
- We can intelligently choose the display precision. As with ModelComparisonResult and AbstractELPDResult, we use the MCSE of an estimate, if available, to choose the number of significant digits of the estimate to show.
- We can avoid the heavy DataFrames.jl dependency.
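
To illustrate the display-precision point, here is a minimal sketch of how an MCSE could determine the number of significant digits shown. The helper names `sigdigits_from_mcse` and `round_to_mcse`, the fallback of 3 digits, and the convention of keeping digits down to the position of the MCSE's leading digit are all assumptions made for illustration; the PR's actual formatting utilities may use a different rule.

```julia
# Hypothetical helper, not the PR's implementation.
# Returns how many significant digits of `estimate` to show, keeping digits
# down to (and including) the decimal position of the MCSE's leading digit.
function sigdigits_from_mcse(estimate::Real, mcse::Real; default::Int=3)
    # Fall back to a fixed precision when no usable MCSE is available.
    usable = isfinite(estimate) && isfinite(mcse) && mcse > 0 && !iszero(estimate)
    usable || return default
    lead_estimate = floor(Int, log10(abs(estimate)))  # position of leading digit
    lead_error = floor(Int, log10(mcse))              # position of leading error digit
    return max(lead_estimate - lead_error + 1, 1)
end

# Round the estimate to the precision implied by its MCSE.
function round_to_mcse(estimate::Real, mcse::Real)
    return round(estimate; sigdigits=sigdigits_from_mcse(estimate, mcse))
end

println(round_to_mcse(1.2345, 0.02))
println(round_to_mcse(123.456, 5.0))
```

With this convention, a larger MCSE hides more trailing digits of the estimate, so the table does not suggest more precision than the sampler supports.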

EDIT: Because this PR adds a number of utilities for nicely showing tables, these are now used in the show methods of ModelComparisonResult and AbstractELPDResult. For the latter, I also reformatted the table to be more similar to that of ModelComparisonResult. Changes to show methods should never be considered breaking, so these changes are non-breaking as well.
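
The row-table to column-table step described in the summary above can be sketched with plain NamedTuples. This is only an illustration under assumptions: `rows_to_columns` is a hypothetical helper, the example rows and their values are made up, and the real code iterates over the marginals of a Dataset rather than a hand-written vector.

```julia
# Each row holds the statistics of one marginal (illustrative values only).
rows = [
    (parameter="mu", mean=0.12, std=1.05),
    (parameter="tau", mean=2.31, std=0.47),
]

# Hypothetical helper: turn a vector of NamedTuples that share the same keys
# (a row table) into a NamedTuple of column vectors (a column table).
function rows_to_columns(rows::AbstractVector{<:NamedTuple})
    isempty(rows) && throw(ArgumentError("at least one row is required"))
    names = keys(first(rows))
    columns = map(name -> [row[name] for row in rows], names)
    return NamedTuple{names}(columns)
end

cols = rows_to_columns(rows)
println(keys(cols))
println(cols.mean)
```

A NamedTuple of vectors is already a valid column table under the Tables interface, which is why a light wrapper around such an object is enough for SummaryStats to behave like a table.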

Future improvements

We can further support users providing their own stats functions to include or replace the ones used. Doing this well would involve adding a utility function that wraps a user function that reduces samples to a scalar or to a Tuple or NamedTuple of scalars, so that it maps across all non-sample indices for all variables and produces a new Dataset, optionally with a new dimension. This would actually remove a lot of code redundancy, so this utility should appear in InferenceObjects first.
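
A rough sketch of such a wrapper, using plain arrays instead of a Dataset. It assumes the sample dimensions (draw, chain) are the first two array dimensions and only covers functions that reduce to a single scalar; `map_over_marginals` and the NamedTuple stand-in for a Dataset are hypothetical, not part of InferenceObjects.

```julia
using Statistics

# Hypothetical helper: apply `f` to the flattened samples of every marginal,
# i.e. to every slice along the sample dimensions, and drop those dimensions.
function map_over_marginals(f, x::AbstractArray; sample_dims=(1, 2))
    reduced = mapslices(f ∘ vec, x; dims=sample_dims)
    return dropdims(reduced; dims=sample_dims)
end

# 2 draws, 3 chains, 4 parameters.
x = reshape(collect(1.0:24.0), 2, 3, 4)
param_means = map_over_marginals(mean, x)

# A NamedTuple of arrays stands in for a Dataset with several variables.
dataset = (a=x, b=2 .* x)
all_means = map(v -> map_over_marginals(mean, v), dataset)
println(all_means)
```

Supporting Tuple or NamedTuple outputs would additionally require stacking the results along a new dimension, which this sketch leaves out.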

Similarly, the flattened iterator will be useful for plotting and may need to be refactored to become an API utility as well. This could certainly be done in a more type-stable way.

Without any additional work, we could use Crayons to highlight bad ESS values and R-hat values. This would just require storing information about number of chains/draws in the SummaryStats object.
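
As a sketch of what such highlighting could look like, the snippet below uses Base's `printstyled` as a stand-in for Crayons. The helper `print_diagnostic` and the decision of what counts as "bad" are assumptions for illustration only.

```julia
# Hypothetical helper: print a diagnostic value, in bold red when flagged.
# `printstyled` only emits color codes when the IO supports color.
function print_diagnostic(io::IO, value::Real, is_bad::Bool)
    if is_bad
        printstyled(io, value; color=:red, bold=true)
    else
        print(io, value)
    end
    return nothing
end

# Example: flag an R-hat value above an assumed threshold of 1.01.
rhat = 1.2
print_diagnostic(stdout, rhat, rhat > 1.01)
println()
```

Flagging ESS values relative to the number of chains is the part that needs the chain/draw counts to be stored alongside the table.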

sethaxen · 2023-07-31T11:46:50Z

> Without any additional work, we could use Crayons to highlight bad ESS values and R-hat values. This would just require storing information about number of chains/draws in the SummaryStats object.

I'm locally testing this, and I think it looks pretty good. It keeps with our trend in e.g. psis of using colors in the REPL to draw attention to diagnostics that look bad.

We could additionally add informative warnings, something like "10 params have ESS values lower than 100*nchains." and "5 params have R-hat values higher than 1.01."
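
A minimal sketch of how such messages could be assembled, assuming per-parameter ESS and R-hat values are available as vectors. The function name `diagnostic_warnings` is hypothetical; the thresholds are the ones mentioned above.

```julia
# Hypothetical helper: build warning messages for suspicious diagnostics.
function diagnostic_warnings(
    ess::AbstractVector{<:Real}, rhat::AbstractVector{<:Real}, nchains::Integer
)
    msgs = String[]
    n_low_ess = count(<(100 * nchains), ess)
    n_high_rhat = count(>(1.01), rhat)
    n_low_ess > 0 &&
        push!(msgs, "$n_low_ess params have ESS values lower than 100*nchains.")
    n_high_rhat > 0 &&
        push!(msgs, "$n_high_rhat params have R-hat values higher than 1.01.")
    return msgs
end

# Illustrative values for three parameters sampled with 4 chains.
msgs = diagnostic_warnings([50.0, 900.0, 350.0], [1.2, 1.0, 1.005], 4)
for msg in msgs
    @warn msg
end
```

Returning the messages instead of warning directly keeps the check easy to test and leaves it to the caller to decide how to surface them.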

sethaxen · 2023-08-01T15:52:39Z

I've opted for now to not do highlighting or warnings. This can be addressed in a future PR.

sethaxen · 2023-08-02T14:07:21Z

I've checked that the tables look good in the REPL, Documenter, Jupyter, and Pluto.

using ArviZ, ArviZExampleData
idata = load_example_data("radon");
summarystats(idata)

The REPL and doctests use the plain-text show method. Here's the REPL version:

Jupyter and Documenter use the HTML show method. Here's the Jupyter version:

For better or for worse, when Pluto sees that an object implements the row table Tables interface, it ignores all show methods and instead uses its own semi-interactive table viewer:

sethaxen added 3 commits July 31, 2023 00:05

Add initial summarystats implementation

6595396

Add summarystats to API docs

ef697ae

Increment patch number

7047305

github-actions bot reviewed Jul 30, 2023

sethaxen and others added 2 commits July 31, 2023 00:25

Update src/ArviZStats/summarystats.jl

21d5f5f

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Fix summarystats methods

81f953e

sethaxen added 18 commits August 1, 2023 17:10

Rename utility function

d68ae1a

Add utilities for formatting to strings

bd7a380

Generalize interval probability variable name

94cda40

Make test pass on v1.10.x

2bbf0d2

Implement common interfaces for SummaryStats

d2578a3

Add alignment of column entries

427583b

Use formatter utility functions

45168bf

Update keywords

d08e0e8

Add utility function

96ec4b5

Use parent

61d8ac3

Rename keyword to prob_interval

e80f0e2

Rename to compact_labels

e996a77

Fix docstring

64bf7bb

Update doctests

5b31f59

Use new utilities in compare

11484be

Change table format for ELPD results

2d3e4d7

Update tests and docstrings

ad67c19

Update docstrings

d05a516

sethaxen added 4 commits August 1, 2023 22:06

Simplify formatter implementations

1ca231e

Add formatter tests

7c78dc4

Define metric dim as constant

dee54ae

Refactor utility functions

819389a

sethaxen added 7 commits August 2, 2023 01:08

Use refactored utilities

32951c5

Call correct version of ess_rhat

7d8c746

Concretize sample dims

9fc83fd

Add summarystats tests

78d27df

Remove duplicate keyword doc

c60ac12

Add doctest

579c00b

Add SummaryStats tests

94be8ec

sethaxen marked this pull request as ready for review August 1, 2023 23:45

sethaxen added 8 commits August 2, 2023 10:14

Add missing tests

b415b3d

Unify PrettyTables code

dc0e907

Add HTML show method for tables

4f81010

Fix compare tests

020c0cf

Just test that HTML is returned

1255219

Add Tables row table methods

07bea59

Pluto uses these

Reuse table utilities for AbstractELPDResult

9a7cb24

Fix indexing on v1.6

5eff0a5

sethaxen merged commit c31176a into main Aug 2, 2023

sethaxen deleted the summarystats branch August 2, 2023 14:10
