Implement summarystats for InferenceObjects types #294
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
I'm testing this locally, and I think it looks pretty good. It is in keeping with our trend in, e.g., … We could additionally add informative warnings, something like …
For now, I've opted not to do highlighting or warnings. This can be addressed in a future PR.
Pluto uses these
This PR adds a `StatsBase.summarystats` method for `Dataset` and `InferenceData` inputs (EDIT: and also new utilities for nicely showing tables). Internally, this calls several stats and diagnostics functions on the entire input `Dataset` and then concatenates the results along a newly created `:metric` dimension. This new `Dataset` may be returned, but by default, the `Dataset` is turned into an iterator over marginals, which is interpreted as a row table and reformatted to be a column table. This column table is lightly wrapped in a `SummaryStats` object, which implements the Tables interface. The benefits of this object over returning a `DataFrame` are:

- … `ModelComparisonResult` and `AbstractELPDResult`, we use the MCSE of an estimate, if available, to choose the number of significant digits shown for the estimate itself.

EDIT: Because this PR adds a number of utilities for nicely showing tables, these are now used in the `show` methods of `ModelComparisonResult` and `AbstractELPDResult`. For the latter, I also reformatted the table to be more similar to that of `ModelComparisonResult`. Since changes to `show` methods are never considered breaking, these changes are non-breaking as well.

Future improvements
We can further support users in providing their own stats functions to include alongside, or in place of, the default ones. Doing this well, however, would involve adding a utility function that wraps a user function reducing to a scalar, or to a `Tuple` or `NamedTuple` of scalars, so that it maps across all non-sample indices for all variables, producing a new `Dataset`, optionally with a new dimension. Such a utility would also remove a lot of code redundancy, so it should appear in InferenceObjects first.

Similarly, the flattened iterator will be useful for plotting and may need to be refactored into an API utility as well. It could certainly be implemented in a more type-stable way.
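A minimal sketch of the proposed utility, and of flattening its output into a row table and then a column table, follows. It is written in Python with NumPy rather than Julia, so it only illustrates the idea and is not the ArviZ.jl or InferenceObjects API. A plain `dict` of arrays with an assumed layout `(chain, draw, *param_dims)` stands in for a `Dataset`, and `map_over_params`, `to_row_table`, and `to_column_table` are hypothetical names.

```python
import itertools

import numpy as np


def map_over_params(fn, dataset):
    """Apply `fn` to the pooled draws at every non-sample index of every variable.

    Hypothetical layout: each variable is an array of shape (chain, draw, *param_dims).
    `fn` maps a 1-D array of draws to a float, or to a dict of floats (the analogue
    of a NamedTuple). Dict outputs gain a new leading "metric" axis.
    """
    out, metrics = {}, None
    for name, arr in dataset.items():
        param_shape = arr.shape[2:]
        # Pool chains and draws: one column per scalar parameter (C order).
        flat = arr.reshape(arr.shape[0] * arr.shape[1], -1)
        results = [fn(flat[:, j]) for j in range(flat.shape[1])]
        if isinstance(results[0], dict):
            metrics = list(results[0])
            stacked = np.array([[r[m] for r in results] for m in metrics])
            out[name] = stacked.reshape((len(metrics),) + param_shape)
        else:
            out[name] = np.array(results).reshape(param_shape)
    return out, metrics


def to_row_table(summary, metrics):
    """Iterate over marginals: one row (dict) per scalar parameter."""
    rows = []
    for name, arr in summary.items():
        # itertools.product() yields a single empty tuple for scalar variables.
        for idx in itertools.product(*(range(n) for n in arr.shape[1:])):
            label = f"{name}[{','.join(map(str, idx))}]" if idx else name
            row = {"parameter": label}
            for k, metric in enumerate(metrics):
                row[metric] = float(arr[(k,) + idx])
            rows.append(row)
    return rows


def to_column_table(rows):
    """Reformat a row table (list of dicts) as a column table (dict of lists)."""
    return {key: [row[key] for row in rows] for key in rows[0]}


# Example: two variables, 4 chains x 100 draws each.
rng = np.random.default_rng(0)
ds = {"mu": rng.normal(size=(4, 100)), "theta": rng.normal(size=(4, 100, 2))}


def stats(x):
    return {"mean": float(np.mean(x)), "sd": float(np.std(x, ddof=1))}


summary, metrics = map_over_params(stats, ds)
table = to_column_table(to_row_table(summary, metrics))
print(table["parameter"])  # ['mu', 'theta[0]', 'theta[1]']
```

Here a dict-returning function plays the role of a `NamedTuple`-returning one: its keys become the new leading metric axis, analogous to the `:metric` dimension that `summarystats` creates.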
Without much additional work, we could use `Crayon`s to highlight bad ESS and R-hat values. This would only require storing information about the number of chains and draws in the `SummaryStats` object.
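A minimal sketch of this highlighting follows. It is written in Python rather than Julia, with raw ANSI escape codes standing in for `Crayon`s. The cutoffs (R-hat above 1.01, ESS below 100 per chain) are commonly recommended thresholds adopted here as assumptions, and the column names `rhat` and `ess_*` as well as `flag_bad` and `render` are hypothetical. The ESS cutoff scales with the number of chains, which is why that information would need to be stored with the table.

```python
RED, RESET = "\033[31m", "\033[0m"  # ANSI escape codes standing in for Crayons


def flag_bad(table, n_chains, rhat_max=1.01, ess_per_chain=100):
    """Mark values to highlight in a column table (dict of lists).

    Assumed column names: "rhat", and ESS columns prefixed with "ess".
    The ESS cutoff is ess_per_chain * n_chains.
    """
    ess_min = ess_per_chain * n_chains
    flags = {}
    for col, values in table.items():
        if col == "rhat":
            flags[col] = [v > rhat_max for v in values]
        elif col.startswith("ess"):
            flags[col] = [v < ess_min for v in values]
    return flags


def render(table, n_chains):
    """Render the table as tab-separated text, coloring flagged cells red."""
    flags = flag_bad(table, n_chains)
    cols = list(table)
    lines = ["\t".join(cols)]
    for i in range(len(table[cols[0]])):
        cells = []
        for col in cols:
            value = table[col][i]
            text = f"{value:.4g}" if isinstance(value, float) else str(value)
            if col in flags and flags[col][i]:
                text = f"{RED}{text}{RESET}"
            cells.append(text)
        lines.append("\t".join(cells))
    return "\n".join(lines)


table = {
    "parameter": ["mu", "theta"],
    "ess_bulk": [850.0, 120.0],
    "rhat": [1.001, 1.05],
}
print(render(table, n_chains=4))  # theta's ESS and R-hat are printed in red
```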