Implement pure Julia InferenceData scheme #128

sethaxen · 2021-05-17T21:18:21Z

Currently our InferenceData just wraps arviz.InferenceData for dispatch. This is convenient because it's low maintenance and highly functional, as we get all class methods implemented on the Python side for free. However, the data itself is stuck in xarray Datasets, which is fine for Python users who may already be familiar with xarray, but for Julia users who do not use xarray, it is not as friendly as we would like for actually accessing or modifying the underlying data.

An alternative is to reimplement the InferenceData schema using only Julia packages. This comes with a maintenance cost but has the benefit of improving this package's usability for Julia users. By providing conversions to/from the Python implementation of InferenceData, and using PyCall's no-copy array passing, we can construct xarray views of our InferenceData to access all the existing xarray-based functionality.

To implement the schema, we need to choose (at least) one representation for arrays with named dimensions and named keys, and one for collections of these arrays into datasets with named variables and attributes. The old solution to named dimensions and named keys was AxisArrays.jl, which MCMCChains.jl is built on. This is probably not the best way forward, as a rich ecosystem of packages that provide either named dimensions or named keys has been developing. See JuliaCollections/AxisArraysFuture#1 for useful discussion.
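
To make these requirements concrete, here is a minimal sketch of the two pieces using only Base Julia. The `NamedDimArray` and `SimpleDataset` types and the `select` function are hypothetical names invented for this illustration; they are not the API of any package named in this issue, and a real implementation would build on one of those packages instead.

```julia
# Hypothetical types for illustration only.

# An array whose dimensions are named and whose indices along each
# dimension carry keys (e.g. chain ids, draw numbers).
struct NamedDimArray{T,N}
    data::Array{T,N}
    dims::NTuple{N,Symbol}            # dimension names, e.g. (:chain, :draw)
    keys::NTuple{N,AbstractVector}    # one key vector per dimension
end

# Look up a slice by dimension name and key instead of by integer position.
function select(x::NamedDimArray, dim::Symbol, key)
    d = findfirst(==(dim), x.dims)
    d === nothing && throw(ArgumentError("no dimension named $dim"))
    i = findfirst(isequal(key), x.keys[d])
    i === nothing && throw(KeyError(key))
    return selectdim(x.data, d, i)    # a view, so no data is copied
end

# A dataset: named variables plus free-form attributes.
struct SimpleDataset
    variables::Dict{Symbol,NamedDimArray}
    attrs::Dict{String,Any}
end

# One variable with 2 chains and 4 draws.
mu = NamedDimArray(reshape(collect(1.0:8.0), 2, 4), (:chain, :draw), (1:2, 1:4))
posterior = SimpleDataset(
    Dict{Symbol,NamedDimArray}(:mu => mu),
    Dict{String,Any}("note" => "illustrative attribute"),
)

# An InferenceData-like object is then a named collection of such groups.
idata = (posterior = posterior,)
chain2 = select(idata.posterior.variables[:mu], :chain, 2)
```

Whichever package we pick needs to cover at least this much: lookup by dimension name and key, and grouping of variables with attributes into datasets.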

In particular, it was noted on Slack that AxisSets.jl provides a KeyedDataset type that contains collections of KeyedArrays from AxisKeys.jl, which wrap NamedDimsArrays from NamedDims.jl. Because AxisSets and NamedDims are going into production at Invenia Labs, they are battle-tested and are maintained by dedicated teams. That also makes it more likely that our users will be familiar with one or more of these packages going forward. Therefore, I propose that our Julian InferenceData be a collection of Dataset objects that wrap AxisSets.jl's KeyedDataset objects.

sethaxen · 2022-06-23T09:30:24Z

DimensionalData has been downloaded 50x more than AxisSets in the last 6 months and has functionality similar to xarray's, so I've begun work in #191 using DimensionalData as the backing for a pure Julia InferenceData.

This was referenced May 17, 2021

Make ArviZ.jl more Julian #130

Closed

Add from_turing and from_soss to simplify getting auxiliary groups #132

Open

sethaxen mentioned this issue Jan 19, 2022

Prior predictive keyword arg not used in from_mcmcchains #146

Closed

ParadaCarleton mentioned this issue Feb 18, 2022

Toward a completely PPL-agnostic Bayesian workflow #154

Open

sethaxen mentioned this issue Mar 20, 2022

Make the Quickstart docs a Pluto notebook #136

Merged

sethaxen mentioned this issue Jun 22, 2022

Reimplement Dataset and InferenceData using DimensionalData #191

Merged

sethaxen closed this as completed in #191 Jul 9, 2022

sethaxen moved this to Done in CZI Round 4 Aug 20, 2022

sethaxen added this to CZI Round 4 Aug 20, 2022
