Implement pure Julia InferenceData scheme #128

Closed
sethaxen opened this issue May 17, 2021 · 1 comment · Fixed by #191

@sethaxen
Member

Currently our InferenceData just wraps arviz.InferenceData for dispatch. This is convenient because it's low maintenance and highly functional, as we get all class methods implemented on the Python side for free. However, the data itself is stuck in xarray Datasets, which is fine for Python users who may already be familiar with xarray, but for Julia users who do not use xarray, it is not as friendly as we would like for actually accessing or modifying the underlying data.

An alternative is to reimplement the InferenceData schema using only Julia packages. This comes with a maintenance cost but makes the package more usable for Julia users. By providing conversions to and from the Python implementation of InferenceData, and using PyCall's no-copy array passing, we can construct xarray views of our InferenceData to access all existing xarray-based functionality.

To implement the schema, we need to choose (at least) one representation for arrays with named dimensions and named keys, and one for collections of these arrays into datasets with named variables and attributes. The older solution for named dimensions and named keys was AxisArrays.jl, which MCMCChains.jl is built on. This is probably not the best way forward, as a rich ecosystem of packages that provide either named dimensions or named keys has been developing. See JuliaCollections/AxisArraysFuture#1 for useful discussion.
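To make the two required pieces concrete, here is a minimal sketch using only Base Julia. All names in it (`LabeledArray`, `Dataset`, `lookup`, and the `"inference_library"` attribute value) are hypothetical and exist only for illustration; this is not the ArviZ.jl implementation or the API of any package discussed here.

```julia
# Hypothetical types for illustration only.

# An array with a name for each dimension and a key for each index along it.
struct LabeledArray{T,N}
    data::Array{T,N}
    dims::NTuple{N,Symbol}   # dimension names, e.g. (:chain, :draw)
    labels::NTuple{N,Vector} # keys along each dimension
end

# A collection of named variables plus free-form attributes.
struct Dataset
    vars::Dict{Symbol,LabeledArray}
    attrs::Dict{String,Any}
end

# Select by dimension name and key; unmentioned dimensions are kept whole.
function lookup(a::LabeledArray; selectors...)
    idx = map(a.dims, a.labels) do name, labs
        haskey(selectors, name) || return Colon()
        i = findfirst(isequal(selectors[name]), labs)
        i === nothing && throw(KeyError(selectors[name]))
        return i
    end
    return a.data[idx...]
end

# A 2-chain, 4-draw variable stored in a `posterior` group.
mu = LabeledArray(reshape(collect(1.0:8.0), 2, 4), (:chain, :draw), ([1, 2], [1, 2, 3, 4]))
posterior = Dataset(Dict(:mu => mu), Dict("inference_library" => "hypothetical"))
idata = Dict(:posterior => posterior)  # groups of datasets

lookup(idata[:posterior].vars[:mu]; chain=2, draw=3)  # 6.0
```

The packages discussed in this issue provide far more than this (broadcasting, slicing that preserves labels, consistency checks across variables), which is the reason to build on one of them rather than maintain types like these ourselves.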

In particular, it was noted on Slack that AxisSets.jl provides a KeyedDataset type that contains collections of KeyedArrays from AxisKeys.jl, which wrap NamedDimsArrays from NamedDims.jl. Because AxisSets and NamedDims are going into production at Invenia Labs, they are battle-tested and maintained by dedicated teams. Our users are therefore also more likely to be familiar with one or more of these packages going forward. I propose that our Julian InferenceData be a collection of Dataset objects that wrap AxisSets.jl's KeyedDataset objects.

@sethaxen
Member Author

DimensionalData has been downloaded 50x more than AxisSets in the last 6 months and has functionality similar to xarray's, so I've begun work in #191 using DimensionalData as the backing for a pure Julia InferenceData.
