Reimplement Dataset and InferenceData using DimensionalData #191

Merged: sethaxen merged 133 commits into main from dimdata on Jul 9, 2022

sethaxen (Member) commented Jun 22, 2022

This PR reimplements Dataset as a DimensionalData.AbstractDimStack that behaves identically to DimensionalData.DimStack, and InferenceData as a keyed collection of Datasets. DimensionalData is the closest thing in the Julia ecosystem to an xarray replacement, in that its DimStack is structured like xarray.Dataset. Its API is quite different from xarray's, so this is a major breaking change.

Additional major differences:

  • concat! has been removed from the API and package, as InferenceData is now immutable
  • dims and coords can in general be any collection indexed by Symbols, typically a Dict{Symbol} or a NamedTuple. Complete type inferability is only possible with NamedTuple. When calling Python functions, though, Dicts are better behaved, since PyCall doesn't map NamedTuples to Python's dict type.
  • Symbols should be used in most places instead of strings. Exceptions are when the string corresponds to a coordinate/index or a plotting label.
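
To illustrate the inferability point above, here is a minimal sketch that uses only Base Julia. The coordinate names (:school, :draw) and the helper get_draw are hypothetical and chosen for illustration; nothing here calls ArviZ.jl or DimensionalData.

```julia
# Hypothetical coords given as a NamedTuple and as a Dict{Symbol,Any}.
coords_nt = (school = ["A", "B", "C"], draw = 1:100)
coords_dict = Dict{Symbol,Any}(:school => ["A", "B", "C"], :draw => 1:100)

# The NamedTuple type records every key and the concrete type of every value.
nt_fieldtypes = fieldtypes(typeof(coords_nt))

# The Dict only knows that its values are of type Any.
dict_valtype = valtype(coords_dict)

# A hypothetical accessor: the compiler can infer a concrete return type
# for the NamedTuple, but only Any for the Dict.
get_draw(c) = c[:draw]
nt_inferred = only(Base.return_types(get_draw, (typeof(coords_nt),)))
dict_inferred = only(Base.return_types(get_draw, (typeof(coords_dict),)))

println("NamedTuple lookup infers: ", nt_inferred)
println("Dict lookup infers:       ", dict_inferred)
```

Both containers hold the same data and are indexed the same way; the difference is only in how much the compiler knows about the result of a lookup.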

InferenceData and Dataset can be transparently converted to arviz.InferenceData and xarray.Dataset with no copying of data, so there is no appreciable loss in efficiency by having the storage be in Julia.

Fixes #128 and #141

@sethaxen (Member, Author) commented:

Here's an example of how this looks:

julia> using ArviZ

julia> idata = load_arviz_data(:radon)
InferenceData with groups:
    > posterior
    > posterior_predictive
    > log_likelihood
    > sample_stats
    > prior
    > prior_predictive
    > observed_data
    > constant_data

julia> idata.posterior
Dataset with dimensions: 
  Dim{:chain} Sampled Int64[0, 1, 2, 3] ForwardOrdered Irregular Points,
  Dim{:draw} Sampled Int64[0, 1, …, 498, 499] ForwardOrdered Irregular Points,
  Dim{:g_coef} Sampled PyCall.PyObject[PyObject 'intercept', PyObject 'slope'] ForwardOrdered Irregular Points,
  Dim{:County} Sampled PyCall.PyObject[PyObject 'AITKIN', PyObject 'ANOKA', …, PyObject 'WRIGHT', PyObject 'YELLOW MEDICINE'] ForwardOrdered Irregular Points
and 7 layers:
  :g         Float64 dims: Dim{:chain}, Dim{:draw}, Dim{:g_coef} (4×500×2)
  :za_county Float64 dims: Dim{:chain}, Dim{:draw}, Dim{:County} (4×500×85)
  :b         Float64 dims: Dim{:chain}, Dim{:draw} (4×500)
  :sigma_a   Float64 dims: Dim{:chain}, Dim{:draw} (4×500)
  :a         Float64 dims: Dim{:chain}, Dim{:draw}, Dim{:County} (4×500×85)
  :a_county  Float64 dims: Dim{:chain}, Dim{:draw}, Dim{:County} (4×500×85)
  :sigma     Float64 dims: Dim{:chain}, Dim{:draw} (4×500)

with metadata Dict{Symbol, Any} with 6 entries:
  :inference_library_version => "3.9.2"
  :sampling_time             => 18.097
  :tuning_steps              => 1000
  :created_at                => "2020-07-24T18:15:12.191355"
  :arviz_version             => "0.9.0"
  :inference_library         => "pymc3"

Some numpy dtypes never seem to be converted on the Julia side, which is why the g_coef and County coordinates above show up as PyCall.PyObject values. This seems unlikely to change soon. It could be avoided if we had a converter that reads NetCDF directly into our InferenceData.

@sethaxen sethaxen marked this pull request as ready for review July 3, 2022 11:54
@ahartikainen commented:

Are we still going to have a concat function that creates a new idata object?

sethaxen (Member, Author) commented Jul 3, 2022

> Are we still going to have a concat function that creates a new idata object?

Yes, concat is still part of the API, with the in-place option disabled. We also implement Base.merge, which only adds or replaces whole groups and doesn't recurse into the groups at all.
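
The "add or replace groups without recursing" behaviour described above can be sketched by analogy with Base.merge on plain NamedTuples. This is an assumption-laden stand-in: the nested NamedTuples below only mimic groups and variables, they are not ArviZ.jl's InferenceData or Dataset types, and the variable names are hypothetical.

```julia
# Stand-ins for two InferenceData objects: each "group" is a NamedTuple of variables.
idata1 = (posterior = (mu = [0.1, 0.2],), prior = (mu = [1.0],))
idata2 = (posterior = (tau = [0.5, 0.6],), observed_data = (y = [3.0],))

# Base.merge on NamedTuples is shallow: for a shared key the right-hand value wins.
merged = merge(idata1, idata2)

merged_groups = keys(merged)               # existing groups first, then new ones
posterior_vars = keys(merged.posterior)    # only the variables from idata2

println("groups:         ", merged_groups)
println("posterior vars: ", posterior_vars)
```

In this analogy the posterior group from the second argument replaces the first one wholesale, so mu is gone from the merged posterior. Combining the contents of same-named groups would be the job of a concat-style function instead.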

@sethaxen sethaxen merged commit 1f64237 into main Jul 9, 2022
@sethaxen sethaxen deleted the dimdata branch July 9, 2022 12:28
Successfully merging this pull request may close these issues.

Implement pure Julia InferenceData scheme