
Add to_dict method to InferenceData object #1223

Merged: 18 commits, merged Aug 13, 2020


@percygautam (Contributor) commented Jun 4, 2020

Description

This PR adds the to_dict method to InferenceData object.

Checklist

  • Follows official PR format
  • New features are properly documented (with an example if appropriate)
  • Includes new or updated tests to cover the new feature
  • Code style correct (follows pylint and black guidelines)
  • Changes are listed in changelog

@percygautam (Contributor Author)

I have implemented the to_dict() method.
The code:

import arviz as az
data = az.load_arviz_data("rugby")
data.to_dict(data=False)

will return:

{'posterior': {'coords': {'chain': {'dims': ('chain',),
    'attrs': {},
    'dtype': 'int64',
    'shape': (4,)},
   'draw': {'dims': ('draw',), 'attrs': {}, 'dtype': 'int64', 'shape': (500,)},
   'team': {'dims': ('team',), 'attrs': {}, 'dtype': 'object', 'shape': (6,)}},
  'attrs': {'created_at': '2019-07-12T20:31:53.545143',
   'inference_library': 'pymc3',
   'inference_library_version': '3.7'},
  'dims': {'chain': 4, 'draw': 500, 'team': 6},
  'data_vars': {'home': {'dims': ('chain', 'draw'),
    'attrs': {},
    'dtype': 'float64',
    'shape': (4, 500)},
   'intercept': {'dims': ('chain', 'draw'),
    'attrs': {},
    'dtype': 'float64',
    'shape': (4, 500)},
   'atts_star': {'dims': ('chain', 'draw', 'team'),
    'attrs': {},
    'dtype': 'float64',
    'shape': (4, 500, 6)},
   'defs_star': {'dims': ('chain', 'draw', 'team'),
    'attrs': {},
    'dtype': 'float64',
    'shape': (4, 500, 6)},
   'sd_att': {'dims': ('chain', 'draw'),
    'attrs': {},
    'dtype': 'float64',
    'shape': (4, 500)},
   'sd_def': {'dims': ('chain', 'draw'),
    'attrs': {},
    'dtype': 'float64',
    'shape': (4, 500)},
   'atts': {'dims': ('chain', 'draw', 'team'),
    'attrs': {},
    'dtype': 'float64',
    'shape': (4, 500, 6)},
   'defs': {'dims': ('chain', 'draw', 'team'),
    'attrs': {},
    'dtype': 'float64',
    'shape': (4, 500, 6)}}},
 'posterior_predictive': {'coords': {'chain': {'dims': ('chain',),
    'attrs': {},
    'dtype': 'int64',
    'shape': (4,)},
   'draw': {'dims': ('draw',), 'attrs': {}, 'dtype': 'int64', 'shape': (500,)},
   'match': {'dims': ('match',),
    'attrs': {},
    'dtype': 'object',
    'shape': (60,)}},
  'attrs': {'created_at': '2019-07-12T20:31:53.563854',
   'inference_library': 'pymc3',
   'inference_library_version': '3.7'},
  'dims': {'chain': 4, 'draw': 500, 'match': 60},
  'data_vars': {'home_points': {'dims': ('chain', 'draw', 'match'),
    'attrs': {},
    'dtype': 'int64',
    'shape': (4, 500, 60)},
   'away_points': {'dims': ('chain', 'draw', 'match'),
    'attrs': {},
    'dtype': 'int64',
    'shape': (4, 500, 60)}}},
 'sample_stats': {'coords': {'chain': {'dims': ('chain',),
    'attrs': {},
    'dtype': 'int64'
......
.......
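The per-group layout above (coords, attrs, dims, data_vars, with every variable carrying dims, attrs, dtype and shape) matches what xarray's `Dataset.to_dict(data=False)` produces. One thing that schema makes possible is checking variable shapes against the dimension sizes without loading any data. A minimal sketch, assuming exactly that layout; the schema below is a hand-copied subset of the posterior output, and `check_schema` is a hypothetical helper, not part of ArviZ:

```python
# Hand-copied subset of the posterior schema printed above.
posterior_schema = {
    "dims": {"chain": 4, "draw": 500, "team": 6},
    "data_vars": {
        "home": {"dims": ("chain", "draw"), "attrs": {}, "dtype": "float64", "shape": (4, 500)},
        "atts": {"dims": ("chain", "draw", "team"), "attrs": {}, "dtype": "float64", "shape": (4, 500, 6)},
    },
}


def check_schema(schema):
    """Hypothetical helper: names of variables whose shape disagrees with schema["dims"]."""
    sizes = schema["dims"]
    mismatched = []
    for name, var in schema["data_vars"].items():
        # Expected shape is the size of each named dim, in order.
        expected = tuple(sizes[dim] for dim in var["dims"])
        if expected != tuple(var["shape"]):
            mismatched.append(name)
    return mismatched


print(check_schema(posterior_schema))
```

An empty list means every variable's shape is consistent with the group-level dimension sizes.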

@OriolAbril (Member)

I would have expected to_dict to be compatible with from_dict, so that:

import arviz as az
idata = az.load_arviz_data("centered_eight")
dct = idata.to_dict()
idata_new = az.from_dict(**dct)
# this works and we recover the same inference data object

Let's see what everyone else thinks
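The compatibility requested above is a round-trip property: `from_dict(**to_dict(x))` should give back `x`. The sketch below illustrates that property on a toy stand-in rather than the real `InferenceData`; `ToyInferenceData`, its `to_dict` and `toy_from_dict` are all hypothetical. The point it shows is that `to_dict` has to emit exactly the keyword names the constructor accepts, including metadata such as `coords`, `dims` and `attrs`, not only the groups.

```python
class ToyInferenceData:
    """Hypothetical stand-in for InferenceData: group name -> {variable: values}."""

    def __init__(self, groups, coords=None, dims=None, attrs=None):
        self.groups = groups
        self.coords = coords or {}
        self.dims = dims or {}
        self.attrs = attrs or {}

    def to_dict(self):
        # One keyword per group, plus the metadata keywords toy_from_dict accepts.
        out = {name: dict(variables) for name, variables in self.groups.items()}
        out.update(coords=dict(self.coords), dims=dict(self.dims), attrs=dict(self.attrs))
        return out

    def __eq__(self, other):
        return (self.groups, self.coords, self.dims, self.attrs) == (
            other.groups, other.coords, other.dims, other.attrs
        )


def toy_from_dict(coords=None, dims=None, attrs=None, **groups):
    """Hypothetical inverse of ToyInferenceData.to_dict."""
    return ToyInferenceData(groups, coords=coords, dims=dims, attrs=attrs)


idata = ToyInferenceData(
    {"posterior": {"mu": [0.1, 0.2]}, "observed_data": {"y": [1, 2, 3]}},
    coords={"school": ["A", "B", "C"]},
    dims={"y": ["school"]},
    attrs={"inference_library": "pymc3"},
)
# The round trip recovers an equal object.
assert toy_from_dict(**idata.to_dict()) == idata
```

A test of this shape, written against the real objects, is a natural way to pin the requirement down.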

@percygautam (Contributor Author)

@OriolAbril I have made the changes. Have a look!

@OriolAbril (Member) left a comment

Of the several possibilities there will be after merging #1201:

  • chains and draws equal between posterior and prior, but with custom labels, e.g. chain=["A", "B"] (I would not support that)
  • chains and draws equal between posterior and prior, index_origin=0 (this is not an issue now, but will be once we merge the labeling restructure)
  • chains and draws equal between posterior and prior, index_origin=1 (as above)
  • different chains and draws between posterior and prior

Some cases could be solved by adding chain and draw dims to the resulting dict, but not all of them; there should be some kind of check on index_origin, and its value should be returned. Here are some examples:

  • case 1: index_origin=0
    • posterior
      • chain: 0, 1, 2, 3
      • draw: np.arange(100)
    • prior
      • chain: 0
      • draw: np.arange(200)
  • case 2: index_origin=1
    • posterior
      • chain: 1, 2, 3, 4
      • draw: np.arange(1, 101)
    • prior
      • chain: 1
      • draw: np.arange(1, 201)
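One way to implement the index_origin check described above is to infer the origin from every group's chain and draw coordinates and to refuse to guess when the labels are custom or the groups disagree. A minimal sketch, assuming coordinates arrive as plain Python sequences of ints; `infer_index_origin` and the `{group: {"chain": ..., "draw": ...}}` layout are hypothetical, not ArviZ's implementation:

```python
def infer_index_origin(groups):
    """Return 0 or 1 if every group's chain/draw coords are ranges starting there, else None.

    `groups` maps group name -> {"chain": [...], "draw": [...]} (hypothetical layout).
    """
    origins = set()
    for coords in groups.values():
        for dim in ("chain", "draw"):
            values = list(coords[dim])
            if not values or not all(isinstance(v, int) for v in values):
                return None  # custom labels such as chain=["A", "B"]
            start = values[0]
            if start not in (0, 1) or values != list(range(start, start + len(values))):
                return None  # not a contiguous range starting at 0 or 1
            origins.add(start)
    # Groups may differ in length (4 chains vs 1), but must share the origin.
    return origins.pop() if len(origins) == 1 else None


case_1 = {
    "posterior": {"chain": [0, 1, 2, 3], "draw": list(range(100))},
    "prior": {"chain": [0], "draw": list(range(200))},
}
case_2 = {
    "posterior": {"chain": [1, 2, 3, 4], "draw": list(range(1, 101))},
    "prior": {"chain": [1], "draw": list(range(1, 201))},
}
print(infer_index_origin(case_1), infer_index_origin(case_2))
```

Returning None rather than raising leaves it to the caller to decide whether custom labels are supported.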

arviz/data/inference_data.py: outdated, resolved
arviz/data/inference_data.py: outdated, resolved
arviz/data/inference_data.py: outdated, resolved
arviz/data/io_dict.py: outdated, resolved
arviz/data/io_dict.py: outdated, resolved
@percygautam changed the title from "Add xr.Dataset methods" to "Add to_dict method to InferenceData object" on Jun 20, 2020
@codecov bot commented Jun 20, 2020

Codecov Report

Merging #1223 into master will increase coverage by 0.03%.
The diff coverage is 90.00%.


@@            Coverage Diff             @@
##           master    #1223      +/-   ##
==========================================
+ Coverage   91.75%   91.79%   +0.03%     
==========================================
  Files         101      101              
  Lines       10568    10622      +54     
==========================================
+ Hits         9697     9750      +53     
- Misses        871      872       +1     
Impacted Files                    Coverage Δ
arviz/data/io_dict.py             92.74% <87.50%> (-0.19%) ⬇️
arviz/data/inference_data.py      83.94% <93.33%> (+0.54%) ⬆️
arviz/stats/density_utils.py      64.65% <0.00%> (ø)
arviz/data/io_tfp.py              99.03% <0.00%> (+2.88%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 95df315...8a760c1.
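The percentages in the coverage diff follow from hits ÷ lines, and misses are lines minus hits. The sketch below re-derives them; it assumes the report cuts (rather than rounds) to two decimals, since that is what reproduces 91.75% for master here (rounding would give 91.76%), while the +0.03% change is computed from the unrounded values. `coverage_pct` is a hypothetical helper.

```python
import math


def coverage_pct(hits, lines):
    """Covered lines as a percentage, cut to two decimals (assumption, see above)."""
    return math.floor(hits / lines * 10000) / 100


base = coverage_pct(9697, 10568)  # master
head = coverage_pct(9750, 10622)  # #1223
# The change is taken from the unrounded ratios, not from base and head.
delta = round((9750 / 10622 - 9697 / 10568) * 100, 2)
misses = (10568 - 9697, 10622 - 9750)
print(base, head, delta, misses)
```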

@OriolAbril (Member) left a comment

The number of arguments in from_dict is becoming unwieldy. Should it be modified to take kwargs and call dict_to_dataset for all of them (checking for observed_data and constant_data to set skip_dims=[])?

This would allow users to introduce any group, which may not be ideal, but it would make the code much easier to follow and maintain.
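A kwargs-based from_dict along the lines proposed above could route every group through one code path. The sketch assumes a per-group converter that takes a list of default leading dims; `convert_group` is a hypothetical stand-in for dict_to_dataset that only records which dims each variable would get, so the routing runs without ArviZ. `from_dict_kwargs` is likewise hypothetical. Groups without samples (observed_data, constant_data) get an empty list, mirroring the skip_dims=[] idea.

```python
SAMPLE_DIMS = ["chain", "draw"]
# Groups that hold no MCMC samples, so they get no chain/draw dims.
NO_SAMPLE_DIM_GROUPS = {"observed_data", "constant_data"}


def convert_group(data, default_dims, dims=None):
    """Hypothetical stand-in for dict_to_dataset: return {variable: full list of dim names}."""
    dims = dims or {}
    return {name: default_dims + dims.get(name, []) for name in data}


def from_dict_kwargs(dims=None, **groups):
    """Hypothetical kwargs-based from_dict: every group goes through the same code path."""
    result = {}
    for group, data in groups.items():
        if data is None:
            continue  # group not provided
        default_dims = [] if group in NO_SAMPLE_DIM_GROUPS else SAMPLE_DIMS
        result[group] = convert_group(data, default_dims, dims)
    return result


out = from_dict_kwargs(
    posterior={"theta": [[[0.1, 0.2]]]},  # chain=1, draw=1, school=2
    observed_data={"y": [1.0, 2.0]},      # school=2
    my_custom_group={"z": [[0.5]]},       # any group name is accepted
    dims={"theta": ["school"], "y": ["school"]},
)
print(out)
```

The trade-off is the one noted in the comment: arbitrary group names get through, in exchange for a single, shorter code path.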

arviz/data/io_dict.py: resolved
@OriolAbril (Member) left a comment

I think we can start with tests on this one.

arviz/data/io_dict.py: resolved
@percygautam mentioned this pull request on Aug 7, 2020
@percygautam requested a review from @OriolAbril on August 7, 2020 19:20
@ahartikainen (Contributor) left a comment

LGTM

Are we saving attrs to all groups? I'm fine with this; it is not optimal, but people dealing with large attrs should handle them manually.

@OriolAbril (Member)

We currently store the version, creation time and PPL library as attrs in all groups, whereas things like sampling time are stored only in posterior or sample_stats.
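Because some attrs repeat in every group while others live in only one, a to_dict that flattens attrs could keep the shared ones once and the rest per group. A minimal sketch with a hypothetical `split_attrs` helper; the library, version and timestamp values are copied from the rugby output earlier in this thread, while the `sampling_time` value is made up for illustration. Note that `created_at` differs slightly between posterior and posterior_predictive in that output, so under this scheme it ends up per group.

```python
def split_attrs(group_attrs):
    """Split {group: attrs} into (attrs identical in every group, per-group remainder)."""
    groups = list(group_attrs.values())
    if not groups:
        return {}, {}
    # A key is shared only if every group has it with the same value.
    shared = {
        key: value
        for key, value in groups[0].items()
        if all(key in attrs and attrs[key] == value for attrs in groups[1:])
    }
    per_group = {
        name: {k: v for k, v in attrs.items() if k not in shared}
        for name, attrs in group_attrs.items()
    }
    return shared, per_group


group_attrs = {
    "posterior": {
        "inference_library": "pymc3",
        "inference_library_version": "3.7",
        "created_at": "2019-07-12T20:31:53.545143",
        "sampling_time": 12.5,  # illustrative value only
    },
    "posterior_predictive": {
        "inference_library": "pymc3",
        "inference_library_version": "3.7",
        "created_at": "2019-07-12T20:31:53.563854",
    },
}
shared, per_group = split_attrs(group_attrs)
print(shared)
print(per_group)
```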

@OriolAbril (Member) left a comment

I think it's ready to merge; only one small nit.

arviz/data/io_dict.py: resolved
arviz/tests/base_tests/test_data.py: outdated, resolved
@ahartikainen (Contributor)

Hi, remember to git pull before working with the branch (I fixed the merge conflicts).

@OriolAbril (Member) left a comment

LGTM. @ahartikainen, you can merge when you see this, provided that the tests have passed.

@ahartikainen merged commit 0ee3c26 into arviz-devs:master on Aug 13, 2020