Added methods to convert InferenceData to and from Zarr [docs] #1518

semohr · 2021-01-22T12:03:41Z

Description

This pull request adds two methods to the 'InferenceData' class:

.from_zarr() to construct an 'InferenceData' object from a zarr store via xarray's .open_zarr method.
.to_zarr() to convert an 'InferenceData' object to a hierarchical zarr group, written to a zarr 'store' or a path.

These two methods are useful for saving 'InferenceData' objects to disk with zarr and restoring them later. Zarr is still experimental in xarray, but I think it could be a good addition.
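A minimal round-trip sketch of how the two methods might be used. It assumes arviz, numpy, and a zarr version supported by arviz are installed; the helper name `zarr_round_trip`, the variable name "mu", and the file name "idata.zarr" are made up for illustration and are not part of this PR.

```python
from pathlib import Path


def zarr_round_trip(directory):
    """Save a small InferenceData to `directory` with zarr and load it back."""
    # Imported lazily because arviz and numpy are third-party packages.
    import arviz as az
    import numpy as np

    # Toy posterior with 2 chains and 50 draws of a single variable "mu".
    rng = np.random.default_rng(0)
    idata = az.from_dict(posterior={"mu": rng.normal(size=(2, 50))})

    # Hypothetical target path; to_zarr writes one sub-group per InferenceData group.
    store = str(Path(directory) / "idata.zarr")
    idata.to_zarr(store=store)

    # Rebuild an InferenceData object from the same store.
    return az.InferenceData.from_zarr(store)
```

Calling `zarr_round_trip(some_directory)` should return an object whose `groups()` include "posterior", as long as the store directory still exists while the data is being read.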

For the implementation, I tried to use an approach similar to the from_dataframe and to_dataframe methods. I also tried to add the changes to the changelog; please take a look. I would appreciate it if someone could assist me with writing tests for these methods.

Someone may also need to add a mock import in sphinx for Zarr. I tried to generate the documentation locally but I got a weird runtime-error so I skipped that for now.

Best,
Sebastian

References

Zarr

Checklist

Follows official PR format
New features are properly documented (with an example if appropriate)?
Includes new or updated tests to cover the new feature
- Move tests into a different file using the same approach as for numba and bokeh
Code style correct (follows pylint and black guidelines)
Changes are listed in changelog

ahartikainen · 2021-01-22T12:09:00Z

Hi, great idea to use zarr.

I added the [docs] tag to the title, so our CI will build the documentation.

ahartikainen

I added initial comments. Looks great.

ahartikainen · 2021-01-22T12:13:23Z

arviz/data/inference_data.py

+            # Create zarr group in store with same group name
+            getattr(self, group).to_zarr(store=store, group=group, mode="w")
+
+        return zarr.open(store)  # Open store to get overarching group


How does zarr handle open files?

Are they closed manually?

As far as I understand, it only keeps metadata about the location of the files and loads them on demand.

From the documentation:
Files are only held open while they are being read or written and are
closed immediately afterwards, so there is no need to manually close any files.

FYI: There is a great talk from one of the developers here.

ahartikainen · 2021-01-22T12:21:54Z

There is a linting error, so add this to the top of the file:

# pylint: disable=too-many-lines,too-many-public-methods

Also, rename the variable g, maybe to some two-character name.

ahartikainen · 2021-01-22T14:47:14Z

Hi, for code style, code structure, and import order, you can run the following from the project folder:

black arviz
isort arviz
pylint arviz

black -> reformat code
isort -> sort imports
pylint -> check codestyle

The settings used by the tools can be found in pyproject.toml / pylint.rc (if you are interested in what they are doing).

ahartikainen · 2021-01-22T14:52:46Z

Oh yes, we probably don't use isort officially, but it gets the job done.

OriolAbril

Thanks for your contribution! I have added some comments but nothing big.

Regarding tests, my guess is that following https://github.com/arviz-devs/arviz/blob/master/arviz/tests/base_tests/test_data.py#L1194 but with zarr will be enough.

codecov · 2021-01-24T11:04:24Z

Codecov Report

Merging #1518 (dd129a6) into main (525e311) will decrease coverage by 0.03%.
The diff coverage is 80.95%.

@@            Coverage Diff             @@
##             main    #1518      +/-   ##
==========================================
- Coverage   91.14%   91.10%   -0.04%     
==========================================
  Files         105      105              
  Lines       11342    11382      +40     
==========================================
+ Hits        10338    10370      +32     
- Misses       1004     1012       +8

Impacted Files	Coverage Δ
arviz/data/inference_data.py	`83.59% <80.95%> (-0.22%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 525e311...dd129a6. Read the comment docs.

canyon289 · 2021-01-24T16:32:58Z

I'm not opposed to this, just curious. Why would people want to use zarr over hdf5? I didn't see anything in their docs that explained it, so I'm curious if you know.

OriolAbril · 2021-01-24T16:53:00Z

I think this presentation is a good summary: https://zarr-developers.github.io/slides/scipy-2019.html (note to those not familiar with RISE/reveal.js, press space for next slide)

OriolAbril

Thanks for the tests! This is looking really good

OriolAbril · 2021-01-24T17:02:55Z

arviz/tests/base_tests/test_data.py

@@ -1246,6 +1249,86 @@ def test_empty_inference_data_object(self):
        assert not os.path.exists(filepath)


+class TestDataZarr:


Not sure what would be the best approach here (pinging @canyon289 and @ahartikainen to see what they think), but we should either create a test_data_zarr.py or add a class-level skipif. As zarr is an optional dependency, we should make sure its tests are skipped if it's not installed instead of failing the test suite. Something similar to what we do with numba: https://github.com/arviz-devs/arviz/blob/master/arviz/tests/base_tests/test_diagnostics_numba.py#L16 and bokeh: https://github.com/arviz-devs/arviz/blob/master/arviz/tests/base_tests/test_plots_bokeh.py#L48

Good idea, it was quite easy to add 👍 I haven't created a new file, though.

Agree with Oriol that a pytest marker should be used to skip tests if zarr isn't present instead of failing the test suite.

A separate file would be nice
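The availability check behind such a skip marker can be sketched with the standard library alone. The helper name `is_installed` is hypothetical, and the commented-out marker only illustrates where the check would plug in; it is not the code from this PR.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical use in a test module for an optional dependency:
# pytestmark = pytest.mark.skipif(not is_installed("zarr"), reason="zarr is not installed")
ZARR_AVAILABLE = is_installed("zarr")
print(f"zarr available: {ZARR_AVAILABLE}")
```

Because `find_spec` only looks the module up and does not import it, the check itself cannot fail with an import error when the optional package is missing.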

OriolAbril

pylint can be quite a headache, I hope this helps fix it so we can merge :)

OriolAbril · 2021-01-25T20:04:49Z

arviz/tests/base_tests/test_data_zarr.py

+
+import numpy as np
+import pytest
+import zarr


This won't work because it will raise an exception here before getting to the skipif

semohr · 2021-01-26T10:21:52Z

Thank you, I was starting to go crazy :) Well, now there is an azure error in precompile models... I don't think I changed something related to that.

OriolAbril

Well, now there is an azure error in precompile models... I don't think I changed something related to that.

You did not. Let's see if it still happens tomorrow, and if it does we'll try to find out why; sometimes it's only Azure's fault or some release lag in related packages (e.g. tf probability releasing a couple of days later than tf).

We do have to solve the import error I commented on above. Currently test_data_zarr imports zarr and then checks if it's available in order to skip the tests. Therefore, if zarr is not installed, the import will raise an error instead of the tests being skipped. zarr should be imported below the check, with pylint told to ignore the "wrong" order.

semohr · 2021-01-26T10:44:21Z

True, we can try to move zarr after the pytest statement and disable pylint. Not sure if this is the right approach though.

import importlib.util

import pytest

from ..helpers import running_on_ci  # test-suite helper; import path assumed

pytestmark = pytest.mark.skipif(  # pylint: disable=invalid-name
    importlib.util.find_spec("zarr") is None and not running_on_ci(),
    reason="test requires zarr which is not installed",
)

import zarr  # pylint: disable=wrong-import-position

OriolAbril

Looks good to merge, thank you so much! I hope it was an enjoyable and enriching experience

I will wait until we figure out what is happening with CI and then merge

semohr · 2021-01-26T11:29:17Z

Yes, it was quite telling; I learned a lot about the Azure pipeline. Thanks for taking the time 👍

OriolAbril · 2021-02-06T16:36:23Z

Thanks @semohr!

ahartikainen changed the title ~~Added methods to convert InferenceData to and from Zarr~~ Added methods to convert InferenceData to and from Zarr [docs] Jan 22, 2021

ahartikainen reviewed Jan 22, 2021

ahartikainen reviewed Jan 22, 2021

ahartikainen reviewed Jan 22, 2021

OriolAbril reviewed Jan 22, 2021

OriolAbril reviewed Jan 24, 2021

OriolAbril reviewed Jan 25, 2021

OriolAbril requested changes Jan 26, 2021

OriolAbril reviewed Jan 26, 2021

OriolAbril approved these changes Jan 26, 2021

Base automatically changed from master to main January 26, 2021 19:44

semohr added 8 commits February 6, 2021 11:46

9b293e3 Added a to_zarr method which converts the InferenceData object to a hierarchical zarr group.
77f5b7b Added a from_zarr method which create an inferenceData object from a zarr store.
1cec2fc Fixed small typo in to_zarr and from_zarr
266faea Added the ability to create InferenceData from zarr.hierarchy.Group
5f324da Oversight in zarr.hierachy.group.groups() generator call.
b54cb23 Forgot dictionary definition.
738d678 Cleanup zarr: - added reference to zarr docs - replaced type with isInstance - replaced MemoryStore with TempStore
d36d93a Added to methods to CHANGELOG

semohr and others added 23 commits February 6, 2021 11:46

b09a189 PR-Comments: - Moved MutableMapping to top - Check if zarr is installed - Removed ifs observed_data,constant_data,predictions_constant_data - Renamed g to zarr_handle Removed depreciated docstring
cce3fe4 Removed typo
6c5162e Replaced last occurence of g with zarr_handle
c6bb143 Added from packaging import version
b243023 Fixed wrong import order, I did not know this is a thing in python, wow
e64172f Even later import of version
0b9c911 Yet another try to fix the import order
c2841c0 Local pylint is working with this formatting...
427b726 Docstring formatting changes and removed deprecated parts.
24448cd Fixed local black version mismatch.
928a8d0 Update arviz/data/inference_data.py (Co-authored-by: Ari Hartikainen <ahartikainen@users.noreply.github.com>)
3b19249 Added tests and docs intersphinx_mapping.
3cf59e5 Improved test coverage for to_zarr and from_zarr functions.
6b272b2 Added pytest.mark.skipif to zarr test class
d2f68d4 Moved test class to new file called 'test_data_zarr'
750b441 Fixed small pylint wrong-import-position error
bbc1e87 Yet another import-order fix
6412212 Pylint is still black magic for me... It is working locally without errors
e7d7956 Update arviz/tests/base_tests/test_data_zarr.py (Co-authored-by: Oriol Abril-Pla <oriol.abril.pla@gmail.com>)
6a9635a Removed running_on_ci
155751b Reverted last change and removed running_on_ci in right place now
8688ba6 Moved zarr import down and added `# pylint: disable=wrong-import-position`
dd129a6 Update arviz/tests/base_tests/test_data_zarr.py (Co-authored-by: Oriol Abril-Pla <oriol.abril.pla@gmail.com>)

OriolAbril force-pushed the master branch from 3378849 to dd129a6 February 6, 2021 10:48

OriolAbril merged commit 2be3611 into arviz-devs:main Feb 6, 2021
