-
-
Notifications
You must be signed in to change notification settings - Fork 417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added methods to convert InferenceData to and from Zarr [docs] #1518
Conversation
Hi, great idea to use zarr. I added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added initial comments. Looks great.
# Create zarr group in store with same group name | ||
getattr(self, group).to_zarr(store=store, group=group, mode="w") | ||
|
||
return zarr.open(store) # Open store to get overarching group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does zarr handle open files ?
Are they closed manually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand it only keeps metafiles to the location of the files and loads them on demand.
From the documentation:
Files are only held open while they are being read or written and are
closed immediately afterwards, so there is no need to manually close any files.
FYI: There is a great talk from one of the developers here.
There is linting error so add this to top of the file
Also variable |
Hi, for code style, code structure and import order stuff from project folder you can do
black -> reformat code The settings used by the tools can be found in pyproject.toml / pylint.rc (if you are interested what they are doing) |
Oh yes, we probably don't use isort officially, but it gets the job done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! I have added some comments but nothing big.
Regarding tests, my guess is that following https://github.com/arviz-devs/arviz/blob/master/arviz/tests/base_tests/test_data.py#L1194 but with zarr will be enough.
Codecov Report
@@ Coverage Diff @@
## main #1518 +/- ##
==========================================
- Coverage 91.14% 91.10% -0.04%
==========================================
Files 105 105
Lines 11342 11382 +40
==========================================
+ Hits 10338 10370 +32
- Misses 1004 1012 +8
Continue to review full report at Codecov.
|
Im not opposed to this, just curious. Why would people want to use zarr over hdf5? I didnt see anything on their docs that explained it so curious if you know |
I think this presentation is a good summary: https://zarr-developers.github.io/slides/scipy-2019.html (note to those not familiar with RISE/reveal.js, press space for next slide) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tests! This is looking really good
arviz/tests/base_tests/test_data.py
Outdated
@@ -1246,6 +1249,86 @@ def test_empty_inference_data_object(self): | |||
assert not os.path.exists(filepath) | |||
|
|||
|
|||
class TestDataZarr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what would be the best approach here (pinging @canyon289 and @ahartikainen to see what they think), but we should either create a test_data_zarr.py
or add a class level skipif. As zarr is an optional dependency, we should make sure its test are skipped if it's not installed instead of failing the test suite. Something similar to what we do with numba: https://github.com/arviz-devs/arviz/blob/master/arviz/tests/base_tests/test_diagnostics_numba.py#L16 and bokeh https://github.com/arviz-devs/arviz/blob/master/arviz/tests/base_tests/test_plots_bokeh.py#L48
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Idea was quite easy to add 👍 Haven't created a new file though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Oriol that a pytest marker should be used to skip tests if zarr isnt present instead of failing the test suite.
A separate file would be nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pylint can be quite a headache, I hope this helps fixing it so we can merge :)
|
||
import numpy as np | ||
import pytest | ||
import zarr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work because it will raise an exception here before getting to the skipif
Thank you, I was starting do get crazy :) Well, now there is an azure error in precompile models... I don't think I changed something related to that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, now there is an azure error in precompile models... I don't think I changed something related to that.
You did not, let's see if it still happens tomorrow and if it does we'll try to see why it it, sometimes it's only azures fault or some release lag in related packages (i.e. tf probability releasing a couple days later than tf).
We do have to solve the import error I commented above. Currently test_data_zarr
imports zarr and then checks if its available in order to skip the tests. Therefore, if zarr is not installed, pylint will raise an import error instead of skipping the tests. It should be imported below and make pylint ignore the "wrong" order.
True, we can try to move zarr after the pytest statement and disable pylint. Not sure if this is the right approach though. pytestmark = pytest.mark.skipif( # pylint: disable=invalid-name
importlib.util.find_spec("zarr") is None and not running_on_ci(),
reason="test requires zarr which is not installed",
)
import zarr #pylint: disable=wrong-import-position |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to merge, thank you so much! I hope it was an enjoyable and enriching experience
I will wait until we figure out what is happening with CI and then merge
Yes it was quite telling, learned a lot about the azur pipeline. Thanks for taking your time 👍 |
to a hierarchical zarr group.
- added reference to zarr docs - replaced type with isInstance - replaced MemoryStore with TempStore
- Moved MutableMapping to top - Check if zarr is installed - Removed ifs observed_data,constant_data,predictions_constant_data - Renamed g to zarr_handle Removed depreciated docstring
Co-authored-by: Ari Hartikainen <ahartikainen@users.noreply.github.com>
It is working locally without errors
Co-authored-by: Oriol Abril-Pla <oriol.abril.pla@gmail.com>
Co-authored-by: Oriol Abril-Pla <oriol.abril.pla@gmail.com>
Thanks @semohr! |
) * Added a to_zarr method which converts the InferenceData object to a hierarchical zarr group. * Added a from_zarr method which create an inferenceData object from a zarr store. * Fixed small typo in to_zarr and from_zarr * Added the ability to create InferenceData from zarr.hierarchy.Group * Oversight in zarr.hierachy.group.groups() generator call. * Forgot dictionary definition. * Cleanup zarr: - added reference to zarr docs - replaced type with isInstance - replaced MemoryStore with TempStore * Added to methods to CHANGELOG * PR-Comments: - Moved MutableMapping to top - Check if zarr is installed - Removed ifs observed_data,constant_data,predictions_constant_data - Renamed g to zarr_handle Removed depreciated docstring * Removed typo * Replaced last occurence of g with zarr_handle * Added from packaging import version * Fixed wrong import order, I did not know this is a thing in python, wow * Even later import of version * Yet another try to fix the import order * Local pylint is working with this formatting... * Docstring formatting changes and removed deprecated parts. * Fixed local black version mismatch. * Update arviz/data/inference_data.py Co-authored-by: Ari Hartikainen <ahartikainen@users.noreply.github.com> * Added tests and docs intersphinx_mapping. * Improved test coverage for to_zarr and from_zarr functions. * Added pytest.mark.skipif to zarr test class * Moved test class to new file called 'test_data_zarr' * Fixed small pylint wrong-import-position error * Yet another import-order fix * Pylint is still black magic for me... It is working locally without errors * Update arviz/tests/base_tests/test_data_zarr.py Co-authored-by: Oriol Abril-Pla <oriol.abril.pla@gmail.com> * Removed running_on_ci * Reverted last change and removed running_on_ci in right place now * Moved zarr import down and added `# pylint: disable=wrong-import-position` * Update arviz/tests/base_tests/test_data_zarr.py Co-authored-by: Oriol Abril-Pla <oriol.abril.pla@gmail.com> Co-authored-by: Ari Hartikainen <ahartikainen@users.noreply.github.com> Co-authored-by: Oriol Abril-Pla <oriol.abril.pla@gmail.com>
Description
This pull request adds two methods to the 'InferenceData' class:
.from_zarr()
to construct an 'InferenceData' object via xarrays.open_zarr
method..to_zarr()
to convert a zarr 'store' or a path to an 'InferenceData' objectThese two methods are quite useful to save and restore 'InferenceData' objects to disk with zarr. Zarr is still experimental in xarray, but I think it could be a good addition.
For the implementation, I tried to use a similar approach as in the
from_dataframe
andto_dataframe
methods. I also tried to add the changes to the changelog, please take a look. I would appreciate if someone could assist me with writing tests for these methods.Best,
Sebastian
References
Zarr
Checklist