-
-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement dimension_separator
for Python storage classes (See #715)
#716
Conversation
…developers#715) * All top-level storage classes now take an optional `dimension_separator` parameter which defaults to `None`, but can also be `.` or `/`. * A ValueError is raised at normalization time if this is not the case. * `None`s are normalized to the default of `.` in all except the NestedDirectoryStore case. * The value is stored as `self._dimension_separator` on participating classes so that array creation can lookup the value. * This value deprecates the `key_separator` value from FSStore. * Wrapper classes like LRUCacheStore and ConsolidatedMetadataStore *do not* follow this pattern and instead rely on the value in the underlying store.
All hexdigest tests were failing due to updated array metadata. In the case of NestedDirectoryStore and N5Store, this is necessary. If the dimension_separator key is excluded from the .zarray JSON when None, then most standard tests continue to pass.
Latest commit reverts the proactive saving of |
Codecov Report
@@ Coverage Diff @@
## master #716 +/- ##
========================================
Coverage 99.94% 99.94%
========================================
Files 28 28
Lines 10328 10433 +105
========================================
+ Hits 10322 10427 +105
Misses 6 6
|
Hello @joshmoore! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-04-24 09:13:49 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Barring a few minor cleanups (and beating codecov into submission!), this is largely ready for review. Most of the changes are propagating constructor arguments and setting the appropriate defaults under various conditions. I've highlighted one particular issue in this review that might be worth discussion.
Any @zarr-developers/core-devs up for having a first pass?
self.key_separator = "." | ||
|
||
# Pass attributes to array creation | ||
self._dimension_separator = dimension_separator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This underscore attribute is now basically API for passing information to array creation. This may be worth re-evaluation.
Thanks for the suggestions, @Carreau. Changes pushed. From the call discussion, there were no objections to this work moving forward so please consider this a last call (incl. if adjustments to the spec wording from #715 are needed) The plan will be to get this merged and started preparing for a v2-specific 2.8.0 on top of which @Carreau can rebase his base store work (#612) Again: @zarr-developers/core-devs please speak up if there are any concerns. |
Moving forward with zarr-python 2.8.0 later today if there are no further comments. |
Note: one outstanding issue has been filed separately: #722 |
Re: zarr-developers/zarr-python#716 The Zarr version 2 spec has been extended to include the ability to choose the dimension separator in chunk name keys. The legal separators has been extended from {'.'} to {'.' '/'}. So now it is possible to use a key like "0/1/2/0" for chunk names. This PR implements this for NCZarr. The V2 spec now says that this separator can be set on a per-variable basis. For now, I have chosen to allow this be set only globally by adding a key named "ZARR.DIMENSION_SEPARATOR=<char>" in the .daprc/.dodsrc/ncrc file. Currently, the only legal separator characters are '.' (the default) and '/'. On writing, this key will only be written if its value is different than the default. This change caused problems because supporting a separator of '/' is difficult to parse when keys/paths use '/' as the path separator. A test case was added for this. Additionally, make nczarr be enabled default by default. This required some additional changes so that if zip and/or AWS S3 sdk are unavailable, then they are disabled for NCZarr. In addition the following unrelated changes were made. 1. Tested that pure-zarr mode could read an nczarr formatted store. 1. The .rc file handling now merges all known .rc files (.ncrc,.daprc, and .dodsrc) in that order and using those in HOME first, then in current directory. For duplicate entries, the later ones override the earlier ones. This change is to remove some of the conflicts inherent in the current .rc file load process. A set of test cases was also added. 1. Re-order tests in configure.ac and CMakeLists.txt so that if libcurl is not found then the other options that depend upon it properly are disabled. 1. I decided that xarray support should be enabled by default for pure zarr. In order to allow disabling, I added a new mode flag "noxarray". 1. Certain test in nczarr_test depend on use of .dodsrc. In order for these to work when testing in parallel, some inter-test dependencies needed to be added. 1. Improved authorization testing to use changes in thredds.ucar.edu
>=2.8.0 required for dimension_separator support: zarr-developers/zarr-python#716
Implements the new optional array metadata key,
dimension_separator
, from #715 for the Python library.close: #530
Details
All top-level storage classes now take an optional
dimension_separator
parameter which defaults to
None
, but can also be.
or/
.A ValueError is raised at normalization time if this is not the case.
None
s are normalized to the default of.
in all except theNestedDirectoryStore case.
The value is stored as
self._dimension_separator
on participating classesso that array creation can lookup the value.
This value deprecates the
key_separator
value from FSStore.Wrapper classes like LRUCacheStore and ConsolidatedMetadataStore do not
follow this pattern and instead rely on the value in the underlying store.
Discussion
The major issues I see with this is the propagation of "Yet Another Constructor Argument". A more complete proposal might take all array initialization parameters similar to those which might be stored in the entry-level metadata in v3.
Additionally, this only stores the flat separator (
.
) in.zarray
if the user actively requests it. Since the value is optional, implementations will default to that value anyway.TODO: