Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement dimension_separator for Python storage classes (See #715) #716

Merged
merged 17 commits into from
Apr 24, 2021

Conversation

joshmoore
Copy link
Member

@joshmoore joshmoore commented Apr 15, 2021

Implements the new optional array metadata key, dimension_separator, from #715 for the Python library.

close: #530

Details

  • All top-level storage classes now take an optional dimension_separator
    parameter which defaults to None, but can also be . or /.

  • A ValueError is raised at normalization time if this is not the case.

  • Nones are normalized to the default of . in all except the
    NestedDirectoryStore case.

  • The value is stored as self._dimension_separator on participating classes
    so that array creation can lookup the value.

  • This value deprecates the key_separator value from FSStore.

  • Wrapper classes like LRUCacheStore and ConsolidatedMetadataStore do not
    follow this pattern and instead rely on the value in the underlying store.

Discussion

  • The major issues I see with this is the propagation of "Yet Another Constructor Argument". A more complete proposal might take all array initialization parameters similar to those which might be stored in the entry-level metadata in v3.

  • Additionally, this only stores the flat separator (.) in .zarray if the user actively requests it. Since the value is optional, implementations will default to that value anyway.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

…developers#715)

* All top-level storage classes now take an optional `dimension_separator`
  parameter which defaults to `None`, but can also be `.` or `/`.

* A ValueError is raised at normalization time if this is not the case.

* `None`s are normalized to the default of `.` in all except the
  NestedDirectoryStore case.

* The value is stored as `self._dimension_separator` on participating classes
  so that array creation can lookup the value.

* This value deprecates the `key_separator` value from FSStore.

* Wrapper classes like LRUCacheStore and ConsolidatedMetadataStore *do not*
  follow this pattern and instead rely on the value in the underlying store.
All hexdigest tests were failing due to updated array metadata.
In the case of NestedDirectoryStore and N5Store, this is necessary.
If the dimension_separator key is excluded from the .zarray JSON
when None, then most standard tests continue to pass.
@joshmoore
Copy link
Member Author

Latest commit reverts the proactive saving of dimension_separator metadata in order to keep hexdigest() as stable as possible.

@codecov
Copy link

codecov bot commented Apr 15, 2021

Codecov Report

Merging #716 (a60a40a) into master (4d4d833) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           master     #716    +/-   ##
========================================
  Coverage   99.94%   99.94%            
========================================
  Files          28       28            
  Lines       10328    10433   +105     
========================================
+ Hits        10322    10427   +105     
  Misses          6        6            
Impacted Files Coverage Δ
zarr/hierarchy.py 100.00% <ø> (ø)
zarr/core.py 100.00% <100.00%> (ø)
zarr/creation.py 100.00% <100.00%> (ø)
zarr/meta.py 100.00% <100.00%> (ø)
zarr/n5.py 100.00% <100.00%> (ø)
zarr/storage.py 100.00% <100.00%> (ø)
zarr/tests/test_core.py 100.00% <100.00%> (ø)
zarr/tests/test_storage.py 100.00% <100.00%> (ø)
zarr/tests/test_util.py 100.00% <100.00%> (ø)
zarr/util.py 100.00% <100.00%> (ø)

@pep8speaks
Copy link

pep8speaks commented Apr 17, 2021

Hello @joshmoore! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-24 09:13:49 UTC

Copy link
Member Author

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Barring a few minor cleanups (and beating codecov into submission!), this is largely ready for review. Most of the changes are propagating constructor arguments and setting the appropriate defaults under various conditions. I've highlighted one particular issue in this review that might be worth discussion.

Any @zarr-developers/core-devs up for having a first pass?

zarr/storage.py Outdated Show resolved Hide resolved
self.key_separator = "."

# Pass attributes to array creation
self._dimension_separator = dimension_separator
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This underscore attribute is now basically API for passing information to array creation. This may be worth re-evaluation.

zarr/creation.py Outdated Show resolved Hide resolved
zarr/util.py Outdated Show resolved Hide resolved
@joshmoore
Copy link
Member Author

Thanks for the suggestions, @Carreau. Changes pushed.

From the call discussion, there were no objections to this work moving forward so please consider this a last call (incl. if adjustments to the spec wording from #715 are needed) The plan will be to get this merged and started preparing for a v2-specific 2.8.0 on top of which @Carreau can rebase his base store work (#612)

Again: @zarr-developers/core-devs please speak up if there are any concerns.

@joshmoore
Copy link
Member Author

Moving forward with zarr-python 2.8.0 later today if there are no further comments.

@joshmoore
Copy link
Member Author

Note: one outstanding issue has been filed separately: #722

@joshmoore joshmoore merged commit 2d0acfb into zarr-developers:master Apr 24, 2021
@joshmoore joshmoore deleted the dim-sep-impl branch April 24, 2021 09:37
DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this pull request Apr 25, 2021
Re: zarr-developers/zarr-python#716

The Zarr version 2 spec has been extended to include the ability
to choose the dimension separator in chunk name keys. The legal
separators has been extended from {'.'} to {'.' '/'}.  So now it
is possible to use a key like "0/1/2/0" for chunk names.

This PR implements this for NCZarr. The V2 spec now says that
this separator can be set on a per-variable basis. For now, I
have chosen to allow this be set only globally by adding a key
named "ZARR.DIMENSION_SEPARATOR=<char>" in the
.daprc/.dodsrc/ncrc file. Currently, the only legal separator
characters are '.' (the default) and '/'. On writing, this key
will only be written if its value is different than the default.
This change caused problems because supporting a separator of '/'
is difficult to parse when keys/paths use '/' as the path separator.
A test case was added for this.

Additionally, make nczarr be enabled default by default. This required
some additional changes so that if zip and/or AWS S3 sdk are unavailable,
then they are disabled for NCZarr.

In addition the following unrelated changes were made.

1. Tested that pure-zarr mode could read an nczarr formatted store.
1. The .rc file handling now merges all known .rc files (.ncrc,.daprc, and .dodsrc) in that order and using those in HOME first, then in current directory. For duplicate entries, the later ones override the earlier ones. This change is to remove some of the conflicts inherent in the current .rc file load process. A set of test cases was also added.
1. Re-order tests in configure.ac and CMakeLists.txt so that if libcurl
   is not found then the other options that depend upon it properly
   are disabled.
1. I decided that xarray support should be enabled by default for pure
   zarr. In order to allow disabling, I added a new mode flag "noxarray".
1. Certain test in nczarr_test depend on use of .dodsrc. In order for these
   to work when testing in parallel, some inter-test dependencies needed to
   be added.
1. Improved authorization testing to use changes in thredds.ucar.edu
melissalinkert added a commit to chris-allan/isyntax2raw that referenced this pull request May 5, 2021
>=2.8.0 required for dimension_separator support: zarr-developers/zarr-python#716
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

zarr.open_array does not properly recognize NestedDirectoryStore
3 participants