Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DirectoryStore #773

Merged
merged 12 commits into from
Aug 20, 2021
Merged

Fix DirectoryStore #773

merged 12 commits into from
Aug 20, 2021

Conversation

joshmoore
Copy link
Member

fix #769

Permit DirectoryStore to open Zarr filesets which have been saved with {"dimension_separator": "/"}. Currently they silently fallback to using the fill_value.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

Previous tests (now commented out) used logic in the store
classes to convert "0/0" keys into "0.0" keys, forcing the
store to be aware of array details. This tries to swap the
logic so that stores are responsible for passing dimension
separator values down to the arrays only. Since arrays can
also get the dimension_separator value from a .zarray file
they are now in charge.
@codecov
Copy link

codecov bot commented Jun 14, 2021

Codecov Report

Merging #773 (a5f1811) into master (ce04aaa) will decrease coverage by 0.08%.
The diff coverage is 98.41%.

@@            Coverage Diff             @@
##           master     #773      +/-   ##
==========================================
- Coverage   99.94%   99.85%   -0.09%     
==========================================
  Files          30       31       +1     
  Lines       10586    10613      +27     
==========================================
+ Hits        10580    10598      +18     
- Misses          6       15       +9     
Impacted Files Coverage Δ
zarr/tests/test_dim_separator.py 97.56% <97.56%> (ø)
zarr/core.py 100.00% <100.00%> (ø)
zarr/n5.py 100.00% <100.00%> (ø)
zarr/storage.py 99.30% <100.00%> (-0.70%) ⬇️
zarr/tests/test_storage.py 100.00% <100.00%> (ø)

@joshmoore
Copy link
Member Author

Build failures are unrelated. See #774

@@ -1144,10 +1165,10 @@ def test_chunk_nesting(self):
# any path where last segment looks like a chunk key gets special handling
store['0.0'] = b'xxx'
assert b'xxx' == store['0.0']
assert b'xxx' == store['0/0']
# assert b'xxx' == store['0/0']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we loose there? Doesn't the store normalise "." -> "/" anyway?

Copy link
Member Author

@joshmoore joshmoore Jun 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to describe in the commit message on e183566, at least to the best of my understanding:

Previous tests (now commented out) used logic in the store classes to convert "0/0" keys into "0.0" keys, forcing the store to be aware of array details. This tries to swap the logic so that stores are responsible for passing dimension separator values down to the arrays only. Since arrays can also get the dimension_separator value from a .zarray file they are now in charge.

@jakirkham
Copy link
Member

Build failures are unrelated. See #774

That's merged. Would merge master back into this PR (as I don't think GH Actions does that itself)

@pep8speaks
Copy link

pep8speaks commented Jun 17, 2021

Hello @joshmoore! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-08-19 13:55:46 UTC

@joshmoore
Copy link
Member Author

Pushed more tests based on the blurb from @martindurant in #769. Note that without the fix in this PR these are the failures:

zarr/tests/test_dim_separator.py::test_nested[static_flat] FAILED
zarr/tests/test_dim_separator.py::test_nested[directory_flat] FAILED
zarr/tests/test_dim_separator.py::test_nested[directory_default] FAILED
zarr/tests/test_dim_separator.py::test_nested[fs_flat] FAILED
zarr/tests/test_dim_separator.py::test_nested[fs_default] FAILED

@joshmoore joshmoore marked this pull request as ready for review August 17, 2021 11:55
@joshmoore joshmoore mentioned this pull request Aug 17, 2021
5 tasks
@@ -1952,7 +1952,7 @@ def _process_for_setitem(self, ckey, chunk_selection, value, fields=None):
return self._encode_chunk(chunk)

def _chunk_key(self, chunk_coords):
return self._key_prefix + '.'.join(map(str, chunk_coords))
return self._key_prefix + self._dimension_separator.join(map(str, chunk_coords))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how this change is compatible with FSStore._normalize_key? https://github.com/zarr-developers/zarr-python/blob/master/zarr/storage.py#L1076

FSStore._normalize_key assumes that all chunk keys are formatted foo/bar/0.0.0 -- this assumption is the basis of splitting the chunk key into a prefix and a chunk ID via key.split('/'). As I understand it, this change breaks this assumption.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the flat and nested fixtures from this repo (zarr.open(f"file:///tmp/{x}")[:]) with some sloppy debugging in place shows:

|               | Array._chunk_key | FSStore._normalize_keys |
|---------------|------------------|-------------------------|
| master:nested |  (0, 0) --> 0.0  |       0.0 --> 0.0       |
| master:flat   |  (0, 0) --> 0.0  |       0.0 --> 0.0       |
| PR:nested     |  (0, 0) --> 0/0  |       0/0 --> 0/0       |
| PR:flat       |  (0, 0) --> 0.0  |       0.0 --> 0.0       |

which likely points to some logic in FSStore being ripe for removal since the Store is basically just accepting what what the Array has detected. Now, how it is that that's working with your PR, I still haven't figured out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, as your test shows this is fine for FSStore (and maybe we don't need this code in the store at all if the chunk keys come pre-normalized). But this situation is dire for N5Stores, which need to be able to re-order the chunk keys before writing to storage.

joshmoore added a commit to joshmoore/zarr-python that referenced this pull request Aug 19, 2021
This allows N5 to detect the split between key and chunks
and pre-process them (re-ordering and changing the separator).

see: zarr-developers#773 zarr-developers#793
@joshmoore
Copy link
Member Author

See the extended discussion with @d-v-b at https://gitter.im/zarr-developers/community?at=611e5f31a1ffab59400448cc. Current proposal is to have N5 stores act like they are still using "." internally in order to allow them to differentiate between Zarr-like keys and N5-like keys and then convert to "/"-based keys just before writing. A result of this is that you can hack access to the true N5 locations by using "/"-based keys as questioned by @martindurant in #773 (comment)

joshmoore added a commit to joshmoore/zarr_implementations that referenced this pull request Aug 19, 2021
@joshmoore
Copy link
Member Author

Please see the additional evidence of zarr-developers/zarr_implementations#47 (and please ignore the code coverage failure behind the curtain... 🧙🏽 )

@joshmoore
Copy link
Member Author

Taking @d-v-b's 👍 on #773 (comment) as a sign-off. Merging and starting to prep a release.

@joshmoore joshmoore merged commit da88aa3 into zarr-developers:master Aug 20, 2021
@joshmoore joshmoore deleted the fix-dstore branch August 20, 2021 13:33
@joshmoore joshmoore mentioned this pull request Aug 20, 2021
joshmoore added a commit that referenced this pull request Sep 19, 2021
* Drop skip_if_nested_chunks from test_storage.py

* Add failing nested test

* Make DirectoryStore dimension_separator aware

* Migrate key logic to core rather than storage

Previous tests (now commented out) used logic in the store
classes to convert "0/0" keys into "0.0" keys, forcing the
store to be aware of array details. This tries to swap the
logic so that stores are responsible for passing dimension
separator values down to the arrays only. Since arrays can
also get the dimension_separator value from a .zarray file
they are now in charge.

* Fix linting in new test

* Extend the test suite for dim_sep

* add n5fsstore and tests

* slightly smarter kwarg interception

* remove outdated unittest ref and fix the name of a test func

* fix massive string block and fix default key_separator kwarg for FSStore

* flake8

* promote n5store to toplevel import and fix examples in docstring

* Try fsspec 2021.7 (see #802)

* Revert "Try fsspec 2021.7 (see #802)"

This reverts commit 68adca5.

* Add missing core tests for N5FSStore, and rchanges required for making them pass

* tmp: debug

* uncomment N5 chunk ordering test

* more commented tests get uncommented

* add dimension_separator to array metadata adaptor

* Revert "tmp: debug"

This reverts commit ee9cdbc.

* Attempt failed: keeping '.' and switching

* Revert "Attempt failed: keeping '.' and switching"

This reverts commit 51b3109.

* regex: attempt failed due to slight diff in files

* Revert "regex: attempt failed due to slight diff in files"

This reverts commit 3daea7c.

* N5: use "." internally for dimension separation

This allows N5 to detect the split between key and chunks
and pre-process them (re-ordering and changing the separator).

see: #773 #793

* move FSSpec import guard

* remove os.path.sep concatenation in listdir that was erroring a test, and add a mea culpa docstring about the dimension_separator for n5 stores

* resolve merge conflicts in favor of upstream

* make listdir implementation for n5fsstore look more like fsstore's listdir, and add crucial lstrip

* Update hexdigest tests for N5Stores to account for the presence of the dimension_separator keyword now present in metadata

* Add tests for dimension_separator in array meta for N5Stores

* N5FSStore: try to increase code coverage

 * Adds a test for the dimension_separator warning
 * uses the parent test_complex for listdir
 * "nocover" the import error since fsspec is ever present

* flake8

* add chunk nesting test to N5FSStore test suite

* make array_meta_key, group_meta_key, attrs_key private

* N5FSStore: Remove ImportError test

FSStore only throws ModuleNotFoundError on initialization
rather than on import. Therefore N5FSStore does the same.
If this *weren't* the case, then the import in zarr/init
would need to test the import as well, which isn't the case.

Co-authored-by: jmoore <josh@glencoesoftware.com>
Co-authored-by: Josh Moore <j.a.moore@dundee.ac.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confusion about the dimension_separator keyword
5 participants