-
-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix DirectoryStore #773
Fix DirectoryStore #773
Conversation
Previous tests (now commented out) used logic in the store classes to convert "0/0" keys into "0.0" keys, forcing the store to be aware of array details. This tries to swap the logic so that stores are responsible for passing dimension separator values down to the arrays only. Since arrays can also get the dimension_separator value from a .zarray file they are now in charge.
Codecov Report
@@ Coverage Diff @@
## master #773 +/- ##
==========================================
- Coverage 99.94% 99.85% -0.09%
==========================================
Files 30 31 +1
Lines 10586 10613 +27
==========================================
+ Hits 10580 10598 +18
- Misses 6 15 +9
|
Build failures are unrelated. See #774 |
@@ -1144,10 +1165,10 @@ def test_chunk_nesting(self): | |||
# any path where last segment looks like a chunk key gets special handling | |||
store['0.0'] = b'xxx' | |||
assert b'xxx' == store['0.0'] | |||
assert b'xxx' == store['0/0'] | |||
# assert b'xxx' == store['0/0'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we loose there? Doesn't the store normalise "." -> "/" anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to describe in the commit message on e183566, at least to the best of my understanding:
Previous tests (now commented out) used logic in the store classes to convert "0/0" keys into "0.0" keys, forcing the store to be aware of array details. This tries to swap the logic so that stores are responsible for passing dimension separator values down to the arrays only. Since arrays can also get the dimension_separator value from a .zarray file they are now in charge.
That's merged. Would merge |
Hello @joshmoore! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-08-19 13:55:46 UTC |
Pushed more tests based on the blurb from @martindurant in #769. Note that without the fix in this PR these are the failures:
|
This reverts commit 68adca5.
@@ -1952,7 +1952,7 @@ def _process_for_setitem(self, ckey, chunk_selection, value, fields=None): | |||
return self._encode_chunk(chunk) | |||
|
|||
def _chunk_key(self, chunk_coords): | |||
return self._key_prefix + '.'.join(map(str, chunk_coords)) | |||
return self._key_prefix + self._dimension_separator.join(map(str, chunk_coords)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain how this change is compatible with FSStore._normalize_key
? https://github.com/zarr-developers/zarr-python/blob/master/zarr/storage.py#L1076
FSStore._normalize_key
assumes that all chunk keys are formatted foo/bar/0.0.0
-- this assumption is the basis of splitting the chunk key into a prefix and a chunk ID via key.split('/')
. As I understand it, this change breaks this assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the flat
and nested
fixtures from this repo (zarr.open(f"file:///tmp/{x}")[:]
) with some sloppy debugging in place shows:
| | Array._chunk_key | FSStore._normalize_keys |
|---------------|------------------|-------------------------|
| master:nested | (0, 0) --> 0.0 | 0.0 --> 0.0 |
| master:flat | (0, 0) --> 0.0 | 0.0 --> 0.0 |
| PR:nested | (0, 0) --> 0/0 | 0/0 --> 0/0 |
| PR:flat | (0, 0) --> 0.0 | 0.0 --> 0.0 |
which likely points to some logic in FSStore being ripe for removal since the Store is basically just accepting what what the Array has detected. Now, how it is that that's working with your PR, I still haven't figured out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, as your test shows this is fine for FSStore (and maybe we don't need this code in the store at all if the chunk keys come pre-normalized). But this situation is dire for N5Stores, which need to be able to re-order the chunk keys before writing to storage.
This allows N5 to detect the split between key and chunks and pre-process them (re-ordering and changing the separator). see: zarr-developers#773 zarr-developers#793
See the extended discussion with @d-v-b at https://gitter.im/zarr-developers/community?at=611e5f31a1ffab59400448cc. Current proposal is to have N5 stores act like they are still using |
Please see the additional evidence of zarr-developers/zarr_implementations#47 (and please ignore the code coverage failure behind the curtain... 🧙🏽 ) |
Taking @d-v-b's 👍 on #773 (comment) as a sign-off. Merging and starting to prep a release. |
* Drop skip_if_nested_chunks from test_storage.py * Add failing nested test * Make DirectoryStore dimension_separator aware * Migrate key logic to core rather than storage Previous tests (now commented out) used logic in the store classes to convert "0/0" keys into "0.0" keys, forcing the store to be aware of array details. This tries to swap the logic so that stores are responsible for passing dimension separator values down to the arrays only. Since arrays can also get the dimension_separator value from a .zarray file they are now in charge. * Fix linting in new test * Extend the test suite for dim_sep * add n5fsstore and tests * slightly smarter kwarg interception * remove outdated unittest ref and fix the name of a test func * fix massive string block and fix default key_separator kwarg for FSStore * flake8 * promote n5store to toplevel import and fix examples in docstring * Try fsspec 2021.7 (see #802) * Revert "Try fsspec 2021.7 (see #802)" This reverts commit 68adca5. * Add missing core tests for N5FSStore, and rchanges required for making them pass * tmp: debug * uncomment N5 chunk ordering test * more commented tests get uncommented * add dimension_separator to array metadata adaptor * Revert "tmp: debug" This reverts commit ee9cdbc. * Attempt failed: keeping '.' and switching * Revert "Attempt failed: keeping '.' and switching" This reverts commit 51b3109. * regex: attempt failed due to slight diff in files * Revert "regex: attempt failed due to slight diff in files" This reverts commit 3daea7c. * N5: use "." internally for dimension separation This allows N5 to detect the split between key and chunks and pre-process them (re-ordering and changing the separator). see: #773 #793 * move FSSpec import guard * remove os.path.sep concatenation in listdir that was erroring a test, and add a mea culpa docstring about the dimension_separator for n5 stores * resolve merge conflicts in favor of upstream * make listdir implementation for n5fsstore look more like fsstore's listdir, and add crucial lstrip * Update hexdigest tests for N5Stores to account for the presence of the dimension_separator keyword now present in metadata * Add tests for dimension_separator in array meta for N5Stores * N5FSStore: try to increase code coverage * Adds a test for the dimension_separator warning * uses the parent test_complex for listdir * "nocover" the import error since fsspec is ever present * flake8 * add chunk nesting test to N5FSStore test suite * make array_meta_key, group_meta_key, attrs_key private * N5FSStore: Remove ImportError test FSStore only throws ModuleNotFoundError on initialization rather than on import. Therefore N5FSStore does the same. If this *weren't* the case, then the import in zarr/init would need to test the import as well, which isn't the case. Co-authored-by: jmoore <josh@glencoesoftware.com> Co-authored-by: Josh Moore <j.a.moore@dundee.ac.uk>
fix #769
Permit DirectoryStore to open Zarr filesets which have been saved with
{"dimension_separator": "/"}
. Currently they silently fallback to using thefill_value
.TODO: