-
Notifications
You must be signed in to change notification settings - Fork 51
Parse zarr v2 #822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse zarr v2 #822
Conversation
… V2 and V3 formats
for more information, see https://pre-commit.ci
How
|
|
Let me know when you would like a review of this @neilSchroeder ! |
|
@TomNicholas I think it's ready for a review. |
TomNicholas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @neilSchroeder ! I mostly have a bunch of small gripes 😁
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #822 +/- ##
==========================================
+ Coverage 87.71% 88.31% +0.60%
==========================================
Files 35 35
Lines 1880 1968 +88
==========================================
+ Hits 1649 1738 +89
+ Misses 231 230 -1
🚀 New features to boost your workflow:
|
|
@TomNicholas I'm ready for another review whenever you've got time |
TomNicholas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thank you so much @neilSchroeder !
| strategy = get_strategy(zarr_array) | ||
| chunk_map = await strategy.get_chunk_mapping(zarr_array, path) | ||
|
|
||
| if not chunk_map: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually ignore me, I think what you're done here is good.
I'm a little unclear about order of operations and whether or not these scenarios are realistic.
These scenarios are definitely plausible.
Or maybe handled differently?
There might be a way to refactor to have a few fewer levels of functions, but this is good.
|
@TomNicholas what are the next steps here? Will this be merged whenever someone has time to do the next release? Do we need another reviewer? |
Checklist
docs/releases.mdHow
zarr.pyHandles Zarr V2 StoresZarrParsershould now support both Zarr V2 and V3 stores by normalizing V2 stores to appear as V3. This approach ensures that all parsers produce V3-compatible outputs, and confines modifications tozarr.py.V2 → V3 Normalization Strategy
The parser performs a two-part normalization:
1. Chunk Key Mapping (
get_chunk_mapping_prefix)For V2 arrays:
array_name/0,array_name/0.1.22. Metadata Conversion (
get_metadata())After converting V2 metadata to V3 using
_convert_array_metadata, we have to replace thechunk_key_encoding.V2ChunkKeyEncodingin the V3 metadatazarr/xarrayseesV2ChunkKeyEncoding, it requests chunks using V2-style paths:array/0DefaultChunkKeyEncoding,zarrrequests chunks using V3-style paths:array/c/0ManifestStore.get()expects V3-style paths and usesparse_manifest_index()to extract chunk coordinatesparse_manifest_index()requires the/c/component to correctly parse the pathAdditional metadata handling
_ARRAY_DIMENSIONSattribute or generated as{array_name}_dim_{i}zarr's standard V2→V3 migration utilitiesImplementation Notes
I'm not convinced I've done a particularly elegant implementation here, but adding another class for V2 parsing didn't seem like it would be particularly extensible. Very happy to hear thoughts on perhaps a better implementation.
@TomNicholas thank you very much for your feedback, it definitely helped me wrap my head around the right approach to take here.
Edit: I've done a bit of re-design to use a strategy pattern for dispatching to parsing v2 and v3 arrays. This should make future integrations of zarr array version parsing a lot more maintainable. This is also just a lot easier to read than my original implementation. Tests and documentation are also up to date.