Blosc compressor: numThreads serialization fix #15
This API has the side effect of serializing the number of threads under the numThreads key, causing compatibility issues with other libraries like numcodecs.
Works as expected when combined with glencoesoftware/bioformats2raw#203; .zarray no longer contains a numThreads (or nthreads).
One other option might be to keep the getter and annotate it with @JsonIgnore (https://github.com/FasterXML/jackson-annotations/wiki/Jackson-Annotations#property-inclusion). That seemed to work locally at least (but agreed that more unit tests would be helpful in any case).
As I was wondering why this issue hadn't been reported earlier, I found out the compression constructors are not as strict as in
Support or not for unspecified key/value pairs in the
Given the impact, I would propose to focus the scope of this PR on fixing the serialization issue, so that the latest version of the library generates Zarr arrays whose metadata is compatible with the assumptions of the other reference libraries.
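To make the strictness question concrete, here is a minimal sketch of what strict versus lenient handling of unspecified key/value pairs in a blosc compressor map could look like. The class and method names are hypothetical and are not taken from this library or from numcodecs; the set of known keys is an assumption based on the parameters documented for the numcodecs Blosc codec (cname, clevel, shuffle, blocksize) plus the id key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not the actual parsing code of any Zarr library.
public class CompressorConfigSketch {

    // Assumed set of keys a blosc compressor map is expected to carry.
    static final Set<String> KNOWN_KEYS =
            Set.of("id", "cname", "clevel", "shuffle", "blocksize");

    // Strict reader: any unspecified key (such as numThreads) is an error.
    public static Map<String, Object> parseStrict(Map<String, Object> config) {
        for (String key : config.keySet()) {
            if (!KNOWN_KEYS.contains(key)) {
                throw new IllegalArgumentException("Unexpected compressor key: " + key);
            }
        }
        return new LinkedHashMap<>(config);
    }

    // Lenient reader: unspecified keys are silently dropped.
    public static Map<String, Object> parseLenient(Map<String, Object> config) {
        Map<String, Object> kept = new LinkedHashMap<>(config);
        kept.keySet().retainAll(KNOWN_KEYS);
        return kept;
    }
}
```

With a map containing numThreads, parseStrict throws while parseLenient returns the map without that key. A writer that never emits the extra key works with both kinds of reader, which is the argument for fixing the serialization side first.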
Last commits should implement @melissalinkert's suggestion of using the While adding new classes came across some inconsistencies/nomenclature questions:
@pedson: a heads up that once the header on the new (test) file is settled, we'll be getting this released unless you have a problem with it.
Starting the 0.4.1 🚋
Fixes #14
As described in the accompanying issues, changes in 0.4.0 are causing the addition of the key numThreads to the compression map under .zarray when using blosc.
This serialization change causes compatibility issues with the expectations of numcodecs - see https://numcodecs.readthedocs.io/en/stable/blosc.html#numcodecs.blosc.Blosc. A formal specification of the blosc codec and its supported configuration values is available in the Zarr v3 specification https://zarr-specs.readthedocs.io/en/latest/v3/codecs/blosc/v1.0.html#blosc-codec-version-1-0. Although it's not 100% clear whether this dictionary should be considered extensible, the number of threads used for compression is a writing concern which is fully independent of the reading/decompression mechanism (which might very well use a different number of threads), so it seems incorrect to serialize it in the first place.
Tracking down the source of the issue with @melissalinkert, it was found to be introduced in #4, more specifically via the getNumThreads getter method, which seems to be serialized under .zarray through the jackson-databind ObjectMapper API. 43cf7f3 proposes to remove the getter, which suffices to fix the issue while retaining the initial feature.
Opening for initial feedback. It is pretty clear that both the feature and the blosc serialization logic were missing some minimal unit tests, which would be great to introduce as part of this PR so that we don't create inadvertent regressions in the future.
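For readers unfamiliar with how a getter ends up in serialized output: by default, Jackson's ObjectMapper treats public getXxx() methods as properties named xxx. The sketch below illustrates that naming convention using java.beans.Introspector from the JDK as a stand-in, so it runs without Jackson on the classpath; it is not Jackson itself, and the two classes are hypothetical simplifications rather than the library's actual compressor.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: shows how a public getter becomes a discoverable property.
public class GetterDiscoverySketch {

    // Simplified stand-in for the 0.4.0 behaviour: numThreads has a public getter.
    public static class WithGetter {
        private final int numThreads = 1;
        public String getCname() { return "lz4"; }
        public int getClevel() { return 5; }
        public int getNumThreads() { return numThreads; }
    }

    // Simplified stand-in for the fix: the value is kept, the public getter is gone.
    public static class WithoutGetter {
        private final int numThreads = 1;
        public String getCname() { return "lz4"; }
        public int getClevel() { return 5; }
        // Package-private, non-getter name: not picked up as a bean property.
        int threadsForCompression() { return numThreads; }
    }

    // Returns the sorted names of all properties that have a public getter.
    public static Set<String> readableProperties(Class<?> type) {
        try {
            // Stop at Object so getClass() is not reported as a property.
            BeanInfo info = Introspector.getBeanInfo(type, Object.class);
            Set<String> names = new TreeSet<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null) {
                    names.add(pd.getName());
                }
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readableProperties(WithGetter.class));
        System.out.println(readableProperties(WithoutGetter.class));
    }
}
```

The first class exposes numThreads as a property and the second does not, even though both still hold the thread count internally. A unit test along these lines, run against the real ObjectMapper output, would be one way to guard against the regression.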