
Add N5 Support #309

Merged: 76 commits into zarr-developers:master on Feb 26, 2019

Conversation

funkey
Contributor

@funkey funkey commented Oct 18, 2018

This adds support for reading from and writing to N5 containers. The N5Store handling the conversion between the zarr and N5 formats will automatically be selected whenever the path for a container ends in .n5 (similar to how the ZipStore is used for files ending in .zip).
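The extension-based dispatch can be sketched roughly like this (a simplified illustration of the idea, not zarr's actual implementation; the helper name and return values below are hypothetical):

```python
import os

def select_store(path):
    """Pick a store kind based on the path suffix.

    Simplified sketch of zarr's extension-based store selection;
    the real logic lives inside zarr's store normalization.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".n5":
        return "N5Store"         # transparent zarr <-> N5 conversion
    elif ext == ".zip":
        return "ZipStore"        # zip-file-backed store
    else:
        return "DirectoryStore"  # plain directory of chunk files

print(select_store("volume.n5"))  # N5Store
```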

The conversion is done mostly transparently, with one exception being the N5ChunkWrapper: This is a Codec with id n5_wrapper that will automatically be wrapped around the requested compressor. For example, if you create an array with a zlib compressor, in fact the n5_wrapper codec will be used that delegates to the zlib codec internally. The additional codec was necessary to introduce N5's chunk headers and ensure big endian storage.
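For reference, the N5 default-mode block header stores a mode flag, the number of dimensions, and the per-dimension block size as big-endian integers ahead of the compressed payload. A minimal sketch of packing such a header with the standard library (field layout per my reading of the N5 format; the helper itself is illustrative, not the actual `N5ChunkWrapper` code):

```python
import struct

def pack_n5_header(block_shape, mode=0):
    """Pack an N5 default-mode block header: big-endian uint16 mode,
    uint16 ndim, then one uint32 size per dimension."""
    ndim = len(block_shape)
    header = struct.pack(">HH", mode, ndim)          # mode, ndim
    header += struct.pack(">%dI" % ndim, *block_shape)  # per-dim sizes
    return header

hdr = pack_n5_header((64, 64, 64))
print(len(hdr))  # 2 + 2 + 3*4 = 16 bytes
```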

On a related note, gzip-compressed N5 arrays cannot currently be read, since numcodecs treats zlib and gzip as synonyms, which they are not (their compression headers differ). PR zarr-developers/numcodecs#87 solves this issue.
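The header difference is easy to see with the standard library: raw zlib output starts with a two-byte zlib header (first byte 0x78 at default settings), while gzip streams start with the magic bytes 0x1f 0x8b:

```python
import zlib
import gzip

data = b"some chunk bytes"

z = zlib.compress(data)  # zlib wrapper: 2-byte header + deflate + adler32
g = gzip.compress(data)  # gzip wrapper: 10-byte header + deflate + crc32

print(z[:1].hex())  # '78'   -> zlib header byte
print(g[:2].hex())  # '1f8b' -> gzip magic
```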

See also https://github.com/zarr-developers/zarr/issues/231

TODO:

  • Add unit tests and/or doctests in docstrings
  • Unit tests and doctests pass locally under Python 3.6 (e.g., run tox -e py36 or
    pytest -v --doctest-modules zarr)
  • Unit tests pass locally under Python 2.7 (e.g., run tox -e py27 or
    pytest -v zarr)
  • PEP8 checks pass (e.g., run tox -e py36 or flake8 --max-line-length=100 zarr)
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Doctests in tutorial pass (e.g., run tox -e py36 or python -m doctest -o NORMALIZE_WHITESPACE -o ELLIPSIS docs/tutorial.rst)
  • Changes documented in docs/release.rst
  • Docs build locally (e.g., run tox -e docs)
  • AppVeyor and Travis CI pass
  • Test coverage to 100% (Coveralls passes)

funkey added 29 commits October 13, 2018 17:12
This requires numcodecs GZip support to be compatible with the N5
standard. Added with 410af66be0ea470d77b923e516d63cf5238114db to
numcodecs.
As we are only using the logger to issue warnings, just issue warnings
instead and drop the logger. Picked `RuntimeWarning`s for these two
warnings as they concern "dubious runtime behavior", namely using
compressors that are not explicitly supported by many N5
implementations to write N5 files.
Make sure that `N5Store` is raising `RuntimeWarnings` for some
compressors that are not widely supported.
As N5 stores attributes and metadata in the same JSON object, there are
some keys that are simply not allowed to be attributes as they would
conflict with the metadata (and possibly corrupt the data if they were
written in). The `N5Store` correctly raises for these cases, but this
was not being tested previously. Here we add a test to cover this case.
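Because N5 keeps array metadata and user attributes in the same attributes.json, the metadata keys must be off-limits as attribute names. A sketch of such a guard (the reserved keys below match N5's array metadata fields; the helper itself is illustrative, not zarr's actual code):

```python
# Keys N5 reserves for array metadata inside attributes.json.
N5_RESERVED = {"dimensions", "blockSize", "dataType", "compression"}

def check_attrs(attrs):
    """Raise if user attributes would collide with N5 metadata keys."""
    clashes = N5_RESERVED.intersection(attrs)
    if clashes:
        raise ValueError(
            "reserved N5 keys used as attributes: %s" % sorted(clashes))
    return attrs

check_attrs({"resolution": [4, 4, 40]})  # fine: no clash with metadata
```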
Check that valid `fill_values` work with N5Store-backed arrays and that
invalid `fill_values` raise.
We are explicitly setting the `blocksize` to `0` internally. So this
assertion is not a user-facing error. It would also be difficult to test
from the API. As we set the `blocksize` to `0` in a different function,
it does make sense to `assert` this here.
@jakirkham
Member

jakirkham commented Feb 23, 2019

Alright, I have updated all user-facing exceptions to be something other than asserts. Mostly these are `ValueError`s, but I am happy to change them if others have thoughts. A couple are still asserts, as they are actually internal checks.

Also played around with some sample data that @constantinpape and @hanslovsky have compiled in the zarr_implementations repo. This worked pretty well, but there were a few issues around LZ4 data from N5 (LZ4 support has been dropped from this PR). Otherwise I was able to load all other N5 files generated by N5 or z5py. More details about how this was tested, along with the results, can be found in this comment.

In my view, this code is ready to be merged. The LZ4 issues should be explored, but they can be placed in a separate issue. Would appreciate it if some other people did some final reviews of the PR.

Note: Previously this mentioned there were issues with XZ. Some recent commits have been pushed to support XZ as a special case of Numcodecs' LZMA, which works well. Also LZ4 support has been disabled in N5Store for now. Raised issue ( zarr-developers/numcodecs#175 ) about the LZ4 differences.

@jakirkham jakirkham mentioned this pull request Feb 23, 2019
Numcodecs' LZMA compressor is able to handle XZ compressed data as a
special case. This maps XZ compression in N5 to Zarr's LZMA compression
with the proper arguments as well as maps Zarr's LZMA support back to
N5's XZ support if the right options are set.
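The mapping relies on the fact that the xz container is one of the formats Python's lzma module (which backs Numcodecs' LZMA codec) can emit directly; a quick stdlib check (codec parameters here are illustrative, not the exact options the PR uses):

```python
import lzma

data = b"N5 block payload"

# FORMAT_XZ produces a standard .xz stream, recognizable by its magic bytes.
compressed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)

print(compressed[:6] == b"\xfd7zXZ\x00")    # True: xz magic bytes
print(lzma.decompress(compressed) == data)  # True: round-trips
```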
Adds some tests for XZ support in N5 using Numcodecs' LZMA compressor
with specific options. Also tests LZMA options that are not currently
supported by N5 to ensure they still warn and are handled correctly.
There is not a clear mapping between LZ4 in the Zarr and N5
implementations at present. So this drops LZ4 from the supported codecs
in `N5Store` for now.
@constantinpape

As far as I understand it, this is not fully compatible with the n5 format yet, because it doesn't resize/clip edge chunks. This will make datasets whose edges don't align with chunks unreadable in n5-java (and, for that matter, also in z5py). See also the example by @hanslovsky above.
If I remember correctly, @funkey and @axtimwalde were discussing whether this should be changed in the n5 spec; until this is decided, maybe delay merging this?
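For reference, cropping a trailing block to the array bounds is a small computation; a sketch of what readers expecting clipped edge chunks do (illustrative, not code from either implementation):

```python
def edge_block_shape(array_shape, block_shape, block_index):
    """Shape of the block at `block_index`, cropped to the array bounds."""
    return tuple(
        min(b, s - i * b)
        for s, b, i in zip(array_shape, block_shape, block_index)
    )

# A 100-long axis with 32-long blocks: the last block holds only 4 elements.
print(edge_block_shape((100,), (32,), (3,)))  # (4,)
```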

@axtimwalde
Contributor

@constantinpape you're right that this only implements a subset of n5, i.e. it does not store arbitrary block sizes in any case and cannot deal with multiset types or other varlength-based tricks. As such, however, it is already useful. N5 has no spec that requires blocks to have any specific size. N5-ImgLib2 until recently expected that trailing blocks are cropped to size, but I have changed this there (saalfeldlab/n5-imglib2@dd26251) because this seemed easier than supporting variable-size blocks in zarr. The corresponding N5-ImgLib2 release is 3.2.0: https://github.com/saalfeldlab/n5-imglib2/tree/3.2.0.

@jakirkham
Member

That's very cool! Thanks for sharing that, @axtimwalde. Is N5-ImgLib2 the primary way you are reading in N5 files or are there other libraries we should also be thinking about?

I understand some things are still not supported, @constantinpape, but in my view this is a first pass, which will be easier to iterate on once merged. Raising issues for additional functionality is always possible and can be more easily pursued in parallel once this is integrated.

Also have talked to several people already who are pip installing this branch as opposed to a normal Zarr release. While it's nice that they are able to get started so easily, this is not great from a support perspective. For instance they may be missing other bug fixes that are in master or they may be getting a different product each time they (or their colleagues) install. A release would really help set a baseline for how things work going forward.

@alimanfoo
Member

alimanfoo commented Feb 24, 2019 via email

@axtimwalde
Contributor

N5-ImgLib2 is the preferred way to load actual tensor data, but we also use the core block loading/storing API directly; e.g. multiset pixels are still handled elsewhere, and Paintera uses it for indexing and agglomeration.

Thanks to @alimanfoo for the support to add this as an experimental feature. I also like the idea of decoupling the backend from the access API. I do indeed have a bunch of N5 containers sitting on Google Cloud, and our stitching pipeline https://github.com/saalfeldlab/stitching-spark is meant to run on AWS, loading from and exporting to it.

@constantinpape

constantinpape commented Feb 25, 2019

I understand some things are still not supported, @constantinpape, but in my view this is a first pass, which will be easier to iterate on once merged. Raising issues for additional functionality is always possible and can be more easily pursued in parallel once this is integrated.

Fully agree. My concern was that this would not be compatible with core n5-java for edge chunks that do not align. @axtimwalde has clarified that this is not the case, so no objections to merge from my side.

@jakirkham
Member

Thanks for the info. Is it pretty easy for the other libraries to use the same edge chunk handling strategy?

Thanks @alimanfoo. Have already added some notes to the docs explaining this is experimental and subject to change.

The transformational layer is probably useful for a few cases and does sound interesting. That said, I lack the time to pursue this personally and am guessing @funkey is similarly busy. Could I suggest that you raise a new issue with some more info about what you have in mind? Maybe we can discuss this a bit more after this is merged.

@alimanfoo
Member

The transformational layer is probably useful for a few cases and does sound interesting. That said, I lack the time to pursue this personally and am guessing @funkey is similarly busy. Could I suggest that you raise a new issue with some more info about what you have in mind? Maybe we can discuss this a bit more after this is merged.

SGTM, have raised zarr-developers/n5py#9. Hopefully that makes sense, but happy to elaborate.

@jakirkham jakirkham merged commit a780691 into zarr-developers:master Feb 26, 2019
@jakirkham jakirkham added this to the v2.3 milestone Feb 26, 2019
@jakirkham
Member

Thanks everyone! 🍾 🎉 😄

As it sounds like we are ok with this as is for now, have gone ahead and merged it. If you encounter any problems, please feel free to raise an issue and we can discuss how best to address it.

@axtimwalde
Contributor

Thanks a lot everybody!

@jakirkham jakirkham mentioned this pull request Feb 26, 2019