Using JPEG2000 for chunk compression #73
Comments
Hi Justin, the general approach to implementing a new compression codec is
to sub-class the numcodecs.Codec class and implement the methods encode(),
decode(), get_config(), from_config(), and also the codec_id attribute.
Docs here: http://numcodecs.readthedocs.io/en/stable/abc.html
To use the codec with zarr you need to register it with a call to
numcodecs.register_codec(cls). That just sets up the mapping from codec ID
to codec class. Docs here:
http://numcodecs.readthedocs.io/en/stable/registry.html
In terms of implementation, any of the existing codec classes is worth
looking at as an example. If you need to interface with external C code
then there are various options. The existing codecs like Zstd, LZ4 and Blosc
use Cython, but there are other ways to do it.
I don't know anything about JPEG encoding but very happy to learn more if
you find it useful.
…On Sat, 14 Apr 2018, 16:53 Justin Swaney, ***@***.***> wrote:
I've been using chunk compressed Zarr arrays for some neuroscience image
processing tasks, and it's been great so far. However, JPEG2000 might
perform better than lz4 or Zstd for my images. I'd like to use Zarr to
handle the image chunking with a JPEG2000 compressor, but I'm not sure if
this is possible. I realize that this feature isn't as general as numcodecs
would want, but I'm mostly asking what the steps would be to see if I
should even try.
|
Adding a JPEG2000 compression filter would be great. Know others use this compression for image data as well. FWIW we made some changes described in this comment, which should make wrapping a compressor pretty simple. Feel free to ask questions if you need any help. |
@joshmoore, @jakirkham, and I looked into this for a while today. @sofroniewn described how pyramidal-image support (cf. zarr-developers/zarr-specs#23) is implemented in napari:
Each resolution level is a sibling Dataset in a containing Zarr Group. Napari loads each resolution level as a Dask Array, and changes which resolution level it pulls chunks from based on the user's zoom level. That process works pretty well today, and longer-term we'd like to clean up and factor that pyramiding code out of Napari (which could have a cleaner interface to it, in addition to benefiting from more general community support of pyramiding).
Napari's main pain point is that the Zarr pyramids are e.g. 60x larger on disk than the pyramidal TIFF files that they originated as. The Zarr pyramids use Zarr's default Blosc compressor codec (which is likely bad at compressing image data), while the original TIFFs likely use JPEG2000 (which is quite good), so we think adding a JPEG2000 codec to numcodecs, and having Napari use that, will solve Napari's main issue with its Zarr pyramids.
@jakirkham started prototyping a JPEG2000 codec today; a nice thing is that the
Otherwise, we just need a good Python binding to a JPEG2000 codec. imageio, imagecodecs, and glymur were looked at. There was a mix of concerns about dependencies / installation hassle as well as API semantics (we need something shaped like
Dependency concerns could be mitigated by adding a pip qualifier (e.g. |
Imagecodecs includes a bytes<->numpy encoder and decoder for JPEG2000 based on the OpenJPEG library. I think it should be relatively easy to take the Cython code out of imagecodecs (BSD licensed) and adapt it for numcodecs. |
Thanks Christoph! 😄 Using that I wrote the following. This seems like what we would want for a first pass.
from numcodecs.abc import Codec
from numcodecs.compat import ensure_ndarray
from numcodecs.registry import register_codec
from imagecodecs import jpeg2k_encode, jpeg2k_decode

class JPEG2000(Codec):
    codec_id = "JPEG2000"

    def encode(self, buf):
        return jpeg2k_encode(ensure_ndarray(buf))

    def decode(self, buf):
        return jpeg2k_decode(ensure_ndarray(buf))

register_codec(JPEG2000)
This works for encoding. However we have an issue on decoding. Maybe there's something I'm missing above? 🙂
---------------------------------------------------------------------------
Jpeg2kError Traceback (most recent call last)
<ipython-input-6-7d1c93c78b4f> in <module>
----> 1 c.decode(c.encode(a))
<ipython-input-1-01776c52a8bc> in decode(self, buf)
13
14 def decode(self, buf):
---> 15 return jpeg2k_decode(ensure_ndarray(buf))
16
17
imagecodecs/_jpeg2k.pyx in imagecodecs._jpeg2k.jpeg2k_decode()
Jpeg2kError: opj_read_header failed |
I didn't try to reproduce this yet, but it looks like this simple roundtrip should work if the output of |
Thanks Christoph! Yeah was wondering about that too. So had tried with and without Sure let me provide a clear MRE. Sorry if I missed something, but how do we set the verbosity? |
Here's an MRE showing what I'm seeing. Happy to play with this more (adding verbosity and such) as is helpful 🙂
In [1]: import numpy as np
In [2]: a = np.arange(6, dtype="u4").reshape(2, 3)
In [3]: a
Out[3]:
array([[0, 1, 2],
       [3, 4, 5]], dtype=uint32)
In [4]: from imagecodecs import jpeg2k_encode, jpeg2k_decode
In [5]: b = jpeg2k_encode(a)
In [6]: b
Out[6]: bytearray(b'\x00\x00\x00\x0cjP \r\n\x87\n\x00\x00\x00\x14ftypjp2 \x00\x00\x00\x00jp2 \x00\x00\x00-jp2h\x00\x00\x00\x16ihdr\x00\x00\x00\x02\x00\x00\x00\x03\x00\x01\x1f\x07\x00\x00\x00\x00\x00\x0fcolr\x01\x00\x00\x00\x00\x00\x11\x00\x00\x00\x89jp2c\xffO\xffQ\x00)\x00\x00\x00\x00\x00\x03\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x1f\x01\x01\xffR\x00\x0c\x00\x00\x00\x01\x00\x00\x04\x04\x00\x01\xff\\\x00\x04@\x00\xffd\x00%\x00\x01Created by OpenJPEG version 2.3.1\xff\x90\x00\n\x00\x00\x00\x00\x00\x17\x00\x01\xff\x93\xc0\x00\x00\x00\xf8C\x0fwv\xff\xd9')
In [7]: len(b)
Out[7]: 214
In [8]: jpeg2k_decode(b)
---------------------------------------------------------------------------
Jpeg2kError Traceback (most recent call last)
<ipython-input-8-d3265f5af6b1> in <module>
----> 1 jpeg2k_decode(b)
imagecodecs/_jpeg2k.pyx in imagecodecs._jpeg2k.jpeg2k_decode()
Jpeg2kError: opj_read_header failed |
I see:
Not sure why OpenJPEG doesn't throw an error in |
Another thought: since this issue is about efficiently compressing image data, you might want to have a look at JPEG-LS via the CharLS library. There's also JPEG-XR (used commonly in CZI files), which also handles float32, but the |
Ah ok. So this is bad usage on my part. Thanks for clarifying Christoph! Should there be an error if the user supplies an unsupported type or are there situations where this might work? Great, thanks for the suggestions. Will check those out too. Yeah this is mostly about compression. Just trying to think what makes a reasonably generic/useful compressor here (given the different type support of these). Do you have any thoughts on this? 🙂 |
If this issue needs a champion, I think I can make a case for taking on this work (@jmswaney above was part of our lab). @jakirkham I'm not sure if you have a branch going that I should contribute to - if not, I'm fine with restarting. Regarding support for 32 and 64 bit integers, we also would only use 8 and 16 bits, so pragmatically, I'd vote for disallowing 32 and 64 bit integers early on in the encoding process and by trying to detect failures due to 32 and 64 bit integers in decoding and reporting them. If that plan seems workable, I'll go ahead and start work towards the goal of a pull request. |
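The "reject early" half of that plan could look roughly like the sketch below. The function name `reject_wide_integers` is hypothetical, and the rule it applies (refuse integer dtypes wider than 16 bits) is only the one proposed in the comment above, not a statement of what OpenJPEG or imagecodecs actually support.

```python
import numpy as np


def reject_wide_integers(buf):
    """Return buf as an array, refusing 32- and 64-bit integer data.

    Hypothetical helper: the 16-bit limit follows the proposal in this
    thread and is an assumption, not a documented library constraint.
    """
    arr = np.asarray(buf)
    # kind "i" is signed integer, "u" is unsigned; itemsize is in bytes.
    if arr.dtype.kind in "iu" and arr.dtype.itemsize > 2:
        raise ValueError(
            f"dtype {arr.dtype} is not supported; use an 8- or 16-bit integer type"
        )
    return arr


# 16-bit data passes through unchanged.
ok = reject_wide_integers(np.zeros((2, 3), dtype="uint16"))

# 32-bit data fails at encode time instead of at decode time.
try:
    reject_wide_integers(np.arange(6, dtype="u4").reshape(2, 3))
    failed_early = False
except ValueError:
    failed_early = True
```

Calling a check like this at the top of `encode()` would turn the confusing `opj_read_header failed` on decode into an immediate, readable error on encode.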
My use case for JPEG2000 is grayscale 3D stacks of JPEG2000 planes and my plan was to JPEG2000-encode each of the planes separately (and for 4 and 5D, stack over the first dimensions and encode over the last 2 dimensions), but an alternate would be to interpret arrays with 3 axes as Y, X and color if the size of the last axis was 3 (RGB) and encode as a color image. My gut tells me to avoid a heuristic that operates differently depending on the size of the last dimension and encode what might be a color image as three grayscale planes. This also has a side-effect of not requiring the LCMS library (see https://github.com/cgohlke/imagecodecs/tree/master/3rdparty/openjpeg in imagecodecs) which simplifies the build. I'd appreciate any feedback. |
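To make the "encode over the last 2 dimensions" idea concrete, here is a rough sketch. zlib is used as a stand-in for a 2D image encoder so the example runs without any JPEG2000 binding, and all function names are made up for illustration.

```python
import zlib

import numpy as np


def encode_plane(plane):
    # Stand-in for a real 2D encoder such as a JPEG2000 binding.
    return zlib.compress(np.ascontiguousarray(plane).tobytes())


def decode_plane(data, shape, dtype):
    return np.frombuffer(zlib.decompress(data), dtype=dtype).reshape(shape)


def encode_planes(arr):
    """Encode each trailing 2D plane of an ND array separately."""
    arr = np.asarray(arr)
    if arr.ndim < 2:
        raise ValueError("need at least 2 dimensions")
    # Collapse all leading axes so we can loop over (height, width) planes.
    planes = arr.reshape((-1,) + arr.shape[-2:])
    return [encode_plane(p) for p in planes]


def decode_planes(encoded, shape, dtype):
    """Inverse of encode_planes; shape and dtype must be known to the caller."""
    planes = [decode_plane(d, shape[-2:], dtype) for d in encoded]
    return np.stack(planes).reshape(shape)


stack = np.arange(2 * 3 * 4 * 5, dtype="uint16").reshape(2, 3, 4, 5)
encoded = encode_planes(stack)
roundtrip = decode_planes(encoded, stack.shape, stack.dtype)
```

An array whose last axis has size 3 is treated like any other here, which matches the preference above for avoiding a color heuristic.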
Thanks for offering to help here Lee! 😄 Unfortunately I don't have an existing branch, but I think the code in comment ( #73 (comment) ) should be a good starting point and likely pretty close to what we need here. So would see if you can get that to run and go from there. Please let us know if you have any questions 🙂 |
Looks like @d-v-b did some work on a JPEG codec ( https://github.com/d-v-b/zarr-jpeg ). Not sure if JPEG2000 is considered there as well |
I think it's not. I started work on the codec, have had to pause it recently. |
Does |
For my own purposes the codec registration api in numcodecs sufficed perfectly |
@d-v-b : just to clarify, you mean numcodecs API worked for you, not imagecodecs, right? Thinking through some of the recent conversations with @DennisHeimbigner, if we're going to lean on imagecodecs for JPEG2000 support, we may want to go about defining an ID for it in this repo a la #278 cc: @cgohlke |
Not a bad idea, but imagecodecs does already provide unambiguous numcodecs IDs for all the classes it registers - I would not suggest changing them (although adding aliases would be fine). The current list I get in my installation:
|
Correct, I defined a jpeg compressor and registered it with the numcodecs I should add that there's complexity involved in compressing 3D+ data with 2D codecs. You will almost certainly want to generate a 2D tiled version of the ND data, and compress that, but this requires codec metadata that defines the ND -> 2D transformation. I have not implemented this to my satisfaction. |
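One simple way to picture that ND -> 2D step is sketched below: stack every trailing 2D plane vertically into one tall image and keep the original shape as metadata so the transformation can be undone. This is only an illustration with hypothetical function names; it is not how zarr-jpeg does it, and a real codec would have to store the metadata in its config.

```python
import numpy as np


def to_2d(arr):
    """Stack the trailing 2D planes of an ND array into one tall 2D image."""
    arr = np.asarray(arr)
    if arr.ndim < 2:
        raise ValueError("need at least 2 dimensions")
    width = arr.shape[-1]
    tiled = arr.reshape(-1, width)
    # The original shape is the metadata needed to undo the transformation.
    meta = {"shape": list(arr.shape)}
    return tiled, meta


def from_2d(tiled, meta):
    """Undo to_2d using the stored metadata."""
    return np.asarray(tiled).reshape(tuple(meta["shape"]))


volume = np.arange(3 * 4 * 5, dtype="uint8").reshape(3, 4, 5)
image, meta = to_2d(volume)
restored_volume = from_2d(image, meta)
```

The 2D image is what a 2D codec would compress. Without `meta`, a decoder only knows the tall image's shape and cannot tell how many planes it contains.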
Hello all! Based on the initial investigations of @cgohlke and @jakirkham on this thread along with some of our own by @muhanadz we have released, heavily inspired by the existing work from @d-v-b, a Zarr JPEG-2000 codec using imagecodecs and by extension OpenJPEG: Any and all feedback welcome! Similar to the discussion on d-v-b/zarr-jpeg#1, our primary motivation for the codec is the compression of interleaved RGB bright-field whole slide imaging data. |
@martindurant what would one need to do to add an entrypoint to use zarr-jpeg2k above? |
An entrypoint needs to be registered roughly of the form:
|
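The snippet from that comment did not survive on this page, so the following is not the original. As a rough sketch only: to my understanding numcodecs looks up entry points in the `numcodecs.codecs` group, with the entry name matching the codec ID. The package, module, and class names below are hypothetical placeholders.

```python
# Sketch of an entry-point table that a package could pass to
# setuptools.setup(entry_points=...). Everything on the right-hand side of
# the "=" is a hypothetical placeholder, and so is the codec ID on the left.
ENTRY_POINTS = {
    "numcodecs.codecs": [
        # "<codec_id> = <importable.module>:<CodecClass>"
        "my_jpeg2k = my_package.codecs:MyJpeg2k",
    ],
}

# Split the entry into the ID numcodecs would look up and the import target.
codec_id, target = (part.strip() for part in ENTRY_POINTS["numcodecs.codecs"][0].split("="))
module_name, class_name = target.split(":")
```

With an entry like this installed, looking up the codec ID would import the class on demand, so users would not need to call `register_codec` themselves.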
I read further up the thread and deleted my comment... I am a little confused. Why is there a different package for jpeg2k as a numcodecs codec, which calls imagecodecs, when imagecodecs already has one? All the codecs there can be registered with numcodecs by calling |
For the time being I decided to distribute the numcodecs entry points as a separate package: |
That sounds reasonable, @cgohlke . Unfortunately, it doesn't have a conda package. |
It looks like the work on integrating jpeg2000 was abandoned. Is there any progress on this I'm missing? This is the only numcodecs thread I found related to this work. |
jpeg2000 is included in imagecodecs, which has numcodecs wrappers |
Thanks. I guess I missed that in the docs. I'm trying to figure out how to use j2k as the compression scheme in a zarr file. |
I believe so long as you have https://pypi.org/project/imagecodecs-numcodecs/ installed, "imagecodecs_jpeg2k" will be an available codec without further effort. |