Using JPEG2000 for chunk compression #73

Open
jmswaney opened this issue Apr 14, 2018 · 32 comments
Labels
enhancement New codec Suggestion for a new codec

Comments

@jmswaney

I've been using chunk compressed Zarr arrays for some neuroscience image processing tasks, and it's been great so far. However, JPEG2000 might perform better than lz4 or Zstd for my images. I'd like to use Zarr to handle the image chunking with a JPEG2000 compressor, but I'm not sure if this is possible. I realize that this feature isn't as general as numcodecs would want, but I'm mostly asking what the steps would be to see if I should even try.

@alimanfoo
Member

alimanfoo commented Apr 14, 2018 via email

@jakirkham
Member

Adding a JPEG2000 compression filter would be great. Know others use this compression for image data as well.

FWIW we made some changes described in this comment, which should make wrapping a compressor pretty simple. Feel free to ask questions if you need any help.

@ryan-williams
Member

ryan-williams commented Feb 27, 2020

@joshmoore, @jakirkham, and I looked into this for a while today.

@sofroniewn described how pyramidal-image support (cf. zarr-developers/zarr-specs#23) is implemented in napari:

I have a zarr pyramid on s3://sofroniewn/image-data/camelyon16/ which came from https://camelyon16.grand-challenge.org/Data/ (there is a google drive with tiff if you poke around)

Each resolution-level is a sibling Dataset in a containing Zarr Group. Napari loads each resolution level as a Dask Array, and changes which resolution level it pulls chunks from based on the user's zoom level.
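As a rough illustration of the layout described above, here is a minimal sketch that uses a plain dict of NumPy arrays in place of a Zarr Group of sibling Datasets. The level names ("0", "1", ...), the 2x subsampling, and the `pick_level` rule are all assumptions made for the example; this is not Napari's actual code.

```python
import math

import numpy as np


def build_pyramid(base, n_levels):
    """Return {"0": full-res, "1": half-res, ...} using plain 2x subsampling."""
    levels = {}
    current = np.asarray(base)
    for i in range(n_levels):
        # Each level is stored under its own key, like sibling datasets in a group.
        levels[str(i)] = current
        current = current[::2, ::2]
    return levels


def pick_level(levels, zoom):
    """Pick a level name for a zoom factor (1.0 = full resolution, 0.5 = half)."""
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    # Hypothetical rule: go one level coarser for every halving of the zoom.
    level = int(math.floor(math.log2(1.0 / zoom))) if zoom < 1 else 0
    # Clamp to the levels that actually exist.
    level = min(max(level, 0), len(levels) - 1)
    return str(level)
```

With a real Zarr Group, each value would be a lazily loaded array rather than an in-memory one, so only the chunks of the chosen level would be read.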

That process works pretty well today, and longer-term we'd like to clean up and factor that pyramiding code out of Napari (which could have a cleaner interface to it, in addition to benefitting from more general community support of pyramiding).

Napari's main pain point is that the Zarr pyramids are e.g. 60x larger on disk than the pyramidal TIFF files that they originated as. The Zarr pyramids use Zarr's default Blosc compressor codec (which is likely bad at compressing image data), while the original TIFFs likely use JPEG2000 (which is quite good), so we think adding a JPEG2000 codec to numcodecs, and having Napari use that, will solve Napari's main issue with its Zarr pyramids.

@jakirkham started prototyping a JPEG2000 codec today; a nice thing is that the Codec interface receives an ndarray as input (we originally thought it only received a BytesLike, which would be hard to reconstruct image dimensions from, which JPEG2000 would need). One caveat is that filters can't be applied before the JPEG2000 codec, because then the latter would actually just receive a BytesLike; raising an error seems appropriate in this situation.
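The guard mentioned above could look roughly like the following sketch. The helper name `require_image_array` and the `min_ndim` parameter are hypothetical, and the check assumes that a chunk which has already passed through a filter arrives as raw bytes or as a flat 1-D buffer.

```python
import numpy as np


def require_image_array(buf, min_ndim=2):
    """Return buf unchanged if it still looks like an image chunk, else raise."""
    # Raw bytes carry no image dimensions, so an image codec cannot use them.
    if not isinstance(buf, np.ndarray):
        raise TypeError(
            f"expected a numpy.ndarray chunk, got {type(buf).__name__}"
        )
    # A flat 1-D array has also lost the (Y, X) layout.
    if buf.ndim < min_ndim:
        raise ValueError(
            f"expected at least {min_ndim} dimensions, got shape {buf.shape}"
        )
    return buf
```

A codec's `encode` method could call such a helper first, so that a misconfigured filter chain fails loudly instead of producing an unusable stream.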

Otherwise, we just need a good Python binding to a JPEG2000 codec. imageio, imagecodecs, and glymur were looked at. There was a mix of concerns about dependencies / installation hassle as well as API semantics (we need something shaped like Buffer ⇒ BytesLike not PathLike ⇒ PathLike).

Dependency concerns could be mitigated by adding a pip qualifier (e.g. pip install numcodecs[jpeg]), and some light fork to expose in-memory access to one of those projects could be undertaken, if necessary.

@cgohlke

cgohlke commented Feb 27, 2020

Imagecodecs includes a bytes<->numpy encoder and decoder for JPEG 2000 based on the OpenJPEG library. I think it should be relatively easy to take the Cython code out of imagecodecs (BSD licensed) and adapt it for numcodecs.

@jakirkham
Member

Thanks Christoph! 😄

Using that I wrote the following. This seems like what we would want for a first pass.

from numcodecs.abc import Codec
from numcodecs.compat import ensure_ndarray
from numcodecs.registry import register_codec

from imagecodecs import jpeg2k_encode, jpeg2k_decode


class JPEG2000(Codec):
    codec_id = "JPEG2000"

    def encode(self, buf):
        return jpeg2k_encode(ensure_ndarray(buf))

    def decode(self, buf):
        return jpeg2k_decode(ensure_ndarray(buf))


register_codec(JPEG2000)

This works for encoding. However we have an issue on decoding. Maybe there's something I'm missing above? 🙂

---------------------------------------------------------------------------
Jpeg2kError                               Traceback (most recent call last)
<ipython-input-6-7d1c93c78b4f> in <module>
----> 1 c.decode(c.encode(a))

<ipython-input-1-01776c52a8bc> in decode(self, buf)
     13 
     14     def decode(self, buf):
---> 15         return jpeg2k_decode(ensure_ndarray(buf))
     16 
     17 

imagecodecs/_jpeg2k.pyx in imagecodecs._jpeg2k.jpeg2k_decode()

Jpeg2kError: opj_read_header failed

@cgohlke

cgohlke commented Feb 27, 2020

I didn't try to reproduce this yet, but it looks like this simple roundtrip should work if the output of ensure_ndarray(buf) can be cast to uint8_t[::1] by Cython, which appears to be the case since otherwise the detection of the codecformat would likely fail. Please try passing the buf bytes directly to jpeg2k_decode and enable OpenJPEG error handling and warnings with verbose=3. What is the shape and dtype of the input a?

@jakirkham
Member

Thanks Christoph!

Yeah was wondering about that too. So had tried with and without ensure_ndarray just in case, but got the same error. Either way the data provided to jpeg2k_decode was something that could be cast to uint8_t[::1] as it was just the output of jpeg2k_encode.

Sure let me provide a clear MRE.

Sorry if I missed something, but how do we set the verbosity?

@jakirkham
Member

Here's an MRE showing what I'm seeing. Happy to play with this more (adding verbosity and such) as is helpful 🙂

In [1]: import numpy as np                                                      

In [2]: a = np.arange(6, dtype="u4").reshape(2, 3)                              

In [3]: a                                                                       
Out[3]: 
array([[0, 1, 2],
       [3, 4, 5]], dtype=uint32)

In [4]: from imagecodecs import jpeg2k_encode, jpeg2k_decode                    

In [5]: b = jpeg2k_encode(a)                                                    

In [6]: b                                                                       
Out[6]: bytearray(b'\x00\x00\x00\x0cjP  \r\n\x87\n\x00\x00\x00\x14ftypjp2 \x00\x00\x00\x00jp2 \x00\x00\x00-jp2h\x00\x00\x00\x16ihdr\x00\x00\x00\x02\x00\x00\x00\x03\x00\x01\x1f\x07\x00\x00\x00\x00\x00\x0fcolr\x01\x00\x00\x00\x00\x00\x11\x00\x00\x00\x89jp2c\xffO\xffQ\x00)\x00\x00\x00\x00\x00\x03\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x1f\x01\x01\xffR\x00\x0c\x00\x00\x00\x01\x00\x00\x04\x04\x00\x01\xff\\\x00\x04@\x00\xffd\x00%\x00\x01Created by OpenJPEG version 2.3.1\xff\x90\x00\n\x00\x00\x00\x00\x00\x17\x00\x01\xff\x93\xc0\x00\x00\x00\xf8C\x0fwv\xff\xd9')

In [7]: len(b)                                                                  
Out[7]: 214

In [8]: jpeg2k_decode(b)                                                        
---------------------------------------------------------------------------
Jpeg2kError                               Traceback (most recent call last)
<ipython-input-8-d3265f5af6b1> in <module>
----> 1 jpeg2k_decode(b)

imagecodecs/_jpeg2k.pyx in imagecodecs._jpeg2k.jpeg2k_decode()

Jpeg2kError: opj_read_header failed

@cgohlke

cgohlke commented Feb 27, 2020

I see: dtype=uint32. While JPEG 2000 allows sample precisions of up to 38 bits, OpenJPEG only supports up to 31, so 32 and 64 bit integers don't work. I obviously never fully tested these cases, only 8 and 16 bit. You can get the OpenJPEG warnings and errors as follows:

>>> b = jpeg2k_encode(a, verbose=3)
JPEG2K info: tile number 1 / 1
>>> jpeg2k_decode(b, verbose=3)
JPEG2K info: Start to read j2k main header (85).
imagecodecs._jpeg2k.Jpeg2kError: Invalid values for comp = 0 : prec=32 (should be between 1 and 38 according to the JPEG2000 norm. OpenJpeg only supports up to 31)
Exception ignored in: 'imagecodecs._jpeg2k.j2k_error_callback'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
imagecodecs._jpeg2k.Jpeg2kError: Invalid values for comp = 0 : prec=32 (should be between 1 and 38 according to the JPEG2000 norm. OpenJpeg only supports up to 31)
imagecodecs._jpeg2k.Jpeg2kError: Marker handler function failed to read the marker segment
Exception ignored in: 'imagecodecs._jpeg2k.j2k_error_callback'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
imagecodecs._jpeg2k.Jpeg2kError: Marker handler function failed to read the marker segment
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "imagecodecs\_jpeg2k.pyx", line 390, in imagecodecs._jpeg2k.jpeg2k_decode
imagecodecs._jpeg2k.Jpeg2kError: opj_read_header failed

Not sure why OpenJPEG doesn't throw an error in jpeg2k_encode. Maybe OpenJPEG does create a valid JPEG 2000 stream, but can't decode it...

@cgohlke

cgohlke commented Feb 27, 2020

Another thought: since this issue is about efficiently compressing image data, you might want to have a look at JPEG-LS via the CharLS library. There's also JPEG-XR (used commonly in CZI files), which also handles float32, but the jxrlib library is not so nice to work with. Imagecodecs supports both, but I never benchmarked the codecs/implementations. None of these formats support 32 or 64 bit integers.

@jakirkham
Member

Ah ok. So this is bad usage on my part. Thanks for clarifying Christoph! Should there be an error if the user supplies an unsupported type or are there situations where this might work?

Great, thanks for the suggestions. Will check those out too. Yeah this is mostly about compression. Just trying to think what makes a reasonably generic/useful compressor here (given the different type support of these). Do you have any thoughts on this? 🙂

@LeeKamentsky

If this issue needs a champion, I think I can make a case for taking on this work (@jmswaney above was part of our lab). @jakirkham I'm not sure if you have a branch going that I should contribute to - if not, I'm fine with restarting. Regarding support for 32 and 64 bit integers, we also would only use 8 and 16 bits, so pragmatically, I'd vote for disallowing 32 and 64 bit integers early in the encoding process, and for trying to detect and report decoding failures caused by 32 and 64 bit integers.

If that plan seems workable, I'll go ahead and start work towards the goal of a pull request.
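The early rejection proposed above might be sketched as follows. The helper name `check_jpeg2k_dtype` is hypothetical, and the accepted set (unsigned 8 and 16 bit only) is an assumption based on the cases mentioned in this thread, not a statement of what OpenJPEG can handle.

```python
import numpy as np


def check_jpeg2k_dtype(arr):
    """Return arr as an ndarray if its dtype is accepted, else raise ValueError."""
    arr = np.asarray(arr)
    # Assumption: accept only unsigned integers of 1 or 2 bytes (uint8, uint16).
    if not (arr.dtype.kind == "u" and arr.dtype.itemsize in (1, 2)):
        raise ValueError(
            f"unsupported dtype {arr.dtype}; only uint8 and uint16 are accepted"
        )
    return arr
```

Calling this at the top of `encode` would turn the confusing `opj_read_header failed` seen on decoding into an immediate, descriptive error on encoding.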

@LeeKamentsky

My use case for JPEG2000 is grayscale 3D stacks of JPEG2000 planes, and my plan was to JPEG2000-encode each of the planes separately (and for 4D and 5D, stack over the first dimensions and encode over the last 2 dimensions). An alternative would be to interpret arrays with 3 axes as Y, X and color if the size of the last axis is 3 (RGB) and encode them as a color image.

My gut tells me to avoid a heuristic that operates differently depending on the size of the last dimension and encode what might be a color image as three grayscale planes. This also has a side-effect of not requiring the LCMS library (see https://github.com/cgohlke/imagecodecs/tree/master/3rdparty/openjpeg in imagecodecs) which simplifies the build.

I'd appreciate any feedback.
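A minimal sketch of the per-plane approach described above: all leading axes are folded into one stack axis and each trailing 2D plane is encoded on its own. The function names are hypothetical, and zlib stands in for a real JPEG2000 encoder purely so the example runs without extra dependencies.

```python
import zlib

import numpy as np


def encode_planes(arr, encode_plane):
    """Encode each trailing 2D plane of arr separately; return a list of blobs."""
    arr = np.asarray(arr)
    if arr.ndim < 2:
        raise ValueError("need at least 2 dimensions")
    # Fold all leading axes into one; the last two axes are treated as (Y, X).
    planes = arr.reshape((-1,) + arr.shape[-2:])
    return [encode_plane(np.ascontiguousarray(p)) for p in planes]


def decode_planes(blobs, decode_plane, shape, dtype):
    """Decode the blobs and restore the original N-dimensional shape."""
    planes = [decode_plane(b, tuple(shape[-2:]), dtype) for b in blobs]
    return np.stack(planes).reshape(shape)


def zlib_encode_plane(plane):
    # Stand-in encoder (assumption): a real codec would emit a JPEG2000 stream.
    return zlib.compress(plane.tobytes())


def zlib_decode_plane(blob, plane_shape, dtype):
    # Stand-in decoder matching zlib_encode_plane.
    return np.frombuffer(zlib.decompress(blob), dtype=dtype).reshape(plane_shape)
```

Note that the decoder needs the original shape and dtype, so a real codec would have to store them in its configuration or in a small header in front of the encoded planes.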

@jakirkham
Member

Thanks for offering to help here Lee! 😄

Unfortunately I don't have an existing branch, but I think the code in comment ( #73 (comment) ) should be a good starting point and likely pretty close to what we need here. So would see if you can get that to run and go from there. Please let us know if you have any questions 🙂

@jakirkham
Member

Looks like @d-v-b did some work on a JPEG codec ( https://github.com/d-v-b/zarr-jpeg ). Not sure if JPEG2000 is considered there as well

@LeeKamentsky

I think it's not. I started work on the codec, but have had to pause it recently.

@martindurant
Member

Does imagecodecs.numcodecs.register_codecs() suffice now to cover needs here?

@jakirkham
Member

^ @d-v-b @LeeKamentsky @joshmoore

@d-v-b
Contributor

d-v-b commented Jul 27, 2021

For my own purposes the codec registration api in numcodecs sufficed perfectly

@joshmoore
Member

@d-v-b : just to clarify, you mean numcodecs API worked for you, not imagecodecs, right?

Thinking through some of the recent conversations with @DennisHeimbigner, if we're going to lean on imagecodecs for JPEG2000 support, we may want to go about defining an ID for it in this repo a la #278

cc: @cgohlke

@martindurant
Member

Not a bad idea, but imagecodecs does already provide unambiguous numcodecs IDs for all the classes it registers - I would not suggest changing them (although adding aliases would be fine).

The current list I get in my installation:

['imagecodecs_aec',
 'imagecodecs_avif',
 'imagecodecs_bitorder',
 'imagecodecs_bitshuffle',
 'imagecodecs_blosc',
 'imagecodecs_brotli',
 'imagecodecs_bz2',
 'imagecodecs_deflate',
 'imagecodecs_delta',
 'imagecodecs_float24',
 'imagecodecs_floatpred',
 'imagecodecs_gif',
 'imagecodecs_jpeg',
 'imagecodecs_jpeg2k',
 'imagecodecs_jpegls',
 'imagecodecs_jpegxr',
 'imagecodecs_lerc',
 'imagecodecs_ljpeg',
 'imagecodecs_lz4',
 'imagecodecs_lz4f',
 'imagecodecs_lzf',
 'imagecodecs_lzma',
 'imagecodecs_lzw',
 'imagecodecs_packbits',
 'imagecodecs_png',
 'imagecodecs_snappy',
 'imagecodecs_tiff',
 'imagecodecs_webp',
 'imagecodecs_xor',
 'imagecodecs_zfp',
 'imagecodecs_zlib',
 'imagecodecs_zopfli',
 'imagecodecs_zstd']

@d-v-b
Contributor

d-v-b commented Aug 3, 2021

@d-v-b : just to clarify, you mean numcodecs API worked for you, not imagecodecs, right?

Correct, I defined a jpeg compressor and registered it with the numcodecs register_codec function.

I should add that there's complexity involved in compressing 3D+ data with 2D codecs. You will almost certainly want to generate a 2D tiled version of the ND data, and compress that, but this requires codec metadata that defines the ND -> 2D transformation. I have not implemented this to my satisfaction.
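To make the ND -> 2D idea above concrete, here is a minimal sketch of one possible layout: the trailing 2D planes are stacked vertically into a single tall image, and the original shape is kept as metadata so the transformation can be undone. The function names and the `{"shape": ...}` metadata format are assumptions for illustration; other tilings (e.g. a grid mosaic) would need richer metadata.

```python
import numpy as np


def nd_to_2d(arr):
    """Stack the trailing 2D planes of arr vertically; return (mosaic, meta)."""
    arr = np.asarray(arr)
    if arr.ndim < 2:
        raise ValueError("need at least 2 dimensions")
    # The metadata records what is needed to invert the reshape.
    meta = {"shape": list(arr.shape)}
    # Fold every axis except the last into the row axis.
    mosaic = arr.reshape(-1, arr.shape[-1])
    return mosaic, meta


def from_2d(mosaic, meta):
    """Invert nd_to_2d using the recorded shape."""
    return np.asarray(mosaic).reshape(meta["shape"])
```

The 2D mosaic is what would be handed to the image codec, while `meta` would have to travel with the codec configuration so that decoding can restore the chunk shape.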

@chris-allan

Hello all!

Based on the initial investigations of @cgohlke and @jakirkham on this thread along with some of our own by @muhanadz we have released, heavily inspired by the existing work from @d-v-b, a Zarr JPEG-2000 codec using imagecodecs and by extension OpenJPEG:

Any and all feedback welcome!

Similar to the discussion on d-v-b/zarr-jpeg#1, our primary motivation for the codec is the compression of interleaved RGB bright-field whole slide imaging data.

@jakirkham
Member

@martindurant what would one need to do to add an entrypoint to use zarr-jpeg2k above?

@joshmoore
Member

An entrypoint needs to be registered roughly of the form:

[numcodecs.codecs]
jpeg2k = zarr_jpeg2k.zarr_jpeg2k:jpeg2k
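To check which codecs installed packages advertise under that entry point group, something like the following sketch can be used. It relies only on the standard library's `importlib.metadata`; the helper name `list_numcodecs_entry_points` is hypothetical, and the sketch only lists entry point names, it does not show how numcodecs itself loads them.

```python
import sys
from importlib.metadata import entry_points


def list_numcodecs_entry_points(group="numcodecs.codecs"):
    """Return the sorted names of entry points registered under the given group."""
    if sys.version_info >= (3, 10):
        # Python 3.10+ supports selecting a group directly.
        eps = entry_points(group=group)
    else:
        # Older versions return a dict mapping group name to entry points.
        eps = entry_points().get(group, [])
    return sorted(ep.name for ep in eps)
```

If a package declaring the entry point above were installed, `jpeg2k` would be expected to appear in the returned list; with nothing installed the list is simply empty.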

@martindurant
Member

I read further up the thread and deleted my comment...

I am a little confused. Why is there a different package for jpeg2k as a numcodecs codec, which calls imagecodecs, when imagecodecs already has one? All the codecs there can be registered with numcodecs by calling imagecodecs.numcodecs.register_codecs(). We just need a PR there to add the entrypoints, I'm sure it would be accepted. Perhaps when the conversation above happened, imagecodecs had not yet progressed as far.

@cgohlke

cgohlke commented Oct 31, 2022

All the codecs there can be registered with numcodecs by calling imagecodecs.numcodecs.register_codecs(). We just need a PR there to add the entrypoints, I'm sure it would be accepted. Perhaps when the conversation above happened, imagecodecs had not yet progressed as far.

For the time being I decided to distribute the numcodecs entry points as a separate package:
https://pypi.org/project/imagecodecs-numcodecs/#files.

@martindurant
Member

That sounds reasonable, @cgohlke . Unfortunately, it doesn't have a conda package.

@rahedges

It looks like the work on integrating jpeg2000 was abandoned. Is there any progress on this I'm missing? This is the only numcodecs thread I found related to this work.

@martindurant
Member

jpeg2000 is included in imagecodecs, which has numcodecs wrappers

@rahedges

Thanks. I guess I missed that in the docs. I'm trying to figure out how to use j2k as the compression scheme in a zarr file.

@martindurant
Member

I believe so long as you have https://pypi.org/project/imagecodecs-numcodecs/ installed, "imagecodecs_jpeg2k" will be an available codec without further effort.
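A minimal sketch of looking the codec up by that ID and round-tripping an array through it, assuming both numcodecs and imagecodecs are installed. The helper name `roundtrip_with_jpeg2k` is hypothetical; the explicit `register_codecs()` call is the route mentioned earlier in this thread and should be unnecessary when the entry points package is present.

```python
CODEC_ID = "imagecodecs_jpeg2k"  # codec ID as quoted in this thread


def roundtrip_with_jpeg2k(arr):
    """Encode and decode arr with the imagecodecs JPEG 2000 codec via numcodecs."""
    # Imported lazily because numcodecs and imagecodecs are optional installs.
    import numcodecs
    from imagecodecs.numcodecs import register_codecs

    # Register the imagecodecs codecs unless the ID is already known.
    if CODEC_ID not in numcodecs.registry.codec_registry:
        register_codecs()

    # Look the codec up by ID, the same way a stored array's metadata would.
    codec = numcodecs.get_codec({"id": CODEC_ID})
    return codec.decode(codec.encode(arr))
```

For example, calling `roundtrip_with_jpeg2k` on a 2D uint8 or uint16 NumPy array should return an array of the same shape. With Zarr's v2-style API, the codec object returned by `get_codec` is what would be passed as `compressor=` when creating an array; that step is not shown here.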
