Skip to content

Lossy JPEG compression for 8 bit arrays stored with zarr

Notifications You must be signed in to change notification settings

d-v-b/zarr-jpeg

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

zarr-jpeg

About

Enable JPEG encoding of zarr / n5 chunks. Only tested with 2D or 3D uint8 numeric data. This encoding is lossy. Because jpeg encoding only works on 2D arrays, this implementation "planarizes" 3D data before encoding by concatenating the first and second axes of the data using np.reshape, which does not incur a data copy. I.e., if one wishes to encode an array with shape = (4,5,6), this array is reshaped to (20,6) before compression; The inverse procedure occurs when decoding data. Because of this planarization procedure, you will get the best compression ratio when the spatial correlation of your data is highest in the space defined by the last two axes. To put it another way, for best compression results ensure that your intensity values vary the most along the first axis of your data.

Usage

Stand-alone:

from zarr_jpeg import jpeg
import numpy as np
data = np.random.randint((100,255), (100,100,100), dtype='uint8')
codec = jpeg(quality=100)
encoded = codec.encode(data)
# on decoding, the original shape is unknown, so we have to reshape ourselves
decoded = codec.decode(encoded).reshape(data.shape)

With zarr:

from zarr_jpeg import jpeg
import zarr
array = zarr.open_array('foo/bar.zarr', path='path/to/array', compressor=jpeg(quality=50), shape=(100,100,100), dtype='uint8')

Note that if an image has more than two dimensions then all but the last dimension are collapsed together to make a two-dimensional image to be encoded. For example, an image with shape (10, 200, 3000) is encoded as the shape (2000, 3000). However the collapsing can be suppressed with:

codec = jpeg(quality=100, axis_reduction=None)

Alternatively, the collapsing can be specified explicitly. For example:

codec = jpeg(quality=100, axis_reduction=[[0], [1, 2], [3, 4, 5]])

reshapes the shape (2, 3, 4, 5, 6, 7) to (2, 12, 210).

References

This repo is inspired by the neuroglancer "precomputed" format, which uses jpeg encoding to compress chunks of imaging data.

About

Lossy JPEG compression for 8 bit arrays stored with zarr

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages