Add support for high bit depth multichannel images #1888

Open
wiredfool opened this issue May 5, 2016 · 39 comments · May be fixed by #8224
Comments

@wiredfool
Member

wiredfool commented May 5, 2016

Pillow (and PIL) is currently able to open 8 bit per channel multi-channel images (such as RGB), but it can only open higher bit depth images (e.g. I16, I32, or Float32 images) if they are single channel (e.g., grayscale).

Previous References

This has been requested many times: #1828, #1885, #1839, #1602, and farther back.

Requirements

  • We should be able to support common GIS formats as well as high bit depth RGB(A) images.
  • At least 4 channels, but potentially more (see Add tests for opening 2-5 layer uint16 greyscale TIFFs #1839)
  • Different pixel formats, including I16, I32, and Float.
  • There should be definitions for the array interface to exchange images with numpy/scipy
  • There should be enough support to read and write TIFFs and raw image data.
  • Support for resize, crop, and convert operations at the very least.

Background Reference Info

The rough sequence for image loading is:

  • Image file is opened

  • Each of the ImagePlugin _accept functions has a chance to look at the first few bytes to determine if it should attempt to open the file

  • The *ImagePlugin._open method is called, giving the image plugin a chance to read more of the image and determine if it still wants to consider it a valid image of its particular type. If it does, it passes back a tile definition which includes a decoder and an image size.

  • If there is a successful _open call, at some point later *ImagePlugin._load may be called on the image, which runs the decoder producing a set of bytes in a raw mode. This is where things like compression are handled, but the output of the decoder is not necessarily what we're storing in our internal structures.

  • The image is unpacked (Unpack.c) from the raw mode (e.g. I;16BS) into a storage (Storage.c) mode (I).

  • It's now possible to operate on the image (e.g. crop, pixel access, etc)

    There are 3 (or 4) image data pointers, as defined in Imaging.h:

struct ImagingMemoryInstance {

    /* Format */
    char mode[IMAGING_MODE_LENGTH]; /* Band names ("1", "L", "P", "RGB", "RGBA", "CMYK", "YCbCr", "BGR;xy") */
    int type;       /* Data type (IMAGING_TYPE_*) */
    int depth;      /* Depth (ignored in this version) */
    int bands;      /* Number of bands (1, 2, 3, or 4) */
    int xsize;      /* Image dimension. */
    int ysize;

    /* Colour palette (for "P" images only) */
    ImagingPalette palette;

    /* Data pointers */
    UINT8 **image8; /* Set for 8-bit images (pixelsize=1). */
    INT32 **image32;    /* Set for 32-bit images (pixelsize=4). */

    /* Internals */
    char **image;   /* Actual raster data. */
    char *block;    /* Set if data is allocated in a single block. */

    int pixelsize;  /* Size of a pixel, in bytes (1, 2 or 4) */
    int linesize;   /* Size of a line, in bytes (xsize * pixelsize) */

    /* Virtual methods */
    void (*destroy)(Imaging im);
};

The only one that is guaranteed to be set is **image, which is an array of pointers to row data.
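
As a rough illustration of the unpack step and the row-pointer layout described above, here is a small Python model that turns big-endian signed 16-bit raw data (what Pillow's raw mode table calls `I;16BS`) into one list of integers per row, similar in spirit to how mode `I` storage holds one 32-bit value per pixel. This is only a sketch of the idea, not the code in Unpack.c; the function name `unpack_i16bs_rows` is hypothetical.

```python
import struct


def unpack_i16bs_rows(raw: bytes, xsize: int, ysize: int) -> list[list[int]]:
    """Model of unpacking big-endian signed 16-bit samples, one list per row."""
    bytes_per_line = xsize * 2  # 2 bytes per raw sample
    expected = bytes_per_line * ysize
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    rows = []
    for y in range(ysize):
        line = raw[y * bytes_per_line:(y + 1) * bytes_per_line]
        # ">" = big-endian, "h" = signed 16-bit. Python ints stand in for INT32.
        rows.append(list(struct.unpack(f">{xsize}h", line)))
    return rows


# A 2x2 image packed as big-endian signed 16-bit samples.
raw = struct.pack(">4h", 1, -2, 300, -32768)
rows = unpack_i16bs_rows(raw, xsize=2, ysize=2)
```

The outer list plays the role of `**image` (one entry per row); a multi-band version would need more than one sample per pixel in each row.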

Changes Required

  • Definitions for all of the modes that we're planning, and potentially a [format];MB[#bands] style generic mode.
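
To make the generic mode idea a little more concrete, here is one possible reading of the `[format];MB[#bands]` pattern, sketched as a parser. The exact syntax is not settled anywhere in this issue, so the example string `I16;MB5`, the `GenericMode` type, and `parse_generic_mode` are all assumptions for illustration.

```python
import re
from typing import NamedTuple


class GenericMode(NamedTuple):
    pixel_format: str  # e.g. "I16" (hypothetical spelling)
    bands: int         # number of bands, e.g. 5


# Assumed syntax: <format>;MB<bands>, for example "I16;MB5".
_MODE_RE = re.compile(r"(?P<fmt>[A-Za-z0-9]+);MB(?P<bands>[1-9][0-9]*)")


def parse_generic_mode(mode: str) -> GenericMode:
    """Split a hypothetical generic multiband mode string into its parts."""
    match = _MODE_RE.fullmatch(mode)
    if match is None:
        raise ValueError(f"not a generic multiband mode: {mode!r}")
    return GenericMode(match.group("fmt"), int(match.group("bands")))


parsed = parse_generic_mode("I16;MB5")
```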

Core Imaging Structure

  • The imaging structure has the fields required to add the additional channels. (type, bands, pixelsize, linesize)
  • The **image pointer can be used for any width of pixel.
  • We may or may not want to set the **image32 pointer.
  • Currently type of IMAGING_TYPE_INT32 and IMAGING_TYPE_FLOAT32 imply 1 band. This will change.
  • Consider promoting int16 to IMAGING_TYPE_INT16
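
A quick sketch of how `pixelsize` and `linesize` could follow from `type` and `bands` once a type no longer implies a single band. The type names mirror the `IMAGING_TYPE_*` constants, with `INT16` being the proposed addition; the byte sizes and the `Layout` class are assumptions for illustration, and the sketch ignores any per-pixel padding the C layer applies.

```python
from dataclasses import dataclass

# Assumed bytes per sample for each data type (INT16 is the proposed addition).
BYTES_PER_SAMPLE = {"UINT8": 1, "INT16": 2, "INT32": 4, "FLOAT32": 4}


@dataclass(frozen=True)
class Layout:
    sample_type: str
    bands: int
    xsize: int

    @property
    def pixelsize(self) -> int:
        """Bytes per pixel: one sample per band, no padding."""
        return BYTES_PER_SAMPLE[self.sample_type] * self.bands

    @property
    def linesize(self) -> int:
        """Bytes per row, matching linesize = xsize * pixelsize in Imaging.h."""
        return self.xsize * self.pixelsize


rgba_float = Layout("FLOAT32", bands=4, xsize=100)
five_band_u16 = Layout("INT16", bands=5, xsize=100)
```

Under these assumptions a 4-band float32 image needs 16 bytes per pixel, which is already outside the "1, 2 or 4" range noted in the struct comment.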

Storage

  • Updates to Storage.c, Unpack.c, Pack.c, Access.c, PyAccess.py, and Convert.c

Ways to Help

We need a better definition of the format requirements. What are the various types of images that are used in GIS, Medical, or other fields that we'd want to interpret? We need small, redistributable versions of images that we can test against.

[in progress]

@terramars

I'm having the same problem with 16 bit single-channel paletted TIFFs, created by GDAL. It would be "really" nice if Pillow could play nicely with GIS and scientific image formats, as GDAL is a pain in the ass and I'd rather not use it.

tiffinfo as follows:

TIFFReadDirectory: Warning, Unknown field with tag 33550 (0x830e) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 33922 (0x8482) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 34735 (0x87af) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 34737 (0x87b1) encountered.
TIFFReadDirectory: Warning, Unknown field with tag 42113 (0xa481) encountered.
TIFF Directory at offset 0x34293c6 (54694854)
Image Width: 10774 Image Length: 12577
Bits/Sample: 16
Sample Format: unsigned integer
Compression Scheme: LZW
Photometric Interpretation: palette color (RGB from colormap)
Samples/Pixel: 1
Rows/Strip: 1
Planar Configuration: single image plane
Color Map: (present)
Tag 33550: 4.999617,4.999789,0.000000
Tag 33922: 0.000000,0.000000,0.000000,679006.067110,9955209.915048,0.000000
Tag 34735: 1,1,0,7,1024,0,1,1,1025,0,1,1,1026,34737,22,0,2049,34737,7,22,2054,0,1,9102,3072,0,1,32736,3076,0,1,9001
Tag 34737: WGS 84 / UTM zone 36S|WGS 84|
Tag 42113: 0
Predictor: horizontal differencing 2 (0x2)

@bodokaiser

Any updates on this?

@wiredfool
Member Author

Unfortunately, no.

@vfdev-5

vfdev-5 commented Feb 20, 2018

@wiredfool what do you think about adding support for multichannel images as a sequence of Image objects? For example, a 4-channel uint16 image would be represented (more or less equivalently) by
['<PIL.Image.Image image mode=I;16 size=... >', '<PIL.Image.Image image mode=I;16 size=...>', ..., '<PIL.Image.Image image mode=I;16 size=...>']. By that I mean, maybe, providing a class that inherits from Image and tuple and overrides all methods to work on a tuple of images... Sure, it looks like a hack, but it could unlock more features (and create issues :) ), at least when working with Image.fromarray.
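
A minimal sketch of the "tuple of single-channel images" idea, assuming each band object exposes the usual per-image methods. `BandStack` and `FakeBand` are hypothetical names; `FakeBand` only stands in for an `I;16` image so that the example runs without Pillow.

```python
class BandStack(tuple):
    """Hypothetical container holding one single-channel image per band."""

    def apply(self, method_name, *args, **kwargs):
        """Call the same method on every band and collect the results."""
        return BandStack(
            getattr(band, method_name)(*args, **kwargs) for band in self
        )


class FakeBand:
    """Stand-in for a single-channel image; only tracks its size."""

    def __init__(self, size):
        self.size = size

    def resize(self, size):
        return FakeBand(size)


stack = BandStack(FakeBand((4, 4)) for _ in range(4))
resized = stack.apply("resize", (2, 2))
```

This only covers operations that treat bands independently; anything that mixes bands (for example, colour conversion) would need more than per-band forwarding.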

@wiredfool
Member Author

To do anything useful with it, we'd have to have support in the C layer, so it would have to be at the core imaging layer, and especially Unpack/Pack.

@vfdev-5

vfdev-5 commented Feb 21, 2018

@wiredfool following your "Ways to help",

We need a better definition of the format requirements. What are the various types of images that are used in GIS, Medical, or other fields that we'd want to interpret?

For GIS, as there is a huge number of different formats (for example, the gdal format list), this can be left to GIS libraries such as gdal, rasterio, etc.
However, support in Image.fromarray for multi-channel (3, 4, 5, ...) input arrays of dtype np.uint16 or np.float32 would be, imho, essential.

We need small, redistributable versions of images that we can test against.

For GIS imagery, this can be easily created manually with gdal, rasterio.

I would like to give a hand on this, so, feel free to ask me.

@edowson

edowson commented Jun 7, 2018

PIL cannot handle processing multi-channel images. They get truncated to 3-ch images if you perform any transformation using PIL. #3160

@akinuri

akinuri commented Jun 8, 2018

@bjtho08

bjtho08 commented Feb 15, 2019

What is the status of this issue? It has been almost three years since the first proposal. I am unfortunately unable to provide any help since I have zero experience with coding in C, but I am among the people who are awaiting support for e.g. multi-channel floating-point images (with possibilities for negative pixel values). This is especially useful in deep learning, where it is preferable to have all values normalized with zero mean. PIL has some really awesome ImageOps, which is one of the reasons for wanting this support.

@hugovk
Member

hugovk commented Feb 17, 2019

@bjtho08 No updates.


#2485 links to a multipage RGB TIFF containing float64 values.

@omaghsoudi

omaghsoudi commented Jul 5, 2019

Please fix the issue with multi-channel 16 bit images.
Thank you!

@aclark4life
Member

in favor of dealing directly with np.arrays and cv2 functions for manipulating the data as images. It's not as convenient as what PIL offers but 8bit is a deal breaker.

@herronelou Can you (or anyone?) say any more about the convenience of PIL and how meaningful > 8 bit multichannel support in PIL would be? Would you switch back to PIL if this feature were added and would you expect an uptick in usage from VFX studios in general? I got interested in VFX recently so I'm especially curious about this issue now.

@aclark4life aclark4life changed the title from "Tracking Issue for high bit depth multichannel images" to "Add support for high bit depth multichannel images" on Apr 15, 2024
@terramars

terramars commented Apr 15, 2024 via email

@herronelou

@aclark4life For the most part, VFX studios tend to work with EXR file formats. Internally most of our software processes in 32bit float, although saving the resulting images in 16bit float is usually enough, except for a small number of specific data passes that we tend to store in other channels.

I've not been doing much personally recently that could have used PIL, the main cases I've run into when I posted were external tools we brought into our pipeline that used PIL for their image reading, and we had to strip it away so we could run our 16bit float images through without the loss caused by going through 8bit, so yes, absolutely, if PIL supported those natively we wouldn't need to go out of our way to strip PIL away when somebody uses it, which would be great.

@rbavery

rbavery commented Apr 17, 2024

PIL is used in many ML frameworks for reading images, like FastAI and detectron2 and countless ML projects. When someone tries to use these frameworks or projects as examples with their high bit depth multichannel images, often the first thing to cause grief is this issue. On multiple occasions I've had to rewrite image data loaders for ML because Pillow does not support multichannel float32 tifs. This imagery is really common in geospatial analysis, most satellite imagery comes in high bit depth.

@aclark4life
Member

@cgohlke Does any of your code here potentially help us by way of example to implement high bit depth multichannel in Pillow? https://github.com/cgohlke/tifffile/blob/master/tifffile/_imagecodecs.py

Thanks for any info

@aclark4life
Member

Via @wiredfool , thanks!

  • I think that there's a good argument for planar image storage, i.e. r/g/b in separate arrays. Any single band calculation would just work, and the more complicated modes (e.g., channels with different bit depth) would be trivial to add, as they would essentially just be part of a list of planes. It would complicate the shufflers, and especially those image formats that currently just splat into an array without using the packer/unpacker. It's also less useful for luminance style calculations, though it's possible. There's definitely a tension in image formats on the interleaved vs planar approach, and I suspect it comes down to "one is easier for basic images, and one is more general."

  • I think there's a super strong argument for being able to have our storage be directly compatible with the arrow memory layout. I'm unclear if we could have arbitrary structs there, if we'd just want a linear array of one datatype, or if we'd want to do a tensor layout, or what the mechanics are for a dataframe style interop. Arrow + the evolution of the array interface would give us 0 copy interaction with polars/pandas2 and anything else in the new data space.

  • I think that interleaved storage with anything more than 1|3|4 channel x [list of pixel storage modes] is going to be a pain.

  • GIS is going to be a pain. I'd still recommend using gdal backed (e.g. rasterio) readers/writers for that, as we've got 0 support for pyramids, spatial metadata, and tiled tiffs. It's a huge field, and we're not even at square 1 for it.

So looking at that, I think there's two definite possibilities for progress.

  1. Planar Image Storage, in parallel with the current interleaved image storage. There's probably a couple of core bits here that would need to be in C, but most could probably be done at the Image.py layer.
  2. Arrow as a core storage interface. This is going to be all C, with a very small shim for the dataframe interface.
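
To illustrate the interleaved vs planar trade-off discussed above, here is a small stdlib-only sketch that converts between the two layouts for same-depth bands. The helper names are hypothetical, and this is a model of the layouts rather than anything in Pillow's storage code.

```python
from array import array


def interleaved_to_planar(samples: array, bands: int) -> list[array]:
    """Split R G B R G B ... into one array per band."""
    if len(samples) % bands:
        raise ValueError("sample count is not a multiple of the band count")
    return [samples[b::bands] for b in range(bands)]


def planar_to_interleaved(planes: list[array]) -> array:
    """Merge same-typed planes back into pixel-interleaved order."""
    out = array(planes[0].typecode)
    for pixel in zip(*planes):
        out.extend(pixel)
    return out


# Two uint16 RGB pixels, interleaved: (1, 2, 3) and (10, 20, 30).
rgb = array("H", [1, 2, 3, 10, 20, 30])
planes = interleaved_to_planar(rgb, bands=3)
round_trip = planar_to_interleaved(planes)
```

In the planar form each plane is an independent array, so nothing stops one plane from using a different typecode than its neighbours, which is the "channels with different bit depth" case; the interleaved form has no equally simple way to express that.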

@aclark4life
Member

Also possibly of interest: https://github.com/girder/large_image

@wiredfool
Member Author

FWIW, some references on Arrow.

@aclark4life
Member

aclark4life commented May 29, 2024

Can anyone suggest some test data we can use to develop this feature? This event is happening tomorrow and it would be nice to have a success target in mind, e.g. "If we can read/write this type of data …" https://www.meetup.com/dcpython/events/301086016/

@rbavery

rbavery commented May 30, 2024

I think that interleaved storage with anything more than 1|3|4 channel x [list of pixel storage modes] is going to be a pain.

In case it isn't too much pain to work with more than 4 bands, we host this example subset of Eurosat, here is an example image s3://wherobots-examples/data/eurosat_small/Highway/Highway_1.tif.

Each image is 13 bands, uint16, planar

>>> tiff_image = tifffile.TiffFile("Highway_1.tif")
>>> print(tiff_image.pages[0].tags['PlanarConfiguration'].value)
PLANARCONFIG.CONTIG
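
For reference, the TIFF PlanarConfiguration tag has two defined values: 1 (contiguous, all samples of a pixel stored together) and 2 (separate, one plane per band). The `CONTIG` value printed above corresponds to the first. The sketch below models where a given sample lands under each layout; `sample_index` is a hypothetical helper and the dimensions are chosen only for illustration.

```python
from enum import IntEnum


class PlanarConfig(IntEnum):
    CONTIG = 1    # samples of one pixel stored together: b0 b1 b2 b0 b1 b2 ...
    SEPARATE = 2  # one plane per band: b0 b0 ... b1 b1 ... b2 b2 ...


def sample_index(x, y, band, width, height, bands, config):
    """Index of one sample within the flat sample sequence of an image."""
    if config == PlanarConfig.CONTIG:
        return (y * width + x) * bands + band
    return band * width * height + y * width + x


# Sample for band 2 of pixel (1, 0) in a 64x64, 13-band image.
contig = sample_index(1, 0, 2, 64, 64, 13, PlanarConfig.CONTIG)
separate = sample_index(1, 0, 2, 64, 64, 13, PlanarConfig.SEPARATE)
```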

@aclark4life
Member

@wiredfool If we use Arrow that implies adding a dependency on pyarrow, ideally optionally via extras like pip install pillow[arrow], correct?

@wiredfool
Member Author

@aclark4life Maybe. There's definitely a C-only implementation (nanoarrow) that might be what we want, since all of our image allocations are in the C layer now. PyArrow might be easier for integration/interop at the high level, but my sense here is that it wouldn't necessarily be giving us a whole lot that we'd not already have with a C arrow implementation + our usual set of accessors.

@aclark4life
Member

aclark4life commented Jun 20, 2024

Folks interested in this issue, please test #8224 and give feedback, thanks all
