Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addition of _repr_jpeg_ causes both JPEG and PNG format images to be attached to Jupyter cells #7259

Closed
smason opened this issue Jul 4, 2023 · 24 comments
Labels

Comments

@smason
Copy link
Contributor

smason commented Jul 4, 2023

I've had a go at improving the interaction between Jupyter and Images. As I commented on in #7135, that change causes the image to be saved in JPEG and PNG format and both files attached to the invoking cell.

This behavior seems redundant, and given that no resizing is done can result in very large .ipynb files if the user isn't careful when working with high resolution images. I've had a look at how to avoid this, and the nicest way I could see to do this was by adding a Image._repr_mimebundle_ method that would do the "right thing" in more circumstances.

Specifically, it would know whether the user has called display_jpeg or display_png and could return the appropriate encoding. If neither format is specifically requested it could return something appropriate for the data.

I've had a go at implementing this and it currently looks like:

THRESHOLD_SIZE = 1200

MODE_MAP = {
    'La': 'LA',
    'LAB': 'RGB',
    'HSV': 'RGB',
    'RGBX': 'RGB',
    'RGBa': 'RGBA',
}

VALID_JPEG_MODES = {'L', 'RGB', 'YCbCr', 'CMYK'}
VALID_PNG_MODES = {'1', 'P', 'L', 'RGB', 'RGBA', 'LA', 'PA'}

def _repr_mimebundle_(self, include=None, exclude=None, **kwargs):
    # let IPython interpret its ambiguous include/exclude flags
    if include is not None or exclude is not None or kwargs:
        return None

    # massage the image into something that has a reasonable
    # chance of being viewed by the user

    # handle the special snowflake modes
    if self.mode in {'I', 'F'}:
        # linearly transform extrema to fit in [0, 255]
        # this should have a similar result as Image.histogram
        lo, hi = self.getextrema()
        scale = 256 / (hi - lo) if lo != hi else 1
        image = self.point(lambda e: (e - lo) * scale).convert('L')
    elif self.mode == 'I;16':
        # linearly transform max down to 255
        image = self.point(lambda e: e / 256).convert('L')
    else:
        image = self

    # shrink large images so they don't take too long to transfer/render
    factor = max(image.size) // THRESHOLD_SIZE
    if factor > 1:
        image = image.reduce(factor)

    # process remaining modes into things supported by writers
    if image.mode in MODE_MAP:
        image = image.convert(MODE_MAP[image.mode])

    jpeg = image._repr_jpeg_() if image.mode in VALID_JPEG_MODES else None
    png = image._repr_png_() if image.mode in VALID_PNG_MODES else None

    # prefer lossless format if it's not significantly larger
    if jpeg and png:
        # 1.125 and 2**18 used as they have nice binary representations
        if len(png) < len(jpeg) * 1.125 + 2**18:
            jpeg = None
        else:
            png = None

    return {
        'image/jpeg': jpeg,
        'image/png': png,
    }

This can be tested with something like:

size = 100, 100
arr = np.linspace(0, 10, np.prod(size), endpoint=False, dtype=np.int32)
image = Image.fromarray(arr.reshape(size))
display.display(_repr_mimebundle_(image), raw=True)

Could turn this into a pull-request if this seems like the right way to go.

Note the if include is not None or exclude is not None code at the top would mean that explicit calls like:

display.display_jpeg(image)

would cause Jupyter to end up calling Image._repr_jpeg_ directly, bypassing this code. If called with a large image, this would cause a full-res JPEG to be returned. Not sure if this is the right thing, or even what the "correct" way of handling these arguments should be.

I've only seen exclude=None in my testing, so am going on description in IPython/core/formatters.py and what graphviz does.

@radarhere radarhere changed the title addition of _repr_jpeg_ causes both JPEG and PNG format images to be attached to Jupyter cells Addition of _repr_jpeg_ causes both JPEG and PNG format images to be attached to Jupyter cells Jul 4, 2023
@radarhere
Copy link
Member

the image to be saved in JPEG and PNG format and both files attached to the invoking cell

Particularly for those of us not familiar with IPython, could you explain how to replicate this?

@smason
Copy link
Contributor Author

smason commented Jul 5, 2023

could you explain how to replicate this?

Yup, sorry! In the shell you can run (in a new venv if needed):

pip install -U pillow notebook
jupyter notebook

This should result in your web browser opening the Jupyter home page. You can then click on the "New" button in the top right and create a new notebook (i.e. .ipynb file) using a Python kernel. This should open a new tab where you can enter something like the following code:

from PIL import Image
Image.open('something.png')

and either hit the Run button at the top, or press Ctrl+Enter to evaluate this block.

Running this should cause the image to appear below the code. If you then press the "Save and Checkpoint" icon (top left toolbar) it'll save the file. Switching back to the shell, you can look at the resulting JSON encoded file. Something like:

less -S Untitled.ipynb

would do the job. Under a data key you should see an object with with image/jpeg and image/png keys, with the images as their value. They're base64 encoded adding to the file size.

@radarhere
Copy link
Member

Thanks for that.

https://ipython.readthedocs.io/en/stable/config/integrating.html#MyObject states that _repr_mimebundle_

Should return a dictionary of multiple formats, keyed by mimetype

https://ipython.readthedocs.io/en/stable/config/integrating.html#custom-methods states

In general, all available formatters will be called when an object is displayed, and it is up to the UI to select which to display. A given formatter should not generally change its output based on what other formats are available - that should be handled at a different level, such as the DisplayFormatter, or configuration.

Correct me if I'm wrong, but it would seem that IPython has made a decision to include all supported representations when saving. "Pillow supports iPython's API too well" doesn't sound like a problem to be solved. Are you not fighting against IPython's decision to include all formats? If this is how they think their software should behave, why should we try and override that decision?

@smason
Copy link
Contributor Author

smason commented Jul 5, 2023

Are you not fighting against IPython's decision to include all formats? If this is how they think their software should behave, why should we try and override that decision?

Yup, I was wondering the same when writing this code but when I noticed _repr_mimebundle_ I interpreted it as being there for this reason. I'll ask for clarification from the IPython project.

That said, I think it would be helpful if something similar to my above code ran somewhere so more images have a chance of appearing by default.

Thinking about it now, it might be worth using the warnings package when it's doing things like resizing images or changing image modes so users get some notice of what's going on.

@mtreinish
Copy link
Contributor

I think this is also causing failure in certain scenarios. Specifically with the release of 10.0.0 the rustworkx project's CI has started failing because the jpeg output from a Image loaded from a png fails in _repr_jpeg_ as called by jupyter: https://github.com/Qiskit/rustworkx/actions/runs/5465653032/jobs/9949777022?pr=921#step:7:237 The traceback looks like it's failing when trying to save a JPEG as RGBA from a loaded png as part of Image._repr_jpeg_ (which makes sense since jpeg doesn't support transparency).

In that case the docs (running via jupyter) are writing a png image to disk by calling out to graphviz and loading that file with PIL.Image.open() and returning that as the output to the jupyter cell. You can see the code for this in the docs here: https://github.com/Qiskit/rustworkx/blob/main/src/digraph.rs#L1853-L1872

mtreinish added a commit to mtreinish/retworkx that referenced this issue Jul 5, 2023
Since the recent release of Pillow 10.0.0 the docs CI job has started
failing due to an error in Pillow when trying to run the jupyter-execute
cell in the `.to_dot()` docstring. It looks like a bug that was
introduced in the new release which is being tracked in
python-pillow/Pillow#7259 where it's trying to return a jpeg
representation of the object from the RGBA data loaded from a PNG. Until
the issue is resolved upstream in pillow this commit just caps the
version we run in CI via the constraints file. While pillow is an
optional dependency and we could cap the version in the extras, this
issue isn't severe enough to warrant that, and the typical pillow usage,
especially via the rustworkx api (i.e. graphviz_draw() which returns a
PIL.Image object) will continue to work as expected with pillow 10.0.0
there isn't a reason to cap more broadly. This is just needed as a
workaround to unblock CI.
@radarhere
Copy link
Member

Rather than a change in Pillow, wouldn't a more general solution be to ask IPython to change their code, so that if some representation formats pass and others fail, then the exceptions are caught?

mergify bot pushed a commit to Qiskit/rustworkx that referenced this issue Jul 5, 2023
Since the recent release of Pillow 10.0.0 the docs CI job has started
failing due to an error in Pillow when trying to run the jupyter-execute
cell in the `.to_dot()` docstring. It looks like a bug that was
introduced in the new release which is being tracked in
python-pillow/Pillow#7259 where it's trying to return a jpeg
representation of the object from the RGBA data loaded from a PNG. Until
the issue is resolved upstream in pillow this commit just caps the
version we run in CI via the constraints file. While pillow is an
optional dependency and we could cap the version in the extras, this
issue isn't severe enough to warrant that, and the typical pillow usage,
especially via the rustworkx api (i.e. graphviz_draw() which returns a
PIL.Image object) will continue to work as expected with pillow 10.0.0
there isn't a reason to cap more broadly. This is just needed as a
workaround to unblock CI.
mtreinish added a commit to mtreinish/Pillow that referenced this issue Jul 6, 2023
In 10.0.0 a _repr_jpeg_ implementation was added to the Image object to
enable the use of display_jpeg() in IPython environments. However, in
some cases the implementation of this method could result in an
exception being raised while trying to generate the jpeg data. The best
example is if the image data is in an RGBA format as a result of the
object being created by opening a PNG file. In this case trying to save
the Image object as a jpeg will error because the jpeg format can't
represent the transparency in the alpha channel. This results in an
exception being raised in the IPython/Jupyter context when outputing the
image object. However, in cases like this IPython allows the repr
methods to return None to indicate there is no representation in that
format available. [1] This commit updates the _repr_png_ and _repr_jpeg_
methods to catch any exception that might be raised while trying to
generate the image data. If an exception is raised we treat that as not
being able to generate image data in that format and return None
instead.

Related to python-pillow#7259

[1] https://ipython.readthedocs.io/en/stable/config/integrating.html#custom-methods
@mtreinish
Copy link
Contributor

Sure, it's arguable where the bug actually lies. But, raising an exception in this case where the Image object doesn't have a native JPEG representation like what I encountered doesn't seem like the best behavior to me though. Especially because IPython lets the _repr_* functions return None as an indication there isn't a valid format of that type. So at least in the case where there is an alpha channel in the data having the _repr_jpeg_ method return None seems like better behavior for Pillow. The alternative would be that Pillow converts the RGBA data to something jpeg could handle and outputs that, but raising an exception as a side effect of where the image came from doesn't seem like good behavior regardless of how IPython behaves. As for the larger discussion around the reprs I don't really have a strong opinion.

I opened a pull request to adjust the behavior of the repr methods so they return None in case saving the image data fails for any reason: #7266

@smason
Copy link
Contributor Author

smason commented Jul 6, 2023

I'm pretty sure that throwing an exception is the wrong thing to be doing as well.

My code at the top does this mode conversion and causes the endpoint to return either PNG or JPEG data depending on which format produces the smaller output. It also resizes the image as it's assumed that the image is for displaying in a moderately sized window as working with PNG images is relatively slow (i.e. several seconds for encoding, transferring, decoding in a browser for a 10 megapixel image).

@homm
Copy link
Member

homm commented Jul 13, 2023

Could anyone explain why Pillow needs _repr_jpeg_ at all? Original intention was support display_jpeg. Why we have to support it?

@radarhere
Copy link
Member

Only because @n3011 suggested it in #7135, and there was no reason not to at the time.

If it is decided that it is not worth the effort, we could remove it again.

@aclark4life
Copy link
Member

@homm We probably don't have to support it, but I agree with the original intention of trying to support it based on user feedback, and it should become clear over time if more folks than @smason have an issue, we could consider removing it. But it seems like the current implementation can stand for now, based on everything I've read.

@rmjarvis
Copy link

rmjarvis commented Jul 13, 2023

I'll add myself as +1 on this issue. I've pinned pillow<10 until this is fixed. Either in Pillow or in Jupyter, I don't much care. But currently, the following very simple bit of code in a Jupyter cell doesn't work.

import PIL
from IPython import display
display.display(PIL.Image.open('image.png'))

Especially if the png file has transparency, which is common in matplotlib plots. I get errors like

KeyError: 'RGBA'
[...]
OSError: cannot write mode RGBA as JPEG
[...]
ValueError: Could not save to JPEG for display

So I need either a fix from IPython or PIL, or a workaround. Until then I'm pinning pillow<10.

mtreinish added a commit to Qiskit/rustworkx that referenced this issue Jul 17, 2023
Since the recent release of Pillow 10.0.0 the docs CI job has started
failing due to an error in Pillow when trying to run the jupyter-execute
cell in the `.to_dot()` docstring. It looks like a bug that was
introduced in the new release which is being tracked in
python-pillow/Pillow#7259 where it's trying to return a jpeg
representation of the object from the RGBA data loaded from a PNG. Until
the issue is resolved upstream in pillow this commit just caps the
version we run in CI via the constraints file. While pillow is an
optional dependency and we could cap the version in the extras, this
issue isn't severe enough to warrant that, and the typical pillow usage,
especially via the rustworkx api (i.e. graphviz_draw() which returns a
PIL.Image object) will continue to work as expected with pillow 10.0.0
there isn't a reason to cap more broadly. This is just needed as a
workaround to unblock CI.
mergify bot added a commit to Qiskit/rustworkx that referenced this issue Jul 17, 2023
)  (#931)

* Adding fix that clippy warns for (#930)

Because of a new Rust update, Clippy now warned for a line of code in cycle_basis.rs in rustworkx-core.
This PR fixes that one line so that Clippy passes.

(cherry picked from commit d88f18b)

* Pin pillow<10 in CI (#922)

Since the recent release of Pillow 10.0.0 the docs CI job has started
failing due to an error in Pillow when trying to run the jupyter-execute
cell in the `.to_dot()` docstring. It looks like a bug that was
introduced in the new release which is being tracked in
python-pillow/Pillow#7259 where it's trying to return a jpeg
representation of the object from the RGBA data loaded from a PNG. Until
the issue is resolved upstream in pillow this commit just caps the
version we run in CI via the constraints file. While pillow is an
optional dependency and we could cap the version in the extras, this
issue isn't severe enough to warrant that, and the typical pillow usage,
especially via the rustworkx api (i.e. graphviz_draw() which returns a
PIL.Image object) will continue to work as expected with pillow 10.0.0
there isn't a reason to cap more broadly. This is just needed as a
workaround to unblock CI.

---------

Co-authored-by: danielleodigie <97267313+danielleodigie@users.noreply.github.com>
Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
@radarhere
Copy link
Member

#7266 has been merged. In the next version of Pillow, that should resolve the matter of "Could not save to JPEG for display".

@radarhere
Copy link
Member

Over in the issue asking IPython about whether we're correctly including all formats, and the file size is not something that needs to be solved by Pillow,

ipython/ipython#14109 (comment)

@radarhere this issue should probably be closed shouldn't it?

AFAIU, the main concern from Pillow with turning exceptions into Nones was that it "should be handled at a different level" — requoting the same IPython docs as I did when opening.

My response was

https://ipython.readthedocs.io/en/stable/config/integrating.html#custom-methods

In general, all available formatters will be called when an object is displayed, and it is up to the UI to select which to display. A given formatter should not generally change its output based on what other formats are available - that should be handled at a different level, such as the DisplayFormatter, or configuration.

My understanding is that the documentation states that one format should not affect another. So if PNG is available, then that shouldn't affect whether or not JPEG is saved as well.

Changing JPEG images to return None for RGBA images is not one format affecting another, since it happens independently of whether PNG is available.

I thought the question here was about a large notebook resulting from multiple image formats being saved, and your goal was to only have one format returned based on which was available - thus having one format's output affecting another format's output. If you are no longer concerned about file size, then yes, this can be closed, and so can #7259.

So I'm somewhat confused now. If @smason no longer thinks there is a problem, then #7266 has fixed the errors reported by others here, and this issue can be closed.

@github-actions
Copy link

Closing this issue as no feedback has been received.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 27, 2023
@smason
Copy link
Contributor Author

smason commented Aug 28, 2023

@radarhere I'm monkey patching Pillow with my own code in to fix the issue in a way that works for me (i.e. similar to #7268). I'm not sure if there's enough interest to maintain the relevant state in Pillow directly.

Specifically, I'm dealing with raw camera data (i.e. lots of pixels at 12 or 14bpc) and Pillow doesn't seem to track the numeric range from the source data in a way that would help me. I'm also interested in just previewing images while I'm exploring the data in Jupyter (hence my code downsampling images) and returning both PNG and JPEG versions of the image seems wasteful.

When I've got a workflow/code that's mostly correct I'll turn it into a proper Python module that's tested to process things correctly, but, e.g., while working out whether I'm correctly debayering an image correctly I just want a quick preview that things are on track. In my experience this is a good way to use Jupyter notebooks; exploratory work goes into notebooks where it's very quick to iterate on changes (no need to rerun Python and/or reload large datasets), then the code is tidied up for production into a module (or a package for large chunks of code).

You can close as Wont Fix if you think that's relevant, I'm not sure how many other people use Pillow in the way I am.

@radarhere
Copy link
Member

returning both PNG and JPEG versions of the image seems wasteful.

As has been discussed, I don't think this is a Pillow problem, but an IPython decision.

I've left a comment on #7268.

Thanks for your thoughts, and for providing regular feedback along the journey of this issue.

@homm
Copy link
Member

homm commented Aug 29, 2023

For me "is not Pillow problem" is not appropriate here, since this part is not just Pillow internals, this is intentional integration with iPython API. It would be correct to follow the clarification from IPython how to use this API. It is very sad that there are none of them still.

@aclark4life
Copy link
Member

Yeah I agree with @homm or at least the standard for feature requests, as far as I'm concerned, should always be "can we support this feature request without creating issues for ourselves or others and if so, do it". There are plenty of times when we can't support a feature request for whatever reason, so I think that if something works for @smason and one other person and doesn't create issues for anyone else, then it should go in. If there are any issues to be worked through, then it could go in but not until the issues are resolved.

@radarhere
Copy link
Member

radarhere commented Aug 29, 2023

Sorry, to clarify, I meant "I don't think this is a Pillow bug, but a decision that IPython has made". As far as I can tell, we are following https://ipython.readthedocs.io/en/stable/config/integrating.html#custom-methods

In general, all available formatters will be called when an object is displayed, and it is up to the UI to select which to display. A given formatter should not generally change its output based on what other formats are available

From my reading of that, IPython made a decision that saving all formats is the best thing to do. I closed this issue because the creator of the issue felt like it was no longer necessary to question that decision.

@gsgxnet
Copy link

gsgxnet commented Oct 6, 2023

In a notebook I want to display a PNG image, resized:
Image.open("img/PCA_all_features_2D.png").resize(size=(600,500))

Still a similar issue as posted before, although the image is displayed in the notebook.

KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/ClustLLM/lib/python3.10/site-packages/PIL/JpegImagePlugin.py:639, in _save(im, fp, filename)
    638 try:
--> 639     rawmode = RAWMODE[im.mode]
    640 except KeyError as e:

KeyError: 'RGBA'

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
File ~/miniconda3/envs/ClustLLM/lib/python3.10/site-packages/PIL/Image.py:643, in Image._repr_image(self, image_format, **kwargs)
    642 try:
--> 643     self.save(b, image_format, **kwargs)
    644 except Exception as e:

File ~/miniconda3/envs/ClustLLM/lib/python3.10/site-packages/PIL/Image.py:2413, in Image.save(self, fp, format, **params)
   2412 try:
-> 2413     save_handler(self, fp, filename)
   2414 except Exception:

File ~/miniconda3/envs/ClustLLM/lib/python3.10/site-packages/PIL/JpegImagePlugin.py:642, in _save(im, fp, filename)
    641     msg = f"cannot write mode {im.mode} as JPEG"
--> 642     raise OSError(msg) from e
    644 info = im.encoderinfo

OSError: cannot write mode RGBA as JPEG

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
File ~/miniconda3/envs/ClustLLM/lib/python3.10/site-packages/IPython/core/formatters.py:344, in BaseFormatter.__call__(self, obj)
    342     method = get_real_method(obj, self.print_method)
    343     if method is not None:
--> 344         return method()
    345     return None
    346 else:

File ~/miniconda3/envs/ClustLLM/lib/python3.10/site-packages/PIL/Image.py:661, in Image._repr_jpeg_(self)
    656 def _repr_jpeg_(self):
    657     """iPython display hook support for JPEG format.
    658 
    659     :returns: JPEG version of the image as bytes
    660     """
--> 661     return self._repr_image("JPEG")

File ~/miniconda3/envs/ClustLLM/lib/python3.10/site-packages/PIL/Image.py:646, in Image._repr_image(self, image_format, **kwargs)
    644 except Exception as e:
    645     msg = f"Could not save to {image_format} for display"
--> 646     raise ValueError(msg) from e
    647 return b.getvalue()

ValueError: Could not save to JPEG for display

Why with code in a notebook?
Because there is no standard which works in all notebook IDEs for resizing an image.

@aclark4life
Copy link
Member

What version of Pillow?

@radarhere
Copy link
Member

Hi @gsgxnet. Your problem should be resolved by #7266 on October 15 with the release of Pillow 10.1.0.

Until then, a suggested workaround is

from PIL import Image
del Image._repr_jpeg_
Image.open("img/PCA_all_features_2D.png").resize(size=(600,500))

@radarhere
Copy link
Member

Pillow 10.1.0 has now been released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants