cannot identify image file (PNG file from scanner) #7993

Closed
OmlineEditor opened this issue Apr 19, 2024 · 20 comments · Fixed by #8063

Comments

@OmlineEditor

OmlineEditor commented Apr 19, 2024

What did you do?

img = Image.open(path) # I open a file in a script

What did you expect to happen?

The script opens the file and continues executing.

What actually happened?

An error occurs:

 File "/usr/local/lib/python3.9/dist-packages/PIL/Image.py", line 3339, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file '/var/www/python_for_site/bug.png'

What are your OS, Python and Pillow versions?

  • OS: Debian 11
  • Python: Python 3.9.2
  • Pillow: 10.3.0
Please paste here the output of running:
> python3 -m PIL.report
--------------------------------------------------------------------
Pillow 10.3.0
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
       [GCC 10.2.1 20210110]
--------------------------------------------------------------------
Python executable is /usr/bin/python3
System Python files loaded from /usr
--------------------------------------------------------------------
Python Pillow modules loaded from /usr/local/lib/python3.9/dist-packages/PIL
Binary Pillow modules loaded from /usr/local/lib/python3.9/dist-packages/PIL
--------------------------------------------------------------------
--- PIL CORE support ok, compiled for 10.3.0
*** TKINTER support not installed
--- FREETYPE2 support ok, loaded 2.13.2
--- LITTLECMS2 support ok, loaded 2.16
--- WEBP support ok, loaded 1.3.2
--- WEBP Transparency support ok
--- WEBPMUX support ok
--- WEBP Animation support ok
--- JPEG support ok, compiled for libjpeg-turbo 3.0.2
--- OPENJPEG (JPEG2000) support ok, loaded 2.5.2
--- ZLIB (PNG/ZIP) support ok, loaded 1.2.11
--- LIBTIFF support ok, loaded 4.6.0
--- RAQM (Bidirectional Text) support ok, loaded 0.10.1, fribidi 1.0.8, harfbuzz 8.4.0
*** LIBIMAGEQUANT (Quantization method) support not installed
--- XCB (X protocol) support ok
--------------------------------------------------------------------

My Code:

from PIL import Image

image_path = "bug.png"
image = Image.open(image_path)
width, height = image.size

print("width:", width, "px")
print("height:", height, "px")

I have worked with a lot of image files, but I can't open this one even though it appears to be a normal file. I can't open any file that I scan to PNG format on this scanner.

cannot_identify_image_file.zip

@radarhere
Member

If I run pngcheck over your image, I get

CRC error in chunk pHYs (computed eee74573, expected c76fa864)

To skip the check in Pillow, use

from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

image_path = "bug.png"
image = Image.open(image_path)
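
For readers without pngcheck, the kind of check that reported the pHYs CRC error can be sketched with the standard library alone. This is a minimal sketch, not Pillow's or pngcheck's implementation: `find_bad_chunks` and `make_chunk` are hypothetical helper names, and the sample bytes are a hand-built chunk sequence (no pixel data) rather than the attached bug.png. It relies only on the PNG chunk layout: a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over the type and data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_bad_chunks(data: bytes):
    """Return (chunk_type, computed_crc, stored_crc) for each chunk whose CRC mismatches."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    bad = []
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 12 + length  # length field + type + data + CRC
        if end > len(data):
            break  # chunk is cut off; nothing more can be checked
        computed = zlib.crc32(data[pos + 4:end - 4])  # CRC covers type + data
        (stored,) = struct.unpack(">I", data[end - 4:end])
        if computed != stored:
            bad.append((chunk_type.decode("latin-1"), computed, stored))
        pos = end
    return bad


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one well-formed chunk (hypothetical helper, used only for the demo)."""
    crc = zlib.crc32(chunk_type + payload)
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


# Demo data: header, pHYs and end chunks only, so this is not a decodable image.
ihdr = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0))
phys = make_chunk(b"pHYs", struct.pack(">IIB", 2835, 2835, 1))
iend = make_chunk(b"IEND", b"")
sample = PNG_SIGNATURE + ihdr + phys + iend

# Flip the last CRC byte of pHYs to imitate a wrongly computed checksum.
broken_phys = phys[:-1] + bytes([phys[-1] ^ 0xFF])
broken = PNG_SIGNATURE + ihdr + broken_phys + iend

for name, computed, stored in find_bad_chunks(broken):
    print(f"CRC error in chunk {name} (computed {computed:08x}, expected {stored:08x})")
```

To check a real file, pass the result of `open(path, "rb").read()` to `find_bad_chunks`.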

@aclark4life
Member

aclark4life commented Apr 19, 2024

Same issue with convert, although macOS Preview opens it.

% convert bug.png bug.png
convert: pHYs: CRC error `bug.png' @ warning/png.c/MagickPNGWarningHandler/1526.

Actually, convert fixes it:

% convert bug.png bug.png
convert: pHYs: CRC error `bug.png' @ warning/png.c/MagickPNGWarningHandler/1526.
% pngcheck bug.png       
OK: bug.png (579x864, 24-bit RGB, non-interlaced, 57.7%).
% convert bug.png bug.png 
%
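
ImageMagick re-encodes the whole image, which is why the rewritten file passes pngcheck. A narrower alternative, sketched here as an assumption rather than anything proposed in this thread, is to leave every chunk as it is and only recompute the stored checksums. `rewrite_chunk_crcs` and `make_chunk` are hypothetical names. Note that this hides genuine corruption just as readily as a scanner bug, so it only makes sense when the chunk data itself is trusted.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def rewrite_chunk_crcs(data: bytes) -> bytes:
    """Return a copy of PNG bytes with every complete chunk's CRC recomputed."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    out = bytearray(PNG_SIGNATURE)
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        end = pos + 12 + length  # length field + type + data + CRC
        if end > len(data):
            break  # incomplete chunk: leave the remainder untouched
        body = data[pos + 4:end - 4]  # chunk type + chunk data
        out += data[pos:pos + 4] + body + struct.pack(">I", zlib.crc32(body))
        pos = end
    out += data[pos:]  # copy any trailing bytes unchanged
    return bytes(out)


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one well-formed chunk (hypothetical helper, used only for the demo)."""
    crc = zlib.crc32(chunk_type + payload)
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


# Demo data: a pHYs chunk and an end chunk, not a decodable image.
good = (
    PNG_SIGNATURE
    + make_chunk(b"pHYs", struct.pack(">IIB", 2835, 2835, 1))
    + make_chunk(b"IEND", b"")
)

# Corrupt the first CRC byte of pHYs: signature (8) + length (4) + type (4) + data (9).
damaged = bytearray(good)
damaged[len(PNG_SIGNATURE) + 4 + 4 + 9] ^= 0xFF
repaired = rewrite_chunk_crcs(bytes(damaged))
```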

@OmlineEditor
Author

ImageFile.LOAD_TRUNCATED_IMAGES = True

This code helps solve the issue, but it's crucial to ensure there won't be any issues when processing the image further. Could this code affect the functionality, potentially causing problems down the line?

@radarhere
Member

radarhere commented Apr 19, 2024

Apart from skipping some checks with PNGs, the other behaviour of LOAD_TRUNCATED_IMAGES is to try and load images that end prematurely.

The internal Pillow data will not be in a corrupted state, no, all operations on the loaded image will be as valid as they ever were. This is just ignoring the fact that the pixels being read from the image are perhaps not what they are supposed to be.
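
Because LOAD_TRUNCATED_IMAGES is a module-level setting, one way to limit its effect is to switch it on only around the files known to need it. The sketch below shows that pattern with a generic context manager. `temporary_setting` is a hypothetical helper, not part of Pillow, and a `SimpleNamespace` stands in for `PIL.ImageFile` so the example runs without Pillow installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_setting(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore the old value."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Restore the old value even if the block raised.
        setattr(obj, name, previous)


# Stand-in for PIL.ImageFile (assumption made so the sketch is self-contained).
fake_image_file = SimpleNamespace(LOAD_TRUNCATED_IMAGES=False)

with temporary_setting(fake_image_file, "LOAD_TRUNCATED_IMAGES", True):
    inside = fake_image_file.LOAD_TRUNCATED_IMAGES
after = fake_image_file.LOAD_TRUNCATED_IMAGES

print(inside, after)
```

With Pillow, the object passed in would be `ImageFile` from `from PIL import ImageFile`. Pixel data is read lazily, so both `Image.open(...)` and `image.load()` should run inside the block. The setting is still process-wide while the block is active, so this is not safe if other threads open images at the same time.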

@OmlineEditor
Author

Okay, thanks for the help. The problem is in the scanner, which cannot correctly calculate the checksum for the file. Could you change the code so that, instead of this error, a message is shown saying that the file is damaged and has an incorrect CRC? A message about the CRC problem, rather than the error, would be better and easier to understand.

@radarhere
Member

radarhere commented Apr 20, 2024

You're requesting that we only raise a warning in this situation?

If the image is corrupted or ends prematurely, I think we both agree that users should know there is something wrong. Whether the user would want to continue using a flawed image anyway is a matter of personal preference, and so there is a setting for it. I'd like there to be a stronger argument before changing Pillow's default setting.

The meaning behind UnidentifiedImageError is documented, specifically mentioning this PNG behaviour - https://pillow.readthedocs.io/en/stable/PIL.html#PIL.UnidentifiedImageError

As some background, the error behaviour has been here since the fork from PIL. It was only #1991 that allowed LOAD_TRUNCATED_IMAGES to work around it.

You might be interested to know that

from PIL import PngImagePlugin
PngImagePlugin.PngImageFile("bug.png")

will show you the SyntaxError directly.

Traceback (most recent call last):
  File "demo.py", line 6, in <module>
    PngImagePlugin.PngImageFile("bug.png")
  File "PIL/ImageFile.py", line 137, in __init__
    self._open()
  File "PIL/PngImagePlugin.py", line 733, in _open
    self.png.crc(cid, s)
  File "PIL/PngImagePlugin.py", line 209, in crc
    raise SyntaxError(msg)
SyntaxError: broken PNG file (bad header checksum in b'pHYs')

@aclark4life
Member

Agree this is an error and we're not going to change to warning. Also super-interesting that the PngImagePlugin raises SyntaxError and reveals the bad checksum. The only change I'd consider making here is to add an option similar to LOAD_TRUNCATED_IMAGES to enable more verbose output from Pillow when the image plugin fails to return an open image to ImagePlugin._open. Not sure what that would look like or if there are any existing verbose options in Pillow, but something like --show-me-what-really-happened.

@OmlineEditor
Author

Okay, let it raise an error, but not just “cannot identify image file”. Make the error more detailed and understandable by changing only the text of the error message to: “cannot identify image file, the file is damaged, the file has an incorrect CRC signature”.

@radarhere
Member

radarhere commented Apr 22, 2024

That's not as easy as it sounds.

By default, Pillow checks your image against multiple formats. Some formats can be easily rejected because your image data does not start with the required identifier, but not all.

So if I adjust Pillow to print out the errors raised by any formats against your image

diff --git a/src/PIL/Image.py b/src/PIL/Image.py
index c65cf3850..ab41f525f 100644
--- a/src/PIL/Image.py
+++ b/src/PIL/Image.py
@@ -3333,10 +3333,11 @@ def open(
                     im = factory(fp, filename)
                     _decompression_bomb_check(im.size)
                     return im
-            except (SyntaxError, IndexError, TypeError, struct.error):
+            except (SyntaxError, IndexError, TypeError, struct.error) as e:
                 # Leave disabled by default, spams the logs with image
                 # opening failures that are entirely expected.
                 # logger.debug("", exc_info=True)
+                print(i+": "+str(e))
                 continue
             except BaseException:
                 if exclusive_fp:

I get

PNG: broken PNG file (bad header checksum in b'pHYs')
IM: Syntax error in IM header: �PNG
IMT: not identified by this driver
IPTC: invalid IPTC/NAA file
MPEG: not an MPEG file
PCD: not a PCD file
SPIDER: not a valid Spider file
TGA: not a TGA file

I imagine you don't want to see all of that.

@OmlineEditor
Author

I imagine you don't want to see all of that.

That made it clearer. Let there be more messages, to help understand where the error is and how to fix it.

@Yay295
Contributor

Yay295 commented Apr 29, 2024

Those messages would show even if the image opened successfully, because all of the other attempted formats would print their failures.

@aclark4life
Member

I imagine you don't want to see all of that.

I think I'd like to be able to say Image.verbose = True and see all that, but I expect that also may not be as easy as it sounds to implement.

@Yay295
Contributor

Yay295 commented Apr 29, 2024

It looks like warnings are added to a list that gets shown at the end if the image can't be opened. Exception messages could probably be treated similarly.
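
The idea of holding messages back until the open has definitely failed can be sketched outside Pillow. This is a toy model under stated assumptions, not Pillow's code: `open_with_any`, `open_tga`, `open_png` and `OPENERS` are hypothetical, and the two openers are stand-ins that only look at a few bytes.

```python
import warnings


def open_tga(data):
    """Toy opener that never recognises its input."""
    raise SyntaxError("not a TGA file")


def open_png(data):
    """Toy opener: accepts data with a PNG-like prefix unless marked as damaged."""
    if not data.startswith(b"\x89PNG"):
        raise SyntaxError("not a PNG file")
    if b"BADCRC" in data:
        raise SyntaxError("broken PNG file (bad header checksum)")
    return "PNG image"


OPENERS = [("TGA", open_tga), ("PNG", open_png)]


def open_with_any(data, openers):
    """Try each opener in turn; report collected failures only if all of them fail."""
    failures = []
    for name, opener in openers:
        try:
            return opener(data)  # success: collected failures are discarded
        except SyntaxError as exc:
            failures.append(f"{name}: {exc}")
    for message in failures:
        warnings.warn(message)
    raise OSError("cannot identify image data")
```

Calling `open_with_any(b"\x89PNG ok", OPENERS)` returns normally and emits nothing, even though the TGA opener failed first. Calling it with `b"\x89PNG BADCRC"` emits one warning per opener and then raises.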

@radarhere
Member

I've created #8033 to allow Image.open("bug.png", warn_possible_formats=True) to show the various exceptions as warnings, but only if the image is not able to be opened successfully. See what you think.

@hugovk
Member

hugovk commented Apr 30, 2024

I'm not sure about the scalability of adding Boolean flags here and there.

How about adding it to a logger?

@radarhere
Member

radarhere commented Apr 30, 2024

I feel the concern about scalability, but as for a logger, as @nulano pointed out, this is something that previously existed, but was removed in #1423.

Pillow/src/PIL/Image.py

Lines 3343 to 3347 in ddbf08f

            except (SyntaxError, IndexError, TypeError, struct.error):
                # Leave disabled by default, spams the logs with image
                # opening failures that are entirely expected.
                # logger.debug("", exc_info=True)
                continue

I am cautious about making decisions and then undoing them. @wiredfool, as the author of #1423, do you have any thoughts on this?

@aclark4life
Member

This is a "nice to have" so I wouldn't add anything for logging or to increase verbose output unless "no other way forward". In this case, it's unfortunate to not get the appropriate information right away, but certainly not critical for us to fix it.

@nulano
Contributor

nulano commented Apr 30, 2024

While I'm not sure we should do either of these, I have thought of two options:

  • Add a global setting (similar to MAX_IMAGE_PIXELS) - I agree that a new function parameter for debugging is not very scalable, but a global setting (perhaps even reused from other functions) would not complicate the interface too much.
  • Append all detected issues to the raised UnidentifiedImageError, e.g.:
     File "/usr/local/lib/python3.9/dist-packages/PIL/Image.py", line 3339, in open
        raise UnidentifiedImageError(msg)
    PIL.UnidentifiedImageError: cannot identify image file '/var/www/python_for_site/bug.png'
    The following warnings were raised while attempting to open the file:
    PNG: broken PNG file (bad header checksum in b'pHYs')
    IM: Syntax error in IM header: �PNG
    IMT: not identified by this driver
    IPTC: invalid IPTC/NAA file
    MPEG: not an MPEG file
    PCD: not a PCD file
    SPIDER: not a valid Spider file
    TGA: not a TGA file
    

@aclark4life
Member

  • Add a global setting (similar to MAX_IMAGE_PIXELS) - I agree that a new function parameter for debugging is not very scalable, but a global setting (perhaps even reused from other functions) would not complicate the interface too much.

Right, global setting is what I suggested here too.

Append all detected issues to the raised UnidentifiedImageError

If you append based on the global setting, probably OK. If not, probably not.

@radarhere
Member

I've created #8063 with Image.WARN_POSSIBLE_FORMATS
