torchvision.io.read_image does not always fail gracefully #3613

Closed
ghost opened this issue Mar 28, 2021 · 7 comments

ghost commented Mar 28, 2021

🐛 Bug

torchvision.io.read_image() will sometimes segfault or abort in other uncatchable ways on malformed images, rather than failing gracefully (e.g. with a RuntimeError).

To Reproduce

Steps to reproduce the behavior:

  1. Download a problematic image file (one that I have found is here)
  2. Try to load the image with torchvision.io.read_image:
>>> import torchvision
>>> image = torchvision.io.read_image("283xnnabju4z.png")
libpng warning: iCCP: known incorrect sRGB profile
munmap_chunk(): invalid pointer
Aborted (core dumped)

Expected behavior

I expected that trying to read an unsupported or malformed image would instead raise a RuntimeError or other catchable error so that it could be handled in code, rather than aborting.
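A minimal sketch of the log-and-skip handling this would allow, assuming the reader raises a RuntimeError on bad input. The helper name `load_images` and the `reader` parameter are hypothetical and not part of torchvision:

```python
import logging

logger = logging.getLogger(__name__)


def load_images(paths, reader):
    """Return (path, image) pairs, skipping files the reader rejects.

    `reader` is any callable that takes a path and raises RuntimeError
    when the file cannot be decoded (an assumption of this sketch).
    """
    loaded = []
    for path in paths:
        try:
            loaded.append((path, reader(path)))
        except RuntimeError as exc:
            # Log the filename and move on instead of stopping the whole run.
            logger.warning("Skipping unreadable image %s: %s", path, exc)
    return loaded
```

In practice `reader` could be `torchvision.io.read_image`. This only helps when the failure surfaces as a Python exception, not when the process aborts.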

Environment

PyTorch version: 1.8.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.20.0

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce GTX 1050
Nvidia driver version: 460.67
cuDNN version: /usr/local/cuda-10.2/lib64/libcudnn.so.7.6.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] torch==1.8.1
[pip3] torchvision==0.9.1

Additional context

Something even stranger happens with this particular image: setting the mode to ImageReadMode.RGB allows it to be read once, but attempting to read it a second time fails as above (i.e. torchvision.io.read_image is not idempotent). I'm not sure whether this behavior is related, but whatever the root cause is, it would be nice to be able to just catch an error, e.g. to log the filename and skip the image during processing.

>>> import torchvision
>>> image = torchvision.io.read_image("283xnnabju4z.png", mode=torchvision.io.image.ImageReadMode.RGB)
libpng warning: iCCP: known incorrect sRGB profile
>>> image.shape
torch.Size([3, 1410, 2048])
>>> image = torchvision.io.read_image("283xnnabju4z.png", mode=torchvision.io.image.ImageReadMode.RGB)
libpng warning: iCCP: known incorrect sRGB profile
munmap_chunk(): invalid pointer
Aborted (core dumped)

A quick investigation shows that the images exhibiting this behavior are usually PNGs with a bit depth of 16. OpenCV and PIL do not appear to have problems reading them.
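As a rough way to spot such files before decoding, the bit depth can be read directly from the PNG header. This is a sketch using only the standard library. It assumes a well-formed signature followed by an IHDR chunk, as the PNG format requires, and the function name `png_bit_depth` is hypothetical:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_bit_depth(data):
    """Return the IHDR bit depth, or None if `data` is not a PNG header.

    Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type
    ("IHDR"), then width (4), height (4), bit depth (1), color type (1).
    """
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    _width, _height, bit_depth, _color_type = struct.unpack(">IIBB", data[16:26])
    return bit_depth
```

Reading the first 26 bytes of a file is enough, e.g. `png_bit_depth(open(path, "rb").read(26))`. A result of 16 would flag the kind of file described above.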

Additionally, the error message sometimes changes, e.g. to "Segmentation fault" or "double free or corruption (out)".
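Since aborts and segfaults like these cannot be caught in Python, one possible workaround (not something proposed in this thread) is to probe each file in a separate interpreter and check the exit status. A minimal sketch, where the helper `runs_cleanly` and the `PROBE` snippet are hypothetical names:

```python
import subprocess
import sys


def runs_cleanly(code, *args, timeout=60.0):
    """Run `code` in a fresh Python process; True only if it exits with 0.

    A crash in the child (abort, segfault, uncaught exception) yields a
    non-zero return code and leaves the calling process unaffected.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code, *args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


# Hypothetical probe: decode the file named by the first argument in the child.
PROBE = "import sys, torchvision; torchvision.io.read_image(sys.argv[1])"
```

Calling `runs_cleanly(PROBE, "283xnnabju4z.png")` would then report whether the file decodes without killing the parent. Starting an interpreter per file is slow, so this suits a one-off dataset scan better than a training loop.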

Member

fmassa commented Mar 29, 2021

Thanks for the report!

We will be looking into fixing this!

Contributor

andfoy commented Mar 29, 2021

Thanks for the information @apisutilis, I'll take a detailed look into this one!

Contributor

andfoy commented Mar 30, 2021

It seems the error happens when the PNG reading function tries to destroy the PNG read structure after catching the error. In other words, torchvision does catch the error, but then segfaults when calling png_destroy_read_struct:

png_destroy_read_struct(&png_ptr, &info_ptr, nullptr);

This in turn calls https://github.com/glennrp/libpng/blob/a37d4836519517bdce6cb9d956092321eca3e73b/pngread.c#L948, where png_free is an alias for free, so the error is related to memory management. I checked whether big_row_buf was NULL, but it wasn't.

In my reproduction scenario, torchvision was able to load the image once, but the second call caused the segfault and produced the message "libpng error: IDAT: bad parameters to zlib". According to ContinuumIO/anaconda-issues#7315, this might be related to the version of zlib used when libpng is invoked. A user there commented that the segfault occurred on the second call to libpng, which is the same scenario we are seeing now.

The solution proposed there involves downgrading zlib (which I haven't verified myself). I'll try to compile zlib as well as libpng to see if we can get more information.

Member

fmassa commented May 26, 2021

@andfoy did you have the chance to look at this again?

Contributor

andfoy commented May 27, 2021

@fmassa I haven't tried to compile zlib locally yet. I'll give it a go tomorrow!

@NicolasHug
Member

Closing, since with #4101 torchvision will now fail gracefully.

@fmassa should we open another issue to keep track of progress on support for PNGs with more than 8 bits?

Member

fmassa commented Jun 24, 2021

@NicolasHug yes, it would be good to have an issue to track support for PNGs with more than 8 bits.
