premature end of JPEG images #916

Closed
ImsuperSH opened this issue Sep 5, 2020 · 27 comments · Fixed by #3638 or #4548
Labels: question (Further information is requested), Stale (Stale and schedule for closing soon)

Comments

@ImsuperSH

❔Question

Epoch gpu_mem GIoU obj cls total targets img_size
1/99 2.87G 0.05456 0.04197 0 0.09652 10 640: 100% 157/157 [00:52<00:00, 2.98it/s]
Class Images Targets P R mAP@.5 mAP@.5:.95: 0% 0/157 [00:00<?, ?it/s]Premature end of JPEG file
Class Images Targets P R mAP@.5 mAP@.5:.95: 100% 157/157 [00:19<00:00, 8.21it/s]
all 2.5e+03 1e+04 0.362 0.777 0.684 0.338

The log shows "Premature end of JPEG file" during validation. What causes this?

@ImsuperSH ImsuperSH added the question Further information is requested label Sep 5, 2020
@glenn-jocher
Member

This is caused by a corrupted image.

@github-actions
Contributor

github-actions bot commented Oct 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale Stale and schedule for closing soon label Oct 6, 2020
@seekFire

seekFire commented Nov 19, 2020

@glenn-jocher If the message "Premature end of JPEG file" appears at the beginning of training, is it also caused by a corrupted image?

@glenn-jocher
Member

@seekFire it's not an error, it's a message, and it's self-descriptive.

@jaqub-manuel

Dear @glenn-jocher,
I also have the same problem. I wonder how many broken pictures there are. Also, do more of these messages appear with each new epoch?
The corruption is probably not related to the annotations; could it be that the images were not copied completely? When the images were cached, only 1 was written out of order.
Thanks in advance ...
[Image: Screenshot from 2020-12-06 18-10-51]

@glenn-jocher
Member

@jaqub-manuel this is a very low-level C++ warning in the cv2 image loader, I think. It does not raise an error, and as far as I know there is currently no way to tag these images as corrupted. Stack Overflow has a few conversations on the topic.

The result is that the image only partially loads and the rest of the area is black. One or two images with this problem should not harm your dataset.

@jaqub-manuel

Many Thanks for clarification...

@sramakrishnan247

sramakrishnan247 commented Mar 5, 2021

@jacklinquan @glenn-jocher
How do you know the number of files that have this issue?
I see something like this on my logs:


Transferred 794/802 items from yolov5x.pt
Optimizer groups: 134 .bias, 142 conv.weight, 131 other
Scanning images: 100%|██████████| 1822/1822 [00:00<00:00, 15125.35it/s]
Scanning labels /home/mli/sramakrishnan/exp6/obj_detector_training/labels.cache (1822 found, 0 missing, 0 empty, 0 duplicate, for 1822 images): 1822it [00:00, 34582.57it/s]
Scanning images: 100%|██████████| 472/472 [00:00<00:00, 15131.13it/s]
Scanning labels /home/mli/sramakrishnan/exp6/obj_detector_training/labels.cache (472 found, 0 missing, 0 empty, 0 duplicate, for 472 images): 472it [00:00, 33152.67it/s]
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

Analyzing anchors... anchors/target = 5.13, Best Possible Recall (BPR) = 0.9999
Image sizes 640 train, 640 test
Using 4 dataloader workers

Does this mean that only this many files are affected?

@glenn-jocher
Member

@sramakrishnan247 it looks like 8 of your files have 'premature end of JPEG'. This is a low level warning, and is not caught by python asserts or cv2 loading errors, so these files will all be used for training.
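
As a rough illustration of the counting done by eye above, here is a minimal sketch that counts the warning in captured log text. It assumes the run's output, including stderr where such low-level warnings are normally written, was saved somewhere; the function name is hypothetical. The count is of messages, not unique files, so if an image is decoded more than once the number may overstate how many files are affected.

```python
WARNING_TEXT = "Premature end of JPEG file"


def count_premature_jpeg_warnings(log_text):
    """Count how many times the warning appears in captured log text."""
    # Count substrings rather than whole lines: the warning can be glued to a
    # progress bar on the same line, as in the first log in this thread.
    return log_text.count(WARNING_TEXT)


if __name__ == "__main__":
    # Small made-up sample shaped like the logs above (not real output).
    sample = "\n".join([
        "Scanning images: 100% 472/472",
        "0% 0/157 [00:00<?, ?it/s]Premature end of JPEG file",
        "Premature end of JPEG file",
        "Analyzing anchors...",
    ])
    print(count_premature_jpeg_warnings(sample))
```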

@sramakrishnan247

sramakrishnan247 commented Mar 5, 2021

@glenn-jocher
Thanks for letting me know. Since it's only 8 files, I assume it is safe to ignore. I have around 2000 samples.

@madr3z

madr3z commented Mar 6, 2021

Can anyone please let us know how to find which images have this problem or how can we fix these images?
I tried detecting the problematic images using cv2.imread() but did not find any.

@glenn-jocher
Member

@madr3z this is a low level warning, and is not caught by python asserts or cv2 loading errors, so these files will all be used for training. There is currently no way to identify them, though you could always debug this by printing each image name as it's cached and observing which coincides with the messages.

@xiaowk5516
Contributor

xiaowk5516 commented Jun 16, 2021

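This can occur when the image file was not downloaded completely.
Code to check:

image_path = ''  # set this to the path of the image to check
if image_path.lower().endswith(('.jpg', '.jpeg')):
    with open(image_path, 'rb') as f:
        f.seek(-2, 2)  # move to the last two bytes of the file
        # compare against bytes, not str: the file is opened in binary mode
        if f.read() == b'\xff\xd9':  # JPEG end-of-image (EOI) marker
            print('complete image')
        else:
            print('incomplete image')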

you can try this.
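
To apply the same idea to a whole folder, something like the following sketch could work. It uses only the standard library; the function names `has_eoi_marker` and `find_truncated_jpegs` are hypothetical and not part of YOLOv5. It also guards against files shorter than two bytes, where `seek(-2, 2)` would raise. Note that the check is a heuristic: some otherwise valid JPEGs carry extra bytes after the EOI marker and would be flagged too.

```python
from pathlib import Path

JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def has_eoi_marker(path):
    """Return True if the file ends with the JPEG end-of-image marker."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        if f.tell() < 2:  # too short to hold the marker
            return False
        f.seek(-2, 2)  # last two bytes of the file
        return f.read(2) == JPEG_EOI


def find_truncated_jpegs(directory):
    """List JPEG files under `directory` that do not end with the EOI marker."""
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file()
        and p.suffix.lower() in JPEG_SUFFIXES
        and not has_eoi_marker(p)
    )


if __name__ == "__main__":
    # Scan the current directory and print every suspicious file.
    for bad in find_truncated_jpegs("."):
        print(bad)
```

The returned paths can then be inspected, re-downloaded, or removed from the dataset by hand.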

@glenn-jocher
Member

@xiaowk5516 that's an interesting piece of code! We may be able to integrate this into the dataset checks if the speed is fast and it works as intended. The correct location for this would be here:

yolov5/utils/datasets.py

Lines 1054 to 1061 in 65f81bf

# verify images
im = Image.open(im_file)
im.verify() # PIL verify
shape = exif_size(im) # image size
segments = [] # instance segments
assert (shape[0] > 9) & (shape[1] > 9), f'image size {shape} <10 pixels'
assert im.format.lower() in img_formats, f'invalid image format {im.format}'

@glenn-jocher
Member

@xiaowk5516 I think the following image scanning code should work based on your idea. Can you submit a PR to help integrate this code into master to help everyone with this problem?

        # verify images
        im = Image.open(im_file)
        im.verify()  # PIL verify
        shape = exif_size(im)  # image size
        assert (shape[0] > 9) & (shape[1] > 9), f'image size {shape} <10 pixels'
        assert im.format.lower() in img_formats, f'invalid image format {im.format}'
        if im.format.lower() in ('jpg', 'jpeg'):
            with open(im_file, 'rb') as f:
                f.seek(-2, 2)
                assert f.read() == b'\xff\xd9', 'corrupted JPEG'

@glenn-jocher glenn-jocher added the TODO High priority items label Jun 16, 2021
@xiaowk5516
Contributor

Of course! I will submit it soon.

@glenn-jocher
Member

@xiaowk5516 great!

@glenn-jocher glenn-jocher linked a pull request Jun 16, 2021 that will close this issue
@glenn-jocher
Member

glenn-jocher commented Jun 16, 2021

@ImsuperSH @seekFire @sramakrishnan247 @jacklinquan @madr3z good news 😃! Your original issue may now be fixed ✅ in PR #3638. This PR adds JPEG corruption error checking by @xiaowk5516 to the YOLOv5 train and testloaders. To receive this update:

  • Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View updated notebooks (Open In Colab, Open In Kaggle)
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

@glenn-jocher glenn-jocher removed the TODO High priority items label Jun 16, 2021
@Poulinakis-Konstantinos

Hello, this might be a little late, but I found a way to fix the premature-ending error. I'm leaving it here in case anyone needs it in the future.

In short, reading the image with OpenCV and then saving it again with OpenCV fixes the image and adds the EOI code 'D9' at the end of the file's hex data.
https://github.com/Poulinakis-Konstantinos/ML-util-functions/blob/master/scripts/Img_Premature_Ending-Detect_Fix.py

@glenn-jocher
Member

glenn-jocher commented Aug 25, 2021

@Poulinakis-Konstantinos thanks for the idea! Do you know if PIL Image saving also resolves the issue?

The reason I ask is the images are already opened with PIL as im when the corruption scanning is performed:

yolov5/utils/datasets.py

Lines 866 to 876 in 2da6444

# verify images
im = Image.open(im_file)
im.verify() # PIL verify
shape = exif_size(im) # image size
assert (shape[0] > 9) & (shape[1] > 9), f'image size {shape} <10 pixels'
assert im.format.lower() in IMG_FORMATS, f'invalid image format {im.format}'
if im.format.lower() in ('jpg', 'jpeg'):
    with open(im_file, 'rb') as f:
        f.seek(-2, 2)
        assert f.read() == b'\xff\xd9', 'corrupted JPEG'

@Poulinakis-Konstantinos

@glenn-jocher I just tested it with PIL. Yes, saving the image with PIL does restore the image's EOI marker!

Adding a save command for the case where a corrupted image is detected would probably be beneficial.

@glenn-jocher
Member

@Poulinakis-Konstantinos hmm, interesting. OK, we need to be very careful about saving the images: PIL applies a default compression level (I'm not sure about cv2), and we want to make sure the new JPG pixel values are not altered in any way.

If we can get some corrupted images to pass an np.allclose() test before and after I think that should suffice. Do you have any corrupted JPEGs you could share?

@glenn-jocher
Member

Maybe something like this:

im.save(im_file, format='JPEG', subsampling=0, quality=100)

From https://stackoverflow.com/questions/19303621/why-is-the-quality-of-jpeg-images-produced-by-pil-so-poor

@glenn-jocher
Member

glenn-jocher commented Aug 26, 2021

@Poulinakis-Konstantinos I can't figure out how to save a JPG without altering it. I created a script here that shows significant differences in pixel values on both cv2 and PIL saving. Do you have any ideas?

import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

fp = '000000000034.jpg'  # original image file path
fp_pil = fp + '.PIL.jpg'
fp_cv2 = fp + '.cv2.jpg'

# Read and write cv2 and PIL JPGs
Image.open(fp).save(fp_pil)
cv2.imwrite(fp_cv2, cv2.imread(fp))

# Read new JPGs and compare
im = cv2.imread(fp)
im_pil = cv2.imread(fp_pil)
im_cv2 = cv2.imread(fp_cv2)
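# Cast to a signed type first: subtracting uint8 arrays wraps around (e.g. 3 - 5 = 254)
dp = (im.astype(np.int16) - im_pil.astype(np.int16)).ravel()
dc = (im.astype(np.int16) - im_cv2.astype(np.int16)).ravel()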
print(np.allclose(im, im_pil))
print(np.allclose(im, im_cv2))

# Plot
fig, ax = plt.subplots(1, 2, figsize=(8, 4), tight_layout=True)
ax[0].hist(dp, 255)
ax[1].hist(dc, 255)
plt.savefig('results.jpg')

[Image: results.jpg, histograms of the pixel differences for the PIL and cv2 re-saves]

Related: https://stackoverflow.com/questions/54610705/copied-image-saved-with-different-pixels-to-original-with-pil

@glenn-jocher glenn-jocher reopened this Aug 26, 2021
@glenn-jocher glenn-jocher linked a pull request Aug 26, 2021 that will close this issue
@glenn-jocher
Member

@Poulinakis-Konstantinos I've opened a PR with a fix in #4548. Can you review please?

@xiaowk5516
Contributor

@glenn-jocher JPG/JPEG is a lossy compression format for digital images. That is, its compression is irreversible, so the pixel values obtained after decompressing and recompressing an image will differ from the original.
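
Because re-encoding changes pixel values, one alternative that is not discussed in this thread would be to leave the compressed data untouched and only append the missing two-byte EOI marker. The sketch below assumes the file is merely missing its final marker; if part of the image data is missing as well, that region cannot be recovered this way and a decoder may still report a problem. The function name and the `.bak` backup suffix are hypothetical.

```python
import shutil

JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def append_eoi_if_missing(path, backup=True):
    """Append the EOI marker if the file does not already end with it.

    Returns True if the file was modified. Existing bytes are never rewritten,
    so the compressed image data that is present stays byte-for-byte identical.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        size = f.tell()
        f.seek(max(size - 2, 0))  # last two bytes, or fewer for tiny files
        tail = f.read()
    if tail == JPEG_EOI:
        return False  # already terminated, nothing to do
    if backup:
        shutil.copy2(path, path + ".bak")  # keep the original (hypothetical naming)
    with open(path, "ab") as f:  # append mode: only adds bytes at the end
        f.write(JPEG_EOI)
    return True
```

After this, a file passes the `b'\xff\xd9'` check shown earlier in the thread without any decode and re-encode step.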

@glenn-jocher
Member

glenn-jocher commented Aug 26, 2021

@ImsuperSH @Poulinakis-Konstantinos @seekFire @jaqub-manuel @xiaowk5516 good news 😃! Your original issue may now be fixed ✅ in PR #4548. This PR automatically restores and saves corrupted JPEGs before training starts, and all images are now used for training, including the restored JPEGs.

To receive this update:

  • Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View updated notebooks (Open In Colab, Open In Kaggle)
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
