save_image_file should set DPI for derived images #343

bertsky · 2019-11-11T12:24:25Z

Currently, any image resolution information provided in the original image (and made available via OcrdExif in Workspace.image_from_page) is ignored when saving derived images in the workspace (via Workspace.save_image_file). Due to PIL.Image format internals, the resulting PNG nevertheless contains a setting of 72 DPI. This might create problems for processors that look at the derived image files alone.

But this is hard to fix in core: the image passed to save_image_file could come from anywhere (and usually does not have an info['dpi']; even simple PIL.Image operations omit that in the result).

Realistically though, it will have been created in some way from the source image file under the same pageId, and since rescaling is currently not permitted in the spec, one could assume the same DPI for all derived images.


wrznr · 2019-11-11T12:44:08Z

> and usually does not have an info['dpi']

Why can't we check whether such information is available? Copy it, if that's the case and delete the default value if not? We should definitely not write default values which we know to be wrong (I added the label bug).
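To make the suggestion concrete, here is a minimal sketch of "copy the DPI if present, otherwise write none". It uses only Pillow's `Image.info` dict and the `dpi` save parameter; the helper name `save_with_dpi` and its `fallback_dpi` argument are hypothetical and not part of core.

```python
import io
from PIL import Image


def save_with_dpi(image, target, fallback_dpi=None, fmt="PNG"):
    """Save image, writing DPI metadata only if a value is actually known.

    Hypothetical helper: prefers image.info['dpi'], then fallback_dpi,
    and writes no resolution at all if neither is available.
    """
    dpi = image.info.get("dpi", fallback_dpi)
    params = {}
    if dpi:
        # Pillow expects an (x, y) pair for the dpi save parameter.
        if isinstance(dpi, (int, float)):
            dpi = (dpi, dpi)
        params["dpi"] = tuple(dpi)
    image.save(target, format=fmt, **params)
    return params.get("dpi")
```

A caller would pass the DPI of the source image as `fallback_dpi`, so that a derived image without its own `info['dpi']` still ends up with the right value. Note that PNG stores resolution in pixels per metre, so the value read back can differ from the one written by a small rounding error.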

bertsky · 2019-11-11T12:52:55Z

> > and usually does not have an info['dpi']

> Why can't we check whether such information is available?

We could, but as I said: usually it is not. Or should I say: almost always. Any non-mutating PIL.Image operation (even those that wouldn't affect the meta-data) omits info from the result. We cannot change that. And we cannot force processors to give us much more than what PIL.Image does.

> We should definitely not write default values which we know to be wrong (I added the label bug).

Okay, but that's probably a bug in Pillow's save for the PNG format.

bertsky · 2019-11-11T13:06:38Z

> > We should definitely not write default values which we know to be wrong (I added the label bug).

> Okay, but that's probably a bug in Pillow's save for the PNG format.

Oh, I just found that this appears to have been fixed in Pillow 6.2.1. Since we likely cannot do any better, I will close for now.

kba · 2019-11-11T13:30:14Z

https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst#621-2019-10-21

Should we update to 6.2.1 then? Do you have a sample for me to test? Thanks!

bertsky · 2019-11-11T13:59:04Z

Sorry, I was imprecise: It could have been fixed in an earlier version already.

I saw the following with 5.4.1 (leading up to this issue):

python -c "import PIL.Image; PIL.Image.open('repo/data/assets/scribo-test/data/OCR-D-IMG/OCR-D-IMG-orig_tiff.tif').save('test.png')"
identify -verbose test.png | grep Resolution:
 Resolution: 72x72

However, this does not happen any more with 6.2.1 (where correctly no DPI is saved).

bertsky · 2020-01-07T14:55:54Z

@wrznr just convinced me that we should indeed take action to ensure core and modules comply with the spec (which requires PPI information to be kept for derived images, cf. OCR-D/spec#137).

bertsky · 2020-01-07T14:58:27Z

Not so sure about the time frame for this though. Since it involves patching all modules, I guess the final workshop is out of the question. Setting the milestone to 3.0 to take the pressure off.

bertsky · 2020-09-21T14:45:29Z

> Since it involves patching all modules

If we just assume the derived image passed to save_image_file must come from the (cropped/deskewed/dewarped/binarized/denoised/...) original without rescaling somehow, then we'll only need to patch core (finding the original image with the same pageId and looking at its OcrdExif). Of course we could still allow processors to specify a DPI themselves (in case they did rescale, or just to speed up).
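A rough sketch of what that core-only patch could look like, under the no-rescaling assumption. The resolution value and unit are taken here as plain arguments standing in for whatever the original image's OcrdExif reports; the function names, the `"cm"` unit string, and the optional `dpi` override are assumptions for illustration, not the existing API.

```python
import io
from PIL import Image


def dpi_from_source(resolution, unit="inches"):
    """Convert the source image's resolution to DPI (None if unknown).

    Assumption: unit is either "inches" or "cm".
    """
    if not resolution or resolution <= 0:
        return None
    if unit == "cm":
        return resolution * 2.54  # pixels per cm -> pixels per inch
    return float(resolution)


def save_derived(image, target, source_resolution, source_unit="inches", dpi=None):
    """Save a derived image with the DPI inherited from its source.

    A processor that did rescale can pass its own dpi to override this.
    """
    if dpi is None:
        dpi = dpi_from_source(source_resolution, source_unit)
    params = {"dpi": (dpi, dpi)} if dpi else {}
    image.save(target, format="PNG", **params)
    return dpi
```

The explicit `dpi` argument covers both cases mentioned above: processors that rescaled, and processors that already know the value and want to skip the lookup of the original image.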

bertsky · 2024-06-26T10:59:07Z

> If we just assume the derived image passed to save_image_file must come from the (cropped/deskewed/dewarped/binarized/denoised/...) original without rescaling somehow, then we'll only need to patch core (finding the original image with the same pageId and looking at its OcrdExif). Of course we could still allow processors to specify a DPI themselves (in case they did rescale, or just to speed up).

Alternatively, we could inject DPI info into the coords dict (along with affine transform and image features) at the top image_from_page and then make sure it gets passed down with each image_from_segment. Since images are only ever useful along with their coordinates, and image operations usually have to update that dict consistently (for example when rescaling), any caller of Workspace.save_image_file should have the DPI info nearby.
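To illustrate the idea, here is a small sketch of a rescaling step that keeps such a dict consistent. The `'DPI'` key is hypothetical at this point, and the affine transform is simplified to a 3x3 nested list instead of the array used in core, so that the example needs nothing beyond Pillow.

```python
from PIL import Image


def rescale_with_coords(image, coords, factor):
    """Rescale image by factor and update transform and DPI together.

    Assumptions: coords['transform'] is a 3x3 nested list mapping page
    to image coordinates, and coords['DPI'] (hypothetical key) is a
    single number or missing.
    """
    new_size = (max(1, round(image.width * factor)),
                max(1, round(image.height * factor)))
    scaled = image.resize(new_size)
    new_coords = dict(coords)
    # Left-multiply by diag(factor, factor, 1): scale the first two rows.
    rows = coords["transform"]
    new_coords["transform"] = [[v * factor for v in row] for row in rows[:2]]
    new_coords["transform"].append(list(rows[2]))
    dpi = coords.get("DPI")
    if dpi is not None:
        # Fewer pixels over the same physical extent means a lower DPI.
        new_coords["DPI"] = dpi * factor
    return scaled, new_coords
```

With this, the caller of save_image_file has the correct value at hand even after rescaling, since it travels with the coordinates rather than with the PIL.Image object.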

bertsky added the enhancement label Nov 11, 2019

wrznr added the bug label Nov 11, 2019

bertsky closed this as completed Nov 11, 2019

bertsky mentioned this issue Nov 22, 2019

workspace validator: don't check resolution for derived images #355

Closed

bertsky mentioned this issue Jan 6, 2020

relax DPI metadata requirement for derived images OCR-D/spec#137

Open

bertsky reopened this Jan 7, 2020

bertsky added this to the 3.0.0 milestone Jan 7, 2020

EEngl52 assigned kba Jan 13, 2020

kba added a commit that referenced this issue Sep 21, 2020

workspace.save_image_file: set dpi on image.save, #343

fcb3503

bertsky mentioned this issue Apr 14, 2021

Ocrd cli qurator-spk/eynollah#33

Merged

bertsky mentioned this issue Jun 25, 2024

New processor API #1240

Merged
