
Optimizing images when uploading to the server #710

Closed
karlcow opened this issue Sep 28, 2015 · 28 comments

@karlcow
Member

karlcow commented Sep 28, 2015

We may want to optimize an image when it's being uploaded:

  • reduce the size of the image (performance concerns)
  • remove EXIF data from the image (privacy concerns)

Discussed at the Paris WebCompat meeting in September 2015 with @miketaylr.
See https://github.com/webcompat/webcompat.com/blob/master/webcompat/api/uploads.py#L53

@karlcow
Member Author

karlcow commented Oct 13, 2015

We will probably be using Pillow for dealing with images.
http://python-pillow.github.io/

karlcow added a commit to karlcow/webcompat.com that referenced this issue Oct 13, 2015
@karlcow
Member Author

karlcow commented Oct 13, 2015

Many of Pillow’s features require external libraries:

    libjpeg provides JPEG functionality.
        Pillow has been tested with libjpeg versions 6b, 8, and 9 and libjpeg-turbo version 8.
        Starting with Pillow 3.0.0, libjpeg is required by default, but may be disabled with the --disable-jpeg flag.
    zlib provides access to compressed PNGs
        Starting with Pillow 3.0.0, zlib is required by default, but may be disabled with the --disable-zlib flag.

@karlcow
Member Author

karlcow commented Oct 13, 2015

→ brew install libjpeg webp
Warning: You are using OS X 10.11.
We do not provide support for this pre-release version.
You may encounter build failures or other breakage.
==> Downloading https://homebrew.bintray.com/bottles/jpeg-8d.el_capitan.bottle.2.tar.gz
######################################################################## 100,0%
==> Pouring jpeg-8d.el_capitan.bottle.2.tar.gz
🍺  /usr/local/Cellar/jpeg/8d: 18 files, 760K
==> Installing dependencies for webp: libpng
==> Installing webp dependency: libpng
==> Downloading https://homebrew.bintray.com/bottles/libpng-1.6.18.el_capitan.bottle.tar.gz
######################################################################## 100,0%
==> Pouring libpng-1.6.18.el_capitan.bottle.tar.gz
🍺  /usr/local/Cellar/libpng/1.6.18: 17 files, 1,2M
==> Installing webp
==> Downloading https://homebrew.bintray.com/bottles/webp-0.4.3.el_capitan.bottle.tar.gz
######################################################################## 100,0%
==> Pouring webp-0.4.3.el_capitan.bottle.tar.gz
🍺  /usr/local/Cellar/webp/0.4.3: 32 files, 1,7M
(webcompatcom)17:59:51 ~/code/webcompat.com

Then on Mac OS X it might be necessary to run xcode-select --install. Once everything is installed, you can run, in the webcompat work environment:


pip install Pillow


---

PIL SETUP SUMMARY

version Pillow 3.0.0
platform darwin 2.7.10 (default, Aug 22 2015, 20:33:39)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.1)]

--- TKINTER support available
--- JPEG support available
*** OPENJPEG (JPEG2000) support not available
--- ZLIB (PNG/ZIP) support available
*** LIBTIFF support not available
*** FREETYPE2 support not available
*** LITTLECMS2 support not available
--- WEBP support available
--- WEBPMUX support available
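The same summary can be checked from Python at runtime, which is handy when debugging a deployment. A minimal sketch, assuming a Pillow release that ships the `PIL.features` module (possibly newer than the 3.0.0 build above); the feature names are Pillow's own identifiers, and `support_summary` is an illustrative helper:

```python
from PIL import features

# Pillow's feature identifiers: codecs ("jpg", "zlib", "jpg_2000", "libtiff")
# and optional modules ("webp", "freetype2", "littlecms2").
WANTED = ["jpg", "zlib", "webp", "jpg_2000", "libtiff", "freetype2", "littlecms2"]


def support_summary(names=WANTED):
    """Return {feature_name: bool}, similar to the PIL SETUP SUMMARY above."""
    return {name: bool(features.check(name)) for name in names}


for name, ok in support_summary().items():
    marker = "---" if ok else "***"
    print(marker, name, "support", "available" if ok else "not available")
```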


@karlcow
Member Author

karlcow commented Oct 13, 2015

Testing code before implementing a solution.

>>> from PIL import Image
>>> # setting a generic size
>>> size = (256,256)
>>> im3 = Image.open('/Users/karl/foobar.jpg')
>>> # display the image to check if it's the right one.
>>> im3.show()
>>> # display the exif
>>> im3._getexif()
{36864: ('0221',), 37121: ('\x01\x02\x03\x00',), 37378: (2.5260688216892597,), 36867: (u'2015:10:05 16:01:01',), 36868: (u'2015:10:05 16:01:01',), 41989: (32,), 40960: ('0100',), 37383: (5,), 37385: (32,), 37386: (3.85,), 41986: (0,), 271: u'Brand', 272: u'model_blah', 274: 6, 531: 1, 41495: (2,), 282: 72.0, 283: 72.0, 33434: (0.03333333333333333,), 34850: (2,), 40961: (1,), 34853: (576,), 34855: (400,), 296: 2, 41987: (0,), 33437: (2.4,), 305: u'6.1.6', 306: u'2015:10:05 16:01:01', 37377: (4.914738124238733,), 40962: (960,), 41990: (0,), 40963: (720,), 34665: 206, 37379: (2.2344952467179717,)}
>>> im3.thumbnail(size)
>>> im3.save('pic-thumb-test.jpg', 'JPEG')
>>> im4 = Image.open('pic-thumb-test.jpg')
>>> im4.show()
>>> # returns nothing (None): the thumbnail image doesn't have EXIF anymore.
>>> im4._getexif()
>>> 

I need to find out how to optimize the full-size image and remove its EXIF.

@karlcow
Member Author

karlcow commented Oct 22, 2015

OK. I guess I have all the parts to make this work:

Re-saving the file with Pillow, even under the same name, removes the EXIF automatically.

>>> from PIL import Image
>>> im = Image.open('/Users/karl/Desktop/foobar.jpg')
>>> im._getexif()
{36864: ('0221',), 37121: ('\x01\x02\x03\x00',), 37378: (2.5260688216892597,), 36867: (u'2015:10:05 16:01:01',), 36868: (u'2015:10:05 16:01:01',), 41989: (32,), 40960: ('0100',), 37383: (5,), 37385: (32,), 37386: (3.85,), 41986: (0,), 271: u'Brand', 272: u'model_blah', 274: 6, 531: 1, 41495: (2,), 282: 72.0, 283: 72.0, 33434: (0.03333333333333333,), 34850: (2,), 40961: (1,), 34853: (576,), 34855: (400,), 296: 2, 41987: (0,), 33437: (2.4,), 305: u'6.1.6', 306: u'2015:10:05 16:01:01', 37377: (4.914738124238733,), 40962: (960,), 41990: (0,), 40963: (720,), 34665: 206, 37379: (2.2344952467179717,)}
>>> im.save('/Users/karl/Desktop/foobar.jpg', 'JPEG')
>>> im2 = Image.open('/Users/karl/Desktop/foobar.jpg')
>>> im2._getexif()

It also reduces the file size for the same dimensions.

ls -ahl ~/Desktop/foobar.jpg
# BEFORE
# -rw-------  1 karl  staff   222K 24 aoû 07:43 /Users/karl/Desktop/foobar.jpg
# AFTER
# -rw-------  1 karl  staff    70K 22 oct 15:32 /Users/karl/Desktop/foobar.jpg

\o/ OK, now that I have everything in place, we can start coding a bit.
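The re-save behavior can be reproduced without a real photo. This sketch assumes a recent Pillow (`Image.Exif` and `getexif()` only exist in releases newer than the 3.0.0 used here; the thread uses the private `_getexif()` instead) and uses a made-up EXIF value; `jpeg_with_exif` and `resave` are illustrative helpers:

```python
from io import BytesIO

from PIL import Image

MAKE = 271  # EXIF tag id for the camera maker ("Brand" in the dump above)


def jpeg_with_exif():
    """Build a small in-memory JPEG that carries one EXIF tag."""
    exif = Image.Exif()
    exif[MAKE] = "Brand"
    buf = BytesIO()
    Image.new("RGB", (64, 48), "orange").save(buf, "JPEG", exif=exif.tobytes())
    buf.seek(0)
    return buf


def resave(src):
    """Re-save an image in its own format without passing exif=...,
    so Pillow's JPEG writer emits no EXIF block."""
    im = Image.open(src)
    out = BytesIO()
    im.save(out, im.format)
    out.seek(0)
    return out


original = jpeg_with_exif()
before = Image.open(original).getexif()
original.seek(0)
after = Image.open(resave(original)).getexif()
print("EXIF before:", dict(before), "after:", dict(after))
```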

@karlcow
Member Author

karlcow commented Oct 22, 2015

@miketaylr Do we want to keep the original format, or do we want to save everything as JPEG?
I.e., if someone uploads a PNG screenshot, do we save it as JPEG instead of PNG?

I'm in favor of always saving as JPEG to save weight, but I might be overlooking some issues.

@magsout
Member

magsout commented Oct 22, 2015

Aka someone uploads a png screenshot and we save it as jpeg instead of png?

IMO it's better to keep the original format and resolution, i.e. the original file. If in the future we want to use PNG or JPEG or a better resolution, we will still have the original file to work from.

@karlcow
Member Author

karlcow commented Oct 22, 2015

Thanks @magsout
Reasonable.
I guess I can keep the original format, and I will create the smaller JPEG version after finishing this one. That's issue #722.

@miketaylr
Member

@karlcow are those JPEG file size reductions just for removing EXIF/metadata? Or is there some compression involved?

I think I agree with @magsout, let's start with original format and we have the option of doing something fancier in the future.

+1 to the idea of small JPEGs for #722.

@karlcow
Member Author

karlcow commented Oct 22, 2015

@miketaylr I need to test more to see what the library does by default. Stripping the EXIF alone would not remove that much. For JPEG, I'm pretty sure some compression is going on. I haven't tested PNG yet; I'll probably do that tomorrow.

@karlcow
Member Author

karlcow commented Oct 23, 2015

OK, cool.
Saving the PNG as PNG also compresses it, without apparent loss of quality.
With a test image, it reduced the size from 20 KB to 16 KB. (Remember that PNG files normally carry no EXIF, so this gain is not from stripping metadata.)

I guess for JPEG, it applies a default compression.

Another neat Pillow feature that we wanted: it reports the real format of the file regardless of its name, so we can catch someone uploading a blabla.png which is in fact a blabla.js or a blabla.jpg.

>>> img = Image.open(image_file)
>>> img.size
(486, 53)
>>> img.format
'PNG'

Now I just have to figure out how to save the modified image into the UploadSet defined by images = UploadSet('uploads', IMAGES).
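A sketch of that check, trusting the format Pillow detects from the bytes rather than the uploaded filename. `EXTENSION_FORMATS` and `real_format` are hypothetical names for illustration, not webcompat.com or Pillow API:

```python
import os
from io import BytesIO

from PIL import Image

# Hypothetical mapping from accepted file extensions to Pillow format names.
EXTENSION_FORMATS = {
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".gif": "GIF",
    ".bmp": "BMP",
}


def real_format(filename, data):
    """Return (claimed, detected) formats for an upload.

    `claimed` comes from the extension, `detected` from Pillow parsing the
    bytes. Image.open raises an OSError subclass if the data is not an image.
    """
    ext = os.path.splitext(filename)[1].lower()
    claimed = EXTENSION_FORMATS.get(ext)
    with Image.open(BytesIO(data)) as img:
        detected = img.format
    return claimed, detected


# A JPEG uploaded under a .png name.
buf = BytesIO()
Image.new("RGB", (486, 53), "white").save(buf, "JPEG")
claimed, detected = real_format("blabla.png", buf.getvalue())
print(claimed, detected)  # PNG JPEG
```

A mismatch between the two values is the signal to reject or re-label the upload.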

@karlcow
Member Author

karlcow commented Oct 23, 2015

Note also that for JPEG, comparing the original and the optimized versions in my tests, I didn't notice any weird compression artifacts.

@karlcow
Member Author

karlcow commented Oct 23, 2015

From http://pillow.readthedocs.org/en/latest/handbook/image-file-formats.html, about the JPEG options of the save() method:

quality. The image quality, on a scale from 1 (worst) to 95 (best). The default is 75.
That's why the image size is reduced.

And for PNG:

optimize. If present, instructs the PNG writer to make the output file as small as possible. This includes extra processing in order to find optimal encoder settings.
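A small experiment illustrates both options. It is a sketch on synthetic data (seeded random noise, which compresses poorly and makes the effect of `quality` obvious); real screenshots will give different numbers, and `encoded_size` is an illustrative helper:

```python
import random
from io import BytesIO

from PIL import Image

random.seed(0)
# 128x128 RGB noise image built from deterministic pseudo-random bytes.
pixels = bytes(random.getrandbits(8) for _ in range(128 * 128 * 3))
noise = Image.frombytes("RGB", (128, 128), pixels)


def encoded_size(img, fmt, **options):
    """Return the byte size of `img` saved as `fmt` with the given writer options."""
    buf = BytesIO()
    img.save(buf, fmt, **options)
    return len(buf.getvalue())


jpeg_95 = encoded_size(noise, "JPEG", quality=95)
jpeg_default = encoded_size(noise, "JPEG")  # default quality (75 per the docs quoted above)
png_default = encoded_size(noise, "PNG")
png_optimized = encoded_size(noise, "PNG", optimize=True)
print("JPEG q95:", jpeg_95, "JPEG default:", jpeg_default)
print("PNG default:", png_default, "PNG optimize:", png_optimized)

# PNG is lossless: the optimized file decodes back to exactly the same pixels.
png_buf = BytesIO()
noise.save(png_buf, "PNG", optimize=True)
png_buf.seek(0)
roundtrip = Image.open(png_buf)
```

Whatever `optimize` does to the PNG size, the pixels stay identical, which matches the "no apparent loss of quality" observation earlier in the thread.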

@miketaylr
Member

For JPEG, 75 is a pretty good compromise between quality and size, IMO (especially for screenshots of website bugs, rather than user avatar uploads or a photography social media website).

karlcow added a commit to karlcow/webcompat.com that referenced this issue Dec 12, 2015
This will probably require installation of other things at the system level.
You will need to have
- libjpeg
- webp
- libpng
karlcow added a commit to karlcow/webcompat.com that referenced this issue Dec 12, 2015
@miketaylr
Member

http://flask.pocoo.org/docs/0.10/patterns/fileuploads/#improving-uploads explains how we can set app.config['MAX_CONTENT_LENGTH'] to get the same functionality as the current call to patch_request_class(app, 4 * 1024 * 1024) from flask.uploads.
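Following that page, a minimal sketch of the equivalent configuration. The `/upload` route and the `image` form field are hypothetical, not webcompat.com's actual endpoint; with this setting Flask answers oversized uploads with 413 Request Entity Too Large once the request body is parsed:

```python
from flask import Flask, request

app = Flask(__name__)
# Same 4 MiB limit as patch_request_class(app, 4 * 1024 * 1024).
app.config["MAX_CONTENT_LENGTH"] = 4 * 1024 * 1024


@app.route("/upload", methods=["POST"])
def upload():
    # Accessing request.files parses the body; if it is larger than
    # MAX_CONTENT_LENGTH, Werkzeug raises RequestEntityTooLarge (HTTP 413).
    image = request.files["image"]
    return {"filename": image.filename}
```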

@karlcow
Copy link
Member Author

karlcow commented Dec 19, 2015

Oh cool! Thanks.

@miketaylr
Member

@karlcow OK, #882 was just merged. I think that should make this task very simple for you now -- you can do your optimization magic somewhere in the Upload.to_image_object or Upload.save methods (which both use Pillow).

👀 Just read the docs, and it looks like we just need to pass the right params to image_object.save()
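A sketch of what passing "the right params" could look like. `SAVE_OPTIONS` and `optimized_save` are hypothetical, not the actual `Upload` code from #882; the option values (`quality=75`, `optimize=True`) come from the Pillow docs quoted earlier in this thread, and no `exif=` argument is passed, so the metadata is dropped:

```python
from io import BytesIO

from PIL import Image

# Hypothetical per-format writer options. Pillow silently ignores options a
# writer does not recognise, but keeping them per format makes intent explicit.
SAVE_OPTIONS = {
    "JPEG": {"quality": 75, "optimize": True},
    "PNG": {"optimize": True},
}


def optimized_save(image_object, dest):
    """Save `image_object` to `dest` in its original format with optimizing options.

    No exif=... is passed, so the JPEG/PNG writers emit no EXIF block.
    Assumes image_object was opened from a file (Image.new images have format None).
    """
    fmt = image_object.format
    options = SAVE_OPTIONS.get(fmt, {})
    image_object.save(dest, fmt, **options)
    return fmt, options


src = BytesIO()
Image.new("RGB", (300, 200), "teal").save(src, "PNG")
src.seek(0)
out = BytesIO()
fmt, options = optimized_save(Image.open(src), out)
print(fmt, options)  # PNG {'optimize': True}
```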

@karlcow
Member Author

karlcow commented Dec 28, 2015

\o/ ^_^

@miketaylr
Member

@karlcow, are you actively working on this? I might steal it from you if not -- but don't want to step on any work-in-progress code.

@karlcow
Member Author

karlcow commented Mar 2, 2016

Ah, nope nope, wait. ^_^ I want to give it a stab. I've just been busy with other things.

@miketaylr
Member

@karlcow cool, no issue.

@karlcow
Member Author

karlcow commented Mar 4, 2016

OK, I started writing pseudocode yesterday for the image optimization.
I need to put in place some tests so we are sure we do the right thing with images.

  • an image which is 300x200 (should not be resized)
  • an image which is 800x500 (should be resized to a width of 700, with the height scaled proportionally)
  • an image which is 300x800 (should not be resized)
  • an image with EXIF data (no more EXIF after saving)
  • an image which has been resized should provide something like
    <a href="original.png"><img src="original-thumb.png" alt="…"/></a>

Other tests?
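The resize rules in that list can be pinned down as plain arithmetic before touching any image. A sketch assuming the 700 px maximum width from the list and heights rounded down (the rounding is a choice, not something decided in the thread); `target_size`, `thumb_name` and `thumb_markup` are illustrative names:

```python
import html
import os

MAX_WIDTH = 700  # from the test list above


def target_size(width, height, max_width=MAX_WIDTH):
    """Return the (width, height) an image should be resized to.

    Images at most max_width wide are kept as-is; wider ones are scaled
    to max_width, with the height scaled proportionally (rounded down).
    """
    if width <= max_width:
        return width, height
    return max_width, height * max_width // width


def thumb_name(filename):
    """original.png -> original-thumb.png"""
    root, ext = os.path.splitext(filename)
    return f"{root}-thumb{ext}"


def thumb_markup(filename, alt):
    """Link the resized image to the original; alt text is HTML-escaped."""
    return f'<a href="{filename}"><img src="{thumb_name(filename)}" alt="{html.escape(alt)}"/></a>'


for size in [(300, 200), (800, 500), (300, 800)]:
    print(size, "->", target_size(*size))
# (300, 200) -> (300, 200)
# (800, 500) -> (700, 437)
# (300, 800) -> (300, 800)
```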

@miketaylr
Member

(It sounds like some of these tests and scenarios are for #722, but maybe you're fixing both issues in a single pass)

  • an already optimized image (should produce the same output)
  • a corrupt or fake image (should do what we expect, i.e., bail -- no change in current behavior)
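For the corrupt or fake image case, Pillow raises an error when it cannot identify or decode the data, so the upload code can bail out early. A sketch with a hypothetical `load_upload` helper; it catches `OSError`, of which `PIL.UnidentifiedImageError` (Pillow 7+) is a subclass:

```python
from io import BytesIO

from PIL import Image


def load_upload(data):
    """Return a fully decoded Pillow image, or None for corrupt/fake images."""
    try:
        img = Image.open(BytesIO(data))
        img.load()  # force decoding so truncated data fails here, not later
    except OSError:  # includes PIL.UnidentifiedImageError
        return None
    return img


good = BytesIO()
Image.new("RGB", (10, 10), "white").save(good, "PNG")
print(load_upload(good.getvalue()).size)  # (10, 10)
print(load_upload(b"console.log('not an image');"))  # None
```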

@karlcow
Member Author

karlcow commented Mar 10, 2016

Huh. :) Yup, OK, I will push the commit for this one and work on #722 afterwards. Thanks @miketaylr

@karlcow
Member Author

karlcow commented Mar 10, 2016

Ah!
Hmm… the save method.
So, upload.save() is now in fact calling Image.save() from Pillow.

Saves this image under the given filename. If no format is specified, the format to use is determined from the filename extension, if possible.

Keyword options can be used to provide additional instructions to the writer. If a writer doesn’t recognise an option, it is silently ignored. The available options are described in the image format documentation for each writer.
https://pillow.readthedocs.org/en/3.1.x/reference/Image.html#PIL.Image.Image.save

Good things:

  1. save() already optimizes the image; I tested it.
    [Screenshot, 2016-03-10 at 15:10:07] As you can see, the image has been modified and its size is reduced. For JPEG: "The image quality, on a scale from 1 (worst) to 95 (best). The default is 75."
  2. The save() method of Pillow removes the EXIF data by default.

    exif. If present, the image will be stored with the provided raw EXIF data.

Bad thing

  1. An animated GIF will not be animated anymore. The behavior of save() depends on the format. For GIF, the doc says:

    When calling save(), if a multiframe image is used, by default only the first frame will be saved. To save all frames, the save_all parameter must be present and set to True.
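A quick way to see the difference, using a three-frame GIF generated in memory. A sketch: `make_animated_gif` and `resave_gif` are illustrative helpers, and `append_images`/`n_frames` may require a newer Pillow than the 3.x releases discussed here:

```python
from io import BytesIO

from PIL import Image


def make_animated_gif(colors=("red", "green", "blue")):
    """Build an in-memory animated GIF with one solid-colour frame per colour."""
    frames = [Image.new("RGB", (32, 32), c) for c in colors]
    buf = BytesIO()
    frames[0].save(buf, "GIF", save_all=True, append_images=frames[1:], duration=100, loop=0)
    buf.seek(0)
    return buf


def resave_gif(src, **options):
    """Re-save a GIF with the given options and return the frame count of the result."""
    out = BytesIO()
    Image.open(src).save(out, "GIF", **options)
    out.seek(0)
    return Image.open(out).n_frames


print(resave_gif(make_animated_gif()))  # 1: only the first frame is kept
print(resave_gif(make_animated_gif(), save_all=True))  # 3: all frames kept
```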

QUESTIONS / TODO

  • Do we want to make sure animated GIFs stay animated? I can modify this.
  • Animated GIFs sometimes become a bit dirty once they go through save(). Might be an issue with Pillow here. What do we want?
  • Do we want to optimize JPEGs further? There is an optimize parameter.
  • Do we want to optimize PNGs further? There is an optimize parameter.
  • Tests? Not sure. We would end up testing that Pillow works as expected. The tests that really cover our part would check that we pass the right parameters for each format. @miketaylr?
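Along those lines, a test can cover only our part, the options handed to `save()`, by passing a mock image instead of a real file. A sketch with the standard library's `unittest.mock`; `SAVE_OPTIONS` and `optimized_save` are hypothetical helpers, not webcompat.com code, defined inline so the snippet runs on its own:

```python
import unittest
from unittest import mock

# Hypothetical per-format writer options (not taken from webcompat.com).
SAVE_OPTIONS = {
    "JPEG": {"quality": 75, "optimize": True},
    "PNG": {"optimize": True},
}


def optimized_save(image_object, dest):
    """Save in the original format with that format's options (never exif=...)."""
    fmt = image_object.format
    image_object.save(dest, fmt, **SAVE_OPTIONS.get(fmt, {}))


class SaveOptionsTest(unittest.TestCase):
    def check(self, fmt, expected_options):
        image = mock.Mock(format=fmt)  # stands in for a Pillow image
        optimized_save(image, "out")
        image.save.assert_called_once_with("out", fmt, **expected_options)

    def test_jpeg(self):
        self.check("JPEG", {"quality": 75, "optimize": True})

    def test_png(self):
        self.check("PNG", {"optimize": True})

    def test_gif_gets_no_extra_options(self):
        self.check("GIF", {})

    def test_exif_never_passed(self):
        image = mock.Mock(format="JPEG")
        optimized_save(image, "out")
        self.assertNotIn("exif", image.save.call_args[1])
```

Run it with `python -m unittest`; no image fixtures are needed, since only the call arguments are checked.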

karlcow added a commit to karlcow/webcompat.com that referenced this issue Mar 10, 2016
karlcow added a commit to karlcow/webcompat.com that referenced this issue Mar 10, 2016
karlcow added a commit to karlcow/webcompat.com that referenced this issue Mar 10, 2016
@miketaylr
Member

animated GIF becomes sometimes a bit dirty once they go through save. Might be an issue with pillow here. What do we want?

@karlcow is there any way to detect an animated gif and just not do anything for the time being? If so I'd suggest we ship for non-animated images first and come up with something more creative after that.

@miketaylr
Member

We would end up testing that pillow is working as expected. The test we could do which are really our parts is that we pass the right parameters for the right format @miketaylr ?

Agreed with you here @karlcow.

@karlcow
Member Author

karlcow commented Mar 29, 2016

@karlcow is there any way to detect an animated gif

@miketaylr Yes, I'm already testing for this with the duration: karlcow@3240856#commitcomment-16870889
I will see what I can do to bypass it.
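One possible bypass, sketched with Pillow's `is_animated` attribute instead of the duration check from the commit above. `maybe_optimize` is a hypothetical helper: animated images are stored untouched, everything else is re-saved in its own format (which drops EXIF). `getattr` with a default covers image classes that lack `is_animated`:

```python
from io import BytesIO

from PIL import Image


def maybe_optimize(data):
    """Return the bytes to store for an uploaded image.

    Animated images are returned unchanged (re-saving could drop or degrade
    frames); other images are re-saved in their own format.
    """
    img = Image.open(BytesIO(data))
    if getattr(img, "is_animated", False):
        return data
    out = BytesIO()
    img.save(out, img.format)
    return out.getvalue()


frames = [Image.new("RGB", (16, 16), c) for c in ("red", "blue")]
anim = BytesIO()
frames[0].save(anim, "GIF", save_all=True, append_images=frames[1:])
print(maybe_optimize(anim.getvalue()) == anim.getvalue())  # True: left untouched

still = BytesIO()
Image.new("RGB", (16, 16), "white").save(still, "PNG")
result = maybe_optimize(still.getvalue())
print(Image.open(BytesIO(result)).format)  # PNG
```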

3 participants