Inconsistent GZIP compression level in AWS S3 storage backend #572
Good catch. I am loath to add yet another setting and think we should simply standardize the value. It’s interesting that Python defaults to 9.
… On Aug 29, 2018, at 10:19 AM, Goetz ***@***.*** wrote:
When looking through the AWS S3 storage backend in the current release 1.6.6 <https://github.com/jschneier/django-storages/blob/1.6.6/storages/backends/s3boto3.py>, I was wondering why the GZIP compression was implemented with two different compression levels:
Line 97 <https://github.com/jschneier/django-storages/blob/1.6.6/storages/backends/s3boto3.py#L97> in S3Boto3StorageFile._get_file:
self._file = GzipFile(mode=self._mode, fileobj=self._file, mtime=0.0)
Line 396 <https://github.com/jschneier/django-storages/blob/1.6.6/storages/backends/s3boto3.py#L396> in S3Boto3Storage._compress_content:
zfile = GzipFile(mode='wb', compresslevel=6, fileobj=zbuf, mtime=0.0)
According to the Python 3.6 documentation <https://docs.python.org/3.6/library/gzip.html#gzip.GzipFile>, the default compresslevel for GzipFile is 9. So, if I understand the code correctly, in line 97, the file is being compressed with level 9 if it already exists, and in line 396 with level 6 if it is a new file.
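The difference is easy to demonstrate outside django-storages: gzipping the same payload once with the default level and once with an explicit compresslevel=6 (a standalone sketch, not the backend's actual code) shows that omitting the argument is equivalent to level 9 and produces different bytes than level 6:

```python
import gzip
import io


def gzip_bytes(data: bytes, **kwargs) -> bytes:
    """Gzip `data` in memory, with mtime pinned to 0 for reproducible output."""
    buf = io.BytesIO()
    with gzip.GzipFile(mode="wb", fileobj=buf, mtime=0.0, **kwargs) as zfile:
        zfile.write(data)
    return buf.getvalue()


payload = b"hello gzip " * 1000

default_out = gzip_bytes(payload)                   # no compresslevel, as in _get_file
level9_out = gzip_bytes(payload, compresslevel=9)   # explicit Python default
level6_out = gzip_bytes(payload, compresslevel=6)   # as in _compress_content
```

With a fixed mtime, the default-level output is byte-identical to the explicit level-9 output, while the level-6 output differs (at minimum in the XFL header byte, which records the compression level used).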
While I think this does not cause any problems, it is somewhat unexpected and inconsistent. If there is no technical reason for this particular behaviour, I would suggest standardising on a single compresslevel.
Looking a bit further, I found that Django 2.1 <https://github.com/django/django/blob/stable/2.1.x/django/utils/text.py#L282> also includes a function django.utils.text.compress_string (and compress_sequence) which currently also uses a compresslevel of 6, so maybe that's where the 6 comes from?
Related to this, since the code might get touched anyway, why not go one step further and add a Django configuration setting that lets the user adjust the compression level, with a sensible default if the setting is not provided?
Currently, the gzip-related configuration settings for the AWS S3 storage backend are:
AWS_IS_GZIPPED
GZIP_CONTENT_TYPES
I also checked out quickly how other Django storage packages deal with compression:
django-pipeline <https://github.com/jazzband/django-pipeline/> in its current release 1.6.14 does not specify a compression level on line 53 <https://github.com/jazzband/django-pipeline/blob/1.6.x/pipeline/storage.py#L53> in GZIPMixin._compress: gzip_file = gzip.GzipFile(mode='wb', fileobj=content) - which means that it falls back to the Python default compresslevel, which is 9.
django-precompressed <https://libraries.io/pypi/django-precompressed>, which is probably no longer actively maintained, provides a configuration setting DEFAULT_COMPRESS_LEVEL with a default of 9.
Based on the existing configuration settings in django-storages, a new setting GZIP_COMPRESSION_LEVEL could be one of the options.
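A minimal sketch of what such a setting-driven helper could look like. The setting name GZIP_COMPRESSION_LEVEL, the _Settings stand-in, and the compress_content function are all hypothetical here; a real patch would read the value through Django's settings machinery rather than this stand-in object:

```python
import gzip
import io


class _Settings:
    """Stand-in for django.conf.settings, so the sketch runs standalone."""
    GZIP_COMPRESSION_LEVEL = 9  # hypothetical setting name; default matches Python


settings = _Settings()


def compress_content(raw: bytes) -> bytes:
    """Gzip `raw` using a configurable level, falling back to 9 if unset."""
    level = getattr(settings, "GZIP_COMPRESSION_LEVEL", 9)
    zbuf = io.BytesIO()
    with gzip.GzipFile(mode="wb", compresslevel=level, fileobj=zbuf, mtime=0.0) as zfile:
        zfile.write(raw)
    return zbuf.getvalue()
```

The getattr fallback keeps current behaviour for users who never set the option, which is the same pattern the existing AWS_IS_GZIPPED and GZIP_CONTENT_TYPES settings follow.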
Thanks for looking into that! I wanted to compare the file sizes of different compression levels, but haven't done any tests yet.