diff --git a/Doc/library/gzip.rst b/Doc/library/gzip.rst
index 33c40676f747c5..8cea2649ee6cb6 100644
--- a/Doc/library/gzip.rst
+++ b/Doc/library/gzip.rst
@@ -174,19 +174,30 @@ The module defines the following items:
 
    Compress the *data*, returning a :class:`bytes` object containing
    the compressed data. *compresslevel* and *mtime* have the same meaning as in
-   the :class:`GzipFile` constructor above.
+   the :class:`GzipFile` constructor above. When *mtime* is set to ``0``, this
+   function is equivalent to :func:`zlib.compress` with *wbits* set to ``31``.
+   The zlib function is faster.
 
    .. versionadded:: 3.2
    .. versionchanged:: 3.8
       Added the *mtime* parameter for reproducible output.
+   .. versionchanged:: 3.11
+      Speed is improved by compressing all data at once instead of in a
+      streamed fashion. Calls with *mtime* set to ``0`` are delegated to
+      :func:`zlib.compress` for better speed.
 
 .. function:: decompress(data)
 
    Decompress the *data*, returning a :class:`bytes` object containing the
-   uncompressed data.
+   uncompressed data. This function is capable of decompressing multi-member
+   gzip data (multiple gzip blocks concatenated together). When the data is
+   certain to contain only one member the :func:`zlib.decompress` function with
+   *wbits* set to 31 is faster.
 
    .. versionadded:: 3.2
-
+   .. versionchanged:: 3.11
+      Speed is improved by decompressing members at once in memory instead of in
+      a streamed fashion.
 
 .. _gzip-usage-examples:
 
diff --git a/Doc/library/zlib.rst b/Doc/library/zlib.rst
index ec60ea24db6627..793c90f3c4e7a4 100644
--- a/Doc/library/zlib.rst
+++ b/Doc/library/zlib.rst
@@ -47,7 +47,7 @@ The available exception and functions in this module are:
       platforms, use ``adler32(data) & 0xffffffff``.
 
 
-.. function:: compress(data, /, level=-1)
+.. function:: compress(data, /, level=-1, wbits=MAX_WBITS)
 
    Compresses the bytes in *data*, returning a bytes object containing compressed data.
    *level* is an integer from ``0`` to ``9`` or ``-1`` controlling the level of compression;
@@ -55,11 +55,35 @@ The available exception and functions in this module are:
    is slowest and produces the most. ``0`` (Z_NO_COMPRESSION) is no compression.
    The default value is ``-1`` (Z_DEFAULT_COMPRESSION). Z_DEFAULT_COMPRESSION represents a default
    compromise between speed and compression (currently equivalent to level 6).
+
+   .. _compress-wbits:
+
+   The *wbits* argument controls the size of the history buffer (or the
+   "window size") used when compressing data, and whether a header and
+   trailer is included in the output. It can take several ranges of values,
+   defaulting to ``15`` (MAX_WBITS):
+
+   * +9 to +15: The base-two logarithm of the window size, which
+     therefore ranges between 512 and 32768. Larger values produce
+     better compression at the expense of greater memory usage. The
+     resulting output will include a zlib-specific header and trailer.
+
+   * −9 to −15: Uses the absolute value of *wbits* as the
+     window size logarithm, while producing a raw output stream with no
+     header or trailing checksum.
+
+   * +25 to +31 = 16 + (9 to 15): Uses the low 4 bits of the value as the
+     window size logarithm, while including a basic :program:`gzip` header
+     and trailing checksum in the output.
+
    Raises the :exc:`error` exception if any error occurs.
 
    .. versionchanged:: 3.6
       *level* can now be used as a keyword parameter.
 
+   .. versionchanged:: 3.11
+      The *wbits* parameter is now available to set window bits and
+      compression type.
 
 .. function:: compressobj(level=-1, method=DEFLATED, wbits=MAX_WBITS, memLevel=DEF_MEM_LEVEL, strategy=Z_DEFAULT_STRATEGY[, zdict])
 
@@ -76,23 +100,9 @@ The available exception and functions in this module are:
    *method* is the compression algorithm. Currently, the only supported value is
    :const:`DEFLATED`.
 
-   The *wbits* argument controls the size of the history buffer (or the
-   "window size") used when compressing data, and whether a header and
-   trailer is included in the output. It can take several ranges of values,
-   defaulting to ``15`` (MAX_WBITS):
-
-   * +9 to +15: The base-two logarithm of the window size, which
-     therefore ranges between 512 and 32768. Larger values produce
-     better compression at the expense of greater memory usage. The
-     resulting output will include a zlib-specific header and trailer.
-
-   * −9 to −15: Uses the absolute value of *wbits* as the
-     window size logarithm, while producing a raw output stream with no
-     header or trailing checksum.
-
-   * +25 to +31 = 16 + (9 to 15): Uses the low 4 bits of the value as the
-     window size logarithm, while including a basic :program:`gzip` header
-     and trailing checksum in the output.
+   The *wbits* parameter controls the size of the history buffer (or the
+   "window size"), and what header and trailer format will be used. It has
+   the same meaning as `described for compress() <#compress-wbits>`__.
 
    The *memLevel* argument controls the amount of memory used for the
    internal compression state. Valid values range from ``1`` to ``9``.
diff --git a/Lib/gzip.py b/Lib/gzip.py
index 3d837b744800ed..0dddb51553fabd 100644
--- a/Lib/gzip.py
+++ b/Lib/gzip.py
@@ -403,6 +403,59 @@ def __iter__(self):
         return self._buffer.__iter__()
 
 
+def _read_exact(fp, n):
+    '''Read exactly *n* bytes from `fp`
+
+    This method is required because fp may be unbuffered,
+    i.e. return short reads.
+    '''
+    data = fp.read(n)
+    while len(data) < n:
+        b = fp.read(n - len(data))
+        if not b:
+            raise EOFError("Compressed file ended before the "
+                           "end-of-stream marker was reached")
+        data += b
+    return data
+
+
+def _read_gzip_header(fp):
+    '''Read a gzip header from `fp` and progress to the end of the header.
+
+    Returns last mtime if header was present or None otherwise.
+    '''
+    magic = fp.read(2)
+    if magic == b'':
+        return None
+
+    if magic != b'\037\213':
+        raise BadGzipFile('Not a gzipped file (%r)' % magic)
+
+    (method, flag, last_mtime) = struct.unpack(" bytes:
+    """
+    Write a simple gzip header with no extra fields.
+    :param compresslevel: Compresslevel used to determine the xfl bytes.
+    :param mtime: The mtime (must support conversion to a 32-bit integer).
+    :return: A bytes object representing the gzip header.
+    """
+    if mtime is None:
+        mtime = time.time()
+    if compresslevel == _COMPRESS_LEVEL_BEST:
+        xfl = 2
+    elif compresslevel == _COMPRESS_LEVEL_FAST:
+        xfl = 4
+    else:
+        xfl = 0
+    # Pack ID1 and ID2 magic bytes, method (8=deflate), header flags (no extra
+    # fields added to header), mtime, xfl and os (255 for unknown OS).
+    return struct.pack("