Support Zip64 when compressing iterables and strings #25

Open
wants to merge 18 commits into
base: master

@arjan-s arjan-s commented Mar 23, 2017

Zip64 support currently doesn't work when the inputs for the generated zip file are iterables and/or strings. The reason is that __write() assumes a file size of 0 bytes for those inputs, so the module never enables Zip64 support for them. When it then detects during compression that the file size has grown beyond ZIP64_LIMIT, it (correctly) raises RuntimeError('File size has increased during compressing').

This patch adds the optional (and thus backwards-compatible) argument buffer_size to write_iter and writestr. Callers can use it to declare the size of the data that the iterable or string will produce, which in turn lets __write() enable Zip64 support when necessary.
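
To illustrate why the size has to be known up front, here is a minimal, simplified sketch of the decision described above. It is not the patched zipstream code: the helper names `should_enable_zip64` and `stream_entry` and the `limit` parameter are hypothetical, and the real `__write()` may apply additional checks. Only the error message and the `ZIP64_LIMIT` name come from the description.

```python
# Illustrative sketch only; helper names are hypothetical, not zipstream's API.
ZIP64_LIMIT = (1 << 31) - 1  # same value as the constant in CPython's zipfile module


def should_enable_zip64(declared_size, allow_zip64=True, limit=ZIP64_LIMIT):
    """Decide, before any data is written, whether the entry needs Zip64."""
    return bool(allow_zip64 and declared_size > limit)


def stream_entry(declared_size, chunks, allow_zip64=True, limit=ZIP64_LIMIT):
    """Consume chunks and fail if the entry outgrows a non-Zip64 header.

    Returns a (zip64_enabled, bytes_written) tuple.
    """
    zip64 = should_enable_zip64(declared_size, allow_zip64, limit)
    written = 0
    for chunk in chunks:
        written += len(chunk)
        # Without Zip64 the header cannot describe an entry above the limit.
        if not zip64 and written > limit:
            raise RuntimeError('File size has increased during compressing')
    return zip64, written
```

With a declared size of 0 (the old behaviour for iterables and strings), Zip64 is never enabled and a large stream fails partway through. Declaring the expected size, which is what buffer_size is for, lets the decision be made correctly before the first byte is written.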

Arjan Schrijver and others added 4 commits March 23, 2017 10:32
I use this to flush partial zips as files are streamed into them from requests, and then at the end add manifest and error files to the end of the archive

I've also added a related test and example of use
@matthewatabet

Would it be possible to merge this? I ran into this issue because I need to stream very large amounts of data.

@arjan-s
Author

arjan-s commented May 23, 2019

Hi @matthewatabet, it seems the original developer of this package is no longer active, so I've forked the package as zipstream-new: #33

arjan-s and others added 13 commits June 6, 2019 10:30
Add partial flushing of ZipStreams
When flushing, stream out the iterators in first-in, first-out order. Python's `pop()` with no arguments takes the last path, but I think it makes sense to stream the first things first.

We ran into this when adding a bunch of files that depend on long-running futures to provide their data. The futures hit a server that processes them roughly in order, so we get better streaming performance with this change.
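
The ordering difference can be sketched in a few lines. This is an illustration of the general technique, not the code from the commit: the function names are hypothetical, and the use of `collections.deque` is an assumption (the commit message does not say how the queue was changed).

```python
from collections import deque


def drain_lifo(paths):
    """Drain with list.pop(): the most recently added path comes out first."""
    pending = list(paths)
    order = []
    while pending:
        order.append(pending.pop())
    return order


def drain_fifo(paths):
    """Drain with deque.popleft(): paths come out in the order they were added."""
    pending = deque(paths)
    order = []
    while pending:
        order.append(pending.popleft())
    return order
```

`list.pop(0)` would give the same FIFO order, but it shifts every remaining element on each call, so a deque is the usual choice when the queue can grow large.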
Stream data in order it was received
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request May 30, 2021
Changes made after forking v1.1.4:

v1.1.5 (2019-03-18)

 * Support Zip64 when compressing iterables and strings (allanlei/python-zipstream#25)

v1.1.6 (2019-06-06)

 * Add partial flushing of ZipStreams (arjan-s/python-zipstream#1)

v1.1.7 (2019-10-22)

 * Stream data in the order it was received (arjan-s/python-zipstream#4)

v1.1.8 (2020-09-14)

 * New datetime parameter in write_iter (arjan-s/python-zipstream#8)