Add Zstd as a ZIP compression method #25
Comments
Update: |
I added both Method IDs in c66c832, which means the [...]. The latest version of PKWARE's PKZIP for Windows does not offer Zstd compression in ZIP files, and I don't know which other ZIP programs (besides WinZip) offer Zstd support. |
I read your blog article "Random access compression and zstd" and learned that you would like to have both the space savings and the random access option. Would the ZIP format with the Zstd compression help? Any feedback about this feature would be greatly appreciated. |
A key problem with zip files is that each file is compressed independently. That is crucial to allow random access, after all. With zstd, you can have a dictionary that is trained over the files and get the best of both worlds. That said, we already implemented this in our software, which deals with compression of data inside a database. Having that in a zip file is nice, but not required. |
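To make the "shared dictionary plus independent compression" idea concrete, here is a minimal sketch. It is not zstd: it uses the preset-dictionary feature of Python's standard `zlib` module as a stand-in, and the dictionary is hand-picked rather than trained. `SHARED_DICT`, `compress_record`, and `decompress_record` are hypothetical names chosen for illustration.

```python
import zlib

# Hypothetical shared dictionary: bytes that commonly occur in the records.
# zlib does not train a dictionary the way zstd does; this one is hand-picked.
SHARED_DICT = b'{"id": 0, "name": "user", "status": "active", "region": "eu-west"}'


def compress_record(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress one record on its own, optionally primed with a preset dictionary."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush()


def decompress_record(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress one record; the same dictionary must be supplied again."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


records = [
    f'{{"id": {i}, "name": "user{i}", "status": "active", "region": "eu-west"}}'.encode()
    for i in range(100)
]

plain = [compress_record(r) for r in records]
primed = [compress_record(r, SHARED_DICT) for r in records]

# Random access: record 42 is decoded without touching any other record.
assert decompress_record(primed[42], SHARED_DICT) == records[42]

print("without dictionary:", sum(map(len, plain)), "bytes")
print("with dictionary:   ", sum(map(len, primed)), "bytes")
```

Each record stays independently decodable, while the dictionary supplies the cross-record redundancy that per-file compression would otherwise miss.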
@ipaucek4680 |
@ayende @ipaucek4680 |
@jinfeihan57
In this case, is 7-zip (or p7zip) able to skip reading and decompressing the first 3 GB? For ZIP, it can access |
@jinfeihan57 I actually continuously test the compression ratio, and if it drops below a certain value, we'll generate a new dictionary based on recent information. So it is self-adjusting. |
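A rough sketch of such a self-adjusting scheme is shown here, under several assumptions: it uses `zlib` preset dictionaries instead of zstd, it checks the ratio per record rather than over a batch, and a "new dictionary" is simply the concatenation of recent records. `AdaptiveDictStore`, `min_ratio`, and `window` are hypothetical names and values, not taken from the software discussed in this thread.

```python
import zlib

MAX_DICT_BYTES = 32 * 1024  # deflate can only look back 32 KiB


class AdaptiveDictStore:
    """Keeps a list of dictionaries and starts a new one when the ratio drops."""

    def __init__(self, min_ratio: float = 1.5, window: int = 20):
        self.min_ratio = min_ratio  # hypothetical threshold
        self.window = window        # how many recent records feed a new dictionary
        self.dicts = [b""]          # dictionary 0 means "no dictionary"
        self.recent = []

    @staticmethod
    def _compress(data: bytes, zdict: bytes) -> bytes:
        comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
        return comp.compress(data) + comp.flush()

    def put(self, data: bytes):
        """Return (dict_id, blob); the id must be stored next to the blob."""
        self.recent = (self.recent + [data])[-self.window:]
        dict_id = len(self.dicts) - 1
        blob = self._compress(data, self.dicts[dict_id])
        if len(data) / len(blob) < self.min_ratio:
            # Ratio fell below the threshold: build a new dictionary from recent data.
            self.dicts.append(b"".join(self.recent)[-MAX_DICT_BYTES:])
            dict_id = len(self.dicts) - 1
            blob = self._compress(data, self.dicts[dict_id])
        return dict_id, blob

    def get(self, dict_id: int, blob: bytes) -> bytes:
        zdict = self.dicts[dict_id]
        decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
        return decomp.decompress(blob) + decomp.flush()


store = AdaptiveDictStore()
json_like = [f'{{"id": {i}, "name": "user{i}", "status": "active"}}'.encode() for i in range(30)]
log_like = [f"ts={1700000000 + i} level=ERROR service=billing msg=declined".encode() for i in range(30)]
stored = [(rec, *store.put(rec)) for rec in json_like + log_like]
print("dictionaries created:", len(store.dicts) - 1)
```

Because every blob carries the id of the dictionary it was written with, old dictionaries have to be kept for as long as any blob still refers to them, which is how the number of dictionaries can grow over time.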
@ipaucek4680 |
@ayende |
I may end up with a lot of dictionaries, yes. |
@ipaucek4680 |
There is nothing easier than creating a non-solid archive. Every file separately:
In N-file blocks (N is the number of files per block, here: 100):
In M-byte blocks (M is the block size, here: 10 MB):
All files in one solid block:
As @jinfeihan57 said, a non-solid archive allows random access but compresses worse. You may want to test the block-solid options, which give faster access than a full solid archive and compression that is better than non-solid but worse than full solid. You can test how exactly it works with |
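The trade-off between non-solid, block-solid, and full solid layouts can be illustrated without 7-Zip. The sketch below uses Python's `zlib` (Deflate) as a stand-in codec and synthetic, highly similar files, so the absolute numbers say nothing about 7z or Zstd; `compress_in_blocks` is a hypothetical helper.

```python
import zlib


def compress_in_blocks(files: list[bytes], files_per_block: int) -> list[bytes]:
    """Concatenate every `files_per_block` files and compress each group as one stream."""
    blocks = []
    for start in range(0, len(files), files_per_block):
        group = b"".join(files[start:start + files_per_block])
        blocks.append(zlib.compress(group, 6))
    return blocks


# Synthetic input: 100 small files that share most of their content.
files = [
    (f"record {i}: the quick brown fox jumps over the lazy dog\n" * 5).encode()
    for i in range(100)
]

for label, n in [
    ("non-solid (1 file per block)", 1),
    ("block-solid (10 files per block)", 10),
    ("full solid (all files in one block)", len(files)),
]:
    total = sum(len(block) for block in compress_in_blocks(files, n))
    print(f"{label}: {total} bytes")
```

Larger blocks let the compressor reuse redundancy across files, but reading a single file then means decompressing its whole block up to that file, which is the access cost the block size controls.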
Regarding which method ID to use: ZIP format specification version 6.3.7 uses ID 20, but 6.3.8 moved it to 93 for some reason (source: Wikipedia). So which method ID we use depends on which version of the ZIP specification we want to follow. I'm guessing that's why WinZip uses ID 93, as it corresponds to the newer version of the specification. Maybe we should just do the same? |
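For checking which method ID a given archive actually uses, a small inspection sketch with Python's standard `zipfile` module may help. The `METHOD_NAMES` table is a hand-written subset of the IDs in PKWARE's APPNOTE, and `list_methods` and `make_demo_archive` are hypothetical helpers; the demo archive only contains stored and deflated members, since the standard library versions assumed here cannot write Zstd.

```python
import io
import zipfile

# Subset of the compression method IDs listed in PKWARE's APPNOTE.
METHOD_NAMES = {
    0: "Stored",
    8: "Deflate",
    9: "Deflate64",
    12: "BZIP2",
    14: "LZMA",
    20: "Zstandard (ID from 6.3.7, later deprecated)",
    93: "Zstandard",
}


def list_methods(archive: bytes) -> dict[str, str]:
    """Map each member name to the name of its compression method."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {
            info.filename: METHOD_NAMES.get(
                info.compress_type, f"unknown ({info.compress_type})"
            )
            for info in zf.infolist()
        }


def make_demo_archive() -> bytes:
    """Build a tiny in-memory archive with one stored and one deflated member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("stored.txt", b"hello", compress_type=zipfile.ZIP_STORED)
        zf.writestr("deflated.txt", b"hello" * 100, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


for name, method in list_methods(make_demo_archive()).items():
    print(name, "->", method)
```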
First of all, thank you for your work on enabling p7zip users to use Zstd with the 7z archive file format.
I would like to use Zstd as a ZIP compression method because of the random access feature present in ZIP but not in 7z:
In addition, the ZIP File Format Specification by PKWARE Inc. has added the Zstandard compression method ID, and WinZip has added the Zstd method to the ZIP (ZIPX) format (please see this PDF and this webpage).
I found that the compression ratio of a ZIP file using Zstd Level 3 is similar to Deflate64 Level 5, while Zstd Level 3 performs (much) faster than Deflate64 Level 5.
I tried adding Zstd as a ZIP compression method (please see my code here) and it seems I can create, list and update a ZIP file using Zstd.
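The random-access property of ZIP mentioned above can be seen with Python's standard `zipfile` module. This is only a sketch: it uses Deflate rather than Zstd, and `build_archive` and `read_member` are hypothetical helpers. The point is that a member is located through the central directory and decompressed on its own.

```python
import io
import zipfile


def build_archive(files: dict[str, bytes]) -> bytes:
    """Write all files into an in-memory ZIP, each compressed independently."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_member(archive: bytes, name: str) -> tuple[int, bytes]:
    """Return (offset of the member's local header, decompressed content)."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.getinfo(name)  # looked up in the central directory
        return info.header_offset, zf.read(info)


files = {f"file{i:03d}.txt": f"contents of file {i}\n".encode() * 200 for i in range(50)}
archive = build_archive(files)

offset, data = read_member(archive, "file049.txt")
assert data == files["file049.txt"]
print("file049.txt starts at byte", offset, "of", len(archive))
```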