append to brotli compressed file #628
No. Brotli is a bare data-stream format, so we decided to make it an error if anything follows a stream. The good news is that we are working on a framing format that will (likely) provide such an ability... and much more =) (with some overhead being paid, of course).
It might be possible to structure a Brotli stream to have such a property with a few small tweaks to the compressor:
With these restrictions you should be able to concatenate the streams after dropping the last 2 non-zero bits, right? You may need a bit more magic, such as inserting an MNIBBLES=0 metablock to byte-align the metablocks at the end, if you don't want to concatenate at the bit level.
I solved this issue in the drop-in brotli library at https://github.com/dropbox/rust-brotli by applying the three ideas above and by disabling the recent items in the distance map. Hope that helps!
The problem with "catable" brotli is that it is impossible to prove that a given file is "catable" without fully decompressing it. BTW, thank you, Daniel, for developing rust-brotli; it is an awesome project!
Yes, it's true that it is impossible to verify without decompressing. That is one of the main reasons I added this header magic number as the first metadata metablock: the header contains information about whether the file was designed to be concatenated. Of course that's advisory; you would still need to decompress to fully verify. It's still likely faster than compressing the concatenated chunk. Perhaps the default mode should refuse to concatenate the file if the header is missing. It already has some heuristics to look for "concatability", which rule out files generated with default brotli-like tools.

And you are correct: rust-brotli uses this catable flag internally to make multithreaded files. I think for internally created files, it could allow dictionary usage from any of the threads, since we prepend the previous parts of the file to the ring buffer, so naturally it should look farther back for the dictionary, but I haven't tried that mode of operation.

It doesn't seem to me that splitting a file N ways often results in an Nx improvement: it appears that certain parts of the file require significantly more CPU time than others. I haven't profiled the compression much yet.
Hi, can you please point us to the issues/plans where we can read about this?
There is a need to combine several precompressed chunks. These chunks might be precompressed in a special format that allows concatenation in any order. This need relates to serving precached content from a web server. Have any related features been implemented in the library? Thanks.
@s-sols: I have created a binary-compatible brotli library with a new option flag to create concatenable files here:
@danielrh hey, super cool work! I got the C libraries to compile without problems. A question about Python and appending to an existing brotli "stream" (https://github.com/dropbox/rust-brotli/blob/master/c/py/brotli_test.py): how would that work? Are you able to add a py test case there in that src? Not quite sure where that existing compressed brotli would get passed to:

```python
output = BrotliCompress(self.test_data,
                        {
                            BROTLI_PARAM_QUALITY: 5,
                            BROTLI_PARAM_CATABLE: 1,
                            BROTLI_PARAM_MAGIC_NUMBER: 1,
                        },
                        1)
rt = BrotliDecode(output, 2)
```
Hmm. Reconsidering. While a brotli stream does not allow "tails", we can overcome that in the CLI. Will see if we can have it in v1.1.
Is it possible to append data to a brotli compressed file such that it is compressed automatically?
The following would work, e.g. for gzip
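A minimal stdlib sketch of the gzip behavior being described (the filename is illustrative): gzip members can simply be concatenated, so appending a freshly compressed member to an existing .gz file still yields a valid archive.

```python
import gzip

# Write an initial gzip member, then append a second member to the
# same file; gzip readers transparently decode concatenated members.
with gzip.open("file.gz", "wb") as f:   # "file.gz" is an illustrative name
    f.write(b"hello\n")
with gzip.open("file.gz", "ab") as f:   # append mode adds a new member
    f.write(b"world\n")

with gzip.open("file.gz", "rb") as f:
    print(f.read())  # b'hello\nworld\n'
```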
While the same approach would fail using brotli, with a "corrupt input" error.