Is your feature request related to a problem? Please describe.
I have a tar archive containing periodic full backups of the same database. Each backup is very similar to the previous one, so each additional backup compressed within the same tar archive ends up orders of magnitude smaller, after compression, than the same file compressed standalone. Unfortunately, today appending each new backup (so that compression can leverage the redundancies between files) essentially requires decompressing the full tar archive, appending the new backup to it, and then compressing the whole archive again, because no compressor I am aware of offers a way to persist (or even just reconstruct) the compression state of an existing stream.
Describe the solution you'd like
I would like a way to append data to an existing zstd stream while making use of the compression state of the whole stream, so that the new data can be compressed efficiently by exploiting redundancies with the data already present in the compressed stream.
The persistent compression state could very well be larger than the compressed stream: this is acceptable.
The persistent compression state does not need to be publicly documented, nor stable across versions or platforms. If the persisted compression state is invalid/corrupt, it should be ignored.
This could take the form of a --state STATE_FILE switch that could be used as follows:
# persist the compression state in STATE_FILE
zstd --state STATE_FILE -o OUTPUT_FILE INPUT_FILE
# append data to OUTPUT_FILE using the state from STATE_FILE, persist the final compression state in STATE_FILE
zstd --state STATE_FILE --append -o OUTPUT_FILE INPUT_FILE
Ideally, it should also be possible to reconstruct (and persist) the compression state of an existing stream.
If a stream consists of multiple independent sections (e.g. because the stream is rsyncable, or because a section was appended without making use of the persistent compression state), the persisted state would only need to cover the section since the last state reset.
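For context, the closest thing the existing zstd library offers today is, to my understanding, the one-shot prefix API. Below is a minimal sketch (not the requested feature) that compresses new data as a separate frame which can reference the previously stored content for matches via ZSTD_CCtx_refPrefix(). Its key limitation, and the reason for this request, is that it needs the uncompressed previous content in memory rather than a compact persisted state, and the resulting frame is only decodable against that same prefix (a matching decoding sketch follows the comment below). The helper name append_frame_with_prefix and the compression level are illustrative.

/* Sketch: approximate the request with zstd's existing one-shot prefix API.
 * Assumes prev/prevSize hold the UNCOMPRESSED content of the archive so far,
 * so this is not the compact persisted state asked for above, and the frame
 * written here can only be decoded with the same prefix content. */
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

/* Compress src as a new frame that may reference prev for matches,
 * then append that frame to out. Returns 0 on success. */
static int append_frame_with_prefix(FILE *out,
                                    const void *prev, size_t prevSize,
                                    const void *src, size_t srcSize)
{
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    if (cctx == NULL) return 1;

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
    /* Treat the previously stored data as a single-use dictionary ("prefix"). */
    ZSTD_CCtx_refPrefix(cctx, prev, prevSize);

    size_t const cap = ZSTD_compressBound(srcSize);
    void *dst = malloc(cap);
    if (dst == NULL) { ZSTD_freeCCtx(cctx); return 1; }

    size_t const csize = ZSTD_compress2(cctx, dst, cap, src, srcSize);
    int const err = ZSTD_isError(csize) || fwrite(dst, 1, csize, out) != csize;

    free(dst);
    ZSTD_freeCCtx(cctx);
    return err ? 1 : 0;
}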
Describe alternatives you've considered
There are alternatives in the specific scenario I described above (e.g. do incremental backups, use a diff-like tool before compression, decompress+append+recompress, etc.) but they are not always practical or applicable in this or other scenarios.
Additional context
There could be a way to append new data as part of the same stream as the existing compressed tar file; however, the whole tar file would still need to be decompressed first. The main benefit would be that the whole tar file would not need to be re-compressed. Such an approach wouldn't need an additional state.
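Extending the same stream is not something the public API exposes, but a rough approximation of this idea with public APIs, under the same prefix-frame assumption as the sketch above, is to keep the existing compressed bytes untouched and decode them only to rebuild the referenced content. The sketch below decodes the first frame normally, then references its decompressed content via ZSTD_DCtx_refPrefix() before decoding the appended frame: the whole archive still has to be decompressed, but nothing is re-compressed. The helper name decode_two_frames and the two-frame layout are illustrative.

/* Sketch: read back a file laid out as in the previous sketch: one ordinary
 * zstd frame followed by one prefix-dependent frame. The first frame must be
 * fully decompressed before the second can be decoded (the decompression cost
 * mentioned above), but the already-compressed bytes never need re-compressing. */
#include <stdlib.h>
#include <zstd.h>

/* src/srcSize: the whole compressed file, assumed to contain exactly two
 * frames with known content sizes. On success the caller frees *out. */
static int decode_two_frames(const void *src, size_t srcSize,
                             void **out, size_t *outSize)
{
    size_t const frame1 = ZSTD_findFrameCompressedSize(src, srcSize);
    if (ZSTD_isError(frame1) || frame1 > srcSize) return 1;

    unsigned long long const size1 = ZSTD_getFrameContentSize(src, frame1);
    unsigned long long const size2 =
        ZSTD_getFrameContentSize((const char *)src + frame1, srcSize - frame1);
    if (size1 == ZSTD_CONTENTSIZE_UNKNOWN || size1 == ZSTD_CONTENTSIZE_ERROR ||
        size2 == ZSTD_CONTENTSIZE_UNKNOWN || size2 == ZSTD_CONTENTSIZE_ERROR)
        return 1;

    char *dst = malloc((size_t)(size1 + size2));
    ZSTD_DCtx *dctx = ZSTD_createDCtx();
    if (dst == NULL || dctx == NULL) { free(dst); ZSTD_freeDCtx(dctx); return 1; }

    /* Frame 1: a plain zstd frame, decoded as usual. */
    size_t const d1 = ZSTD_decompressDCtx(dctx, dst, (size_t)size1, src, frame1);

    /* Frame 2: its matches may point into frame 1's content, so reference
     * that content as the prefix before decoding. */
    ZSTD_DCtx_refPrefix(dctx, dst, (size_t)size1);
    size_t const d2 = ZSTD_decompressDCtx(dctx, dst + size1, (size_t)size2,
                                          (const char *)src + frame1,
                                          srcSize - frame1);

    ZSTD_freeDCtx(dctx);
    if (ZSTD_isError(d1) || ZSTD_isError(d2)) { free(dst); return 1; }
    *out = dst;
    *outSize = (size_t)(size1 + size2);
    return 0;
}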