Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible improvement to compression ratio #17

Open
silentbicycle opened this issue May 24, 2015 · 2 comments
Open

Possible improvement to compression ratio #17

silentbicycle opened this issue May 24, 2015 · 2 comments

Comments

@silentbicycle
Copy link
Collaborator

Currently, the match length is encoded as the length - 1 (since a match of 0 would be useless), but the current rule for whether a match is worth using clamps the minimum length to 3. Storing the match length as length - 3 would allow matches of a few more bytes to be represented in the same number of bits, improving the compression ratio in some cases.

On quick review, it also appears the value for max_possible in st_step_search could be increased by this amount. This doesn't affect correctness, but again could make compression more effective.

These changes will break reverse compatibility for old compressed bytestreams, so they should not be integrated until an appropriate version number increase.

@unixdj
Copy link
Contributor

unixdj commented May 24, 2015

This would increase the window length to (1 << window_sz2 + break_even_point), which would make buffering trickier and possibly break things when window_sz2 == lookahead_sz2. That said, it's still a good idea.

But if you're going to break backwards compatibility, you may also drop the first bit of the compressed stream, because no stream may start with a backref. If we push this idea further, we may make early backrefs (occuring before 1 << (window_sz2 - 1) bytes into the stream) shorter (e.g., a backref 8 bytes into the stream would have no more than 3 bits of index and count each, and a break even point of 1), but the ridiculousness level in this paragraph is rising fast.

@unixdj
Copy link
Contributor

unixdj commented May 25, 2015

Oh wait, no, it wouldn't increase the window length. But the window==lookahead case is still tricky, due to lookahead increase that you mention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants