feat: add approximate compression ratio for messages and values #191

Merged: 8 commits merged on Apr 11, 2024


@brayniac (Contributor) commented Apr 4, 2024

Adds approximate compression ratio targeting for messages and values.

This is done by filling only the first N bytes of the payload with random data and iteratively estimating the resulting compression ratio with gzip to choose N.
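As a rough first estimate (an idealization, not taken from the PR): assume the N random bytes are essentially incompressible and the rest of an L-byte payload is a constant filler such as zeros that compresses to almost nothing. The compressed size is then roughly N, so the ratio r and the random-byte count are related by

$$ r \approx \frac{L}{N} \quad\Longrightarrow\quad N \approx \frac{L}{r} $$

For example, L = 1024 and a target r = 4 gives N ≈ 256. Real gzip output also carries a 10-byte header, an 8-byte trailer, and deflate block overhead, so the ratio actually achieved for that N lands somewhat below the target. That gap is one reason to refine N against the compressor itself, as the description above says the PR does.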

brayniac added 3 commits April 4, 2024 09:59
Adds a new smoketest config
@brayniac changed the title "feat: add approximate compression ratio for messgaes and values" to "feat: add approximate compression ratio for messages and values" Apr 4, 2024
@brayniac force-pushed the compression-ratio branch from f18415e to cd813eb April 10, 2024 18:21
@brayniac requested a review from mihirn April 10, 2024 18:23
src/config/workload.rs (outdated, resolved)
configs/smoketest.toml (outdated, resolved)
configs/segcache.toml (outdated, resolved)
configs/blabber.toml (outdated, resolved)
configs/smoketest.toml (outdated, resolved)
@@ -162,7 +163,7 @@ impl Generator {
         // add a header
         [m[0], m[1], m[2], m[3], m[4], m[5], m[6], m[7]] =
             [0x54, 0x45, 0x53, 0x54, 0x49, 0x4E, 0x47, 0x21];
-        rng.fill(&mut m[32..topics.message_len]);
+        rng.fill(&mut m[32..(topics.message_random_bytes + 32)]);
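The changed line writes an 8-byte ASCII header (the bytes spell "TESTING!") and then fills only m[32..32 + message_random_bytes] with random data instead of everything from offset 32 to the end. A minimal sketch of that layout follows, assuming a zero-initialized buffer; the diff does not show how bytes 8..32 are used, so this sketch simply leaves them zeroed. XorShift64, build_message, HEADER and RANDOM_OFFSET are hypothetical names, and the tiny xorshift generator stands in for the real generator's rng so the example has no dependencies.

```rust
// Minimal xorshift64 PRNG (Marsaglia's 13/7/17 variant). A stand-in so the
// sketch needs no external crates; it is not the `rng` used by the real code.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    // Fill `buf` with pseudorandom bytes, 8 bytes per generator step.
    fn fill(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

// "TESTING!" in ASCII: 0x54 0x45 0x53 0x54 0x49 0x4E 0x47 0x21.
const HEADER: [u8; 8] = *b"TESTING!";
// Offset where the random region starts, as in the diff.
const RANDOM_OFFSET: usize = 32;

// Hypothetical helper: one message whose only random bytes are
// m[RANDOM_OFFSET..RANDOM_OFFSET + random_bytes]; everything else stays zero.
fn build_message(rng: &mut XorShift64, message_len: usize, random_bytes: usize) -> Vec<u8> {
    assert!(RANDOM_OFFSET + random_bytes <= message_len, "message too short");
    let mut m = vec![0u8; message_len];
    m[..HEADER.len()].copy_from_slice(&HEADER);
    rng.fill(&mut m[RANDOM_OFFSET..RANDOM_OFFSET + random_bytes]);
    m
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let m = build_message(&mut rng, 128, 16);
    // The header is intact and everything past the random region is still zero.
    assert_eq!(&m[..8], &HEADER[..]);
    assert!(m[RANDOM_OFFSET + 16..].iter().all(|&b| b == 0));
    println!("header = {:?}", std::str::from_utf8(&m[..8]).unwrap());
}
```

Note that the new slice bound only stays in range when message_random_bytes + 32 <= message_len; the sketch makes that precondition explicit with an assert.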
Contributor


To Yuri's point about the entropy being evenly distributed across the payload, is it worth shuffling the vector after this if message_random_bytes != message_len?

Contributor Author


I think it really depends on what content we're trying to emulate, and some further analysis and design decisions would be needed before choosing a strategy. I expect that random bytes spread throughout the message still wouldn't look like, for example, English text or JSON, either in how the entropy is distributed or in which symbols appear at all.

Do we even know whether this has an impact on the compression algorithms we anticipate using for transport and/or storage?

I'm voting to defer this to a follow-up PR. We don't have enough information to inform the design right now, but we do need to produce payloads that are compressible.

Contributor Author


Created #196 to track this.
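The reviewer's question about shuffling the vector when message_random_bytes != message_len could look roughly like the sketch below: fill the leading random region as the diff does, then run a Fisher-Yates shuffle over everything from offset 32 onward. This is only an illustration of the deferred idea, not what the PR merged; XorShift64 and shuffle are hypothetical names, and the xorshift generator stands in for the real rng.

```rust
// Minimal xorshift64 stand-in PRNG so the sketch has no dependencies.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    // Fill `buf` with pseudorandom bytes, 8 bytes per generator step.
    fn fill(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

// In-place Fisher-Yates shuffle: walk from the end, swapping each slot with a
// uniformly chosen slot at or before it. `% (i + 1)` has a modulo bias that is
// negligible for payload-sized slices.
fn shuffle(rng: &mut XorShift64, buf: &mut [u8]) {
    for i in (1..buf.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        buf.swap(i, j);
    }
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let (message_len, message_random_bytes) = (96, 16);
    let mut m = vec![0u8; message_len];
    // Fill only the leading random region, as in the diff...
    rng.fill(&mut m[32..32 + message_random_bytes]);
    let mut before = m[32..].to_vec();
    // ...then spread those bytes across everything after offset 32.
    shuffle(&mut rng, &mut m[32..]);
    let mut after = m[32..].to_vec();
    // A shuffle only permutes bytes, so the byte multiset is unchanged.
    before.sort_unstable();
    after.sort_unstable();
    assert_eq!(before, after);
}
```

Because a shuffle keeps the same bytes but breaks up the long zero run, an LZ77-based compressor such as gzip would likely reach a lower ratio for the same N, so N would probably need re-estimating after shuffling. That interaction is part of what the thread left open.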

src/workload/mod.rs (outdated, resolved)
src/workload/mod.rs (outdated, resolved)

fn estimate_random_bytes_needed(length: usize, compression_ratio: f64) -> usize {
    // if compression ratio is low, all bytes should be random
    if compression_ratio <= 1.0 {
Contributor


Do we want to short-circuit and return early if length == 0 as well?

Contributor Author


I don't think it's needed. Initializing the PRNG shouldn't be too expensive, and it happens only once per keyspace/topic-space.
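The review excerpt shows only the opening of estimate_random_bytes_needed. Below is a minimal sketch of the iterative search the PR description outlines, under stated assumptions: the non-random bytes are zeros, the achieved ratio falls roughly monotonically as N grows (so a binary search is reasonable), and gzip is replaced by a toy run-length encoder, rle_len, passed in as a closure so the example has no dependencies. The extra compressed_len parameter, payload, rle_len and SEED are all hypothetical and not part of the PR. In a real build the closure could wrap a gzip encoder such as the flate2 crate's GzEncoder.

```rust
// Minimal xorshift64 stand-in PRNG (no external crates).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    // Fill `buf` with pseudorandom bytes, 8 bytes per generator step.
    fn fill(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

// Arbitrary fixed seed so every candidate n sees the same random prefix.
const SEED: u64 = 0x2545_F491_4F6C_DD1D;

// Payload of `length` bytes: first `n` random, the rest zero. With a fixed
// seed, the first n - 1 bytes match those of payload(length, n - 1).
fn payload(length: usize, n: usize) -> Vec<u8> {
    let mut rng = XorShift64(SEED);
    let mut p = vec![0u8; length];
    rng.fill(&mut p[..n]);
    p
}

// Toy run-length encoder standing in for gzip: each run of up to 255
// identical bytes costs 2 output bytes (count, value). Returns encoded size.
fn rle_len(data: &[u8]) -> usize {
    let (mut len, mut i) = (0, 0);
    while i < data.len() {
        let mut run = 1;
        while i + run < data.len() && data[i + run] == data[i] && run < 255 {
            run += 1;
        }
        len += 2;
        i += run;
    }
    len
}

// Hypothetical sketch: largest n such that a payload with n leading random
// bytes still compresses by at least `compression_ratio` under `compressed_len`.
fn estimate_random_bytes_needed<F>(length: usize, compression_ratio: f64, compressed_len: F) -> usize
where
    F: Fn(&[u8]) -> usize,
{
    // A target ratio at or below 1.0 means no compression: all bytes random.
    if compression_ratio <= 1.0 {
        return length;
    }
    let ratio_for = |n: usize| length as f64 / compressed_len(&payload(length, n)).max(1) as f64;

    // Binary search for the largest n that still meets the target.
    // length == 0 needs no special case: the range is empty and 0 is returned.
    let (mut lo, mut hi) = (0usize, length);
    while lo < hi {
        let mid = lo + (hi - lo + 1) / 2;
        if ratio_for(mid) >= compression_ratio {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    lo
}

fn main() {
    let n = estimate_random_bytes_needed(1024, 4.0, rle_len);
    println!("random bytes for a 4:1 target on 1 KiB: {n}");
}
```

Taking the compressor as a closure keeps the search independent of the codec, which may matter given the open question in the review about which compression algorithms will be used for transport and storage.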

@brayniac merged commit 22604c6 into iopsystems:main Apr 11, 2024
14 checks passed
@brayniac deleted the compression-ratio branch April 11, 2024 20:56

2 participants