Discussion on compression changes #102

Closed
kruskall opened this issue Jan 10, 2024 · 6 comments
@kruskall
Member

Goals:

What changed?

[Images: "before gzip changes" and "after gzip changes" comparison]

Why?

If events are compressed in batches of size compressThreshold, we can store the offsets and decompress only the specific "container" we need, instead of decompressing the whole request in memory.
A buffer can also skip compression completely if the flush timeout is hit and there isn't enough data to be worth compressing.

Additional optimization: if we know there's only one event in a "container", we don't need to decompress it at all; we can copy the compressed bytes directly, since we have the offsets.
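
For illustration, a minimal Go sketch of this batching idea (not the actual go-docappender code; the batchWriter and container names and fields are made up, and the skip-compression-on-timeout path is omitted):

package compression

import (
	"bytes"
	"compress/gzip"
)

// container records where one independently compressed batch of events
// starts and ends inside the request body, and how many events it holds.
type container struct {
	start, end int // byte offsets into the compressed request body
	events     int
}

// batchWriter buffers uncompressed events and, once the buffer reaches
// compressThreshold bytes, flushes them as a separate gzip member whose
// offsets are recorded. Concatenated gzip members still form a valid gzip
// stream, so the result can be sent as a single request body.
type batchWriter struct {
	compressThreshold int
	pending           bytes.Buffer // uncompressed events waiting to be flushed
	pendingEvents     int
	out               bytes.Buffer // request body: concatenated gzip members
	containers        []container
}

func (w *batchWriter) addEvent(event []byte) error {
	w.pending.Write(event)
	w.pending.WriteByte('\n')
	w.pendingEvents++
	if w.pending.Len() >= w.compressThreshold {
		return w.flushContainer()
	}
	return nil
}

// flushContainer compresses the pending events into a new gzip member and
// records its offsets so it can later be decompressed (or, when it holds a
// single event, copied verbatim) without touching the rest of the request.
func (w *batchWriter) flushContainer() error {
	if w.pending.Len() == 0 {
		return nil
	}
	start := w.out.Len()
	gw := gzip.NewWriter(&w.out)
	if _, err := gw.Write(w.pending.Bytes()); err != nil {
		return err
	}
	if err := gw.Close(); err != nil {
		return err
	}
	w.containers = append(w.containers, container{
		start:  start,
		end:    w.out.Len(),
		events: w.pendingEvents,
	})
	w.pending.Reset()
	w.pendingEvents = 0
	return nil
}

With this layout, retrying the events of a container c only requires decompressing out.Bytes()[c.start:c.end], and when c.events == 1 the compressed bytes can be copied as-is.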

On the compression ratio/efficiency

There was some discussion on how compression worked before. To further clarify the picture above:
The compression logic happened inside the bulk indexer:

func (b *bulkIndexer) add(item bulkIndexerItem) error {
	b.writeMeta(item.Index, item.DocumentID)
	// The item body is copied straight into b.writer (the gzip.Writer),
	// followed by the newline separator required by the bulk API.
	if _, err := io.CopyBuffer(b.writer, item.Body, b.copybuf[:]); err != nil {
		return err
	}
	if _, err := b.writer.Write([]byte("\n")); err != nil {
		return err
	}
	b.itemsAdded++
	return nil
}

The event was written to the gzip.Writer immediately; we were not buffering the events and then compressing them all at once.
(The gzip.Writer buffers data internally, but that buffering is arbitrary and can't be configured.)

Now we write to a buffer and compress once it reaches a certain (configurable) size.

Testing

The open PR for document retries (#99) includes tests to ensure the feature works as intended.
Additional testing can be performed; see #98.

Infinite retries

IMO this is an issue with the PR, so it should be discussed there, but we can also discuss it more holistically here.

In the current implementation, events are submitted to the buffer and flushed in the next flush request. There's no max retry setting.

@axw
Member

axw commented Jan 10, 2024

The event was written to the gzip.Writer immediately; we were not buffering the events and then compressing them all at once.
(The gzip.Writer buffers data internally, but that buffering is arbitrary and can't be configured.)

Now we write to a buffer and compress once it reaches a certain (configurable) size.

gzip uses DEFLATE, which is block-based, and uses a 32KB sliding window dictionary. So why does it matter if we stream data in, compared to buffering and then compressing? If there's an improvement, can you please provide some benchmark comparisons?
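
A comparison along these lines could look roughly like the following sketch (illustrative only, not a benchmark from the PR; the sample document and batch size are arbitrary):

package compression

import (
	"bytes"
	"compress/gzip"
	"testing"
)

var sampleDoc = []byte(`{"@timestamp":"2024-01-10T00:00:00Z","span":{"name":"GET /","duration":1000}}`)

// BenchmarkStreamingCompress writes each event straight into the
// gzip.Writer, mirroring the old bulkIndexer.add behaviour.
func BenchmarkStreamingCompress(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var out bytes.Buffer
		gw := gzip.NewWriter(&out)
		for j := 0; j < 1000; j++ {
			gw.Write(sampleDoc)
			gw.Write([]byte("\n"))
		}
		gw.Close()
	}
}

// BenchmarkBufferThenCompress accumulates uncompressed events first and
// compresses the whole batch with a single Write.
func BenchmarkBufferThenCompress(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var buf, out bytes.Buffer
		for j := 0; j < 1000; j++ {
			buf.Write(sampleDoc)
			buf.WriteByte('\n')
		}
		gw := gzip.NewWriter(&out)
		gw.Write(buf.Bytes())
		gw.Close()
	}
}

Reporting out.Len() as well (e.g. via b.ReportMetric) would show whether the compression ratio differs between the two strategies, not just the CPU time.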

@simitt

simitt commented Jan 17, 2024

@kruskall could you provide an update on this?

@kruskall
Member Author

We had a discussion and decided to revert the compression changes and stop compressing events in batches.

We've also decided to discard the optimization to send uncompressed data if the request is too small.

The new approach doesn't need offsets and just iterates over the array. It does not decompress the whole request in memory but reads chunks of data.
Albeit slower, the new approach is a lot simpler.
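
As a rough illustration of the simpler approach (assuming an NDJSON body; iterateEvents and handleLine are made-up names, not the actual #99 implementation):

package compression

import (
	"bufio"
	"compress/gzip"
	"io"
)

// iterateEvents reads the compressed body as a stream and hands one
// NDJSON line at a time to handleLine, so only a small chunk is ever
// held in memory instead of the fully decompressed request.
func iterateEvents(body io.Reader, handleLine func([]byte) error) error {
	gr, err := gzip.NewReader(body)
	if err != nil {
		return err
	}
	defer gr.Close()

	scanner := bufio.NewScanner(gr)
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024) // cap per-line memory
	for scanner.Scan() {
		// Each Scan decompresses just enough of the stream to yield one line.
		if err := handleLine(scanner.Bytes()); err != nil {
			return err
		}
	}
	return scanner.Err()
}

Only one line's worth of data is held decompressed at a time, at the cost of scanning the stream from the start instead of jumping to a recorded offset.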

@simitt

simitt commented Jan 18, 2024

Please provide a bit more detail on some of the decisions and the rationale behind them:

implement document retries without decompressing the whole payload/request

Do I understand correctly that this can be solved by iterating over an array and reading chunks of data without decompressing the whole body into memory?

We've also decided to discard the optimization to send uncompressed data if the request is too small.

Can you motivate the decision please? I'm not against it, but the rationale should be documented since it had been an open discussion before.

Albeit slower, the new approach is a lot simpler.

What does slower mean and can this be quantified?

The event was written to the gzip.Writer immediately; we were not buffering the events and then compressing them all at once.
(The gzip.Writer buffers data internally, but that buffering is arbitrary and can't be configured.)
Now we write to a buffer and compress once it reaches a certain (configurable) size.

Are you reverting back to this solution, or are you looking into optimizing, given that you showed that the previous approach wasn't efficient with respect to compression ratio?

@kruskall
Member Author

Sorry about that!

Do I understand correctly that this can be solved by iterating over an array and reading chunks of data without decompressing the whole body into memory?

Yes, see #99: we're reading batches of data to avoid decompressing the whole request.

Can you motivate the decision please? I'm not against it, but the rationale should be documented since it had been an open discussion before.

Skipping compression was an optimization that could only be triggered when the flush interval was hit before enough data had accumulated. If that happens, the application has low throughput and there isn't much benefit in saving some CPU time.

What does slower mean and can this be quantified?

We have to iterate over the array instead of looking up an offset at a specific point like we did before, so this approach is going to be slower.
The issue is not performance, though: as discussed in the team meeting, complexity was the concern, and we simplified the approach.

Are you reverting back to this solution, or are you looking into optimizing, given that you showed that the previous approach wasn't efficient with respect to compression ratio?

This is unrelated to the retries; I've opened a new issue to discuss it and share a proof of concept: #108
Initial benchmarks in apm-server showed that writing a single byte (\n) took more time than writing the whole event to the gzip.Writer.
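
For context, a micro-benchmark of that comparison could look roughly like this (illustrative sketch only, not the apm-server benchmark; the sample event is made up):

package compression

import (
	"compress/gzip"
	"io"
	"testing"
)

var (
	newline     = []byte("\n")
	sampleEvent = []byte(`{"@timestamp":"2024-01-10T00:00:00Z","message":"hello"}`)
)

// BenchmarkWriteNewline measures the cost of the separate one-byte "\n"
// write into an existing gzip.Writer.
func BenchmarkWriteNewline(b *testing.B) {
	gw := gzip.NewWriter(io.Discard)
	defer gw.Close()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		gw.Write(newline)
	}
}

// BenchmarkWriteEvent measures writing a whole (small) event into the
// same kind of gzip.Writer, for comparison.
func BenchmarkWriteEvent(b *testing.B) {
	gw := gzip.NewWriter(io.Discard)
	defer gw.Close()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		gw.Write(sampleEvent)
	}
}

If the fixed per-call overhead of gzip.Writer.Write dominates, the one-byte newline write costs about as much as writing the whole event, which is consistent with what the apm-server benchmarks suggested.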

@kruskall
Member Author

kruskall commented Feb 8, 2024

Compression changes have been reverted.

Buffering before compressing can be discussed in #108

@kruskall kruskall closed this as completed Feb 8, 2024