Discussion on compression changes #102
gzip uses DEFLATE, which is block-based, and uses a 32KB sliding window dictionary. So why does it matter if we stream data in, compared to buffering and then compressing? If there's an improvement, can you please provide some benchmark comparisons?
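For reference, here is a minimal sketch (not from go-docappender; the event payloads and the 100-event batch size are made-up assumptions) of one way to benchmark the question above: it compares the compressed size of a single continuous gzip stream, where all events share the 32KB sliding window, against independent gzip streams per batch, where the DEFLATE dictionary restarts for every batch.

```go
// Sketch: compare one continuous gzip stream vs. independent per-batch streams.
// Errors from gzip.Writer are ignored because the sink is an in-memory buffer.
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

func gzipSize(chunks [][]byte) int {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	for _, c := range chunks {
		zw.Write(c)
	}
	zw.Close()
	return buf.Len()
}

func main() {
	// Fake NDJSON-style events; real documents would come from the bulk indexer.
	var events [][]byte
	for i := 0; i < 10000; i++ {
		events = append(events, []byte(fmt.Sprintf(`{"@timestamp":"2024-01-01T00:00:%02dZ","msg":"event %d"}`+"\n", i%60, i)))
	}

	single := gzipSize(events) // one stream: all events share the 32KB window

	// Independent streams per batch: the DEFLATE dictionary restarts each time.
	const batch = 100
	batched := 0
	for i := 0; i < len(events); i += batch {
		end := i + batch
		if end > len(events) {
			end = len(events)
		}
		batched += gzipSize(events[i:end])
	}

	fmt.Printf("single stream: %d bytes, %d-event batches: %d bytes\n", single, batch, batched)
}
```

The gap between the two numbers is a rough proxy for how much compression ratio a batched "container" approach gives up compared to one continuous stream.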
@kruskall could you provide an update on this?
We had a discussion and decided to revert the compression changes and stop compressing events in batches. We've also decided to discard the optimization to send uncompressed data if the request is too small. The new approach doesn't need offsets and just iterates over the array. It does not decompress the whole request in memory but reads chunks of data.
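As a rough illustration of that last point, here is a hedged sketch of reading a gzip-compressed body in fixed-size chunks rather than inflating it fully into memory (the 32KB chunk size and the handleChunk callback are illustrative, not go-docappender API):

```go
// Sketch: process a gzip-compressed request body in fixed-size chunks
// instead of decompressing the whole thing into memory at once.
package sketch

import (
	"compress/gzip"
	"io"
)

func forEachChunk(body io.Reader, handleChunk func([]byte) error) error {
	zr, err := gzip.NewReader(body)
	if err != nil {
		return err
	}
	defer zr.Close()

	buf := make([]byte, 32*1024) // only this much decompressed data is held at a time
	for {
		n, err := zr.Read(buf)
		if n > 0 {
			if cbErr := handleChunk(buf[:n]); cbErr != nil {
				return cbErr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}
```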
Please provide a bit more detail on some of the decisions and the rationale behind them:
Do I understand correctly that this can be solved by iterating over an array and reading chunks of data without decompressing the whole body into memory?
Can you explain the motivation for this decision, please? I'm not against it, but the rationale should be documented since it had been an open discussion before.
What does slower mean and can this be quantified?
Are you reverting back to this solution, or are you looking into optimizing, given that you showed that the previous approach wasn't efficient with respect to compression ratio?
Sorry about that!
Yes, see #99, we're reading batches of data to avoid decompressing the whole request.
Skipping compression when there is not enough data was an optimization triggered when the flush interval was hit before enough data had accumulated. If that happens, the application has low throughput and there is not much benefit in saving some CPU time (see the flush-time sketch after this comment).
We have to iterate over the array instead of looking up an offset at a specific point like we did before. This approach is going to be slower.
This is unrelated to the retries. I've opened a new issue to discuss it and share a proof of concept: #108
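Here is a hedged sketch of the flush-time decision mentioned above, i.e. the discarded optimization of skipping compression when the flush interval fires with little buffered data. flushBuffer, compressThreshold, and send are hypothetical names, not go-docappender API.

```go
// Sketch: at flush time, skip compression when there is too little data for
// it to pay off; otherwise gzip the buffered payload before sending.
package sketch

import (
	"bytes"
	"compress/gzip"
)

func flushBuffer(buf *bytes.Buffer, compressThreshold int, send func(payload []byte, gzipped bool) error) error {
	if buf.Len() < compressThreshold {
		// Low-throughput path: the flush interval fired before enough data
		// accumulated, so spend no CPU on compression.
		return send(buf.Bytes(), false)
	}
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	if _, err := zw.Write(buf.Bytes()); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}
	return send(out.Bytes(), true)
}
```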
Compression changes have been reverted. Buffering before compressing can be discussed in #108.
Goals:
What changed?
Why?
If events are compressed in batches of size compressThreshold, we can store the offsets and decompress only that specific "container", avoiding decompressing the whole request in memory. A buffer can also skip compression completely if the flush timeout is hit and there's not enough data for compression.
Additional optimization technique: if we know there's only one event in a "container", we don't even need to decompress at all; we can just copy the compressed bytes since we have the offsets.
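To make the reverted design above concrete, here is a hedged sketch of a "container with offsets" buffer (types and names are illustrative, not the original implementation): each batch is compressed as an independent gzip stream, its start offset is recorded, and a single container can later be copied verbatim or decompressed on its own.

```go
// Sketch: events are gzip-compressed in batches ("containers"); the byte
// offset of each container is recorded so one container can be extracted
// without touching the rest of the request.
package sketch

import (
	"bytes"
	"compress/gzip"
	"io"
)

type containerBuffer struct {
	buf     bytes.Buffer
	offsets []int // start offset of each compressed container in buf
}

// addContainer compresses one batch of events as an independent gzip stream
// and remembers where it starts.
func (c *containerBuffer) addContainer(events [][]byte) error {
	c.offsets = append(c.offsets, c.buf.Len())
	zw := gzip.NewWriter(&c.buf)
	for _, e := range events {
		if _, err := zw.Write(e); err != nil {
			return err
		}
	}
	return zw.Close()
}

// container returns the raw compressed bytes of container i, which can be
// copied verbatim (e.g. the single-event case) without decompressing.
func (c *containerBuffer) container(i int) []byte {
	start := c.offsets[i]
	end := c.buf.Len()
	if i+1 < len(c.offsets) {
		end = c.offsets[i+1]
	}
	return c.buf.Bytes()[start:end]
}

// decompressContainer inflates only container i.
func (c *containerBuffer) decompressContainer(i int) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(c.container(i)))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}
```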
On the compression ratio/efficiency
There was some discussion on how compression worked before. To further clarify the picture above:
The compression logic happened inside the bulk indexer:
go-docappender/bulk_indexer.go, lines 174 to 184 in 0de03a5
The event was written to the gzip.Writer immediately; we were not buffering and then compressing all the events. (The gzip.Writer buffers data internally, but that buffering is arbitrary and can't be configured.) Now we write to a buffer and compress if we reach a certain size (configurable).
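A hedged sketch contrasting the two write paths just described (names are illustrative; the real code is in go-docappender/bulk_indexer.go): before, each event went straight into the gzip.Writer; now, events accumulate in a plain buffer and are compressed only once a configurable threshold is crossed.

```go
// Sketch: "write-through gzip" vs. "buffer, then compress at a threshold".
package sketch

import (
	"bytes"
	"compress/gzip"
)

// Before: every event was written straight into the gzip.Writer, whose
// internal buffering is fixed and not configurable.
type streamingIndexer struct {
	out bytes.Buffer
	zw  *gzip.Writer
}

func (s *streamingIndexer) add(event []byte) error {
	if s.zw == nil {
		s.zw = gzip.NewWriter(&s.out)
	}
	_, err := s.zw.Write(event)
	return err
}

// Now: events accumulate in a plain buffer and are compressed only once the
// buffer crosses a configurable threshold.
type bufferedIndexer struct {
	pending   bytes.Buffer
	out       bytes.Buffer
	zw        *gzip.Writer
	threshold int // configurable, unlike gzip.Writer's internal buffer
}

func (b *bufferedIndexer) add(event []byte) error {
	b.pending.Write(event)
	if b.pending.Len() < b.threshold {
		return nil // keep accumulating uncompressed data
	}
	if b.zw == nil {
		b.zw = gzip.NewWriter(&b.out)
	}
	if _, err := b.zw.Write(b.pending.Bytes()); err != nil {
		return err
	}
	b.pending.Reset()
	return nil
}
```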
Testing
The open PR for document retries (#99) has testing to ensure the feature is working as intended.
Additional testing can be performed; see #98
Infinite retries
IMO this is an issue with the PR, so it should be discussed there, but we can also discuss it more holistically.
In the current implementation, events are submitted to the buffer and flushed in the next flush request. There's no max retry setting.
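Purely as a hypothetical, here is a sketch of what a retry cap could look like if one were added: failed events re-enter the buffer with a retry count and are dropped once a configurable maximum is exceeded. None of these names exist in go-docappender or the PR.

```go
// Sketch: a hypothetical per-event retry cap on top of the flush buffer.
package sketch

type retriableEvent struct {
	doc     []byte
	retries int
}

type retryBuffer struct {
	events     []retriableEvent
	maxRetries int // hypothetical setting; today there is no such cap
}

// requeue puts a failed event back so the next flush picks it up, unless it
// has already been retried maxRetries times.
func (b *retryBuffer) requeue(ev retriableEvent) (dropped bool) {
	ev.retries++
	if b.maxRetries > 0 && ev.retries > b.maxRetries {
		return true
	}
	b.events = append(b.events, ev)
	return false
}
```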