
[testbed] Add batcher performance tests #36206

Merged 1 commit on Dec 6, 2024

Conversation

@swiatekm (Contributor) commented Nov 5, 2024

Description

Add basic batching benchmarks. The primary intent of these is to help verify that we're not introducing performance regressions with open-telemetry/opentelemetry-collector#8122.

I took the 10k DPS logs benchmark and ran it in different configurations:

  • Batching either enabled or disabled
  • In-memory queue enabled or disabled
  • Using the batch processor at the start of the pipeline or the new exporter batcher
  • All of this either with no processors or with some basic filtering and transformation

I've reduced the input batch size to 10 to better capture the effect of having no batching at the start of the pipeline on processor performance, which is one of the concerns with moving batching to the exporter.
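For reference, the two batching placements being compared look roughly like the collector config below. This is a hedged sketch, not the actual testbed setup: the batch processor's `send_batch_size` is a real option, while the exporter-side `batcher` keys shown reflect my reading of the exporterhelper config at the time and may differ in detail.

```yaml
# Variant A: batch at the start of the pipeline with the batch processor.
processors:
  batch:
    send_batch_size: 1000

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

---
# Variant B: batch at the end of the pipeline with the exporter batcher.
exporters:
  otlp:
    endpoint: localhost:4317
    batcher:             # assumed key names; check the exporterhelper docs
      enabled: true
      min_size_items: 1000
```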

For now, I'd like to get some comments on whether this is sufficient, or if I should expand my scope to other signal types and/or different parameters for the benchmarks.

Current results

| Test | Result | Duration | CPU Avg% | CPU Max% | RAM Avg MiB | RAM Max MiB | Sent Items | Received Items |
|------|--------|----------|----------|----------|-------------|-------------|------------|----------------|
| Log10kDPSNoProcessors/No_batching,_no_queue | PASS | 15s | 20.7 | 22.7 | 73 | 103 | 150100 | 150100 |
| Log10kDPSNoProcessors/No_batching,_queue | PASS | 15s | 19.1 | 21.7 | 73 | 103 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_batch_processor,_no_queue | PASS | 15s | 13.1 | 14.3 | 77 | 109 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_batch_processor,_queue | PASS | 15s | 12.7 | 14.0 | 75 | 108 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_exporter_batcher,_no_queue | PASS | 15s | 15.5 | 17.3 | 72 | 101 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_exporter_batcher,_queue | PASS | 15s | 14.3 | 15.7 | 72 | 102 | 150100 | 150100 |
| Log10kDPSWithProcessors/No_batching,_no_queue | PASS | 15s | 21.5 | 23.0 | 72 | 103 | 150100 | 150100 |
| Log10kDPSWithProcessors/No_batching,_queue | PASS | 15s | 22.2 | 23.0 | 72 | 102 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_batch_processor,_no_queue | PASS | 15s | 22.5 | 26.0 | 75 | 107 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_batch_processor,_queue | PASS | 15s | 18.0 | 19.7 | 73 | 104 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_exporter_batcher,_no_queue | PASS | 16s | 18.3 | 20.7 | 75 | 106 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_exporter_batcher,_queue | PASS | 15s | 17.1 | 17.7 | 73 | 102 | 150100 | 150100 |

It looks like the new batcher is a bit less performant if the pipeline doesn't contain any processors, but is in fact faster if processors are present, which is surprising to me. But this does assuage our fears that we'd tank processor performance by moving batching to the end of the pipeline.

Link to tracking issue

Fixes open-telemetry/opentelemetry-collector#10836

@swiatekm (Contributor, Author) commented Nov 5, 2024

@dmitryax @sfc-gh-sili I'd love some feedback as to whether this is useful for your purposes, and if I should broaden the scope.

@swiatekm force-pushed the test/testbed-batcher branch from 35c4618 to 89f3353 on November 5, 2024
@jmacd (Contributor) commented Nov 7, 2024

By the way, I'm not convinced that the testbed is appropriately sophisticated to measure and describe the differences between the two batchers. The amount of work being done is the same, so we expect the same amount of compute resource. There are still a few salient differences between the two forms of batching that would not be teased apart by this benchmark, for example the way that exporter-batching has to serialize multiple requests and can't benefit from the queue for concurrency when batches are reduced in size. The effect is on individual request latency, which the testbed doesn't measure.

I'm working on an RFC to describe ideal batching behavior, and there are performance arguments you could test here. When there are processors that do CPU-intensive work, there is likely to be a positive impact from the batch processor, because larger batches will perform better due to CPU and memory caches. I.e., if there is compute-intensive work, the batch processor is likely to lead to an optimization because we will invoke the subsequent processors fewer times with the same amount of data.

I also claim we'll never get rid of the batch processor and should fix it. As a shining example, the groupbyattr processor absolutely benefits from the batch processor for the reason stated above (in addition to other benefits). https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/groupbyattrsprocessor

@swiatekm (Contributor, Author) replied:

> I'm working on an RFC to describe ideal batching behavior, and there are performance arguments you could test here. When there are processors that do CPU-intensive work, there is likely to be a positive impact from the batch processor because larger batches will perform better due to CPU and memory caches. I.e., if there is compute-intensive work, the batch processor is likely to lead to an optimization because we will invoke the subsequent processors fewer times with the same amount of data.

@jmacd any idea on how I could simulate a CPU-heavy processor here? The processors I did add to my benchmark were intended to simulate the average use case, but I'd be happy to have a more skewed benchmark there as well.
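One purely illustrative option (my suggestion here, not something from this PR) would be to stack OTTL statements that do real per-record computation in a transform processor, for example hashing each record body. This assumes the `SHA256` converter accepts the record body as-is; the attribute name is made up:

```yaml
processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Hashing every body is deliberately CPU-heavy; repeat statements
          # with different target attributes to scale the cost up further.
          - set(attributes["body.sha256"], SHA256(body))
```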

@atoulme (Contributor) commented Nov 12, 2024

Please rebase, fix conflicts, and add a changelog entry.

@swiatekm force-pushed the test/testbed-batcher branch from 89f3353 to ace990a on November 13, 2024
@swiatekm marked this pull request as ready for review November 13, 2024
@swiatekm requested a review from a team as a code owner November 13, 2024
@swiatekm requested a review from ChrsMark November 13, 2024
@swiatekm force-pushed the test/testbed-batcher branch 2 times, most recently from c57c82c to e6a49ca on November 13, 2024
@swiatekm force-pushed the test/testbed-batcher branch from e6a49ca to 7061eeb on November 14, 2024
@sfc-gh-sili commented:
@swiatekm Thank you, Mikołaj! This is great.
Here are a couple of things that I'd be curious to know:

  1. Sanity-check that batching in the exporter has about the same resource consumption as batching in the processor, which this PR does already.
  2. Like Joshua mentioned, I'm also curious whether not batching earlier causes higher resource usage for more expensive processing further down the pipeline.
  3. I'd be curious how the size of the goroutine pool used by the exporter affects performance. It seems that's not something we've looked into so far.

@swiatekm force-pushed the test/testbed-batcher branch from 7061eeb to f872ba4 on December 3, 2024
@swiatekm (Contributor, Author) commented Dec 3, 2024

Can we merge this PR as-is, and I'll add another benchmark in a follow-up? @MovieStoreGuy

@jmacd @sfc-gh-sili I can see the argument that we could get better performance on some processing tasks because batch size can affect CPU cache saturation. I'm not 100% sure how to verify this, though. Maybe spawn 100 transform processors doing the same thing? In that case, though, I think we'd mostly be measuring the overhead of nested function calls.
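The call-count side of the amortization argument can at least be illustrated with a toy sketch (all names hypothetical; this is not testbed code). With a fixed per-call overhead in each downstream consumer, batching the 150100 items from these runs to size 1000 cuts consumer invocations from 15010 (input batch size 10) to 151:

```go
package main

import "fmt"

// consumeBatch stands in for a downstream pipeline component: each call
// pays some fixed overhead regardless of how many items it carries.
func consumeBatch(batch []int, calls *int) {
	*calls++
	_ = batch // per-item work would happen here
}

// send pushes `items` items through consumeBatch in chunks of batchSize
// and returns how many consumer calls that took.
func send(items, batchSize int) int {
	calls := 0
	for i := 0; i < items; i += batchSize {
		end := i + batchSize
		if end > items {
			end = items
		}
		consumeBatch(make([]int, end-i), &calls)
	}
	return calls
}

func main() {
	const items = 150100 // matches the Sent Items column in the results
	fmt.Println(send(items, 10))   // input batch size 10 -> 15010 calls
	fmt.Println(send(items, 1000)) // batched to 1000     -> 151 calls
}
```

This only models invocation count, not latency or cache effects, which is exactly the part the testbed doesn't capture.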

@swiatekm force-pushed the test/testbed-batcher branch 2 times, most recently from 8fff77a to be0715a on December 4, 2024
@swiatekm force-pushed the test/testbed-batcher branch from be0715a to f08d866 on December 6, 2024
@swiatekm (Contributor, Author) commented Dec 6, 2024

Reran the benchmark on current main:

| Test | Result | Duration | CPU Avg% | CPU Max% | RAM Avg MiB | RAM Max MiB | Sent Items | Received Items |
|------|--------|----------|----------|----------|-------------|-------------|------------|----------------|
| Log10kDPSNoProcessors/No_batching,_no_queue | PASS | 15s | 19.9 | 25.7 | 75 | 104 | 150100 | 150100 |
| Log10kDPSNoProcessors/No_batching,_queue | PASS | 15s | 16.4 | 17.3 | 71 | 100 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_batch_processor,_no_queue | PASS | 15s | 12.6 | 13.7 | 76 | 108 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_batch_processor,_queue | PASS | 15s | 11.9 | 12.7 | 74 | 106 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_exporter_batcher,_no_queue | PASS | 15s | 13.8 | 15.3 | 70 | 100 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_exporter_batcher,_queue | PASS | 15s | 13.2 | 13.7 | 72 | 104 | 150100 | 150100 |
| Log10kDPSWithProcessors/No_batching,_no_queue | PASS | 15s | 19.3 | 20.3 | 73 | 103 | 150100 | 150100 |
| Log10kDPSWithProcessors/No_batching,_queue | PASS | 15s | 19.6 | 21.0 | 73 | 102 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_batch_processor,_no_queue | PASS | 15s | 20.0 | 21.7 | 73 | 102 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_batch_processor,_queue | PASS | 15s | 19.8 | 20.7 | 73 | 104 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_exporter_batcher,_no_queue | PASS | 15s | 17.1 | 19.7 | 74 | 104 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_exporter_batcher,_queue | PASS | 15s | 17.7 | 20.3 | 74 | 104 | 150100 | 150100 |

I also ran it with the exporter.UsePullingBasedExporterQueueBatcher feature gate introduced in open-telemetry/opentelemetry-collector#11637 enabled:

| Test | Result | Duration | CPU Avg% | CPU Max% | RAM Avg MiB | RAM Max MiB | Sent Items | Received Items |
|------|--------|----------|----------|----------|-------------|-------------|------------|----------------|
| Log10kDPSNoProcessors/No_batching,_no_queue | PASS | 15s | 16.8 | 18.0 | 74 | 103 | 150100 | 150100 |
| Log10kDPSNoProcessors/No_batching,_queue | PASS | 15s | 16.5 | 16.7 | 73 | 103 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_batch_processor,_no_queue | PASS | 15s | 11.5 | 12.3 | 77 | 109 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_batch_processor,_queue | PASS | 15s | 12.2 | 13.0 | 76 | 109 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_exporter_batcher,_no_queue | PASS | 15s | 11.7 | 12.3 | 73 | 104 | 150100 | 150100 |
| Log10kDPSNoProcessors/Batch_size_1000_with_exporter_batcher,_queue | PASS | 15s | 12.8 | 14.0 | 76 | 107 | 150100 | 150100 |
| Log10kDPSWithProcessors/No_batching,_no_queue | PASS | 15s | 25.5 | 27.0 | 73 | 104 | 150100 | 150100 |
| Log10kDPSWithProcessors/No_batching,_queue | PASS | 15s | 21.6 | 23.7 | 74 | 106 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_batch_processor,_no_queue | PASS | 15s | 21.0 | 24.0 | 73 | 105 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_batch_processor,_queue | PASS | 15s | 20.7 | 21.0 | 72 | 103 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_exporter_batcher,_no_queue | PASS | 15s | 16.0 | 20.0 | 77 | 109 | 150100 | 150100 |
| Log10kDPSWithProcessors/Batch_size_1000_with_exporter_batcher,_queue | PASS | 15s | 15.5 | 16.0 | 78 | 112 | 150100 | 150100 |

Looks like the new batcher is indeed slightly more performant in these simple scenarios.

@ChrsMark (Member) left a comment:

lgtm

@djaglowski djaglowski merged commit 64384a0 into open-telemetry:main Dec 6, 2024
158 checks passed
@github-actions github-actions bot added this to the next release milestone Dec 6, 2024
ZenoCC-Peng pushed a commit to ZenoCC-Peng/opentelemetry-collector-contrib that referenced this pull request Dec 6, 2024
@swiatekm swiatekm deleted the test/testbed-batcher branch December 9, 2024 11:46
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024

Successfully merging this pull request may close these issues.

[exporterhelper] Run performance tests to compare exporter batching with the batcher processor
7 participants