[Proposal] Move batching to `exporterhelper` #4646

bogdandrutu · 2022-01-05T23:46:52Z

Right now batching capability is offered as a processor, and this is extremely generic in our current pipeline model, which is a great thing, but also comes with limitations:

Batch processor uses a queue (implemented as a channel), which is not the same as the one used in the queue retry, that can lose data when the binary crashes even after the client was informed that data were accepted. There is already work to offer a persistent queue mechanism which will not be available in the processor.
If users want to use batching with the routing processor: will need to fix the Change batch processor to take client info into consideration #4544 issue, AND this will batch data that later will be split again. Very unnecessary.
SignalFx/SplunkHec receiver/exporter uses a "hack" that adds one header from the incoming request to the resource, batches things, then has to split based on that header. Batching inside the exporter helper can be configured with possible custom logic much easier, by making the batching library accepting a custom "batcher" func.
Allow customization of the "size" function. There were requests to support batching by the serialized size, we can offer this in the exporter helper much nicer since the custom sizer can actually work using the exporter wire format. So batching can happen after data serialization.
When multiple exporters configured, they may have different batching requirements. This can be achieved today with multiple pipelines, but that causes code duplicate between the pipelines.

This will also fix some more problems:

Change batch processor to take client info into consideration #4544
otelcol can't get cpu limit and may cause performance issues. #4459 - by using the same workers defined in the queue_retry for this, hence we allow users to control this and we can still default on NumCpu.
Global sampling service telemetry logging configuration options #4554 - nothing to fix :)

The proposal is to look into offer the "batching" capability similar to timeout/queue/retry logic that we have in the exporterhelper.

The text was updated successfully, but these errors were encountered:

tigrannajaryan · 2022-01-06T00:23:43Z

A couple quick counterarguments:

Dedicated batch processor keeps the batch in memory once even when there are multiple exporters in the pipeline. If we move the equivalently sized batch to queuedretry of every exporter we will slightly increase memory requirements (not by much since most of pdata is going to be shared assuming exporters' output throughput more or less keeps up with the input).
Virtually the same batching work needs to happen for every exporter, which adds some CPU work per exporter. It would be useful to benchmark to see how much this difference is when there is more than one exporter.
Batching probably helps to improve the performance of subsequent processors in the pipeline. We haven't benchmarked this, but it is possible that there is non-zero per-pdata cost that is amortized better if the batches are larger. This may be negligible, but I would like to see some benchmarks measuring it.

These are not necessarily strong enough to block the proposal, but I think it is worth thinking about these.

There is already work to offer a persistent queue mechanism which will not be available in the processor.

I think we discussed the possibility of adding persistency to the batch processor as well. Admittedly it may result in awkward user experience though since now you need to configure persistency in 2 different places.

tigrannajaryan · 2022-01-06T00:24:20Z

Another benefit of eliminating batch processor is that it is one less thing to configure. This will simplify the user experience.

bogdandrutu · 2022-01-06T00:42:22Z

While responding to your comments I realize that a possible thing is to offer both, do you think that is too confusing?

Dedicated batch processor keeps the batch in memory once even when there are multiple exporters in the pipeline. If we move the equivalently sized batch to queuedretry of every exporter we will slightly increase memory requirements (not by much since most of pdata is going to be shared assuming exporters' output throughput more or less keeps up with the input).

This is a good argument to possibly keep the processor as well, maybe using the same shared library that the exporterhelper uses. This comes with another downside that I haven't thought, if we want to have a fast batching we need to mutate the pdata (move them to the batch) which causes N (num exporter) copies needed even if the exporter may not need to mutate the data itself.

Virtually the same batching work needs to happen for every exporter, which adds some CPU work per exporter. It would be useful to benchmark to see how much this difference is when there is more than one exporter.

Indeed, I expect thought the work to be minimal, but definitely need to think about.

Batching probably helps to improve the performance of subsequent processors in the pipeline. We haven't benchmarked this, but it is possible that there is non-zero per-pdata cost that is amortized better if the batches are larger. This may be negligible, but I would like to see some benchmarks measuring it.

Need to think/propose of 1-2 components to benchmark for this, maybe the new "transform" processor since will be one of the most used? Any other idea?

tigrannajaryan · 2022-01-06T21:59:20Z

While responding to your comments I realize that a possible thing is to offer both, do you think that is too confusing?

Yes, I think it will be confusing if we do batching in 2 places, with 2 different overlapping sets of configuration options available.

tigrannajaryan · 2022-01-06T22:00:38Z

if the exporter may not need to mutate the data itself.

The exporters are currently not allowed to mutate data: https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter#data-ownership

tigrannajaryan · 2022-01-06T22:02:22Z

Need to think/propose of 1-2 components to benchmark for this, maybe the new "transform" processor since will be one of the most used? Any other idea?

Since "transform" doesn't exist yet perhaps test with a couple basic ones for now, like attributesprocessor or filterprocessor.

bogdandrutu · 2022-01-07T18:26:40Z

The exporters are currently not allowed to mutate data: https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter#data-ownership

you know that nobody reads documentation :)) One of them is the signalfx exporter which uses https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/batchperresourceattr/batchperresourceattr.go. Luckily after my change to move "capability" to Consumer, we now do the right thing for exporters because they also have capabilities and we clone if we need there as well.

So the documentation is out-dated unfortunately.

mx-psi · 2022-01-10T10:47:03Z

I think supporting both would be confusing. The 'customization of the "size" function' feature would be really helpful for us: we want batches to be below our intake's API max size, but currently we can't really give a precise number for the max batch size, since what we care about is the number of bytes in the payload we export, which can't be recovered from the number of points in a batch.

bogdandrutu · 2022-02-19T02:32:25Z

@dashpole would be good if you drive this to resolution, see open-telemetry/opentelemetry-collector-contrib#7134 which you also need :)

pmm-sumo · 2022-02-21T12:59:31Z

Batching probably helps to improve the performance of subsequent processors in the pipeline. We haven't benchmarked this, but it is possible that there is non-zero per-pdata cost that is amortized better if the batches are larger. This may be negligible, but I would like to see some benchmarks measuring it.

There's an edge case in which batch + groupbyattrs can be used for compacting traces and has a huge impact (as measured by @pkositsyn). Moving batching to exporterhelper would make it no longer possible but I think we could also consider making it a more generic helper, which could be used in processors as well?

swiatekm · 2022-02-21T15:36:17Z

Batch processor uses a queue (implemented as a channel), which is not the same as the one used in the queue retry, that can lose data when the binary crashes even after the client was informed that data were accepted. There is already work to offer a persistent queue mechanism which will not be available in the processor.

Correct me if I'm wrong, but we can lose data this way even without a crash, if we have an exporter with a queue and a batch processor, the data can be dropped due to the exporter queue being full, and the batch processor makes it impossible to propagate the error back to the receiver.

I don't know that I have an informed opinion on whether batching should happen in a separate processor or in an exporter, but it definitely shouldn't happen in both, for the above reason, unless we're willing to accept this kind of data loss and abandon any hope of applying backpressure to the receiver from any component after the batch processor.

dashpole · 2022-02-23T16:35:12Z

For now, we've implemented batching within the gcp exporter. There is only one "correct" way to batch for GCM, so it seemed sensible to make it work correctly without requiring configuration. It is also needed to work around other quirks of the GCM api, such as #4442. open-telemetry/opentelemetry-collector-contrib#7134 isn't blocking us--we've implemented our conversion in our exporter. That proposal would only be valuable if other exporters that need summary splitting also wanted to use it.

I agree with others that supporting both processor and exporterhelper would be confusing. For now, I think it is reasonable to ask unusual use-cases like ours to re-implement batching themselves. Making it available as a library would be nice, but probably not worth the dependency unless we wanted to mark it stable.

swiatekm · 2022-06-16T18:45:06Z

@bogdandrutu what do you think about making the batch processor conduct backpressure better in the meantime?

Currently, batch processor simply ignores consumer errors. If a transient error happens later in the pipeline, the data is effectively dropped and this information isn't propagated backwards. For example, this means that if we have an exporter with a full queue at the end of the pipeline, we'll keep batching and dropping data, whereas we really should return a transient error to the receiver so it can make its own flow control decisions.

For example, we could:

Retry transient consumer errors
Reject incoming data if there's backpressure - could be via a timeout or by switching to a buffered channel

Right now, it's impossible to have batching and reasonable error propagation. Fixing this should be possible right now, and wouldn't prevent us from moving batching to exporterhelper later.

tigrannajaryan · 2022-06-17T13:59:54Z

what do you think about making the batch processor conduct backpressure better in the meantime?

@swiatekm-sumo if you want to help with this it would be great if you could do the benchmarking to prove that we don't need the separate batch processor to have reasonable performance. If we can demonstrate this then we should eliminate the batch processor and make batching an exporterhelper feature. This would be more preferable than spending time improving the batch processor which we end up retiring any way.

swiatekm · 2022-06-20T09:54:44Z

@tigrannajaryan the reason I'd prefer to start with improving batch processor is that the change to it would be fairly straightforward and would improve user experience right now, as opposed to several months from now after we're done with the migration involving several breaking changes.

I'm also not entirely convinced this is the right thing to do. Even putting efficiency aside, removing the batch processor would have various knock-on effects - for example it would make https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/groupbyattrsprocessor worthless.

I can take up the benchmarking either way; I don't think these activities are mutually exclusive. But it seems like a much more open-ended project, as I'd need to come up with a bunch of representative configurations on my own.

tigrannajaryan · 2022-06-20T17:23:30Z

@swiatekm-sumo Sure, if there is a maintainer/approver who wants to review the batch processor improvements please feel free to go ahead with what you suggest (I won't be able to help review the improvements).

If you can help with benchmarking it will be an added bonus :-)

bogdandrutu · 2022-07-26T23:28:18Z

@swiatekm-sumo I am a bit reluctant because:

For example, we could:

Retry transient consumer errors
Reject incoming data if there's backpressure - could be via a timeout or by switching to a buffered channel

These are already implemented in the exporter helper as "`queue retry" :) that is yet another argument to unify them.

dmitryax · 2023-02-06T22:56:18Z

I like the idea. Another reason to do this is that the batch processor removes the client context and has to be put in a particular place in the pipeline to avoid causing problems to the processors that use the context which can be easily misconfigured

aabmass mentioned this issue Jan 14, 2022

Processor: Splitting summary metrics into timeseries. open-telemetry/opentelemetry-collector-contrib#7134

Open

This was referenced Apr 11, 2022

[extension/filestorage] Mark as Beta open-telemetry/opentelemetry-collector-contrib#9135

Closed

[Proposal] Include compaction in exporterhelper #5201

Closed

swiatekm mentioned this issue Jan 11, 2023

pkg/stanza: add backpropagation of number of processed entries open-telemetry/opentelemetry-collector-contrib#16452

Closed

This was referenced Jan 18, 2023

Change batch processor to take client info into consideration #4544

Closed

Problem with Auth extension API and streaming request metadata #6965

Open

tigrannajaryan mentioned this issue Apr 3, 2023

Ensure reliable data delivery in erroneous situations #7460

Open

tigrannajaryan changed the title ~~[Proposal] Move batching in exportehelper~~ [Proposal] Move batching to exportehelper Apr 3, 2023

dmitryax mentioned this issue Apr 5, 2023

batchprocessor: send_batch_max_size_bytes limit #6046

Open

mx-psi mentioned this issue May 4, 2023

Create tool or subcommand that allows migrating configuration #7631

Open

2 tasks

dmitryax mentioned this issue May 17, 2023

[pkg/stanza] Make batching behavior configurable open-telemetry/opentelemetry-collector-contrib#21184

Closed

dmitryax self-assigned this May 22, 2023

dmitryax mentioned this issue Jun 12, 2023

[exporterhelper] New exporter helper for custom requests #7874

Closed

dmitryax mentioned this issue Jul 22, 2023

Introduce new exporter helper with batching option #8122

Open

7 tasks

atoulme changed the title ~~[Proposal] Move batching to exportehelper~~ [Proposal] Move batching to exporterhelper Jul 28, 2023

atoulme added the release:required-for-ga label Dec 18, 2023

atoulme added this to Collector: v1 Mar 6, 2024

codeboten added this to the go.opentelemetry.io/exporter 1.0 milestone Apr 17, 2024

mx-psi moved this to In Progress in Collector: v1 Apr 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Proposal] Move batching to `exporterhelper` #4646

[Proposal] Move batching to `exporterhelper` #4646

bogdandrutu commented Jan 5, 2022 •

edited

Loading

tigrannajaryan commented Jan 6, 2022

tigrannajaryan commented Jan 6, 2022

bogdandrutu commented Jan 6, 2022

tigrannajaryan commented Jan 6, 2022

tigrannajaryan commented Jan 6, 2022

tigrannajaryan commented Jan 6, 2022

bogdandrutu commented Jan 7, 2022

mx-psi commented Jan 10, 2022

bogdandrutu commented Feb 19, 2022

pmm-sumo commented Feb 21, 2022

swiatekm commented Feb 21, 2022

dashpole commented Feb 23, 2022

swiatekm commented Jun 16, 2022

tigrannajaryan commented Jun 17, 2022

swiatekm commented Jun 20, 2022

tigrannajaryan commented Jun 20, 2022

bogdandrutu commented Jul 26, 2022

dmitryax commented Feb 6, 2023

[Proposal] Move batching to exporterhelper #4646

[Proposal] Move batching to exporterhelper #4646

Comments

bogdandrutu commented Jan 5, 2022 • edited Loading

tigrannajaryan commented Jan 6, 2022

tigrannajaryan commented Jan 6, 2022

bogdandrutu commented Jan 6, 2022

tigrannajaryan commented Jan 6, 2022

tigrannajaryan commented Jan 6, 2022

tigrannajaryan commented Jan 6, 2022

bogdandrutu commented Jan 7, 2022

mx-psi commented Jan 10, 2022

bogdandrutu commented Feb 19, 2022

pmm-sumo commented Feb 21, 2022

swiatekm commented Feb 21, 2022

dashpole commented Feb 23, 2022

swiatekm commented Jun 16, 2022

tigrannajaryan commented Jun 17, 2022

swiatekm commented Jun 20, 2022

tigrannajaryan commented Jun 20, 2022

bogdandrutu commented Jul 26, 2022

dmitryax commented Feb 6, 2023

[Proposal] Move batching to `exporterhelper` #4646

[Proposal] Move batching to `exporterhelper` #4646

bogdandrutu commented Jan 5, 2022 •

edited

Loading