When an output worker is created, it specifies the maximum size of event batches it should receive from the pipeline. This value is ultimately propagated back to `eventConsumer`, the routine that assembles batches for the output workers, which uses it for its queue requests. Most outputs accept this batch size as a configuration parameter, e.g. `bulk_max_size`.
Under Elastic Agent, the Beats startup is more complicated, since Agent sends the Beat configuration in multiple stages and there will generally not be an output on the first initialization. Currently, this leads to `eventConsumer` receiving four separate calls to update the batch size (in each Beat) -- three setting it to zero, and one setting it to the actual value requested by the output.
While the final value is correct, the inputs may have already started up by that point. Since a value of 0 tells the queue to send as many events as are available, the pipeline can be primed with batches containing thousands of events before the output is initialized, even if the output itself requests a relatively small value (e.g. the shipper output defaults to a batch size of 50).
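To make the failure mode concrete, here is a minimal sketch (hypothetical types and names, not the actual pipeline code) of how a requested size of 0 turns into an effectively unbounded batch once events have accumulated:

```go
package main

import "fmt"

type event struct{ id int }

type queue struct{ buffered []event }

// get returns up to max events, or everything buffered when max <= 0,
// mirroring the "0 means as many as are available" convention described above.
func (q *queue) get(max int) []event {
	n := len(q.buffered)
	if max > 0 && max < n {
		n = max
	}
	batch := q.buffered[:n]
	q.buffered = q.buffered[n:]
	return batch
}

func main() {
	q := &queue{}
	// Inputs start up and publish before the output is configured.
	for i := 0; i < 5000; i++ {
		q.buffered = append(q.buffered, event{id: i})
	}
	// The consumer still holds the placeholder batch size of 0 from the
	// partial Agent configuration, so its first request drains the queue.
	first := q.get(0)
	fmt.Println("first batch size:", len(first)) // 5000, far above a typical bulk_max_size
}
```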
This is notably a problem for the Elasticsearch and Shipper outputs (and possibly others), which can have upstream caps on batch size, causing them to either drop the entire batch or to enter a retry loop that stalls the ingestion pipeline (#29778, #34695).
We need to correct the initialization process so `eventConsumer` doesn't begin creating batches until a valid output is configured; this will still allow incoming data to accumulate in the queue, but no explicit batches should be created until we know what the output workers can accept.
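One possible shape for that gating, as a rough sketch under the same simplified model (again with hypothetical names, not the real `eventConsumer` implementation): batch-size updates of 0 are ignored, and no queue request is issued until an output has supplied a usable size.

```go
package main

import "fmt"

type batch []int // stand-in for a batch of pipeline events

type sliceQueue struct{ events []int }

// get returns at most max events (max > 0 is guaranteed by the consumer below).
func (q *sliceQueue) get(max int) batch {
	n := len(q.events)
	if max < n {
		n = max
	}
	b := batch(q.events[:n])
	q.events = q.events[n:]
	return b
}

type consumer struct {
	queue            *sliceQueue
	batchSizeUpdates chan int      // outputs push their requested batch size here
	out              chan batch    // assembled batches destined for output workers
	done             chan struct{}
}

func (c *consumer) run() {
	targetSize := 0
	for {
		// Gate: block until a valid (nonzero) batch size arrives. Incoming
		// events keep accumulating in the queue, but no batches are assembled.
		for targetSize <= 0 {
			select {
			case targetSize = <-c.batchSizeUpdates:
			case <-c.done:
				return
			}
		}
		// Pick up any size change from a reconfigured output, then request
		// at most targetSize events from the queue.
		select {
		case targetSize = <-c.batchSizeUpdates:
			continue
		case <-c.done:
			return
		default:
		}
		c.out <- c.queue.get(targetSize)
	}
}

func main() {
	q := &sliceQueue{}
	for i := 0; i < 5000; i++ { // inputs publish before any output exists
		q.events = append(q.events, i)
	}

	c := &consumer{
		queue:            q,
		batchSizeUpdates: make(chan int, 4),
		out:              make(chan batch, 1),
		done:             make(chan struct{}),
	}
	go c.run()

	// The staged Agent configuration sets the size to 0 three times before
	// the real output connects; the gate above simply ignores those updates.
	c.batchSizeUpdates <- 0
	c.batchSizeUpdates <- 0
	c.batchSizeUpdates <- 0
	c.batchSizeUpdates <- 50 // e.g. the shipper output's default batch size

	fmt.Println("first batch size:", len(<-c.out)) // 50, not 5000
	close(c.done)
}
```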
(This issue currently causes repeatable pipeline deadlocks for me when targeting the shipper.)
After talking over agent/beats initialization with @fearful-symmetry, this may not really be unique to the Agent startup process. It might be present but unnoticed in vanilla beats, since we're doing things with agent that are particularly sensitive to batch size -- needs follow-up to determine the full scope/cause.