This repository has been archived by the owner on Sep 21, 2023. It is now read-only.

Implement more efficient output tuning parameters to manage throughput #28

Closed
nimarezainia opened this issue Apr 20, 2022 · 17 comments · Fixed by #227

@nimarezainia

nimarezainia commented Apr 20, 2022

Beats have many knobs and dials that allow the user to modify output-related parameters in order to increase throughput. These parameters are extremely convoluted and sometimes contradict one another. With the new shipper design we have the opportunity to simplify them and create more meaningful parameters for users.

Performance Tuning Proposal

  1. Rename bulk_max_size to maximum_batch_size to be more meaningful; maximum_batch_size is the total batch size in bytes.
  2. Allow the user to modify maximum_batch_size in the UI. Specify maximum_batch_size in bytes rather than events.
    a. Bytes are easier to reason about than event counts.
    b. Bytes are also easier to map to the data seen on the wire.
    c. On the Elasticsearch ingest side, the maximum document size is configured in bytes.
  3. Introduce a new variable, output_queue_flush_timeout.
    a. Upon expiry, the output queue is flushed and the data is written to the output.
    b. Users can lower this timeout to reduce the delay in data collection.

In summary, for tuning the output we will now have two variables: maximum_batch_size and output_queue_flush_timeout.
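
To make this concrete, here is a purely hypothetical sketch of how the two proposed settings could be modeled in the shipper's Go configuration; the struct name, config tags, and default values below are illustrative assumptions, not an agreed schema.

```go
// Hypothetical sketch only: names, tags, and defaults are assumptions.
package output

import "time"

// TuningConfig groups the two proposed output tuning parameters.
type TuningConfig struct {
	// MaximumBatchSize is the total batch size in bytes sent to the output.
	MaximumBatchSize int `config:"maximum_batch_size"`
	// OutputQueueFlushTimeout flushes the output queue when it expires,
	// even if the pending batch is smaller than MaximumBatchSize.
	OutputQueueFlushTimeout time.Duration `config:"output_queue_flush_timeout"`
}

// defaultTuningConfig returns illustrative (not proposed) default values.
func defaultTuningConfig() TuningConfig {
	return TuningConfig{
		MaximumBatchSize:        5 * 1024 * 1024, // 5 MiB
		OutputQueueFlushTimeout: 10 * time.Second,
	}
}
```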

@jlind23 added the Team:Elastic-Agent-Data-Plane label Apr 27, 2022
@jlind23 added the estimation:Week and v8.4.0 labels and removed the 8.4-candidate label May 24, 2022
@jlind23 changed the title from "Implementing more efficient output queue parameters to manage throughput" to "[DESIGN] Implementing more efficient output queue parameters to manage throughput" Jun 1, 2022
@jlind23
Contributor

jlind23 commented Jun 1, 2022

@nimarezainia did you have a chance to work on the requirements for this?

@nimarezainia
Author

> @nimarezainia did you have a chance to work on the requirements for this?

I'm still working on defining these.

@joshdover

I'm interested to see the list of what we want to support, but we also need to keep in mind that we must avoid any breaking changes here. Today, we allow the user to use any Elasticsearch output setting from the UI, the kibana.yml configuration, and the API (though this isn't GA yet).

Migrating this would be quite painful, mostly because we need to send a valid configuration to any agents that are not running the shipper (we support any agent >= 7.17.0 working with any 8.x version of the Stack). That said, Kibana does have the ability to run migrations during upgrades, which would allow us to transform the user's YAML. The same would also need to be done in the API and kibana.yml configuration code.

@jlind23 assigned faec and unassigned nimarezainia Jul 13, 2022
@joshdover

Thinking about breaking changes some more, I'm curious if we need to consider the following when switching from beats outputs to the shipper:

  1. User configures worker: 2 in the Elasticsearch output
  2. User runs a single logs integration and a single metrics integration
  3. With beats-specific outputs, Filebeat and Metricbeat would each create 2 workers, totaling 4 workers
  4. With shipper output, the shipper creates 2 workers, totaling 2 workers

This is just one example; there could be other related configs, like the queue, that don't translate exactly 1:1 and could result in degraded performance or higher resource usage when switching to the shipper. I don't anticipate this impacting customers who haven't touched these settings, but for those who have carefully tuned them, it will likely cause problems.

@cmacknz
Member

cmacknz commented Jul 14, 2022

@joshdover yes, that is definitely a possible problem when enabling the shipper: users may need to retune their worker and bulk_max_size configurations if they were using them before.

Even if we tried to apply the same configuration as before, it may not behave equivalently, because the data flowing through each worker will have changed. For example, Filebeat workers would likely only write to log-* data streams, while the shipper will write to every data stream defined by an active integration.

There is no way to configure the underlying Beat queue from an agent policy right now, so that at least isn't a concern.

@jlind23
Contributor

jlind23 commented Jul 18, 2022

@nimarezainia Do we have a requirements doc for that? Otherwise it is going to be hard to design.

@nimarezainia
Author

@jlind23 I'll share the requirements doc shortly.

@cmacknz changed the title from "[DESIGN] Implementing more efficient output queue parameters to manage throughput" to "Implement more efficient output tunring parameters to manage throughput" Oct 13, 2022
@cmacknz
Member

cmacknz commented Oct 13, 2022

I've updated the description here to reflect the proposed changes to the output configuration, which I believe are the most impactful.

We will likely want follow-up issues about:

  1. Load balancing configuration.
  2. Queue configuration in the agent policy and UI. The memory queue parameters can already be specified in the shipper configuration file; the disk queue configuration will be available once #119 (add disk queue configuration to shipper configuration) is implemented.

@cmacknz changed the title from "Implement more efficient output tunring parameters to manage throughput" to "Implement more efficient output tuning parameters to manage throughput" Oct 13, 2022
@cmacknz
Member

cmacknz commented Oct 18, 2022

We will also need to consider how to handle existing agent policies that specify the existing worker and bulk_max_size parameters as advanced YAML configuration. We will likely need to handle both the old and new set of parameters. Fleet could migrate the policy for us, but that won't help standalone agents.
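
As a rough illustration of what handling both the old and the new set of parameters could look like in the shipper, the sketch below accepts both shapes and prefers the new one. All names, tags, and defaults here are hypothetical; note that bulk_max_size (events) cannot be converted directly into maximum_batch_size (bytes), so the legacy fields are only detected so the user can be warned.

```go
// Hypothetical sketch: field names, tags, and defaults are assumptions.
package output

import "time"

type legacyOutputSettings struct {
	Worker      int `config:"worker"`
	BulkMaxSize int `config:"bulk_max_size"` // measured in events
}

type newOutputSettings struct {
	MaximumBatchSize        int           `config:"maximum_batch_size"` // measured in bytes
	OutputQueueFlushTimeout time.Duration `config:"output_queue_flush_timeout"`
}

// resolveSettings applies illustrative defaults to any unset new-style fields
// and reports whether legacy fields were present so callers can log a warning.
func resolveSettings(legacy legacyOutputSettings, updated newOutputSettings) (newOutputSettings, bool) {
	legacyUsed := legacy.Worker != 0 || legacy.BulkMaxSize != 0
	if updated.MaximumBatchSize == 0 {
		updated.MaximumBatchSize = 5 * 1024 * 1024
	}
	if updated.OutputQueueFlushTimeout == 0 {
		updated.OutputQueueFlushTimeout = 10 * time.Second
	}
	return updated, legacyUsed
}
```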

@cmacknz removed the estimation:Week label Oct 20, 2022
@cmacknz
Member

cmacknz commented Oct 20, 2022

Given that this will affect the agent policy and the Fleet UI, we should probably convert this into (or create) a cross-team feature issue for this work. We will likely want to break each of the changes in the proposal into individual issues so they can be investigated and implemented incrementally.

@cmacknz
Member

cmacknz commented Oct 31, 2022

If we were to switch to using the go-elasticsearch client's BulkIndexer we would get this change essentially for free. BulkIndexer allows specifying a flush threshold in bytes and a minimum flush duration. https://pkg.go.dev/github.com/elastic/go-elasticsearch/v8/esutil#BulkIndexerConfig
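
For reference, a minimal sketch (the client setup and the values are illustrative, not proposed defaults) of how those two controls appear in BulkIndexerConfig:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/esutil"
)

func main() {
	// NewDefaultClient reads the standard ELASTICSEARCH_URL environment variable.
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatalf("creating client: %s", err)
	}

	indexer, err := esutil.NewBulkIndexer(esutil.BulkIndexerConfig{
		Client:        es,
		NumWorkers:    2,                // comparable to the existing worker setting
		FlushBytes:    5 * 1024 * 1024,  // byte-based threshold, like the proposed maximum_batch_size
		FlushInterval: 30 * time.Second, // time-based flush, like the proposed output_queue_flush_timeout
	})
	if err != nil {
		log.Fatalf("creating bulk indexer: %s", err)
	}
	defer indexer.Close(context.Background())

	// Documents would be queued with indexer.Add(ctx, esutil.BulkIndexerItem{...}).
}
```

With this, maximum_batch_size would map onto FlushBytes and output_queue_flush_timeout onto FlushInterval, without additional batching logic in the shipper.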

@jlind23
Contributor

jlind23 commented Nov 3, 2022

@cmacknz shouldn't we switch to the go-elasticsearch client for good, then?

@cmacknz
Member

cmacknz commented Nov 3, 2022

Yes, I have prioritized the switch (#14) as the next task for the shipper.

@amitkanfer

@alexsapran - put this one on your radar

@jlind23
Contributor

jlind23 commented Nov 21, 2022

@cmacknz shouldn't I close this issue, as @faec is currently working on the migration to the go-elasticsearch client?

@cmacknz
Member

cmacknz commented Nov 21, 2022

I would close this once we have proven the go-elasticsearch client behaves the way we want, and that there will be no additional changes required.

I'll also have to confirm that we have the Fleet UI changes tracked separately since they are mentioned here.

@jlind23
Contributor

jlind23 commented Jan 4, 2023

@cmacknz @leehinman shall we keep this one in the next sprint, or have we had enough time to double-check that the go-elasticsearch behaviour was as expected?
