
[Meta][Feature] Implement the memory queue and output pipeline #7

Closed · 4 tasks done · Tracked by #15
cmacknz opened this issue Mar 16, 2022 · 5 comments

Labels: estimation:Week (Task that represents a week of work.), Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team), v8.4.0


cmacknz (Member) commented Mar 16, 2022

This is a feature meta issue to implement the memory queue to output pipeline in the shipper. The scope is restricted to the implementation of the memory queue and an output with no external dependencies (the console or file output, for example). The disk queue, the Elasticsearch/Kafka/Logstash outputs, and processors are explicitly out of scope.

[Figure: diagram of the memory queue to output pipeline in the shipper]

This feature is considered complete when at least the following criteria are satisfied:

  • A test exists to prove that data written to the shipper event gRPC interface is published to the output. The test should write single events and batches, including batches as large as the configured queue size, to prove it does not block.
  • A test exists to prove that the shipper applies backpressure to the producer when the queue has filled. Ideally this means the producer blocks until there is enough space in the queue. The backpressure should stop once the queue begins to drain. (A minimal sketch of such a test follows this list.)
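
A minimal sketch of what the backpressure test could look like, with the queue modeled as a bounded channel; the memoryQueue type and its Publish/Get methods are hypothetical stand-ins for the real shipper queue API, not the actual implementation:

```go
package queue

// Hypothetical backpressure test. memoryQueue stands in for the shipper's
// memory queue; the blocking behavior is the only thing the test asserts.

import (
	"testing"
	"time"
)

type event struct{ ID int }

type memoryQueue struct{ ch chan event }

func newMemoryQueue(size int) *memoryQueue {
	return &memoryQueue{ch: make(chan event, size)}
}

// Publish blocks until the queue has room: the backpressure contract.
func (q *memoryQueue) Publish(e event) { q.ch <- e }

// Get removes one event, simulating the output draining the queue.
func (q *memoryQueue) Get() event { return <-q.ch }

func TestProducerBlocksWhenQueueIsFull(t *testing.T) {
	q := newMemoryQueue(2)
	q.Publish(event{1})
	q.Publish(event{2}) // queue is now at capacity

	published := make(chan struct{})
	go func() {
		q.Publish(event{3}) // must block until space frees up
		close(published)
	}()

	select {
	case <-published:
		t.Fatal("publish succeeded on a full queue; expected backpressure")
	case <-time.After(100 * time.Millisecond):
		// still blocked, as expected
	}

	q.Get() // drain one event; backpressure should stop
	select {
	case <-published:
		// producer unblocked once space was available
	case <-time.After(time.Second):
		t.Fatal("producer still blocked after the queue drained")
	}
}
```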

The assignee of this issue is expected to create the development plan with all child issues for this feature. The following set of tasks should be included in the initial issues at a minimum:

  • Implement the event publishing RPC and have it write to the queue.
  • Add queue and output sections to the shipper configuration file. The format must match the format used in agent policy output sections today (a hypothetical sketch follows this list).
  • Create the queue and output pipeline based on the provided configuration. Allow the configuration to be refreshed.
  • Create an integration test suite for the shipper process.
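
A hypothetical sketch of what the queue and output sections might look like, loosely modeled on the queue.mem settings beats uses today; every field name here is an assumption about the eventual shipper config format, not its actual schema:

```go
package config

// Hypothetical shipper configuration sketch. The YAML layout and struct
// fields below are assumptions modeled on beats' queue.mem settings.

import "gopkg.in/yaml.v3"

const exampleConfig = `
queue:
  mem:
    events: 4096        # total queue capacity in events
    flush:
      min_events: 2048  # events to accumulate before flushing a batch
      timeout: 1s       # flush a partial batch after this long
output:
  console:
    enabled: true
`

type Config struct {
	Queue  QueueConfig  `yaml:"queue"`
	Output OutputConfig `yaml:"output"`
}

type QueueConfig struct {
	Mem MemQueueConfig `yaml:"mem"`
}

type MemQueueConfig struct {
	Events int         `yaml:"events"`
	Flush  FlushConfig `yaml:"flush"`
}

type FlushConfig struct {
	MinEvents int    `yaml:"min_events"`
	Timeout   string `yaml:"timeout"` // e.g. "1s"; parse with time.ParseDuration
}

type OutputConfig struct {
	Console ConsoleConfig `yaml:"console"`
}

type ConsoleConfig struct {
	Enabled bool `yaml:"enabled"`
}

// Load parses a raw YAML document into the Config struct.
func Load(raw []byte) (*Config, error) {
	cfg := &Config{}
	if err := yaml.Unmarshal(raw, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}
```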

Important milestones:

  • Adapt the memory queue to accept shipper types (Make the memory queue work with types other than publisher.Event beats#31307)
  • Create a memory queue in the shipper binary and propagate input events through it (a minimal end-to-end sketch follows this list)
  • Create a test output that can confirm events received from the queue
  • Create an integration test that invokes the gRPC publishing interface and verifies its handling via the test output
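
A minimal end-to-end sketch tying those milestones together, assuming nothing about the real shipper API: a bounded channel stands in for the memory queue, a printing goroutine for the console test output, and the send loop for the gRPC publishing handler:

```go
package main

// Hypothetical pipeline sketch: producer -> memory queue -> console output.
// All names here are illustrative, not the shipper's actual API.

import (
	"encoding/json"
	"fmt"
	"sync"
)

type Event struct {
	Source string         `json:"source"`
	Fields map[string]any `json:"fields"`
}

func main() {
	queue := make(chan Event, 1024) // memory queue stand-in

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // console output: drains the queue and prints each event
		defer wg.Done()
		for ev := range queue {
			line, _ := json.Marshal(ev)
			fmt.Println(string(line))
		}
	}()

	// producer: stands in for the event-publishing RPC handler
	for i := 0; i < 3; i++ {
		queue <- Event{
			Source: "grpc",
			Fields: map[string]any{"message": fmt.Sprintf("event %d", i)},
		}
	}
	close(queue) // no more input; let the output finish draining
	wg.Wait()
}
```
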
cmacknz changed the title from "[META][Feature] Implement the memory queue and output pipeline" to "[Meta][Feature] Implement the memory queue and output pipeline" Mar 18, 2022
jlind23 added the Team:Elastic-Agent-Data-Plane and v8.3.0 labels Mar 21, 2022
zez3 commented Apr 4, 2022

> A test exists to prove that the shipper will backpressure the producer when the queue has been filled. Ideally this means the producer will block until the queue is drained. The backpressure should stop once the queue begins to drain.

So what will happen when the source/producer is syslog, or any normal UDP stream for that matter?
Or is my understanding of what producer means in this context wrong?

cmacknz (Member, Author) commented Apr 4, 2022

The only time the queue should fill is when the output (Elasticsearch/Logstash/Kafka) is unavailable or can't keep up with the data volume. If that situation persists for long enough, there will eventually be data loss. When that point is reached depends on the data volume, queue size, and duration of the problem causing the queue to fill; for example, at 10,000 events per second, a queue holding 1,000,000 events buys roughly 100 seconds of a fully stalled output before events are lost.

cmacknz (Member, Author) commented May 11, 2022

> Create an integration test that invokes the gRPC publishing interface and verifies its handling via the test output

This now depends on #34. Both the client (beat) and server side implementations will be done together as part of #8.

zez3 commented May 12, 2022

> The only time the queue should fill is when the output (Elasticsearch/Logstash/Kafka) is unavailable or can't keep up with the data volume. If that situation persists for long enough, there will eventually be data loss. When that point is reached depends on the data volume, queue size, and duration of the problem causing the queue to fill.

That sounds exactly right. On our older system (Graylog) we had a 600 GB queue (disk journal) that allowed us to survive a 24-hour Elasticsearch downtime. When one such queue got ~95% full, we declared that node dead and the load balancer in front moved the stream to a second node.
Ideally, the new shipper should announce this somewhere, for example via:
https://www.elastic.co/guide/en/beats/filebeat/master/http-endpoint.html

My initial question was:

How would you announce/backpressure to the source producer when it is syslog, or any normal UDP stream for that matter?
I suppose it does not matter anyway.

jlind23 added the estimation:Week label and removed the v8.3.0 and 8.4-candidate labels May 18, 2022
cmacknz (Member, Author) commented Jul 13, 2022

Closing this as completed; queue and output work will continue in separate issues.

cmacknz closed this as completed Jul 13, 2022