
[Meta][Feature] Implement the memory queue and output pipeline #7

Closed · 4 tasks done · Tracked by #15
cmacknz opened this issue Mar 16, 2022 · 5 comments

Labels: estimation:Week (Task that represents a week of work.), Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team), v8.4.0


cmacknz (Member) commented Mar 16, 2022

This is a feature meta issue to implement the memory queue to output pipeline in the shipper. The scope is restricted to the implementation of the memory queue and an output with no external dependencies (the console or file output, for example). The disk queue, the Elasticsearch/Kafka/Logstash outputs, and processors are explicitly out of scope.

[Figure: diagram of the memory queue to output pipeline in the shipper]

This feature is considered complete when at least the following criteria are satisfied:

  • A test exists to prove that data written to the shipper event gRPC interface is published to the output. The test should write single events and batches, including batches as large as the configured queue size, to prove it does not block.
  • A test exists to prove that the shipper applies backpressure to the producer when the queue has filled. Ideally this means the producer blocks until there is enough space in the queue. The backpressure should stop once the queue begins to drain. (A minimal sketch of such a test follows this list.)
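
A minimal sketch of what the backpressure test could look like, with the queue modeled as a bounded channel; the memoryQueue type and its Publish/Get methods are hypothetical stand-ins for the real shipper queue API, not the actual implementation:

```go
package queue

// Hypothetical backpressure test. memoryQueue stands in for the shipper's
// memory queue; the blocking behavior is the only thing the test asserts.

import (
	"testing"
	"time"
)

type event struct{ ID int }

type memoryQueue struct{ ch chan event }

func newMemoryQueue(size int) *memoryQueue {
	return &memoryQueue{ch: make(chan event, size)}
}

// Publish blocks until the queue has room: the backpressure contract.
func (q *memoryQueue) Publish(e event) { q.ch <- e }

// Get removes one event, simulating the output draining the queue.
func (q *memoryQueue) Get() event { return <-q.ch }

func TestProducerBlocksWhenQueueIsFull(t *testing.T) {
	q := newMemoryQueue(2)
	q.Publish(event{1})
	q.Publish(event{2}) // queue is now at capacity

	published := make(chan struct{})
	go func() {
		q.Publish(event{3}) // must block until space frees up
		close(published)
	}()

	select {
	case <-published:
		t.Fatal("publish succeeded on a full queue; expected backpressure")
	case <-time.After(100 * time.Millisecond):
		// still blocked, as expected
	}

	q.Get() // drain one event; backpressure should stop
	select {
	case <-published:
		// producer unblocked once space was available
	case <-time.After(time.Second):
		t.Fatal("producer still blocked after the queue drained")
	}
}
```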

The assignee of this issue is expected to create the development plan with all child issues for this feature. The following set of tasks should be included in the initial issues at a minimum:

  • Implement the event publishing RPC and have it write to the queue.
  • Add queue and output sections to the shipper configuration file. The format must match the format used in agent policy output sections today (a hypothetical sketch follows this list).
  • Create the queue and output pipeline based on the provided configuration. Allow the configuration to be refreshed.
  • Create an integration test suite for the shipper process.
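
A hypothetical sketch of what the queue and output sections might look like, loosely modeled on the queue.mem settings beats uses today; every field name here is an assumption about the eventual shipper config format, not its actual schema:

```go
package config

// Hypothetical shipper configuration sketch. The YAML layout and struct
// fields below are assumptions modeled on beats' queue.mem settings.

import "gopkg.in/yaml.v3"

const exampleConfig = `
queue:
  mem:
    events: 4096        # total queue capacity in events
    flush:
      min_events: 2048  # events to accumulate before flushing a batch
      timeout: 1s       # flush a partial batch after this long
output:
  console:
    enabled: true
`

type Config struct {
	Queue  QueueConfig  `yaml:"queue"`
	Output OutputConfig `yaml:"output"`
}

type QueueConfig struct {
	Mem MemQueueConfig `yaml:"mem"`
}

type MemQueueConfig struct {
	Events int         `yaml:"events"`
	Flush  FlushConfig `yaml:"flush"`
}

type FlushConfig struct {
	MinEvents int    `yaml:"min_events"`
	Timeout   string `yaml:"timeout"` // e.g. "1s"; parse with time.ParseDuration
}

type OutputConfig struct {
	Console ConsoleConfig `yaml:"console"`
}

type ConsoleConfig struct {
	Enabled bool `yaml:"enabled"`
}

// Load parses a raw YAML document into the Config struct.
func Load(raw []byte) (*Config, error) {
	cfg := &Config{}
	if err := yaml.Unmarshal(raw, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}
```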

Important milestones:

  • Adapt the memory queue to accept shipper types (Make the memory queue work with types other than publisher.Event beats#31307)
  • Create a memory queue in the shipper binary and propagate input events through it (a minimal end-to-end sketch follows this list)
  • Create a test output that can confirm events received from the queue
  • Create an integration test that invokes the gRPC publishing interface and verifies its handling via the test output
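
A minimal end-to-end sketch tying those milestones together, assuming nothing about the real shipper API: a bounded channel stands in for the memory queue, a printing goroutine for the console test output, and the send loop for the gRPC publishing handler:

```go
package main

// Hypothetical pipeline sketch: producer -> memory queue -> console output.
// All names here are illustrative, not the shipper's actual API.

import (
	"encoding/json"
	"fmt"
	"sync"
)

type Event struct {
	Source string         `json:"source"`
	Fields map[string]any `json:"fields"`
}

func main() {
	queue := make(chan Event, 1024) // memory queue stand-in

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // console output: drains the queue and prints each event
		defer wg.Done()
		for ev := range queue {
			line, _ := json.Marshal(ev)
			fmt.Println(string(line))
		}
	}()

	// producer: stands in for the event-publishing RPC handler
	for i := 0; i < 3; i++ {
		queue <- Event{
			Source: "grpc",
			Fields: map[string]any{"message": fmt.Sprintf("event %d", i)},
		}
	}
	close(queue) // no more input; let the output finish draining
	wg.Wait()
}
```
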
cmacknz changed the title from "[META][Feature] Implement the memory queue and output pipeline" to "[Meta][Feature] Implement the memory queue and output pipeline" Mar 18, 2022
jlind23 added the Team:Elastic-Agent-Data-Plane and v8.3.0 labels Mar 21, 2022
zez3 commented Apr 4, 2022

> A test exists to prove that the shipper will backpressure the producer when the queue has been filled. Ideally this means the producer will block until the queue is drained. The backpressure should stop once the queue begins to drain.

So what will happen when the source/producer is syslog, or any normal UDP stream for that matter?
Or is my understanding of what producer means in this context wrong?

cmacknz (Member, Author) commented Apr 4, 2022

The only time the queue should fill is when the output (Elasticsearch/Logstash/Kafka) is unavailable or can't keep up with the data volume. If that situation persists for long enough, there will eventually be data loss. When that point is reached depends on the data volume, queue size, and duration of the problem causing the queue to fill; for example, at 10,000 events per second, a queue holding 1,000,000 events buys roughly 100 seconds of a fully stalled output before events are lost.

cmacknz (Member, Author) commented May 11, 2022

> Create an integration test that invokes the gRPC publishing interface and verifies its handling via the test output

This now depends on #34. Both the client (beat) and server side implementations will be done together as part of #8.

zez3 commented May 12, 2022

> The only time the queue should fill is when the output (Elasticsearch/Logstash/Kafka) is unavailable or can't keep up with the data volume. If that situation persists for long enough, there will eventually be data loss. When that point is reached depends on the data volume, queue size, and duration of the problem causing the queue to fill.

That sounds exactly right. On our older system (Graylog) we had a 600 GB queue (disk journal) that allowed us to survive a 24-hour Elasticsearch downtime. When one such queue got ~95% full, we declared that node dead and the load balancer in front moved the stream to a second node.
Ideally, the new shipper should announce this somewhere, for example via:
https://www.elastic.co/guide/en/beats/filebeat/master/http-endpoint.html

My initial question was:

How would you announce/backpressure to the source producer when it is syslog, or any normal UDP stream for that matter?
I suppose it does not matter anyway.

jlind23 added the estimation:Week label and removed the v8.3.0 and 8.4-candidate labels May 18, 2022
cmacknz (Member, Author) commented Jul 13, 2022

Closing this as completed; queue and output work will continue in separate issues.

cmacknz closed this as completed Jul 13, 2022