This repository has been archived by the owner on Sep 21, 2023. It is now read-only.

[Meta][Feature] Enable filebeat and metricbeat to publish data to the shipper #8

Closed
4 tasks done
Tracked by #15
cmacknz opened this issue Mar 16, 2022 · 11 comments
Assignees
Labels
estimation:Week Task that represents a week of work. Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team v8.4.0

Comments

@cmacknz
Member

cmacknz commented Mar 16, 2022

This is a feature meta issue to allow filebeat and metricbeat to publish data to the shipper when run under Elastic agent. All other beats are out of scope.

An output for existing beats should be implemented that publishes to the shipper gRPC interface. When the shipper gRPC output is used, the beat output pipeline should be configured to be as simple as possible. Using a per-beat disk queue with the shipper is forbidden. A memory queue may be used with the shipper output, but how it should be configured by users will require careful consideration. Ideally any necessary queue configuration can be made automatic.

Removing processors from beats is out of scope for this issue. Processors will be removed in a later issue.

[Diagram: beat (processors + memory queue) publishing over gRPC to the shipper (memory queue)]

This feature is considered complete when at least the following criteria are satisfied for both filebeat and metricbeat:

  • A test exists proving data ingested by the beat is published to the shipper.
  • A test exists proving there is no data loss when the shipper process restarts while the beat is publishing.
  • A test exists proving there is no data loss when the shipper backpressures the beat (because the shipper queue is full for example).

The assignee of this issue is expected to create the development plan with all child issues for this feature. The following set of tasks should be included in the initial issues at a minimum:

  • Creating a beats output that publishes to the shipper gRPC interface.
  • Defining a standard configuration for using a beat with the shipper that the control plane can easily apply: processors disabled, queues disabled, etc.
  • Creating an integration test suite for the beat and shipper interactions.

UPD by @rdner

I split this into the following steps:

@cmacknz cmacknz changed the title [META][Feature] Enable filebeat and metricbeat to publish data to the shipper [Meta][Feature] Enable filebeat and metricbeat to publish data to the shipper Mar 18, 2022
@jlind23 jlind23 added Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team v8.3.0 labels Mar 21, 2022
@rdner
Member

rdner commented Mar 30, 2022

Creating a beats output that publishes to the shipper gRPC interface.

@cmacknz I'm a bit confused about this sentence.

When we talked 1 on 1, we agreed that during the very first iteration the gRPC server will be one of the output options along with Elasticsearch, File output, Kafka, etc.

Later at the team call I asked the same question to widen the discussion circle but then you answered something different about having a feature flag and switching some logic in the code.

I think we have some miscommunication about this.

I see 2 options for approaching this task:

Option 1

We have it as a new experimental output type which we could configure like this:

output:
  shipper:
    server: "localhost:50051" # The server address in the format of host:port
    tls: true # Connection uses TLS if true, else plain TCP
    ca_file: "/home/cert" # The file containing the CA root cert file
    server_host_override: "x.test.example.com" # The server name used to verify the hostname returned by the TLS handshake

This can be achieved with the following steps:

  1. We create a new package shipper here https://github.com/elastic/beats/tree/main/libbeat/outputs (or perhaps in elastic-agent-libs)
  2. We implement the Client interface
  3. We implement the new shipper output type factory.
  4. We use the existing pipeline without any changes.

In this case, changes to the existing code are minimal or none, and we can start working with the new setup, debug, and perform tests. The new output type can be excluded from the documentation if needed. Later we can just replace the whole pipeline implementation when we feel the shipper is ready.

Option 2

We have a feature flag to switch the pipeline to a separate implementation that starts sending events to the shipper instead of configured outputs.

This will require us:

  1. Refactor the current pipeline implementation so it's an interface with 2 different implementations instead of a struct
  2. Create a new configuration section support at the root level where we can configure a shipper, e.g.:
shipper:
  server: "localhost:50051" # The server address in the format of host:port
  tls: true # Connection uses TLS if true, else plain TCP
  ca_file: "/home/cert" # The file containing the CA root cert file
  server_host_override: "x.test.example.com" # The server name used to verify the hostname returned by the TLS handshake
  3. If the configuration section exists, the pipeline implementation is switched to the ShipperPipeline and the beat's output configuration is ignored

The major drawback here is that it would take more time and require a lot of changes to the existing code, rather than just adding new code, which can affect stability. On the other hand, we would need to do that at some point anyway.

@cmacknz
Member Author

cmacknz commented Mar 30, 2022

I recommend option 1 as it will be simpler to implement and maintain in the long term. It follows the model currently used by Elastic agent to configure outputs for beats.

@ph

ph commented Mar 30, 2022

I also prefer option 1, so we don't have a special case or transformation to do.

@faec
Contributor

faec commented Mar 30, 2022

I'm not sure how option 1 fits with the other pending pieces. I think perhaps there's been some confusion with the "output" language that is being used for two different stages of processing: (1) sending data from the input to the processor / shipper before it enters the queue, and (2) sending final event data from the shipper to the upstream target (elasticsearch, logstash etc) after it exits the queue.

So I'm not sure how option 1 would fit right now -- the Client interface is the final link of the Beats pipeline that hands off to the upstream, so if we connect this output there, then events would go through the whole current pipeline (including processors and the memory queue) before being sent to the shipper, which is also supposed to handle the memory queue. So to me, option 2 makes more sense, since it diverts to the shipper before hitting the queue.

I wonder if the confusion about approaches comes from the use of "output" to refer to both of those components? Because option 1 sounds to me like a reasonable sketch of the output of the shipper, but as I understand it in the first pass we're just handling that with a placeholder raw-file output.

@cmacknz
Member Author

cmacknz commented Mar 30, 2022

Yes, the language isn't precise enough, and the fact that the beat pipeline and the shipper will have overlapping functionality doesn't help.

My view is that the development needs to be an iterative process where we start with some duplication between the beat and shipper just to get them connected to each other, and then slowly migrate functionality from the beat side into the shipper when run under agent.

I think initially we start with option 1, where we just make it possible for a beat to communicate with the shipper over gRPC. Both the beat and the shipper at this stage have a memory queue, and the processors only exist on the beat side. This is what the diagram in the issue description is trying to show :)

Once we have that, we next work on trying to remove the queuing from the beat side, followed by processing. At this point we may need to consider something like option 2 to try to strip down what the beat/input needs to run.

I like starting with Denis' option 1 to get a faster end to end prototype. Once we have that and can test the interaction between the beats and shipper we will likely need to consider something like option 2. I think we'll be better positioned to make design adjustments after we have a quick prototype than pursuing larger changes from the beginning. I could be convinced otherwise though.

@faec
Contributor

faec commented Mar 30, 2022

Ah ok, so the redundancy in the memory queue is an intentional temporary workaround? In that case fair enough, let's continue :-)

@kvch

kvch commented Apr 6, 2022

Does adding a feature flag make sense in beats? It is basically just a setting that enables or disables features. How is that different from setting output.elasticsearch instead of output.shipper (by Agent) if we want to fall back to the old way of sending events?

@rdner
Member

rdner commented Apr 13, 2022

I've updated the description and added a checklist for tracking the progress.

One thing which is not 100% clear to me is input and data stream options. I could not find a simple way to propagate these parameters through the event batches so I'm going to address this as a separate issue after the initial implementation is there, so it's not blocking any experiments with the new shipper architecture.

The same goes for the integration tests; they will be implemented separately.

@cmacknz
Member Author

cmacknz commented Apr 13, 2022

Thanks! I have a separate issue already for returning acknowledgements from the shipper: #9. I expected that would be too much work to fold into this issue.

The input and data stream will have to be propagated from the agent policy, which we may not do yet. We may not need the data stream until we implement processors in the shipper, at which point we'll need a way to apply the correct processors to events based on the input and data stream.

@cmacknz
Member Author

cmacknz commented May 11, 2022

Added #34 as part of this work.

@jlind23 jlind23 added estimation:Week Task that represents a week of work. and removed v8.3.0 8.4-candidate labels May 24, 2022
@rdner rdner assigned leehinman and unassigned rdner Sep 6, 2022
@cmacknz
Member Author

cmacknz commented Sep 14, 2022

All tasks complete, closing.

@cmacknz cmacknz closed this as completed Sep 14, 2022

7 participants