Setting host.* in Beats that forward data #13920

Open
andrewkroh opened this issue Oct 4, 2019 · 18 comments
Comments

@andrewkroh
Member

andrewkroh commented Oct 4, 2019

There are several use cases in Beats where the data reported by a Beat did not originate on that Beat's host. Some examples are syslog, Windows forwarded events, router netflow data, and CloudWatch logs. In these cases it would be appropriate to set the host.* fields to information about the originating machine.

From ECS:

ECS host.* fields should be populated with details about the host on which the event happened, or from which the measurement was taken.

Some issues related to this:

I think we need a way for inputs and modules to "designate" that host.* should not be set by default. The output pipeline and the add_host_metadata processor will both need to honor this "designation".

@andrewkroh
Member Author

@webmat Would it be appropriate to populate observer.* in any of these above use cases? If so which host's data would go there?

@webmat
Contributor

webmat commented Oct 4, 2019

Here are cases that should populate observer:

  • A Beat is actively monitoring a different host. E.g. Metricbeat collecting MySQL metrics from another host, like an AWS RDS instance.
    • In this case, the host running Metricbeat should go to observer.hostname & so on with other observer fields
    • In this case, the host being monitored should go into host.hostname & so on with other host fields.
  • A Beat is collecting logs locally from another agent acting as an observer (e.g. Zeek monitoring a network tap, or a Beat installed on a network appliance).
    • In this case, since both the Beat and the software doing the monitoring are on the same host, observer.hostname & other fields should be populated with that host's detail
    • If the data being collected contains information about the monitored hosts, then this goes to host.*
    • If the data being collected does not contain information about the monitored hosts (e.g. network flow stats), then host.* cannot be populated.

Here are cases that should not populate observer:

  • A Beat is receiving Syslog events and passing them along.
  • A Beat is receiving Windows Event Logs and passing them along.

In both of these cases, if the data stream contains the source host's details, they should go to host.*.
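
To make the two groups of cases concrete, here is a minimal sketch in Go. It uses plain maps rather than the real libbeat event type, and the function names and hostnames are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// fields is a stand-in for an event's field tree; it is not the libbeat type.
type fields map[string]interface{}

// remoteMonitorEvent covers the "Beat actively monitors a different host"
// case: the monitored machine goes under host.*, and the machine running
// the Beat goes under observer.*.
func remoteMonitorEvent(beatHost, monitoredHost string) fields {
	return fields{
		"host":     fields{"hostname": monitoredHost},
		"observer": fields{"hostname": beatHost},
	}
}

// forwardedEvent covers the "Beat passes Syslog along" case: observer is
// not populated, and host.* is set only when the data stream names the
// source host.
func forwardedEvent(sourceHost string) fields {
	ev := fields{}
	if sourceHost != "" {
		ev["host"] = fields{"hostname": sourceHost}
	}
	return ev
}

func main() {
	fmt.Println(remoteMonitorEvent("metricbeat-01", "rds-mysql-01"))
	fmt.Println(forwardedEvent("router-7"))
}
```

In neither case does the Beat's own machine end up under host.*, which is the point of the distinction above.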

@webmat
Contributor

webmat commented Oct 4, 2019

The case of monitoring containers may require a chat. I see a few cases; there may be more:

  • An agent can be installed on the host running Docker directly (not using Docker)
    • It can then monitor containers
    • It can monitor the host itself
  • An agent can be installed as a sidecar container, with the purpose of monitoring other containers local to the host
  • An agent can be installed in a container, with the purpose of monitoring the host

I'm not sure the current semantics as defined by ECS are clear or sufficient to fully capture this. Maybe I'm wrong. But I'd be more than happy to have a chat with folks who specialize in monitoring Docker, and hash out ideas here. Let me know if that would be useful.

@missnebun

I just installed Filebeat 7.4 and I am using the netflow module. I have the same problem: agent.hostname is the hostname of the netflow input server instead of the sender. I had to update some Kibana visualizations and replace agent.hostname with observer.ip.

@urso

urso commented Oct 7, 2019

It is not only the processors, but also libbeat directly adding some fields. See: https://github.com/elastic/beats/blob/master/libbeat/publisher/processing/default.go#L78

I'm in favor of not automatically modifying any event once it hits libbeat. All modification should be opt-in via processors.

@urso added the discussion label and removed the meta label Oct 8, 2019
@exekias
Contributor

exekias commented Oct 9, 2019

I think we need a way for inputs and modules to "designate" that host.* should not be set by default. The output pipeline and the add_host_metadata processor will both need to honor this "designation".

I totally agree, we have this same problem with cloud metadata and cloud monitoring modules (AWS, Azure, etc). Something we have with add_cloud_metadata is that it won't override any info sent by the input/module itself:

    if !p.initData.overwrite {
        cloudValue, _ := event.GetValue("cloud")
        if cloudValue != nil {
            return event, nil
        }
    }

In this case, cloud metadata for the agent is not sent, which may not be ideal (it could make sense to have it under observer.cloud?)

Something like this could make sense for add_host_metadata, where it could decide to put the metadata somewhere else (as you said, maybe observer)

The case of monitoring containers may require a chat. I see a few cases; there may be more:

  • An agent can be installed on the host running Docker directly (not using Docker)
    • It can then monitor containers
    • It can monitor the host itself
  • An agent can be installed as a sidecar container, with the purpose of monitoring other containers local to the host
  • An agent can be installed in a container, with the purpose of monitoring the host

I'm not sure the current semantics as defined by ECS are clear or sufficient to fully capture this. Maybe I'm wrong. But I'd be more than happy to have a chat with folks who specialize in monitoring Docker, and hash out ideas here. Let me know if that would be useful.

Happy to participate in this conversation @webmat!

@exekias added the Metricbeat label Oct 9, 2019
@urso

urso commented Oct 9, 2019

To me it feels like the overall problem is that we have a limited set of namespaces, but yet there is potentially a trail of subsystems an event might have been passed through. Ultimately one might want an array of system descriptors the event has passed.

For now I want to solve the issue at hand, as this has come up a few times already. The host and agent fields are always overwritten: they are currently enforced, potentially overwriting existing fields no matter where their values came from. This is not really an ECS problem in itself, but a general Beats one, as we also interfere with users who already use ECS for their own data.

Overall ECS discussions are maybe better handled in the github.com/elastic/ecs repository.

For example, in Filebeat we introduced a Kafka input to allow architectures like Beats->Kafka->Filebeat->Elasticsearch. The problem becomes even more apparent in this situation. In the simplest case Beats would be the host, and the second Filebeat should be the collector. We also pass some fields from the inputs via @metadata, such as pipeline name, document id, or index name. Once the event reaches the 'collector' Filebeat, it should be treated by default as if Beats->Elasticsearch had been configured directly. This is currently not the case.

We also do not want to introduce "heavy" breaking changes in 7.x. So for 7.x I'm planning these changes (independent of ECS):

  • libbeat changes (these are currently enforced and can't be disabled by the user):
    • Do not overwrite ecs.version, if it's already present
    • Do not change the host field or any of its subfields if already present
    • Do not update agent fields, if agent is already present
    • Do not update observer fields, if observer is already present
  • Update processors:
    • Add overwrite setting to selected list of processors that affect cloud, host, observer fields
      • Default value is false
      • If false, the namespaces are protected, e.g. if host is already available in the event, the processor will not add any host.X fields (as these might represent the wrong host)
    • Add a target or namespace setting, so users can override the field names (e.g. set target to 'collector' when add_host_metadata is used)
    • processors to be adapted: add_cloud_metadata, add_host_metadata, add_observer_metadata
    • add_locale: do not overwrite event.timezone if present, but allow user to configure alternative target field
    • combine add_host_metadata and add_observer_metadata into a common processor (it's mostly a copy and paste right now)
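
The proposed overwrite and target settings can be sketched as follows. This is only an illustration of the planned behavior under stated assumptions, not the actual processor code: events are plain maps, and the `hostMetadataConfig` struct and `addHostMetadata` function are hypothetical names.

```go
package main

import "fmt"

// fields is a stand-in for an event's field tree; it is not the libbeat type.
type fields map[string]interface{}

// hostMetadataConfig mirrors the two proposed settings. The setting names
// come from the plan above; the struct itself is hypothetical.
type hostMetadataConfig struct {
	Overwrite bool   // false (the proposed default) protects an existing namespace
	Target    string // namespace to write into, e.g. "host" or "collector"
}

// addHostMetadata writes meta under cfg.Target unless that namespace is
// already present and Overwrite is false.
func addHostMetadata(ev fields, meta fields, cfg hostMetadataConfig) {
	target := cfg.Target
	if target == "" {
		target = "host"
	}
	if _, exists := ev[target]; exists && !cfg.Overwrite {
		return // the existing value may describe the originating host
	}
	ev[target] = meta
}

func main() {
	local := fields{"name": "filebeat-collector"}

	// The event already carries the originating host, so it is left untouched.
	ev := fields{"host": fields{"name": "edge-router"}}
	addHostMetadata(ev, local, hostMetadataConfig{})
	fmt.Println(ev)

	// Same event, but the collector's details go to a separate namespace.
	addHostMetadata(ev, local, hostMetadataConfig{Target: "collector"})
	fmt.Println(ev)
}
```

With a separate target, both the originating host and the collecting host survive in the same event, which is what the Beats->Kafka->Filebeat case needs.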

For 8.x I would like to remove setting host, agent or observer from within libbeat. Libbeat should not enforce fields, but allow solutions/users to opt-in. All these fields are already available via processors, and I'd prefer to provide default configs with these being enabled. Moving more functionality to processors and removing default behavior will also simplify the setup in libbeat itself.

I will create issues for individual tasks, if we agree on the plan.

@webmat
Contributor

webmat commented Oct 9, 2019

This plan makes a lot of sense. Thanks for putting this together, @urso!

@simitt
Contributor

simitt commented Oct 24, 2019

FWIW this also concerns APM, where we are setting observer information.

@andrewkroh
Member Author

@urso, regarding an overwrite option in the processors, you wrote:

If false, the namespaces are protected. e.g. if host is already available in the event, the processor will not add any host.X fields.

This sounds good, but I don't think it can work as long as libbeat continues to set host.name: add_host_metadata would never add its fields because host would always exist. We could move setting the host.name field into the add_host_metadata processor. I think that would address this problem, but it would probably cause some level of breaking change (I haven't thought through all the consequences yet). WDYT?

For reference this is the code that adds host.name followed by where it runs the global add_host_metadata:

    // setup 6: add beats and host metadata
    if meta := builtin; len(meta) > 0 {
        processors.add(actions.NewAddFields(meta, needsCopy, false))
    }

    // setup 8: pipeline processors list
    processors.add(b.processors)
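
The ordering problem can be reproduced with a toy pipeline. This is a hedged sketch rather than libbeat code: `processor`, `builtinHostName`, `guardedHostMetadata`, and `run` are hypothetical stand-ins, and events are plain maps.

```go
package main

import "fmt"

// fields is a stand-in for an event's field tree; it is not the libbeat type.
type fields map[string]interface{}

// processor is a hypothetical stand-in for a pipeline step.
type processor func(fields)

// builtinHostName models libbeat unconditionally setting host.name.
func builtinHostName(name string) processor {
	return func(ev fields) { ev["host"] = fields{"name": name} }
}

// guardedHostMetadata models add_host_metadata with overwrite=false: it
// does nothing when a host namespace is already present.
func guardedHostMetadata(meta fields) processor {
	return func(ev fields) {
		if _, exists := ev["host"]; exists {
			return
		}
		ev["host"] = meta
	}
}

// run applies the steps to the event in order and returns the event.
func run(ev fields, steps ...processor) fields {
	for _, step := range steps {
		step(ev)
	}
	return ev
}

func main() {
	meta := fields{"name": "beat-host", "os": "linux"}

	// Built-in step first: the guard always trips, so "os" is never added.
	fmt.Println(run(fields{}, builtinHostName("beat-host"), guardedHostMetadata(meta)))

	// Without the built-in step, the processor sets host.name itself.
	fmt.Println(run(fields{}, guardedHostMetadata(meta)))
}
```

The second run corresponds to the suggestion of moving host.name into add_host_metadata: the guard then trips only when an input or module has already supplied host.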

andrewkroh added a commit that referenced this issue Jun 26, 2020
When {{ .tags }} is evaluated in the module config it is not written in the correct format.
This fixes that issue and also conditionally enables `publisher_pipeline.disable_host`
based on whether tags contains `forwarded` to be consistent with every other module
that allows for `var.tags` to be set (relates: #13920).

For example (https://play.golang.org/p/LUr-X94msd1):

    var.tags: [foo, bar]

will be written into the config as

    tags: [foo bar]

which is a single value array containing the string "foo bar" rather than two tags.

(cherry picked from commit b48c388)
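
The formatting problem described in this commit message can be reproduced with Go's text/template package alone. The sketch below is an illustration, not the module's actual fix: the `render` function and the `toJSON` template helper are hypothetical names introduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
)

// render executes tmpl with a "tags" value and returns the output.
func render(tmpl string, tags []string) (string, error) {
	funcs := template.FuncMap{
		// toJSON is a hypothetical helper name used only in this sketch.
		"toJSON": func(v interface{}) (string, error) {
			b, err := json.Marshal(v)
			return string(b), err
		},
	}
	t, err := template.New("cfg").Funcs(funcs).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, map[string]interface{}{"tags": tags})
	return sb.String(), err
}

func main() {
	tags := []string{"foo", "bar"}

	// Go's default slice formatting separates elements with spaces only.
	plain, _ := render("tags: {{ .tags }}", tags)
	fmt.Println(plain)

	// A JSON array is also a valid YAML flow sequence with two elements.
	quoted, _ := render("tags: {{ .tags | toJSON }}", tags)
	fmt.Println(quoted)
}
```

The first line printed is `tags: [foo bar]`, which YAML reads as a one-element list; the second is `tags: ["foo","bar"]`.
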
andrewkroh added a commit to andrewkroh/beats that referenced this issue Jun 29, 2020
For the Checkpoint module when data is forwarded to Fortinet from another host/device (this is most of the time) you don't want Filebeat to add `host`. So by default this module adds a `forwarded` tag to events. If you configure the module to not include the `forwarded` tag (e.g. `var.tags: [my_tag]`) then Filebeat will add the `host.*` fields.

Relates: elastic#13920
(cherry picked from commit ff0d22b)
andrewkroh added a commit that referenced this issue Jun 30, 2020
For the Checkpoint module when data is forwarded to Fortinet from another host/device (this is most of the time) you don't want Filebeat to add `host`. So by default this module adds a `forwarded` tag to events. If you configure the module to not include the `forwarded` tag (e.g. `var.tags: [my_tag]`) then Filebeat will add the `host.*` fields.

Relates: #13920
(cherry picked from commit ff0d22b)
adriansr added a commit to adriansr/beats that referenced this issue Jul 2, 2020
Update Filebeat's test_modules.py integration test to not strip the
`host.name` field in events marked as forwarded.

Relates elastic#13920
adriansr added a commit that referenced this issue Jul 6, 2020
Update Filebeat's test_modules.py integration test to not strip the
`host.name` field in events marked as forwarded.

Relates #13920
adriansr added a commit to adriansr/beats that referenced this issue Jul 6, 2020
Update Filebeat's test_modules.py integration test to not strip the
`host.name` field in events marked as forwarded.

Relates elastic#13920

(cherry picked from commit 156c87b)
adriansr added a commit that referenced this issue Jul 14, 2020
Update Filebeat's test_modules.py integration test to not strip the
`host.name` field in events marked as forwarded.

Relates #13920

(cherry picked from commit 156c87b)
andrewkroh added a commit to andrewkroh/beats that referenced this issue Jul 29, 2020
Add an example to packetbeat.yml of using the `forwarded` tag to disable `host` metadata fields when processing network data from a network tap or mirror port.

Relates elastic#13920

(cherry picked from commit 28cb613)
andrewkroh added a commit that referenced this issue Jul 29, 2020
Add an example to packetbeat.yml of using the `forwarded` tag to disable `host` metadata fields when processing network data from a network tap or mirror port.

Relates #13920

(cherry picked from commit 28cb613)
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this issue Oct 14, 2020
For the netflow module when data is forwarded to Filebeat from another host/device you don't want Filebeat to add `host`. So by default this module adds a `forwarded` tag to events. If you configure the module to not include the `forwarded` tag (e.g. `var.tags: [my_tag]`) then Filebeat will add the `host.*` fields.

Relates: elastic#13920
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this issue Oct 14, 2020
If `forwarded` is configured as a tag (e.g. `var.tags: [forwarded]`) for the Suricata module then Filebeat will not add `host` fields to events. This is for use cases where Suricata is analyzing forwarded data (like from a network tap or mirror port).

Relates: elastic#13920
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this issue Oct 14, 2020
If `forwarded` is configured as a tag (e.g. `var.tags: [forwarded]`) for the Zeek module then Filebeat will not add `host` fields to events. This is for use cases where Zeek is analyzing forwarded data (like from a network tap or mirror port).

Relates: elastic#13920
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this issue Oct 14, 2020
For the CrowdStrike module when data is forwarded to Filebeat from another host/device you don't want Filebeat to add `host`. So by default this module adds a `forwarded` tag to events. If you configure the module to not include the `forwarded` tag (e.g. `var.tags: [my_tag]`) then Filebeat will add the `host.*` fields.

Relates: elastic#13920
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this issue Oct 14, 2020
When {{ .tags }} is evaluated in the module config it is not written in the correct format.
This fixes that issue and also conditionally enables `publisher_pipeline.disable_host`
based on whether tags contains `forwarded` to be consistent with every other module
that allows for `var.tags` to be set (relates: elastic#13920).

For example (https://play.golang.org/p/LUr-X94msd1):

    var.tags: [foo, bar]

will be written into the config as

    tags: [foo bar]

which is a single value array containing the string "foo bar" rather than two tags.
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this issue Oct 14, 2020
For the Checkpoint module when data is forwarded to Fortinet from another host/device (this is most of the time) you don't want Filebeat to add `host`. So by default this module adds a `forwarded` tag to events. If you configure the module to not include the `forwarded` tag (e.g. `var.tags: [my_tag]`) then Filebeat will add the `host.*` fields.

Relates: elastic#13920
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this issue Oct 14, 2020
Add an example to packetbeat.yml of using the `forwarded` tag to disable `host` metadata fields when processing network data from a network tap or mirror port.

Relates elastic#13920
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this issue Oct 14, 2020
Update Filebeat's test_modules.py integration test to not strip the
`host.name` field in events marked as forwarded.

Relates elastic#13920
@MarcusCaepio
Contributor

Hi all,
I just want to mention that this is also a problem with all the Cisco modules:
#14933

@jlind23 removed the Team:Services (Deprecated) label Mar 31, 2022
@botelastic added the needs_team label Mar 31, 2022
@jlind23 added the Team:Elastic-Agent-Data-Plane label and removed the needs_team label Mar 31, 2022
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@jens-rabe

Hi all,

I'm using Custom Journald logs from the Elastic Integrations (https://www.elastic.co/docs/current/integrations/journald). Is it also possible to disable the host.* replacement there?

@jens-rabe

Hi all,

I'm using Custom Journald logs from the Elastic Integrations (https://www.elastic.co/docs/current/integrations/journald). Is it also possible to disable the host.* replacement there?

Oh, sorry, I've added the tag "forwarded" and it works.

10 participants