Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubernetes metadata enhancements #22189

Merged
merged 16 commits into from
Nov 11, 2020

Conversation

ChrsMark
Copy link
Member

@ChrsMark ChrsMark commented Oct 27, 2020

What does this PR do?

  1. adds kubernetes.host.hostname on node metadata,
  2. attach node metadata on pod metadata by default (without add_resource_metadata required to enabled) for both events coming from autodiscovered pods but also for events enriched by add_kubernetes_metadata
  3. attach namespace metadata on pod metadata by default (without add_resource_metadata required to enabled) for both events coming from autodiscovered pods but also for events enriched by add_kubernetes_metadata
  4. Enables always add_resource_metadata section for autodiscover and add_kubernetes_metadata.

Why is it important?

This information is useful in various ways. See more on the respective issue: #20434

How to test this PR manually

  1. Use autodiscover config like:

Test with autodiscover provider

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
  1. Make sure that Redis module is enabled and that kubernetes.node.hostname field is populated.

Test with add_kubernetes_metadata processor

filebeat.inputs:
- type: container
  paths:
    - /var/log/containers/*.log
  processors:
    - add_kubernetes_metadata:
        host: ${NODE_NAME}
        matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
  1. Make sure that kubernetes.node.hostname field is populated as well as kubernetes.node.* and kubernetes.namespace.*.

Related issues

Screenshots

Filebeat & add_kubernetes_metadata:

Screenshot 2020-11-04 at 12 58 08

Autodiscover:

Screenshot 2020-11-04 at 13 07 30

@ChrsMark ChrsMark added the Team:Platforms Label for the Integrations - Platforms team label Oct 27, 2020
@ChrsMark ChrsMark self-assigned this Oct 27, 2020
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Oct 27, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Oct 27, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #22189 updated]

  • Start Time: 2020-11-11T05:22:06.978+0000

  • Duration: 75 min 33 sec

Test stats 🧪

Test Results
Failed 0
Passed 16413
Skipped 1344
Total 17757

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@ChrsMark ChrsMark force-pushed the add_kubernetes_hostname branch from 930933e to f9a64cb Compare October 29, 2020 14:13
@ChrsMark ChrsMark marked this pull request as ready for review October 29, 2020 14:15
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@ChrsMark ChrsMark requested a review from a team as a code owner October 29, 2020 15:32
@elasticmachine
Copy link
Collaborator

elasticmachine commented Oct 29, 2020

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 16413
Skipped 1344
Total 17757

@ChrsMark ChrsMark changed the title Add hostname info for Kubernetes node Kubernetes metadata enhancements Nov 4, 2020
Signed-off-by: chrismark <chrismarkou92@gmail.com>
Signed-off-by: chrismark <chrismarkou92@gmail.com>
Signed-off-by: chrismark <chrismarkou92@gmail.com>
Signed-off-by: chrismark <chrismarkou92@gmail.com>
Signed-off-by: chrismark <chrismarkou92@gmail.com>
@ChrsMark
Copy link
Member Author

ChrsMark commented Nov 5, 2020

@jsoriano @exekias this is in a reviewable state right now so feel free to have a look and comment.

- name: node.hostname
type: keyword
description: >
Kubernetes hostname as reported by the node’s kernel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this comment #20434 (comment), perhaps we should store this value directly in host.name (instead of storing it in kubernetes.node.hostname). What I am not sure is if then we should also populate host.id or other inventory fields and with what values.

Other option can be to postpone the decision and copy the field if we decide to use host.name.

@exekias @kaiyan-sheng wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using host.name sounds like a good idea to me, this complies with the current definition we use. It would be good to do some testing to see if what add_host_metadata reports is the same thing reported by Kubernetes. For instance, when you change the hostname by hand, with and without setting a fqdn.

I think we need a deeper discussion around where host.id should be set, what will happen when add_host_metadata kicks in but host.name is already set by add_kubernetes_metadata? What will happen if add_host_metadata happens before add_kubernetes_metadata?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me! How about keeping the scope of this PR to just adding the node's metadata and opening a follow up issue so as to investigate further about host.name in separate story?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about keeping the scope of this PR to just adding the node's metadata and opening a follow up issue so as to investigate further about host.name in separate story?

LGTM, worst case we will duplicate this field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a deeper discussion around where host.id should be set, what will happen when add_host_metadata kicks in but host.name is already set by add_kubernetes_metadata? What will happen if add_host_metadata happens before add_kubernetes_metadata?

Agree. This is definitely worth some time to test and discuss. When users have more than one processor enabled for Metricbeat, overwriting the same fields will be a problem. We also have this issue to discuss moving adding host.name into add_host_metadata. But then both host.id and host.name will have this same issue.

@@ -46,6 +48,7 @@ func (n *node) Generate(obj kubernetes.Resource, opts ...FieldOptions) common.Ma

meta := n.resource.Generate("node", obj, opts...)
// TODO: Add extra fields in here if need be
meta.Put("node.hostname", n.hostname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use the hostname in obj.Status.Addresses[...].Address? I am concerned that when using cluster scope we can be setting here the hostname of the observer, and not the one the event belongs to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ good catch!

nodeWatcher kubernetes.Watcher,
namespaceWatcher kubernetes.Watcher,
metaConf *AddResourceMetadataConfig,
hostname string) MetaGen {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are passing the hostname in many places, I wonder if we may be using the wrong hostname when using cluster scope.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ good catch!

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@ChrsMark
Copy link
Member Author

ChrsMark commented Nov 9, 2020

@jsoriano thanks for the suggestion! I made a change to this direction so feel free to have another look :)

@exekias @kaiyan-sheng any thoughts about #22189 (comment)?

@ChrsMark
Copy link
Member Author

ChrsMark commented Nov 10, 2020

Looks good, only a comment about docs. As tnode and namespace metadata is always added, we should probably remove mentions to add_resource_metadata.*.enabled. Not sure if other options in add_resource_metadata are still used.

👍🏼 Thanks for the review @jsoriano! I ended up removing add_resource_metadata completely. The only concern about this could be the volume of labels and annotations that might be too big. With this change we IncludeLabels & IncludeAnnotations by default for node and namespace of a Pod. @exekias do you think this is something we can do?
Maybe this is a case of add_resource_metadata being useful and maybe we could keep it only for cases where we want to restrict the volume of labels/annotations that are added. I'm in two minds here 😅 so I would love your feedback.

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@jsoriano
Copy link
Member

With this change we IncludeLabels & IncludeAnnotations by default for node and namespace of a Pod.

I think we need to keep something to selectively include annotations, maybe we can keep add_resource_metadata only for this. Or, to have more simple options, what about using for that the same config as in the provider and the processor?

Signed-off-by: chrismark <chrismarkou92@gmail.com>
@ChrsMark
Copy link
Member Author

++ Aggreed @jsoriano ! Tried to take it to this direction in my last commit (75cb108)

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
@ChrsMark ChrsMark requested review from a team as code owners November 11, 2020 05:17
@botelastic botelastic bot added the Team:Automation Label for the Observability productivity team label Nov 11, 2020
@ChrsMark ChrsMark force-pushed the add_kubernetes_hostname branch from c80b612 to 8002c9c Compare November 11, 2020 05:20
@ChrsMark ChrsMark removed the Team:Automation Label for the Observability productivity team label Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
review Team:Platforms Label for the Integrations - Platforms team test-plan Add this PR to be manual test plan v7.11.0 [zube]: In Review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add kubernetes.node.hostname field with the actual hostname of the node
5 participants