Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.x](backport #41585) filebeat: input v2 compat uses random ID for CheckConfig #41641

Merged
merged 2 commits into from
Nov 18, 2024

Conversation

mergify[bot]
Copy link
Contributor

@mergify mergify bot commented Nov 14, 2024

Proposed commit message

filebeat: input v2 compat uses random ID for CheckConfig

The CheckConfig function validates a configuration by creating and immediately discarding an input. However, a potential conflict arises when CheckConfig is used with autodiscover in Kubernetes.

Autodiscover accumulates configuration changes and applies them in batches. This can be problematic if a stop event for a pod is closely followed by a start event for the same pod (e.g., during a pod restart) before the inputs are reloaded. In this scenario, autodiscover might attempt to validate the configuration for the start event while the input for the pod is already running. This would lead to filestream input manager to see two inputs with the same ID, triggering a log warning.

Although this situation generates a warning, it doesn't result in data duplication. As the second input is only created to validate the configuration and later discarded. Also the reload process will ensure only new inputs are created, any input already running won't be duplicated.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

  • N/A

How to test this PR locally

Run a kind cluster:

  • kind create cluster
  • select it: kubectl config use-context kind-kind
  • build filebeat docker image: DEV=true SNAPSHOT=true TEST_PLATFORMS="linux/amd64" TEST_PACKAGES="docker" mage package
  • add the docker image to kind: kind load docker-image docker.elastic.co/beats/filebeat:9.0.0-SNAPSHOT
  • start filebeat in the cluster kubectl apply -f k8s.yaml.
k8s.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    k8s-app: filebeat
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - namespaces
      - pods
      - nodes
    verbs:
      - get
      - watch
      - list
  - apiGroups: ["apps"]
    resources:
      - replicasets
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources:
      - jobs
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat
  # should be the namespace where filebeat is running
  namespace: kube-system
  labels:
    k8s-app: filebeat
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
rules:
  - apiGroups: [""]
    resources:
      - configmaps
    resourceNames:
      - kubeadm-config
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: kube-system
roleRef:
  kind: Role
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: kube-system
roleRef:
  kind: Role
  name: filebeat-kubeadm-config
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          hints.enabled: true
          hints.default_config:
            type: filestream
            prospector.scanner.symlinks: true
            id: filestream-kubernetes-pod-${data.kubernetes.container.id}
            take_over: true
            paths:
              - /var/log/containers/*${data.kubernetes.container.id}.log
            parsers:
              - container: ~
    processors:
      - add_cloud_metadata:
      - add_host_metadata:
    logging:
      level: debug
    output.elasticsearch:
      hosts: ["https://localhost:9200"] # you can let it failing, no need for a real ES
      protocol: "https"
      allow_older_versions: true
    #      username: "elastic"
    #      password: "changeme"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: filebeat
          image: docker.elastic.co/beats/filebeat:9.0.0-SNAPSHOT
          args: [
            "-c", "/etc/filebeat.yml",
            "-e",
          ]
          env:
            - name: ELASTICSEARCH_HOST
              value: elasticsearch
            - name: ELASTICSEARCH_PORT
              value: "9200"
            - name: ELASTICSEARCH_USERNAME
              value: elastic
            - name: ELASTICSEARCH_PASSWORD
              value: changeme
            - name: ELASTIC_CLOUD_ID
              value: "cliud-id"
            - name: ELASTIC_CLOUD_AUTH
              value: "aa:bb"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            runAsUser: 0
            # If using Red Hat OpenShift uncomment this:
            #privileged: true
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - name: config
              mountPath: /etc/filebeat.yml
              readOnly: true
              subPath: filebeat.yml
            - name: data
              mountPath: /usr/share/filebeat/data
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: config
          configMap:
            defaultMode: 0640
            name: filebeat-config
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: varlog
          hostPath:
            path: /var/log
        # data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
        - name: data
          hostPath:
            # When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
            path: /var/lib/filebeat-data
            type: DirectoryOrCreate
  • start 2 busybox: kubectl run busybox1 --image=busybox, kubectl run busybox2 --image=busybox
  • wait, you should not see an error like:
filestream input with ID 'filestream-kubernetes-pod-3f834ee8c5d20f465e3a75433c9bd58a9968e154c32c92d1ef1948797820a64a' already exists, this will lead to data duplication, please use a different ID. Metrics collection has been disabled on this input.

{"log.level":"error","@timestamp":"2024-11-08T14:14:53.021Z","log.logger":"input","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.(*InputManager).Create","file.name":"input-logfile/manager.go","file.line":174},"message":"filestream input with ID 'filestream-kubernetes-pod-3f834ee8c5d20f465e3a75433c9bd58a9968e154c32c92d1ef1948797820a64a' already exists, this will lead to data duplication, please use a different ID. Metrics collection has been disabled on this input.","service.name":"filebeat","ecs.version":"1.6.0"}
  • repeat the process with the current filebeat release, you'll see the warning

Related issues

Use cases

filebeat on kubernetes using sutodiscover


This is an automatic backport of pull request #41585 done by [Mergify](https://mergify.com).

The CheckConfig function validates a configuration by creating and immediately discarding an input. However, a potential conflict arises when CheckConfig is used with autodiscover in Kubernetes.

Autodiscover accumulates configuration changes and applies them in batches. This can be problematic if a stop event for a pod is closely followed by a start event for the same pod (e.g., during a pod restart) before the inputs are reloaded. In this scenario, autodiscover might attempt to validate the configuration for the start event while the input for the pod is already running. This would lead to filestream input manager to see two inputs with the same ID, triggering a log warning.

Although this situation generates a warning, it doesn't result in data duplication. As the second input is only created to validate the configuration and later discarded. Also the reload process will ensure only new inputs are created, any input already running won't be duplicated.

(cherry picked from commit 697ede4)
@mergify mergify bot requested a review from a team as a code owner November 14, 2024 13:10
@mergify mergify bot added the backport label Nov 14, 2024
@mergify mergify bot requested review from AndersonQ and rdner and removed request for a team November 14, 2024 13:10
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 14, 2024
@AndersonQ AndersonQ removed the request for review from rdner November 14, 2024 13:16
@AndersonQ AndersonQ enabled auto-merge (squash) November 14, 2024 13:17
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Nov 14, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 14, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

Copy link
Contributor Author

mergify bot commented Nov 18, 2024

This pull request has not been merged yet. Could you please review and merge it @AndersonQ? 🙏

@AndersonQ
Copy link
Member

buildkite test this

@AndersonQ
Copy link
Member

/test

@AndersonQ AndersonQ merged commit 214807a into 8.x Nov 18, 2024
31 checks passed
@AndersonQ AndersonQ deleted the mergify/bp/8.x/pr-41585 branch November 18, 2024 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants