Kubernetes logs and metrics are not being collected #1894

Closed

MichaelKatsoulis opened this issue Dec 6, 2022 · 11 comments

Labels
blocker bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team v8.6.0

Comments

@MichaelKatsoulis
Contributor

Description

With the latest elastic-agent, the Kubernetes integration is not able to collect metrics and logs from Kubernetes pods.

Versions

stack version: latest 8.6.0-SNAPSHOT
kubernetes package version: 1.29.2

Steps to reproduce

  1. Bring up the elastic-stack with elastic-package: elastic-package stack up --version=8.6.0-SNAPSHOT -v -d
  2. Create a kubernetes cluster with kind: kind create cluster
  3. Connect the kubernetes and elastic-package networks:
for i in $(docker ps | grep kindest | awk '{ print $1 }'); do
    docker network connect elastic-package-stack_default   "$i"
done
  4. Edit the Fleet settings and create a new elasticsearch output with https://elasticsearch:9200 and the Advanced YAML configuration ssl.verification_mode: "none" (see the sketch after this list).
  5. Create a new policy with only the kubernetes integration included (metrics and container logs enabled). Don't add an agent yet.
  6. In the policy settings, set the output for integrations and for monitoring to the new one created in step 4.
  7. Follow the steps to add a new agent on k8s with the new policy and apply the manifest.
  8. The agent is unhealthy; metrics and logs are not collected from the k8s cluster.
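
For reference, the output created in step 4 amounts to the following settings, shown here as YAML for readability; the host and the Advanced YAML content are the values from the list above, everything else is left at its defaults:

# New Fleet output of type Elasticsearch (sketch of the UI fields)
hosts:
  - https://elasticsearch:9200
# Advanced YAML configuration box:
ssl.verification_mode: "none"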

Logs possibly showing the problem

{"log.level":"error","@timestamp":"2022-12-06T09:56:04.861Z","message":"Error fetching data for metricset beat.state: error making http request: Get \"http://unix/state\": dial unix /tmp/elastic-agent/c5a0b2d1b450271b74baae9ce0ca8aa9422f780455266eb071232abadc074f68.sock: connect: no such file or directory","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log.origin":{"file.line":256,"file.name":"module/wrapper.go"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
Exiting: could not start the HTTP server for the API: listen unix /tmp/elastic-agent/c5a0b2d1b450271b74baae9ce0ca8aa9422f780455266eb071232abadc074f68.sock: bind: no such file or directory

{"log.level":"error","@timestamp":"2022-12-06T09:56:04.900Z","message":"Error fetching data for metricset beat.stats: monitored beat is using Elasticsearch output but cluster UUID cannot be determined","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log.origin":{"file.line":256,"file.name":"module/wrapper.go"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-12-06T09:56:03.068Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa","state":"FAILED","message":"Failed: pid '5770' exited with code '1'","inputs":[{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa-kubernetes/metrics-kube-proxy-b11c396f-7f02-4b3f-8a72-9defd1888d28","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"},{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa-kubernetes/metrics-events-b11c396f-7f02-4b3f-8a72-9defd1888d28","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"},{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa-kubernetes/metrics-kubelet-b11c396f-7f02-4b3f-8a72-9defd1888d28","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"},{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa-kubernetes/metrics-kube-state-metrics-b11c396f-7f02-4b3f-8a72-9defd1888d28","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"},{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa-kubernetes/metrics-kube-apiserver-b11c396f-7f02-4b3f-8a72-9defd1888d28","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"}],"output":{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"}},"ecs.version":"1.6.0"}

If we remove the metrics collection from the policy and leave only the container_logs, then most of the error messages disappear and the only constant one is

Error fetching data for metricset beat.stats: monitored beat is using Elasticsearch output but cluster UUID cannot be determined

This is related to #1860. Nothing else suspicious is logged, but no pod logs are collected, so something prevents filebeat from running.
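
A quick way to see which component is failing is to check the agent status from inside the DaemonSet pod; the pod name below is illustrative and the namespace depends on where the manifest was applied:

# find the agent pod (name is illustrative)
kubectl get pods -n kube-system | grep elastic-agent
# print the state of each component managed by the agent
kubectl exec -n kube-system elastic-agent-abcde -- elastic-agent status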

@cmacknz
Member

cmacknz commented Dec 6, 2022

Hmm, there is definitely nothing obvious causing this in the log snippets here. If you have the complete set of agent logs, or even better diagnostics collected with elastic-agent diagnostics collect, it would help.

It may be worth retesting this after elastic/beats#33921 is merged, as the bugs fixed there have been causing several problems with the wrong input type being started or the wrong data streams being used.
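
For completeness, a minimal way to collect those diagnostics from the running DaemonSet pod and copy the archive out; the pod name, working directory and archive name below are illustrative:

# generate the diagnostics archive inside the agent container
kubectl exec -n kube-system elastic-agent-abcde -- elastic-agent diagnostics collect
# copy it out of the pod; the exact path and file name will differ
kubectl cp kube-system/elastic-agent-abcde:/usr/share/elastic-agent/elastic-agent-diagnostics-<timestamp>.zip ./diagnostics.zip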

@MichaelKatsoulis
Contributor Author

MichaelKatsoulis commented Dec 7, 2022

I packaged elastic-agent from the latest 8.6 branch, which includes the fix you mention, and ran it inside Kubernetes with the Kubernetes integration. The agent is still unhealthy. One more error I see is that right after it creates the new components for logs and metrics it logs:

{"log.level":"info","@timestamp":"2022-12-07T15:04:04.311Z","message":"Exiting: error loading config file: stat filebeat.yml: no such file or directory","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-a75bb440-7611-11ed-a1bd-3d459848770d","type":"filestream"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-12-07T15:04:05.024Z","message":"Exiting: error loading config file: stat metricbeat.yml: no such file or directory","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"kubernetes/metrics-a75bb440-7611-11ed-a1bd-3d459848770d","type":"kubernetes/metrics"},"ecs.version":"1.6.0"}

Diagnostics file
elastic-agent-diagnostics-2022-12-07T15-14-29Z-00.zip

@cmacknz
Member

cmacknz commented Dec 8, 2022

Hmm, error loading config file: stat filebeat.yml suggests something is still wrong with the container paths. @michalpristas is probably the best person to take a look at this since he has fixed a few related issues already.

@MichaelKatsoulis
Contributor Author

Latest try.

  1. With the docker.elastic.co/beats/elastic-agent:8.6.0-SNAPSHOT agent image. Still no logs or metrics collected:
    elastic-agent-diagnostics-2022-12-13T09-11-31Z-00.zip

  2. With a custom agent image built from the Dockerfile created by DEV=true EXTERNAL=true PLATFORMS=linux/amd64 TYPES=docker mage package in the latest 8.6 branch (see the sketch after this list). Still nothing is collected:
    elastic-agent-diagnostics-2022-12-13T09-22-22Z-00.zip
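
As a sketch of the second attempt, a locally built image can be made available to the kind cluster along these lines; the image tag is illustrative and depends on the local build:

# load the locally built image onto the kind nodes
kind load docker-image docker.elastic.co/beats/elastic-agent:8.6.0-SNAPSHOT
# then point the image field of the elastic-agent DaemonSet manifest at the same tag and re-apply it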

@MichaelKatsoulis
Contributor Author

After #1936 I tested the latest agent image from https://snapshots.elastic.co/8.6.0-6aa72183/summary-8.6.0-SNAPSHOT.html#elastic-agent.

diagnostics:
elastic-agent-diagnostics-2022-12-14T10-29-31Z-00.zip

Metrics and logs are collected, but there is another kind of problem: variable substitution is not happening, so the logs do not contain any Kubernetes metadata. In other words, the add_fields processor is not working. I ran the elastic-agent inspect command inside the elastic-agent container.
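
For reference, this can be done from outside the cluster along these lines (pod name is illustrative):

# run inspect inside the agent container of the DaemonSet
kubectl exec -n kube-system elastic-agent-abcde -- elastic-agent inspect

The relevant excerpt of the output, with the variables still unresolved: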

id: filestream-container-logs-e36ece52-2118-4351-bd1c-878476fb78ce
  meta:
    package:
      name: kubernetes
      version: 1.29.2
  name: kubernetes-1
  package_policy_id: e36ece52-2118-4351-bd1c-878476fb78ce
  revision: 2
  streams:
  - data_stream:
      dataset: kubernetes.container_logs
      type: logs
    id: kubernetes-container-logs-${kubernetes.pod.name}-${kubernetes.container.id}
    parsers:
    - container:
        format: auto
        stream: all
    paths:
    - /var/log/containers/*${kubernetes.container.id}.log
    prospector.scanner.symlinks: true
  type: filestream
  use_output: 92408750-7b86-11ed-8e60-1ff6d9b1b365

Same for metrics:

  streams:
  - add_metadata: true
    condition: ${kubernetes_leaderelection.leader} == true
    data_stream:
      dataset: kubernetes.event
      type: metrics

While troubleshooting the code with @ChrsMark, we found out that the kubernetes provider adds the mappings and processors to the events it emits to the controller.

We tried it out locally with this config:

providers:
  kubernetes:
    kube_config: /home/chrismark/.kube/config
    node: "kind-control-plane"

inputs:
  - name: filestream-container_logs
    type: filestream
    use_output: default
    streams:
      - data_stream:
          dataset: container_logs
          type: logs
        exclude_files:
          - .gz$
        exclude_lines:
          - ^\s+[\-`('.|_]
        parsers:
          - container:
              format: auto
              stream: 'all'
        paths:
          - /var/log/containers/*${kubernetes.container.id}.log

and the result of inspect is

./elastic-agent -v inspect --variables         
agent:
  logging:
    to_stderr: true
inputs: []
outputs:
  default:
    api-key: example-key
    hosts:
    - 127.0.0.1:9200
    type: elasticsearch
providers:
  kubernetes:
    kube_config: /home/chrismark/.kube/config
    node: kind-control-plane

which means that variable resolution is not working.

@ChrsMark
Member

Hey! Here is another test, more realistic:

agent.yml: (change the kube_config to point to your config)

providers:
  kubernetes:
    kube_config: /home/chrismark/.kube/config
    node: "kind-control-plane"

inputs:
  - name: container-log
    id: container-log-${kubernetes.pod.name}-${kubernetes.container.id}
    type: filestream
    use_output: default
    meta:
      package:
        name: kubernetes
        version: 1.9.0
    data_stream:
      namespace: default
    streams:
      - data_stream:
          dataset: kubernetes.container_logs
          type: logs
        condition: ${kubernetes.labels.app} == 'redis'
        prospector.scanner.symlinks: true
        parsers:
          - container: ~
          # - ndjson:
          #     target: json
          # - multiline:
          #     type: pattern
          #     pattern: '^\['
          #     negate: true
          #     match: after
        paths:
          - /var/log/containers/*${kubernetes.container.id}.log
  - name: redis
    type: redis/metrics
    use_output: default
    meta:
      package:
        name: redis
        version: 0.3.6
    data_stream:
      namespace: default
    streams:
      - data_stream:
          dataset: redis.info
          type: metrics
        metricsets:
          - info
        hosts:
          - '${kubernetes.pod.ip}:6379'
        idle_timeout: 20s
        maxconn: 10
        network: some/${kubernetes.container.id}
        period: 10s
        condition: ${kubernetes.labels.app} == 'redis'

Then deploy the target Pod:

redis.yml

apiVersion: v1
kind: Pod
metadata:
  name: redis
  labels:
    k8s-app: redis
    app: redis
spec:
  containers:
  - image: redis
    imagePullPolicy: IfNotPresent
    name: redis
    ports:
    - name: redis
      containerPort: 6379
      protocol: TCP
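
Assuming the manifest above is saved as redis.yml:

kubectl apply -f redis.yml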

Then, from the latest main (5fa81b59c5dc2fee74ff5c456e87466e658bd135), I ran ./elastic-agent -v inspect --variables and the inputs are not populated at all.

Running ./elastic-agent -v inspect output -o default for 8.4 and 8.5, I see the inputs being populated properly.

Is this something we are missing when running the new form of the inspect command?

@michalpristas
Contributor

It may be that the inspect command is broken. What do you see when you run the diagnostics command, unzip the archive, and check computed_config.yaml?
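
Roughly, run against the running agent (for example inside the agent container); the archive name will differ and the file layout inside it may vary:

elastic-agent diagnostics collect
unzip elastic-agent-diagnostics-<timestamp>.zip -d diag
# locate the computed configuration inside the extracted archive
find diag -name computed_config.yaml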

@MichaelKatsoulis
Contributor Author

MichaelKatsoulis commented Dec 14, 2022

Good catch. I checked the computed_config.yaml and I can see the processors being added, but they are at the higher, input level. Example:

- data_stream:
    namespace: default
  id: filestream-container-logs-e36ece52-2118-4351-bd1c-878476fb78ce-kubernetes-54075aa3-aa9b-46f0-8fe9-d0cc4317b012.kindnet-cni
  meta:
    package:
      name: kubernetes
      version: 1.29.2
  name: kubernetes-1
  original_id: filestream-container-logs-e36ece52-2118-4351-bd1c-878476fb78ce
  package_policy_id: e36ece52-2118-4351-bd1c-878476fb78ce
  policy:
    revision: 4
  processors:
  - add_fields:
      fields:
        id: c699fc238f42bc2453bcc9c2f6638f7b3d73b80a23cc7f6d17c0e6faad84709b
        image:
          name: docker.io/kindest/kindnetd:v20210326-1e038dc5
        runtime: containerd
      target: container
  - add_fields:
      fields:
        cluster:
          name: kind
          url: kind-control-plane:6443
      target: orchestrator
  - add_fields:
      fields:
        container:
          name: kindnet-cni
        daemonset:
          name: kindnet
        labels:
          app: kindnet
          controller-revision-hash: 5b547684d9
          k8s-app: kindnet
          pod-template-generation: "1"
          tier: node
        namespace: kube-system
        namespace_labels:
          kubernetes_io/metadata_name: kube-system
        namespace_uid: 441aaf27-f736-4860-b07e-896c63e61fd4
        node:
          hostname: kind-worker2
          labels:
            beta_kubernetes_io/arch: amd64
            beta_kubernetes_io/os: linux
            kubernetes_io/arch: amd64
            kubernetes_io/hostname: kind-worker2
            kubernetes_io/os: linux
          name: kind-worker2
          uid: 43854694-93ab-422e-8f30-0e20fb2e7b85
        pod:
          ip: 172.18.0.3
          name: kindnet-8b884
          uid: 54075aa3-aa9b-46f0-8fe9-d0cc4317b012
      target: kubernetes
  revision: 2
  streams:
  - data_stream:
      dataset: kubernetes.container_logs
      type: logs
    id: kubernetes-container-logs-kindnet-8b884-c699fc238f42bc2453bcc9c2f6638f7b3d73b80a23cc7f6d17c0e6faad84709b
    parsers:
    - container:
        format: auto
        stream: all
    paths:
    - /var/log/containers/*c699fc238f42bc2453bcc9c2f6638f7b3d73b80a23cc7f6d17c0e6faad84709b.log
    prospector.scanner.symlinks: true
  type: filestream

But the fields added by the processors are not populated in Elasticsearch. Maybe the higher-level processors are not respected in the lower-level streams? Should I open a new issue for this?
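
To illustrate that hypothesis, this is roughly what the same snippet would look like if the add_fields processors were attached at the stream level instead; this is only a sketch of the idea, trimmed to a single processor, not a confirmed fix:

  streams:
  - data_stream:
      dataset: kubernetes.container_logs
      type: logs
    id: kubernetes-container-logs-kindnet-8b884-c699fc238f42bc2453bcc9c2f6638f7b3d73b80a23cc7f6d17c0e6faad84709b
    # sketch: same add_fields content as above, but scoped to the stream
    processors:
    - add_fields:
        fields:
          pod:
            name: kindnet-8b884
        target: kubernetes
    paths:
    - /var/log/containers/*c699fc238f42bc2453bcc9c2f6638f7b3d73b80a23cc7f6d17c0e6faad84709b.log
    prospector.scanner.symlinks: true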

@michalpristas
Contributor

michalpristas commented Dec 14, 2022

I'd go with a separate issue to track this.
@cmacknz is this something known/seen before?

@cmacknz
Member

cmacknz commented Dec 14, 2022

Not known, thanks for creating a new issue.

@michalpristas
Contributor

As metrics and logs are present, just without metadata, can we close this?
