Kubernetes logs and metrics are not being collected #1894

Closed

MichaelKatsoulis opened this issue Dec 6, 2022 · 11 comments

Labels
blocker bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team v8.6.0

Comments

@MichaelKatsoulis
Contributor

Description

With the latest elastic-agent, the Kubernetes integration is not able to collect metrics and logs from Kubernetes pods.

Versions

stack version: latest 8.6.0-SNAPSHOT
kubernetes package version: 1.29.2

Steps to reproduce

  1. Bring up the elastic-stack with elastic-package: elastic-package stack up --version=8.6.0-SNAPSHOT -v -d
  2. Create a kubernetes cluster with kind: kind create cluster
  3. Connect the kubernetes and elastic-package networks:
for i in $(docker ps | grep kindest | awk '{ print $1 }'); do
    docker network connect elastic-package-stack_default   "$i"
done
  4. Edit the Fleet settings and create a new elasticsearch output with https://elasticsearch:9200 and the Advanced YAML configuration ssl.verification_mode: "none" (see the sketch after this list).
  5. Create a new policy with only the kubernetes integration included (metrics and container logs enabled). Don't add an agent yet.
  6. In the policy settings, set the output for integrations and for monitoring to the new one created in step 4.
  7. Follow the steps to add a new agent on k8s with the new policy and apply the manifest.
  8. The agent is unhealthy; metrics and logs are not collected from the k8s cluster.
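
For reference, the output created in step 4 amounts to the following settings, shown here as YAML for readability; the host and the Advanced YAML content are the values from the list above, everything else is left at its defaults:

# New Fleet output of type Elasticsearch (sketch of the UI fields)
hosts:
  - https://elasticsearch:9200
# Advanced YAML configuration box:
ssl.verification_mode: "none"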

Logs possibly showing the problem

{"log.level":"error","@timestamp":"2022-12-06T09:56:04.861Z","message":"Error fetching data for metricset beat.state: error making http request: Get \"http://unix/state\": dial unix /tmp/elastic-agent/c5a0b2d1b450271b74baae9ce0ca8aa9422f780455266eb071232abadc074f68.sock: connect: no such file or directory","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log.origin":{"file.line":256,"file.name":"module/wrapper.go"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
Exiting: could not start the HTTP server for the API: listen unix /tmp/elastic-agent/c5a0b2d1b450271b74baae9ce0ca8aa9422f780455266eb071232abadc074f68.sock: bind: no such file or directory

{"log.level":"error","@timestamp":"2022-12-06T09:56:04.900Z","message":"Error fetching data for metricset beat.stats: monitored beat is using Elasticsearch output but cluster UUID cannot be determined","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log.origin":{"file.line":256,"file.name":"module/wrapper.go"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-12-06T09:56:03.068Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa","state":"FAILED","message":"Failed: pid '5770' exited with code '1'","inputs":[{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa-kubernetes/metrics-kube-proxy-b11c396f-7f02-4b3f-8a72-9defd1888d28","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"},{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa-kubernetes/metrics-events-b11c396f-7f02-4b3f-8a72-9defd1888d28","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"},{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa-kubernetes/metrics-kubelet-b11c396f-7f02-4b3f-8a72-9defd1888d28","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"},{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa-kubernetes/metrics-kube-state-metrics-b11c396f-7f02-4b3f-8a72-9defd1888d28","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"},{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa-kubernetes/metrics-kube-apiserver-b11c396f-7f02-4b3f-8a72-9defd1888d28","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"}],"output":{"id":"kubernetes/metrics-f13b52e0-754a-11ed-bbcc-4118767788aa","state":"FAILED","message":"Failed: pid '5770' exited with code '1'"}},"ecs.version":"1.6.0"}

If we remove the metrics collection from the policy and leave only the container_logs, then most of the error messages disappear and the only constant one is

Error fetching data for metricset beat.stats: monitored beat is using Elasticsearch output but cluster UUID cannot be determined

This is related to #1860. Nothing else suspicious is logged, but no pod logs are collected, so something prevents filebeat from running.
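
A quick way to see which component is failing is to check the agent status from inside the DaemonSet pod; the pod name below is illustrative and the namespace depends on where the manifest was applied:

# find the agent pod (name is illustrative)
kubectl get pods -n kube-system | grep elastic-agent
# print the state of each component managed by the agent
kubectl exec -n kube-system elastic-agent-abcde -- elastic-agent status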

@cmacknz
Member

cmacknz commented Dec 6, 2022

Hmm, there is definitely nothing obvious causing this in the log snippets here. If you have the complete set of agent logs, or even better diagnostics collected with elastic-agent diagnostics collect, it would help.

It may be worth retesting this after elastic/beats#33921 is merged, as the bugs fixed there have been causing several problems with the wrong input type being started or the wrong data streams being used.
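
For completeness, a minimal way to collect those diagnostics from the running DaemonSet pod and copy the archive out; the pod name, working directory and archive name below are illustrative:

# generate the diagnostics archive inside the agent container
kubectl exec -n kube-system elastic-agent-abcde -- elastic-agent diagnostics collect
# copy it out of the pod; the exact path and file name will differ
kubectl cp kube-system/elastic-agent-abcde:/usr/share/elastic-agent/elastic-agent-diagnostics-<timestamp>.zip ./diagnostics.zip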

@MichaelKatsoulis
Contributor Author

MichaelKatsoulis commented Dec 7, 2022

I packaged elastic-agent from the latest 8.6 branch, which includes the fix you mention, and ran it inside Kubernetes with the Kubernetes integration. The agent is still unhealthy. One more error I see is that right after it creates the new components for logs and metrics it logs:

{"log.level":"info","@timestamp":"2022-12-07T15:04:04.311Z","message":"Exiting: error loading config file: stat filebeat.yml: no such file or directory","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-a75bb440-7611-11ed-a1bd-3d459848770d","type":"filestream"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-12-07T15:04:05.024Z","message":"Exiting: error loading config file: stat metricbeat.yml: no such file or directory","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"kubernetes/metrics-a75bb440-7611-11ed-a1bd-3d459848770d","type":"kubernetes/metrics"},"ecs.version":"1.6.0"}

Diagnostics file
elastic-agent-diagnostics-2022-12-07T15-14-29Z-00.zip

@cmacknz
Member

cmacknz commented Dec 8, 2022

Hmm, error loading config file: stat filebeat.yml suggests something is still wrong with the container paths. @michalpristas is probably the best person to take a look at this since he has fixed a few related issues already.

@MichaelKatsoulis
Contributor Author

Latest try.

  1. With the docker.elastic.co/beats/elastic-agent:8.6.0-SNAPSHOT agent image. Still no logs or metrics collected:
    elastic-agent-diagnostics-2022-12-13T09-11-31Z-00.zip

  2. With a custom agent image built from the Dockerfile created by DEV=true EXTERNAL=true PLATFORMS=linux/amd64 TYPES=docker mage package in the latest 8.6 branch (see the sketch after this list). Still nothing is collected:
    elastic-agent-diagnostics-2022-12-13T09-22-22Z-00.zip
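
As a sketch of the second attempt, a locally built image can be made available to the kind cluster along these lines; the image tag is illustrative and depends on the local build:

# load the locally built image onto the kind nodes
kind load docker-image docker.elastic.co/beats/elastic-agent:8.6.0-SNAPSHOT
# then point the image field of the elastic-agent DaemonSet manifest at the same tag and re-apply it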

@MichaelKatsoulis
Contributor Author

After #1936 I tested the latest agent image from https://snapshots.elastic.co/8.6.0-6aa72183/summary-8.6.0-SNAPSHOT.html#elastic-agent.

diagnostics:
elastic-agent-diagnostics-2022-12-14T10-29-31Z-00.zip

Metrics and logs are collected, but there is another kind of problem: variable substitution is not happening, so the logs do not contain any Kubernetes metadata. In other words, the add_fields processor is not working. I ran the elastic-agent inspect command inside the elastic-agent container.
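
For reference, this can be done from outside the cluster along these lines (pod name is illustrative):

# run inspect inside the agent container of the DaemonSet
kubectl exec -n kube-system elastic-agent-abcde -- elastic-agent inspect

The relevant excerpt of the output, with the variables still unresolved: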

id: filestream-container-logs-e36ece52-2118-4351-bd1c-878476fb78ce
  meta:
    package:
      name: kubernetes
      version: 1.29.2
  name: kubernetes-1
  package_policy_id: e36ece52-2118-4351-bd1c-878476fb78ce
  revision: 2
  streams:
  - data_stream:
      dataset: kubernetes.container_logs
      type: logs
    id: kubernetes-container-logs-${kubernetes.pod.name}-${kubernetes.container.id}
    parsers:
    - container:
        format: auto
        stream: all
    paths:
    - /var/log/containers/*${kubernetes.container.id}.log
    prospector.scanner.symlinks: true
  type: filestream
  use_output: 92408750-7b86-11ed-8e60-1ff6d9b1b365

Same for metrics:

  streams:
  - add_metadata: true
    condition: ${kubernetes_leaderelection.leader} == true
    data_stream:
      dataset: kubernetes.event
      type: metrics

While troubleshooting the code with @ChrsMark, we found out that the kubernetes provider adds the mappings and processors to the events it emits to the controller.

We tried it out locally with this config:

providers:
  kubernetes:
    kube_config: /home/chrismark/.kube/config
    node: "kind-control-plane"

inputs:
  - name: filestream-container_logs
    type: filestream
    use_output: default
    streams:
      - data_stream:
          dataset: container_logs
          type: logs
        exclude_files:
          - .gz$
        exclude_lines:
          - ^\s+[\-`('.|_]
        parsers:
          - container:
              format: auto
              stream: 'all'
        paths:
          - /var/log/containers/*${kubernetes.container.id}.log

and the result of inspect is

./elastic-agent -v inspect --variables         
agent:
  logging:
    to_stderr: true
inputs: []
outputs:
  default:
    api-key: example-key
    hosts:
    - 127.0.0.1:9200
    type: elasticsearch
providers:
  kubernetes:
    kube_config: /home/chrismark/.kube/config
    node: kind-control-plane

which means that variable resolution is not working.

@ChrsMark
Member

Hey! Here is another test, more realistic:

agent.yml: (change the kube_config to point to your config)

providers:
  kubernetes:
    kube_config: /home/chrismark/.kube/config
    node: "kind-control-plane"

inputs:
  - name: container-log
    id: container-log-${kubernetes.pod.name}-${kubernetes.container.id}
    type: filestream
    use_output: default
    meta:
      package:
        name: kubernetes
        version: 1.9.0
    data_stream:
      namespace: default
    streams:
      - data_stream:
          dataset: kubernetes.container_logs
          type: logs
        condition: ${kubernetes.labels.app} == 'redis'
        prospector.scanner.symlinks: true
        parsers:
          - container: ~
          # - ndjson:
          #     target: json
          # - multiline:
          #     type: pattern
          #     pattern: '^\['
          #     negate: true
          #     match: after
        paths:
          - /var/log/containers/*${kubernetes.container.id}.log
  - name: redis
    type: redis/metrics
    use_output: default
    meta:
      package:
        name: redis
        version: 0.3.6
    data_stream:
      namespace: default
    streams:
      - data_stream:
          dataset: redis.info
          type: metrics
        metricsets:
          - info
        hosts:
          - '${kubernetes.pod.ip}:6379'
        idle_timeout: 20s
        maxconn: 10
        network: some/${kubernetes.container.id}
        period: 10s
        condition: ${kubernetes.labels.app} == 'redis'

Then deploy the target Pod:

redis.yml

apiVersion: v1
kind: Pod
metadata:
  name: redis
  labels:
    k8s-app: redis
    app: redis
spec:
  containers:
  - image: redis
    imagePullPolicy: IfNotPresent
    name: redis
    ports:
    - name: redis
      containerPort: 6379
      protocol: TCP
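
Assuming the manifest above is saved as redis.yml:

kubectl apply -f redis.yml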

Then, from the latest main (5fa81b59c5dc2fee74ff5c456e87466e658bd135), I ran ./elastic-agent -v inspect --variables and the inputs are not populated at all.

Running ./elastic-agent -v inspect output -o default for 8.4 and 8.5, I see the inputs being populated properly.

Is this something we are missing when running the new form of the inspect command?

@michalpristas
Contributor

It may be that the inspect command is broken. What do you see when you run the diagnostics command, unzip the archive, and check computed_config.yaml?
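
Roughly, run against the running agent (for example inside the agent container); the archive name will differ and the file layout inside it may vary:

elastic-agent diagnostics collect
unzip elastic-agent-diagnostics-<timestamp>.zip -d diag
# locate the computed configuration inside the extracted archive
find diag -name computed_config.yaml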

@MichaelKatsoulis
Contributor Author

MichaelKatsoulis commented Dec 14, 2022

Good catch. I checked the computed_config.yaml and I can see the processors being added, but they are at the higher, input level. Example:

- data_stream:
    namespace: default
  id: filestream-container-logs-e36ece52-2118-4351-bd1c-878476fb78ce-kubernetes-54075aa3-aa9b-46f0-8fe9-d0cc4317b012.kindnet-cni
  meta:
    package:
      name: kubernetes
      version: 1.29.2
  name: kubernetes-1
  original_id: filestream-container-logs-e36ece52-2118-4351-bd1c-878476fb78ce
  package_policy_id: e36ece52-2118-4351-bd1c-878476fb78ce
  policy:
    revision: 4
  processors:
  - add_fields:
      fields:
        id: c699fc238f42bc2453bcc9c2f6638f7b3d73b80a23cc7f6d17c0e6faad84709b
        image:
          name: docker.io/kindest/kindnetd:v20210326-1e038dc5
        runtime: containerd
      target: container
  - add_fields:
      fields:
        cluster:
          name: kind
          url: kind-control-plane:6443
      target: orchestrator
  - add_fields:
      fields:
        container:
          name: kindnet-cni
        daemonset:
          name: kindnet
        labels:
          app: kindnet
          controller-revision-hash: 5b547684d9
          k8s-app: kindnet
          pod-template-generation: "1"
          tier: node
        namespace: kube-system
        namespace_labels:
          kubernetes_io/metadata_name: kube-system
        namespace_uid: 441aaf27-f736-4860-b07e-896c63e61fd4
        node:
          hostname: kind-worker2
          labels:
            beta_kubernetes_io/arch: amd64
            beta_kubernetes_io/os: linux
            kubernetes_io/arch: amd64
            kubernetes_io/hostname: kind-worker2
            kubernetes_io/os: linux
          name: kind-worker2
          uid: 43854694-93ab-422e-8f30-0e20fb2e7b85
        pod:
          ip: 172.18.0.3
          name: kindnet-8b884
          uid: 54075aa3-aa9b-46f0-8fe9-d0cc4317b012
      target: kubernetes
  revision: 2
  streams:
  - data_stream:
      dataset: kubernetes.container_logs
      type: logs
    id: kubernetes-container-logs-kindnet-8b884-c699fc238f42bc2453bcc9c2f6638f7b3d73b80a23cc7f6d17c0e6faad84709b
    parsers:
    - container:
        format: auto
        stream: all
    paths:
    - /var/log/containers/*c699fc238f42bc2453bcc9c2f6638f7b3d73b80a23cc7f6d17c0e6faad84709b.log
    prospector.scanner.symlinks: true
  type: filestream

But the fields added by the processors are not populated in Elasticsearch. Maybe the higher-level processors are not respected in the lower-level streams? Should I open a new issue for this?
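
To illustrate that hypothesis, this is roughly what the same snippet would look like if the add_fields processors were attached at the stream level instead; this is only a sketch of the idea, trimmed to a single processor, not a confirmed fix:

  streams:
  - data_stream:
      dataset: kubernetes.container_logs
      type: logs
    id: kubernetes-container-logs-kindnet-8b884-c699fc238f42bc2453bcc9c2f6638f7b3d73b80a23cc7f6d17c0e6faad84709b
    # sketch: same add_fields content as above, but scoped to the stream
    processors:
    - add_fields:
        fields:
          pod:
            name: kindnet-8b884
        target: kubernetes
    paths:
    - /var/log/containers/*c699fc238f42bc2453bcc9c2f6638f7b3d73b80a23cc7f6d17c0e6faad84709b.log
    prospector.scanner.symlinks: true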

@michalpristas
Contributor

michalpristas commented Dec 14, 2022

I'd go with a separate issue to track this.
@cmacknz is this something known/seen before?

@cmacknz
Member

cmacknz commented Dec 14, 2022

Not known, thanks for creating a new issue.

@michalpristas
Contributor

As metrics and logs are present, just without metadata, can we close this?
