
elastic-agent standalone not collecting logs from multiple containers in the same pod #2375

Open
framsouza opened this issue Mar 15, 2023 · 15 comments
Labels: bug (Something isn't working), Team:Cloudnative-Monitoring (Label for the Cloud Native Monitoring team)

Comments

@framsouza

framsouza commented Mar 15, 2023

We're currently having issues running elastic-agent standalone with hints (annotation-based) autodiscovery: the agent is only collecting logs from one container inside the pod.

To reproduce, spin up Elasticsearch with ECK and enable the hints annotations; you'll see something similar to the following:
(screenshot attached: image-20230228-131823)

Below you can see we're only getting logs from elastic-internal-init-filesystem, which is a sidecar container, but we don't get any logs from the elasticsearch container itself.

Checking inside the agent container, you'll see that the logs are being properly generated but not shipped:

> kubectl -n elastic-agent exec --stdin --tty elastic-agent-pcrsj -- /bin/bash
> ls /var/log/containers/ | grep dep-c81e0127a37f4d6a90492a46fb4dd04c
es-es-index-75c7b5848f-dt7bj_dep-c81e0127a37f4d6a90492a46fb4dd04c_elastic-internal-init-filesystem-9aa7ad9b000247c36857e9da8c1999caca25ee32efb3def9a978ddc66b195270.log
es-es-index-75c7b5848f-dt7bj_dep-c81e0127a37f4d6a90492a46fb4dd04c_elasticsearch-a8f4b9c70b318fed12ff46ba08f270f8e769332fac58415237698112a93c8703.log
es-es-search-55f95d9584-grcgz_dep-c81e0127a37f4d6a90492a46fb4dd04c_elastic-internal-init-filesystem-527502045dbf631c2331eb4f69d55a248c1c9b2f04a8727e972e174898d12687.log
es-es-search-55f95d9584-grcgz_dep-c81e0127a37f4d6a90492a46fb4dd04c_elasticsearch-f23ece635ee7e04eb5d60491662e6ded539446352c71df39190b1603a14eb591.log
kb-ui-kb-b86c99f6f-nrxjs_dep-c81e0127a37f4d6a90492a46fb4dd04c_elastic-internal-init-config-9bd46a48daa2ba5f37a54a4ca4f698dc0598fbd66ffa31e51f9842b542b15b92.log
kb-ui-kb-b86c99f6f-nrxjs_dep-c81e0127a37f4d6a90492a46fb4dd04c_elastic-internal-init-keystore-a0d5c67083d680a7eeb8b7ae33a92084663120b061e26d98b19adf5b6d3376b1.log
kb-ui-kb-b86c99f6f-nrxjs_dep-c81e0127a37f4d6a90492a46fb4dd04c_kibana-7b6bcbba08af4924610cbb57f4c2d5d944bb4f5dd53f9487378b7258aac4648e.log

I had a look at the hints.go code, and it seems the function responsible for collecting those logs has a containerID parameter typed as a string; I think it should be a struct (or collection) instead, so it can handle more than one container ID.

More insights about the issue here

@framsouza framsouza added the bug Something isn't working label Mar 15, 2023
@cmacknz cmacknz added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Mar 21, 2023
@ChrsMark
Member

Hey @framsouza, thanks for filing this issue.

The function you mention is called once per container, since we loop over the Pod's containers (see the linked code).

We will need to investigate this a bit to check what's going wrong here.

@ChrsMark
Member

ChrsMark commented Mar 21, 2023

So I tried to reproduce the issue, and it doesn't seem to be a problem in the hints codebase, as I already mentioned in the previous comment.

Here is what I tried:

target sample Pod with 2 containers:

apiVersion: v1
kind: Pod
metadata:
  name: redis
  annotations:
    co.elastic.hints/package: redis
    co.elastic.hints/data_streams: info, log
    co.elastic.hints/host: '${kubernetes.pod.ip}:6379'
    co.elastic.hints/info.period: 1m
  labels:
    k8s-app: redis
    app: redis
spec:
  containers:
  - image: redis
    imagePullPolicy: IfNotPresent
    name: redis
    ports:
    - name: redis
      containerPort: 6379
      protocol: TCP
    command:
      - redis-server
      - "--requirepass 'myred1sp@ss'"
  - image: redis
    imagePullPolicy: IfNotPresent
    name: redis2
    ports:
      - name: redis
        containerPort: 6379
        protocol: TCP
    command:
      - redis-server
      - "--requirepass 'myred1sp@ss'"

agent's config:

providers:
  kubernetes:
    kube_config: /home/chrismark/.kube/config
    node: "kind-control-plane"
    hints.enabled: true

inputs:
  - name: filestream-redis
    type: filestream
    use_output: default
    streams:
      - condition: ${kubernetes.hints.redis.log.enabled} == true or ${kubernetes.hints.redis.enabled} == true
        data_stream:
          dataset: redis.log
          type: logs
        exclude_files:
          - .gz$
        exclude_lines:
          - ^\s+[\-`('.|_]
        parsers:
          - container:
              format: auto
              stream: ${kubernetes.hints.redis.log.stream|'all'}
        paths:
          - /var/log/containers/*${kubernetes.hints.container_id}.log
        prospector:
          scanner:
            symlinks: true
        tags:
          - redis-log

Then running the inspect command to evaluate the produced variables:

./elastic-agent inspect -v --variables --variables-wait 2s

the produced output:

agent:
  logging:
    to_stderr: true
inputs:
- id: kubernetes-dd54c599-cf20-40db-8394-e1f4458c05a7.redis
  name: filestream-redis
  processors:
  - add_fields:
      fields:
        id: db404a606cf0e00b5fb596b645a87d5197155bb903e3b8f8389dabac5245c262
        image:
          name: redis
        runtime: containerd
      target: container
  - add_fields:
      fields:
        container:
          name: redis
        labels:
          app: redis
          k8s-app: redis
        namespace: default
        namespace_labels:
          kubernetes_io/metadata_name: default
        namespace_uid: a5692398-e32f-4332-979f-97f6f040faa3
        node:
          hostname: kind-control-plane
          labels:
            beta_kubernetes_io/arch: amd64
            beta_kubernetes_io/os: linux
            kubernetes_io/arch: amd64
            kubernetes_io/hostname: kind-control-plane
            kubernetes_io/os: linux
            node-role_kubernetes_io/control-plane: ""
            node_kubernetes_io/exclude-from-external-load-balancers: ""
          name: kind-control-plane
          uid: 1e3c36d2-f3e7-4d1f-ba66-75f72102cf22
        pod:
          ip: 10.244.0.7
          name: redis
          uid: dd54c599-cf20-40db-8394-e1f4458c05a7
      target: kubernetes
  - add_fields:
      fields:
        cluster:
          name: kind-kind
          url: https://127.0.0.1:35435
      target: orchestrator
  streams:
  - data_stream:
      dataset: redis.log
      type: logs
    exclude_files:
    - .gz$
    exclude_lines:
    - ^\s+[\-`('.|_]
    parsers:
    - container:
        format: auto
        stream: all
    paths:
    - /var/log/containers/*db404a606cf0e00b5fb596b645a87d5197155bb903e3b8f8389dabac5245c262.log
    prospector:
      scanner:
        symlinks: true
    tags:
    - redis-log
  type: filestream
  use_output: default
- id: kubernetes-dd54c599-cf20-40db-8394-e1f4458c05a7.redis2
  name: filestream-redis
  processors:
  - add_fields:
      fields:
        id: 4a8630bdf661ddb23a697f1acb547a8487803a66a61cdeace1550ea7169a6f8e
        image:
          name: redis
        runtime: containerd
      target: container
  - add_fields:
      fields:
        container:
          name: redis2
        labels:
          app: redis
          k8s-app: redis
        namespace: default
        namespace_labels:
          kubernetes_io/metadata_name: default
        namespace_uid: a5692398-e32f-4332-979f-97f6f040faa3
        node:
          hostname: kind-control-plane
          labels:
            beta_kubernetes_io/arch: amd64
            beta_kubernetes_io/os: linux
            kubernetes_io/arch: amd64
            kubernetes_io/hostname: kind-control-plane
            kubernetes_io/os: linux
            node-role_kubernetes_io/control-plane: ""
            node_kubernetes_io/exclude-from-external-load-balancers: ""
          name: kind-control-plane
          uid: 1e3c36d2-f3e7-4d1f-ba66-75f72102cf22
        pod:
          ip: 10.244.0.7
          name: redis
          uid: dd54c599-cf20-40db-8394-e1f4458c05a7
      target: kubernetes
  - add_fields:
      fields:
        cluster:
          name: kind-kind
          url: https://127.0.0.1:35435
      target: orchestrator
  streams:
  - data_stream:
      dataset: redis.log
      type: logs
    exclude_files:
    - .gz$
    exclude_lines:
    - ^\s+[\-`('.|_]
    parsers:
    - container:
        format: auto
        stream: all
    paths:
    - /var/log/containers/*4a8630bdf661ddb23a697f1acb547a8487803a66a61cdeace1550ea7169a6f8e.log
    prospector:
      scanner:
        symlinks: true
    tags:
    - redis-log
  type: filestream
  use_output: default
outputs:
  default:
    api-key: example-key
    hosts:
    - 127.0.0.1:9200
    type: elasticsearch
providers:
  kubernetes:
    hints:
      enabled: true
    kube_config: /home/chrismark/.kube/config
    node: kind-control-plane

As we can see, there are 2 inputs populated with the proper paths:

/var/log/containers/*db404a606cf0e00b5fb596b645a87d5197155bb903e3b8f8389dabac5245c262.log and
/var/log/containers/*4a8630bdf661ddb23a697f1acb547a8487803a66a61cdeace1550ea7169a6f8e.log.

I will also try with init containers, just in case we are missing something there. In the meantime, @framsouza, it would help if you could provide the Agent's diagnostics so we can check the status of your populated inputs.

Last but not least, it seems we need to update our hints templates, like the one at https://github.com/elastic/elastic-agent/blob/main/deploy/kubernetes/elastic-agent-standalone/templates.d/redis.yml#L3, to also include the id, similarly to what we have at https://github.com/elastic/integrations/blob/main/packages/kubernetes/data_stream/container_logs/agent/stream/stream.yml.hbs#L1, for the reasons explained at elastic/integrations#3672.
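For illustration, here is a rough sketch of what adding an id to such a hints template stream could look like, based on the redis log stream from the reproduction above; the exact id value is only an assumption, not the final convention:

- condition: ${kubernetes.hints.redis.log.enabled} == true or ${kubernetes.hints.redis.enabled} == true
  data_stream:
    dataset: redis.log
    type: logs
  # hypothetical id, derived per container similarly to the kubernetes container_logs stream
  id: redis-log-${kubernetes.hints.container_id}
  parsers:
    - container:
        format: auto
        stream: ${kubernetes.hints.redis.log.stream|'all'}
  paths:
    - /var/log/containers/*${kubernetes.hints.container_id}.log
  prospector:
    scanner:
      symlinks: true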

@gizas @mlunadia raising this with you since all these would need some extra capacity from the team.

@framsouza
Author

@ChrsMark I'll provide the elastic-agent diagnostics shortly. Just for clarity, this problem happens when there's an initContainer, not a sidecar. If you spin up ECK you'll see an initContainer called elastic-internal-init-filesystem; only the logs from this initContainer are being ingested, and we're missing the logs from the main container.

@ChrsMark
Member

Thanks for clarifying @framsouza. Could you also provide the annotations you use and how you configure Elastic Agent (the k8s manifest would be enough)?

@framsouza
Author

framsouza commented Mar 21, 2023

@ChrsMark the test was performed on the MKI environment; you can check the labels that were being used here (see the configmap).

@framsouza
Author

framsouza commented Mar 21, 2023

@ChrsMark I'm running this configmap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: elastic-agent-inputs
  labels:
    k8s-app: elastic-agent
data:
  inputs.yml: |-
    inputs:
      - name: container-log
        condition: startsWith(${kubernetes.pod.name}, "elastic-agent") != true && ${kubernetes.hints.logs.enabled} == true
        type: filestream
        processors:
          - add_fields:
              target: orchestrator.cluster
              fields:
                name: {{ .Values.cluster.name }}
        use_output: logs
        meta:
          package:
            name: kubernetes
            version: 1.9.0
        data_stream:
          namespace: default
        streams:
          - data_stream:
              dataset: kubernetes.container_logs
              type: logs
            prospector.scanner.symlinks: true
            parsers:
              - container: ~
              - ndjson:
                  target: json
                  add_error_key: true
                  message_key: message
                  overwrite_keys: true
                  ignore_decoding_error: true
              # - multiline:
              #     type: pattern
              #     pattern: '^\['
              #     negate: true
              #     match: after
            paths:
              - /var/log/containers/*${kubernetes.container.id}.log

And,

apiVersion: v1
kind: ConfigMap
metadata:
  name: elastic-agent-datastreams
  labels:
    k8s-app: elastic-agent
data:
  elastic-agent.yml: |-  
    outputs:
      metrics:
        type: elasticsearch
        hosts:
          - >-
            ${ELASTICSEARCH_METRICS_HOST}
        username: ${ELASTICSEARCH_METRICS_USERNAME}
        password: ${ELASTICSEARCH_METRICS_PASSWORD}
      logs:
        type: elasticsearch
        allow_older_versions: true
        hosts:
          - >-
            ${ELASTICSEARCH_LOGS_HOST}
        username: ${ELASTICSEARCH_LOGS_USERNAME}
        password: ${ELASTICSEARCH_LOGS_PASSWORD}
      monitoring:
        type: elasticsearch
        allow_older_versions: true
        hosts:
          - >-
            ${ELASTICSEARCH_MONITORING_HOST}
        username: ${ELASTICSEARCH_MONITORING_USERNAME}
        password: ${ELASTICSEARCH_MONITORING_PASSWORD}
    agent:
      monitoring:
        enabled: true
        logs: true
        metrics: true
        use_output: monitoring
      logging:
        level: {{ .Values.agent.log_level | default "info" }}
        to_stderr: true
        json: true
        use_output: logs
    providers:
      kubernetes_leaderelection:
        enabled: false
      kubernetes:
        node: ${NODE_NAME}
        scope: node
        hints.enabled: true

With the following daemonset:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: elastic-agent
  labels:
    app: elastic-agent
spec:
  selector:
    matchLabels:
      app: elastic-agent
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 34%
  template:
    metadata:
      labels:
        app: elastic-agent
    spec:    
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      serviceAccountName: elastic-agent
      hostNetwork: false
      dnsPolicy: ClusterFirstWithHostNet
      initContainers:
#      - name: k8s-templates-downloader
#        image: busybox:1.28
#        command: ['sh']
#        args:
#          - -c
#          - >-
#            mkdir -p /etc/elastic-agent/inputs.d &&
#            wget -O - https://github.com/elastic/elastic-agent/archive/main.tar.gz | tar xz -C /etc/elastic-agent/inputs.d --strip=5 "elastic-agent-main/deploy/kubernetes/elastic-agent-standalone/templates.d"
#        volumeMounts:
#          - name: external-inputs
#            mountPath: /etc/elastic-agent/inputs.d       
      containers:       
        - name: elastic-agent
          image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
          args: [
              "-c", "/etc/elastic-agent/elastic-agent.yml",
              "-e",
          ]
          envFrom:
            - secretRef:
                name: elastic-agent-secrets
{{- if .Values.extraEnvsFrom }}
{{ toYaml .Values.extraEnvsFrom | nindent 12 }}
{{- end }}
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: CONFIG_PATH
              value: "/etc/elastic-agent/"
{{- if .Values.extraEnvs }}
{{ toYaml .Values.extraEnvs | nindent 12 }}
{{- end }}
          securityContext:
            runAsUser: 0
          resources:
            limits:
              memory: {{ .Values.agent.daemonset.resources.limits.memory }}
            requests:
              cpu: {{ .Values.agent.daemonset.resources.requests.cpu }}
              memory: {{ .Values.agent.daemonset.resources.requests.memory }}
          volumeMounts:
            - name: datastreams
              mountPath: /etc/elastic-agent/elastic-agent.yml
              readOnly: true
              subPath: elastic-agent.yml
            - name: inputs
              mountPath: /etc/elastic-agent/inputs.d/
              readOnly: true
            - name: proc
              mountPath: /hostfs/proc
              readOnly: true
            - name: cgroup
              mountPath: /hostfs/sys/fs/cgroup
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: journald
              mountPath: /var/log/journal
              readOnly: true
            - name: dockersock
              mountPath: /var/run/docker.sock          
      volumes:
#        - name: external-inputs
#          emptyDir: {}
        - name: datastreams
          configMap:
            defaultMode: 0640
            name:  elastic-agent-datastreams
        - name: inputs
          projected:
            sources:
            - configMap:
                name:  elastic-agent-inputs
        - name: proc
          hostPath:
            path: /proc
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: varlog
          hostPath:
            path: /var/log
        - name: journald
          hostPath:
            path: /var/log/journal
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock

This runs against the vault-ci-dev GKE cluster; there's a vault StatefulSet running in the vault-ci-dev namespace, and the log output is being sent to the logging cluster (RLC). If you access the cluster, you'll see the following output for the vault-ci-dev namespace:

kubectl get pods -n vault-ci-dev            
NAME             READY   STATUS    RESTARTS   AGE
vault-ci-dev-0   3/3     Running   0          27d
vault-ci-dev-1   3/3     Running   0          27d
vault-ci-dev-2   3/3     Running   0          27d

Each vault pod has 3 containers (filebeat, logrotate and vault). If you check the saved search, you'll see we only get logs from the filebeat container; we're missing logs from vault and logrotate.

See the elastic-agent inspect output:
inspect.txt

I didn't find a container.name called vault-ci-dev in the file mentioned above, only filebeat. The log files do exist inside the elastic-agent pod, see:

ls -la /var/log/containers/
total 108
drwxr-xr-x 2 root root 4096 Mar 21 19:53 .
drwxr-xr-x 7 root root 4096 Mar 21 00:00 ..
lrwxrwxrwx 1 root root   98 Mar 20 09:04 auditbeat-b64tp_firelink-system_auditbeat-f617e4ccad8d8b1e48c57c99d4692d768227f5bffe2b5f82e54df700fd374f09.log -> /var/log/pods/firelink-system_auditbeat-b64tp_91ece8a1-232a-4a0f-bc94-8ec585ea32c0/auditbeat/0.log
lrwxrwxrwx 1 root root  112 Mar 20 09:04 auditbeat-b64tp_firelink-system_disable-journald-auditd-31e2d23834872a1beac77a43098f8e5958d1517103ab3df6f09864b5283eeaa2.log -> /var/log/pods/firelink-system_auditbeat-b64tp_91ece8a1-232a-4a0f-bc94-8ec585ea32c0/disable-journald-auditd/0.log
lrwxrwxrwx 1 root root  107 Mar 21 17:45 ea-2411e-vgs4-filebeat-ctgls_elastic-apps_filebeat-d4d11dfdba73ad9319e231e08747deb4bf3fe4f041e78fc9537bfaaee61dbc56.log -> /var/log/pods/elastic-apps_ea-2411e-vgs4-filebeat-ctgls_3632eeec-71f1-4dc4-afce-28229d16f97c/filebeat/0.log
lrwxrwxrwx 1 root root  113 Mar 21 17:45 ea-2411e-vgs4-filebeat-ctgls_elastic-apps_filebeat-setup-67eb7f808ab1a126b44292ddde9fabe2c49571d398bbe2b18fdb0aef06fbb921.log -> /var/log/pods/elastic-apps_ea-2411e-vgs4-filebeat-ctgls_3632eeec-71f1-4dc4-afce-28229d16f97c/filebeat-setup/0.log
lrwxrwxrwx 1 root root  111 Mar 15 11:31 ea-2411e-vgs4-metricbeat-pdt5z_elastic-apps_metricbeat-ffcac7b440032e1436bfd8033e6106b30456a16e2ee5e82fc48a3a4ff0b9e2c8.log -> /var/log/pods/elastic-apps_ea-2411e-vgs4-metricbeat-pdt5z_aaa49e9e-a064-4804-805e-46547f59452c/metricbeat/0.log
lrwxrwxrwx 1 root root  117 Mar 15 11:31 ea-2411e-vgs4-metricbeat-pdt5z_elastic-apps_metricbeat-setup-720796229636a6da99fa7b4393672e0abee6bdd1d70f509c1f96056967fb04b9.log -> /var/log/pods/elastic-apps_ea-2411e-vgs4-metricbeat-pdt5z_aaa49e9e-a064-4804-805e-46547f59452c/metricbeat-setup/0.log
lrwxrwxrwx 1 root root  115 Jan 30 16:55 ea-cert-manager-6c6b8cf99b-6zldj_elastic-apps_cert-manager-0be54d3896e41879742c8040f11d5058a9340d23acb9973613c37a1f85521f24.log -> /var/log/pods/elastic-apps_ea-cert-manager-6c6b8cf99b-6zldj_6055a6b5-0760-468c-b938-f1c6b0fbe171/cert-manager/0.log
lrwxrwxrwx 1 root root  123 Jan 30 16:55 ea-cert-manager-webhook-7fb9557c7c-b7672_elastic-apps_cert-manager-c08c0020754a3499b79311d0a5ef45582913033f523c76707713ec82bc9608ea.log -> /var/log/pods/elastic-apps_ea-cert-manager-webhook-7fb9557c7c-b7672_49b3e8e6-3b5d-4637-a10a-9fcc1253af27/cert-manager/0.log
lrwxrwxrwx 1 root root   98 Mar 10 17:35 ea-llama-logstash-2_elastic-apps_logstash-57745bf54530c5788405104d233485f82c0faa19a6be39f632b6d4298ce35054.log -> /var/log/pods/elastic-apps_ea-llama-logstash-2_553760b2-a897-4153-8bb1-97779bc78c62/logstash/0.log
lrwxrwxrwx 1 root root  111 Jan 30 16:55 ea-llama-riemann-669f4dbff6-fhq5v_elastic-apps_riemann-a19507be1a54742044ea1752a5015737637fb7e17f6c56f18e06b7af2dec1c36.log -> /var/log/pods/elastic-apps_ea-llama-riemann-669f4dbff6-fhq5v_d6d53cd5-309c-4073-aa40-46fa7bd84520/riemann/0.log
lrwxrwxrwx 1 root root  133 Jan 30 16:56 ea-nginx-ingress-default-backend-69d5f86986-92tb8_elastic-apps_nginx-ingress-e170d1f1811f6bb2206145c521a182a1746c9b548c468cd0e535847560675a6f.log -> /var/log/pods/elastic-apps_ea-nginx-ingress-default-backend-69d5f86986-92tb8_64c76763-020b-41d4-87ff-a7e5684ff8cb/nginx-ingress/0.log
lrwxrwxrwx 1 root root  106 Mar  8 09:48 ea-nginx-ingress-r7rt8_elastic-apps_nginx-ingress-e910ba627dd6b5dad0fe25fe93512d79c93e4f51a9aa5a09bc578e5a1b8bf5b6.log -> /var/log/pods/elastic-apps_ea-nginx-ingress-r7rt8_ecfbe0c4-aade-427b-93b2-951a07a43573/nginx-ingress/0.log
lrwxrwxrwx 1 root root   98 Mar 21 19:28 elastic-agent-45knw_default_elastic-agent-42d179093245200959664103968dbe69d3fc24bce31819306c6929cb9fab2f35.log -> /var/log/pods/default_elastic-agent-45knw_343e2150-852c-4b80-a481-85fefa023b3f/elastic-agent/0.log
lrwxrwxrwx 1 root root  106 Mar 21 19:53 elastic-agent-4th4s_kube-system_elastic-agent-8a7c313bc1ab541725148cf3bd76d9afc588fed0ee10dd3d502354e99f911399.log -> /var/log/pods/kube-system_elastic-agent-4th4s_74cae20e-677b-4438-8432-75491ec12f24/elastic-agent/14089.log
lrwxrwxrwx 1 root root  123 Jan 30 16:57 konnectivity-agent-669b59dcbc-lvcvq_kube-system_konnectivity-agent-904c8df658485bcfd231bb50b321085ad5c5351ecc891f40eb973ef569c8fbab.log -> /var/log/pods/kube-system_konnectivity-agent-669b59dcbc-lvcvq_9408569d-b797-48bf-910e-96176489d65f/konnectivity-agent/0.log
lrwxrwxrwx 1 root root  102 Jan 30 16:56 kube-dns-674789b66b-f9tfj_kube-system_dnsmasq-b39b63125017c72ae7690bf87a6769dfd3a2a1d9b0b1c27fe170d4f549b7160c.log -> /var/log/pods/kube-system_kube-dns-674789b66b-f9tfj_6d0365d6-9111-4b56-af47-922114340fa3/dnsmasq/0.log
lrwxrwxrwx 1 root root  102 Jan 30 16:55 kube-dns-674789b66b-f9tfj_kube-system_kubedns-d15d70da2a324dd243a29bd4c742eafbceb45fe096abed25994dedc41bd3dd23.log -> /var/log/pods/kube-system_kube-dns-674789b66b-f9tfj_6d0365d6-9111-4b56-af47-922114340fa3/kubedns/0.log
lrwxrwxrwx 1 root root  102 Jan 30 16:56 kube-dns-674789b66b-f9tfj_kube-system_sidecar-48cc24733c3c4b9802af8db29cdcb72bf31c0e7b42e2162ffa743d4dc1040a6d.log -> /var/log/pods/kube-system_kube-dns-674789b66b-f9tfj_6d0365d6-9111-4b56-af47-922114340fa3/sidecar/0.log
lrwxrwxrwx 1 root root  138 Jan 30 16:54 kube-proxy-gke-elastic-apps-vau-elastic-apps-vau-0f62411e-vgs4_kube-system_kube-proxy-916e0aaed5311e7cecfdf674ef6e8a4b30b4801e07b1a6379841a692c38f427a.log -> /var/log/pods/kube-system_kube-proxy-gke-elastic-apps-vau-elastic-apps-vau-0f62411e-vgs4_cf7f2620e14b362714bec6419270d2ea/kube-proxy/0.log
lrwxrwxrwx 1 root root  106 Jan 30 16:54 pdcsi-node-jxxtc_kube-system_csi-driver-registrar-77c85163b8f2c83c89b22c3ae6c39f69528c2e15808fad435edd34a65d9ace75.log -> /var/log/pods/kube-system_pdcsi-node-jxxtc_6d978c72-88b4-4a43-9680-31219e058ff5/csi-driver-registrar/0.log
lrwxrwxrwx 1 root root   99 Jan 30 16:54 pdcsi-node-jxxtc_kube-system_gce-pd-driver-74b7b705d0b73d3bd4e782e7196fccbb3a9f969ee15a36f0ca7ec5386bc42f8a.log -> /var/log/pods/kube-system_pdcsi-node-jxxtc_6d978c72-88b4-4a43-9680-31219e058ff5/gce-pd-driver/0.log
lrwxrwxrwx 1 root root   93 Feb 22 18:27 vault-ci-dev-0_vault-ci-dev_filebeat-ca93190e1cf0f96624349b601bc2719a856cc088b517d32cf03df6cf1bc4f866.log -> /var/log/pods/vault-ci-dev_vault-ci-dev-0_e0378ddf-6dd6-4891-8835-31df4ad34305/filebeat/0.log
lrwxrwxrwx 1 root root   94 Feb 22 18:27 vault-ci-dev-0_vault-ci-dev_logrotate-3e335544916342d9b155f9174f25ea6cbe7683d56efc503a165267ec798f64a6.log -> /var/log/pods/vault-ci-dev_vault-ci-dev-0_e0378ddf-6dd6-4891-8835-31df4ad34305/logrotate/0.log
lrwxrwxrwx 1 root root   90 Feb 22 18:27 vault-ci-dev-0_vault-ci-dev_vault-7fbd4418cc9e7632e5a7fd7799773a629261c9237aab2e336ec95e902b096383.log -> /var/log/pods/vault-ci-dev_vault-ci-dev-0_e0378ddf-6dd6-4891-8835-31df4ad34305/vault/0.log
lrwxrwxrwx 1 root root  136 Jan 30 16:55 vertical-pod-autoscaler-recommender-94ddc765d-wslbq_firelink-system_recommender-498ecacff8d99cddb036b234def5f65bf1cc3abcb882ee2055111d702f7189bf.log -> /var/log/pods/firelink-system_vertical-pod-autoscaler-recommender-94ddc765d-wslbq_f2040b36-f16c-46fe-8be4-fd97a2cb8c7b/recommender/0.log

@framsouza
Author

Attaching the elastic-agent diagnostics:
elastic-agent-diagnostics-2023-03-21T21-37-37Z-00.zip

@ChrsMark
Member

ChrsMark commented Mar 22, 2023

Thanks for the extra info @framsouza!

Let me try to put all of this in order.

So first of all we have the following input:

    inputs:
      - name: container-log
        condition: startsWith(${kubernetes.pod.name}, "elastic-agent") != true && ${kubernetes.hints.logs.enabled} == true
        type: filestream
        processors:
          - add_fields:
              target: orchestrator.cluster
              fields:
                name: {{ .Values.cluster.name }}
        use_output: logs
        meta:
          package:
            name: kubernetes
            version: 1.9.0
        data_stream:
          namespace: default
        streams:
          - data_stream:
              dataset: kubernetes.container_logs
              type: logs
            prospector.scanner.symlinks: true
            parsers:
              - container: ~
              - ndjson:
                  target: json
                  add_error_key: true
                  message_key: message
                  overwrite_keys: true
                  ignore_decoding_error: true
              # - multiline:
              #     type: pattern
              #     pattern: '^\['
              #     negate: true
              #     match: after
            paths:
              - /var/log/containers/*${kubernetes.container.id}.log

So here I see that you are mixing 2 different things. First you use kubernetes.hints.logs.enabled in the condition, which implies that you expect a mapping coming from hints, and then you use - /var/log/containers/*${kubernetes.container.id}.log in the paths.
These are 2 different things, because the kubernetes provider in Agent will either emit kubernetes.hints.* mappings or kubernetes.* mappings without the hints prefix.
In the diagnostics you shared at #2375 (comment) there is no kubernetes.hints mapping populated, which is weird, while at the same time you do have kubernetes.* mappings.

Also, I see that you have removed the init container that downloads the hints templates, but that's OK since you are trying to define your own templates. However, if you are defining your own templates you can skip the whole hints mechanism and just define your own conventions. The hints feature is only useful if you are willing to use the out-of-the-box conventions coming from the predefined templates at https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-standalone/templates.d. And keep in mind that the hints feature is still in beta.

Having said this, a working template should look like the one at https://github.com/elastic/elastic-agent/blob/main/deploy/kubernetes/elastic-agent-standalone/templates.d/cassandra.yml#L21; note that the path is of the form /var/log/containers/*${kubernetes.hints.container_id}.log.
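To make that concrete, here is a rough sketch (not an official template) of how the streams section of the container-log input quoted above would change if it keeps its hints-based condition; only the path changes, everything else stays the same:

    streams:
      - data_stream:
          dataset: kubernetes.container_logs
          type: logs
        prospector.scanner.symlinks: true
        parsers:
          - container: ~
        paths:
          # hints-emitted container id, instead of ${kubernetes.container.id}
          - /var/log/containers/*${kubernetes.hints.container_id}.log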

So in that case I wonder how the provided config was working at all, but the templates definitely need to be fixed.

Can you please fix this and try again? It would also help if you could provide a minimal example with target Pods that have multiple containers, so we have a specific case to reproduce. Debugging against an active vault-ci-dev GKE cluster is not a good idea.

For reference, I have spotted an issue in the hints codebase: we only emit hints mappings for containers that have defined Ports, which means containers with no Ports defined will not get a mapping emitted. This is an issue and I will push a fix for it; however, I will wait to see if we can spot any additional issues here.
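As a minimal illustration (names are arbitrary), a container spec like the following declares no ports and would therefore currently get no hints mapping, and so no log path:

    spec:
      containers:
      - name: worker        # no ports section, so no hints mapping is emitted for it
        image: busybox
        command: ["sh", "-c", "while true; do date; sleep 5; done"]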

@framsouza
Author

This PR seems to be a potential fix: #2386

@framsouza
Author

framsouza commented Mar 22, 2023

I've made the changes @ChrsMark suggested above; the DaemonSet is now using the initContainer to download the templates:

      initContainers:
      - name: k8s-templates-downloader
        image: busybox:1.28
        command: ['sh']
        args:
          - -c
          - >-
            mkdir -p /etc/elastic-agent/inputs.d &&
            wget -O - https://github.com/elastic/elastic-agent/archive/main.tar.gz | tar xz -C /etc/elastic-agent/inputs.d --strip=5 "elastic-agent-main/deploy/kubernetes/elastic-agent-standalone/templates.d"
        volumeMounts:
          - name: external-inputs
            mountPath: /etc/elastic-agent/inputs.d       

However, once I applied the new configuration the logs stopped flowing, and as @ChrsMark mentioned, it might be because we're missing a generic input for all containers in templates.d/.

@framsouza
Author

@ChrsMark I was wondering if this PR might be a potential fix for collecting logs from all the containers in the same pod. If it fixes the issue, we can use our own inputs until the generic template to collect logs from all containers is implemented. What do you think?

@ChrsMark
Member

@framsouza #2386 is something we definitely need, so I will go ahead and try to have it merged soon. Let's see if the issue is fixed for you after that.

In general you can add a generic input to Agent's config that catches logs from all containers; just make sure you follow the proper convention. However, this is something we need to fix on our side. Do you mind filing a new issue for this so we can prioritize it properly?
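For example, a rough sketch of such a generic catch-all input, assuming the non-hints kubernetes provider variables (this mirrors the earlier container-log input without the hints condition):

    inputs:
      - name: generic-container-log
        type: filestream
        use_output: logs
        # match every container on the node except the agent itself
        condition: startsWith(${kubernetes.pod.name}, "elastic-agent") != true
        streams:
          - data_stream:
              dataset: kubernetes.container_logs
              type: logs
            prospector.scanner.symlinks: true
            parsers:
              - container: ~
            paths:
              - /var/log/containers/*${kubernetes.container.id}.log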

Last but not least, as I already mentioned, hints work properly when you use specific packages in the hints, like the one at #2375 (comment). Having said this, if you can provide a minimal example (like the one I shared at #2375 (comment)) of what is not working for you, it would help us understand which case we don't cover. So we would need a specific target Pod with annotations attached and the corresponding Agent configuration/template. Thanks.
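For reference, the kind of minimal reproduction that would help is something like the Pod below, which mirrors the ECK case with an init container plus a main container (the names and the redis package annotation are placeholders reused from the earlier example):

    apiVersion: v1
    kind: Pod
    metadata:
      name: redis-with-init
      annotations:
        co.elastic.hints/package: redis
    spec:
      initContainers:
      - name: init-filesystem        # stands in for ECK's elastic-internal-init-filesystem
        image: busybox
        command: ["sh", "-c", "echo init done"]
      containers:
      - name: redis
        image: redis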

@ChrsMark
Member

Update: in #2386 I added functionality to collect logs from all of the non-annotated Pods by using a specific fallback.

@framsouza
Author

framsouza commented Mar 23, 2023

Great stuff, thanks for that. I'll wait for the merge and test it out. I'll keep you in the loop (#2386).

@jlind23
Contributor

jlind23 commented May 27, 2024

@framsouza were you able to test @ChrsMark's PR, and if so, can I close this issue?
