otelcol_exporter_queue_size not working when using persistent queue #30608

Closed
chkp-yairt opened this issue Jan 16, 2024 · 3 comments
Labels
bug, extension/storage/filestorage, needs triage

Comments

@chkp-yairt

Component(s)

extension/storage/filestorage

What happened?

Description

We use the otelcol_exporter_queue_size metric to alert on the status of our OpenTelemetry Collector.
Recently we switched to the persistent queue to avoid losing data when the collector restarts.
Since making that change, otelcol_exporter_queue_size no longer reflects the actual state of the queue.
We tested this by blocking the receiver and observing the PVC attached to the collector pod: the queue directory on disk grew, but the queue size metric did not change.
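For context, the alert we drive from this metric is roughly of the following shape (a minimal sketch; the group name, rule name, threshold, and labels here are illustrative, not our exact rule):

groups:
  - name: otel-collector
    rules:
      - alert: OtelExporterQueueFilling
        # otelcol_exporter_queue_size is exposed on the collector's own
        # telemetry endpoint (0.0.0.0:8888 in the configuration below).
        expr: otelcol_exporter_queue_size > 800
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Sending queue of exporter {{ $labels.exporter }} is filling up"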

Expected Result

otelcol_exporter_queue_size should behave the same way as it does when the queue runs in memory.

Actual Result

otelcol_exporter_queue_size does not change even though the queue is actually growing.

Collector version

0.92.0

Environment information

Environment

OS: Amazon Linux

OpenTelemetry Collector configuration

receivers:
      prometheus:
        config:
          global:
            scrape_interval: 30s
            evaluation_interval: 30s
          scrape_configs:
            - job_name: "kubernetes-nodes"
              kubernetes_sd_configs:
                - role: node
              scheme: https
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              relabel_configs:
                - action: labelmap
                  regex: __meta_kubernetes_node_label_(.+)
                - target_label: __address__
                  replacement: kubernetes.default.svc:443
                - source_labels: [__meta_kubernetes_node_name]
                  regex: (.+)
                  target_label: __metrics_path__
                  replacement: /api/v1/nodes/$$1/proxy/metrics
                - source_labels: [__meta_kubernetes_node_name]
                  action: replace
                  target_label: node
                - source_labels: [__meta_kubernetes_node_name]
                  action: replace
                  target_label: machine
            - job_name: k8s-node-kube-system
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              honor_labels: false
              kubernetes_sd_configs:
              - role: node
              metrics_path: /metrics/cadvisor
              relabel_configs:
              - action: labelmap
                regex: __meta_kubernetes_service_label_(.+)
              - action: replace
                source_labels:
                - __meta_kubernetes_node_name
                target_label: kubernetes_node_name
              scheme: https
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: true
extensions:
  health_check: {}
  memory_ballast:
    size_in_percentage: 40
  file_storage:
    timeout: 10s
    directory: /data/otelcol
processors:
  batch:
    send_batch_size: 1000
    timeout: 1s
    send_batch_max_size: 1500

  resource/standard:
    attributes:
      - key: ClusterName
        value: $CLUSTER_NAME
        action: upsert
      - key: k8s.cluster.name
        from_attribute: k8s-cluster
        action: upsert
exporters:
  logging:
    verbosity: detailed
  otlp:
    compression: zstd
    endpoint: internal-otel-collector:4317
    sending_queue: 
      enabled: true 
      queue_size: 1000
      storage: file_storage

service:
  telemetry:
    logs:
      level: "debug"
    metrics:
      address: 0.0.0.0:8888
  extensions: [health_check, memory_ballast, file_storage]
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch, resource/standard]
      exporters: [otlp]

extraVolumeMounts:
  - name: data-volume
    mountPath: /data/otelcol
initContainers: 
  - name: init
    image: busybox
    command: ['sh', '-c', 'chown -R 10001: /data/otelcol' ]
    volumeMounts:
      - name: data-volume
        mountPath: /data/otelcol
statefulset:
  # volumeClaimTemplates for a statefulset
  volumeClaimTemplates: 
    - metadata:
        name: data-volume
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
  podManagementPolicy: "Parallel"

Log output

No response

Additional context

No response

chkp-yairt added the bug and needs triage labels on Jan 16, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dmitryax
Member

dmitryax commented Jan 16, 2024

@chkp-yairt Which distribution of the collector do you use?

I cannot reproduce the issue with the contrib build. I see otelcol_exporter_queue_size reported correctly for both the in-memory and the persistent queue.

I ran the contrib collector build v0.92.0 with this config:

extensions:
  file_storage:
    directory: ./tmp

receivers:
  otlp:
    protocols:
      http:

exporters:
  otlphttp:
    endpoint: https://invalid-invalid.com
    sending_queue:
      enabled: true
      num_consumers: 1
      storage: file_storage

service:
  extensions: [file_storage]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp]

I sent 100 dummy OTLP traces and can see:

otelcol_exporter_queue_size{exporter="otlphttp",service_instance_id="24801a50-05a6-408a-9028-0e4c7ec5b26f",service_name="otelcontribcol",service_version="0.92.0-dev"} 99

Metrics use exactly the same logic.
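(So the same reproduction should apply to metrics with only the pipeline type changed; a minimal sketch of that variant, reusing the receivers and exporters above:)

service:
  extensions: [file_storage]
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]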

Are you sure that metrics are not delivered? What logs do you see?

@chkp-yairt
Copy link
Author

Hi @dmitryax, thanks for the quick reply.
We changed our test and can now see the metric working.
I think that because we blocked the destination, the data was dropped at enqueue time and never reached the sending queue itself.
Thanks again for your help and the quick reply.
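For anyone landing here with the same symptom: data rejected before it reaches the sending queue is counted by the collector's otelcol_exporter_enqueue_failed_* metrics, so alerting on those alongside otelcol_exporter_queue_size also covers the enqueue-drop case. A minimal sketch of such a rule (rule name and threshold are illustrative; check your collector's /metrics endpoint for the exact metric names your version exposes):

groups:
  - name: otel-collector-enqueue
    rules:
      - alert: OtelExporterEnqueueFailures
        # Counts metric points the exporter failed to add to its sending queue;
        # spans and log records have sibling metrics.
        expr: rate(otelcol_exporter_enqueue_failed_metric_points[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Exporter {{ $labels.exporter }} is failing to enqueue metric points"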
