
[receiver/kubeletstats] k8s.node.network.io metric is missing #33993

Closed
alita1991 opened this issue Jul 9, 2024 · 6 comments
Labels
bug · needs triage · receiver/kubeletstats · Stale

Comments


alita1991 commented Jul 9, 2024

Component(s)

receiver/kubeletstats

What happened?

Description

The k8s.node.network.io metric is not collected, while others are (k8s.node.memory.*, k8s.node.filesystem.*, etc.).

Steps to Reproduce

Provision the collector using the provided config in a K3S / OpenShift environment, with a ClusterRole granting full RBAC access.

Expected Result

k8s.node.network.io metric should be collected

Actual Result

k8s.node.network.io metric not found

Collector version

0.102.1

Environment information

3x AWS EC2 VMs + K3S (3 masters + 3 workers)
3x AWS EC2 VMs + OpenShift (3 masters + 3 workers)

OpenTelemetry Collector configuration

receivers:
  kubeletstats:
    templateEnabled: '{{ index .Values "mimir-distributed" "enabled" }}'
    collection_interval: 30s
    auth_type: "serviceAccount"
    endpoint: "${env:KUBELETSTATS_ENDPOINT}"
    extra_metadata_labels:
    - k8s.volume.type
    insecure_skip_verify: true
    metric_groups:
    - container
    - pod
    - volume
    - node
processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 50
    spike_limit_percentage: 10
  k8sattributes:
    auth_type: 'serviceAccount'
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.start_time
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.node.name
  resourcedetection/env:
    detectors:
    - env
  resource/remove_container_id:
    attributes:
    - action: delete
      key: container.id
    - action: delete
      key: container_id
exporters:
  logging:
    verbosity: detailed
  otlp:
    endpoint: '{{ template "central.collector.address" $ }}'
    tls:
      insecure: true
service:
  telemetry:
    metrics:
      address: "0.0.0.0:8888"
      level: detailed
  pipelines:
    metrics/kubeletstats:
      templateEnabled: '{{ index .Values "mimir-distributed" "enabled" }}'
      receivers: [kubeletstats]
      processors: [k8sattributes, resourcedetection/env, resource/remove_container_id, memory_limiter, batch]
      exporters: [otlp]

Log output

No errors were found in the log

Additional context

Before opening this ticket I did some debugging, but I could not find any relevant information in debug mode. I'm trying to understand why this specific metric is not collected and how I can investigate the problem further.

It is important to mention that the k8s_pod_network_io_bytes_total metric was collected by the receiver.

@alita1991 added the bug and needs triage labels on Jul 9, 2024

github-actions bot commented Jul 9, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@ChrsMark

Hey @alita1991! I tried to reproduce this but wasn't able to on GKE or EKS.

I'm using the following Helm chart values:

mode: daemonset
presets:
  kubeletMetrics:
    enabled: true

config:
  exporters:
    debug:
      verbosity: normal
  receivers:
    kubeletstats:
      collection_interval: 10s
      auth_type: 'serviceAccount'
      endpoint: '${env:K8S_NODE_NAME}:10250'
      insecure_skip_verify: true
      metrics:
        k8s.node.network.io:
          enabled: true

  service:
    pipelines:
      metrics:
        receivers: [kubeletstats]
        processors: [batch]
        exporters: [debug]

And deploy the Collector with helm install daemonset open-telemetry/opentelemetry-collector --set image.repository="otel/opentelemetry-collector-k8s" --set image.tag="0.104.0" --values ds_k8s_metrics.yaml

GKE

v1.29.4-gke.1043004

> k logs -f daemonset-opentelemetry-collector-agent-24x6f | grep k8s.node.network.io
k8s.node.network.io{interface=eth0,direction=receive} 2508490408
k8s.node.network.io{interface=eth0,direction=transmit} 1329730075
k8s.node.network.io{interface=eth0,direction=receive} 2541570721
k8s.node.network.io{interface=eth0,direction=transmit} 1330038333
k8s.node.network.io{interface=eth0,direction=receive} 2541728902
k8s.node.network.io{interface=eth0,direction=transmit} 1330216803
k8s.node.network.io{interface=eth0,direction=receive} 2541792120
k8s.node.network.io{interface=eth0,direction=transmit} 1330323914
k8s.node.network.io{interface=eth0,direction=receive} 2541974411
k8s.node.network.io{interface=eth0,direction=transmit} 1330557979

EKS

v1.30.0-eks-036c24b

> k logs -f daemonset-opentelemetry-collector-agent-58csx | grep k8s.node.network.io
k8s.node.network.io{interface=eth0,direction=receive} 7511134123
k8s.node.network.io{interface=eth0,direction=transmit} 21146466749
k8s.node.network.io{interface=eth0,direction=receive} 7545084343
k8s.node.network.io{interface=eth0,direction=transmit} 21146550460
k8s.node.network.io{interface=eth0,direction=receive} 7545094892
k8s.node.network.io{interface=eth0,direction=transmit} 21146552331

I suggest you verify what the /stats/summary endpoint provides. I suspect it gives no values for this metric to be exported, or something similarly odd. You can run the following debug Pod to get this info. Note that you need to use the same service account that the Collector uses (if the Collector is already running) in order to get access to this endpoint (in my case it was named daemonset-opentelemetry-collector):

kubectl run my-shell --rm -i --tty --image=ubuntu --overrides='{ "apiVersion": "v1", "spec": { "serviceAccountName": "daemonset-opentelemetry-collector", "hostNetwork": true }  }' -- bash
apt update
apt-get install curl jq
export token=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token) && curl -H "Authorization: Bearer $token" https://$HOSTNAME:10250/stats/summary --insecure

In my case it gave:

{
  "time": "2024-07-10T09:34:01Z",
  "name": "eth0",
  "rxBytes": 3234464903,
  "rxErrors": 0,
  "txBytes": 1197870852,
  "txErrors": 0,
  "interfaces": [
    {
      "name": "eth0",
      "rxBytes": 3234464903,
      "rxErrors": 0,
      "txBytes": 1197870852,
      "txErrors": 0
    }
  ]
}


alita1991 commented Jul 10, 2024

Hi,

I tested using your config and got 0 data points for k8s.node.network.io; what could it be? I don't see any RBAC-related issues in the logs.

kubectl logs daemonset-opentelemetry-collector-agent-dw7hf | grep k8s.node.network.io | wc -l
0

k8s.pod.network.io is working as expected:

kubectl logs daemonset-opentelemetry-collector-agent-dw7hf | grep k8s.pod.network.io | wc -l
2546

I also tested the scrape via curl; here is the result from one of the nodes:

"node":{
"network":{
"time":"2024-07-10T12:42:59Z",
"name":"",
"interfaces":[
{
"name":"ens5",
"rxBytes":481114242884,
"rxErrors":0,
"txBytes":715126064226,
"txErrors":0
},
{
"name":"ovs-system",
"rxBytes":0,
"rxErrors":0,
"txBytes":0,
"txErrors":0
},
{
"name":"ovn-k8s-mp0",
"rxBytes":5821168746,
"rxErrors":0,
"txBytes":47539598446,
"txErrors":0
},
{
"name":"genev_sys_6081",
"rxBytes":265742543652,
"rxErrors":0,
"txBytes":370984422928,
"txErrors":0
},


ChrsMark commented Jul 10, 2024

Thanks @alita1991 for checking this!

It seems that in your case the top-level info is missing compared to what I see:

{
  "time": "2024-07-10T09:34:01Z",
  "name": "eth0",
  "rxBytes": 3234464903,
  "rxErrors": 0,
  "txBytes": 1197870852,
  "txErrors": 0,
  "interfaces": [
    {
      "name": "eth0",
      "rxBytes": 3234464903,
      "rxErrors": 0,
      "txBytes": 1197870852,
      "txErrors": 0
    }
  ]
}

Also, removing these lines from the testing sample

"rxBytes": 948305524,
"rxErrors": 0,
"txBytes": 12542266,
"txErrors": 0,

makes the unit tests fail.

The missing information is about the default interface according to https://pkg.go.dev/k8s.io/kubelet@v0.29.3/pkg/apis/stats/v1alpha1#NetworkStats.

Indeed, checking the code, it seems that we only extract the top-level tx/rx metrics: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.104.0/receiver/kubeletstatsreceiver/internal/kubelet/network.go#L24-L42.

So the question here is whether we should consider this a bug/limitation and expand the receiver to collect metrics for all of the interfaces instead of just the default. Note that the Interfaces list includes the default, so just by iterating over it we would have the default interface's metrics included as well.
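
For illustration, here is a minimal sketch of what iterating the Interfaces list could look like, assuming the k8s.io/kubelet stats v1alpha1 types linked above; the interfaceIO helper and collectNetworkIO function are hypothetical, not the receiver's actual code:

package kubelet

import (
	stats "k8s.io/kubelet/pkg/apis/stats/v1alpha1"
)

// interfaceIO is a hypothetical helper holding per-interface byte counters.
type interfaceIO struct {
	Name    string
	RxBytes uint64
	TxBytes uint64
}

// collectNetworkIO walks s.Interfaces instead of relying only on the embedded
// default-interface fields (s.InterfaceStats), which can be empty on some CNIs
// such as OVN-Kubernetes. The default interface, when reported, is also part
// of the Interfaces list, so nothing is lost by iterating.
func collectNetworkIO(s *stats.NetworkStats) []interfaceIO {
	if s == nil {
		return nil
	}
	var out []interfaceIO
	for _, iface := range s.Interfaces {
		if iface.RxBytes == nil || iface.TxBytes == nil {
			continue // skip interfaces that report no byte counters
		}
		out = append(out, interfaceIO{
			Name:    iface.Name,
			RxBytes: *iface.RxBytes,
			TxBytes: *iface.TxBytes,
		})
	}
	return out
}

Since the data points in the logs above already carry an interface attribute, emitting one point per entry in Interfaces would fit the existing shape of the metric.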

I would like to hear what @TylerHelmuth and @dmitryax think here.

Update: I see this was already reported for pod metrics in #30196.

github-actions bot commented:

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Sep 11, 2024
@ChrsMark

This is covered by #30196. I'm going to close this one and we can continue on the other issue.
