Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[k8sclusterreceiver] Some metric units don't follow Otel semantic conventions #10553

Closed
bertysentry opened this issue Jun 2, 2022 · 24 comments
Closed
Labels
bug Something isn't working help wanted Extra attention is needed priority:p3 Lowest receiver/k8scluster Stale

Comments

@bertysentry
Copy link
Contributor

Describe the bug
Some metrics don't follow Otel semantic conventions

Steps to reproduce
Nothing to reproduce. The problem lies in the specification of the receiver itself.

What did you expect to see?
Metrics that count objects should use units with brackets {} and not 1

Example:

  • k8s.container.restarts ==> Unit: "{restarts}"
  • k8s.cronjob.active_jobs ==> Unit: "{jobs}"
  • k8s.job.active_pods ==> Unit: "{pods}"
  • etc.

What did you see instead?
Many metrics specify unit 1 to report a count of objects:

  • k8s.container.restarts
  • k8s.cronjob.active_jobs
  • k8s.job.active_pods
  • etc.

Unit 1 must be used only for fractions and ratios.

What version did you use?
Main branch

What config did you use?
N/A

Environment
N/A

Additional context
N/A

@bertysentry bertysentry added the bug Something isn't working label Jun 2, 2022
@djaglowski djaglowski added the priority:p3 Lowest label Jun 10, 2022
@github-actions
Copy link
Contributor

Pinging code owners: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 30, 2022
@github-actions
Copy link
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@povilasv
Copy link
Contributor

I plan to work on this once we finalize the #4367

For now the plan is to add a feature gate, which when enabled will change metric units.

crobert-1 added a commit to crobert-1/opentelemetry-collector-contrib that referenced this issue Jul 24, 2023
@dmitryax
Copy link
Member

For now the plan is to add a feature gate, which when enabled will change metric units.

I think we can go without a feature gate. It's unlikely that changing units will break many backends

dmitryax pushed a commit that referenced this issue Sep 12, 2023
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

Fix Metrics that count objects should use units with brackets {} and not
1.

Marking it as bug_fix because of
#10553 (comment)

Tested with prometheus exporter:
Previously:
```
# HELP k8s_deployment_available Total number of available pods (ready for at least minReadySeconds) targeted by this deployment
# TYPE k8s_deployment_available gauge
k8s_deployment_available 1
# HELP k8s_deployment_desired Number of desired pods in this deployment
# TYPE k8s_deployment_desired gauge
k8s_deployment_desired 1
```

Now:

```
# HELP k8s_deployment_available Total number of available pods (ready for at least minReadySeconds) targeted by this deployment
# TYPE k8s_deployment_available gauge
k8s_deployment_available 1
# HELP k8s_deployment_desired Number of desired pods in this deployment
# TYPE k8s_deployment_desired gauge
k8s_deployment_desired 1
```

Atleast for this backend it's not breaking change?

**Link to tracking Issue:**  10553


#10553


**Testing:** <Describe what testing was performed and which tests were
added.>
- updated tests

**Documentation:** 
- generated
dmitryax pushed a commit that referenced this issue Sep 21, 2023
…#26708)

**Description:**  Refactor some metric units  to follow OTEL Semantic conventions

**Link to tracking Issue:** #10553
@povilasv
Copy link
Contributor

povilasv commented Sep 21, 2023

Hey! I've looked thru the exposed metrics and we are left with a couple of cases, which I'm not sure what to do.

First is the "state" metrics like:

  k8s.container.ready:
    enabled: true
    description: Whether a container has passed its readiness probe (0 for no, 1 for yes)
    unit: 1
    gauge:
      value_type: int

  k8s.pod.phase:
    enabled: true
    description: Current phase of the pod (1 - Pending, 2 - Running, 3 - Succeeded, 4 - Failed, 5 - Unknown)
    unit: 1
    gauge:
      value_type: int

  k8s.namespace.phase:
    enabled: true
    description: The current phase of namespaces (1 for active and 0 for terminating)
    unit: 1
    gauge:
      value_type: int

I think in this case unit "1" makes sense. WDYT?

And another case, is we have metrics like these:

  k8s.resource_quota.hard_limit:
    enabled: true
    description: The upper limit for a particular resource in a specific namespace. Will only be sent if a quota is specified. CPU requests/limits will be sent as millicores
    unit: 1
    gauge:
      value_type: int
    attributes:
      - resource

Where units depend on attribute. I think we will need to refactor these metrics. I am thinking doing something like we do with Node statuses (https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/k8sclusterreceiver/internal/node/nodes.go#L143-L154) move them outside mdatagen and do some custom logic.

Thoughts?

@bertysentry
Copy link
Contributor Author

Potentially we can tweek the mdatagen to explicitly require unit field while allowing it to be ""

If feasible, that would be the best way to handle this, thank you @dmitryax!

@povilasv
Copy link
Contributor

I've looked thru some mores docs and Otel Metrics API says that Instrument units are optional:

An optional unit of measure
...
A unit of measure.

Users can provide a unit, but it is up to their discretion. Therefore, this API needs to be structured to accept a unit, but > MUST NOT obligate a user to provide one.

So I think we should allow empty units.

I opened PR which allows to set units: "", but still fails validation if "unit" field is not set -> #27090

@mx-psi
Copy link
Member

mx-psi commented Sep 26, 2023

I've looked thru some mores docs and Otel Metrics API says that Instrument units are optional:

An optional unit of measure
...
A unit of measure.
Users can provide a unit, but it is up to their discretion. Therefore, this API needs to be structured to accept a unit, but > MUST NOT obligate a user to provide one.

So I think we should allow empty units.

Thanks for looking this up. Taking a quick look at opentelemetry-go it seems like the unit is left empty if not specified by the user so I agree we can leave this empty.

dmitryax pushed a commit that referenced this issue Sep 29, 2023
**Description:** <Describe what has changed.>

Allows setting empty metric units in mdatagen.

Example of using it
https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/27091/files


**Tracking Issues:**

#27089

#10553
dmitryax pushed a commit that referenced this issue Oct 10, 2023
**Description:** 

Change k8s.container.ready, k8s.pod.phase, k8s.pod.status_reason,
k8s.namespace.phase units to empty

**Link to tracking Issue:** #10553
@povilasv
Copy link
Contributor

povilasv commented Oct 11, 2023

I think only resourcequota and clusterresourcequota metrics are left with incorrect units.

We currently expose "k8s.resource_quota.hard_limit" and "k8s.resource_quota.used" metrics, with resource coming from quota.status.Hard.<resource-name> . Some resources are well known like pods| cpu| memory, but there could be custom extended resource configured (https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-for-extended-resources)

At the moment unit is the same for each resource and is configured to "1" in metadata.yaml

How should we fix this? I don't think mdatagen supports setting dynamic units?

Any ideas @TylerHelmuth / @dmitryax ? )

@bertysentry
Copy link
Contributor Author

@povilasv You could use either {resources} or empty unit.

dmitryax pushed a commit that referenced this issue Oct 17, 2023
)

change resourcequota and clusterquota metrics to use {resource} units.

**Link to tracking Issue:** #10553
@povilasv
Copy link
Contributor

Updated to use {resource} as unit. Will be released in next version.

I believe we are done with this task, all metrics seems to follow Otel semantic conventions.

@bertysentry feel free to validate and let me know if something isn't right.

JaredTan95 pushed a commit to openinsight-proj/opentelemetry-collector-contrib that referenced this issue Oct 18, 2023
…n-telemetry#27662)

change resourcequota and clusterquota metrics to use {resource} units.

**Link to tracking Issue:** open-telemetry#10553
sigilioso pushed a commit to carlossscastro/opentelemetry-collector-contrib that referenced this issue Oct 27, 2023
…n-telemetry#27662)

change resourcequota and clusterquota metrics to use {resource} units.

**Link to tracking Issue:** open-telemetry#10553
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
…open-telemetry#26708)

**Description:**  Refactor some metric units  to follow OTEL Semantic conventions

**Link to tracking Issue:** open-telemetry#10553
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
…en-telemetry#27051)

**Description:** 

Refactor k8s.container.restarts metric unit to be `{restart}`

**Link to tracking Issue:**
open-telemetry#10553
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
…telemetry#27091)

**Description:** 

Change k8s.container.ready, k8s.pod.phase, k8s.pod.status_reason,
k8s.namespace.phase units to empty

**Link to tracking Issue:** open-telemetry#10553
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
…n-telemetry#27662)

change resourcequota and clusterquota metrics to use {resource} units.

**Link to tracking Issue:** open-telemetry#10553
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 18, 2023
@povilasv
Copy link
Contributor

I believe this can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed priority:p3 Lowest receiver/k8scluster Stale
Projects
None yet
Development

No branches or pull requests

7 participants