Tekton Taskrun Metrics are not accurate #3739

ibexmonj · 2021-02-01T20:05:12Z

Expected Behavior

Tekton metrics should match the data provided by kubectl.

Actual Behavior

I have been trying to gather the taskrun duration using tekton_taskrun_duration_seconds_sum and tekton_taskrun_duration_seconds_count with the following query:

sum(rate(tekton_taskrun_duration_seconds_sum{cluster_name="dev",namespace="tekton",taskrun="task-run-tekton5g444"}[1h])) / sum(rate(tekton_taskrun_duration_seconds_count{cluster_name="dev",namespace="tekton",taskrun="task-run-tekton5g444"}[1h]))

I am seeing the value being reported twice in Prometheus. Sometimes the value is reported 2-3 times, resulting in duplicate values for the taskrun duration.

Here is another example where the duration_seconds value is being reported multiple times, as if the controller is resetting and then incrementing the value again.

The query used here is sum (rate(tekton_taskrun_duration_seconds_sum{namespace="tekton",cluster_name="dev",taskrun="task-run-tektonhfvhq"}[5m]))
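To illustrate why repeated recording would matter for these queries, here is a small Python model of the _sum and _count series behind a histogram metric. It is only a sketch: the `DurationHistogram` class is hypothetical, the 5-second value is arbitrary, and nothing here reflects how the Tekton controller actually records metrics. It simply assumes the same observation gets recorded more than once.

```python
from dataclasses import dataclass


@dataclass
class DurationHistogram:
    """Hypothetical stand-in for the _sum and _count series of one histogram."""

    total: float = 0.0  # plays the role of ..._duration_seconds_sum
    count: int = 0      # plays the role of ..._duration_seconds_count

    def observe(self, seconds: float) -> None:
        # Each observation adds to both series, as a histogram does.
        self.total += seconds
        self.count += 1

    def average(self) -> float:
        # Same idea as dividing the _sum series by the _count series.
        return self.total / self.count


# The taskrun duration is recorded exactly once.
once = DurationHistogram()
once.observe(5.0)

# Assumption: the same taskrun duration is recorded three times.
thrice = DurationHistogram()
for _ in range(3):
    thrice.observe(5.0)

print(once.total, once.count, once.average())
print(thrice.total, thrice.count, thrice.average())
```

In this model the sum/count ratio is unaffected by the repeats, but the _sum series alone grows with every repeated observation, which would be consistent with seeing several increases when graphing only the rate of _sum.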

Another issue is the timestamp being reported.

Here is the status section of the taskrun:

`
status:
  completionTime: "2021-02-01T15:00:32Z"
  conditions:
  - lastTransitionTime: "2021-02-01T15:00:32Z"
    message: All Steps have completed executing
    reason: Succeeded
    status: "True"
    type: Succeeded
  podName: task-run-tekton5g444-pod-cwbvx
  startTime: "2021-02-01T15:00:27Z"
`

As per the above, the job ran at 15:00 GMT, but based on the Prometheus screenshot above the timestamp being reported is 16:36, which does not quite add up with the times reported by kubectl get taskrun taskrun_name -o yaml.
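For reference, the duration implied by the status above can be computed directly from startTime and completionTime, which gives a baseline to compare the metric against. A minimal Python sketch, with the two timestamps copied from the status and the helper name `parse_utc` made up for illustration:

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp such as 2021-02-01T15:00:27Z as a UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Values taken from the taskrun status shown above.
start_time = parse_utc("2021-02-01T15:00:27Z")
completion_time = parse_utc("2021-02-01T15:00:32Z")

duration_seconds = (completion_time - start_time).total_seconds()
print(duration_seconds)  # prints 5.0
```

So kubectl reports a run of 5 seconds starting at 15:00:27 UTC; this is the value the averaged duration metric would be expected to match.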

Steps to Reproduce the Problem

Enable Tekton metrics.
Enable Prometheus scraping.
Use the above sum/count method to track taskrun duration.

Additional Info

Kubernetes version:

Output of kubectl version:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-14T05:13:35Z", GoVersion:"go1.15.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.12-gke.20", GitCommit:"0ac5f81eecab42bff5ef74f18b99d8896ba7b89b", GitTreeState:"clean", BuildDate:"2020-09-09T00:48:20Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}

Tekton Pipeline version:

Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

$ tkn version
Client version: 0.15.0
Pipeline version: v0.10.2
Triggers version: v0.3.1


tekton-robot · 2021-05-02T21:40:40Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot · 2021-06-01T22:12:08Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

ghost · 2021-06-15T16:45:18Z

/assign sbwsg

bobcatfish · 2021-08-10T16:45:02Z

Seems like a legit bug - we should at least investigate if we can reproduce before closing.

/remove-lifecycle rotten

tekton-robot · 2021-11-08T17:28:29Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot · 2021-12-08T17:41:27Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

lbernick · 2021-12-13T20:21:17Z

/priority important-soon
/remove-lifecycle rotten
There's a TEP for improving our metrics; this should be addressed there.

wlynch · 2022-02-22T20:18:28Z

/assign @khrm

Should be fixed with #4468

tekton-robot · 2022-02-22T20:18:29Z

@wlynch: GitHub didn't allow me to assign the following users: khrm.

Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @khrm

Should be fixed with #4468

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wlynch · 2022-02-22T20:19:25Z

/unassign

khrm · 2022-02-23T13:49:14Z

/assign khrm

khrm · 2022-02-23T13:49:40Z

This should be fixed with #4468 or #4469

ibexmonj · 2022-02-23T13:58:03Z

Thank you for working on this.

tekton-robot · 2022-05-24T15:56:24Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

pritidesai · 2022-05-26T18:33:05Z

Both the PRs with the fixes are merged:

/close

ibexmonj added the kind/bug label Feb 1, 2021

This was referenced Feb 3, 2021

webhook_request_latencies_bucket metric keeps adding new data series and became unusable #3171

Open

Tekton metrics are reporting wrong values #2844

Closed

tekton-robot added the lifecycle/stale label May 2, 2021

tekton-robot added lifecycle/rotten and removed lifecycle/stale labels Jun 1, 2021

tekton-robot assigned ghost Jun 15, 2021

ghost removed their assignment Jun 21, 2021

bobcatfish removed the lifecycle/rotten label Aug 10, 2021

tekton-robot added the lifecycle/stale label Nov 8, 2021

tekton-robot added lifecycle/rotten and removed lifecycle/stale labels Dec 8, 2021

tekton-robot added priority/important-soon and removed lifecycle/rotten labels Dec 13, 2021

jerop mentioned this issue Jan 7, 2022

taskrun counter metrics grow up without failed taskrun #4454

Closed

jerop assigned wlynch and lbernick Jan 10, 2022

lbernick removed their assignment Feb 7, 2022

tekton-robot unassigned wlynch Feb 22, 2022

tekton-robot assigned khrm Feb 23, 2022

tekton-robot added the lifecycle/stale label May 24, 2022

pritidesai closed this as completed May 26, 2022
