
Tekton metrics are reporting wrong values #2844

Closed
RafaeLeal opened this issue Jun 22, 2020 · 13 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@RafaeLeal
Contributor

Expected Behavior

Tekton metrics should be coherent with the data gathered with kubectl.

Actual Behavior

I have two examples. For the tekton_pipelinerun_count metric, it's currently reporting 22 failed and 58 successful, but the cluster only has 20 PipelineRuns:

$ kubectl get pipelinerun --all-namespaces -o json | jq '.items | length'
20

The second example: the tekton_pipelinerun_duration_seconds_sum metric seems to increase indefinitely. I expected it to rise until the PipelineRun is completed, then stabilize at the same value as the duration shown by tkn pr describe <pipelinerun_name>.

I also noticed that:

  1. It doesn't increase at a constant rate.
  2. It always increases by a multiple of the actual duration of the pipeline.

I suspect that every time the controller reconciles, it adds the duration of the pipeline again.

Steps to Reproduce the Problem

  1. Enable tekton metrics
  2. Make Prometheus scrape tekton
  3. Run some pipelineruns

Additional Info

  • Kubernetes version:

    Output of kubectl version:

    $ kubectl version
    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-26T06:17:09Z", GoVersion:"go1.14", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-eks-af3caf", GitCommit:"af3caf6136cd355f467083651cc1010a499f59b1", GitTreeState:"clean", BuildDate:"2020-03-27T21:51:36Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
    
  • Tekton Pipeline version: v0.13.2

    Output of tkn version

    $ tkn version
    Client version: 0.8.0
    Pipeline version: unknown
    

    Output of kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

    $ kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'
    v0.13.2
    
@vincent-pli
Member

The tekton_pipelinerun_count is incorrect; it's caused by the reconciler entering this block more than once:

@vincent-pli
Member

/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 23, 2020
@RafaeLeal
Contributor Author

RafaeLeal commented Jun 23, 2020

Do you think that could be the case for the duration metrics as well? Since this block runs more than once, it also calls metrics.DurationAndCount repeatedly:

err := metrics.DurationAndCount(pr)

@vincent-pli
Member

@RafaeLeal
I think so; you could apply my fix and give it a try.

@pritidesai
Member

@vincent-pli @RafaeLeal we have a similar issue reported for tekton_taskrun_count; see #3029.

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 27, 2020
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 26, 2020
@pritidesai
Member

/remove-lifecycle rotten
@RafaeLeal are you still experiencing this issue?

@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 30, 2020
@ibexmonj

ibexmonj commented Feb 3, 2021

Hey folks, I am curious whether what I reported in #3739 about some of the pipeline metrics is relevant to this discussion.

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 4, 2021
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 3, 2021
@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
