Tekton Taskrun Metrics are not accurate #3739
Comments
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing.
/assign sbwsg
Seems like a legit bug - we should at least investigate if we can reproduce before closing. /remove-lifecycle rotten
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing.
/priority important-soon
@wlynch: GitHub didn't allow me to assign the following users: khrm. Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/unassign
/assign khrm
Thank you for working on this.
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
Both the PRs with the fixes are merged:
/close
Expected Behavior
Tekton metrics should match the data provided by kubectl.
Actual Behavior
I have been trying to gather the taskrun duration using `tekton_taskrun_duration_seconds_sum` and `tekton_taskrun_duration_seconds_count` with the following query:

```
sum(rate(tekton_taskrun_duration_seconds_sum{cluster_name="dev",namespace="tekton",taskrun="task-run-tekton5g444"}[1h])) / sum(rate(tekton_taskrun_duration_seconds_count{cluster_name="dev",namespace="tekton",taskrun="task-run-tekton5g444"}[1h]))
```

I am seeing the value reported twice in Prometheus. Sometimes the value is reported 2-3 times, resulting in duplicate values for the taskrun duration.
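One way to confirm the duplication at the source is to query the raw counter without `rate()`. This is a sketch reusing the label values from the query above; for a single completed TaskRun the count should be exactly 1:

```
# Instant query: for one completed TaskRun this counter should read 1.
# A value of 2 or 3 means the duration was recorded multiple times.
tekton_taskrun_duration_seconds_count{cluster_name="dev",namespace="tekton",taskrun="task-run-tekton5g444"}
```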
Here is another example where the duration_seconds value is reported multiple times, as if the controller were resetting the counter and then incrementing it again. The query used here is:

```
sum(rate(tekton_taskrun_duration_seconds_sum{namespace="tekton",cluster_name="dev",taskrun="task-run-tektonhfvhq"}[5m]))
```
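If the controller really is resetting the counter, PromQL's `resets()` function should show it. This is a hedged check, not part of the original report, using the same labels as above:

```
# Number of counter resets observed for this TaskRun over the last hour.
# A well-behaved counter should report 0 here.
resets(tekton_taskrun_duration_seconds_sum{namespace="tekton",cluster_name="dev",taskrun="task-run-tektonhfvhq"}[1h])
```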
Another issue is the timestamp being reported.
Here is the TaskRun status:

```yaml
status:
  completionTime: "2021-02-01T15:00:32Z"
  conditions:
  - message: All Steps have completed executing
    reason: Succeeded
    status: "True"
    type: Succeeded
  podName: task-run-tekton5g444-pod-cwbvx
  startTime: "2021-02-01T15:00:27Z"
```
As per the above, the job ran at 15:00 GMT (startTime 15:00:27, completionTime 15:00:32, i.e. a duration of roughly 5 seconds), but based on the Prometheus screenshot above the timestamp being reported is 16:36, which does not quite add up to the time reported in `kubectl get taskrun taskrun_name -o yaml`.
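To see which timestamp Prometheus actually attaches to these samples, the `timestamp()` function can help; note that it returns the scrape time of each sample, not the TaskRun's startTime, so some offset is expected. A sketch with the same labels as above:

```
# Returns the timestamp (seconds since epoch) at which each sample was scraped,
# which can be compared against the TaskRun's startTime/completionTime.
timestamp(tekton_taskrun_duration_seconds_count{cluster_name="dev",namespace="tekton",taskrun="task-run-tekton5g444"})
```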
Steps to Reproduce the Problem

1. Use the `sum`/`count` method to track taskrun duration.

Additional Info
Kubernetes version:

Output of `kubectl version`:

```
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-14T05:13:35Z", GoVersion:"go1.15.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.12-gke.20", GitCommit:"0ac5f81eecab42bff5ef74f18b99d8896ba7b89b", GitTreeState:"clean", BuildDate:"2020-09-09T00:48:20Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}
```
Tekton Pipeline version:

Output of `tkn version` or `kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'`:

```
$ tkn version
Client version: 0.15.0
Pipeline version: v0.10.2
Triggers version: v0.3.1
```