TaskRuns/PipelineRuns that fail for some reasons can't be recorded in metrics #5866

Closed
XinruZhang opened this issue Dec 12, 2022 · 6 comments
Labels
area/metrics Issues related to metrics kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@XinruZhang
Member

XinruZhang commented Dec 12, 2022

Because metrics are recorded (via durationAndCountMetrics) only in reconcile(), TaskRuns that fail or finish before reconcile() is called are never recorded in metrics.

Expected Behavior

Metrics should take the following TaskRuns into account, i.e. TaskRuns that:

  • have a preparation error
  • are cancelled right after being scheduled
  • time out before reconcile() is called (for example, because they stayed pending for a long time)

Similarly, PipelineRuns that fail before reconcile() is called should also be included in metrics.

Actual Behavior

TaskRuns and PipelineRuns that fail before reconcile() is called are not counted.
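The problem described above can be illustrated with a minimal Go sketch. Everything in it (the `Run` type, `Recorder`, `DurationAndCount`, `reconcileBuggy`, `reconcileFixed`) is hypothetical and heavily simplified; it is not Tekton's actual reconciler or metrics API. It shows how recording metrics only at the end of the main reconcile path misses runs that exit early (cancelled, preparation error), and one possible way to avoid that: deferring the recording call so it runs on every return path.

```go
package main

import (
	"errors"
	"fmt"
)

// Run is a hypothetical, simplified stand-in for a TaskRun or PipelineRun.
type Run struct {
	Name      string
	Cancelled bool  // cancelled right after being scheduled
	PrepErr   error // error hit while preparing the run
	Done      bool
	Status    string
}

// Recorder is a hypothetical metrics recorder that counts finished runs by status.
type Recorder struct {
	Counts map[string]int
}

// NewRecorder returns an empty Recorder.
func NewRecorder() *Recorder { return &Recorder{Counts: map[string]int{}} }

// DurationAndCount counts a run once it has finished; unfinished runs are ignored.
// Unlike a real recorder, it does not deduplicate across repeated reconciles.
func (r *Recorder) DurationAndCount(run *Run) {
	if run.Done {
		r.Counts[run.Status]++
	}
}

// finish marks a run as completed with the given status.
func finish(run *Run, status string) { run.Done, run.Status = true, status }

// reconcileBuggy records metrics only on the main path, so early exits are never counted.
func reconcileBuggy(run *Run, rec *Recorder) {
	if run.Cancelled {
		finish(run, "Cancelled")
		return
	}
	if run.PrepErr != nil {
		finish(run, "Failed")
		return
	}
	finish(run, "Succeeded")
	rec.DurationAndCount(run) // only reached on the happy path
}

// reconcileFixed defers the recording call so it runs on every return path.
func reconcileFixed(run *Run, rec *Recorder) {
	defer rec.DurationAndCount(run)
	if run.Cancelled {
		finish(run, "Cancelled")
		return
	}
	if run.PrepErr != nil {
		finish(run, "Failed")
		return
	}
	finish(run, "Succeeded")
}

// newRuns builds one run for each path: success, early cancellation, preparation error.
func newRuns() []*Run {
	return []*Run{
		{Name: "ok"},
		{Name: "cancelled", Cancelled: true},
		{Name: "bad-prep", PrepErr: errors.New("referenced Task not found")},
	}
}

func main() {
	buggy, fixed := NewRecorder(), NewRecorder()
	for _, r := range newRuns() {
		reconcileBuggy(r, buggy)
	}
	for _, r := range newRuns() {
		reconcileFixed(r, fixed)
	}
	fmt.Println("buggy:", buggy.Counts) // buggy: map[Succeeded:1]
	fmt.Println("fixed:", fixed.Counts) // fixed: map[Cancelled:1 Failed:1 Succeeded:1]
}
```

In this sketch only the successful run is counted by the buggy version, while the deferred call counts all three. A real reconciler is invoked many times for the same run, so a real recorder would also have to avoid counting a run more than once; the sketch deliberately ignores that.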

Steps to Reproduce the Problem

WIP

@XinruZhang XinruZhang added the kind/bug Categorizes issue or PR as related to a bug. label Dec 12, 2022
@XinruZhang
Member Author

It would be great to have an e2e test for metrics that covers the cases described in this issue.

@XinruZhang XinruZhang changed the title from "Data Miss for Metrics Recorder" to "TaskRuns that are failed on some reasons can't be recored in Metrics" Dec 15, 2022
@XinruZhang XinruZhang changed the title from "TaskRuns that are failed on some reasons can't be recored in Metrics" to "TaskRun/PipelineRuns that are failed on some reasons can't be recored in Metrics" Dec 15, 2022
@XinruZhang
Member Author

#5853 addresses this issue for TaskRuns.

We need to open another PR to address the same issue in the PipelineRun reconciler. Before that happens, we'd like to get the tests ready as described in #5875.

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 15, 2023
@lbernick lbernick added the area/metrics Issues related to metrics label Mar 16, 2023
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 15, 2023
@tekton-robot
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Collaborator

@tekton-robot: Closing this issue.

Projects
Status: Done
Development

No branches or pull requests

3 participants