Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Publish results when task and pipeline runs fail #3749

Open
afrittoli opened this issue Feb 4, 2021 · 28 comments
Open

Publish results when task and pipeline runs fail #3749

afrittoli opened this issue Feb 4, 2021 · 28 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@afrittoli
Copy link
Member

Feature request

Results should be available for failed task and pipeline runs.

Today, if a TaskRun or PipelineRun is set to failed, we skip the code that grabs the results and adds them to status.

Use case

Results can be used by a failing task or pipeline to provide more context about why the failed, or how to react to the failure.

One specific use case that I'm trying to implement, is to enrich the GitHub update pipelines in Tekton based CI with the ability to publish a comment that gives the developer more context about the failure. Today that can achieved by adding that comment into the logs, however a comment on GitHub provides a more immediate feedback to developers. Specifically I'd like to improve the job that checks for the "kind label".

That would work as follow:

  • the CI TaskRun is executed.
  • if no valid label is found, a message with the list of valid labels is published to a specific result
  • the TaskRun fails
  • a cloud event is sent to the tekton-events trigger. In includes the TaskRun status in the payload
  • the trigger filters the event
  • a binding extracts the result from the payload and passes it the trigger template
  • the trigger template runs the github update pipeline, which sets the status and adds the comment

/cc @bobcatfish @vdemeester

@afrittoli afrittoli added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 4, 2021
@ghost
Copy link

ghost commented Feb 4, 2021

I've discussed this before with users in another issue.

Results are required to be populated. Pipelines fail if they're not. If you introduce an "errorMessage" result to a Task then the Task has to populate it whether there was a failure or not. That means:

# Success!
echo "" > $(results.errorMessage.path)

...in any Tasks that want to provide this.

Further, the error message results will be named differently for every single Task. There could be Tasks that have an errorMessage result, an errorContext result, an err result, etc etc. That means tooling (like the dashboard or CLI) won't have a consistent way to look up these messages / extra context and surface them differently from other results.

I'd like to propose an alternative for your consideration: Provide an explicit status.errorResult field (or named anything else) that is explicitly for the purpose of returning context on Task error. Provide a Tekton variable to its path - It can function almost identically to a Result but will be optional for a Task to return. This leaves the semantics of Results alone but provides a well-defined outlet for this kind of information.

There are other alternatives which could be considered too. But I want to make sure that the knock-on effects of supporting Results for failure scenarios are well documented before we decide to support this.

Edit: Here's the previous discussion: #3439
Edit 2: This was also requested here: #3649

@ghost
Copy link

ghost commented Feb 4, 2021

Another alternative could be to explicitly mark certain results as error-results? Or have a separate section like results just for them?

afrittoli added a commit to afrittoli/plumbing that referenced this issue Feb 5, 2021
The check labels job uses the PR pipeline resource to add a comment
to the PR with instructions for developers in case of failure.
That doesn't work though because in case of failure the sync-up side
of the pipeline resource is not processed, so the commit is not
written.

Fix that by removing the pipeline resource and writing the instructions
to the step log instead. The step log is linked in the github status
details through the tekton dashboard.

In future we might extend the github update service to include
support for comments, but that would require support for results on
failure: tektoncd/pipeline#3749

Signed-off-by: Andrea Frittoli <andrea.frittoli@uk.ibm.com>
tekton-robot pushed a commit to tektoncd/plumbing that referenced this issue Feb 5, 2021
The check labels job uses the PR pipeline resource to add a comment
to the PR with instructions for developers in case of failure.
That doesn't work though because in case of failure the sync-up side
of the pipeline resource is not processed, so the commit is not
written.

Fix that by removing the pipeline resource and writing the instructions
to the step log instead. The step log is linked in the github status
details through the tekton dashboard.

In future we might extend the github update service to include
support for comments, but that would require support for results on
failure: tektoncd/pipeline#3749

Signed-off-by: Andrea Frittoli <andrea.frittoli@uk.ibm.com>
@afrittoli
Copy link
Member Author

afrittoli commented Feb 8, 2021

I've discussed this before with users in another issue.

Results are required to be populated. Pipelines fail if they're not. If you introduce an "errorMessage" result to a Task then the Task has to populate it whether there was a failure or not. That means:

# Success!
echo "" > $(results.errorMessage.path)

This is not what I'm saying, I'm not proposing to have error specific results.
My proposal is to stop ignoring valid results in case of failure, for instance:

# Write a result which is valid for the failure case
echo "This failed. To try again re-run the pipeline" > $(results.githubcomment.path)

# Perform some task
really-useful-task
echo "This was successful" > $(results.githubcomment.path)

Today, if really-useful-task fails, no result is published.

@afrittoli afrittoli changed the title Results for failed task and pipeline runs Publish results when task and pipeline runs fail Feb 8, 2021
@ghost
Copy link

ghost commented Feb 8, 2021

My proposal is to stop ignoring valid results in case of failure, for instance:

I understand the distinction but I'm not sure I totally see the difference in outcome. Once we support emitting results on Task failure they will immediately become available for use as "error results". I'm looking at the proposal both in terms of the described use-case but also the other patterns that become available when implementing that proposal.

I guess my question is: why not also consider the alternative of a more official "error result" or "optional result" too as part of the design work?

@afrittoli
Copy link
Member Author

I didn't think in terms of error results before, I kind of expected results to be there in case of failure too, and I'm not sure why they are not. In case of a task failure we should provide as much context as available - why skip results then?
I was tempted to file this as bug, but I created it as a feature because the code explicitly skips results in case of failure.

Optional results are certainly an interesting topic, this TEP is related: tektoncd/community#240.
Default values for results may address some case but not when the result value needs to be produced by a step.

@ghost
Copy link

ghost commented Feb 8, 2021

There's definitely more discussion that could be had around this but one of the original reasons we didn't emit results on failure was because Tekton can't infer whether the result data is valid or not. Given the original purpose of Task Results was for passing along short bits of data to other Tasks (and there was no Finally on the horizon at that point) the design ultimately omitted those bits of data on failure.

@bobcatfish
Copy link
Collaborator

I didn't think in terms of error results before, I kind of expected results to be there in case of failure too, and I'm not sure why they are not. In case of a task failure we should provide as much context as available - why skip results then?

Note also that if you have a Task that depends on the result of another Task, and the Task producing the result fails, the Task consuming the result will not run.

e.g.:

TaskA
TaskB uses TaskA.results.foo

If TaskA fails or is skipped, TaskB will not run (which is where the optional results come in)

But @afrittoli maybe you're not talking about making the result available to consuming Tasks, you're talking specifically about putting it into the status of the TaskRun?

@afrittoli
Copy link
Member Author

I didn't think in terms of error results before, I kind of expected results to be there in case of failure too, and I'm not sure why they are not. In case of a task failure we should provide as much context as available - why skip results then?

Note also that if you have a Task that depends on the result of another Task, and the Task producing the result fails, the Task consuming the result will not run.

e.g.:

TaskA
TaskB uses TaskA.results.foo

If TaskA fails or is skipped, TaskB will not run (which is where the optional results come in)

But @afrittoli maybe you're not talking about making the result available to consuming Tasks, you're talking specifically about putting it into the status of the TaskRun?

Yes, indeed.

The status will stay in etcd, might be gobbled up by chains, stored in results, sent over cloud events, trigger other parts of the workflow, so there is value in having that result even if it won't be read by other tasks.

It may also be used by tasks in finally, once the corresponding TEP is implemented.

@ghost
Copy link

ghost commented Feb 9, 2021

Thought about this some more the past few days and my resistance has waned. I do think there are downsides to introducing this, and I think we'll see them quickly picked up just for error message delivery, but I can see that the practical utility might outweigh those issues? Paging @pritidesai here too for any thoughts or feedback.

If we get to a point where we see lots of Tasks added to the Catalog which utilize results to specifically emit error messages on failure I would consider that a point to introduce something like an "error result". UIs and tools could then probe and surface those messages differently to regular results. At that point though the Tekton cat might be out of the bag and users would never use them? I'm not really sure.

@bobcatfish
Copy link
Collaborator

If we're talking about ONLY making these results available in the TaskRun status then I think the behavior sounds reasonable. (i.e. the TaskRun is failling, but previous steps - or even the failing step maybe? - have successfully written results to their termination messages, so we might as well expose them in the status). If we're talking about making the results available to other Tasks in a Pipeline then it feels like a bigger feature.

I don't 100% see how that serves your use case - I agree with Scott that if the motivation is to expose info about why the TaskRun failed, results dont seem like the right way to do that, and the best answer there might even be to make it easier to surface related logs.

@pritidesai
Copy link
Member

If we're talking about ONLY making these results available in the TaskRun status then I think the behavior sounds reasonable.

It does sound reasonable but would not justify.

If the results are part of the status, the users would like to consume those results in finally or make them part of pipeline results.

expose info about why the TaskRun failed, results dont seem like the right way to do that

Yup, I agree. Results are mainly produced to be consumed by other tasks in the pipeline. We do not have any other means of communicating from a step to the users. We could introduce a step level output steps.<step-name>.output or similar and make it part of the status to make it available to other workflows/APIs.

@bobcatfish
Copy link
Collaborator

If the results are part of the status, the users would like to consume those results in finally or make them part of pipeline results.

I think it depends on who is consuming this information - if you are consuming it within the pipeline, you'll need some mechanism to provide the info via a variable (and allow for the failure of hte task :O ); but if it's a system built on top of Tekton (e.g. a dashboard) then making the results available in the status would be enough.

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 17, 2021
@ppitonak
Copy link
Contributor

/remove-lifecycle stale

It would make sense to at least allow publishing pipeline results that are not depending on results of a failing task, the minimal reproducer is below. In my real-world use-case, I have a pipeline with task A publishing a result, task B using task A's result, pipeline publishes a result based on task A result. When task A succeeds and task B fails, pipeline doesn't publish result.

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: hello-pipeline
spec:
  tasks:
  - name: hello
    taskSpec:
      results:
      - name: greeting
      steps:
      - name: greet
        image: registry.access.redhat.com/ubi8/ubi-minimal
        script: |
          #!/usr/bin/env bash
          set -e
          echo -n "Hello World!" | tee $(results.greeting.path)
          exit 1
  results:
  - name: myresult
    value: absolutely unrelated to hello task

@tekton-robot tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 21, 2021
@afrittoli
Copy link
Member Author

afrittoli commented Sep 16, 2021

Yup, I agree. Results are mainly produced to be consumed by other tasks in the pipeline.

Uhm, I kind of disagree on this statement.
Results are the outputs of taskruns and pipelines.

Results of taskruns, when running within a pipeline are likely to be used by other tasks.
They are also going to be in attestations, they are written in the status and this accessible to other controllers watching the taskrun and they are sent as part of cloudevents for remote systems to use.

Results of standalone taskruns and pipelineruns are mainly meant for other pipelines or systems to use, whether is displaying them to the user or using them as input of the next part of the workflow.

@afrittoli
Copy link
Member Author

I was re-reading the whole thread, and I think there are three different things describe in the thread:

  1. taskruns emitting results in case of failure. We do not today because we don't know if the results are valid. We could change this to emitting them (relatively small code change), we would be telling our users - this is what we go from the task, but since the task fail, it may be or may be not valid.
  2. pipelinerun emitting results in case of failure. Pipelineruns are different because they basically only relay results. They may do some string concatenation but that's about it. We could at least change the current behaviour to emit results that depend on tasks that did not fail. If we implemented (1), then we could emit all results regardless.
  3. error specific result. One option here would be to pass the stderr of the failed step in a result, but that might be large and/or not helpful. We could allow task authors to specify the error, but that would be a lot more complex to implement since we would need to provide authors with a facility to create those results in case of failure, kind of a "finally step" or so. And it would require adding this facility to all existing catalog tasks.

Implementing (2) alone would solve @ppitonak issue and would be easy to implement.
We could include a fixed string (i.e. NOT_AVAILABLE) in results that are not available, but if we did that we would need then to user no breaking changes on it.

@dalbar
Copy link
Contributor

dalbar commented Sep 17, 2021

Yup, I agree. Results are mainly produced to be consumed by other tasks in the pipeline.

Uhm, I kind of disagree on this statement.
Results are the outputs of taskruns and pipelines.

Results of taskruns, when running within a pipeline are likely to be used by other tasks.
They are also going to be in attestations, they are written in the status and this accessible to other controllers watching the taskrun and they are sent as part of cloudevents for remote systems to use.

Results of standalone taskruns and pipelineruns are mainly meant for other pipelines or systems to use, whether is displaying them to the user or using them as input of the next part of the workflow.

I am in a similar situation where the system is build on top of Tekton and its controllers are collecting TaskRun results. So far we are only collecting results for successful outcomes. However, a use-case for error messages has been raised where we would like to give our users a hint for the failure. For example our git-wrapper can report missing credentials within a TaskRun. We would like to collect the errors which is currently not possible in a straightforward way.

I was re-reading the whole thread, and I think there are three different things describe in the thread:

  1. taskruns emitting results in case of failure. We do not today because we don't know if the results are valid. We could change this to emitting them (relatively small code change), we would be telling our users - this is what we go from the task, but since the task fail, it may be or may be not valid.
  2. pipelinerun emitting results in case of failure. Pipelineruns are different because they basically only relay results. They may do some string concatenation but that's about it. We could at least change the current behaviour to emit results that depend on tasks that did not fail. If we implemented (1), then we could emit all results regardless.
  3. error specific result. One option here would be to pass the stderr of the failed step in a result, but that might be large and/or not helpful. We could allow task authors to specify the error, but that would be a lot more complex to implement since we would need to provide authors with a facility to create those results in case of failure, kind of a "finally step" or so. And it would require adding this facility to all existing catalog tasks.

Implementing (2) alone would solve @ppitonak issue and would be easy to implement.
We could include a fixed string (i.e. NOT_AVAILABLE) in results that are not available, but if we did that we would need then to user no breaking changes on it.

The first option would be sufficient for the described use-case and I wouldn't mind contributing to Tekton.

@pritidesai
Copy link
Member

I have opened a discussion addressing pipeline results with a failed pipelineRun.

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2022
@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@pritidesai
Copy link
Member

@Yablargo onError might be beneficial for this use case. Let me know your thoughts?

@S4lt5
Copy link

S4lt5 commented Apr 28, 2022

Gotcha, Producing task result with onError might actually do the trick

@wzhanw
Copy link

wzhanw commented May 9, 2022

@pritidesai applying onError: continue will change the pipelinerun and taskrun from failed to Succeeded.
In our use case, we don't want that, we only need the task results available.

QuanZhang-William added a commit to QuanZhang-William/pipeline that referenced this issue Jul 12, 2022
Publish Pipeline Results from successful TaskRuns in Failed PipelineRuns: tektoncd#3749

Prior to this change a failed PipelineRun does not emit any PipelineResult even when some PipelineTasks referenced by the PipelineResult succeeded. This commit changes to publish Pipeline Results only from successful PipelineTaskRuns in the failed PipelineRun

Such Results are valid because they are generated by successful TaskRuns. They can provide more context and successful artifacts to users of the partially successful PipelinRun.
@danielfbm
Copy link

I was re-reading the whole thread, and I think there are three different things describe in the thread:

  1. taskruns emitting results in case of failure. We do not today because we don't know if the results are valid. We could change this to emitting them (relatively small code change), we would be telling our users - this is what we go from the task, but since the task fail, it may be or may be not valid.
  2. pipelinerun emitting results in case of failure. Pipelineruns are different because they basically only relay results. They may do some string concatenation but that's about it. We could at least change the current behaviour to emit results that depend on tasks that did not fail. If we implemented (1), then we could emit all results regardless.
  3. error specific result. One option here would be to pass the stderr of the failed step in a result, but that might be large and/or not helpful. We could allow task authors to specify the error, but that would be a lot more complex to implement since we would need to provide authors with a facility to create those results in case of failure, kind of a "finally step" or so. And it would require adding this facility to all existing catalog tasks.

Implementing (2) alone would solve @ppitonak issue and would be easy to implement. We could include a fixed string (i.e. NOT_AVAILABLE) in results that are not available, but if we did that we would need then to user no breaking changes on it.

+1 for emitting taskrun results despite of the status. In our case we want to do execute a task, collect data, and implement validation over the collected data. Although the task failed we really would like to emit the results for clarity of context and display. Currently it is possible to parse the container exit message and achieve the same result but if 1 is implemented this should be easier to integrate

Yongxuanzhang added a commit to Yongxuanzhang/pipeline that referenced this issue Mar 20, 2023
This commit aims to fix tektoncd#6383, tektoncd#6139 and supports tektoncd#3749. This commits
skip the validation if the taskrun is not successful (e.g. failed or skipped) and
omit the pipelinerun coresponding result. This way the skipped taskrun
won't get validation error for pipelinerun. And the pipelinerun error
won't be overwritten by the validation error.

Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
Yongxuanzhang added a commit to Yongxuanzhang/pipeline that referenced this issue Mar 20, 2023
This commit aims to fix tektoncd#6383, tektoncd#6139 and supports tektoncd#3749. This commits
skip the validation if the taskrun is not successful (e.g. failed or skipped) and
omit the pipelinerun coresponding result. This way the skipped taskrun
won't get validation error for pipelinerun. And the pipelinerun error
won't be overwritten by the validation error.

Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
Yongxuanzhang added a commit to Yongxuanzhang/pipeline that referenced this issue Mar 20, 2023
This commit aims to fix tektoncd#6383, tektoncd#6139 and supports tektoncd#3749. This commits
skip the validation if the taskrun is not successful (e.g. failed or skipped) and
omit the pipelinerun coresponding result. This way the skipped taskrun
won't get validation error for pipelinerun. And the pipelinerun error
won't be overwritten by the validation error.

Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
Yongxuanzhang added a commit to Yongxuanzhang/pipeline that referenced this issue Apr 24, 2023
This commit aims to fix tektoncd#6383, tektoncd#6139 and supports tektoncd#3749. This commits
skip the validation if the taskrun is not successful (e.g. failed or skipped) and
omit the pipelinerun coresponding result. This way the skipped taskrun
won't get validation error for pipelinerun. And the pipelinerun error
won't be overwritten by the validation error.

Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
Yongxuanzhang added a commit to Yongxuanzhang/pipeline that referenced this issue Apr 25, 2023
This commit aims to fix tektoncd#6383, tektoncd#6139 and supports tektoncd#3749. This commits
skip the validation if the taskrun is not successful (e.g. failed or skipped) and
omit the pipelinerun coresponding result. This way the skipped taskrun
won't get validation error for pipelinerun. And the pipelinerun error
won't be overwritten by the validation error.

Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
Yongxuanzhang added a commit to Yongxuanzhang/pipeline that referenced this issue Apr 26, 2023
This commit aims to fix tektoncd#6383, tektoncd#6139 and supports tektoncd#3749. This commits
skip the validation if the taskrun is not successful (e.g. failed or skipped) and
omit the pipelinerun coresponding result. This way the skipped taskrun
won't get validation error for pipelinerun. And the pipelinerun error
won't be overwritten by the validation error.

Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
Yongxuanzhang added a commit to Yongxuanzhang/pipeline that referenced this issue Apr 27, 2023
This commit aims to fix tektoncd#6383, tektoncd#6139 and supports tektoncd#3749. This commits
skip the validation if the taskrun is not successful (e.g. failed or skipped) and
omit the pipelinerun coresponding result. This way the skipped taskrun
won't get validation error for pipelinerun. And the pipelinerun error
won't be overwritten by the validation error.

Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
tekton-robot pushed a commit that referenced this issue May 18, 2023
This commit aims to fix #6383, #6139 and supports #3749. This commits
skip the validation if the taskrun is not successful (e.g. failed or skipped) and
omit the pipelinerun coresponding result. This way the skipped taskrun
won't get validation error for pipelinerun. And the pipelinerun error
won't be overwritten by the validation error.

Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
tekton-robot pushed a commit to tekton-robot/pipeline that referenced this issue May 18, 2023
This commit aims to fix tektoncd#6383, tektoncd#6139 and supports tektoncd#3749. This commits
skip the validation if the taskrun is not successful (e.g. failed or skipped) and
omit the pipelinerun coresponding result. This way the skipped taskrun
won't get validation error for pipelinerun. And the pipelinerun error
won't be overwritten by the validation error.

Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
yyzxw pushed a commit to yyzxw/pipeline that referenced this issue May 19, 2023
This commit aims to fix tektoncd#6383, tektoncd#6139 and supports tektoncd#3749. This commits
skip the validation if the taskrun is not successful (e.g. failed or skipped) and
omit the pipelinerun coresponding result. This way the skipped taskrun
won't get validation error for pipelinerun. And the pipelinerun error
won't be overwritten by the validation error.

Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
tekton-robot pushed a commit that referenced this issue May 19, 2023
This commit aims to fix #6383, #6139 and supports #3749. This commits
skip the validation if the taskrun is not successful (e.g. failed or skipped) and
omit the pipelinerun coresponding result. This way the skipped taskrun
won't get validation error for pipelinerun. And the pipelinerun error
won't be overwritten by the validation error.

Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
@lcarva
Copy link
Contributor

lcarva commented Sep 11, 2023

#5749 which has been implemented via #6510 seems to achieve the desired state. Should this issue be closed, or is this potentially about something else?

@pritidesai
Copy link
Member

Hey @lcarva We do not have a test/example to support the use of task results from a failed task in pipeline results. At least I can not recollect seeing it and remember that as something pending.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
Status: Todo
Development

No branches or pull requests

10 participants