PipelineRun fails too eagerly on missing resources #3378
Comments
As @jlpettersson pointed out in Slack, they had to put the PipelineRun at the bottom here for the test to pass: IMO a good signal that this is solid would be that this passes with the PipelineRun first. It's a decent simulation of informer lag, and aims for a better experience for users that might not want to deal with orchestrating multiple applies.
This is very closely related to #2740. The PipelineRun and TaskRun reconcilers use the client to get the Pipeline or Task respectively to avoid lister cache issues. There was a lot of good discussion in that issue. I'm not sure why it was closed.
There is a lot of nuance, and honestly there are facets of the Tekton resource model that make this challenging (as I highlighted in that thread). The conflict here is between:
Using the client could solve both, but that conflicts with scaling the system. I think an appropriate solution here would be to give the informer caches a grace period before declaring missing references fatal. This could key off of
As the PipelineRun reconciler executes, it resolves resources using the informer's lister cache. Currently, when that cache is behind, the PipelineRun fails immediately. This change builds in a buffer of `resources.MinimumAge` and a helper `resources.IsYoung` that elide this check, returning the error to the controller framework to requeue the key for later processing (with backoff). Fixes: tektoncd#3378
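To make the grace-period idea concrete, here is a minimal sketch in Go. It is not the code from the change: `minimumAge`, `isYoung`, `handleLookupError`, and `errNotFound` are hypothetical names, the 5-second value is an assumption, and the real `resources.IsYoung` presumably inspects the PipelineRun object rather than taking a duration. The sketch only shows the decision: a cache miss on a young run is returned as a retryable error, while the same miss on an older run is treated as permanent.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// minimumAge is a hypothetical grace period. The value is an assumption,
// not the one used by resources.MinimumAge in the actual change.
const minimumAge = 5 * time.Second

// errNotFound stands in for a "not found" result from a lister cache.
var errNotFound = errors.New("referenced resource not found in cache")

// isYoung reports whether a run of the given age is still inside the
// grace period, i.e. the informer cache may simply be lagging.
func isYoung(age time.Duration) bool {
	return age < minimumAge
}

// handleLookupError decides how to treat a lookup error.
// permanent == false with a non-nil error means "hand the error back to
// the controller framework so the key is requeued"; permanent == true
// means "mark the run as failed".
func handleLookupError(age time.Duration, err error) (permanent bool, out error) {
	if err == nil {
		return false, nil
	}
	if isYoung(age) {
		// Too early to tell whether the resource is really missing.
		return false, fmt.Errorf("cache may be stale, retry later: %w", err)
	}
	return true, err
}

func main() {
	// In a reconciler the age would come from something like
	// time.Since(creationTimestamp); fixed values keep the demo simple.
	for _, age := range []time.Duration{time.Second, time.Minute} {
		permanent, err := handleLookupError(age, errNotFound)
		fmt.Printf("age=%v permanent=%v err=%v\n", age, permanent, err)
	}
}
```

The trade-off in this sketch is that a genuinely missing reference takes up to `minimumAge` longer to surface as a failure, in exchange for tolerating informer lag when resources are applied together.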
I have a change that will hopefully build in a grace period for this here: #3385 Let's see how the e2e tests respond to this 🤞
Hi all, I am not sure whether I have hit the same issue. The e2e test fails on case
Then I checked the events from this e2e test namespace; it seems the pod is still under
Can you help me?
Expected Behavior
PipelineRun only fails when referenced resources do not exist.
Actual Behavior
PipelineRun fails when referenced resources do not YET exist in its informer cache at the time of reconciliation.
You can see the bug here, as the `lister` is passed to fetch the resource, and on error it manifests as a permanent failure of the pipeline resource: pipeline/pkg/reconciler/pipelinerun/pipelinerun.go
Lines 334 to 341 in 1710b68
Steps to Reproduce the Problem
This is an intermittent flake in the e2e testing, which manifests with a message like:
Additional Info
This is roughly at HEAD (I observed this downstream, but the issue is clearly upstream)