Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix sink crashloop when upgrading from 0.1 to 0.2 #369

Merged
merged 2 commits into from
Jan 23, 2020

Conversation

dibyom
Copy link
Member

@dibyom dibyom commented Jan 22, 2020

Changes

Fixes #366

The crashloop is due to 2 reasons:

  1. We deprecated and removed the params field from the EventListener. So, when the reconciler tries to unmarshal an old EventListener, it fails since it cannot parse the deleted params field.

  2. The new event listener sink image expects /etc/config-logging to be mounted. However the reconciler code does not add the volumeMount for any existing sink deployments. So, the new pods go into a crashloop

To fix this, this commit does the following:

  1. Add back the params field. Also, update the mutating webhook to delete any set params field in existing eventlisteners.
  2. Add volumeMount for existing deployments if not present.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide for more details.

Release Notes

fixes a bug where upgrading an existing event listener from Triggers 0.1 to 0.2 causes a crashloop

@tekton-robot tekton-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 22, 2020
@tekton-robot
Copy link

The following is the coverage report on pkg/.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/v1alpha1/eventlistener/eventlistener.go 69.4% 68.8% -0.7

@dibyom dibyom force-pushed the crashloop branch 2 times, most recently from 573f906 to 4358750 Compare January 22, 2020 18:31
@tekton-robot
Copy link

The following is the coverage report on pkg/.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/v1alpha1/eventlistener/eventlistener.go 69.4% 70.0% 0.6

@dibyom dibyom changed the title add deprecatedParams + volumeMount when missing fix sink crashloop when upgrading from 0.1 to 0.2 Jan 22, 2020
@tekton-robot
Copy link

The following is the coverage report on pkg/.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/v1alpha1/eventlistener/eventlistener.go 69.4% 70.0% 0.6

@ncskier
Copy link
Member

ncskier commented Jan 22, 2020

I just tested your branch. I can verify that I no longer see the EventListener running into a CrashLoop when I upgrade from v0.1.0 to this branch 👍

@ncskier
Copy link
Member

ncskier commented Jan 22, 2020

When I updated my Triggers version, my EventListener's fields did not update (binding -> bindings and params getting deleted). Also, in the EventListener reconciler file, I don't see Update() called on the EventListener, so where do we update it with the changes from event_listener_defaults.go?

I got this error in the reconciler:

{"level":"error","logger":"controller.eventlistener-controller","caller":"eventlistener/eventlistener.go:364","msg":"Error updating EventListener Deployment: Deployment.apps \"el-listener\" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"config-logging\"","knative.dev/controller":"eventlistener-controller","stacktrace":"github.com/tektoncd/triggers/pkg/reconciler/v1alpha1/eventlistener.(*Reconciler).reconcileDeployment\n\tgithub.com/tektoncd/triggers/pkg/reconciler/v1alpha1/eventlistener/eventlistener.go:364\ngithub.com/tektoncd/triggers/pkg/reconciler/v1alpha1/eventlistener.(*Reconciler).reconcile\n\tgithub.com/tektoncd/triggers/pkg/reconciler/v1alpha1/eventlistener/eventlistener.go:152\ngithub.com/tektoncd/triggers/pkg/reconciler/v1alpha1/eventlistener.(*Reconciler).Reconcile\n\tgithub.com/tektoncd/triggers/pkg/reconciler/v1alpha1/eventlistener/eventlistener.go:128\ngithub.com/tektoncd/triggers/vendor/knative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tgithub.com/tektoncd/triggers/vendor/knative.dev/pkg/controller/controller.go:335\ngithub.com/tektoncd/triggers/vendor/knative.dev/pkg/controller.(*Impl).Run.func1\n\tgithub.com/tektoncd/triggers/vendor/knative.dev/pkg/controller/controller.go:285"}
{"level":"error","logger":"controller.eventlistener-controller","caller":"controller/controller.go:350","msg":"Reconcile error","knative.dev/controller":"eventlistener-controller","error":"Deployment.apps \"el-listener\" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"config-logging\"","stacktrace":"github.com/tektoncd/triggers/vendor/knative.dev/pkg/controller.(*Impl).handleErr\n\tgithub.com/tektoncd/triggers/vendor/knative.dev/pkg/controller/controller.go:350\ngithub.com/tektoncd/triggers/vendor/knative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tgithub.com/tektoncd/triggers/vendor/knative.dev/pkg/controller/controller.go:336\ngithub.com/tektoncd/triggers/vendor/knative.dev/pkg/controller.(*Impl).Run.func1\n\tgithub.com/tektoncd/triggers/vendor/knative.dev/pkg/controller/controller.go:285"}
{"level":"info","logger":"controller.eventlistener-controller","caller":"controller/controller.go:337","msg":"Reconcile failed. Time taken: 525.004849ms.","knative.dev/controller":"eventlistener-controller","knative.dev/traceid":"587477b4-ad05-450b-9415-c60f1e5fcf45","knative.dev/key":"tekton-pipelines/listener"}

@googlebot
Copy link

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@dibyom
Copy link
Member Author

dibyom commented Jan 23, 2020

Ok, so the crashloop should be fixed though we still need to address @ncskier 's comment #369 (comment)

@tekton-robot
Copy link

The following is the coverage report on pkg/.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/v1alpha1/eventlistener/eventlistener.go 69.4% 70.6% 1.1

@dibyom
Copy link
Member Author

dibyom commented Jan 23, 2020

quick and dirty script I used to test this: https://gist.github.com/dibyom/93b8455ee18e496d408ec4ef51d2b476

@tekton-robot
Copy link

The following is the coverage report on pkg/.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/v1alpha1/eventlistener/eventlistener.go 69.4% 70.6% 1.1

@ncskier
Copy link
Member

ncskier commented Jan 23, 2020

@googlebot I consent

The crashloop is due to 2 reasons:
1. We deprecated and removed the `params` field from the EventListener. So, when the reconciler tries to  unmarshal an old EventListener, it fails since it cannot parse the deleted `params` field.

2. The new event listener sink image expects `/etc/config-logging` to be mounted. However the reconciler code does not add the `volumeMount` for any existing sink deployments. So, the new pods go into a crashloop

To fix this, this commit does the following:
1. Add back the `params` field. Also, update the mutating webhook to delete any set `params` field in existing eventlisteners.
2. Add volumeMount for existing deployments if not present.

- Co-Authored-by: Brandon Walker
Fixes tektoncd#366

Signed-off-by: Dibyo Mukherjee <dibyo@google.com>
@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@tekton-robot
Copy link

The following is the coverage report on pkg/.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/v1alpha1/eventlistener/eventlistener.go 69.4% 70.6% 1.1

@ncskier
Copy link
Member

ncskier commented Jan 23, 2020

/test pull-tekton-triggers-build-tests

Copy link
Member

@ncskier ncskier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@tekton-robot tekton-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 23, 2020
Comment on lines 84 to 86
ctxFunc := func(ctx context.Context) context.Context {
return ctx
return v1alpha1.WithUpgradeViaDefaulting(ctx)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dibyom I think golangci is complaining about this. It should be

ctxFunc := v1alpha1.WithUpgradeViaDefaulting

@tekton-robot tekton-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2020
This ensures that the automatic upgrades that we do in
`SetDefaults` (e.g. `binding` to `bindings`) are actually
persisted in etcd.

Previously we only did these upgrades on each reconcile
but did not persist them.

Signed-off-by: Dibyo Mukherjee <dibyo@google.com>
@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2020
@tekton-robot
Copy link

The following is the coverage report on pkg/.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/v1alpha1/eventlistener/eventlistener.go 69.4% 70.6% 1.1

Copy link
Member

@ncskier ncskier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2020
@tekton-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ncskier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2020
@tekton-robot tekton-robot merged commit 1e9396a into tektoncd:master Jan 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to v0.2.0 requires recreating event listeners
4 participants