
webhook CrashLoopBackOff with "Failed to start informers", "failed to wait for cache at index 4 to sync", when sources.knative.dev/v1beta1 SinkBindings exist #4876

Closed
maschmid opened this issue Feb 12, 2021 · 18 comments
Labels
area/sources kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.
Milestone


@maschmid
Contributor

Describe the bug
When sources.knative.dev/v1beta1 SinkBindings already exist on the cluster during eventing-webhook startup, the webhook never becomes ready and ends up in CrashLoopBackOff, waiting for eventing-webhook-certs to be populated.

It seems that the informers list the SinkBindings, but listing them calls the webhook itself (for conversion) before it is ready, which fails with:

Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"

This delays informer startup, which in turn delays the point at which the webhook attempts to create the certs in eventing-webhook-certs, leading to a vicious cycle.
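The two halves of this cycle leave distinct traces in the webhook log quoted under Additional context: list calls failing with "no endpoints available" from the conversion webhook, and incoming TLS handshakes failing with "server key missing", presumably because eventing-webhook-certs is still empty. As a rough triage aid, a sketch like the following could count both in a captured log. count_webhook_symptoms is a hypothetical helper that matches only the message fragments seen in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a webhook log on stdin and count the two symptoms
# seen in this issue, printing them as "conversion=N tls=M".
count_webhook_symptoms() {
  local conversion=0 tls=0 line
  while IFS= read -r line; do
    case "$line" in
      # Informer list call rejected because the webhook has no endpoints yet.
      *"conversion webhook for"*"no endpoints available"*) conversion=$((conversion + 1)) ;;
      # Incoming TLS handshake failing while the serving key is absent.
      *"TLS handshake error"*"server key missing"*) tls=$((tls + 1)) ;;
    esac
  done
  echo "conversion=$conversion tls=$tls"
}

# Example (assumes kubectl access; --previous reads the crashed container's log):
#   kubectl logs --previous -n knative-eventing <webhook-pod> | count_webhook_symptoms
```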

Expected behavior
The webhook should become ready even if resources with older versions exist on the cluster.

To Reproduce
(with the Operator)

  1. Create KnativeEventing (to let the CRDs be created)
  2. Remove KnativeEventing
  3. Create a lot of sources.knative.dev/v1beta1 SinkBindings
  4. Create KnativeEventing again
  5. Notice the errors in eventing-webhook
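A sketch for step 3 that generates many v1beta1 SinkBindings in one go. It is not the reproducer from the gist linked later in this thread: the namespace, the subject Deployment "heartbeat" and the sink Service "event-display" are hypothetical placeholders, and the manifest assumes the v1beta1 schema with spec.subject and spec.sink.ref.

```shell
#!/usr/bin/env bash
# Hypothetical generator: print <count> v1beta1 SinkBinding manifests as one
# multi-document YAML stream. All names and targets are placeholders.
gen_sinkbindings() {
  local namespace="$1" count="$2" i
  for ((i = 1; i <= count; i++)); do
    cat <<EOF
---
apiVersion: sources.knative.dev/v1beta1
kind: SinkBinding
metadata:
  name: bind-heartbeat-$i
  namespace: $namespace
spec:
  subject:
    apiVersion: apps/v1
    kind: Deployment
    name: heartbeat
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-display
EOF
  done
}

# Example (assumes the CRDs from step 1 still exist):
#   gen_sinkbindings default 50 | oc apply -f -
```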

Knative release version
0.19.2

Additional context

2021/02/12 09:04:46 Registering 4 clients
2021/02/12 09:04:46 Registering 4 informer factories
2021/02/12 09:04:46 Registering 6 informers
2021/02/12 09:04:46 Registering 7 controllers
{"level":"info","ts":"2021-02-12T09:04:46.806Z","caller":"logging/config.go:110","msg":"Successfully created the logger."}
{"level":"info","ts":"2021-02-12T09:04:46.806Z","caller":"logging/config.go:111","msg":"Logging level set to: info"}
{"level":"info","ts":"2021-02-12T09:04:46.806Z","caller":"logging/config.go:78","msg":"Fetch GitHub commit ID from kodata failed","error":"\"KO_DATA_PATH\" does not exist or is empty"}
{"level":"info","ts":"2021-02-12T09:04:46.806Z","logger":"eventing-webhook","caller":"profiling/server.go:59","msg":"Profiling enabled: false","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.814Z","logger":"eventing-webhook","caller":"leaderelection/context.go:46","msg":"Running with Standard leader election","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.815Z","logger":"eventing-webhook","caller":"sinkbinding/controller.go:91","msg":"Setting up event handlers","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.824Z","logger":"eventing-webhook","caller":"sharedmain/main.go:209","msg":"Starting configuration manager...","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.829Z","logger":"eventing-webhook.config-store","caller":"configmap/store.go:154","msg":"defaults config \"config-br-defaults\" config was added or updated: &config.Defaults{NamespaceDefaultsConfig:map[string]*config.ClassAndBrokerConfig(nil), ClusterDefault:(*config.ClassAndBrokerConfig)(0xc000917580)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.829Z","logger":"eventing-webhook.config-store","caller":"configmap/store.go:154","msg":"defaults config \"config-br-defaults\" config was added or updated: &config.Defaults{NamespaceDefaultsConfig:map[string]*config.ClassAndBrokerConfig(nil), ClusterDefault:(*config.ClassAndBrokerConfig)(0xc000917660)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.829Z","logger":"eventing-webhook.config-store","caller":"configmap/store.go:154","msg":"defaults config \"config-br-defaults\" config was added or updated: &config.Defaults{NamespaceDefaultsConfig:map[string]*config.ClassAndBrokerConfig(nil), ClusterDefault:(*config.ClassAndBrokerConfig)(0xc000917740)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.829Z","logger":"eventing-webhook.channel-config-store","caller":"configmap/store.go:154","msg":"channeldefaults config \"default-ch-webhook\" config was added or updated: &config.ChannelDefaults{NamespaceDefaults:map[string]*config.ChannelTemplateSpec{\"some-namespace\":(*config.ChannelTemplateSpec)(0xc00094cb10)}, ClusterDefault:(*config.ChannelTemplateSpec)(0xc00094cab0)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.830Z","logger":"eventing-webhook.channel-config-store","caller":"configmap/store.go:154","msg":"channeldefaults config \"default-ch-webhook\" config was added or updated: &config.ChannelDefaults{NamespaceDefaults:map[string]*config.ChannelTemplateSpec{\"some-namespace\":(*config.ChannelTemplateSpec)(0xc00094d050)}, ClusterDefault:(*config.ChannelTemplateSpec)(0xc00094cff0)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.830Z","logger":"eventing-webhook.channel-config-store","caller":"configmap/store.go:154","msg":"channeldefaults config \"default-ch-webhook\" config was added or updated: &config.ChannelDefaults{NamespaceDefaults:map[string]*config.ChannelTemplateSpec{\"some-namespace\":(*config.ChannelTemplateSpec)(0xc00094d590)}, ClusterDefault:(*config.ChannelTemplateSpec)(0xc00094d530)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.830Z","logger":"eventing-webhook","caller":"metrics/exporter.go:160","msg":"Flushing the existing exporter before setting up the new exporter.","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.830Z","logger":"eventing-webhook","caller":"metrics/prometheus_exporter.go:50","msg":"Created Opencensus Prometheus exporter with config: &{knative.dev/eventing eventing_webhook prometheus 5000000000 <nil> <nil>  false 9090 0.0.0.0 false   {   false}}. Start the server for Prometheus exporter.","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.830Z","logger":"eventing-webhook","caller":"metrics/exporter.go:173","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{knative.dev/eventing eventing_webhook prometheus 5000000000 <nil> <nil>  false 9090 0.0.0.0 false   {   false}}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":1613120686.9246924,"logger":"fallback","caller":"injection/injection.go:61","msg":"Starting informers..."}
E0212 09:04:46.932309       1 reflector.go:178] knative.dev/pkg/controller/controller.go:619: Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"
2021/02/12 09:04:47 http: TLS handshake error from 10.129.2.1:49178: server key missing
E0212 09:04:48.090654       1 reflector.go:178] knative.dev/pkg/controller/controller.go:619: Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"
2021/02/12 09:04:48 http: TLS handshake error from 10.129.2.1:49198: server key missing
2021/02/12 09:04:49 http: TLS handshake error from 10.129.2.1:49214: server key missing
E0212 09:04:49.948761       1 reflector.go:178] knative.dev/pkg/controller/controller.go:619: Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"
2021/02/12 09:04:50 http: TLS handshake error from 10.129.2.1:49226: server key missing
2021/02/12 09:04:51 http: TLS handshake error from 10.129.2.1:49238: server key missing
2021/02/12 09:04:52 http: TLS handshake error from 10.129.2.1:49274: server key missing
E0212 09:04:53.467969       1 reflector.go:178] knative.dev/pkg/controller/controller.go:619: Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"
2021/02/12 09:04:53 http: TLS handshake error from 10.129.2.1:49286: server key missing
2021/02/12 09:04:54 http: TLS handshake error from 10.129.2.1:49306: server key missing
2021/02/12 09:04:55 http: TLS handshake error from 10.129.2.1:49326: server key missing
2021/02/12 09:04:56 http: TLS handshake error from 10.129.2.1:49340: server key missing
2021/02/12 09:04:57 http: TLS handshake error from 10.129.2.1:49354: server key missing
2021/02/12 09:04:58 http: TLS handshake error from 10.129.2.1:49368: server key missing
2021/02/12 09:04:59 http: TLS handshake error from 10.129.2.1:49382: server key missing
2021/02/12 09:05:00 http: TLS handshake error from 10.129.2.1:49406: server key missing
2021/02/12 09:05:01 http: TLS handshake error from 10.129.2.1:49420: server key missing
E0212 09:05:01.800201       1 reflector.go:178] knative.dev/pkg/controller/controller.go:619: Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"
2021/02/12 09:05:02 http: TLS handshake error from 10.129.2.1:49456: server key missing
2021/02/12 09:05:03 http: TLS handshake error from 10.129.2.1:49470: server key missing
2021/02/12 09:05:04 http: TLS handshake error from 10.129.2.1:49484: server key missing
2021/02/12 09:05:05 http: TLS handshake error from 10.129.2.1:49504: server key missing
2021/02/12 09:05:06 http: TLS handshake error from 10.129.2.1:49510: server key missing
2021/02/12 09:05:06 http: TLS handshake error from 10.129.2.1:49518: server key missing
2021/02/12 09:05:07 http: TLS handshake error from 10.129.2.1:49526: server key missing
2021/02/12 09:05:07 http: TLS handshake error from 10.129.2.1:49536: server key missing
2021/02/12 09:05:08 http: TLS handshake error from 10.129.2.1:49546: server key missing
{"level":"fatal","ts":1613120708.1635342,"logger":"fallback","caller":"injection/injection.go:63","msg":"Failed to start informers","error":"failed to wait for cache at index 4 to sync","stacktrace":"knative.dev/pkg/injection.EnableInjectionOrDie.func1\n\t/opt/app-root/src/go/src/knative.dev/eventing/vendor/knative.dev/pkg/injection/injection.go:63\nknative.dev/pkg/injection/sharedmain.MainWithConfig\n\t/opt/app-root/src/go/src/knative.dev/eventing/vendor/knative.dev/pkg/injection/sharedmain/main.go:231\nknative.dev/pkg/injection/sharedmain.MainWithContext\n\t/opt/app-root/src/go/src/knative.dev/eventing/vendor/knative.dev/pkg/injection/sharedmain/main.go:142\nmain.main\n\t/opt/app-root/src/go/src/knative.dev/eventing/cmd/webhook/main.go:377\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:203"}
@maschmid maschmid added the kind/bug Categorizes issue or PR as related to a bug. label Feb 12, 2021
@matzew
Member

matzew commented Feb 12, 2021

Is it just with SinkBindings?

What about upgrades with the Operator (e.g. changing the version field on the KnativeEventing CRD)?

@antoineco
Contributor

Seems very much related to knative/operator#292

@maschmid
Contributor Author

maschmid commented Feb 12, 2021

I don't think so. The original issue in knative/operator#292 was a missing liveness probe initial delay (the time before the liveness probe failed was shorter than the time it took to acquire the lease).

In this case the informers will give up even if we lengthen the liveness probe delay, and it has nothing to do with the lease (we can reproduce this one even if we delete the lease).

@matzew
Member

matzew commented Feb 16, 2021

@maschmid do you have a yaml for

Create a lot of sources.knative.dev/v1beta1 SinkBindings

?

Not sure why I'd create SinkBindings while KnativeEventing is uninstalled. What's the use case?

@matzew
Member

matzew commented Feb 18, 2021

@maschmid do you have a reproducer YAML or similar?

@matzew
Member

matzew commented Feb 18, 2021

I get

Error from server: error when creating "/home/matzew/sinkbinding_v1beta1.yaml": conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": service "eventing-webhook" not found

when I do step 3 after deleting the KnativeEventing (with the knative/operator master branch).

@maschmid
Contributor Author

With OpenShift Serverless 1.13: https://gist.github.com/maschmid/e0b04e9f4a6341ebf50d4076bb63a6c8

@maschmid
Contributor Author

On eventing master, SinkBindings are stored as v1, so you cannot create v1beta1 ones without conversion. This also means the issue is probably no longer present on master: the stored version and the version the webhook lists are the same, so no conversion is attempted.
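To check whether a cluster may be affected, one could compare the CRD's status.storedVersions with the version the webhook lists (the webhook log in the issue description shows it listing *v1.SinkBinding). The sketch below assumes the CRD is named sinkbindings.sources.knative.dev, and needs_conversion is a hypothetical helper. storedVersions records every version objects may still be persisted in, so a mismatch means listing can call the conversion webhook, not that it certainly will.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if any stored version differs from the version
# the informers list, i.e. listing may go through the conversion webhook.
# Usage: needs_conversion <listed-version> <stored-version>...
needs_conversion() {
  local listed="$1" v
  shift
  for v in "$@"; do
    [[ "$v" != "$listed" ]] && return 0
  done
  return 1
}

# Example (assumes kubectl access; [*] makes jsonpath print the versions space-separated):
#   stored=$(kubectl get crd sinkbindings.sources.knative.dev -o jsonpath='{.status.storedVersions[*]}')
#   if needs_conversion v1 $stored; then echo "listing v1 may require conversion"; fi
```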

@matzew
Member

matzew commented Feb 18, 2021 via email

@davidkarlsen

+1, sitting with a broken cluster as well (OCP 4.6.16, Serverless Operator 1.13.0). Is there any way to get rid of the Serverless component?

@maschmid
Copy link
Contributor Author

To remove all v1beta1 SinkBindings on OpenShift:

oc delete sinkbindings.v1beta1.sources.knative.dev --all-namespaces --all

If KnativeEventing is not installed, you may need to remove their finalizers as well.

#!/usr/bin/env bash
# Remove the finalizers from every remaining v1beta1 SinkBinding, so that the
# deletion can finish even though no controller is running to process them.
# Each row of `oc get --no-headers` starts with NAMESPACE and NAME; the rest is ignored.
while read -r namespace name _; do
  oc patch -n "$namespace" sinkbinding.v1beta1.sources.knative.dev "$name" \
    --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
done < <(oc get sinkbinding.v1beta1.sources.knative.dev --all-namespaces --no-headers=true)

@lberk lberk added area/sources priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Mar 1, 2021
@lberk lberk added this to the v0.22.0 milestone Mar 1, 2021
@matzew
Member

matzew commented Mar 5, 2021

OK, I've looked at this again.

This only happens if you use knative/operator (or distribution) 0.19.x WITH v1beta1 of the SinkBinding.

However, using v1 does not cause this problem!

@matzew
Member

matzew commented Mar 5, 2021

With SinkBinding v1, creating SinkBinding CRs is prevented once KnativeEventing has been deleted.

See:

knativeeventing.operator.knative.dev "knative-eventing" deleted
namespace/foobar1 created
service.serving.knative.dev/event-display created
cronjob.batch/heartbeat-cron created
Error from server: error when creating "STDIN": conversion webhook for sources.knative.dev/v1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": service "eventing-webhook" not found
namespace/foobar2 created
service.serving.knative.dev/event-display created
cronjob.batch/heartbeat-cron created
Error from server: error when creating "STDIN": conversion webhook for sources.knative.dev/v1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": service "eventing-webhook" not found
namespace/foobar3 created
service.serving.knative.dev/event-display created
cronjob.batch/heartbeat-cron created
Error from server: error when creating "STDIN": conversion webhook for sources.knative.dev/v1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": service "eventing-webhook" not found
...
...

@matzew
Member

matzew commented Mar 8, 2021

The listing of v1 SinkBindings was added in b2387db by @capri-xiyue.

Should that be kept at v1beta1? 🤔

@guimou

guimou commented Mar 25, 2021

Note, in case it helps: this happened to me when a KafkaSource object had been created in a project without the CRD and the controller having been deployed first (the user had skipped this step when configuring Serverless).
Thanks again @maschmid for the workaround, which was to delete the KafkaSource object. As it was impossible to do this directly, I deleted the CRD, then reinstalled the component properly.

@aliok
Member

aliok commented Apr 1, 2021

I did some investigation here: knative-extensions/eventing-kafka#494 (comment)

@vaikas
Contributor

vaikas commented Apr 26, 2021

@matzew @aliok would you mind taking a look at this and updating as necessary?

@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 26, 2021

8 participants