webhook CrashLoopBackOff with "Failed to start informers", "failed to wait for cache at index 4 to sync", when sources.knative.dev/v1beta1 SinkBindings exist #4876

maschmid · 2021-02-12T09:24:18Z

Describe the bug
When sources.knative.dev/v1beta1 SinkBindings already exist on the cluster during eventing-webhook startup, the webhook never becomes ready and ends up in CrashLoopBackOff, waiting for eventing-webhook-certs to be populated.

It seems that the informers go through the sinkbindings, but listing them calls the webhook itself (for conversion) before it's ready, which fails with

Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"

thus delaying informer startup, which in turn delays the webhook's attempt to create the certs in eventing-webhook-certs, leading to a vicious cycle.
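
A quick way to check whether a crash-looping webhook shows this pattern is to count the relevant error signatures in its log. The sketch below is only an illustration: the function name `summarize_webhook_log` is hypothetical, and the patterns are simply taken from the messages quoted in this report.

```shell
#!/usr/bin/env bash

# Summarize webhook log text read from stdin (hypothetical helper).
# Prints three counters, one per line:
#   conversion_failures  list calls rejected by the conversion webhook
#   tls_key_missing      TLS handshakes refused because no server key is loaded
#   informer_fatal       fatal "Failed to start informers" entries
summarize_webhook_log() {
  awk '
    /conversion webhook for .* failed/        { conv++ }
    /TLS handshake error.*server key missing/ { tls++ }
    /Failed to start informers/               { fatal++ }
    END {
      printf "conversion_failures=%d\n", conv
      printf "tls_key_missing=%d\n", tls
      printf "informer_fatal=%d\n", fatal
    }
  '
}
```

For example, assuming the log of the previous container is still available: `kubectl logs -n knative-eventing <webhook-pod> --previous | summarize_webhook_log`. Non-zero conversion failures together with a fatal informer entry would be consistent with the cycle described above.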

Expected behavior
The webhook should become ready even if old-versioned resources exist on the cluster.

To Reproduce
(with the Operator)

1. Create KnativeEventing (to let the CRDs be created)
2. Remove KnativeEventing
3. Create a lot of sources.knative.dev/v1beta1 SinkBindings
4. Create KnativeEventing again
5. Notice the errors in eventing-webhook
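
For the step that creates many SinkBindings, the manifests could be generated rather than written by hand. The following is a minimal sketch under stated assumptions: the helper names, the object name `bind-heartbeat`, the Job subject, and the `event-display` sink are illustrative choices modeled on the common SinkBinding example, not the reporter's actual setup.

```shell
#!/usr/bin/env bash

# Print a v1beta1 SinkBinding manifest for the given namespace.
# The object name, subject, and sink are illustrative assumptions.
render_sinkbinding() {
  local namespace="$1"
  cat <<EOF
apiVersion: sources.knative.dev/v1beta1
kind: SinkBinding
metadata:
  name: bind-heartbeat
  namespace: ${namespace}
spec:
  subject:
    apiVersion: batch/v1
    kind: Job
    selector:
      matchLabels:
        app: heartbeat-cron
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-display
EOF
}

# Print manifests for namespaces <prefix>1 .. <prefix>N as one
# multi-document YAML stream.
render_many_sinkbindings() {
  local prefix="$1" count="$2" i
  for ((i = 1; i <= count; i++)); do
    echo "---"
    render_sinkbinding "${prefix}${i}"
  done
}
```

The output could then be piped to the cluster, for instance `render_many_sinkbindings foobar 50 | oc apply -f -`, provided the namespaces already exist. The count of 50 is arbitrary. Creating the objects while the webhook is absent presumably only works as long as v1beta1 is still the version the CRD stores, since otherwise the API server would need the conversion webhook.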

Knative release version
0.19.2

Additional context

2021/02/12 09:04:46 Registering 4 clients
2021/02/12 09:04:46 Registering 4 informer factories
2021/02/12 09:04:46 Registering 6 informers
2021/02/12 09:04:46 Registering 7 controllers
{"level":"info","ts":"2021-02-12T09:04:46.806Z","caller":"logging/config.go:110","msg":"Successfully created the logger."}
{"level":"info","ts":"2021-02-12T09:04:46.806Z","caller":"logging/config.go:111","msg":"Logging level set to: info"}
{"level":"info","ts":"2021-02-12T09:04:46.806Z","caller":"logging/config.go:78","msg":"Fetch GitHub commit ID from kodata failed","error":"\"KO_DATA_PATH\" does not exist or is empty"}
{"level":"info","ts":"2021-02-12T09:04:46.806Z","logger":"eventing-webhook","caller":"profiling/server.go:59","msg":"Profiling enabled: false","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.814Z","logger":"eventing-webhook","caller":"leaderelection/context.go:46","msg":"Running with Standard leader election","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.815Z","logger":"eventing-webhook","caller":"sinkbinding/controller.go:91","msg":"Setting up event handlers","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.824Z","logger":"eventing-webhook","caller":"sharedmain/main.go:209","msg":"Starting configuration manager...","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.829Z","logger":"eventing-webhook.config-store","caller":"configmap/store.go:154","msg":"defaults config \"config-br-defaults\" config was added or updated: &config.Defaults{NamespaceDefaultsConfig:map[string]*config.ClassAndBrokerConfig(nil), ClusterDefault:(*config.ClassAndBrokerConfig)(0xc000917580)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.829Z","logger":"eventing-webhook.config-store","caller":"configmap/store.go:154","msg":"defaults config \"config-br-defaults\" config was added or updated: &config.Defaults{NamespaceDefaultsConfig:map[string]*config.ClassAndBrokerConfig(nil), ClusterDefault:(*config.ClassAndBrokerConfig)(0xc000917660)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.829Z","logger":"eventing-webhook.config-store","caller":"configmap/store.go:154","msg":"defaults config \"config-br-defaults\" config was added or updated: &config.Defaults{NamespaceDefaultsConfig:map[string]*config.ClassAndBrokerConfig(nil), ClusterDefault:(*config.ClassAndBrokerConfig)(0xc000917740)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.829Z","logger":"eventing-webhook.channel-config-store","caller":"configmap/store.go:154","msg":"channeldefaults config \"default-ch-webhook\" config was added or updated: &config.ChannelDefaults{NamespaceDefaults:map[string]*config.ChannelTemplateSpec{\"some-namespace\":(*config.ChannelTemplateSpec)(0xc00094cb10)}, ClusterDefault:(*config.ChannelTemplateSpec)(0xc00094cab0)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.830Z","logger":"eventing-webhook.channel-config-store","caller":"configmap/store.go:154","msg":"channeldefaults config \"default-ch-webhook\" config was added or updated: &config.ChannelDefaults{NamespaceDefaults:map[string]*config.ChannelTemplateSpec{\"some-namespace\":(*config.ChannelTemplateSpec)(0xc00094d050)}, ClusterDefault:(*config.ChannelTemplateSpec)(0xc00094cff0)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.830Z","logger":"eventing-webhook.channel-config-store","caller":"configmap/store.go:154","msg":"channeldefaults config \"default-ch-webhook\" config was added or updated: &config.ChannelDefaults{NamespaceDefaults:map[string]*config.ChannelTemplateSpec{\"some-namespace\":(*config.ChannelTemplateSpec)(0xc00094d590)}, ClusterDefault:(*config.ChannelTemplateSpec)(0xc00094d530)}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.830Z","logger":"eventing-webhook","caller":"metrics/exporter.go:160","msg":"Flushing the existing exporter before setting up the new exporter.","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.830Z","logger":"eventing-webhook","caller":"metrics/prometheus_exporter.go:50","msg":"Created Opencensus Prometheus exporter with config: &{knative.dev/eventing eventing_webhook prometheus 5000000000 <nil> <nil>  false 9090 0.0.0.0 false   {   false}}. Start the server for Prometheus exporter.","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":"2021-02-12T09:04:46.830Z","logger":"eventing-webhook","caller":"metrics/exporter.go:173","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{knative.dev/eventing eventing_webhook prometheus 5000000000 <nil> <nil>  false 9090 0.0.0.0 false   {   false}}","knative.dev/pod":"eventing-webhook-5c5fb5765c-r49xv"}
{"level":"info","ts":1613120686.9246924,"logger":"fallback","caller":"injection/injection.go:61","msg":"Starting informers..."}
E0212 09:04:46.932309       1 reflector.go:178] knative.dev/pkg/controller/controller.go:619: Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"
2021/02/12 09:04:47 http: TLS handshake error from 10.129.2.1:49178: server key missing
E0212 09:04:48.090654       1 reflector.go:178] knative.dev/pkg/controller/controller.go:619: Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"
2021/02/12 09:04:48 http: TLS handshake error from 10.129.2.1:49198: server key missing
2021/02/12 09:04:49 http: TLS handshake error from 10.129.2.1:49214: server key missing
E0212 09:04:49.948761       1 reflector.go:178] knative.dev/pkg/controller/controller.go:619: Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"
2021/02/12 09:04:50 http: TLS handshake error from 10.129.2.1:49226: server key missing
2021/02/12 09:04:51 http: TLS handshake error from 10.129.2.1:49238: server key missing
2021/02/12 09:04:52 http: TLS handshake error from 10.129.2.1:49274: server key missing
E0212 09:04:53.467969       1 reflector.go:178] knative.dev/pkg/controller/controller.go:619: Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"
2021/02/12 09:04:53 http: TLS handshake error from 10.129.2.1:49286: server key missing
2021/02/12 09:04:54 http: TLS handshake error from 10.129.2.1:49306: server key missing
2021/02/12 09:04:55 http: TLS handshake error from 10.129.2.1:49326: server key missing
2021/02/12 09:04:56 http: TLS handshake error from 10.129.2.1:49340: server key missing
2021/02/12 09:04:57 http: TLS handshake error from 10.129.2.1:49354: server key missing
2021/02/12 09:04:58 http: TLS handshake error from 10.129.2.1:49368: server key missing
2021/02/12 09:04:59 http: TLS handshake error from 10.129.2.1:49382: server key missing
2021/02/12 09:05:00 http: TLS handshake error from 10.129.2.1:49406: server key missing
2021/02/12 09:05:01 http: TLS handshake error from 10.129.2.1:49420: server key missing
E0212 09:05:01.800201       1 reflector.go:178] knative.dev/pkg/controller/controller.go:619: Failed to list *v1.SinkBinding: conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": no endpoints available for service "eventing-webhook"
2021/02/12 09:05:02 http: TLS handshake error from 10.129.2.1:49456: server key missing
2021/02/12 09:05:03 http: TLS handshake error from 10.129.2.1:49470: server key missing
2021/02/12 09:05:04 http: TLS handshake error from 10.129.2.1:49484: server key missing
2021/02/12 09:05:05 http: TLS handshake error from 10.129.2.1:49504: server key missing
2021/02/12 09:05:06 http: TLS handshake error from 10.129.2.1:49510: server key missing
2021/02/12 09:05:06 http: TLS handshake error from 10.129.2.1:49518: server key missing
2021/02/12 09:05:07 http: TLS handshake error from 10.129.2.1:49526: server key missing
2021/02/12 09:05:07 http: TLS handshake error from 10.129.2.1:49536: server key missing
2021/02/12 09:05:08 http: TLS handshake error from 10.129.2.1:49546: server key missing
{"level":"fatal","ts":1613120708.1635342,"logger":"fallback","caller":"injection/injection.go:63","msg":"Failed to start informers","error":"failed to wait for cache at index 4 to sync","stacktrace":"knative.dev/pkg/injection.EnableInjectionOrDie.func1\n\t/opt/app-root/src/go/src/knative.dev/eventing/vendor/knative.dev/pkg/injection/injection.go:63\nknative.dev/pkg/injection/sharedmain.MainWithConfig\n\t/opt/app-root/src/go/src/knative.dev/eventing/vendor/knative.dev/pkg/injection/sharedmain/main.go:231\nknative.dev/pkg/injection/sharedmain.MainWithContext\n\t/opt/app-root/src/go/src/knative.dev/eventing/vendor/knative.dev/pkg/injection/sharedmain/main.go:142\nmain.main\n\t/opt/app-root/src/go/src/knative.dev/eventing/cmd/webhook/main.go:377\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:203"}


matzew · 2021-02-12T09:30:56Z

Is it just with SinkBindings ?

What about upgrades with the Operator (e.g. changing the version field, on the KnativeEventing CRD) ?

antoineco · 2021-02-12T09:37:44Z

Seems very much related to knative/operator#292

maschmid · 2021-02-12T10:13:26Z

I don't think so, the original issue in knative/operator#292 was a lack of liveness probe initial delay (the time before liveness probe failure was shorter than the time it took to acquire the lease).

In this case the informers will give up even if we lengthen the liveness probe delay, and it doesn't have anything to do with the lease (we can reproduce this one even if we delete the lease).

matzew · 2021-02-16T16:13:45Z

@maschmid do you have a yaml for

Create a lot of sources.knative.dev/v1beta1 SinkBindings

?

Not sure why I'd create SinkBindings when the KnativeEventing is uninstalled? What's the use case?

matzew · 2021-02-18T11:26:54Z

@maschmid anything for a reproducer yaml or so ?

matzew · 2021-02-18T11:45:16Z

I get

Error from server: error when creating "/home/matzew/sinkbinding_v1beta1.yaml": conversion webhook for sources.knative.dev/v1beta1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": service "eventing-webhook" not found

when I do step 3) after I deleted the KnativeEventing (with the knative/operator master branch)

maschmid · 2021-02-18T11:58:04Z

With OpenShift serverless 1.13 , https://gist.github.com/maschmid/e0b04e9f4a6341ebf50d4076bb63a6c8

maschmid · 2021-02-18T11:59:46Z

On eventing master, sinkbindings are stored as v1, so you cannot create v1beta1 without conversion (which will also mean that the issue is probably not present anymore on master, as the stored version and the version the webhook lists are the same, so the conversion is no longer attempted)
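
Whether a cluster is exposed to this can be estimated from the versions recorded in the CRD's `status.storedVersions`: if anything other than the version being listed has ever been stored, the API server may have to call the conversion webhook. A rough sketch of that check, where the helper name `may_need_conversion` is hypothetical:

```shell
#!/usr/bin/env bash

# Return 0 if listing objects at version "$1" may require conversion, given
# the space-separated stored versions in "$2"; return 1 otherwise.
# (Hypothetical helper; it only compares version strings.)
may_need_conversion() {
  local listed="$1" stored
  # $2 is intentionally unquoted so that it splits into one word per version.
  for stored in $2; do
    if [ "$stored" != "$listed" ]; then
      return 0
    fi
  done
  return 1
}

# Example usage against a cluster (not executed here):
#   stored=$(oc get crd sinkbindings.sources.knative.dev \
#     -o jsonpath='{.status.storedVersions[*]}')
#   may_need_conversion v1 "$stored" && echo "conversion may be attempted"
```

This is only a heuristic: `storedVersions` lists versions that were stored at some point, so a non-matching entry means conversion is possible, not that old-versioned objects necessarily still exist.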

matzew · 2021-02-18T13:15:16Z

master of knative/operator


davidkarlsen · 2021-02-19T00:25:10Z

+1 sitting with a broken cluster as well, OCP 4.6.16, serverless operator 1.13.0. Is there any way to get rid of the serverless component?

maschmid · 2021-02-19T08:45:07Z

To remove all v1beta1 sinkbindings on OpenShift,

oc delete sinkbindings.v1beta1.sources.knative.dev --all-namespaces --all

If KnativeEventing is not installed, you may need to remove their finalizers as well.

#!/usr/bin/env bash

# Remove the finalizers from every v1beta1 SinkBinding so that deletion can
# complete even though no controller is running to process them.
# With --all-namespaces, the first two columns are namespace and name.
oc get sinkbinding.v1beta1.sources.knative.dev --all-namespaces --no-headers=true |
while read -r namespace name _
do
  oc patch -n "$namespace" sinkbinding.v1beta1.sources.knative.dev "$name" --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
done

matzew · 2021-03-05T10:53:21Z

OK, I've looked at this again.

This only happens if you use knative/operator (or a distribution) 0.19.x WITH v1beta1 of the SinkBinding.

However, using v1 does not cause this problem!

matzew · 2021-03-05T13:58:55Z

With SinkBinding v1, creating SinkBinding CRs is prevented after the KnativeEventing is deleted.

See:

knativeeventing.operator.knative.dev "knative-eventing" deleted
namespace/foobar1 created
service.serving.knative.dev/event-display created
cronjob.batch/heartbeat-cron created
Error from server: error when creating "STDIN": conversion webhook for sources.knative.dev/v1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": service "eventing-webhook" not found
namespace/foobar2 created
service.serving.knative.dev/event-display created
cronjob.batch/heartbeat-cron created
Error from server: error when creating "STDIN": conversion webhook for sources.knative.dev/v1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": service "eventing-webhook" not found
namespace/foobar3 created
service.serving.knative.dev/event-display created
cronjob.batch/heartbeat-cron created
Error from server: error when creating "STDIN": conversion webhook for sources.knative.dev/v1, Kind=SinkBinding failed: Post "https://eventing-webhook.knative-eventing.svc:443/resource-conversion?timeout=30s": service "eventing-webhook" not found
...
...

matzew · 2021-03-08T13:04:47Z

The listing of SinkBindings v1 was added in b2387db by @capri-xiyue

Should that be kept at v1beta1 ? 🤔

guimou · 2021-03-25T14:52:20Z

Note in case it helps: this happened to me when a KafkaSource object had been created in a project without the CRD and the controller being deployed before (the user had skipped this step when configuring Serverless).
Thanks again @maschmid for the workaround, which was to delete the KafkaSource object. As it was impossible to do this directly, I deleted the CRD, then reinstalled the component properly.

aliok · 2021-04-01T14:27:21Z

I did some investigation here: knative-extensions/eventing-kafka#494 (comment)

vaikas · 2021-04-26T17:14:34Z

@matzew @aliok would you mind taking a look at this and updating as necessary?

github-actions · 2021-07-26T01:23:38Z

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

maschmid added the kind/bug Categorizes issue or PR as related to a bug. label Feb 12, 2021

lberk added area/sources priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Mar 1, 2021

lberk added this to the v0.22.0 milestone Mar 1, 2021

aliok mentioned this issue Apr 1, 2021

kafka-controller-manager CrashLoopBackOff when unreconciled KafkaSources already exist during startup knative-extensions/eventing-kafka#494

Open

lberk modified the milestones: v0.22.0, v0.23.0 Apr 12, 2021

github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 26, 2021

github-actions bot closed this as completed Aug 29, 2021
