When the operator is deployed for the first time, the Secret containing the webhook certificate is populated by the operator (if the webhook certificate is managed by ECK, which is the case by default).
Unfortunately, it can take some time for the content of the Secret to be propagated into the container. A wait loop was introduced in #2312, but my experience while testing 1.2.0-bc2 suggests that the current timeout (30 seconds) is too low.
I did a few tests to understand what timeout value would be acceptable (ECK version: 1.2.0-bc2 on v1.15.12-gke.2):
79 seconds:
{"log.level":"info","@timestamp":"2020-06-25T16:49:01.036Z","log.logger":"manager","message":"Polling for the webhook certificate to be available","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"debug","@timestamp":"2020-06-25T16:49:01.036Z","log.logger":"manager","message":"Webhook certificate file not present on filesystem yet","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
...
{"log.level":"debug","@timestamp":"2020-06-25T16:50:20.037Z","log.logger":"manager","message":"Webhook certificate file present on filesystem","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
90 seconds:
{"log.level":"info","@timestamp":"2020-06-26T06:17:34.985Z","log.logger":"manager","message":"Polling for the webhook certificate to be available","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"debug","@timestamp":"2020-06-26T06:17:34.985Z","log.logger":"manager","message":"Webhook certificate file not present on filesystem yet","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
...
{"log.level":"debug","@timestamp":"2020-06-26T06:19:04.985Z","log.logger":"manager","message":"Webhook certificate file present on filesystem","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
80 seconds:
{"log.level":"info","@timestamp":"2020-06-26T06:22:56.395Z","log.logger":"manager","message":"Polling for the webhook certificate to be available","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"debug","@timestamp":"2020-06-26T06:22:56.395Z","log.logger":"manager","message":"Webhook certificate file not present on filesystem yet","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"debug","@timestamp":"2020-06-26T06:24:16.396Z","log.logger":"manager","message":"Webhook certificate file present on filesystem","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"debug","@timestamp":"2020-06-26T06:24:16.396Z","log.logger":"dynamic-enqueue-request","message":"Adding new handler registration","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","key":"-owner","current_registrations":{}}
66 seconds:
{"log.level":"info","@timestamp":"2020-06-26T06:31:36.446Z","log.logger":"manager","message":"Polling for the webhook certificate to be available","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"debug","@timestamp":"2020-06-26T06:31:36.446Z","log.logger":"manager","message":"Webhook certificate file not present on filesystem yet","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
...
{"log.level":"debug","@timestamp":"2020-06-26T06:32:42.446Z","log.logger":"manager","message":"Webhook certificate file present on filesystem","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
A first option would be to increase the timeout to something like 90 to 100 seconds, but that really looks like a high value to me.
I fear that secret propagation times depend not only on the k8s cluster version but also on the size of the cluster, the number of Secrets or other API resources. So any timeout we pick might fail in some cases.
That means the second approach you suggested sounds more compelling, i.e. marking the operator Pod as updated to speed up secret propagation. If I am not mistaken, the operator already has the necessary permissions (?)
A second solution would be to update the Pod which is running the operator by using the MarkPodsAsUpdated function. Another benefit would be that the certificate is also propagated faster when renewed. But it means that the operator should be able to update its own Pod (see "Speedup cert secret propagation by also updating the pod" #496 and "Make speedup secret propagation a utility function" #568 for more context).