Operator may crash on the first start #3321

Closed
barkbay opened this issue Jun 26, 2020 · 1 comment · Fixed by #5519
Labels: >enhancement, good first issue

Comments

barkbay (Contributor) commented Jun 26, 2020

When the operator is deployed for the first time, the Secret that contains the webhook certificate is populated by the operator itself (if the webhook certificate is managed by ECK, which is the default).

Unfortunately, it can take some time for the content of the Secret to be propagated into the container. A wait loop was introduced in #2312, but my testing of 1.2.0-bc2 suggests that the current timeout (30 seconds) is too low.
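For readers unfamiliar with the mechanism, a wait loop of this kind can be sketched roughly as follows. This is a minimal illustration using only the Go standard library, not the operator's actual code from #2312: the function name `waitForFile`, the polling interval, and the timeout values are assumptions. The demo in `main` uses a temporary file that appears after a short delay so that it finishes quickly.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// errTimeout is returned when the file does not appear before the deadline.
var errTimeout = errors.New("timed out waiting for file")

// waitForFile (hypothetical helper) polls path every interval until the file
// exists or timeout has elapsed.
func waitForFile(path string, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		_, err := os.Stat(path)
		if err == nil {
			return nil // file is present
		}
		if !errors.Is(err, os.ErrNotExist) {
			return err // unexpected error, e.g. permission denied
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%w: %s", errTimeout, path)
		}
		time.Sleep(interval)
	}
}

func main() {
	dir, err := os.MkdirTemp("", "certs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "tls.crt")

	// Simulate delayed Secret propagation: the file shows up after 200ms.
	go func() {
		time.Sleep(200 * time.Millisecond)
		_ = os.WriteFile(path, []byte("dummy"), 0o600)
	}()

	if err := waitForFile(path, 50*time.Millisecond, 2*time.Second); err != nil {
		fmt.Println("giving up:", err)
		return
	}
	fmt.Println("certificate file present")
}
```

With a loop like this, the only tunable is the timeout, which is why the measurements that follow matter: any value below the slowest observed propagation makes the first start fail.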

I did a few tests to understand what timeout value would be acceptable (ECK version: 1.2.0-bc2 on v1.15.12-gke.2):

  • 79 seconds:
{"log.level":"info","@timestamp":"2020-06-25T16:49:01.036Z","log.logger":"manager","message":"Polling for the webhook certificate to be available","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"debug","@timestamp":"2020-06-25T16:49:01.036Z","log.logger":"manager","message":"Webhook certificate file not present on filesystem yet","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
...
{"log.level":"debug","@timestamp":"2020-06-25T16:50:20.037Z","log.logger":"manager","message":"Webhook certificate file present on filesystem","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
  • 90 seconds:
{"log.level":"info","@timestamp":"2020-06-26T06:17:34.985Z","log.logger":"manager","message":"Polling for the webhook certificate to be available","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"debug","@timestamp":"2020-06-26T06:17:34.985Z","log.logger":"manager","message":"Webhook certificate file not present on filesystem yet","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
...
{"log.level":"debug","@timestamp":"2020-06-26T06:19:04.985Z","log.logger":"manager","message":"Webhook certificate file present on filesystem","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
  • 80 seconds:
{"log.level":"info","@timestamp":"2020-06-26T06:22:56.395Z","log.logger":"manager","message":"Polling for the webhook certificate to be available","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"debug","@timestamp":"2020-06-26T06:22:56.395Z","log.logger":"manager","message":"Webhook certificate file not present on filesystem yet","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
...
{"log.level":"debug","@timestamp":"2020-06-26T06:24:16.396Z","log.logger":"manager","message":"Webhook certificate file present on filesystem","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"debug","@timestamp":"2020-06-26T06:24:16.396Z","log.logger":"dynamic-enqueue-request","message":"Adding new handler registration","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","key":"-owner","current_registrations":{}}
  • 66 seconds:
{"log.level":"info","@timestamp":"2020-06-26T06:31:36.446Z","log.logger":"manager","message":"Polling for the webhook certificate to be available","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"debug","@timestamp":"2020-06-26T06:31:36.446Z","log.logger":"manager","message":"Webhook certificate file not present on filesystem yet","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
...
{"log.level":"debug","@timestamp":"2020-06-26T06:32:42.446Z","log.logger":"manager","message":"Webhook certificate file present on filesystem","service.version":"1.2.0-d53bc36a","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
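The delays listed above are the difference between the first "Polling" timestamp and the "present on filesystem" timestamp of each run. As a small sketch of that arithmetic (the helper name `elapsed` is made up for illustration), the first run can be checked like this:

```go
package main

import (
	"fmt"
	"time"
)

// elapsed (hypothetical helper) returns the duration between two RFC 3339
// timestamps such as the "@timestamp" fields in the operator logs.
func elapsed(start, end string) (time.Duration, error) {
	s, err := time.Parse(time.RFC3339Nano, start)
	if err != nil {
		return 0, err
	}
	e, err := time.Parse(time.RFC3339Nano, end)
	if err != nil {
		return 0, err
	}
	return e.Sub(s), nil
}

func main() {
	// Timestamps copied from the first test run in this issue.
	d, err := elapsed("2020-06-25T16:49:01.036Z", "2020-06-25T16:50:20.037Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(d.Round(time.Second)) // prints "1m19s", i.e. 79 seconds
}
```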
barkbay added the >enhancement label Jun 26, 2020
pebrc (Collaborator) commented Jun 26, 2020

I fear that secret propagation times depend not only on the k8s cluster version but also on the size of the cluster and the number of secrets or other API resources. So any timeout we pick might be too low in some cases.

That makes the second approach you suggested more compelling, i.e. marking the operator pod as updated to speed up secret propagation. If I am not mistaken, the operator already has the necessary permissions (?)
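One way to "mark the pod as updated" is to change an annotation on the operator's own Pod, on the assumption that a Pod update prompts the kubelet to re-sync its mounted volumes sooner than the periodic sync would. The sketch below only builds the JSON merge patch body for such an update; the annotation key and the helper name are hypothetical, and this is not necessarily how the eventual fix was implemented.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// buildAnnotationPatch (hypothetical helper) returns a JSON merge patch that
// sets a single annotation in an object's metadata.
func buildAnnotationPatch(key, value string) ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{key: value},
		},
	}
	return json.Marshal(patch)
}

func main() {
	// The annotation key is an assumption made up for this example.
	// Writing the current time guarantees the Pod object actually changes.
	now := time.Now().UTC().Format(time.RFC3339)
	patch, err := buildAnnotationPatch("example.com/cert-refresh", now)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))

	// Assumption: with client-go, this body would be sent as a merge patch
	// (types.MergePatchType) against the operator's own Pod, which requires
	// "patch" permission on pods in the operator namespace.
}
```

Compared with raising the timeout, this approach does not depend on guessing an upper bound for propagation delay, which is the concern raised above.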
