-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Review Certificate Reconciliation Logic #1841
Comments
Some specific examples called out in this PR: #2541 |
In a single iteration of the Elasticsearch reconciliation loop, the certificates are parsed and encoded multiple times.
A synthetic benchmark (using the fake client) of
CPU and memory profiles are attached. As this requires a careful design and a large refactoring to get rid of the amount of unnecessary work done in each iteration, I'll come back to this later after finishing the more urgent tasks on hand. |
Related to #6954 It offers users a workaround for the problem with too many certificates in the transport certificate secret. They can configure external transport cert provisioning and disable self-signed transport certificates. When using a solution eg. like cert-manager's csi-driver as [documented here ](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-transport-settings.html#k8s-transport-third-party-tools) this should allow for larger node sets of more than 250 nodes. The large cluster scenario is certainly an an edge case but on smaller clusters the disabling of certificate provisioning might still be attractive [reducing the amount of work the operator has to do in this area.](#1841) Note the new option to disable the self-signed transport certificates below: ```yaml apiVersion: elasticsearch.k8s.elastic.co/v1 kind: Elasticsearch metadata: name: es spec: version: 8.6.2 transport: tls: certificateAuthorities: configMapName: trust selfSignedCertificates: disabled: true # <<<< new option nodeSets: - name: mixed count: 3 config: xpack.security.transport.ssl.key: /usr/share/elasticsearch/config/cert-manager-certs/tls.key xpack.security.transport.ssl.certificate: /usr/share/elasticsearch/config/cert-manager-certs/tls.crt node.store.allow_mmap: false podTemplate: spec: containers: - name: elasticsearch env: - name: PRE_STOP_ADDITIONAL_WAIT_SECONDS value: "5" volumeMounts: - name: transport-certs mountPath: /usr/share/elasticsearch/config/cert-manager-certs volumes: - name: transport-certs csi: driver: csi.cert-manager.io readOnly: true volumeAttributes: csi.cert-manager.io/issuer-name: ca-cluster-issuer csi.cert-manager.io/issuer-kind: ClusterIssuer csi.cert-manager.io/dns-names: "${POD_NAME}.${POD_NAMESPACE}.svc.cluster.local" ``` The option does not remove existing certificates from the secret so that the cluster keeps working during the transition if this option is turned on on an existing cluster. I also opted to remove the symlinking of certificates into the `emptyDir` config volume. I tried to figure out why we did this in the first place and am not sure. The reason I could think of was that we wanted to have static and predictable certificate and key file names across all nodes (`transport.tls.crt` and `transport.tls.key`) But we can just use the `POD_NAME` environment variable to link directly into the mounted certificate secret volume. The reason to change this behaviour now is again to support the transition between externally provisioned certs and self-signed certs provisioned by ECK: if a user flips the switch to disable and then re-enable the self-signed certs, but does this accidentally without also configuring the config settings for the transport layer there is an edge case where an Elasticsearch pod will crashloop and cannot recover if we use symlinking: 1. disable self-signed transport certs 2. scale the cluster up by one or more nodes 3. new nodes won't come up because certs are missing (user error) 4. user tries to recover by re-enabling self-signed certs 5. ES keeps bootlooping on the new nodes because the symlink is missing By removing the symlinking the node can recover as soon as the certificates appear in the filesystem. --------- Co-authored-by: Michael Morello <michael.morello@gmail.com> Co-authored-by: Michael Montgomery <mmontg1@gmail.com>
As noted in #1832, the
certificates.Reconcile
method has a large footprint in terms of memory usage. This is probably due to parsing of certificates over and over. We should review the logic to see if we can avoid some of this work.Alloc Space:
The text was updated successfully, but these errors were encountered: