[Bug] unable to load certificates when cruise control is turned on #3694
So, just to clarify ... the Cruise Control pod is crash looping after Cruise Control was enabled?
Yes, the Cruise Control pod is crash looping after enabling it. It looks like the sidecar fails as well.
It appears the crt file is empty.
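One way to confirm is to decode the certificate entry from the secret and inspect it. A minimal sketch, assuming the secret name from the `k get secrets` listing later in this issue; the data key name `cruise-control.crt` is a guess and may differ in your cluster:

```shell
# Sketch only: secret name taken from this issue, key name is an assumption.
crt=$(kubectl get secret kafka-cluster-cruise-control-certs \
        -o jsonpath='{.data.cruise-control\.crt}' | base64 -d)
if [ -z "$crt" ]; then
    echo "certificate entry is empty"
else
    # show subject and validity window of the decoded certificate
    printf '%s\n' "$crt" | openssl x509 -noout -subject -dates
fi
```

The `base64 -d` step matters because secret data is stored base64-encoded; an "empty" crt can mean either a missing key or an empty decoded value.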
It looks like something went wrong when generating the certificates - the certificate file ended up empty.
If the Cluster CA secrets look ok, I think you should just delete the broken secret so it gets regenerated.
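A minimal sketch of that recovery step, assuming the comment refers to the per-component certs secret shown later in this issue; the Cluster Operator recreates a deleted certs secret on its next reconciliation:

```shell
# Assumption: the secret with the empty cert is kafka-cluster-cruise-control-certs.
kubectl delete secret kafka-cluster-cruise-control-certs
# watch for the operator to recreate it on the next reconciliation
kubectl get secret kafka-cluster-cruise-control-certs -w
```

Do not delete the Cluster CA secrets themselves this way unless you intend to renew the CA; this sketch only targets the component certs secret.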
I deleted my comment because I realized there were changes made to the cluster this afternoon that I didn't know about. A cyber security engineer applied this to all svcs: I looked at the persistent volumes and they were all in a terminating state. I'm not sure why. I didn't get to verify whether the Cluster CA looked okay (but all consumers and producers were operational). I'll stand the cluster back up, turn on Cruise Control as I did before, and see if it works this time.
Hmm, that sounds weird. Normally if you apply a label or annotation to a service it should not touch the pods at all unless it causes the loadbalancer to recreate (which would roll them but not delete them).
I think at 1:30 EST we had a broker down, and at 2:30 EST cyber security came by and applied that tag while the broker was still down. I then looked at the cluster and noticed brokers were missing and persistent volumes were in a terminating state.
I think I probably asked about it already ... but in your storage configuration, you do not use the |
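The truncated question above presumably concerns Strimzi's persistent-claim storage options; as a hypothetical illustration (this fragment is not taken from the reporter's cluster), the `deleteClaim` flag controls whether the PersistentVolumeClaims, and with them the data, are removed when the cluster is deleted:

```yaml
# Illustrative fragment of a Kafka CR, not the reporter's actual config.
spec:
  kafka:
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false  # keep the PVCs even if the Kafka resource is deleted
```

With `deleteClaim: true`, deleting the Kafka custom resource would also delete the volumes, which would match the "PVs in a terminating state" symptom described above.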
Strange - I wiped out the cluster.
I can't even seem to bring the cluster back...
Can the CRDs get corrupted somehow, or be stuck in some sort of state?
Can you share the whole log? Do the pods exist? When the cluster is deleted, it can happen that it is deleted mid-reconciliation ... in which case the Cluster Operator might need some time to figure that out, finish the old reconciliation of the deleted cluster, and start a new one for the new cluster. Could that be the case here? I would need the full log to confirm.
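For reference, a way to capture the full Cluster Operator log; the deployment name matches a default Strimzi install, but the namespace here is an assumption and should be adjusted:

```shell
# Assumed namespace "kafka" for a default Strimzi install; adjust as needed.
kubectl logs deployment/strimzi-cluster-operator -n kafka --tail=-1 > operator.log
```

`--tail=-1` asks for the complete log rather than the recent tail, which helps when the relevant reconciliation happened a while ago.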
I haven't seen anything like that. But TBH, we use the CRDs and know how they work from the user perspective ... I cannot say I understand the details of the CRD implementation in Kubernetes and all the ways they can go wrong. In any case, deleting the CRD triggers deletion of the CRs, which triggers deletion of the cluster. So if something there went wrong, everything is possible.
Oh boy. The cyber security engineer came by and applied a yaml where the requests/limits on Kafka were flipped. This was the cause of the Kafka cluster dying and the volumes going away. I believe the Cruise Control bug was valid, though, but I can't collect information on it because of the aforementioned problem.
Yeah, if the secret was empty it was probably some bug. Hard to find the exact cause without the logs. Did deleting the secret help? Or was this resolved along with the other problem?
Yeah, we resolved this by causing a bigger problem and having to reset the cluster, unfortunately. I'll try to reproduce this at some point before we go into production with it.
@jrivers96 Did you manage to reproduce this again?
Triaged on 7th July 2022: Should be closed. Seems like it never happened again? |
Certificate problem when cruise control turned on
I have a 35-broker Kafka, 5-node ZooKeeper cluster on Strimzi 0.19 that has been running for a month on AWS EKS.
I did

```
k edit Kafka kafka-cluster
```

and turned on Cruise Control with default settings (`{}`). The brokers roll over and I see the below error. The cluster is set up with OAuth authentication with external and internal certs. Any ideas? The cluster seems to be fully operational otherwise.
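The edit described above amounts to adding an empty `cruiseControl` section to the Kafka custom resource, which enables Cruise Control with its defaults:

```yaml
# Fragment of the Kafka CR; an empty object enables Cruise Control defaults.
spec:
  cruiseControl: {}
```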
```
k get secrets
NAME                                        TYPE                                  DATA   AGE
broker-oauth-secret                         Opaque                                1      20d
ca-truststore                               Opaque                                1      20d
default-token-9xhxm                         kubernetes.io/service-account-token   3      20d
external-cert-secret                        Opaque                                2      20d
kafka-cluster-clients-ca                    Opaque                                1      7d7h
kafka-cluster-clients-ca-cert               Opaque                                3      7d7h
kafka-cluster-cluster-ca                    Opaque                                1      7d7h
kafka-cluster-cluster-ca-cert               Opaque                                3      7d7h
kafka-cluster-cluster-operator-certs        Opaque                                4      7d7h
kafka-cluster-cruise-control-certs          Opaque                                4      41m
kafka-cluster-cruise-control-token-wlxz5    kubernetes.io/service-account-token   3      41m
kafka-cluster-entity-operator-certs         Opaque                                4      7d7h
kafka-cluster-entity-operator-token-vp2js   kubernetes.io/service-account-token   3      7d7h
kafka-cluster-kafka-brokers                 Opaque                                140    7d7h
kafka-cluster-kafka-exporter-certs          Opaque                                4      7d7h
kafka-cluster-kafka-exporter-token-wvm92    kubernetes.io/service-account-token   3      7d7h
kafka-cluster-kafka-token-vpj7l             kubernetes.io/service-account-token   3      7d7h
kafka-cluster-zookeeper-nodes               Opaque                                20     7d7h
kafka-cluster-zookeeper-token-rf5tv         kubernetes.io/service-account-token   3      7d7h
```