Keda Kafka got error "remote error: tls: bad certificate" #5473

Closed
yaakov-berkovitch opened this issue Feb 2, 2024 · 14 comments
Labels
bug Something isn't working stale All issues that are marked as stale due to inactivity

Comments

@yaakov-berkovitch

Report

I deployed a Kafka ScaledObject on an EKS cluster (v1.27) and got the error "remote error: tls: bad certificate". The same TLS configuration works when used from simple Python code.

Expected Behavior

KEDA should load the Kafka certificates correctly and create the HPA.

Actual Behavior

The KEDA ScaledObject is created, but the keda-operator reports "remote error: tls: bad certificate" and the HPA is not created.

Steps to Reproduce the Problem

1. Create the Kafka ScaledObject below:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ms-metro
spec:
  advanced:
    restoreToOriginalReplicaCount: true
    scalingModifiers:
      formula: '(kafka) > 0 ? 1 : 0'
      metricType: Value
      target: '1'
  cooldownPeriod: 120
  maxReplicaCount: 1
  minReplicaCount: 0
  pollingInterval: 30
  scaleTargetRef:
    name: ms-metro
  triggers:
    - metadata:
        bootstrapServers: >-
          kafka1:9094,kafka2:9094,kafka3:9094
        ca: |-
          -----BEGIN CERTIFICATE-----
          MIIDPzCCAiegAwIBAgIJALpKfrM78xDRMA0GCSqGSIb3DQEBCwUAMCwxCzAJBgNV
          BAYTAlVTMQswCQYDVQQHDAJOSjEQMA4GA1UECgwHQWxnb3NlYzAeFw0yMDA3MTcx
          -----END CERTIFICATE-----
        cert: |-
          -----BEGIN CERTIFICATE-----
          MIICzDCCAbQCCQDCwlC0lc/xhzANBgkqhkiG9w0BAQsFADAsMQswCQYDVQQGEwJV
          UzELMAkGA1UEBwwCTkoxEDAOBgNVBAoMB0FsZ29zZWMwHhcNMjMwNjA3MDYxMTMy
          -----END CERTIFICATE-----
        consumerGroup: group-3082561150640452
        key: |-
          -----BEGIN RSA PRIVATE KEY-----
          Proc-Type: 4,ENCRYPTED
          DEK-Info: AES-256-CBC,23328C7353BEFE355877A4219D5E22D9

          KMsPV0rcqT1+6JeV286L2IXKmpsvyIcLP25M/DmWtJANQ31s5NOaS4UqSNAahEDo
          D1quFzwcH0B048nW5fLg2OwyFBBjkljJhviAOTkfYTwevSLsvSiIpa/Ej4EKjgek
          -----END RSA PRIVATE KEY-----
        keyPassword: abcde123
        tls: enable
        topic: cf-to-algonext
        unsafeSsl: 'true'
      name: kafka
      type: kafka

The certificates in the ScaledObject above are not the real ones, and only their first lines are shown, for security purposes.

Logs from KEDA operator

2024-02-02T12:21:15Z	ERROR	scale_handler	error resolving auth params	{"type": "ScaledObject", "namespace": "citenant114", "name": "ms-metro", "triggerIndex": 2, "error": "error creating kafka client: kafka: client has run out of available brokers to talk to: 3 errors occurred:\n\t* remote error: tls: bad certificate\n\t* remote error: tls: bad certificate\n\t* remote error: tls: bad certificate\n"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).buildScalers
	/workspace/pkg/scaling/scalers_builder.go:99
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).performGetScalersCache
	/workspace/pkg/scaling/scale_handler.go:357
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScalersCacheForScaledObject
	/workspace/pkg/scaling/scale_handler.go:290
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics
	/workspace/pkg/scaling/scale_handler.go:429
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics
	/workspace/pkg/metricsservice/server.go:47
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler
	/workspace/pkg/metricsservice/api/metrics_grpc.pb.go:99
google.golang.org/grpc.(*Server).processUnaryRPC
	/workspace/vendor/google.golang.org/grpc/server.go:1372
google.golang.org/grpc.(*Server).handleStream
	/workspace/vendor/google.golang.org/grpc/server.go:1783
google.golang.org/grpc.(*Server).serveStreams.func2.1
	/workspace/vendor/google.golang.org/grpc/server.go:1016

KEDA Version

2.13.0

Kubernetes Version

1.27

Platform

Amazon Web Services

Scaler Details

Kafka

Anything else?

  1. Helm chart version used to deploy keda is v2.13.1.
  2. I understand that the error comes from the Kafka side, but perhaps the certificates were not loaded correctly on the KEDA side? Or is a different certificate format required?
  3. Is there a way to run KEDA with higher log verbosity (debug) to get more details about the error?
@yaakov-berkovitch yaakov-berkovitch added the bug Something isn't working label Feb 2, 2024
@JorTurFer
Member

You can't pass the certs inline in the ScaledObject because they are sensitive information; you have to use a TriggerAuthentication. KEDA just ignores them in the ScaledObject, which is why you see the error.
At the bottom of the scaler docs there are some examples using certs: https://keda.sh/docs/2.13/scalers/apache-kafka/#your-kafka-cluster-turns-on-sasltls-auth
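
To make the suggestion above concrete, here is a minimal sketch in Python that builds a Secret and a TriggerAuthentication carrying the TLS parameters from the original ScaledObject (tls, ca, cert, key, keyPassword). The resource names `keda-kafka-secrets` and `keda-trigger-auth-kafka`, the helper names, and the placeholder values are assumptions for illustration only; the linked scaler docs remain the authoritative reference.

```python
import base64
import json


def b64(text: str) -> str:
    """Kubernetes Secret `data` values are base64-encoded strings."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_kafka_tls_auth(namespace, secret_name, auth_name,
                         ca_pem, cert_pem, key_pem, key_password):
    """Return (secret, trigger_auth) manifests as plain dicts.

    All names passed in are hypothetical; only the parameter names
    (tls, ca, cert, key, keyPassword) come from the ScaledObject
    shown in this issue.
    """
    params = {
        "tls": "enable",
        "ca": ca_pem,
        "cert": cert_pem,
        "key": key_pem,
        "keyPassword": key_password,
    }
    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": secret_name, "namespace": namespace},
        "data": {name: b64(value) for name, value in params.items()},
    }
    trigger_auth = {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "TriggerAuthentication",
        "metadata": {"name": auth_name, "namespace": namespace},
        "spec": {
            # One entry per parameter, each pointing at the Secret key
            # with the same name.
            "secretTargetRef": [
                {"parameter": name, "name": secret_name, "key": name}
                for name in params
            ]
        },
    }
    return secret, trigger_auth


# Placeholder values; real PEM contents would be read from files.
secret, trigger_auth = build_kafka_tls_auth(
    "citenant114", "keda-kafka-secrets", "keda-trigger-auth-kafka",
    "<ca pem>", "<cert pem>", "<key pem>", "<key password>",
)
print(json.dumps(trigger_auth, indent=2))
```

With something like this in place, the trigger in the ScaledObject would drop the inline tls, ca, cert, key and keyPassword entries and instead reference the TriggerAuthentication through `authenticationRef` with `name: keda-trigger-auth-kafka`.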

@yaakov-berkovitch
Author

Thanks @JorTurFer, I will do that. The documentation says "you can use a TriggerAuthentication", so I thought it was an option, not a requirement.

@JorTurFer
Member

mmm.. maybe we should improve the docs to be clearer 🤔
As a rule, parameters listed under the Authentication Parameters section (https://keda.sh/docs/2.13/scalers/apache-kafka/#authentication-parameters) can't be set as plain text in the ScaledObject. This applies to all scalers: if a parameter is described under Authentication Parameters, it has to be read from the pod (using EnvFrom when it's supported) or from a TriggerAuthentication.

@yaakov-berkovitch
Author

yaakov-berkovitch commented Feb 3, 2024

I did as you suggested, but now it fails with this error (taken from the keda-operator pod):
2024-02-03T19:35:36Z ERROR Reconciler error {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"ms-metro","namespace":"citenant114"}, "namespace": "citenant114", "name": "ms-metro", "reconcileID": "c23dca9a-7251-4ef1-bf29-f618399f2c7f", "error": "error decrypt X509Key: pkcs8: only PKCS #5 v2.0 supported"}
I will try to convert the key using something like: openssl pkcs8 -in key.pem -topk8 -v2 des3 -out enckey.pem

Would that help?

@JorTurFer
Member

I'm not an expert on certificates. After a quick review, it looks like your command generates a PKCS v2...
Could you share all the commands for generating a certificate like yours?
Do you have any idea @zroubalik ?

@yaakov-berkovitch
Author

After converting my original key I got another error:
2024-02-03T19:55:08Z ERROR failed to ensure HPA is correctly created for ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"ms-metro","namespace":"citenant114"}, "namespace": "citenant114", "name": "ms-metro", "reconcileID": "dfb2145c-ac77-40d8-90d8-8280de1e63c9", "error": "error parse X509KeyPair: tls: private key does not match public key"}

Perhaps I did something wrong, or this conversion trick is not possible. I have to say that I'm not an expert on certificates either.

@JorTurFer
Member

The error looks like a mismatch between the crt and the key. Could you share how you generated them?
I mean the openssl commands.

@yaakov-berkovitch
Author

yaakov-berkovitch commented Feb 3, 2024

The key and certificate are generated using:

key: openssl genrsa -aes256....
openssl req -new -key <key generated before>.key --> this creates the CSR used by the next openssl command
cert: openssl x509 -req ....

BTW, we can't easily change them because they are already in production.
Just a reminder: with simple Python code and the original pem/key/ca, everything works fine.

@yaakov-berkovitch
Author

yaakov-berkovitch commented Feb 4, 2024

@JorTurFer I looked again at the initial error: "error decrypt X509Key: pkcs8: only PKCS #5 v2.0 supported"

My key looks like:
-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3-CBC,7EF293A7B0C43A20
bwz7TPBN2Xr6AxW9y7PRkaQjXYOS3bireDgyD0lBfKMqQ9AV2oTNUcrI2MtaquBH
QaK+bZY0XBpviceXPrfl73cFrBLBZM7/QhyxINWvcuJiq/hyHFwkT/kEOPWg3g+B
....
-----END RSA PRIVATE KEY-----

The issue "Encrypted private key in PKCS#8 format not supported" says that the "only PKCS #5 v2.0 supported" error happens when the library can't parse the ASN.1 structure.

So I ran the command openssl asn1parse -in and indeed got the error
140235125307280:error:0D07207B:asn1 encoding routines:ASN1_get_object:header too long:asn1_lib.c:157:
which is explained here.

So my understanding is that the RSA PEM format is not supported by the Go pkcs8 library you are using.

Does that make sense to you?

I then removed the DEK-Info header using the command openssl pkcs8 -topk8 -inform pem -in -out, and KEDA raised a new error: error decrypt X509Key: pkcs8: only PBES2 supported
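
The key formats discussed in this comment can be told apart from the PEM envelope alone. The following is a rough sketch in Python (standard library only) that classifies a private key by its BEGIN label and the Proc-Type header; the function name and the returned labels are made up for illustration. It does not parse the ASN.1 body, so it cannot distinguish a PBES1 from a PBES2 encrypted PKCS#8 key, which is what the last error above is about.

```python
import re

# Matches the label of the first PEM block, e.g. "RSA PRIVATE KEY".
_BEGIN_RE = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")
# Header line that marks a traditional (non-PKCS#8) encrypted key.
_PROC_TYPE_RE = re.compile(r"^Proc-Type:\s*4,ENCRYPTED\s*$", re.MULTILINE)


def classify_private_key_pem(pem_text: str) -> str:
    """Classify a PEM private key by its envelope (hypothetical labels)."""
    match = _BEGIN_RE.search(pem_text)
    if match is None:
        return "not-pem"
    label = match.group(1)
    if label == "ENCRYPTED PRIVATE KEY":
        return "pkcs8-encrypted"
    if label == "PRIVATE KEY":
        return "pkcs8-unencrypted"
    if label.endswith(" PRIVATE KEY"):
        # Traditional formats such as "RSA PRIVATE KEY"; encryption is
        # signalled by the Proc-Type / DEK-Info header lines.
        if _PROC_TYPE_RE.search(pem_text):
            return "traditional-encrypted"
        return "traditional-unencrypted"
    return "other"


# Shaped like the key quoted above (body truncated on purpose).
legacy_key = (
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "Proc-Type: 4,ENCRYPTED\n"
    "DEK-Info: DES-EDE3-CBC,7EF293A7B0C43A20\n"
    "\n"
    "bwz7TPBN2Xr6AxW9y7PRkaQjXYOS3bireDgyD0lBfKMqQ9AV2oTNUcrI2MtaquBH\n"
    "-----END RSA PRIVATE KEY-----\n"
)
print(classify_private_key_pem(legacy_key))
```

A key reported as "traditional-encrypted" is the kind that produced the "only PKCS #5 v2.0 supported" error in this thread.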

@yaakov-berkovitch
Author

yaakov-berkovitch commented Feb 4, 2024

Bottom line: I regenerated the private/public keys and it started working :-)
The change consists of replacing this private key command
openssl genrsa -aes256 -passout pass:${CLIENT_KEY_PWD} -out ${CLIENT_NAME}.key 2048
with
openssl genpkey -algorithm RSA -aes256 -passout pass:${CLIENT_KEY_PWD} -out ${CLIENT_NAME}.key -pkeyopt rsa_keygen_bits:2048

The difference between the generated keys is in these two header lines, which appear only in the key produced by the old command:

Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3-CBC,7EF293A7B0C43A20

I didn't find a way to convert the existing keys, but if you have any ideas I will be happy to hear them.
Anyway, I'm now able to progress with the integration of KEDA in our system.

@JorTurFer
Member

Wow, nice research, I didn't know about this cert limitation in the upstream libraries :/

I've checked the library and I wouldn't expect support for this, so I think adding a note to the docs explaining this limitation would be great. Would you be willing to open a PR there explaining this? The FAQ could be a good place for it.

@zroubalik
Member

Or maybe add it directly to the Kafka scaler docs, if it is a Kafka-only issue.


stale bot commented Apr 13, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Apr 13, 2024

stale bot commented Apr 20, 2024

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Apr 20, 2024
@github-project-automation github-project-automation bot moved this from To Triage to Ready To Ship in Roadmap - KEDA Core Apr 20, 2024

3 participants