[Kafka] keda-operator-metrics-apiserver begins failing SSL handshake #2490
Comments
CRDs used for scaling:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-kafka-credential
  namespace: default
spec:
  secretTargetRef:
    - parameter: sasl
      name: keda-kafka-secrets
      key: sasl
    - parameter: username
      name: keda-kafka-secrets
      key: username
    - parameter: password
      name: keda-kafka-secrets
      key: password
    - parameter: tls
      name: keda-kafka-secrets
      key: tls
    - parameter: ca
      name: keda-kafka-secrets
      key: ca
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject-content-crawler
  namespace: default
spec:
  scaleTargetRef:
    name: avro-crawler
  pollingInterval: 55
  idleReplicaCount: 0
  minReplicaCount: 1 # Optional. Default: 0
  maxReplicaCount: 1800 # Optional. Default: 100
  fallback: # Optional. Section to specify fallback options
    failureThreshold: 10
    replicas: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: broker-svc
        # Make sure that this consumer group name is the same one as the one that is consuming topics
        consumerGroup: crawler-group-normal-prod
        topic: fsn1.crawler.cmd.process.normal.0
        # Optional
        lagThreshold: "10"
        offsetResetPolicy: latest
        version: 3.0.0
      authenticationRef:
        name: keda-trigger-auth-kafka-credential
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject-directory-crawler
  namespace: default
spec:
  scaleTargetRef:
    name: avro-high-crawler
  pollingInterval: 50
  idleReplicaCount: 0
  minReplicaCount: 1 # Optional. Default: 0
  maxReplicaCount: 300 # Optional. Default: 100
  fallback: # Optional. Section to specify fallback options
    failureThreshold: 10
    replicas: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: broker-svc
        # Make sure that this consumer group name is the same one as the one that is consuming topics
        consumerGroup: crawler-group-directory-prod
        topic: fsn1.crawler.cmd.process.high.0
        # Optional
        lagThreshold: "10"
        offsetResetPolicy: latest
        version: 3.0.0
      authenticationRef:
        name: keda-trigger-auth-kafka-credential
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject-parallel-proxy
  namespace: default
spec:
  scaleTargetRef:
    name: parallel-proxy
  pollingInterval: 65
  idleReplicaCount: 0
  minReplicaCount: 1 # Optional. Default: 0
  maxReplicaCount: 150 # Optional. Default: 100
  fallback: # Optional. Section to specify fallback options
    failureThreshold: 10
    replicas: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: broker-svc
        # Make sure that this consumer group name is the same one as the one that is consuming topics
        consumerGroup: crawler-group-directory-prod
        topic: fsn1.crawler.cmd.process.high.0
        # Optional
        lagThreshold: "10"
        offsetResetPolicy: latest
        version: 3.0.0
      authenticationRef:
        name: keda-trigger-auth-kafka-credential
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject-head-crawler
  namespace: default
spec:
  scaleTargetRef:
    name: avro-head-crawler
  pollingInterval: 60
  idleReplicaCount: 0
  minReplicaCount: 1 # Optional. Default: 0
  maxReplicaCount: 1200 # Optional. Default: 100
  fallback: # Optional. Section to specify fallback options
    failureThreshold: 10
    replicas: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: broker-svc
        # Make sure that this consumer group name is the same one as the one that is consuming topics
        consumerGroup: crawler-group-head-prod
        topic: fsn1.crawler.cmd.process.head.0
        # Optional
        lagThreshold: "10"
        offsetResetPolicy: latest
        version: 3.0.0
      authenticationRef:
        name: keda-trigger-auth-kafka-credential
KEDA Metrics Server needs to access Kafka to scrape the metrics used for 1<->N scaling. This is an open issue related to the uncommitted offset: #2033. Still, I am not sure why the Metrics Server is not able to authenticate 🤷‍♂️
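To illustrate why the Metrics Server has to read consumer lag from Kafka, here is a rough sketch of how a lag value could translate into a replica count for 1<->N scaling. It assumes the usual HPA average-value arithmetic (total lag divided by `lagThreshold`, rounded up) and ignores HPA tolerance, stabilization windows, and any per-partition capping KEDA may apply. The function name `desired_replicas` is hypothetical and not part of KEDA.

```python
import math


def desired_replicas(total_lag: int, lag_threshold: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate the replica count for a lag-based trigger.

    Assumption: replicas = ceil(total_lag / lag_threshold), clamped to
    [min_replicas, max_replicas]. Real HPA behaviour has more inputs.
    """
    if lag_threshold <= 0:
        raise ValueError("lag_threshold must be positive")
    raw = math.ceil(total_lag / lag_threshold)
    return max(min_replicas, min(max_replicas, raw))


# Values taken from the content-crawler ScaledObject above:
# lagThreshold "10", minReplicaCount 1, maxReplicaCount 1800.
print(desired_replicas(95, 10, 1, 1800))
```

If the lag cannot be scraped at all, as with the failing handshake here, there is no input for this calculation, which is when the `fallback` section comes into play.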
I'll investigate further and maybe try different versions of KEDA; I've been using it for months and this is the first time I've encountered such an error.
It might be related to the caching bug introduced in 2.5.0. This will be solved in 2.6.0, which should be released soon.
I upgraded KEDA to 2.6.0 last night and the issue didn't come up, cheers!
Report
I've set up a few KEDA scalers for deployments based on some Kafka topics, which began scaling the deployments up. After some time, I observed Kafka logs saying:
On all Kafka nodes I see the same warning about the same IP, so all requests are coming from a single node. After finding out which node it was:
Scheduled pods on this node:
After seeing this I concluded that the "culprit" is keda-operator-metrics-apiserver-7549b7db99-cp5mr, but it doesn't make sense to me why the metrics server would try to authenticate with Kafka.
Expected Behavior
KEDA should properly and consistently scale deployments based on Kafka topic lag.
Actual Behavior
After those warnings appear on the Kafka nodes, one of the scaled objects fails constantly, causing it to use the Fallback.
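For readers unfamiliar with the `fallback` section in the manifests, the behaviour described above can be sketched roughly as follows. This is a simplified model, not KEDA's implementation: it assumes the fallback replica count applies once the number of consecutive failed metric scrapes reaches `failureThreshold` (KEDA's exact comparison may differ) and that a single successful scrape resets the counter. The class `FallbackTracker` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FallbackTracker:
    """Simplified model of a ScaledObject's fallback section."""
    failure_threshold: int      # fallback.failureThreshold (10 in the manifests)
    fallback_replicas: int      # fallback.replicas (50 in the manifests)
    consecutive_failures: int = 0

    def record(self, scrape_ok: bool) -> None:
        # Assumption: any successful scrape resets the failure counter.
        if scrape_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    def replicas(self, computed: int) -> int:
        # Assumption: fallback applies once the threshold is reached.
        if self.consecutive_failures >= self.failure_threshold:
            return self.fallback_replicas
        return computed


tracker = FallbackTracker(failure_threshold=10, fallback_replicas=50)
for _ in range(10):
    tracker.record(scrape_ok=False)   # e.g. repeated failed SSL handshakes
print(tracker.replicas(computed=7))
```

Under these assumptions, a scaled object whose scrapes keep failing stays pinned at 50 replicas regardless of the actual lag, which matches the symptom reported here.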
Steps to Reproduce the Problem
Logs from KEDA operator
and some more repeating lines about the topic which did not have offsets committed yet, but that topic is not controlling the scaled object which fails.
KEDA Version
2.5.0
Kubernetes Version
1.22
Platform
Other
Scaler Details
Kafka
Anything else?
I've deployed KEDA using the official Helm chart with the default values, except the HTTP timeout, which I've set to 30 seconds:
Another strange thing I've just noticed is that operator metrics are not even enabled in this chart.