Kafka event bus reconnect logic has an issue: it never succeeds and the Sensor ultimately halts #2711
Comments
It could also be that the issue is rooted in "github.com/Shopify/sarama", as I see many discussions of the error "kafka: response did not contain all the expected topic/partition blocks". |
If the connection to Kafka fails for whatever reason, the Sensor will keep attempting to reconnect in a loop forever. I'm wondering if we should give up on the connection at some point and allow the process to fail. This would trigger a pod restart, which sounds like it should resolve the issue. I'm also curious about the underlying problem: do you have any idea why you are seeing this error?
|
This thread is interesting. I'm wondering if we need to filter out some messages if it's possible for our client to receive messages out-of-order. Do you use log compaction @gyanprakash48? |
This issue has been automatically marked as stale because it has not had |
Hey @dfarr, I am having the same issue here. Did you find a way to trigger a restart when the Sensor keeps attempting to reconnect in a loop forever? |
@gyanprakash48 did you find a solution for it? |
Hi @pdellarciprete, have you found a solution for this problem in the meantime? |
Has anyone found any workaround for this yet? |
@pavan02 The workaround we currently use is a cronjob that regularly restarts the sensor pods. It's far from perfect, but the issue hasn't come up again. We were considering adding a sidecar to the Sensor pods that fails the pod when these error messages occur, but that requires additional tooling to inject the sidecar, and we haven't had the time or resources to implement it. |
The Sensor logs show the following. The reconnect attempt never succeeds, although terminating and restarting the pod fixes the issue immediately. This indicates that the reconnect logic has an issue:
2023-07-13T07:39:00.541Z INFO argo-events.sensor sensors/listener.go:302 EventBus connection lost, reconnecting... {"sensorName": "staging.dyper-recompute.sensor", "triggerName": "http-trigger"}
2023-07-13T07:39:00.541Z INFO argo-events.sensor sensors/listener.go:308 reconnected to EventBus. {"sensorName": "staging.dyper-recompute.sensor", "triggerName": "http-trigger", "connection": "KafkaTriggerConnection{Sensor:staging.dyper-recompute.sensor,Trigger:http-trigger}"}
2023-07-13T07:39:00.541Z DEBUG argo-events.sensor sensors/listener.go:316 sublock not acquired {"sensorName": "staging.dyper-recompute.sensor", "triggerName": "http-trigger"}
2023-07-13T07:39:00.542Z INFO argo-events.sensor sensors/listener.go:277 started subscribing to events for trigger http-trigger with client connection KafkaTriggerConnection{Sensor:staging.dyper-recompute.sensor,Trigger:http-trigger} {"sensorName": "staging.dyper-recompute.sensor", "triggerName": "http-trigger"}
2023-07-13T07:39:00.542Z INFO argo-events.sensor sensor/kafka_sensor.go:203 Consuming {"sensorName": "staging.dyper-recompute.sensor", "topics": ["staging.dyper-recompute.eventbus", "staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-trigger", "staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-action"], "group": "staging.dyper-recompute.eventbus.listner"}
2023-07-13T07:39:00.558Z INFO argo-events.sensor sensor/kafka_handler.go:75 Kafka setup {"sensorName": "staging.dyper-recompute.sensor", "claims": {"staging.dyper-recompute.eventbus":[0,1,2,3,4,5],"staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-action":[0,1,2],"staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-trigger":[0,1,2]}}
2023-07-13T07:39:00.564Z INFO argo-events.sensor sensor/kafka_handler.go:124 Kafka cleanup {"sensorName": "staging.dyper-recompute.sensor", "claims": {"staging.dyper-recompute.eventbus":[0,1,2,3,4,5],"staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-action":[0,1,2],"staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-trigger":[0,1,2]}}
2023-07-13T07:39:00.564Z ERROR argo-events.sensor sensor/kafka_sensor.go:215 Failed to consume {"sensorName": "staging.dyper-recompute.sensor", "error": "kafka: response did not contain all the expected topic/partition blocks"}
github.com/argoproj/argo-events/eventbus/kafka/sensor.(*KafkaSensor).Listen
/home/runner/work/argo-events/argo-events/eventbus/kafka/sensor/kafka_sensor.go:215
2023-07-13T07:39:05.541Z INFO argo-events.sensor sensors/listener.go:302 EventBus connection lost, reconnecting... {"sensorName": "staging.dyper-recompute.sensor", "triggerName": "http-trigger"}
2023-07-13T07:39:05.541Z INFO argo-events.sensor sensors/listener.go:308 reconnected to EventBus. {"sensorName": "staging.dyper-recompute.sensor", "triggerName": "http-trigger", "connection": "KafkaTriggerConnection{Sensor:staging.dyper-recompute.sensor,Trigger:http-trigger}"}
2023-07-13T07:39:05.541Z DEBUG argo-events.sensor sensors/listener.go:311 acquired sublock, instructing trigger to shutdown subscription {"sensorName": "staging.dyper-recompute.sensor", "triggerName": "http-trigger"}
2023-07-13T07:39:05.541Z DEBUG argo-events.sensor sensors/listener.go:285 exiting subscribe goroutine, conn=KafkaTriggerConnection{Sensor:staging.dyper-recompute.sensor,Trigger:http-trigger} {"sensorName": "staging.dyper-recompute.sensor", "triggerName": "http-trigger"}
2023-07-13T07:39:05.541Z INFO argo-events.sensor sensor/kafka_sensor.go:203 Consuming {"sensorName": "staging.dyper-recompute.sensor", "topics": ["staging.dyper-recompute.eventbus", "staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-trigger", "staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-action"], "group": "staging.dyper-recompute.eventbus.listner"}
2023-07-13T07:39:05.555Z INFO argo-events.sensor sensor/kafka_handler.go:75 Kafka setup {"sensorName": "staging.dyper-recompute.sensor", "claims": {"staging.dyper-recompute.eventbus":[0,1,2,3,4,5],"staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-action":[0,1,2],"staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-trigger":[0,1,2]}}
2023-07-13T07:39:05.556Z INFO argo-events.sensor sensor/kafka_handler.go:124 Kafka cleanup {"sensorName": "staging.dyper-recompute.sensor", "claims": {"staging.dyper-recompute.eventbus":[0,1,2,3,4,5],"staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-action":[0,1,2],"staging.dyper-recompute.eventbus-staging.dyper-recompute.sensor-trigger":[0,1,2]}}
2023-07-13T07:39:05.557Z ERROR argo-events.sensor sensor/kafka_sensor.go:215 Failed to consume {"sensorName": "staging.dyper-recompute.sensor", "error": "kafka: response did not contain all the expected topic/partition blocks"}
github.com/argoproj/argo-events/eventbus/kafka/sensor.(*KafkaSensor).Listen