How to minimize the producer timeout period during partition leader failover? #10894
Unanswered
vereshchagin-d
asked this question in
Q&A
-
@scholzj Hi! Could you please help me with some advice regarding my question? Thank you.
-
Hi community!
We're experiencing longer than expected producer timeouts during network partition scenarios in our Strimzi Kafka cluster. Here's our setup and test scenario:
Environment:
Cluster setup:
Topic configuration:
Test scenario:
Producer Configuration:
Failure Simulation:
Using chaos-mesh, we simulated a network partition for the broker that was the partition leader
Observed producer errors lasting approximately 15 seconds before recovery
chaos-mesh CRD snippet:
Logs
I also tried running ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic xk6_kafka_test_topic --describe on one of the brokers in a loop to see how quickly the cluster detects broker loss. Although this was a slightly different Kafka cluster (6 brokers and 3 controllers), the same issue occurred.
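A minimal sketch of such a polling loop, for anyone who wants to reproduce the measurement. The function name watch_partition_state and the DESCRIBE_CMD variable are hypothetical helpers, not part of Kafka or Strimzi; only the kafka-topics.sh command itself comes from the post. It prints a timestamp whenever the describe output changes, so the moment the leader or ISR list moves becomes visible.

```shell
# Command to run on each poll (taken from the post). DESCRIBE_CMD is a
# hypothetical variable; override it to target another cluster or topic.
DESCRIBE_CMD=${DESCRIBE_CMD:-"./bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic xk6_kafka_test_topic --describe"}

# watch_partition_state INTERVAL_SECONDS [MAX_POLLS]
# MAX_POLLS=0 (the default) means "poll until interrupted".
watch_partition_state() (
  interval=${1:-1}
  max_polls=${2:-0}
  previous=__none__   # sentinel so the first result is always printed
  polls=0
  while [ "$max_polls" -eq 0 ] || [ "$polls" -lt "$max_polls" ]; do
    # Capture stdout and stderr so timeouts and errors are also visible.
    current=$($DESCRIBE_CMD 2>&1) || true
    # Print only when the output differs from the previous poll.
    if [ "$current" != "$previous" ]; then
      printf '%s\n%s\n' "$(date '+%H:%M:%S')" "$current"
      previous=$current
    fi
    polls=$((polls + 1))
    sleep "$interval"
  done
)

# Usage: poll once per second until Ctrl+C.
# watch_partition_state 1
```

Each kafka-topics.sh invocation starts a JVM, so the effective resolution is probably coarser than the sleep interval suggests; treat the timestamps as approximate.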
The provided logs show Kafka write errors and leader election delays, with a notable lag between 11:03:14 and 11:03:27.
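For reference, the window between those two timestamps is 13 seconds, a little under the roughly 15 seconds of producer errors reported above. A small sketch of the arithmetic, assuming both timestamps are HH:MM:SS values from the same day (to_seconds and gap_seconds are hypothetical helper names):

```shell
# Convert an HH:MM:SS timestamp to seconds since midnight.
to_seconds() (
  IFS=:
  set -- $1
  # Strip one leading zero so values like 08 are not parsed as octal.
  echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
)

# gap_seconds START END: difference between two same-day timestamps.
gap_seconds() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

gap_seconds 11:03:14 11:03:27   # prints 13
```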
Question:
We cannot increase the producer timeout at this time because the timeout is set by a business requirement. Is there a way to reduce this recovery time? What configurations should we look into to minimize the producer timeout period during leader failover? I've noticed that the Strimzi operator ignores "broker." settings.
We'd appreciate any insights on:
Thank you in advance!