Description
We were testing a network outage scenario in which one of three data centers became unavailable, and we noticed strange fluctuations and resets in committed offsets after the data center came back online. I've observed some anomalies in the logs that might be related to it:
Temporary errors in host resolution that result in the resetting of offsets
Suspicious updates of committed offsets
2023-09-08T00:13:46.246Z // Started at correct offset
Consumer in the group "testbench-1000": "[thrd:main]: Partition test-topic [0] start fetching at offset 57842809 (leader epoch 7007)" Code: "FETCH"; SysLevel: Debug;
2023-09-08T00:13:50.051Z // Race condition? The committed offset and leader epoch for partition 0 are from partition 7 (see log below)
Consumer in the group "testbench-1000": "[thrd:main]: Topic test-topic [0]: stored offset INVALID (leader epoch -1), committed offset 55171745 (leader epoch 6476): not including in commit" Code: "OFFSET"; SysLevel: Debug;
2023-09-08T00:13:50.058Z
Consumer in the group "testbench-1000": "[thrd:main]: Topic test-topic [7]: stored offset 55181424 (leader epoch 6476), committed offset 55171745 (leader epoch 6476): setting stored offset 55181424 (leader epoch 6476) for commit" Code: "OFFSET"; SysLevel: Debug;
2023-09-08T00:13:55.056Z // Back to normal
Consumer in the group "testbench-1000": "[thrd:main]: Topic test-topic [0]: stored offset 57842809 (leader epoch 7007), committed offset 57842809 (leader epoch 7007): not including in commit" Code: "OFFSET";
Here is the graph displaying the committed offsets by partition for that consumer group:
Please note that on our test bench the probability of hitting this race condition is higher because the Kubernetes pods running the consumer are constantly being throttled.
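For reference, here is a minimal sketch of how the per-partition committed offsets (and leader epochs) shown in that graph can be polled with librdkafka's rd_kafka_committed(). The broker address and the partition count of 8 are assumptions (only partitions 0 and 7 appear in the logs above), and error handling is abbreviated.

```c
/* Hypothetical monitoring sketch: fetch the committed offsets (and leader
 * epochs) for each partition of test-topic in group testbench-1000.
 * Broker address and partition count are assumptions; error handling is
 * abbreviated. */
#include <stdio.h>
#include <inttypes.h>
#include <librdkafka/rdkafka.h>

int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();
        rd_kafka_conf_set(conf, "bootstrap.servers", "kafka:9092", errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "group.id", "testbench-1000", errstr, sizeof(errstr));

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_CONSUMER, conf, errstr, sizeof(errstr));
        if (!rk) {
                fprintf(stderr, "rd_kafka_new failed: %s\n", errstr);
                return 1;
        }

        /* Assume 8 partitions; only partitions 0 and 7 are visible in the logs. */
        rd_kafka_topic_partition_list_t *parts = rd_kafka_topic_partition_list_new(8);
        for (int p = 0; p < 8; p++)
                rd_kafka_topic_partition_list_add(parts, "test-topic", p);

        rd_kafka_resp_err_t err = rd_kafka_committed(rk, parts, 10000 /* timeout ms */);
        if (err) {
                fprintf(stderr, "rd_kafka_committed: %s\n", rd_kafka_err2str(err));
        } else {
                for (int i = 0; i < parts->cnt; i++) {
                        const rd_kafka_topic_partition_t *tp = &parts->elems[i];
                        printf("partition %" PRId32 ": committed offset %" PRId64
                               " (leader epoch %" PRId32 ")\n",
                               tp->partition, tp->offset,
                               rd_kafka_topic_partition_get_leader_epoch(tp));
                }
        }

        rd_kafka_topic_partition_list_destroy(parts);
        rd_kafka_destroy(rk);
        return 0;
}
```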
Checklist
Please provide the following information:
librdkafka version (release number or git tag): v2.2.0
Apache Kafka version: v2.7.2
librdkafka client configuration:
auto.offset.reset: earliest (see the consumer-setup sketch after this checklist)
Operating system: Debian 11.7
Logs (with debug=.. as necessary) from librdkafka: see the excerpts above
Broker log excerpts: N/A
Critical issue
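For completeness, a minimal sketch of how the consumer is set up on our side. Only auto.offset.reset=earliest, the group id, and the topic name come from this report; the broker address and the debug flags are placeholders.

```c
/* Minimal consumer-setup sketch. Only auto.offset.reset=earliest, the group id
 * and the topic name come from this report; bootstrap.servers and the debug
 * flags are placeholders. */
#include <stdio.h>
#include <librdkafka/rdkafka.h>

static rd_kafka_t *create_consumer(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        rd_kafka_conf_set(conf, "bootstrap.servers", "kafka:9092", errstr, sizeof(errstr)); /* placeholder */
        rd_kafka_conf_set(conf, "group.id", "testbench-1000", errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "auto.offset.reset", "earliest", errstr, sizeof(errstr));
        rd_kafka_conf_set(conf, "debug", "cgrp,topic,fetch", errstr, sizeof(errstr)); /* to capture logs like those above */

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_CONSUMER, conf, errstr, sizeof(errstr));
        if (!rk) {
                fprintf(stderr, "rd_kafka_new failed: %s\n", errstr);
                return NULL;
        }
        rd_kafka_poll_set_consumer(rk);

        /* Subscribe to the test topic; partition assignment is left to the group. */
        rd_kafka_topic_partition_list_t *topics = rd_kafka_topic_partition_list_new(1);
        rd_kafka_topic_partition_list_add(topics, "test-topic", RD_KAFKA_PARTITION_UA);
        rd_kafka_subscribe(rk, topics);
        rd_kafka_topic_partition_list_destroy(topics);
        return rk;
}
```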