Large-scale unrecoverable failure of consumer groups. #2342

Closed
edoger opened this issue Sep 19, 2022 · 5 comments


edoger commented Sep 19, 2022

Versions

| Sarama | Kafka  | Go        |
|--------|--------|-----------|
| 1.36.0 | 2.8.12 | 1.14-1.19 |
Configuration

k := sarama.NewConfig()
k.Consumer.Offsets.Initial = sarama.OffsetNewest
Logs


[2022-09-19T11:14:28.416+08:00] DBG Kafka: kafka: error while consuming *****/24: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[2022-09-19T11:14:28.417+08:00] DBG Kafka: kafka: error while consuming *****/3: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[2022-09-19T11:14:28.417+08:00] DBG Kafka: kafka: error while consuming *****/6: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[2022-09-19T11:14:28.417+08:00] DBG Kafka: kafka: error while consuming *****/18: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition
[2022-09-19T11:14:28.417+08:00] DBG Kafka: consumergroup/session/test-1663557258425244539-a9819ae8-4030-4529-a3c8-6a5d5067e684/944 heartbeat loop stopped
[2022-09-19T11:14:28.417+08:00] DBG Kafka: consumergroup/session/test-1663557258425244539-a9819ae8-4030-4529-a3c8-6a5d5067e684/944 released
[2022-09-19T11:14:28.417+08:00] DBG Kafka: Initializing new client
[2022-09-19T11:14:28.417+08:00] DBG Kafka: client/metadata fetching metadata for all topics from broker *****:9092
[2022-09-19T11:14:28.417+08:00] DBG Kafka: Connected to broker at *****:9092 (unregistered)
[2022-09-19T11:14:28.418+08:00] DBG Kafka: client/brokers registered new broker #476156 at *****:9092
[2022-09-19T11:14:28.418+08:00] DBG Kafka: client/brokers registered new broker #476157 at *****:9092
[2022-09-19T11:14:28.418+08:00] DBG Kafka: client/brokers registered new broker #476158 at *****:9092
[2022-09-19T11:14:28.418+08:00] DBG Kafka: Successfully initialized new client
[2022-09-19T11:14:28.418+08:00] DBG Kafka: client/metadata fetching metadata for [*****] from broker *****:9092
[2022-09-19T11:14:28.419+08:00] DBG Kafka: client/coordinator requesting coordinator for consumergroup ***** from *****:9092
[2022-09-19T11:14:28.419+08:00] DBG Kafka: client/coordinator coordinator for consumergroup ***** is #476157 (*****:9092)
[2022-09-19T11:14:28.420+08:00] DBG Kafka: Connected to broker at *****:9092 (registered as #476157)

Problem Description

#2252

Was this new feature tested at scale? After upgrading to the latest version, our production environment experienced large-scale consumption stalls, and restarting the program does not recover from the failure. Our analysis traced the problem to this new feature, which is not fully backward compatible (note the "heartbeat loop stopped" entries in the logs).


geberl commented Sep 26, 2022

With k.Consumer.Offsets.Initial = sarama.OffsetNewest we see the same behavior in our staging environment, but thankfully not in prod.

| Sarama | Kafka | Go   |
|--------|-------|------|
| 1.36.0 | 2.8.0 | 1.19 |

Same versions in both environments. We downgraded Sarama to v1.35.0 and the issue disappeared.

Author

edoger commented Sep 26, 2022

@geberl We found this problem during a gray (canary) release to 1% of our nodes. We recommend downgrading to version 1.34.1 instead: according to our performance testing team, version 1.35.0 performs worse than version 1.34.1.

dnwe added a commit that referenced this issue Sep 27, 2022
A number of people have reported issues with the change in default
behaviour that was introduced in #2252 and the comment for this
configuration value had suggested it would retain the behaviour of
previous Sarama versions. This PR puts the default behaviour back to
resetting to `c.Consumer.Offsets.Initial` when the server returns an out
of range error on a Fetch.

Contributes-to: #2342
Collaborator

dnwe commented Sep 27, 2022

I pushed the above PR to restore the default behaviour of resetting invalid offsets, and made a release, 1.37.0, containing it (amongst other things).

Author

edoger commented Sep 28, 2022

@dnwe Thanks! We will do a full test with the new version; I think the problem is solved.


geberl commented Sep 28, 2022

@dnwe thanks for the fix, I can't reproduce the issue any more on my side 🎉

edoger closed this as completed Oct 8, 2022