Large-scale unrecoverable failure of consumer groups. #2342
Comments
With
Same versions in both environments. We downgraded Sarama to v1.35.0 and the issue disappeared.
@geberl We found this problem during a gray (canary) rollout to 1% of our nodes. We recommend downgrading to version 1.34.1: according to feedback from our performance testing team, version 1.35.0 performs worse than version 1.34.1.
A number of people have reported issues with the change in default behaviour that was introduced in #2252, even though the comment for this configuration value had suggested it would retain the behaviour of previous Sarama versions. This PR puts the default behaviour back to resetting to `c.Consumer.Offsets.Initial` when the server returns an out of range error on a Fetch.
Contributes-to: #2342
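For readers unfamiliar with what "resetting to `c.Consumer.Offsets.Initial`" means in practice, here is a minimal, stdlib-only sketch of the decision described above. It is a model, not Sarama's code: `partitionLog`, `nextOffset` and `errOffsetOutOfRange` are hypothetical names made up for illustration, and the two constants only mirror the values of `sarama.OffsetNewest` and `sarama.OffsetOldest`.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the sentinel values used by sarama.OffsetNewest and
// sarama.OffsetOldest (assumption: this sketch does not import Sarama).
const (
	offsetNewest int64 = -1
	offsetOldest int64 = -2
)

// errOffsetOutOfRange is a hypothetical stand-in for the broker's
// "offset out of range" error on a Fetch.
var errOffsetOutOfRange = errors.New("offset out of range")

// partitionLog is a hypothetical view of what a broker still holds.
type partitionLog struct {
	oldest int64 // first offset still retained
	newest int64 // offset the next produced message will receive
}

// fetch reports errOffsetOutOfRange when the requested offset is unavailable.
func (p partitionLog) fetch(offset int64) error {
	if offset < p.oldest || offset > p.newest {
		return errOffsetOutOfRange
	}
	return nil
}

// nextOffset models the restored default: if the committed offset is out of
// range, fall back to the position selected by the configured initial offset.
func nextOffset(p partitionLog, committed, initial int64) (int64, error) {
	err := p.fetch(committed)
	if err == nil {
		return committed, nil
	}
	if !errors.Is(err, errOffsetOutOfRange) {
		return 0, err
	}
	switch initial {
	case offsetOldest:
		return p.oldest, nil
	case offsetNewest:
		return p.newest, nil
	default:
		return 0, fmt.Errorf("unsupported initial offset %d", initial)
	}
}

func main() {
	// Offsets below 500 have been deleted by retention.
	p := partitionLog{oldest: 500, newest: 900}
	off, err := nextOffset(p, 120, offsetOldest)
	fmt.Println(off, err) // prints: 500 <nil>
}
```

In this model a consumer whose committed offset was deleted by retention keeps making progress from the configured initial position instead of failing on every Fetch.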
I pushed out the above PR to restore the default behaviour of invalid offset resetting and made a release 1.37.0 containing it (amongst other things)
@dnwe Thanks! We will do a full test with the new version, I think the problem is solved.
@dnwe thanks for the fix, I can't reproduce the issue any more on my side 🎉
Versions
Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.
Configuration
What configuration values are you using for Sarama and Kafka?
Logs
When filing an issue please provide logs from Sarama and Kafka if at all possible. You can set `sarama.Logger` to a `log.Logger` to capture Sarama debug output.
Problem Description
#2252
Has this new feature been tested at scale? At present, our production environment has experienced large-scale consumption stagnation after upgrading to the latest version, and the failure cannot be recovered by restarting the program. The final analysis determined that it was caused by this new feature (not fully backward compatible, heartbeat loop stopped).