CI Failure (Consumed from an unexpected offset) in PartitionMoveInterruption.test_cancelling_partition_move
#17847
Previously, when force-aborting a reconfiguration, we appended an aborting configuration on all replicas. This can lead to log inconsistencies because on followers the configuration will be duplicated (one copy from the follower's own append, one replicated by the leader). Although these inconsistencies are expected for force-abort, if the leader is alive we can minimize the chance of their appearance by having followers wait for the aborting config to be replicated from the leader. Fixes redpanda-data#17847
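The per-replica decision described in the commit message can be sketched roughly as follows. This is a toy model, not Redpanda's actual code: the names `Action` and `force_abort_action` are hypothetical, and a real implementation would presumably also need a fallback for when replication from the leader never arrives.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the two behaviours described in the commit message.
    APPEND_LOCALLY = "append_locally"
    WAIT_FOR_LEADER = "wait_for_leader"


def force_abort_action(is_leader: bool, leader_alive: bool) -> Action:
    """Decide what a replica does with the aborting configuration.

    Assumed behaviour: the leader always appends the aborting config itself.
    A follower appends locally only when no live leader can replicate it;
    otherwise it waits, which avoids ending up with a duplicated entry.
    """
    if is_leader or not leader_alive:
        return Action.APPEND_LOCALLY
    return Action.WAIT_FOR_LEADER
```

With this rule a follower only appends the aborting config itself when there is no live leader, which is the case where the log inconsistency is unavoidable anyway.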
This was indirectly caused by #17789, which fixed a bug in the offset translation of the log end offset and, as a result, made fetch offset validation stricter. With force-abort there is a log discrepancy between the leader and followers that, after a leadership change, leads to an offset-out-of-range error and a fetch offset reset (previously this did not happen because fetch offset validation was incorrect). Although this discrepancy is expected for force-abort, we can minimize the chance of it occurring; see the attached PR.
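To illustrate why stricter validation surfaces the discrepancy, here is a minimal sketch of a Kafka-style fetch offset check. The function name and the offsets are made up for illustration; only the rule itself (a fetch offset outside the log's bounds is rejected as out of range) reflects standard Kafka semantics.

```python
def validate_fetch_offset(fetch_offset: int, log_start_offset: int, log_end_offset: int) -> str:
    """Return "ok" if the fetch offset lies within the log, else an out-of-range error."""
    if fetch_offset < log_start_offset or fetch_offset > log_end_offset:
        return "offset_out_of_range"
    return "ok"


# Illustrative numbers only: the consumer has read up to offset 10 from the old leader.
consumer_position = 10
old_leader_log_end = 10
# Assumption: after the leadership change the new leader reports a lower log end offset.
new_leader_log_end = 9

before = validate_fetch_offset(consumer_position, 0, old_leader_log_end)
after = validate_fetch_offset(consumer_position, 0, new_leader_log_end)
```

In this toy scenario the first check passes and the second is rejected, at which point a consumer would reset its fetch offset according to its reset policy, which is consistent with the test observing consumption from an unexpected offset.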
Previously, when force-aborting a reconfiguration, we appended an aborting configuration on all replicas. This can lead to log inconsistencies because on followers the configuration will be duplicated (one copy from the follower's own append, one replicated by the leader). Although these inconsistencies are expected for force-abort, if the leader is alive we can minimize the chance of their appearance by having followers wait for the aborting config to be replicated from the leader. Fixes redpanda-data#17847 (cherry picked from commit 8e221d3)
It seems that this failure popped up in a PR run today: https://buildkite.com/redpanda/redpanda/builds/48353#018f1bd9-a4db-4853-ad9c-e9b416447aca
https://buildkite.com/redpanda/redpanda/builds/47713
JIRA Link: CORE-2353