Partition without leader where all brokers are listed as followers #8978
Labels
kind/bug: Categorizes an issue or PR as a bug
scope/broker: Marks an issue or PR to appear in the broker section of the changelog
severity/mid: Marks a bug as having a noticeable impact but with a known workaround
support: Marks an issue as related to a customer support request
version:8.1.0-alpha1: Marks an issue as being completely or in parts released in 8.1.0-alpha1
version:8.1.0: Marks an issue as being completely or in parts released in 8.1.0
Describe the bug
According to the heap dump, Broker 1 is the leader for partition 2 in the Raft layer. But when checking the cluster state with zbctl, there isn't any leader for partition 2; instead, all brokers are listed as followers.

According to the logs, on Broker 1 the ZeebePartition tried to transition to LEADER, but that transition got canceled by a transition to INACTIVE, followed by a transition to FOLLOWER. As a result, the ZeebePartition ends up in the FOLLOWER role while in the Raft layer it is still in the LEADER role.

The following events led to this state:
The ZeebePartition's RoleChangeListener and SnapshotReplicationListener together result in these transitions. The listeners are registered in the following order:
https://github.com/camunda/zeebe/blob/41008dde419ba93115c8981fde36b807496d61e3/broker/src/main/java/io/camunda/zeebe/broker/system/partitions/ZeebePartition.java#L289-L293
This means the listeners submit three transitions to the ZeebePartition, in this order: LEADER, INACTIVE, and FOLLOWER.
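The registration order can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Zeebe or Atomix code: `FakeRaftContext`, `FakePartition`, `onNewRole`, and `onSnapshotReplicationStarted` are illustrative stand-ins, and it assumes that (a) registering a role listener immediately replays the current Raft role, (b) registering a snapshot listener replays a snapshot received earlier, and (c) the last submitted transition is the one that takes effect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the registration-order problem; not the real interfaces.
public class ListenerOrderSketch {

  enum Role { LEADER, FOLLOWER, INACTIVE }

  interface RoleChangeListener { void onNewRole(Role role); }

  interface SnapshotReplicationListener {
    void onSnapshotReplicationStarted();
    void onSnapshotReplicationCompleted();
  }

  // Stand-in for the Raft context: replays state to listeners that register late.
  static final class FakeRaftContext {
    final Role raftRole;
    final boolean snapshotReceivedEarlier;

    FakeRaftContext(Role raftRole, boolean snapshotReceivedEarlier) {
      this.raftRole = raftRole;
      this.snapshotReceivedEarlier = snapshotReceivedEarlier;
    }

    void addRoleChangeListener(RoleChangeListener listener) {
      listener.onNewRole(raftRole); // assumption: current role is replayed immediately
    }

    void addSnapshotReplicationListener(SnapshotReplicationListener listener) {
      if (snapshotReceivedEarlier) { // assumption: missed snapshot events are replayed
        listener.onSnapshotReplicationStarted();
        listener.onSnapshotReplicationCompleted();
      }
    }
  }

  // Stand-in for ZeebePartition: each callback submits a transition; the last one wins.
  static final class FakePartition implements RoleChangeListener, SnapshotReplicationListener {
    final List<Role> submitted = new ArrayList<>();

    public void onNewRole(Role role) { submitted.add(role); }
    public void onSnapshotReplicationStarted() { submitted.add(Role.INACTIVE); }
    public void onSnapshotReplicationCompleted() { submitted.add(Role.FOLLOWER); }

    Role finalRole() { return submitted.get(submitted.size() - 1); }
  }

  // Same order as in the issue: role listener first, snapshot listener second.
  static FakePartition registerRoleListenerFirst(FakeRaftContext raft) {
    FakePartition partition = new FakePartition();
    raft.addRoleChangeListener(partition);          // submits LEADER
    raft.addSnapshotReplicationListener(partition); // submits INACTIVE, then FOLLOWER
    return partition;
  }

  public static void main(String[] args) {
    FakeRaftContext raft = new FakeRaftContext(Role.LEADER, true);
    FakePartition partition = registerRoleListenerFirst(raft);
    System.out.println(partition.submitted); // [LEADER, INACTIVE, FOLLOWER]
    System.out.println(raft.raftRole + " vs " + partition.finalRole()); // LEADER vs FOLLOWER
  }
}
```

In this model, the Raft layer still reports LEADER while the partition's last requested role is FOLLOWER, which matches the mismatch seen in the heap dump versus zbctl.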
Transitions 2 (INACTIVE) and 3 (FOLLOWER) are caused by a snapshot that was received at the beginning:
https://github.com/camunda/zeebe/blob/41008dde419ba93115c8981fde36b807496d61e3/atomix/cluster/src/main/java/io/atomix/raft/impl/RaftContext.java#L472-L488
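The idea of replaying snapshot-replication events that happened before a listener was registered can be sketched as follows. This is a hypothetical simplification and not the linked RaftContext code: the `Missed` enum, `SnapshotEvents`, and `RecordingListener` are illustrative names, and the mapping "started means INACTIVE, completed means FOLLOWER" is assumed from the transitions described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: remember snapshot events seen with no listener, replay on registration.
public class MissedSnapshotEventsSketch {

  enum Missed { NONE, STARTED, COMPLETED }

  interface SnapshotReplicationListener {
    void onSnapshotReplicationStarted();
    void onSnapshotReplicationCompleted();
  }

  static final class SnapshotEvents {
    private final List<SnapshotReplicationListener> listeners = new ArrayList<>();
    // Simplification: only updated while no listener is registered, never cleared.
    private Missed missed = Missed.NONE;

    void replicationStarted() {
      if (listeners.isEmpty()) missed = Missed.STARTED;
      listeners.forEach(SnapshotReplicationListener::onSnapshotReplicationStarted);
    }

    void replicationCompleted() {
      if (listeners.isEmpty()) missed = Missed.COMPLETED;
      listeners.forEach(SnapshotReplicationListener::onSnapshotReplicationCompleted);
    }

    void addListener(SnapshotReplicationListener listener) {
      listeners.add(listener);
      // Replay whatever the new listener missed so it does not skip a transition.
      if (missed == Missed.STARTED || missed == Missed.COMPLETED) {
        listener.onSnapshotReplicationStarted();
      }
      if (missed == Missed.COMPLETED) {
        listener.onSnapshotReplicationCompleted();
      }
    }
  }

  // Records the partition transitions each callback would submit (assumed mapping).
  static final class RecordingListener implements SnapshotReplicationListener {
    final List<String> transitions = new ArrayList<>();
    public void onSnapshotReplicationStarted() { transitions.add("INACTIVE"); }
    public void onSnapshotReplicationCompleted() { transitions.add("FOLLOWER"); }
  }

  static List<String> registerAfter(boolean snapshotReceivedFirst) {
    SnapshotEvents events = new SnapshotEvents();
    if (snapshotReceivedFirst) {
      events.replicationStarted();
      events.replicationCompleted();
    }
    RecordingListener partition = new RecordingListener();
    events.addListener(partition);
    return partition.transitions;
  }

  public static void main(String[] args) {
    System.out.println(registerAfter(true));  // [INACTIVE, FOLLOWER]
    System.out.println(registerAfter(false)); // []
  }
}
```

Without an earlier snapshot nothing is replayed, which is why only a broker that received a snapshot at startup would see the extra INACTIVE and FOLLOWER transitions.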
Impact
Expected behavior
The ZeebePartition transitions to the LEADER role successfully.

Possible solutions:
1. Change the order in which the listeners are registered, for instance, register the SnapshotReplicationListener first and then the RoleChangeListener.
2. Ensure that SnapshotReplicationListener#onSnapshotReplicationCompleted() triggers a transition to the previous role.

related to
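Both proposed solutions can be compared in the same kind of simplified model. This is a hypothetical sketch, not a patch for ZeebePartition: the `register` helper, the `restorePreviousRole` toggle, and the choice of FOLLOWER as the fallback when no role was requested before the snapshot are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of the two proposed fixes; not Zeebe code.
public class ProposedFixesSketch {

  enum Role { LEADER, FOLLOWER, INACTIVE }

  static final class Partition {
    final List<Role> submitted = new ArrayList<>();
    private final boolean restorePreviousRole; // toggles solution 2
    private Role roleBeforeSnapshot;

    Partition(boolean restorePreviousRole) { this.restorePreviousRole = restorePreviousRole; }

    void onNewRole(Role role) { submitted.add(role); }

    void onSnapshotReplicationStarted() {
      // Remember the last requested role; assume FOLLOWER if none was requested yet.
      roleBeforeSnapshot = submitted.isEmpty() ? Role.FOLLOWER : submitted.get(submitted.size() - 1);
      submitted.add(Role.INACTIVE);
    }

    void onSnapshotReplicationCompleted() {
      // Original behavior: always FOLLOWER. Solution 2: go back to the previous role.
      submitted.add(restorePreviousRole ? roleBeforeSnapshot : Role.FOLLOWER);
    }

    Role finalRole() { return submitted.get(submitted.size() - 1); }
  }

  // Replays the Raft role (LEADER) and a snapshot received before registration.
  static Partition register(boolean snapshotListenerFirst, boolean restorePreviousRole) {
    Partition p = new Partition(restorePreviousRole);
    Runnable roleReplay = () -> p.onNewRole(Role.LEADER);
    Runnable snapshotReplay = () -> {
      p.onSnapshotReplicationStarted();
      p.onSnapshotReplicationCompleted();
    };
    if (snapshotListenerFirst) { // solution 1: register the snapshot listener first
      snapshotReplay.run();
      roleReplay.run();
    } else {
      roleReplay.run();
      snapshotReplay.run();
    }
    return p;
  }

  public static void main(String[] args) {
    System.out.println("original:   " + register(false, false).submitted); // [LEADER, INACTIVE, FOLLOWER]
    System.out.println("solution 1: " + register(true, false).submitted);  // [INACTIVE, FOLLOWER, LEADER]
    System.out.println("solution 2: " + register(false, true).submitted);  // [LEADER, INACTIVE, LEADER]
  }
}
```

In this model both variants end in LEADER. Solution 1 depends on the registration order staying fixed, while solution 2 keeps the snapshot listener from overriding a role that the role listener already requested, regardless of order.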