Failure in PartitionBalancerTest.test_rack_awareness
#5795
Rack awareness is enforced in Redpanda as a soft allocation constraint, so that partitions can still be allocated when there are not enough nodes in distinct racks. While the rack awareness test was running, the partition balancer was able to calculate movements before the previously stopped node was reported alive. As a result, a partition had to be allocated while two nodes were unavailable. When those two nodes happened to be in the same rack, the rack awareness constraint could not be satisfied.

Fixes: redpanda-data#5795
Signed-off-by: Michal Maslanka <michal@redpanda.com>
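To illustrate why a soft constraint can produce a placement that breaks rack awareness, here is a minimal Python sketch. It is not Redpanda's allocator: the function name `pick_replicas`, the greedy "least-used rack first" strategy, and the six-node, three-rack layout are all assumptions made for this example. The point is only that a soft constraint ranks candidates instead of rejecting them, so allocation still succeeds when a whole rack is unavailable.

```python
from collections import Counter


def pick_replicas(nodes, racks, available, replication_factor):
    """Greedy soft rack-aware placement (hypothetical, for illustration).

    nodes: iterable of node ids
    racks: dict mapping node id -> rack label
    available: set of node ids currently considered alive
    """
    candidates = [n for n in nodes if n in available]
    if len(candidates) < replication_factor:
        raise ValueError("not enough available nodes")

    chosen = []
    replicas_per_rack = Counter()
    for _ in range(replication_factor):
        # Soft constraint: prefer the rack holding the fewest replicas so far,
        # but never refuse a node just because its rack is already used.
        best = min(
            (n for n in candidates if n not in chosen),
            key=lambda n: (replicas_per_rack[racks[n]], n),
        )
        chosen.append(best)
        replicas_per_rack[racks[best]] += 1
    return chosen


# Assumed layout: two nodes in each of racks A, B and C.
RACKS = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C"}
NODES = sorted(RACKS)

# All nodes alive: one replica lands in each rack.
healthy = pick_replicas(NODES, RACKS, set(NODES), 3)

# Both rack C nodes believed unavailable: allocation still succeeds,
# but two replicas end up sharing a rack.
degraded = pick_replicas(NODES, RACKS, {1, 2, 3, 4}, 3)
```

In the degraded case the replicas span only two racks, which is the situation the test hit when the balancer acted before it saw the restarted node as alive.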
logs and stuff https://buildkite.com/redpanda/vtools/builds/7962#_
I looked briefly into this. The problem is that at some point the balancer marked both nodes in rack C as unavailable:
even though node 5 was made available some time before:
From balancer logs it is clear that the balancer didn't notice that node 5 was up (redpanda-1):
This is most probably related to a change where we started relying on
Sev/medium as this is a Redpanda bug.
Assigning @bharathv since he is already looking into this.
https://buildkite.com/redpanda/redpanda/builds/13399#0182598f-ee62-46da-900f-15a299898705