redpanda:testing: partition_movement_test failing in Github CI #1426
Comments
I did a deep dive into this issue. The whole problem is caused by one of the nodes (there are 5 of them) not responding to controller (raft0) consensus RPC requests; they all time out.
This effectively prevents that node from getting updates. The test asserts state on every node. The node that doesn't get the updates is unable to execute the partition configuration update operation, so the test cannot continue and fails. I have to investigate why one of the nodes times out all requests during a 30-second window.
Fixed a bug in which the caller of an operation requiring the log write lock had to wait until readers were evicted from the cache by the eviction timer.

The problem was that when a `cached_reader` `entry_guard` was destroyed, it scheduled disposal only of readers that were no longer reusable, not of readers that had been marked as invalid. As a result, a reader that was in use during a truncate call might not be disposed of immediately after it was destroyed, even though it had been marked as invalid by the `readers_cache::evict_if` method. The reader was reinserted into the readers cache, preventing truncation from finishing. The reader was then removed in the next eviction timer iteration.

Example leading to this bug:
1) create a reader
2) truncate the log in a range for which the reader holds a lock
3) destroy the reader
4) truncation should finish immediately after the reader was destroyed

Fixes: redpanda-data#1426
Signed-off-by: Michal Maslanka <michal@vectorized.io>
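The four steps in the commit message can be reproduced in a toy model. The sketch below is not the Redpanda code: `toy_readers_cache`, `reader_entry`, `invalidate_from`, `holds_lock_from`, and the `dispose_invalid_on_release` switch are hypothetical names made up for illustration, and the real cache is asynchronous while this one is synchronous. Only the `entry_guard` name and the role of `evict_if` come from the commit message. The switch selects between the old release rule (dispose only non-reusable readers) and the fixed one (also dispose readers marked invalid).

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical, simplified stand-in for a cached reader.
struct reader_entry {
    int max_offset = 0;   // reader holds a lock on offsets up to this one
    bool valid = true;    // cleared when a truncation invalidates the reader
    bool reusable = true; // false once the reader cannot be reused
    bool in_use = false;  // true while a guard references the entry
};

class toy_readers_cache {
public:
    using iterator = std::list<reader_entry>::iterator;

    explicit toy_readers_cache(bool dispose_invalid_on_release)
      : _dispose_invalid_on_release(dispose_invalid_on_release) {}

    iterator create_reader(int max_offset) {
        _readers.emplace_back();
        auto it = std::prev(_readers.end());
        it->max_offset = max_offset;
        it->in_use = true;
        return it;
    }

    // Plays the role of evict_if during truncation: mark matching readers
    // invalid and dispose of them right away unless they are still in use.
    void invalidate_from(int offset) {
        for (auto it = _readers.begin(); it != _readers.end();) {
            if (it->max_offset >= offset) {
                it->valid = false;
                if (!it->in_use) {
                    it = _readers.erase(it);
                    continue;
                }
            }
            ++it;
        }
    }

    // Called when a guard is destroyed.
    void release(iterator it) {
        assert(it->in_use);
        it->in_use = false;
        // Old rule: dispose only when the reader is not reusable.
        // Fixed rule: also dispose when the reader was marked invalid.
        const bool dispose = !it->reusable
                             || (_dispose_invalid_on_release && !it->valid);
        if (dispose) {
            _readers.erase(it);
        }
    }

    // Periodic cleanup: drops invalid readers that nobody is using.
    void run_eviction_timer() {
        _readers.remove_if(
          [](const reader_entry& e) { return !e.valid && !e.in_use; });
    }

    // True while some cached reader still locks the truncated range.
    bool holds_lock_from(int offset) const {
        for (const auto& e : _readers) {
            if (e.max_offset >= offset) {
                return true;
            }
        }
        return false;
    }

private:
    bool _dispose_invalid_on_release;
    std::list<reader_entry> _readers;
};

// RAII guard; its destructor hands the reader back to the cache.
class entry_guard {
public:
    entry_guard(toy_readers_cache& c, toy_readers_cache::iterator it)
      : _cache(c), _it(it) {}
    entry_guard(const entry_guard&) = delete;
    entry_guard& operator=(const entry_guard&) = delete;
    ~entry_guard() { _cache.release(_it); }

private:
    toy_readers_cache& _cache;
    toy_readers_cache::iterator _it;
};

// Runs steps 1-4 and reports whether truncation is unblocked at step 4.
bool truncation_unblocked_after_destroy(bool dispose_invalid_on_release) {
    toy_readers_cache cache(dispose_invalid_on_release);
    {
        entry_guard guard(cache, cache.create_reader(10)); // 1) create reader
        cache.invalidate_from(5); // 2) truncate inside the locked range
    }                             // 3) destroy the reader
    return !cache.holds_lock_from(5); // 4) truncation can finish?
}
```

With the old rule the invalid reader stays cached after step 3, so in this model truncation stays blocked until `run_eviction_timer()` runs; with the fixed rule it is disposed of as soon as the guard is destroyed.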
This problem is related to the readers cache. More details are in the commit message above.
https://pipelines.actions.githubusercontent.com/XSI7xPi23IdMcR1QOMsb00ALQsxoSJhYpNBOdFaptPX1RGgaWB/_apis/pipelines/1/runs/16391/signedlogcontent/3?urlExpires=2021-05-19T21%3A50%3A32.5588711Z&urlSigningMethod=HMACV1&urlSignature=9z5cm6zU710JKfGY49qSJ%2BhR0KUyZPE%2BEP16zmK%2BUfg%3D