Yes, this is the correct function. It waits indefinitely because of the recent change in #15575.
However, the function itself is not the problem. I think we should have an external mechanism that detects when the cluster is completely down and cancels the context passed to collectClusterWatchEvents and watchMember.
My first guess is that the triggerFailpoints function should validate that the cluster is healthy between and after injecting failpoints. If it is not, it should propagate the signal up as an error. With that error, the runScenario function can cancel the context passed to collectClusterWatchEvents.
What would you like to be added?
A recent change that waits for all watch events caused the nightly robustness test runs to wait indefinitely: https://github.com/etcd-io/etcd/actions/runs/4562802681
A bug in raft (#15595) causes etcd members to crash. With only one of the three members left, the cluster cannot make progress.
In this case, instead of waiting indefinitely, we should detect that the cluster is unhealthy and abort collecting watch events.
Bonus points for implementing a timeout for watching events.
Why is this needed?
Robustness tests should not time out.