Raft: Check suspect info once per suspect interval #1600
@@ -60,6 +61,9 @@ func (pc *PeriodicCheck) check() {
}

func (pc *PeriodicCheck) conditionNotFulfilled() {
	if pc.ReportCleared != nil && pc.conditionHoldsSince != (time.Time{}) {
maybe instead of:
pc.conditionHoldsSince != (time.Time{})
do:
!pc.conditionHoldsSince.IsZero()
?
Sure -- that is quite a bit more graceful, will fix.
The existing suspect logic has a periodic checker, which checks every 10s whether the Raft cluster still has quorum. If the cluster has lost quorum, it records when the loss began and then, every 10s, checks whether enough time has elapsed since quorum was lost to suspect that the OSN has been evicted.
If the OSN has not been evicted, or cannot determine its eviction status, then every 10s the OSN attempts to re-check its suspicion status, which can lead to large volumes of network traffic, especially in significantly multichannel environments.
This commit modifies the logic to track the number of times that the suspect checking logic has actually executed, to ensure that we check no more than once every suspect interval (by default every 10m, instead of every 10s).
Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
771e51c to 8b3eef0
@Mergifyio backport release-2.2
@Mergifyio backport release-2.1
@Mergifyio backport release-2.0
@Mergifyio backport release-1.4
The existing suspect logic has a periodic checker, which checks every 10s whether the Raft cluster still has quorum. If the cluster has lost quorum, it records when the loss began and then, every 10s, checks whether enough time has elapsed since quorum was lost to suspect that the OSN has been evicted.
If the OSN has not been evicted, or cannot determine its eviction status, then every 10s the OSN attempts to re-check its suspicion status, which can lead to large volumes of network traffic, especially in significantly multichannel environments.
This commit modifies the logic to track the number of times that the suspect checking logic has actually executed, to ensure that we check no more than once every suspect interval (by default every 10m, instead of every 10s).
Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
(cherry picked from commit c90015c)
Description
The existing suspect logic has a periodic checker, which checks
every 10s whether the Raft cluster still has quorum. If the cluster has
lost quorum, it records when the loss began and then, every 10s, checks
whether enough time has elapsed since quorum was lost to suspect
that the OSN has been evicted.
If the OSN has not been evicted, or cannot determine its eviction
status, then every 10s the OSN attempts to re-check its suspicion
status, which can lead to large volumes of network traffic, especially
in significantly multichannel environments.
This commit modifies the logic to track the number of times that the
suspect checking logic has actually executed, to ensure that we check no
more than once every suspect interval (by default every 10m, instead of
every 10s).