Identify corrupted member depending on quorum #14828
Codecov Report
@@ Coverage Diff @@
## main #14828 +/- ##
==========================================
- Coverage 75.52% 75.45% -0.07%
==========================================
Files 457 457
Lines 37423 37469 +46
==========================================
+ Hits 28264 28274 +10
- Misses 7386 7413 +27
- Partials 1773 1782 +9
With the change in this PR, I reverted the change in #14824
Thanks @fuweid for the comments, all look good to me.
LGTM(non-binding)
Note that this is not something that was introduced for compact hash verification, the issue was already present in My suggestion:
What do you think?
Yes, I was aware of it that
Seems like a good suggestion, but the only concern is that it may break the existing user experience. @ptabor @spzala what's your thought?
Agreed. Let's do a similar change for
This was broken for a long time until we fixed it only recently in #14272. A bug turned out to be a feature :P. I don't see an issue with backporting this. Corruption checks previously set
@@ -258,57 +259,152 @@ func (cm *corruptionChecker) CompactHashCheck() {
 	)
 	hashes := cm.uncheckedRevisions()
 	// Assume that revisions are ordered from largest to smallest
-	for i, hash := range hashes {
+	for _, hash := range hashes {
 		peers := cm.hasher.PeerHashByRev(hash.Revision)
This method name looks like a local 'lookup', while it actually makes blocking, serialized remote calls to multiple endpoints. How about RequestHashFromPeerByRev or CallPeerAndGetHash()?
Improvements for the backlog:
- getPeerHashKVs() could fetch the hashes in parallel.
- ServeHTTP could populate the list of hashes we have (for consumption in v3.7, 2028)
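The parallel-fetch idea from the backlog could look roughly like the following. This is only a sketch, not the etcd implementation: `peerHash`, `fetchFunc`, and `fetchPeerHashes` are hypothetical names, and the per-peer call is passed in as a callback so the example stays self-contained. A real version would most likely also carry a `context.Context` for timeouts and cancellation.

```go
package main

import (
	"fmt"
	"sync"
)

// peerHash is a hypothetical container for one peer's answer.
type peerHash struct {
	PeerID uint64
	Hash   uint32
	Err    error
}

// fetchFunc is a hypothetical callback that asks a single peer for its hash
// at the given revision. In practice this would be a blocking remote call.
type fetchFunc func(peerID uint64, rev int64) (uint32, error)

// fetchPeerHashes queries all peers concurrently instead of one after another.
// Results are returned in the same order as peerIDs.
func fetchPeerHashes(peerIDs []uint64, rev int64, fetch fetchFunc) []peerHash {
	results := make([]peerHash, len(peerIDs))
	var wg sync.WaitGroup
	for i, id := range peerIDs {
		wg.Add(1)
		go func(i int, id uint64) {
			defer wg.Done()
			h, err := fetch(id, rev)
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[i] = peerHash{PeerID: id, Hash: h, Err: err}
		}(i, id)
	}
	wg.Wait()
	return results
}

func main() {
	// A fake fetcher standing in for the remote call.
	fake := func(peerID uint64, rev int64) (uint32, error) {
		return uint32(peerID) * 10, nil
	}
	for _, r := range fetchPeerHashes([]uint64{1, 2, 3}, 100, fake) {
		fmt.Println(r.PeerID, r.Hash, r.Err)
	}
}
```

With this shape the total wait is bounded by the slowest peer rather than by the sum of all peer round trips.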
For v3.6 release we should consider hash being negotiated via raft.
Let's discuss & address this in a separate session/PR. Renaming and fetching the hashes in parallel make sense to me. @ptabor where & how is the 2028 coming from? :)
For @serathius's comment "consider hash being negotiated via raft", I did not get the point. My immediate feeling is that there is no need, because the compaction is already coordinated by raft.
we should consider hash being negotiated via raft.
Is it going to persist the hash result?
how is the 2028 coming from?
Just an extrapolation of release frequency.
Rafting hashes?
I imagine this could work this way:
- There is a new RAFT message 'start-checksum' that triggers all members to compute a checksum at the exact revision. So it's a 'simpler' compaction. Compaction keeps doing this implicitly.
- Whenever a member finishes the computation, it sends its result (pair: rev, hash) to the leader.
- The leader broadcasts (?) the received results through RAFT.
- Every member can react to a discrepancy.
Benefit: it does not require a custom service and best-effort attempts to check whether we have checks in sync.
But I would consider evaluating the (on-line) merkle root (#13839) sums design first and thinking about what raft changes would be needed for it.
Thanks @ptabor for the comment. It's a big topic, let's discuss it separately.
c2c0d0b
to
17cd0e0
Compare
When the leader detects data inconsistency by comparing hashes, it currently assumes that the follower is the corrupted member. This isn't correct; the leader might be the corrupted member as well. We should depend on quorum to identify the corrupted member. For example, for a 3 member cluster, if 2 members have the same hash, the member with a different hash is the corrupted one. For a 5 member cluster, if 3 members have the same hash, the corrupted member is one of the remaining two members; it's also possible that both of the remaining members are corrupted. Signed-off-by: Benjamin Wang <wachao@vmware.com>
The change made in etcd-io#14824 fixed the test instead of the product code, which isn't correct. Now that the product code is fixed in this PR, we can revert the change from that PR. Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
…orrupted member If a quorum doesn't exist, we don't know which members' data are corrupted. In that situation, we intentionally set the memberID to 0, which means the corruption affects the whole cluster. This aligns with what we did for 3.4 and 3.5 in etcd-io#14849 Signed-off-by: Benjamin Wang <wachao@vmware.com>
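As an illustration of the rule in these commit messages, here is a minimal Go sketch. It is not the code from this PR: `memberHash` and `corruptedMembers` are hypothetical names, and the sketch assumes `reports` holds exactly one entry per cluster member, so the quorum size can be derived from its length.

```go
package main

import "fmt"

// memberHash is a hypothetical pairing of a member ID with the hash it reported.
type memberHash struct {
	MemberID uint64
	Hash     uint32
}

// corruptedMembers returns the member IDs an alarm should be raised for.
//   - All members agree: nil.
//   - A quorum shares one hash: every member outside that quorum.
//   - No quorum: []uint64{0}, where ID 0 stands for the whole cluster.
func corruptedMembers(reports []memberHash) []uint64 {
	if len(reports) == 0 {
		return nil
	}
	quorum := len(reports)/2 + 1
	counts := make(map[uint32]int)
	for _, r := range reports {
		counts[r.Hash]++
	}
	// At most one hash can reach quorum, so map iteration order does not matter.
	for hash, n := range counts {
		if n < quorum {
			continue
		}
		var bad []uint64
		for _, r := range reports {
			if r.Hash != hash {
				bad = append(bad, r.MemberID)
			}
		}
		return bad
	}
	return []uint64{0}
}

func main() {
	// 3 member cluster: members 1 and 2 agree, member 3 differs.
	fmt.Println(corruptedMembers([]memberHash{{1, 7}, {2, 7}, {3, 9}}))
	// 3 member cluster with three different hashes: no quorum.
	fmt.Println(corruptedMembers([]memberHash{{1, 7}, {2, 8}, {3, 9}}))
}
```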
17cd0e0
to
e606d22
Compare
Proposed to rename some tests to make it easier to identify if we are missing any tests, but feel free to skip suggestions if you don't agree with them. Would love to see consistent naming for those scenarios, but maybe in next PR.
1ba6390
to
15326f0
Compare
…heck Signed-off-by: Benjamin Wang <wachao@vmware.com>
15326f0
to
d545d60
Compare
Resolved all the comments. Renaming isn't a big deal, but you are the original author of the unit test case, so I followed all your suggestions. PTAL, thx.
Currently, when the compact hash checker detects a hash mismatch, it assumes that the corrupted member is always one of the followers. This isn't correct, because it's also possible that the leader's data is corrupted. It's also possible that multiple members are corrupted, for example 2 members out of a 5 member cluster.
The solution is to depend on quorum to identify the corrupted member. For example, for a 3 member cluster, if 2 members have the same CompactRevision and hash, then the remaining member is the corrupted one. For a 5 member cluster, if at least 3 members have the same CompactRevision and hash, then the remaining members are the corrupted ones.
If there isn't a quorum, then the smallest minority is regarded as the corrupted member. For example, for a 5 member cluster, if m1 and m2 have the same CompactRevision and hash, and m3 and m4 have the same CompactRevision and hash, then m5 is the corrupted member.
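The grouping described above, including the fallback when no quorum exists, can be sketched in Go as follows. This is an illustration only, not the code in this PR: `hashKey`, `memberReport`, and `suspectMembers` are hypothetical names, `reports` is assumed to contain one entry per cluster member, and the handling of ties (several groups sharing the smallest size) is an assumption the description does not spell out.

```go
package main

import (
	"fmt"
	"sort"
)

// hashKey is the pair members must agree on.
type hashKey struct {
	CompactRevision int64
	Hash            uint32
}

// memberReport is a hypothetical record of what one member reported.
type memberReport struct {
	MemberID uint64
	Key      hashKey
}

// suspectMembers groups members by (CompactRevision, Hash).
// With a quorum group, every member outside it is a suspect.
// Without one, the members of the smallest group are suspects.
// Assumption: if several groups tie for smallest, all of them are returned.
func suspectMembers(reports []memberReport) []uint64 {
	groups := make(map[hashKey][]uint64)
	for _, r := range reports {
		groups[r.Key] = append(groups[r.Key], r.MemberID)
	}
	if len(groups) <= 1 {
		return nil // everyone agrees (or there is nothing to compare)
	}
	quorum := len(reports)/2 + 1
	hasQuorum := false
	smallest := len(reports)
	for _, ids := range groups {
		if len(ids) >= quorum {
			hasQuorum = true
		}
		if len(ids) < smallest {
			smallest = len(ids)
		}
	}
	var suspects []uint64
	for _, ids := range groups {
		if hasQuorum {
			if len(ids) < quorum {
				suspects = append(suspects, ids...)
			}
		} else if len(ids) == smallest {
			suspects = append(suspects, ids...)
		}
	}
	// Sort for a stable result, since map iteration order is random.
	sort.Slice(suspects, func(i, j int) bool { return suspects[i] < suspects[j] })
	return suspects
}

func main() {
	a := hashKey{CompactRevision: 10, Hash: 1}
	b := hashKey{CompactRevision: 10, Hash: 2}
	c := hashKey{CompactRevision: 10, Hash: 3}
	// The 5 member example: m1 and m2 agree, m3 and m4 agree, m5 stands alone.
	fmt.Println(suspectMembers([]memberReport{{1, a}, {2, a}, {3, b}, {4, b}, {5, c}}))
}
```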