Revision inconsistency caused by panic during defrag #14685
Conversation
Force-pushed from c0947e9 to f5ea4f1.
Codecov Report
@@ Coverage Diff @@
## main #14685 +/- ##
==========================================
- Coverage 75.56% 75.42% -0.14%
==========================================
Files 457 457
Lines 37308 37308
==========================================
- Hits 28191 28141 -50
- Misses 7351 7382 +31
- Partials 1766 1785 +19
ping @ahrtr
tests/linearizability/model.go
Outdated
if state.LastRevision < response.revision {
    state.Value = request.putData
    state.LastRevision = response.revision
    return true, state
}
My previous comment https://github.com/etcd-io/etcd/pull/14685/files#r1014471776 might still be valid. Since response.err is nil, we should save request.putData and response.revision anyway.
Suggested change, replacing

    if state.LastRevision < response.revision {
        state.Value = request.putData
        state.LastRevision = response.revision
        return true, state
    }

with

    lastRev := state.LastRevision
    state.Value = request.putData
    state.LastRevision = response.revision
    if lastRev < response.revision {
        return true, state
    }
The commit suggestion you proposed doesn't change the logic at all; you just save the data before validating that the state transition is correct.

> We might need to update state.LastRevision as long as response.revision != 0

Please remember that revision is a positive number, so checking that the revision is increasing, as in state.LastRevision < response.revision, already ensures that response.revision != 0.
As long as response.err is nil, we need to save request.putData and response.revision to the state, while your code only saves them when state.LastRevision < response.revision (and, of course, response.err is nil).
> As long as response.err is nil, then we need to save request.putData and response.revision to state

That's not true. The model should only allow operations that would be correct for etcd. The etcd revision should never decrease; that's why I added the state.LastRevision < response.revision check.
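For context, a minimal sketch of the Put step under discussion, with hypothetical type names (EtcdState, PutRequest, PutResponse, stepPut) standing in for the actual types in tests/linearizability/model.go:

```go
package model

// EtcdState, PutRequest and PutResponse are illustrative stand-ins for the
// model types in tests/linearizability/model.go, not the real definitions.
type EtcdState struct {
	Value        string
	LastRevision int64
}

type PutRequest struct{ putData string }

type PutResponse struct {
	revision int64
	err      error
}

// stepPut accepts a successful put only if the reported revision is strictly
// greater than the last revision the model has observed; otherwise the
// history would imply that the etcd revision decreased, which must never happen.
func stepPut(state EtcdState, request PutRequest, response PutResponse) (bool, EtcdState) {
	if response.err != nil {
		// A failed put leaves the model state unchanged here; whether it was
		// actually persisted is resolved later by the Get handling.
		return true, state
	}
	if state.LastRevision < response.revision {
		state.Value = request.putData
		state.LastRevision = response.revision
		return true, state
	}
	return false, state
}
```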
Force-pushed from f5ea4f1 to 9444106.
The current model found 2 linearizability issues in CI run https://github.com/etcd-io/etcd/actions/runs/3418916665. Possibly the model is too strict, but I would like to discuss it openly to make sure I didn't miss anything. I'm working on reproducing them locally.

First case: in the single-cluster test we have a case of etcd skipping 9 revisions even though there were no other writes. This breaks the assumption that the revision increases only on a successful write.

Second case: in the 3-node cluster it is even stranger; etcd reported different data for the same revision. There was a long-running get transaction that returned old data with a much newer revision.

In conclusion, there seems to be an issue with the revision returned in the Get response header. cc @ahrtr
Force-pushed from 1b9542c to 63825db.
tests/linearizability/model.go
Outdated
if state.Value == response.getData && state.LastRevision <= response.revision {
    return true, state
}
if state.LastRevision >= response.revision {
    return false, state
}
It seems incorrect, because both if statements cover the condition state.LastRevision == response.revision.
This should be correct, let me explain.

The first if statement validates that a Get read the result of the last successful put: the value stored in the state needs to equal the value returned by that put. The second and third statements, on the other hand, handle a put that returned an error to the client but was persisted anyway. To "restore" such a lost put, there are two requirements:
- The revision must have increased (second if statement).
- There must be a failed request that stored that data (third if statement).

Let me rewrite the code to make this clearer (see the sketch below).
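A rough sketch of that restructured Get step, reusing the hypothetical EtcdState and PutRequest types from the earlier sketch (GetRequest, GetResponse, stepGet and the failedPuts parameter are also illustrative names, not the actual model.go API):

```go
// GetRequest and GetResponse are illustrative stand-ins for the model types.
type GetRequest struct{}

type GetResponse struct {
	getData  string
	revision int64
}

// stepGet accepts a read either because it observed the last successful put,
// or because it "restores" a put that failed on the client side but was
// persisted by etcd.
func stepGet(state EtcdState, request GetRequest, response GetResponse, failedPuts []PutRequest) (bool, EtcdState) {
	// Case 1: the read observed the result of the last successful put.
	if state.Value == response.getData && state.LastRevision <= response.revision {
		return true, state
	}
	// Case 2: restore a lost put. The revision must have increased...
	if state.LastRevision >= response.revision {
		return false, state
	}
	// ...and some failed put must have written exactly the data we read.
	for _, put := range failedPuts {
		if put.putData == response.getData {
			state.Value = response.getData
			state.LastRevision = response.revision
			return true, state
		}
	}
	return false, state
}
```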
You are right, but in both cases the revision should come from the bbolt database. It is expected to be the revision of the latest write. For put requests this means it is the revision of the write itself. For get requests it is the revision at the time of the request, i.e. the revision of the latest write that happened before it. I still think it should be monotonically increasing, and this is what the model tests.
Force-pushed from 63825db to e000a9f.
I think the linearizability test only guarantees that a read request never reads stale data (of course, the revision in the RangeResponse header will be >= the revision in the previous write request). What you have done so far (not including this PR) is good.

But the revision in the RangeResponse header is volatile (although it is always increasing), even if there is no write request on the key you are reading. As long as there are write requests on other keys, the revision in the RangeResponse header will change. It means that when two read requests read the same key, one request might get new data but an old revision. Please note that in a range request, reading the revision and creating a "snapshot" of the data isn't an atomic operation. See kvstore_txn.go#L46-L54

Probably you can use CreateRevision instead. But per my immediate feeling, it might not be a good idea to test the revision in the linearizability test. Instead, we should verify the revision at the end of the test: as long as all members converge to the same revision, then it's OK. Note: I was planning to add such a checker for the functional test.
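A rough sketch of the kind of end-of-test check being suggested here, assuming the test knows the client endpoints of all members (the function name, the endpoints parameter and the use of clientv3 are illustrative, not the checker that was eventually implemented):

```go
package linearizability

import (
	"context"
	"testing"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// checkRevisionsConverge queries every member directly and fails the test if
// the members do not report the same revision once all traffic has stopped.
func checkRevisionsConverge(t *testing.T, endpoints []string) {
	revisions := make(map[string]int64)
	for _, ep := range endpoints {
		cli, err := clientv3.New(clientv3.Config{
			Endpoints:   []string{ep},
			DialTimeout: 5 * time.Second,
		})
		if err != nil {
			t.Fatalf("failed to connect to %s: %v", ep, err)
		}
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		status, err := cli.Status(ctx, ep)
		cancel()
		cli.Close()
		if err != nil {
			t.Fatalf("failed to get status from %s: %v", ep, err)
		}
		revisions[ep] = status.Header.Revision
	}
	var want int64
	for ep, rev := range revisions {
		if want == 0 {
			want = rev
			continue
		}
		if rev != want {
			t.Errorf("revision mismatch: %s reports %d, want %d", ep, rev, want)
		}
	}
}
```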
Implemented #14714 to dump the data directory after the tests. I got interesting results; as mentioned before, the revisions in each instance do not match for the same PUT request. I got the result by downloading the artifact from https://github.com/etcd-io/etcd/actions/runs/3426561978 and analysing the files using

Member 0:

Member 1:

Member 2:
I reproduced this issue locally; it seems like a real issue. Let me take a deep dive on this. Note: please ignore part of my previous comment (the "two read requests" part), which isn't correct.
@ahrtr Thanks for looking into this. Did you do anything specific to reproduce the issue locally? With a local reproduction, can you validate whether v3.4 and v3.5 are affected? If so, we should delay the v3.5.6 release.
Force-pushed from 525a2ea to ac78db2.
OK, I got a local repro. I'm running different etcd versions to check which ones are affected.
Confirmed that the issue is reproducible on the main branch and on the v3.5.5 release. Looks like v3.4.22 is not affected. Will look into bisecting the git history when I have time.

Confirmed. I have the same finding.

Git history bisection has pointed to 2dbecea as the cause.
`unsafeCommit` is called by both `(*batchTxBuffered) commit` and `(*backend) defrag`. When users perform the defragmentation operation, etcd doesn't update the consistent index. If etcd crashes (e.g. panics) in the process for whatever reason, then etcd replays the WAL entries starting from the latest snapshot, and accordingly it may re-apply entries which have already been applied; eventually the revision isn't consistent with the other members. Refer to discussion in etcd-io#14685 Signed-off-by: Benjamin Wang <wachao@vmware.com>
Eventually I figured out the root cause. Please refer to #14730
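To illustrate the mechanism described in the commit message above (this is a hedged sketch, not etcd's actual code; names such as Entry, persistedConsistentIndex and apply are hypothetical): after a restart, WAL replay only skips entries at or below the consistent index that was persisted together with the data, so if a commit during defrag persists applied data without the matching consistent index, those entries are applied a second time and the member's revision drifts from the others.

```go
package sketch

// Entry is an illustrative stand-in for a WAL/raft entry.
type Entry struct {
	Index uint64
}

// replay sketches the post-restart apply loop: entries at or below the
// consistent index read back from the backend are assumed to already be
// reflected in the data and are skipped; everything after it is re-applied.
func replay(persistedConsistentIndex uint64, entries []Entry, apply func(Entry)) {
	for _, e := range entries {
		if e.Index <= persistedConsistentIndex {
			continue
		}
		// If the backend already contains the effect of e, because a commit
		// during defrag persisted the data but not the updated consistent
		// index, this call applies e a second time and bumps the revision
		// again, producing the revision inconsistency observed in this issue.
		apply(e)
	}
}
```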
`unsafeCommit` is called by both `(*batchTxBuffered) commit` and `(*backend) defrag`. When users perform the defragmentation operation, etcd doesn't update the consistent index. If etcd crashes (e.g. panics) in the process for whatever reason, then etcd replays the WAL entries starting from the latest snapshot, and accordingly it may re-apply entries which have already been applied; eventually the revision isn't consistent with the other members. Refer to discussion in etcd-io#14685 Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Force-pushed from ac78db2 to b24e4ea.
With the fix merged we can also merge this change. ping @ahrtr
tests/linearizability/model.go
Outdated
if state.Value == response.getData && state.LastRevision <= response.revision {
    return true, state
}
if state.LastRevision >= response.revision {
Two successive read requests may have the same revision.
Suggested change, replacing

    if state.LastRevision >= response.revision {

with

    if state.LastRevision > response.revision {
I explained it above: this if statement doesn't cover successive reads. It covers cases where a PUT request failed before the read, as in the example below.
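A small usage example of the hypothetical stepGet sketch from earlier (assuming its types and an fmt import are in scope), showing the scenario this branch is for: a put returns an error to the client but is persisted, and a later read observes both the higher revision and the lost value, so the model restores it. A successive read at the same revision is instead accepted by the first condition and never reaches this check.

```go
func Example_restoreLostPut() {
	state := EtcdState{Value: "a", LastRevision: 5}
	// A put of "b" returned an error to the client but was persisted by etcd.
	failedPuts := []PutRequest{{putData: "b"}}
	// A later read observes the persisted value and a higher revision.
	ok, state := stepGet(state, GetRequest{}, GetResponse{getData: "b", revision: 6}, failedPuts)
	fmt.Println(ok, state.Value, state.LastRevision)
	// Output: true b 6
}
```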
Force-pushed from b24e4ea to c1ed370.
Overall looks good to me.
I'll leave you to decide whether/how to resolve #14685 (comment)
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Force-pushed from c1ed370 to ff6c93f.
Resolved the comment and added tests for it.
Thanks @serathius lgtm
All the existing etcd 3.5 releases are affected, including 3.5.0 ~ 3.5.5. Note: etcd 3.4 doesn't have this issue. There is also no impact on etcd 3.5 if the defragmentation operation is performed offline using etcdutl.
When working on adding revision support to robustness testing, CI found 2 linearizability issues in https://github.com/etcd-io/etcd/actions/runs/3418916665. Possibly the model is too strict, but I would like to discuss it openly to make sure I didn't miss anything. I'm working on reproducing them locally.

First case: in the single-cluster test we have a case of etcd skipping 9 revisions even though there were no other writes. This breaks the assumption that the revision increases only on a successful write.

I would really love to see the db file, so I think the follow-up to that issue is to implement uploading the etcd data directory on a failed test.

Second case: in the 3-node cluster it is even stranger; etcd reported different data for the same revision. There was a long-running get transaction that returned old data with a much newer revision.

In conclusion, there seems to be an issue with the revision returned in the Get response header. cc @ahrtr