Revision inconsistency caused by panic during defrag #14685
Conversation
Force-pushed from c0947e9 to f5ea4f1.
Codecov Report
@@ Coverage Diff @@
## main #14685 +/- ##
==========================================
- Coverage 75.56% 75.42% -0.14%
==========================================
Files 457 457
Lines 37308 37308
==========================================
- Hits 28191 28141 -50
- Misses 7351 7382 +31
- Partials 1766 1785 +19
ping @ahrtr
tests/linearizability/model.go
Outdated
if state.LastRevision < response.revision {
    state.Value = request.putData
    state.LastRevision = response.revision
    return true, state
}
My previous comment https://github.com/etcd-io/etcd/pull/14685/files#r1014471776 might still be valid. Since response.err is nil, we should save request.putData and response.revision anyway.
Suggested change, replacing

    if state.LastRevision < response.revision {
        state.Value = request.putData
        state.LastRevision = response.revision
        return true, state
    }

with

    lastRev := state.LastRevision
    state.Value = request.putData
    state.LastRevision = response.revision
    if lastRev < response.revision {
        return true, state
    }
The commit suggestion you proposed doesn't change the logic at all; you just save the data before validating that the state transition is correct.

> We might need to update state.LastRevision as long as response.revision != 0

Please remember that revision is a positive number, so checking that the revision is increasing, as in state.LastRevision < response.revision, already ensures that response.revision != 0.
As long as response.err is nil, we need to save request.putData and response.revision to the state, while your code only saves them when state.LastRevision < response.revision (and, of course, response.err is nil).
> As long as response.err is nil, then we need to save request.putData and response.revision to state

That's not true. The model should only allow operations that would be correct for etcd. The etcd revision should never decrease; that's why I added the state.LastRevision < response.revision check.
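For context, a minimal sketch of the Put step under discussion, with hypothetical type names (EtcdState, PutRequest, PutResponse, stepPut) standing in for the actual types in tests/linearizability/model.go:

```go
package model

// EtcdState, PutRequest and PutResponse are illustrative stand-ins for the
// model types in tests/linearizability/model.go, not the real definitions.
type EtcdState struct {
	Value        string
	LastRevision int64
}

type PutRequest struct{ putData string }

type PutResponse struct {
	revision int64
	err      error
}

// stepPut accepts a successful put only if the reported revision is strictly
// greater than the last revision the model has observed; otherwise the
// history would imply that the etcd revision decreased, which must never happen.
func stepPut(state EtcdState, request PutRequest, response PutResponse) (bool, EtcdState) {
	if response.err != nil {
		// A failed put leaves the model state unchanged here; whether it was
		// actually persisted is resolved later by the Get handling.
		return true, state
	}
	if state.LastRevision < response.revision {
		state.Value = request.putData
		state.LastRevision = response.revision
		return true, state
	}
	return false, state
}
```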
Force-pushed from f5ea4f1 to 9444106.
The current model found 2 linearizability issues in CI run https://github.com/etcd-io/etcd/actions/runs/3418916665. Possibly the model is too strict, but I would like to discuss it openly to make sure I didn't miss anything. I'm working on reproducing them locally.

First case: in the single-cluster test we have a case of etcd skipping 9 revisions even though there were no other writes. This breaks the assumption that the revision increases only on a successful write.

Second case: in the 3-node cluster it is even stranger; etcd reported different data for the same revision. There was a long-running get transaction that returned old data with a much newer revision.

In conclusion, there seems to be an issue with the revision returned in the Get response header. cc @ahrtr
Force-pushed from 1b9542c to 63825db.
tests/linearizability/model.go
Outdated
if state.Value == response.getData && state.LastRevision <= response.revision {
    return true, state
}
if state.LastRevision >= response.revision {
    return false, state
}
It seems incorrect, because both if statements cover the condition state.LastRevision == response.revision.
This should be correct, let me explain.

The first if statement validates that a Get read the result of the last successful put: the value stored in the state needs to equal the value returned by that put. The second and third statements, on the other hand, handle a put that returned an error to the client but was persisted anyway. To "restore" such a lost put, there are two requirements:
- The revision must have increased (second if statement).
- There must be a failed request that stored that data (third if statement).

Let me rewrite the code to make this clearer (see the sketch below).
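A rough sketch of that restructured Get step, reusing the hypothetical EtcdState and PutRequest types from the earlier sketch (GetRequest, GetResponse, stepGet and the failedPuts parameter are also illustrative names, not the actual model.go API):

```go
// GetRequest and GetResponse are illustrative stand-ins for the model types.
type GetRequest struct{}

type GetResponse struct {
	getData  string
	revision int64
}

// stepGet accepts a read either because it observed the last successful put,
// or because it "restores" a put that failed on the client side but was
// persisted by etcd.
func stepGet(state EtcdState, request GetRequest, response GetResponse, failedPuts []PutRequest) (bool, EtcdState) {
	// Case 1: the read observed the result of the last successful put.
	if state.Value == response.getData && state.LastRevision <= response.revision {
		return true, state
	}
	// Case 2: restore a lost put. The revision must have increased...
	if state.LastRevision >= response.revision {
		return false, state
	}
	// ...and some failed put must have written exactly the data we read.
	for _, put := range failedPuts {
		if put.putData == response.getData {
			state.Value = response.getData
			state.LastRevision = response.revision
			return true, state
		}
	}
	return false, state
}
```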
You are right, but in both cases the revision should come from the bbolt database. It is expected to be the revision of the latest write. For put requests this means it is the revision of the write itself. For get requests it is the revision at the time of the request, i.e. the revision of the latest write that happened before it. I still think it should be monotonically increasing, and this is what the model tests.
Force-pushed from 63825db to e000a9f.
I think the linearizability test only guarantees that a read request never reads stale data (of course, the revision in the RangeResponse header will be >= the revision in the previous write request). What you have done so far (not including this PR) is good.

But the revision in the RangeResponse header is volatile (although it is always increasing), even if there is no write request on the key you are reading. As long as there are write requests on other keys, the revision in the RangeResponse header will change. It means that when two read requests read the same key, one request might get new data but an old revision. Please note that in a range request, reading the revision and creating a "snapshot" of the data isn't an atomic operation. See kvstore_txn.go#L46-L54

Probably you can use CreateRevision instead. But per my immediate feeling, it might not be a good idea to test the revision in the linearizability test. Instead, we should verify the revision at the end of the test: as long as all members converge to the same revision, then it's OK. Note: I was planning to add such a checker for the functional test.
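A rough sketch of the kind of end-of-test check being suggested here, assuming the test knows the client endpoints of all members (the function name, the endpoints parameter and the use of clientv3 are illustrative, not the checker that was eventually implemented):

```go
package linearizability

import (
	"context"
	"testing"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// checkRevisionsConverge queries every member directly and fails the test if
// the members do not report the same revision once all traffic has stopped.
func checkRevisionsConverge(t *testing.T, endpoints []string) {
	revisions := make(map[string]int64)
	for _, ep := range endpoints {
		cli, err := clientv3.New(clientv3.Config{
			Endpoints:   []string{ep},
			DialTimeout: 5 * time.Second,
		})
		if err != nil {
			t.Fatalf("failed to connect to %s: %v", ep, err)
		}
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		status, err := cli.Status(ctx, ep)
		cancel()
		cli.Close()
		if err != nil {
			t.Fatalf("failed to get status from %s: %v", ep, err)
		}
		revisions[ep] = status.Header.Revision
	}
	var want int64
	for ep, rev := range revisions {
		if want == 0 {
			want = rev
			continue
		}
		if rev != want {
			t.Errorf("revision mismatch: %s reports %d, want %d", ep, rev, want)
		}
	}
}
```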
Implemented #14714 to dump the data directory after the tests. I got interesting results; as mentioned before, the revisions in each instance do not match for the same PUT request. I got the result by downloading the artifact from https://github.com/etcd-io/etcd/actions/runs/3426561978 and analysing the files using

Member 0:

Member 1:

Member 2:
I reproduced this issue locally; it seems like a real issue. Let me take a deep dive on this. Note: please ignore part of my previous comment (the "two read requests" part), which isn't correct.
@ahrtr Thanks for looking into this. Did you do anything specific to reproduce the issue locally? With a local reproduction, can you validate whether v3.4 and v3.5 are affected? If so, we should delay the v3.5.6 release.
Force-pushed from 525a2ea to ac78db2.
OK, I got a local repro. I'm running different etcd versions to check which ones are affected.
Confirmed that the issue is reproducible on the main branch and on the v3.5.5 release. Looks like v3.4.22 is not affected. Will look into bisecting the git history when I have time.

Confirmed. I have the same finding.

Git history bisection has pointed to 2dbecea as the cause.
`unsafeCommit` is called by both `(*batchTxBuffered) commit` and `(*backend) defrag`. When users perform the defragmentation operation, etcd doesn't update the consistent index. If etcd crashes (e.g. panics) in the process for whatever reason, then etcd replays the WAL entries starting from the latest snapshot, and accordingly it may re-apply entries which have already been applied; eventually the revision isn't consistent with the other members. Refer to discussion in etcd-io#14685 Signed-off-by: Benjamin Wang <wachao@vmware.com>
Eventually I figured out the root cause. Please refer to #14730
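To illustrate the mechanism described in the commit message above (this is a hedged sketch, not etcd's actual code; names such as Entry, persistedConsistentIndex and apply are hypothetical): after a restart, WAL replay only skips entries at or below the consistent index that was persisted together with the data, so if a commit during defrag persists applied data without the matching consistent index, those entries are applied a second time and the member's revision drifts from the others.

```go
package sketch

// Entry is an illustrative stand-in for a WAL/raft entry.
type Entry struct {
	Index uint64
}

// replay sketches the post-restart apply loop: entries at or below the
// consistent index read back from the backend are assumed to already be
// reflected in the data and are skipped; everything after it is re-applied.
func replay(persistedConsistentIndex uint64, entries []Entry, apply func(Entry)) {
	for _, e := range entries {
		if e.Index <= persistedConsistentIndex {
			continue
		}
		// If the backend already contains the effect of e, because a commit
		// during defrag persisted the data but not the updated consistent
		// index, this call applies e a second time and bumps the revision
		// again, producing the revision inconsistency observed in this issue.
		apply(e)
	}
}
```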
`unsafeCommit` is called by both `(*batchTxBuffered) commit` and `(*backend) defrag`. When users perform the defragmentation operation, etcd doesn't update the consistent index. If etcd crashes (e.g. panics) in the process for whatever reason, then etcd replays the WAL entries starting from the latest snapshot, and accordingly it may re-apply entries which have already been applied; eventually the revision isn't consistent with the other members. Refer to discussion in etcd-io#14685 Signed-off-by: Benjamin Wang <wachao@vmware.com>
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Force-pushed from ac78db2 to b24e4ea.
With the fix merged we can also merge this change. ping @ahrtr
tests/linearizability/model.go
Outdated
if state.Value == response.getData && state.LastRevision <= response.revision {
    return true, state
}
if state.LastRevision >= response.revision {
Two successive read requests may have the same revision.
Suggested change, replacing

    if state.LastRevision >= response.revision {

with

    if state.LastRevision > response.revision {
I explained it above: this if statement doesn't cover successive reads. It covers cases where a PUT request failed before the read, as in the example below.
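A small usage example of the hypothetical stepGet sketch from earlier (assuming its types and an fmt import are in scope), showing the scenario this branch is for: a put returns an error to the client but is persisted, and a later read observes both the higher revision and the lost value, so the model restores it. A successive read at the same revision is instead accepted by the first condition and never reaches this check.

```go
func Example_restoreLostPut() {
	state := EtcdState{Value: "a", LastRevision: 5}
	// A put of "b" returned an error to the client but was persisted by etcd.
	failedPuts := []PutRequest{{putData: "b"}}
	// A later read observes the persisted value and a higher revision.
	ok, state := stepGet(state, GetRequest{}, GetResponse{getData: "b", revision: 6}, failedPuts)
	fmt.Println(ok, state.Value, state.LastRevision)
	// Output: true b 6
}
```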
Force-pushed from b24e4ea to c1ed370.
Overall looks good to me.
I'll leave you to decide whether/how to resolve #14685 (comment)
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Force-pushed from c1ed370 to ff6c93f.
Resolved the comment and added tests for it.
Thanks @serathius lgtm
All the existing etcd 3.5 releases are affected, including 3.5.0 ~ 3.5.5. Note: etcd 3.4 doesn't have this issue. There is also no impact on etcd 3.5 if the defragmentation operation is performed offline using etcdutl.
When working on adding revision support to robustness testing, CI found 2 linearizability issues in https://github.com/etcd-io/etcd/actions/runs/3418916665. Possibly the model is too strict, but I would like to discuss it openly to make sure I didn't miss anything. I'm working on reproducing them locally.

First case: in the single-cluster test we have a case of etcd skipping 9 revisions even though there were no other writes. This breaks the assumption that the revision increases only on a successful write.

I would really love to see the db file, so I think the follow-up to that issue is to implement uploading the etcd data directory on a failed test.

Second case: in the 3-node cluster it is even stranger; etcd reported different data for the same revision. There was a long-running get transaction that returned old data with a much newer revision.

In conclusion, there seems to be an issue with the revision returned in the Get response header. cc @ahrtr