Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set the backend again after recovering v3 backend from snapshot #13500

Merged
merged 1 commit into from
Dec 21, 2021

Conversation

ahrtr
Copy link
Member

@ahrtr ahrtr commented Nov 24, 2021

Fix issues/13494.

When etcd recovers v3 backend from a snapshot, then it closes the old backend, see backend.go#L107. So we should set the backend for the consistentIndex again in this case.

This PR should be back ported to 3.5 as well, will submit a separate PR soon.

cc @ptabor @serathius @jingyih

@ahrtr ahrtr force-pushed the reset_ci_after_reload_db branch from dcec6c7 to a0a1468 Compare November 24, 2021 21:29
@ahrtr
Copy link
Member Author

ahrtr commented Nov 24, 2021

/kind bug

@serathius
Copy link
Member

Could you possibly add a test that would prevent the problem in the future?

@ahrtr ahrtr force-pushed the reset_ci_after_reload_db branch 2 times, most recently from 453fb91 to 79aa860 Compare November 25, 2021 22:17
@ahrtr
Copy link
Member Author

ahrtr commented Nov 25, 2021

Could you possibly add a test that would prevent the problem in the future?

@serathius Thanks for the comment. Added a unit test.

@ahrtr
Copy link
Member Author

ahrtr commented Nov 25, 2021

Should I update the CHANGELOG-3.6 ?

@KelvinHongSea
Copy link

I have an another question,If that's the problem,it happens 100 percent,but the problematic endpoint soon returned to normal,how did this happen ?

@ahrtr
Copy link
Member Author

ahrtr commented Nov 26, 2021

I have an another question,If that's the problem,it happens 100 percent,but the problematic endpoint soon returned to normal,how did this happen ?

Because the original db file has already been replaced with the snapshot db.

Copy link
Member

@serathius serathius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, Overall LGTM.
Thanks for adding tests
cc @hexfusion

@ahrtr
Copy link
Member Author

ahrtr commented Nov 26, 2021

Some nits, Overall LGTM. Thanks for adding tests cc @hexfusion

@serathius Thanks for the comments. All resolved.

@ahrtr ahrtr force-pushed the reset_ci_after_reload_db branch from b1d7b68 to 6136c2a Compare November 30, 2021 23:33
@ahrtr
Copy link
Member Author

ahrtr commented Nov 30, 2021

Just updated the CHANGELOG-3.6 as well.

Copy link
Contributor

@ptabor ptabor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this.

I'm a little concerned (independent from this CL) why functional (https://github.com/etcd-io/etcd/tree/main/tests/functional) tests had not catched this.
Failure injection should lead to recovery from snapshot from time to time
(but I assume this issue triggers only if 2 conditions are met at the same time:

  • the task was lagging enough to need to recover from snapshot (and not WAL) from peer)
  • the task failed after downloading the file, but before removing it.
    And we might not have such combination.

server/etcdserver/bootstrap_test.go Outdated Show resolved Hide resolved
server/etcdserver/bootstrap_test.go Outdated Show resolved Hide resolved
server/etcdserver/bootstrap_test.go Outdated Show resolved Hide resolved
@ahrtr ahrtr force-pushed the reset_ci_after_reload_db branch from 6136c2a to 1682ce6 Compare December 1, 2021 18:49
@ahrtr
Copy link
Member Author

ahrtr commented Dec 1, 2021

@ptabor resolved all your comments. Thanks.

I can take a look at the functional test and check whether it's possible to add a case to cover this scenario in a separate PR.

@ahrtr ahrtr force-pushed the reset_ci_after_reload_db branch from 1682ce6 to 7be1464 Compare December 2, 2021 21:52
@ahrtr
Copy link
Member Author

ahrtr commented Dec 2, 2021

Just rebased.

@ptabor
Copy link
Contributor

ptabor commented Dec 3, 2021

Thank you. Will merge when the tests are green.

@ptabor ptabor merged commit 5b0bb07 into etcd-io:main Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

etcd3.5.0: assertion failed: tx closed
4 participants