-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Restore Operation to SnapshotResiliencyTests #40634
Add Restore Operation to SnapshotResiliencyTests #40634
Conversation
* Expand the successful snapshot test case to also include restoring the snapshop * Add indexing of documents as well to be able to meaningfully verify the restore * This is part of the larger effort to test eventually consistent blob stores in elastic#39504
Pinging @elastic/es-distributed |
Jenkins run elasticsearch-ci/2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left two questions.
And I wonder if we want to add "restore" operation to other tests in this class as well? For example, when emulating cluster disruption?
I also want to better understand the motivation for this PR.
Is it primarily here to be able more different kind of actions in future tests?
} | ||
)); | ||
})))); | ||
runUntil( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need 3 different atomic variables? Is not it enough just to have documentCountVerified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right ... not sure what I was thinking :) my bad fixed in c064db3
server/src/test/java/org/elasticsearch/snapshots/SnapshotResiliencyTests.java
Outdated
Show resolved
Hide resolved
@andrershov The motivation for this is to basically get tests that verify that we're functional with an eventually consistent blob store (the code for that is already written in #39504) and for that we really want/need to verify that the snapshots can actually be restored. Especially in light of the fact that currently we suspect failures from eventually consistent list operations that silently corrupt the repository (#38941).
Well the other ones kinda reproduce broken snapshots and we mainly cared about the fact that these snapshots finished back when we added them since the bugs resolved at the time led to various scenarios where snapshots just got stuck forever. |
@original-brownbear so the idea is to extend testSuccessfulSnapshot with restore and call it testSuccessfulSnapshotAndRestore. And the next PR will add eventual consistent BlobStore emulation. So we can just run testSuccessfulSnapshotAndRestore for long enough time to make sure it does not fail with the eventual consistent store. Do I understand correctly, that currently, we expect it to fail or do we already have logic to work around eventual consistency? |
Yea, exactly.
Correct as well.
Correct as well here too. Currently the mock eventually consistent store fails yes, but I have a fix for all cases the test reproduces (tests are done too in the other PR I linked) in https://github.com/original-brownbear/elasticsearch/tree/delete-lock-via-cs (95% done, would open a PR for that once we have the tests in place). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
thanks @andrershov ! |
…leniency * elastic/master: SQL: Fix deserialisation issue of TimeProcessor (elastic#40776) Improve GCS docs for using keystore (elastic#40605) Add Restore Operation to SnapshotResiliencyTests (elastic#40634) Small refactorings to analysis components (elastic#40745) SQL: Fix display size for DATE/DATETIME (elastic#40669) add HLRC protocol tests for transform state and stats (elastic#40766) Inline TransportReplAction#registerRequestHandlers (elastic#40762) remove experimental label from search_as_you_type documentation (elastic#40744) Remove some abstractions from `TransportReplicationAction` (elastic#40706) Upgrade to latest build scan plugin (elastic#40702) Use default memory lock setting in testing (elastic#40730) Add Bulk Delete Api to BlobStore (elastic#40322) Remove yaml skips older than 7.0 (elastic#40183) Docs: Move id in the java-api (elastic#40748)
* Add Restore Operation to SnapshotResiliencyTests * Expand the successful snapshot test case to also include restoring the snapshop * Add indexing of documents as well to be able to meaningfully verify the restore * This is part of the larger effort to test eventually consistent blob stores in elastic#39504
* Add Restore Operation to SnapshotResiliencyTests * Expand the successful snapshot test case to also include restoring the snapshop * Add indexing of documents as well to be able to meaningfully verify the restore * This is part of the larger effort to test eventually consistent blob stores in #39504
* Add Restore Operation to SnapshotResiliencyTests * Expand the successful snapshot test case to also include restoring the snapshop * Add indexing of documents as well to be able to meaningfully verify the restore * This is part of the larger effort to test eventually consistent blob stores in elastic#39504
Note: I had to add putting a mapping before indexing documents here since dynamic mapping updates trigger a blocking operation that blocks the tests. A fix for that is incoming in #39793